cs146a Project 2: Blocking Web Proxy

Brandeis University, Fall 2007

Due: 2007-10-26 before 11:59 PM

This project must be written in C and must compile and run on one of the RHEL boxes in the berry patch (see the list of available machines).

Introduction

You will create a simple web proxy that forwards requests from a client (your browser) to remote web servers, and then forwards the response back to the client. Your web proxy will be single-threaded and so can only service one request at a time.

Both Server and Client

From the perspective of a user of your proxy (i.e., you while you are testing), the web browser is the client and the web proxy is the server. From the perspective of your web proxy, however, the proxy is the client and the remote web server is the server. In this way, your web proxy will behave as both client and server.
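Because the proxy sits between two connections, much of its work is copying bytes from one socket to the other. The following is a minimal sketch of that copy step, not a required design. The names write_all and relay_until_eof are hypothetical, and the sketch assumes that the remote server signals the end of its response by closing the connection, which is the usual behavior for HTTP/1.0 without keep-alive. It treats the data as raw bytes, so images pass through unchanged.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes, retrying on short writes.
 * Returns 0 on success and -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, try again */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: copy bytes from from_fd to to_fd until from_fd
 * reports end of file. Returns the number of bytes copied, or -1 on error. */
long relay_until_eof(int from_fd, int to_fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(from_fd, buf, sizeof buf);
        if (n == 0)
            return total;           /* peer closed: response is complete */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(to_fd, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
}
```

In a proxy, from_fd would be the socket connected to the remote web server and to_fd the socket accepted from the browser.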

Setting up a Proxy

Web browsers can be configured to use a web proxy, but the configuration differs from browser to browser. You may find it convenient to test with links, a text-mode browser. You can fetch a page's source with links by entering the following command:

$ http_proxy=http://localhost:8888/ links -source http://www.brandeis.edu

This assumes, of course, that your proxy has been started on port 8888 (this will be explained below). Also, if your shell isn't Bash (it probably is; type echo $SHELL to find out), you may need to set the proxy environment variable differently.

Because you must ensure that images work too, you will want to test in a graphical browser as well. To use a proxy in Firefox, do the following:

  1. Visit the page about:config
  2. Scroll down until you see the Preference Names beginning with "network" (the Filter box can help)
  3. Make the following changes to your settings:
  4. Go to the Firefox preferences, and under the Advanced tab click the connections settings button and change the proxy to manual config. Set the server to localhost and the port to whatever you start your proxy on (e.g., 8888).

Be aware that, since your proxy will not implement all of HTTP/1.0, some sites and activities will not work correctly. However, GET-ing HTML, JavaScript, and image data should all work.

Network Connections

Like your previous project (project 1), your proxy will use the socket abstraction for building network applications. Unlike your previous project, you will be responsible for both client- and server-side socket programming.
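The two sides of the socket work can be sketched as a pair of helper functions. This is only an illustration under stated assumptions: the names open_listen_socket and connect_to_host are hypothetical, the backlog of 16 is arbitrary, the sketch handles IPv4 only, and it uses getaddrinfo for name lookup, which may differ from what you used in project 1.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Server side (hypothetical helper): create a TCP socket that listens on
 * the given port on all local addresses. Returns the descriptor or -1. */
int open_listen_socket(int port)
{
    struct sockaddr_in addr;
    int yes = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    /* Allow quick restarts of the proxy on the same port. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons((unsigned short)port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {       /* backlog of 16 is an arbitrary choice */
        close(fd);
        return -1;
    }
    return fd;
}

/* Client side (hypothetical helper): resolve host and connect to it on the
 * given port. Returns a connected descriptor or -1. */
int connect_to_host(const char *host, int port)
{
    struct addrinfo hints, *res, *ai;
    char service[16];
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;          /* IPv4 only in this sketch */
    hints.ai_socktype = SOCK_STREAM;
    snprintf(service, sizeof service, "%d", port);

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;
    /* Try each returned address until one accepts the connection. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

A single-process proxy would call open_listen_socket once at startup, then repeatedly accept a browser connection, call connect_to_host for the requested server, and close both descriptors when the response has been relayed.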

Resources

Project Requirements

Proxy Functionality

Your proxy will work with a useful subset of the HTTP/1.0 spec (see RFC 1945). Specifically, your proxy must be able to handle the following:

Please read these requirements carefully!

Non-Requirements

Your proxy does not need to do the following:

Parsing the Request

When a browser sends an HTTP/1.0 request to a proxy server, it sends a request in the same format that it would send directly to a web server, except that the resource locator includes the protocol and domain name.

Here is an example of an HTTP/1.0 request sent directly to a web server:

GET / HTTP/1.0
Host: www.brandeis.edu
Accept: text/html, text/plain
Accept-Encoding: gzip, compress
Accept-Language: en
User-Agent: Lynx/2.8.4rel.1 libwww-FM/2.14

And here is an example of how the browser requests the same resource through a proxy server:

GET http://www.brandeis.edu/ HTTP/1.0
Host: www.brandeis.edu
Accept: text/html, text/plain
Accept-Encoding: gzip, compress
Accept-Language: en
User-Agent: Lynx/2.8.4rel.1 libwww-FM/2.14

As you can see, the only difference is in the first line: the full URL, including the protocol and hostname, is given.

It is up to you to parse the first line of any HTTP/1.0 request sent to your proxy server and

  1. Find out the host name (and port if given, remember that in a URL the port is specified in the form "http://www.example.com:8080"; i.e., separated from the domain name by a colon); and,
  2. Rewrite the request to only include the path part, not the protocol, domain name, or port (essentially, you need to follow the rewriting example shown above).

The results of this parse will be a rewritten request that can be sent to the remote web server, the hostname (or IP address) of the remote web server, and the port to use to connect to the remote web server. These results are sufficient to forward the request and proxy the response back to the client.
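One possible way to carry out the two parsing steps above is sketched below. It is not the required approach. The struct parsed_request, the function parse_request_line, and the buffer sizes are all hypothetical, and the sketch assumes that the URL begins with a lowercase "http://" and that the port defaults to 80 when the URL does not give one.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the three results of the parse. */
struct parsed_request {
    char host[256];
    int  port;
    char request_line[2100];    /* rewritten first line, without CRLF */
};

/* Parse the first line of a proxy request such as
 * "GET http://www.example.com:8080/index.html HTTP/1.0".
 * Returns 0 on success and -1 if the line is malformed. */
int parse_request_line(const char *line, struct parsed_request *out)
{
    static const char prefix[] = "http://";
    char method[16], url[2048], version[16], portbuf[8];
    const char *hostport, *path, *colon;
    char *end;
    size_t hostport_len, host_len, port_len;
    long port;

    /* Split the line into method, URL, and version. */
    if (sscanf(line, "%15s %2047s %15s", method, url, version) != 3)
        return -1;
    if (strncmp(url, prefix, strlen(prefix)) != 0)
        return -1;

    /* Everything up to the first '/' after the prefix is host[:port]. */
    hostport = url + strlen(prefix);
    path = strchr(hostport, '/');
    hostport_len = path ? (size_t)(path - hostport) : strlen(hostport);
    if (path == NULL)
        path = "/";             /* "http://host" means the root path */

    colon = memchr(hostport, ':', hostport_len);
    host_len = colon ? (size_t)(colon - hostport) : hostport_len;
    if (host_len == 0 || host_len >= sizeof out->host)
        return -1;
    memcpy(out->host, hostport, host_len);
    out->host[host_len] = '\0';

    out->port = 80;             /* assumed default when no port is given */
    if (colon != NULL) {
        port_len = hostport_len - host_len - 1;
        if (port_len == 0 || port_len >= sizeof portbuf)
            return -1;
        memcpy(portbuf, colon + 1, port_len);
        portbuf[port_len] = '\0';
        port = strtol(portbuf, &end, 10);
        if (*end != '\0' || port < 1 || port > 65535)
            return -1;
        out->port = (int)port;
    }

    /* Rebuild the first line with only the path part of the URL. */
    snprintf(out->request_line, sizeof out->request_line, "%s %s %s",
             method, path, version);
    return 0;
}
```

For the proxy request shown earlier, this sketch would yield the host www.brandeis.edu, the port 80, and the rewritten line "GET / HTTP/1.0". The caller would still need to append "\r\n" and forward the remaining header lines.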

Invocation

Your program will be named proxy, and will be invoked on the command line like this:

$ ./proxy port

For example:

$ ./proxy 8888
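The port argument arrives as a string, so it needs to be validated before it is passed to the socket calls. A minimal sketch follows. The function name parse_port is hypothetical, and rejecting anything outside 1 to 65535 is an assumption about what counts as a valid port for this assignment.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert a command-line argument to a TCP port.
 * Returns the port number, or -1 if the argument is not a valid port. */
int parse_port(const char *arg)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(arg, &end, 10);
    /* Reject empty input, trailing junk, and out-of-range values. */
    if (errno != 0 || end == arg || *end != '\0')
        return -1;
    if (value < 1 || value > 65535)
        return -1;
    return (int)value;
}
```

A main function could check that argc is 2, call parse_port(argv[1]), and print a usage message such as "usage: ./proxy port" if the result is -1.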

Where it Must Run

We recommend that you develop on a Linux machine. You can log into any of the Brandeis CS department public machines to work, or you can work in the berry patch. You are also free to develop on your own computer, but be aware that in order to receive credit your assignment must compile and run on one of the RHEL machines in the berry patch.

Tips and Suggestions

Extra Credit

You may earn 10% extra credit on this assignment by augmenting your code to use fork() to service multiple requests at once. In order to earn extra credit you must do the following:

  1. Submit both versions of your code: the single-process version and the multi-process version that uses fork
  2. Demonstrate that your multi-process version maintains the correctness properties of your single-process version
  3. Submit a short experiment demonstrating the improved speed of your proxy when using multiple processes
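For the multi-process version, one common pattern is to fork a child for each accepted connection. The sketch below illustrates that pattern and is not a required design. The names serve_in_child, reap_children, and stub_handler are hypothetical, and stub_handler only stands in for your real request-handling code.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the real per-request work (parse, forward,
 * relay). It only sends a fixed reply. */
static void stub_handler(int conn_fd)
{
    static const char reply[] = "HTTP/1.0 501 Not Implemented\r\n\r\n";

    if (write(conn_fd, reply, sizeof reply - 1) < 0) {
        /* Nothing useful to do in this sketch if the client went away. */
    }
}

/* Hypothetical helper: handle one accepted connection in a child process.
 * listen_fd is the listening socket (pass -1 if there is none to close).
 * Returns the child's pid in the parent, or -1 if fork failed. */
pid_t serve_in_child(int listen_fd, int conn_fd, void (*handler)(int))
{
    pid_t pid = fork();

    if (pid == 0) {
        /* Child: it never accepts, so it drops the listening socket. */
        if (listen_fd >= 0)
            close(listen_fd);
        handler(conn_fd);
        close(conn_fd);
        _exit(0);
    }
    /* Parent (or failed fork): the child holds its own copy of conn_fd,
     * so the parent closes its copy and returns to accepting. */
    close(conn_fd);
    return pid;
}

/* Hypothetical helper: collect finished children without blocking, so
 * they do not accumulate as zombie processes. */
void reap_children(void)
{
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
}
```

In the accept loop, the parent would call serve_in_child for each new connection and call reap_children periodically, for example once per iteration.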

Your next assignment will give you an opportunity to program with concurrency, so do not feel that you will be missing out on concepts by not completing the extra credit. The extra credit is intended for students who really enjoy programming and are looking for a greater challenge.

Collaboration

How to Hand In

Follow the instructions in the FAQ.