Brandeis University, Fall 2007
Due: 2007-10-26 before 11:59 PM
This project must be written in C and must compile and run on one of the RHEL boxes in the berry patch (click here for a list of available machines).
You will create a simple web proxy that forwards requests from a client (your browser) to remote web servers, and then forwards the response back to the client. Your web proxy will be single-threaded and so can only service one request at a time.
From the perspective of a user of your proxy (i.e., you while you are testing), the web browser is the client and the web proxy is the server. From the perspective of your web proxy, however, the proxy is the client and the remote web server is the server. In this way, your web proxy will behave as both client and server.
Web browsers can be configured to use a web proxy, but the configuration differs from browser to browser. You may find it convenient to test with links. You can fetch a page's source with links by entering the following command:
$ http_proxy=http://localhost:8888/ links -source http://www.brandeis.edu
This assumes, of course, that your proxy has been started on port 8888 (this will be explained below). Also, if your shell isn't Bash (it probably is; type echo $SHELL to find out), you may need to specify the proxy environment variable differently.
Because you must ensure that images work too, you will want to test in a graphical browser as well. To use a proxy in Firefox, do the following:
Be aware that since your proxy will not implement all of HTTP/1.0, some sites and activities will not work correctly, but GET-ing HTML, JavaScript, and image data should all work.
Like your previous project (project 1), your proxy will use the socket abstraction for building network applications. Unlike your previous project, you will be responsible for both client- and server-side socket programming.
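To make the client-side half concrete, here is a minimal sketch of how the proxy might open a TCP connection to a remote web server once it knows the hostname and port. The function name connect_to_host is hypothetical, and using getaddrinfo() is an assumption; the course handout may supply its own helper for this, in which case you should use that instead.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <unistd.h>

/* Hypothetical helper: connect to host:port over TCP.
 * Returns a connected socket descriptor, or -1 on failure. */
int connect_to_host(const char *host, int port)
{
    struct addrinfo hints, *res, *ai;
    char service[16];
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    snprintf(service, sizeof service, "%d", port);

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    /* Try each returned address until one connects. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* success */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

The returned descriptor can be written to and read from like any other socket; remember to close() it when the response has been forwarded.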
Your proxy will work with a useful subset of the HTTP/1.0 spec (see RFC 1945). Specifically, your proxy must be able to handle the following:
Please read these requirements carefully!
GET /path/to/resource
Note that this command only needs to be terminated by a single carriage return and line feed pair (\r\n).
GET /path/to/resource HTTP/1.0
Host: www.example.com
Accept: text/html, text/plain
Accept-Encoding: gzip, compress
Accept-Language: en
Note that this command must be terminated by two consecutive carriage return and line feed pairs (\r\n\r\n). These are just a few examples of HTTP headers. You should pass all headers through the proxy verbatim as part of the request you make to the remote web server.
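Because the two request forms end differently, your read loop needs some way to decide when it has received a whole request. The sketch below shows one possible check. The function name request_is_complete is hypothetical, and the rule it uses to tell the two forms apart (looking for " HTTP/" on the first line) is an assumption, not something this assignment prescribes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if buf[0..len) holds a complete request,
 * 0 if more data must be read.
 *   - A request with no HTTP version on its first line ends at the first \r\n.
 *   - A request with a version ends at the first blank line (\r\n\r\n). */
int request_is_complete(const char *buf, size_t len)
{
    size_t i, eol = len;
    int has_version = 0;

    /* Locate the end of the first line. */
    for (i = 0; i + 1 < len; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n') {
            eol = i;
            break;
        }
    }
    if (eol == len)
        return 0;               /* first line not fully received yet */

    /* Assumption: a version is present iff " HTTP/" appears on line one. */
    for (i = 0; i + 6 <= eol; i++) {
        if (memcmp(buf + i, " HTTP/", 6) == 0) {
            has_version = 1;
            break;
        }
    }
    if (!has_version)
        return 1;               /* short form: one \r\n is enough */

    /* Long form: look for the blank line that ends the headers. */
    for (i = eol; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return 1;
    }
    return 0;
}
```

A typical use is to call this after every read() on the client socket and keep appending to the buffer until it returns 1.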
Your proxy does not need to do the following:
When a browser sends an HTTP/1.0 request to a proxy server, it sends a request in the same format that it would send directly to a web server, except that the resource locator also includes the protocol and domain name.
Here is an example of an HTTP/1.0 request sent directly to a web server:
GET / HTTP/1.0
Host: www.brandeis.edu
Accept: text/html, text/plain
Accept-Encoding: gzip, compress
Accept-Language: en
User-Agent: Lynx/2.8.4rel.1 libwww-FM/2.14
And here is an example of how the browser requests the same resource through a proxy server:
GET http://www.brandeis.edu/ HTTP/1.0
Host: www.brandeis.edu
Accept: text/html, text/plain
Accept-Encoding: gzip, compress
Accept-Language: en
User-Agent: Lynx/2.8.4rel.1 libwww-FM/2.14
As you can see, the only difference is in the first line, where the full URL, including the protocol and hostname, is given.
It is up to you to parse the first line of any HTTP/1.0 request sent to your proxy server and
The results of this parse will be a rewritten request that can be sent to the remote web server, the hostname (or IP address) of the remote web server, and the port to use to connect to the remote web server. These results are sufficient to forward the request and proxy the response back to the client.
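One way such a parse might look is sketched below. It is an illustration under stated assumptions, not the required design: the function name parse_request_line and its parameters are hypothetical, only the GET method and the http:// scheme are accepted, and the port defaults to 80 when the URL does not give one.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "GET http://host[:port][/path] [HTTP/1.0]"
 * into host, port, and a rewritten first line for the remote server.
 * Returns 0 on success, -1 on anything it does not understand. */
int parse_request_line(const char *line,
                       char *host, size_t host_sz, int *port,
                       char *rewritten, size_t rewritten_sz)
{
    const char *p, *host_end, *colon;
    size_t host_len, rest_len;
    int n;

    if (strncmp(line, "GET ", 4) != 0)
        return -1;
    p = line + 4;
    if (strncmp(p, "http://", 7) != 0)
        return -1;
    p += 7;

    /* host[:port] runs up to the first '/', space, or end of line. */
    host_end = p + strcspn(p, "/ \r\n");
    colon = memchr(p, ':', (size_t)(host_end - p));
    host_len = (size_t)((colon ? colon : host_end) - p);
    if (host_len == 0 || host_len >= host_sz)
        return -1;
    memcpy(host, p, host_len);
    host[host_len] = '\0';

    *port = 80;                 /* assumed default when no port is given */
    if (colon) {
        char *end;
        long v = strtol(colon + 1, &end, 10);
        if (end != host_end || v < 1 || v > 65535)
            return -1;
        *port = (int)v;
    }

    /* Keep the path (and version, if any); supply "/" if the path is empty. */
    rest_len = strcspn(host_end, "\r\n");
    n = snprintf(rewritten, rewritten_sz, "GET %s%.*s\r\n",
                 (*host_end == '/') ? "" : "/", (int)rest_len, host_end);
    if (n < 0 || (size_t)n >= rewritten_sz)
        return -1;
    return 0;
}
```

For the proxied request shown earlier, this sketch yields the host www.brandeis.edu, port 80, and the rewritten first line "GET / HTTP/1.0" followed by \r\n; the remaining header lines would then be appended unchanged.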
Your program will be named proxy, and will be invoked on the command line like this:
$ ./proxy port
For example:
$ ./proxy 8888
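The sketch below shows one way the port argument might be validated and turned into a listening socket. Both function names (parse_port, open_listener) are hypothetical, and the backlog of 5 and the use of SO_REUSEADDR are assumptions made for illustration; if the course handout provides its own socket helpers, prefer those.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical helper: convert argv[1] to a port number.
 * Returns the port (1..65535), or -1 if the argument is not a valid port. */
int parse_port(const char *arg)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0' || v < 1 || v > 65535)
        return -1;
    return (int)v;
}

/* Hypothetical helper: create a TCP socket listening on the given port.
 * Returns the listening descriptor, or -1 after printing an error. */
int open_listener(int port)
{
    struct sockaddr_in addr;
    int fd, yes = 1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }

    /* Lets the port be reused soon after a previous run has exited. It does
     * not help if another process is still listening on the same port. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons((unsigned short)port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        fprintf(stderr, "Cannot bind to TCP port %d (%s)\n",
                port, strerror(errno));
        close(fd);
        return -1;
    }
    if (listen(fd, 5) < 0) {
        perror("listen");
        close(fd);
        return -1;
    }
    return fd;
}
```

In main(), you would call parse_port(argv[1]), pass the result to open_listener(), and then accept() connections on the returned descriptor one at a time.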
We recommend that you develop on a Linux machine. You can log into any of the Brandeis CS department public machines to work, or you can work in the berry patch. You are also free to develop on your own computer, but be aware that in order to receive credit your assignment must compile and run on one of the RHEL machines in the berry patch.
Cannot bind to TCP port 8888 (Address already in use)
If this occurs, you can just choose another port.
signal(SIGPIPE, SIG_IGN);
This is discussed on page 18 of the TCP Through Sockets handout.
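To illustrate why ignoring SIGPIPE matters: once the signal is ignored, a write() to a connection that the other side has already closed fails with errno set to EPIPE instead of terminating the proxy, so the failure can be handled like any other error. The sketch below is one possible write loop built on that behavior; the names ignore_sigpipe and write_all are hypothetical and do not come from the handout.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Call once at startup so that a closed connection cannot kill the proxy. */
void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Hypothetical helper: write all len bytes of buf to fd.
 * Returns 0 on success, or -1 on error (errno is EPIPE if the peer
 * has closed the connection and SIGPIPE is being ignored). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;               /* write() may accept fewer bytes than asked */
        len -= (size_t)n;
    }
    return 0;
}
```

When write_all() returns -1, the proxy can simply close both sockets for that request and go back to accepting the next connection.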
You may earn 10% extra credit on this assignment by augmenting your code to use fork() to service multiple requests at once. In order to earn extra credit you must do the following:
Your next assignment will give you an opportunity to program with concurrency, so do not feel that you will be missing out on concepts by not completing the extra credit. The extra credit is intended for students who really enjoy programming and are looking for a greater challenge.
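For those attempting the extra credit, the general fork-per-connection pattern can be sketched as follows. This shows only the common shape of the technique and does not cover the specific extra-credit requirements. All three function names are hypothetical, and stub_handler is a placeholder standing in for your real request-handling code.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder handler (hypothetical): your real code would read the
 * request, contact the remote server, and relay the response. */
void stub_handler(int client_fd)
{
    static const char reply[] = "HTTP/1.0 501 Not Implemented\r\n\r\n";
    if (write(client_fd, reply, sizeof reply - 1) < 0) {
        /* nothing useful to do in this sketch */
    }
}

/* Fork a child to service one accepted connection.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
pid_t handle_in_child(int client_fd, void (*handler)(int))
{
    pid_t pid = fork();

    if (pid == 0) {
        /* Child: a real proxy would also close the listening socket here. */
        handler(client_fd);
        close(client_fd);
        _exit(0);
    }
    /* Parent (or fork failure): drop our copy of the client socket. */
    close(client_fd);
    return pid;
}

/* Collect any children that have finished, without blocking,
 * so that they do not linger as zombie processes. */
void reap_children(void)
{
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
}
```

A main loop built on this would call accept(), pass the new descriptor to handle_in_child(), and call reap_children() on each iteration.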
Follow the instructions in the FAQ.