cs146a Project 3: Asynchronous Caching Web Proxy

This project is optional. See the page for assignment 10 for details.

Brandeis University, Fall 2007

Preliminary Design : 2007-11-30 before 11:59 PM

Implementation Due : 2007-12-06 before 11:59 PM

Project Paper Due : 2007-12-07 before 11:59 PM

NO EXTENSIONS WHATSOEVER WILL BE GIVEN!

This project must be written in C and must compile and run on one of the RHEL boxes in the berry patch (click here for a list of available machines).

You may work in groups of up to 3 if you wish.

Introduction

In Project 2, you built a synchronous web proxy that used blocking I/O. For this project, you will employ non-blocking I/O to build a fully asynchronous web proxy. Optionally, you will also use local storage on your proxy to improve performance with caching. You will have to make design decisions regarding what features to include and what tradeoffs to make between conflicting design criteria. In addition to an implementation, you will write a preliminary design document (due about a week before the implementation) and a final design document.

Please refer to Project 2 for a refresher on proxy terminology and on how to set up a web browser to use a web proxy. You will also want to reuse the parser from your Project 2 implementation.

Project Requirements

Your proxy will work with the same useful subset of the HTTP/1.0 spec as the proxy you wrote for Project 2 (please see the Project 2 requirements). This project has the following new requirements:

Requirements: Asynchronous I/O

Extra Credit: Web Cache

Asynchronous I/O

Your web proxy will tolerate many simultaneous requests without employing multiple threads or processes (e.g., you may not use fork). Additionally, your proxy should handle clients and servers which may crash, i.e., your web proxy must not hang or leak memory because a client or server refuses to read or write data on its connection, or tries to read or write more data than you expect.

How to Achieve it?

Using TCP through Sockets provides an example of how to set up asynchronous I/O. You will need to follow this example carefully to set your sockets up correctly.
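As a minimal sketch of the socket setup (a common POSIX idiom, not necessarily the exact code in the referenced example), a descriptor can be switched to non-blocking mode with fcntl. The helper name set_nonblocking is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode (hypothetical helper).
 * Returns 0 on success, -1 on error; errno is set by fcntl. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);   /* read the current status flags */
    if (flags == -1)
        return -1;
    /* Add O_NONBLOCK while preserving whatever flags were already set. */
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```

After this call, read and write on the descriptor return -1 with errno set to EAGAIN (or EWOULDBLOCK) instead of blocking when no progress can be made, so each listening, client, and server socket would typically be passed through such a helper right after it is created or accepted.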

You will employ an event loop which uses the select system call to wait for I/O on a set of file descriptors. You will have pending I/Os for incoming connections from clients, for outgoing connections to remote servers, and for disk (if you choose to implement a web cache). Your first task will be to design a data structure which allows you to track all these different activities (and their associated file descriptors) and to make sure that each one reads from and writes to the correct buffer.
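One possible shape for such a tracking structure is sketched below. It is only an illustration under simplifying assumptions: every name (struct conn, the CONN_* states, build_fd_sets, BUF_SIZE) is hypothetical, each request owns a single buffer, and data flows in one direction at a time. Your own design may reasonably differ.

```c
#include <stddef.h>
#include <sys/select.h>

#define BUF_SIZE 8192   /* assumed per-connection buffer size */

/* What a proxied request is currently waiting for (hypothetical states). */
enum conn_state {
    CONN_READ_REQUEST,    /* reading the request from the client       */
    CONN_CONNECTING,      /* non-blocking connect() to server pending  */
    CONN_WRITE_REQUEST,   /* forwarding the request to the server      */
    CONN_READ_RESPONSE,   /* reading the response from the server      */
    CONN_WRITE_RESPONSE   /* relaying the response to the client       */
};

struct conn {
    int client_fd;
    int server_fd;            /* -1 until a server socket exists        */
    enum conn_state state;
    char buf[BUF_SIZE];       /* private buffer: never shared           */
    size_t buf_len;           /* bytes currently held in buf            */
    size_t buf_off;           /* bytes of buf already written out       */
    struct conn *next;        /* singly linked list of active requests  */
};

/* Fill the read/write sets for select() from the connection list.
 * Returns the highest descriptor placed in either set. */
int build_fd_sets(const struct conn *list, int listen_fd,
                  fd_set *rfds, fd_set *wfds)
{
    const struct conn *c;
    int maxfd = listen_fd;

    FD_ZERO(rfds);
    FD_ZERO(wfds);
    FD_SET(listen_fd, rfds);              /* new clients arrive here */

    for (c = list; c != NULL; c = c->next) {
        int fd = -1;
        fd_set *set = NULL;

        switch (c->state) {
        case CONN_READ_REQUEST:   fd = c->client_fd; set = rfds; break;
        case CONN_CONNECTING:     /* connect completion shows as writable */
        case CONN_WRITE_REQUEST:  fd = c->server_fd; set = wfds; break;
        case CONN_READ_RESPONSE:  fd = c->server_fd; set = rfds; break;
        case CONN_WRITE_RESPONSE: fd = c->client_fd; set = wfds; break;
        }
        if (fd >= 0) {
            FD_SET(fd, set);
            if (fd > maxfd)
                maxfd = fd;
        }
    }
    return maxfd;
}
```

Each pass of the event loop would rebuild the sets, call select(maxfd + 1, &rfds, &wfds, NULL, &timeout), and then walk the list again, using FD_ISSET to decide which connections can make progress. Because every struct conn carries its own buffer, a ready descriptor always maps to exactly one buffer.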

Event loops can be tricky. Be sure to work slowly and carefully, regularly testing the behavior of your proxy. Be sure to test with multiple simultaneous requests to ensure that buffers are not overwritten. An easy way to generate many simultaneous requests is to load search results from images.google.com; remember that each image is retrieved with a separate request.

Where to Start?

Write your design document carefully, making sure to detail the specific algorithms that you will use. If your preliminary design document is detailed enough, we may be able to provide helpful hints before the due date of your implementation.

Begin with the code for your blocking proxy, and start carefully modifying it to use asynchronous I/O. You will find programming and debugging much easier if you implement this feature completely before attempting to include caching.

Web Cache

Making your web proxy cache resources is optional. A completely functioning web cache is worth up to 20% extra credit on this assignment, but we cannot assign extra credit if your asynchronous proxy does not work, so be sure to do the extra credit only after you have completed the rest of the assignment.

If you choose to implement caching, then your web proxy will cache web resources (web pages, images, etc.) locally so that subsequent requests for those resources can be fulfilled without actually requesting them from the remote server. This can greatly improve the proxy's performance.
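If cached resources are stored as files, each URL needs a stable local name. The sketch below derives one with the 32-bit FNV-1a hash; this is only one of many reasonable choices, and the names url_hash and cache_path_for_url, as well as the idea of a single cache directory, are assumptions made for illustration.

```c
#include <stdio.h>

/* 32-bit FNV-1a hash of a NUL-terminated URL string. */
unsigned long url_hash(const char *url)
{
    unsigned long h = 2166136261UL;              /* FNV offset basis */
    while (*url != '\0') {
        h ^= (unsigned char)*url++;
        h = (h * 16777619UL) & 0xFFFFFFFFUL;     /* FNV prime, keep 32 bits */
    }
    return h;
}

/* Build "<dir>/<8 hex digits>" for a URL.
 * Returns 0 on success, -1 if the result does not fit in out. */
int cache_path_for_url(const char *url, const char *dir,
                       char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s/%08lx", dir, url_hash(url));
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```

Two different URLs can hash to the same name, so a cache built this way should also store the full URL inside the cache file and compare it on lookup before serving the contents.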

You must decide how you will keep your web cache coherent. Clients dislike stale pages; however, a stale page may still have some value, so there is a trade-off between coherency and performance. Document your decision and why you made it in the preliminary design document.

Some examples of cache coherency policies in a web proxy: cache resources for a fixed period of time before allowing them to be refreshed from the remote server; use the If-Modified-Since header to conditionally request a resource from the remote server; or respect the Expires header sent by the remote server.
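To make these options concrete, here is a hedged sketch that combines two of them: an entry is fresh until its Expires time if the server sent one, and otherwise for a fixed period. A second helper formats an If-Modified-Since header for revalidation. The struct layout, the function names, and the 300-second default are all assumptions; parsing the Expires header is left out.

```c
#include <stddef.h>
#include <time.h>

#define DEFAULT_TTL_SECONDS 300   /* assumed fixed lifetime: 5 minutes */

struct cache_entry {
    time_t fetched_at;   /* when the resource was stored in the cache    */
    time_t expires_at;   /* parsed Expires value, or (time_t)-1 if none  */
};

/* Return nonzero if the entry may be served without contacting the server. */
int cache_entry_is_fresh(const struct cache_entry *e, time_t now)
{
    if (e->expires_at != (time_t)-1)
        return now < e->expires_at;                     /* honor Expires  */
    return now - e->fetched_at < DEFAULT_TTL_SECONDS;   /* fixed period   */
}

/* Write "If-Modified-Since: <date>\r\n" for time t into out.
 * Returns 0 on success, -1 if the time is invalid or out is too small.
 * Day and month names come out in English only in the default "C" locale. */
int format_if_modified_since(time_t t, char *out, size_t outlen)
{
    struct tm *tm = gmtime(&t);   /* static storage; fine without threads */
    if (tm == NULL)
        return -1;
    if (strftime(out, outlen,
                 "If-Modified-Since: %a, %d %b %Y %H:%M:%S GMT\r\n", tm) == 0)
        return -1;
    return 0;
}
```

A stale entry does not have to be discarded immediately: the proxy can send the conditional request and, if the server answers 304 Not Modified, keep serving the cached copy and reset fetched_at.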

Search RFC 1945 for any warnings about proxy behavior.

Collaboration

This is designed to be a group project (up to 3 members per group). The materials you turn in must be your group's own work, but you are otherwise free (and encouraged) to discuss the project with other members of the class. In your final design document, please acknowledge any help you receive from students who are not in your group.

How to Hand In

Follow the instructions in the FAQ.

Questions?

If you have questions about the assignment, please do not hesitate to contact the TA or Professor ASAP.

One last piece of advice: start the project as soon as you possibly can. Do not wait until the last week to begin!

References