cs146a Project 3: Extra Credit

Due: 2011-12-13 (Tuesday) by 11:59pm

This project is extra credit. Skipping it will not hurt your grade, and neither will turning in a failed attempt (although a failed attempt may not help your grade, either; only successful attempts will earn extra credit). Performance tests have a way of exposing bugs that you wouldn't catch otherwise, so although this project does not add any new features to your code, you may have to do some further debugging when experiments fail to run.

Introduction

Performance is an important aspect of systems design and systems development. We can make educated guesses as to how our systems will perform based on performance models, but it's also important to learn how to make real measurements. This process is called benchmarking. In this project, you will measure the performance of your proxy server in an attempt to characterize its performance and isolate the bottleneck where future implementation efforts could be made to improve its performance.

Step 1: Model the performance of your system

Write down the ways in which the modules of your system impact performance. Enumerate the different kinds of inputs that might expose different performance characteristics of your system. These will be the parameters that you adjust during your experiments. Good parameters include the degree of concurrency in the workload and the size of requests and responses; see if you can think of others.

Step 2: Baseline

Establishing a baseline for the performance of your system is important. A good baseline for your web proxy is the performance of web traffic when no proxy is involved. You can then characterize the impact of your web proxy relative to the baseline.
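One simple way to express the proxy's impact relative to the baseline is as a percentage slowdown in mean time per request. The helper below is a minimal sketch; the function name overhead_percent and the choice of metric are assumptions made for illustration, not part of the assignment.

```shell
#!/bin/bash
# overhead_percent BASELINE_MS PROXY_MS   (hypothetical helper)
# Prints how much slower the proxied run is than the baseline, in percent.
overhead_percent() {
    awk -v base="$1" -v proxy="$2" \
        'BEGIN { printf "%.1f\n", (proxy - base) / base * 100 }'
}
```

For example, overhead_percent 8 10 prints 25.0, meaning the proxied run took 25% longer per request than the baseline.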

Note that other points of comparison are also interesting. For example, you could compare the performance of your proxy to the performance of an existing proxy (such as nginx). However, in this project that comparison is secondary to the baseline measurement.

Setting up a web server

You will need your own web server to test against. Running your own server lets you control its performance, and keeps you out of trouble for hammering somebody else's web server while you benchmark. It also helps to keep network performance constant; you can run all the experiments in the Vertica lounge on the internal network and avoid the messiness of performance over the wider Internet.

I recommend you use Apache. Apache is already installed on the machines in the Vertica lounge. You can run Apache yourself using a custom configuration file. Here is a config file (usually called httpd.conf) you can start with. Notice that it listens on port 3000. You can change this if you want, keeping in mind that the port number must be above 1023: binding to lower-numbered ports requires root, which you don't have on the Vertica lounge machines.

Listen 3000
ServerRoot /path/to/your/homedir/benchmark
ErrorLog /path/to/your/homedir/tmp/apache.log
PidFile /path/to/your/homedir/tmp/apache.pid
LockFile /tmp/apache.lock

DocumentRoot "/tmp/username-docroot"

Replace "/path/to/your/homedir" with the absolute path to your home directory, and "username" with your username. Then create the document root (docroot), which is the directory Apache serves files from. If you request /foo/bar/baz/index.html from the web server, Apache translates that path to be relative to the docroot, e.g., /tmp/username-docroot/foo/bar/baz/index.html. Note that we put the docroot on the local hard disk rather than in your home directory (which is accessed over NFS). You must create the docroot or else Apache will complain when you try to start it.

mkdir /tmp/${USER}-docroot

You'll also need a tmp directory in your homedir.

mkdir $HOME/tmp

To start Apache, make the docroot and put httpd.conf somewhere, and then invoke the Apache binary (called httpd on the machines you will be working on). Notice that you must provide an absolute path for the configuration file.

/usr/sbin/httpd -f ${HOME}/cs146a/project3/httpd.conf -k start

You can make sure it's started with ps.

ps -Al | grep httpd

You can also check the log for errors.

tail $HOME/tmp/apache.log

Always, always remember to shut down Apache when you are done. If you forget, the network administrator will be cross. Do that with this command:

/usr/sbin/httpd -f ${HOME}/cs146a/project3/httpd.conf -k stop

Get some files

Download or create some files of varying sizes (I recommend at least a tiny text file and a large image). Put them in your docroot. Test that everything is working by requesting them through your web browser (while Apache is running). Try to do this from a different machine in the Vertica lounge.
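If you would rather create files of exact sizes than download them, something like the following sketch could work. The function name make_test_file and the file names and sizes in the example are placeholders; it fills the file with random bytes read from /dev/urandom.

```shell
#!/bin/bash
# make_test_file DIR NAME BYTES   (hypothetical helper)
# Writes exactly BYTES bytes of random data to DIR/NAME.
make_test_file() {
    head -c "$3" /dev/urandom > "$1/$2"
}

# Example invocations (names and sizes are placeholders):
#   make_test_file "/tmp/${USER}-docroot" small.bin 1024
#   make_test_file "/tmp/${USER}-docroot" large.bin 10485760
```

Random data does not display as a meaningful page in a browser, so it is still worth keeping at least one real text file or image around for the browser check.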

Run Apache bench

You can use the Apache bench program (called ab) to test the performance of Apache. Here you can read about how to invoke it:

man ab

The basic idea is that you want to test workloads that meet the various parameters you wrote down in Step 1. You can construct these workloads from the files that you choose to request from the web server and the workload characteristics you ask ab to create. Here is how to request an image file named image.jpg a total of 1000 times with a concurrency degree of 2, assuming your web server is running on canticle.cs.brandeis.edu:3000:

ab -n 1000 -c 2 http://canticle.cs.brandeis.edu:3000/image.jpg

It is up to you to interpret the results from ab. It will give you measures of throughput and latency. Look for tradeoffs between the two when you run with different parameters. Also keep in mind that you will be limited by hardware and network bandwidth.
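When you run many experiments, it can help to pull just the throughput and latency figures out of each ab report. The sketch below assumes the report contains the usual "Requests per second:" and "Time per request:" lines; the function name summarize_ab is made up for this example.

```shell
#!/bin/bash
# summarize_ab   (hypothetical helper)
# Reads an ab report on stdin and prints
# "<requests per second> <mean time per request in ms>".
summarize_ab() {
    awk '
        /^Requests per second:/ { rps = $4 }
        # ab prints two "Time per request" lines; keep only the first,
        # which is the mean latency as seen by a single client.
        /^Time per request:/ && ms == "" { ms = $4 }
        END { print rps, ms }
    '
}

# Example:
#   ab -n 1000 -c 2 http://canticle.cs.brandeis.edu:3000/image.jpg | summarize_ab
```

As a rough sanity check, the requests-per-second figure should come out close to the concurrency times 1000 divided by the mean time per request in milliseconds.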

Step 3: Measure the proxy server

Now it's time to compare the performance of the workloads when run directly against the webserver to the performance when run via the proxy server. Start your proxy server and then run Apache bench. Here is how to invoke ab so that it uses a proxy server:

ab -n 1000 -c 2 -X localhost:8888 http://canticle.cs.brandeis.edu:3000/image.jpg

Of course, this assumes that the proxy and ab are running on the same machine. If this is not the case, put the name of the host where your proxy is running in place of "localhost" in the option passed to the -X flag.

Repeat the measurements you ran in Step 2, only this time run them via your proxy. Then try disabling threads in your proxy and running the tests again.
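To keep the direct and proxied runs comparable, you may want to generate both sets of ab commands from the same parameters. The following is a sketch only: the server URL and proxy address are the ones used in the examples above, while the concurrency levels and the function names ab_command and print_sweep are assumptions you should adapt.

```shell
#!/bin/bash
# Assumed values, taken from the examples in this handout; adjust to your setup.
SERVER="http://canticle.cs.brandeis.edu:3000"
PROXY="localhost:8888"

# ab_command MODE CONCURRENCY FILE   (hypothetical helper)
# Prints the ab command line for one run. MODE is "direct" or "proxy".
ab_command() {
    if [ "$1" = "proxy" ]; then
        echo "ab -n 1000 -c $2 -X $PROXY $SERVER/$3"
    else
        echo "ab -n 1000 -c $2 $SERVER/$3"
    fi
}

# Print every command in the sweep so it can be reviewed before running.
# The concurrency levels here are placeholders.
print_sweep() {
    for mode in direct proxy; do
        for c in 1 2 4 8; do
            ab_command "$mode" "$c" image.jpg
        done
    done
}
```

Calling print_sweep only prints the commands; piping its output to sh would run them one after another.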

Step 4: Analyze results

Plot your results and see how performance varies depending on the different parameters in your system. See if this matches the performance model you thought about in Step 1. Speculate on the bottleneck that could be addressed in your system to improve performance.
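Most plotting tools are happiest with one row per x value and one column per series. The sketch below shows one possible way to get there; the input format (one line per run, "<mode> <concurrency> <requests per second>") and the function name pivot_results are assumptions for illustration, not a required format.

```shell
#!/bin/bash
# pivot_results   (hypothetical helper)
# Reads lines "<mode> <concurrency> <requests per second>" on stdin,
# where mode is "direct" or "proxy", and prints one line per
# concurrency level: "<concurrency> <direct rps> <proxy rps>".
pivot_results() {
    awk '
        { seen[$2] = 1; rps[$1, $2] = $3 }
        END {
            for (c in seen) print c, rps["direct", c], rps["proxy", c]
        }
    ' | sort -n
}
```

The output is sorted by concurrency, so the second and third columns can be plotted against the first as the direct and proxied series.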