This project must be written in C and must compile and run on one of the RHEL boxes in the berry patch (click here for a list of available machines).
You will create a basic HTTP/0.9 webserver. HTTP/0.9, from 1991, is the first version of HTTP. This simple protocol allowed web clients (called browsers) to request documents formatted in HTML from a server, one at a time. HTTP requires a reliable connection, and so is implemented as a layer over TCP. Your server will support multiple simultaneous clients using TCP sockets and the fork() mechanism from your UNIX/Linux OS.
Your server will also work with HTTP/1.0 clients and you will be able to test using a conventional web browser like Firefox. You will have an extra credit opportunity at the end of this assignment to implement some of HTTP/1.0 and allow images as well as HTML to be loaded from your server.
You will employ socket I/O, file I/O, and set up a socket to listen for incoming connections. You will also use the fork() system call. We assume that you covered file I/O in a previous class.
How to set up a socket to listen for connections and how to use fork() are both described thoroughly in Using TCP Through Sockets by David Mazières, Frank Dabek, and Eric Peterson (section 3.4).
Your webserver will use HTTP/0.9 to communicate with clients. HTTP/0.9 is a request/response protocol, where a single TCP connection handles exactly one request/response pair. You have seen an example of an HTTP/0.9 request in Project 1: "GET /". The format of an HTTP/0.9 request is as follows:
GET document-name
Here are some examples of HTTP/0.9 requests:
GET /index.html
GET /
GET /foo/bar/baz
There aren't many modern web clients that actually send HTTP/0.9 requests. Luckily, HTTP/1.0 was designed to be backwards compatible with HTTP/0.9. For your server to service HTTP/1.0 requests, you should simply ignore any part of the request that appears after the verb GET and the document name. For example, here is what a request from Firefox running in HTTP/1.0 mode might look like:
GET /foo/bar/baz HTTP/1.0
Host: localhost:8080
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.6)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Your webserver should only look at the first two words in this request, ignoring the rest; i.e.:
GET /foo/bar/baz
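As an illustration of the "first two words" rule, here is a minimal sketch of a request parser. The function name parse_request and the 1023-character limit on the document name are assumptions made for this example, not requirements of the assignment. It expects the request to already be in a NUL-terminated buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the document name from a request into doc.
 * request must be NUL-terminated. Returns 0 on success and -1 if the
 * request is malformed or the name does not fit in doc. */
int parse_request(const char *request, char *doc, size_t doc_size)
{
    char verb[8];
    char name[1024];

    /* Read at most two whitespace-separated words. Anything after them
     * (HTTP version, headers) is never examined. */
    if (sscanf(request, "%7s %1023s", verb, name) != 2)
        return -1;
    if (strcmp(verb, "GET") != 0)
        return -1;
    if (name[0] != '/')              /* document names are absolute paths */
        return -1;
    if (strlen(name) >= doc_size)
        return -1;
    strcpy(doc, name);
    return 0;
}
```

A return value of -1 is one natural place to trigger the error page for malformed requests.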
Your webserver serves the contents of text files to clients. These files will be stored on the local file system of your webserver. In order to send the contents of these files, you must use file I/O to read them, then network socket I/O to send the contents to the client.
The document names requested by clients are absolute paths which you must transform to be relative to the document root. For example, if your document root is /home/ross/.www, then the document name /foo/bar/baz/index.html will be found in the file /home/ross/.www/foo/bar/baz/index.html.
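One way to perform this transformation with a fixed-size buffer is sketched below. The name build_path is hypothetical, and the sketch assumes the document root is given without a trailing slash.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes root followed by doc into out.
 * Returns 0 on success, -1 if the result (plus the terminating NUL)
 * would not fit in out_size bytes. */
int build_path(const char *root, const char *doc, char *out, size_t out_size)
{
    size_t need = strlen(root) + strlen(doc) + 1;

    if (need > out_size)
        return -1;
    snprintf(out, out_size, "%s%s", root, doc);
    return 0;
}
```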
If the client requests a directory instead of a file name, you should supply the file index.html inside that directory by default. For example, assume that the document root is:
/home/ross/.www
And that the following file exists:
/home/ross/.www/foo/bar/baz/index.html
If the client makes the following request:
GET /foo/bar/baz
Then your webserver should recognize that /home/ross/.www/foo/bar/baz is a directory, and return in the response the contents of the following file:
/home/ross/.www/foo/bar/baz/index.html
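The directory check can be done with stat() (see man 2 stat). The following is a sketch only; the name resolve_index and the choice to modify the path in place are assumptions for this example.

```c
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: if path names a directory, appends "/index.html"
 * to it in place. Returns 0 on success, -1 if path does not exist or
 * the longer name does not fit in path_size bytes. */
int resolve_index(char *path, size_t path_size)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;                   /* no such file or directory */
    if (S_ISDIR(st.st_mode)) {
        size_t len = strlen(path);
        /* Avoid a doubled slash when the request ended in '/'. */
        const char *suffix =
            (len > 0 && path[len - 1] == '/') ? "index.html" : "/index.html";

        if (len + strlen(suffix) + 1 > path_size)
            return -1;
        strcat(path, suffix);
    }
    return 0;
}
```

Note that this only rewrites the name. The index.html file itself may still be missing, which you would discover when you try to open it.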
Your webserver does not need to handle binary data such as images, just plain text (as this is all that was specified as part of HTTP/0.9). You may serve any plain text file verbatim; do not worry about the distinction between HTML and plain text made by the HTTP/0.9 standard.
If the client requests a file that does not exist, or submits a malformed request, then you must return a well-formed (but simple) HTML page to the client containing a customized error message specifying what went wrong.
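A simple error page can be formatted into a fixed-size buffer before it is written to the socket. This is a sketch under assumptions: the function name format_error_page and the exact HTML layout are invented for illustration, and the sketch does not escape HTML special characters, so it is safest to pass only messages you wrote yourself.

```c
#include <stdio.h>

/* Hypothetical helper: formats a small HTML error page into buf.
 * Returns the page length in bytes, or -1 if it does not fit. */
int format_error_page(char *buf, size_t buf_size,
                      const char *title, const char *detail)
{
    int n = snprintf(buf, buf_size,
                     "<html><head><title>%s</title></head>"
                     "<body><h1>%s</h1><p>%s</p></body></html>\n",
                     title, title, detail);

    if (n < 0 || (size_t)n >= buf_size)
        return -1;                   /* output was truncated */
    return n;
}
```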
fork() is a system call which spawns a second process that is identical to the current one, except that the new one has a new process ID. After the call to fork(), both processes continue from the same point in the program. The original process is referred to as the parent, and the copy is referred to as the child.
In the parent process, fork() returns the ID of the child process. In the child process, fork() returns 0. Thus, the return value of fork() can be used to differentiate between the parent and child processes. The usual pattern looks something like this:
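pid_t pid;   /* pid_t comes from <sys/types.h>, fork() from <unistd.h> */

pid = fork();
if (pid > 0) {
    /* Parent process: pid is the child's process ID */
} else if (pid == 0) {
    /* Child process */
} else {
    /* Error (fork() returned -1); check errno */
}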
You will use fork() to spawn a child process to handle each incoming request. This way, your server can handle multiple requests simultaneously. Because fork() gives the child its own copy of the parent's memory, the two processes do not share memory, and you do not have to worry about coordinating access to it.
There are examples of using fork() in the Using TCP Through Sockets handout.
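To make the fork-per-connection idea concrete, here is one possible shape for the accept loop. This is a sketch, not the handout's code: spawn_handler, serve_forever, and the placeholder handler greet are hypothetical names, and reaping finished children with a non-blocking waitpid() is just one of several ways to avoid zombie processes.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical placeholder handler. A real server would read the
 * request from conn_fd and send the requested document. */
void greet(int conn_fd)
{
    const char msg[] = "hello\n";

    if (write(conn_fd, msg, sizeof msg - 1) < 0) {
        /* Nothing useful to do in this sketch. */
    }
}

/* Forks a child that runs handler(conn_fd) and then exits. The parent
 * closes its copy of conn_fd and returns the child's pid, or -1 if
 * fork() failed. */
pid_t spawn_handler(int conn_fd, void (*handler)(int))
{
    pid_t pid = fork();

    if (pid == 0) {
        handler(conn_fd);            /* child: serve this one client */
        close(conn_fd);
        _exit(0);
    }
    close(conn_fd);                  /* parent (or error): drop our copy */
    return pid;
}

/* Accepts connections forever, one child per connection. */
void serve_forever(int listen_fd, void (*handler)(int))
{
    for (;;) {
        int conn_fd = accept(listen_fd, NULL, NULL);

        if (conn_fd < 0)
            continue;                /* e.g. interrupted by a signal */
        spawn_handler(conn_fd, handler);
        /* Reap any children that have already finished. */
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ;
    }
}
```

The parent closes its copy of the connected socket right after forking. Otherwise the connection stays open after the child exits, and a client waiting for end-of-file never sees it.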
Your server must use C file I/O to serve real files on your local file system to web clients.
File and network socket I/O should both be done using fixed-size buffers in memory. This means that you may not allocate memory based on the size of an incoming request or the size of a file that will be sent in a response. This is similar to how you echoed the response from a server using a fixed-size buffer in the program sc in Project 1.
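The fixed-size buffer requirement usually leads to a read/write loop like the sketch below. The name copy_fd and the 4096-byte buffer size are assumptions for illustration. Because it works on plain file descriptors, the same loop can copy from an opened file to a connected socket.

```c
#include <errno.h>
#include <unistd.h>

#define BUF_SIZE 4096   /* arbitrary fixed size chosen for this sketch */

/* Hypothetical helper: copies everything readable from in_fd to out_fd
 * through one fixed-size buffer. Returns 0 on success, -1 on error. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[BUF_SIZE];
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) != 0) {
        ssize_t off = 0;

        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted: retry the read */
            return -1;
        }
        /* write() may accept fewer bytes than requested, so loop
         * until the whole chunk has been sent. */
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
    return 0;
}
```

Memory use stays constant no matter how large the file is, which is the point of the restriction.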
Your server must be invoked on the command line like this:
./webserver document-root port
Examples:
./webserver /home/ross/.www 8080
./webserver /home/ross/proj/cs146a/project2/test 8080
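The port argument arrives as a string, so it has to be converted and validated. A minimal sketch follows, assuming a hypothetical helper named parse_port that rejects anything other than a decimal number in the range 1 to 65535.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts a command-line argument to a TCP port.
 * Returns the port number, or -1 if s is not a valid port. */
int parse_port(const char *s)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return -1;                   /* empty, non-numeric, or trailing junk */
    if (v < 1 || v > 65535)
        return -1;                   /* outside the 16-bit port range */
    return (int)v;
}
```

In main() you might check that argc is 3, treat argv[1] as the document root, pass argv[2] to this helper, and print a usage message if either check fails.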
You may assume that your webserver is always terminated by sending it the interrupt signal (pressing ctrl-c).
You can test with your program sc from Project 1, or with the program telnet. Here is an example using sc:
./sc localhost 8080 "GET /foo/bar/baz"
You can also test with Firefox, although you will have to configure it to use HTTP/1.0 by carrying out the following steps:
Don't forget to change the HTTP version back to 1.1 when you are done testing.
Finally, you might find it convenient to test with lynx, a text-mode browser commonly found on Linux. lynx is a simple browser that doesn't try to display images, so it should work well with your server.
As you know, TCP allows a server to support multiple services using the port abstraction. The port number is an identifier that is transmitted to the server when initiating a TCP connection. The server uses the port number to determine which application should service the connection. Many port numbers are reserved by convention for particular applications. If you are curious, you can review a comprehensive list of assigned port numbers managed by the IANA.
It turns out that ordinary user programs are not allowed to bind to port numbers below 1024. In order to test your webserver, instruct it to listen on a higher port number. Port 8080 is a user-accessible port often used for HTTP.
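For reference, creating a TCP socket that listens on a given port typically takes the socket(), bind(), and listen() calls shown in this sketch. The function name open_listener, the backlog of 16, and the use of SO_REUSEADDR are choices made for this example. The handout remains the authoritative walkthrough.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a socket listening on the given TCP port
 * on all local interfaces, or -1 on failure. */
int open_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int yes = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    /* Lets you restart the server quickly after pressing ctrl-c. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);     /* network byte order */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Without root privileges, calling this with a port below 1024 is expected to fail at bind().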
You may find the following man pages useful (execute these commands at a Linux/UNIX command line):
man 2 open
man 2 close
man 2 read
man 2 write
man 2 stat
The Using TCP Through Sockets handout shows the header files that a socket server needs to #include. You can find the headers for file I/O by consulting the man pages for file I/O-related functions.
You may want to break the project down into smaller chunks. Besides employing modularity to divide the assignment into separable components, you might also want to implement a simple test server to make sure that your TCP handling and use of fork() work correctly before adding support for HTTP. One idea is to first implement an echo server. An echo server accepts TCP connections, forks to handle them, and sends as a response an exact copy of the request. You can test your echo server using telnet.
The specification above does not prevent clients from requesting resources outside of the document root. For example, if the document root is /home/ross/.www, and the client sends the request GET /../cs146a/grades/project2/mallory.txt, then a student Mallory might be able to learn their Project 2 grade, even though this file was not intended to be accessible via the web.
Earn 5% extra credit for preventing your webserver from serving files outside of the document root.
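One possible approach, sketched here under assumptions, is to canonicalize both the document root and the requested file with realpath() and then check that the root is a prefix of the result. The name is_inside_root is hypothetical, and other designs (such as rejecting any request containing "..") are also reasonable.

```c
#define _XOPEN_SOURCE 700   /* asks strict compilers to declare realpath() */
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if path exists and, once "..", "." and
 * symbolic links are resolved, lies inside root. Returns 0 otherwise. */
int is_inside_root(const char *root, const char *path)
{
    char real_root[PATH_MAX];
    char real_path[PATH_MAX];
    size_t n;

    if (realpath(root, real_root) == NULL ||
        realpath(path, real_path) == NULL)
        return 0;                    /* one of them does not exist */
    n = strlen(real_root);
    if (strncmp(real_root, real_path, n) != 0)
        return 0;
    /* The match must end on a path boundary, so that a sibling such as
     * /home/ross/.www-private is not mistaken for a child of
     * /home/ross/.www. The last test covers a root of "/". */
    return real_path[n] == '\0' || real_path[n] == '/' ||
           (n > 0 && real_root[n - 1] == '/');
}
```

Because realpath() fails for files that do not exist, a result of 0 covers both "outside the root" and "not found". Your error page can treat both the same way.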
You can implement a small part of HTTP/1.0 and give your webserver the ability to serve binary data like images. You will need to consider what headers to send in your response and also how to read binary files (as opposed to plain text files).
Earn 5% extra credit for serving images from your webserver.
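As a starting point for thinking about response headers, the sketch below picks a Content-Type from the file extension and formats a minimal HTTP/1.0 status line and header block. The function names, the small extension table, and the choice to send Content-Length are all assumptions for illustration. Consult the HTTP/1.0 specification to decide what your server should actually send.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: guesses a media type from the file extension. */
const char *content_type_for(const char *path)
{
    const char *ext = strrchr(path, '.');

    /* No dot, or the last dot belongs to a directory such as ".www". */
    if (ext == NULL || strchr(ext, '/') != NULL)
        return "text/plain";
    if (strcmp(ext, ".html") == 0 || strcmp(ext, ".htm") == 0)
        return "text/html";
    if (strcmp(ext, ".png") == 0)
        return "image/png";
    if (strcmp(ext, ".jpg") == 0 || strcmp(ext, ".jpeg") == 0)
        return "image/jpeg";
    if (strcmp(ext, ".gif") == 0)
        return "image/gif";
    return "application/octet-stream";
}

/* Hypothetical helper: formats a success header for a file of the given
 * length. Returns the header length, or -1 if it does not fit in buf. */
int format_header(char *buf, size_t size, const char *path, long length)
{
    int n = snprintf(buf, size,
                     "HTTP/1.0 200 OK\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %ld\r\n"
                     "\r\n",
                     content_type_for(path), length);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

Binary files can contain zero bytes, so the file body must be sent using the byte counts returned by read() rather than string functions such as strlen().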
Submission instructions appear in the FAQ.