The Internet and HTTP
Ross Shaull, Brandeis University
In this lecture we take a whirlwind tour of the Internet and a
fundamental building block of the web, HTTP.
The Internet as Cartoon
A better way for us to start visualizing the Internet is as an
opaque cloud to which hosts connect.
We will spend this lecture looking more closely at this picture...
- Looking inside the cloud
- Looking inside the client
- Looking inside the server (just a little)
Core Internet Idea(l)s
- The success of the Internet is due in large part to:
- openness,
- internetworking, and
- the end-to-end principle
- Openness means
-
communication standards are freely available and can be used
to develop new protocols (e.g., the development of the Web)
-
many of the core building blocks of the web (HTML, CSS,
JavaScript) are visible to end users, so that they can learn
from example
- Internetworking means that many little networks are connected
to each other (e.g., many ISPs connected together); it is the key
to the rapid growth and resilience of the Internet
-
End-to-end means that the value of the network
is at the edges (that's you!); so, the Internet cloud is
designed to be transparent
Switching: Connecting One Host to Another
- Setting up a path between two hosts
- A switch is a device that manages the connections of devices
along each path
- Traditional telephone networks use circuit switching
- Early exchanges made complete electrical circuits
- Modern switches can set up virtual circuits without moving wires
around
Packet Switching
- Leonard Kleinrock's thesis (1961)
- Break up a digital message into small datagrams
- When you hear someone talking about packets on the Internet,
this is what they are talking about
- Connections created and bandwidth allocated as-needed
- Sometimes results in collisions
- To visualize a collision, imagine people talking on
a party
line. Two people have a good protocol for avoiding
collisions in a phone call (conversational structure). A third
person may jump in, introducing a collision (two people talking
at once). To resolve this, the two persons speaking at the same
time may wait some small amount of time then retry their
statement, waiting longer each time they talk at the same
time.
-
This is essentially the same technique used for resolving
collisions on shared Ethernet, called exponential
backoff.
- Collisions are not so common on wired networks anymore, but they
still happen on wireless networks. You may have experienced slowdown on
busy wireless networks... collisions and the retries they cause are
part of the reason!
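The "wait, then retry, waiting longer each time" rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the exact Ethernet algorithm: the function name backoff_slots, the idea of measuring waits in abstract "slots", and the cap of 10 doublings are assumptions made for the example.

```python
import random

def backoff_slots(collision_count, rng, max_exponent=10):
    """Pick a random wait, in slot times, after the given number of collisions."""
    exponent = min(collision_count, max_exponent)
    # After n collisions the window is 0 .. 2**n - 1 slots,
    # so the range of possible waits doubles with every collision.
    return rng.randrange(2 ** exponent)

rng = random.Random(0)  # fixed seed so repeated runs behave the same way
for n in range(1, 6):
    wait = backoff_slots(n, rng)
    print(f"collision {n}: waiting {wait} slot(s), chosen from {2 ** n} possibilities")
```

Because each speaker picks a wait at random from a growing window, two speakers who collided become less and less likely to pick the same moment again.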
Ethernet
- Application of packet switching
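- Not the first packet switching technology
- Norman Abramson's group at the University of Hawaii built ALOHAnet, a precursor to Ethernet, in the early 1970s
- Robert Metcalfe built on ALOHAnet's ideas to develop Ethernet at Xerox PARC in 1973
- He pushed for vendor-neutral Ethernet standards in 1979
- Very important for openness of early Internet development
- Was also profitable
- Metcalfe started 3Com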
- Shared Media?
- Collisions happen because multiple machines drive a signal onto
the same wire at the same time
- Modern Ethernet is typically switched with no shared media:
each host has its own dedicated wire to a switch
Addressing for the Internet: IP
- IP Addresses (a term in fairly common use)
- Internet Protocol Address
- IPv4
- 4 "octets" of 8 bits each
- Each octet roughly defines a subnet that contains hosts (or
other networks)
- Businesses and Institutions are sometimes assigned groups of
Addresses (classful address allocation)
- Originally, first octet determined network, the rest
were for hosts inside that network.
- Later, divided on octet boundaries into class A, B, and
C networks
- Tool break: ping, nslookup
- A Single IP Address
- Class C
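- Example: 129.64.99.* (shown only to illustrate the three-octet network prefix; real class C addresses have a first octet of 192 to 223)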
- 2^7 * 2^7 * 2^7 = 2,097,152 possible networks
- 2^8 - 2 = 254 possible hosts in domain
- Class B
- Example: 129.64.*.*
- 2^7 * 2^7 = 16,384 possible networks
- 2^8 * 2^8 - 2 = 65,534 possible hosts in domain
- Class A
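- Example: 129.*.*.* (shown only to illustrate the one-octet network prefix; real class A addresses have a first octet of 1 to 126)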
- 2^7 - 2 = 126 possible networks
- 2^8 * 2^8 * 2^8 - 2 = 16,777,214 possible hosts in domain
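The octet structure and the counts above can be checked with a short Python sketch. It uses the standard ipaddress module; the helper name usable_hosts is made up for this example, and the comment about reserved host values is the usual explanation for the "- 2", stated here as background rather than taken from the slides.

```python
import ipaddress

# An IPv4 address is 32 bits, usually written as four 8-bit octets.
address = ipaddress.IPv4Address("129.64.99.132")
octets = list(address.packed)
print(octets)  # [129, 64, 99, 132]

def usable_hosts(host_bits):
    # The all-zeros and all-ones host values are reserved, hence the "- 2".
    return 2 ** host_bits - 2

# Network counts, written the same way as in the list above.
class_c_networks = 2**7 * 2**7 * 2**7
class_b_networks = 2**7 * 2**7
class_a_networks = 2**7 - 2

print(class_c_networks, usable_hosts(8))    # 2097152 254
print(class_b_networks, usable_hosts(16))   # 16384 65534
print(class_a_networks, usable_hosts(24))   # 126 16777214
```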
Internet Address Book: DNS
DNS stands for Domain Name System.
- Remembering IP addresses is hard
- IP addresses can change
- DNS helps remediate these problems
-
DNS has protocols for synchronizing and updating names when the
mapping from name to IP address changes
- We all have to agree on the name!
- Who controls it?
- Naming is an extremely important computer science concept
- Why is naming so important?
- Non-computer science applications of naming?
Domains and Names
- Remember that IP addresses are divided into subnets?
-
Names have divisions as well (which we read in reverse order):
www.unet.brandeis.edu
-
edu: TLD, indicates educational institution
-
brandeis: part of the identification for
all hosts that are part of Brandeis (129.64.0.0)
-
unet: a subnet of Brandeis that serves
internal functions (129.64.99.0)
-
www: the unet webserver
(129.64.99.132)
-
Domains can be virtualized behind a single IP address (the
Apache web server calls this name-based virtual hosting).
-
A common use for this is to make
www.company.com and
company.com point to the same web server.
-
Another use is if you only purchased one IP address but want
to run multiple sites; you can use this tactic to run
multiple web sites from your home Internet connection
-
Later we'll look at the part of the web communication
protocol that allows this to work
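Reading a name "in reverse order", as described above, can be sketched in Python. The function name name_hierarchy is hypothetical; it simply lists each level of the name from the TLD down to the full host name.

```python
def name_hierarchy(hostname):
    """Return the levels of a DNS name, from the TLD down to the full name."""
    labels = hostname.split(".")   # ['www', 'unet', 'brandeis', 'edu']
    levels = []
    # Take the last 1 label, then the last 2, and so on.
    for count in range(1, len(labels) + 1):
        levels.append(".".join(labels[-count:]))
    return levels

for level in name_hierarchy("www.unet.brandeis.edu"):
    print(level)
# edu
# brandeis.edu
# unet.brandeis.edu
# www.unet.brandeis.edu
```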
TLDs
- com
- edu
- org
- net
- gov
- countries have their own (e.g., uk for the United Kingdom, where co.uk is the equivalent of com).
The US has a country code (.us) too, but it's not typically used
(some government sites use it).
Sometimes countries sell their domains to companies,
like Tuvalu (who
knows what their TLD is?).
Some DNS Details
- DNS uses caching, hierarchies, and indirection
- Domains run DNS to manage their internal names
- Talking to a DNS server near you makes lookups faster
- You could run your own DNS server at home
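Why a nearby caching server speeds things up can be illustrated with a toy resolver. This is a sketch under heavy assumptions: the class CachingResolver and the in-memory AUTHORITATIVE table are invented for the example, no real DNS traffic is involved, and real caches also expire entries after a while, which is omitted here.

```python
class CachingResolver:
    """Toy resolver: asks 'upstream' only when a name is not cached yet."""

    def __init__(self, upstream):
        self.upstream = upstream        # function: name -> IP address
        self.cache = {}
        self.upstream_queries = 0

    def lookup(self, name):
        if name not in self.cache:
            self.upstream_queries += 1  # the slow, far-away step
            self.cache[name] = self.upstream(name)
        return self.cache[name]

# Stand-in for the servers that actually know the answer (hypothetical data).
AUTHORITATIVE = {"www.unet.brandeis.edu": "129.64.99.132"}

resolver = CachingResolver(AUTHORITATIVE.__getitem__)
print(resolver.lookup("www.unet.brandeis.edu"))  # 129.64.99.132
print(resolver.lookup("www.unet.brandeis.edu"))  # 129.64.99.132 (from cache)
print(resolver.upstream_queries)                 # 1
```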
TCP/IP
- TCP stands for Transmission Control Protocol
-
TCP is a reliable Internet packet transportation protocol
- TCP/IP is TCP running on top of IP; remember:
- IP is a way of organizing networks
- You can use DNS to translate friendly names to IP addresses
- Reliable means that TCP detects lost or damaged packets and
resends them, so temporary transmission problems won't corrupt
messages (like a web page you are downloading)
-
Remember the end-to-end principle? TCP/IP provides an end-to-end
abstraction where senders and receivers can essentially ignore
all the stuff that goes on in transporting their packets from
point A to point B
The Protocol Stack
- Communication on the Web can be divided up into layers
- Each layer's messages are carried (encapsulated) inside the messages of the layer below it
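Encapsulation can be pictured as nested envelopes. The sketch below is only an illustration: the layer names (Ethernet, IP, TCP, with HTTP as the innermost message), the helper functions wrap and unwrap_all, and the placeholder addresses are all assumptions chosen for the example, and real headers carry far more fields.

```python
def wrap(layer, header, payload):
    """Put a payload inside an 'envelope' belonging to one layer."""
    return {"layer": layer, "header": header, "payload": payload}

# Innermost message: what the application (the browser) wants to say.
http_message = "GET / HTTP/1.1\r\nHost: facebook.com\r\n\r\n"

# Each lower layer wraps the unit handed down from the layer above it.
segment = wrap("TCP", {"dst_port": 80}, http_message)
packet = wrap("IP", {"dst": "192.0.2.10"}, segment)             # placeholder address
frame = wrap("Ethernet", {"dst": "00:00:5e:00:53:01"}, packet)  # placeholder address

def unwrap_all(unit):
    """Open the envelopes from the outside in, as a receiver would."""
    layers = []
    while isinstance(unit, dict):
        layers.append(unit["layer"])
        unit = unit["payload"]
    return layers, unit

layers, message = unwrap_all(frame)
print(layers)                   # ['Ethernet', 'IP', 'TCP']
print(message == http_message)  # True
```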
HTTP
-
HTTP stands for "HyperText Transfer Protocol"
-
HyperText is what HTML describes (remember: HTML stands for
HyperText Markup Language)
-
HTTP is a (relatively) simple protocol encapsulated inside
TCP/IP (so HTTP is in the application layer)
- HTTP is structured as a series of requests
from web clients and responses
from web servers
Request and Response
Here is an abbreviated example of the HTTP communication
between your computer and the facebook.com web server:
-
You type "facebook.com" into the location bar of
your web browser
-
Your web browser consults DNS to find the IP address for facebook.com
-
Your web browser connects to facebook.com's IP address and
sends an HTTP GET request for
"/", the standard root for websites.
-
The facebook.com webserver processes the request and sends an
HTTP response message containing the HTML of the facebook.com homepage
-
Your browser parses the HTML in the response and displays the
facebook.com homepage
HTTP Requests
- There are different types of HTTP requests. The two you need
to know are:
- GET requests that the server provide you the
resource at the specified
URL
GET /images/welcome/welcome_3.gif HTTP/1.1
Host: static.ak.fbcdn.net
Accept: image/png,image/*
Referer: http://www.facebook.com/
- Notice the "Host" header. HTTP/1.1 requires it, and web browsers
send it so that name-based virtual hosting works.
-
POST requests that the server accept and
process some data that you want to send; you still provide a URL,
since that is how you specify which resource should handle the
data
POST /login.php HTTP/1.1
Host: login.facebook.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 37

email=example@gmail.com&pass=unicorns
What do you notice that might seem a bit odd?
-
This is why your password isn't safe if you don't have an
encrypted connection (your web browser tells you if your session
is encrypted with the little lock icon).
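To see how plainly the form data travels, here is a minimal Python sketch that assembles a POST request by hand. The function name build_post is hypothetical, and the host, path, and field values are just the ones from the example above. Note that urlencode percent-encodes the "@" sign as "%40", so the byte count it produces differs from that of an unencoded body.

```python
from urllib.parse import urlencode

def build_post(host, path, fields):
    """Assemble the raw bytes of a simple form POST request."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"  # number of bytes in the body
        "\r\n"                              # blank line ends the headers
    ).encode("ascii")
    return head + body

request = build_post(
    "login.facebook.com",
    "/login.php",
    {"email": "example@gmail.com", "pass": "unicorns"},
)
print(request.decode("ascii"))
```

Anyone who can read these bytes on an unencrypted connection can read the password directly from the last line.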
GET can send data too
- GET can also send data to the server
- It is appended to the request URL:
- http://www.example.com/?name=Ross&status=napping
-
The part after the "?" is called the query string
- The query string is composed of name=value pairs joined
by "&", just like POST data.
- You can use query strings in your everyday life; who knows
about the fmt=18 youtube trick?
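Python's standard urllib.parse module can pull a query string apart, which makes the structure easy to see. This is a small sketch using the example URL from above.

```python
from urllib.parse import urlsplit, parse_qsl

url = "http://www.example.com/?name=Ross&status=napping"

query = urlsplit(url).query   # everything after the "?"
pairs = parse_qsl(query)      # list of (name, value) pairs

print(query)  # name=Ross&status=napping
print(pairs)  # [('name', 'Ross'), ('status', 'napping')]
```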
HTTP Responses
Always a status line with a status code, followed by headers,
followed by the content. Here is a request and a response:
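The status line, headers, content layout can be illustrated by taking a response apart in Python. The response text below is made up for the example (it is not a real server's output), and parse_response is a hypothetical helper that handles only this simple case.

```python
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"          # status line: version, status code, reason
    "Content-Type: text/html\r\n"  # headers, one per line
    "Content-Length: 13\r\n"
    "\r\n"                         # blank line separates headers from content
    "<p>Hello</p>\n"               # the content itself
)

def parse_response(raw):
    """Split a simple HTTP response into status code, reason, headers, content."""
    head, _, content = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, content

code, reason, headers, content = parse_response(RAW_RESPONSE)
print(code, reason)             # 200 OK
print(headers["content-type"])  # text/html
print(len(content))             # 13
```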
Another response status: redirect with 302 Found
Sometimes when you visit a site, the URL in your location bar changes. The
reason is that the site sent you a redirect. For
example, facebook.com redirects
to www.facebook.com.
Another response status: caching with 304 Not Modified
Web browsers cache content locally. This is why hitting the back
button is fast.
If you refresh a page, the web server may choose to tell you that
content hasn't changed, in which case your browser will know to
use the local cache instead of downloading the same content again.
Many Requests per Page
Even for a very basic web page, your browser will make many
requests to the server! This is why parts of a web page sometimes
seem to load slower than others, and also why you can start
reading text before images or movies show up.
For example, here are some of the additional HTTP GET requests
that your browser makes when it starts rendering the facebook
homepage:
GET /rsrc.php/98481/css/welcome.css HTTP/1.1
GET /rsrc.php/101731/css/dialogpro.css HTTP/1.1
GET /images/welcome/welcome_3.gif HTTP/1.1
-
Can you think of a reason why you might want to put your CSS in
a separate file instead of embedding it into the HTML of your
web page?
-
The web server can send a 304 Not Modified response for CSS
files even if the HTML page changes frequently, saving some
bandwidth costs.
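How a server decides between sending the file again and answering 304 Not Modified can be sketched as follows. This is one possible mechanism, not necessarily what any particular server does: it relies on the ETag and If-None-Match headers, which the notes above do not mention, and handle_get and the css dictionary are invented for the example.

```python
def handle_get(request_headers, resource):
    """Return (status code, response headers, content) for one resource."""
    if request_headers.get("If-None-Match") == resource["etag"]:
        # The browser's cached copy is still current: send no content.
        return 304, {"ETag": resource["etag"]}, b""
    return 200, {"ETag": resource["etag"]}, resource["content"]

# Hypothetical stylesheet with a version tag chosen by the server.
css = {"etag": '"v1"', "content": b"body { margin: 0 }"}

# First visit: nothing cached, so the full file is sent.
status, headers, content = handle_get({}, css)
print(status, len(content))  # 200 18

# Refresh: the browser reports the tag it has cached.
status2, _, content2 = handle_get({"If-None-Match": headers["ETag"]}, css)
print(status2, len(content2))  # 304 0
```

The second exchange carries no content at all, which is where the bandwidth saving comes from.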
What is a Web Browser?
A web browser is:
- A program for creating HTTP GET and POST requests
- A program for handling HTTP responses
- A program for displaying HTML
Another Tool Break
Let's use a program called telnet to act like a web browser
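If telnet is not at hand, the same experiment can be approximated in Python with a raw socket. This is a sketch under stated assumptions: it starts a tiny throwaway web server on your own machine (HelloHandler and fetch_raw are names made up here) so that nothing depends on an outside site, then types the request "by hand" the way you would in telnet.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Throwaway local server that answers every GET with one small page."""

    def do_GET(self):
        content = b"<p>Hello from a local test server</p>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(content)))
        self.end_headers()
        self.wfile.write(content)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet

def fetch_raw(host, port, path="/"):
    """Send a hand-written GET request and return everything that comes back."""
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection: response is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")

server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

response = fetch_raw("127.0.0.1", server.server_address[1])
server.shutdown()
server.server_close()

print(response)  # status line, headers, blank line, then the HTML
```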
Practical Principles
The end-to-end principle says that the usefulness of the network
is at its edges. HTTP and TCP/IP together form the Web, where the
edges are your web browser and a web server.
The openness principle means that much of the
code that makes the Internet possible is there for you to look at. A
good way to learn HTML and CSS (and JavaScript) is to view
source.
What's Next?
We start putting our knowledge of HTML, CSS, and HTTP to work to
create forms and process input data