Brandeis CS 146a
ASSIGNMENT 4: September 15 to September 22, 2009
For Class Tuesday September 15, 2009
See the schedule for Friday September 11, 2009 in assignment 2.
For Class Friday September 18, 2009
Read "Flash: An Efficient and Portable Web Server",
by Pai, Druschel, and Zwaenepoel (copies of the paper are available in the department office).
This is a very well-written paper that considers the impact of
server structure on web server performance.
Your assignment includes answering the following questions:
1. Give a specific example where Flash trades higher latency for
higher throughput. Why is this worthwhile?
2. Why do Flash and SPED perform similarly on small data sets?
Why does Flash beat SPED on large data sets?
For Lecture Material:
Read Chapter 6 (Performance) from S&K.
For Discussion, Tuesday, September 22, 2009
Read "MapReduce" (paper #8), by Dean and Ghemawat.
This is a more recent paper than "Flash", which you read for the previous class,
and, unlike the single-node "Flash", "MapReduce"
is concerned with a system consisting of multiple nodes.
The paper describes a novel high-performance system design developed at Google
for a specialized model of computation.
Your reading assignment questions, therefore, focus on performance.
1. What are the two main reasons to execute the map and reduce functions
in parallel on multiple machines?
2. Give examples of the use of batching and explain the specific
performance benefit achieved.
3. How do the authors evaluate their system's performance?
What are "Input", "Output", and "Shuffle"? How do stragglers impact performance?
4. (optional):
The blogger Alex Barrera describes, in general terms, common scalability problems in today's
Internet backend systems:
http://alwaysnewmistakes.wordpress.com/2009/07/20/scalability-issues-what-are-they-and-their-repercussions/
Which of his problems can be solved by MapReduce and which cannot?
Give examples of specific computations and explain.
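
As background for the MapReduce questions above, the following is a minimal single-machine sketch of the model of computation the paper addresses, using word counting as the example. It is an illustration only, not Google's implementation: the names map_fn, reduce_fn, and run_mapreduce and the sample documents are made up for this handout, and the sketch runs sequentially, so it shows none of the parallelism, batching, or straggler handling that the questions ask about.

```python
from collections import defaultdict


def map_fn(doc_name, text):
    """Hypothetical map function: emit (word, 1) for every word in a document."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Hypothetical reduce function: sum all counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Sequential stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    # Map phase: apply map_fn to every input record.
    for key, value in inputs.items():
        for out_key, out_value in map_fn(key, value):
            # Grouping step: collect all values that share an intermediate key.
            groups[out_key].append(out_value)
    # Reduce phase: apply reduce_fn once per distinct intermediate key.
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


# Made-up sample input: document name -> contents.
docs = {"a.txt": "the quick fox", "b.txt": "the lazy dog the end"}
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)
```

Note that each call to map_fn depends only on its own input record, and each call to reduce_fn depends only on the values for its own key. Keep this in mind when answering question 1.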
CS 146a Assignment 3, issued 9/11/09