Assignment 2 - CS120a
Due Wednesday, February 23, 2:00 PM (Bring to Volen 137)
---------------------------------------------------------

1. Consider the 10 requests to the Brandeis CS web server below. These
   requests were extracted from the web access log and supplemented with
   an additional parameter, D_Disk, that indicates the disk service demand
   time required to complete each request.

   No.  Requester        Request                                        File Size (Bytes)  D_Disk (msec)
   -----------------------------------------------------------------------------------------------------
     1  129.64.160.68    GET /~tim/Classes/Spr00/CS155/Notes/ HTTP/1.0               1874             12
     2  209.210.203.43   GET /~mikeb/images/dreambg.gif HTTP/1.1                    15448             68
     3  209.210.203.43   GET /~mikeb/images/ciawww.gif HTTP/1.1                     35959            200
     4  208.222.98.155   GET /~mikeb/images/ciaww.gif HTTP/1.1                      78623            346
     5  209.185.253.175  GET /~tim/Courses/1997/CS2a/Quizes/quiz17.gif              78479            345
     6  209.245.141.164  GET /~suresh/cs11/HW/hw3.html HTTP/1.1                      1766              9
     7  209.245.141.164  GET /~suresh/cs11/HW/HW3.class HTTP/1.1                     2261             15
     8  12.79.222.154    GET /~cs21b/files/hw1.html HTTP/1.0                        12071             54
     9  12.79.222.154    GET /~cs21b/files/submit.html HTTP/1.0                     14050             57
    10  151.197.17.36    GET /~paulb/CoreLex/corelex.html HTTP/1.1                   6198             28

   a. Use the clustering algorithm discussed in class to cluster the 10
      requests above into 4 clusters (very small, small, medium, and large
      requests) according to their file sizes and disk service demands.
      Make sure to scale the parameter values to z-scores before running
      the clustering algorithm. Recall that the standard deviation of any
      set of values V = {v1, ..., vn} with mean value M is defined as:

         stdev(V) = sqrt(average({sqr(v1 - M), sqr(v2 - M), ..., sqr(vn - M)}))

      Indicate which requests belong to which clusters and, for each
      cluster, the average z-score and the average *raw* value of each
      parameter.

   b.
      Suppose that we know that the response times (s = size, io = number
      of I/Os) are:

         very small requests = e(s, io)
         small requests      = f(s, io)
         medium requests     = g(s, io)
         large requests      = h(s, io)

      A query processor must predict the expected response time for a file
      request of size n bytes that is determined to require m msec of disk
      service demand. How would you determine whether its expected response
      time would be e(n, m), f(n, m), g(n, m), or h(n, m)?

2. Suppose requests arrive at a network queue for a T1 line (1.5 Mbps) at
   a rate of 2000 packets/second, and that the average length of a packet
   is 515 bits. What are the expected throughput, response time, and
   average population of the queue?

3. A small business has two outside lines for its telephones. Calls come in
   at a rate that is slower than the rate at which calls are processed, and
   yet measurements show that 1 out of every 7 incoming calls (roughly 14%)
   gets a busy signal. How many outside lines should be added to reduce the
   fraction of incoming calls that get a busy signal to 1 in 511 (less than
   0.2%)?

4. Consider the DB server of Example 9.3 of your text, discussed in class.

   a. What is the expected response time when there are 50 requests in the
      system? (Express your answer in msec.)

   b. Which resource, the CPU or the disk, is the bottleneck in the DB
      server? Justify your answer by considering how independently
      replacing each resource with a faster version affects the maximum
      throughput of the system.

   c. What is the maximum possible throughput (expressed to 4 decimal
      places) of the system, assuming that each request requires 15 msec
      of CPU time? Explain how you got your answer.

   d. Assuming the CPU is left as it is, what is the minimum disk speed
      (measured as transfer rate, in KBps) required to achieve 80% of the
      throughput you calculated in (c) with as few as 20 requests in the
      system? For this question, you can assume that each record read
      requires reading one 2048-byte block from disk.

*Hint: Using a spreadsheet for this problem is recommended. If you do so, hand in a printout of your spreadsheet.
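
For Problem 1(a), the z-score scaling and the grouping step can be sketched in Python as a way to check hand or spreadsheet work. The z-scores use the population standard deviation exactly as defined in part (a) (divide by n). The clustering algorithm discussed in class is not specified in this handout, so this sketch assumes agglomerative single-linkage clustering (repeatedly merge the two clusters whose closest members are nearest in Euclidean distance) as a stand-in; if the class used k-means or another method, the groupings may differ. `REQUESTS`, `z_scores`, `single_linkage`, and `summarize` are hypothetical names written for illustration.

```python
import math

# Rows from the Problem 1 table: (request no., file size in bytes, D_Disk in msec).
REQUESTS = [
    (1, 1874, 12), (2, 15448, 68), (3, 35959, 200), (4, 78623, 346),
    (5, 78479, 345), (6, 1766, 9), (7, 2261, 15), (8, 12071, 54),
    (9, 14050, 57), (10, 6198, 28),
]


def z_scores(values):
    """Scale values to z-scores using the population standard deviation
    defined in the assignment (divide by n, not n - 1)."""
    mean = sum(values) / len(values)
    stdev = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / stdev for v in values]


def single_linkage(points, k):
    """Assumed stand-in for the in-class algorithm: start with one cluster
    per point and merge the two clusters whose closest members are nearest
    (Euclidean distance) until only k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # b > a, so popping b leaves index a valid.
        clusters[a].extend(clusters.pop(b))
    return clusters


def summarize(k=4):
    """Cluster the requests on (z(size), z(disk)) and report, per cluster,
    the member request numbers, average z-scores, and average raw values."""
    ids = [r[0] for r in REQUESTS]
    sizes = [r[1] for r in REQUESTS]
    disks = [r[2] for r in REQUESTS]
    points = list(zip(z_scores(sizes), z_scores(disks)))
    result = []
    for members in single_linkage(points, k):
        n = len(members)
        result.append({
            "requests": sorted(ids[i] for i in members),
            "avg_z": (sum(points[i][0] for i in members) / n,
                      sum(points[i][1] for i in members) / n),
            "avg_raw": (sum(sizes[i] for i in members) / n,
                        sum(disks[i] for i in members) / n),
        })
    # Order clusters from smallest to largest average file size.
    return sorted(result, key=lambda c: c["avg_raw"][0])


if __name__ == "__main__":
    for cluster in summarize():
        print(cluster["requests"], cluster["avg_z"], cluster["avg_raw"])
```

Single linkage is used here only because it is deterministic; k-means results depend on the choice of initial centroids. For part (b), the same mean and standard deviation would have to be reused to scale a new request's (n, m) before comparing it with the cluster averages.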