Compressed Bloom Filters and Compressing the Web Graph

Michael Mitzenmacher
Harvard University

Thursday, September 6, Volen 101, 2:00 p.m.

Following the theme of compression, I plan to compress two talks into one (time permitting). Both talks are meant for general audiences.

Compressed Bloom Filters (in PODC 01) A Bloom filter is a simple space-efficient randomized data structure for representing a set that supports approximate membership queries. Bloom filters are increasing being used in distributed systems; for example, in a distributed Web cache, proxies do not need to share exact the URL lists of their caches, but instead periodically broadcast Bloom filters respresenting their cache. We introduce compressed Bloom filters, which improve on Bloom filters in distributed settings.

Compressing the Web Graph (joint work with Micah Adler, in the 2001 Data Compression Conference) We consider the problem of compressing graphs of the link structure of the World Wide Web. We provide efficient algorithms for such compression that are motivated by recently proposed random graph models for describing the Web. We also provide hardness results demonstrating limitations on natural extensions of our approach.

Host: Marty Cohn