Jim Storer


CS Department Information
How to Contact Me
Biographical Sketch and Research Interests
Books
Patents
Data Compression Conference
Brandeis Data Compression Group
Other Research
Useful Advice




The Brandeis Computer Science Department

The department home page is: http://www.cs.brandeis.edu

For questions, contact our Department Administrator Myrna Fox:


Contact Information

Prof. James A. Storer
Computer Science Department
Brandeis University
Waltham, MA 02454

phone: 781-736-2714 or 2700

email: storer@brandeis.edu

web: www.cs.brandeis.edu/~storer



Biographical Sketch and Research Interests

James A. Storer received his B.A. in Mathematics and Computer Science from Cornell University in 1975, his M.A. in Computer Science from Princeton University in 1977, and his Ph.D. in Computer Science from Princeton University in 1979. From 1979 to 1981 he was a researcher (MTS) at Bell Laboratories in Murray Hill, New Jersey.

In 1981 he came to Brandeis University, where he served as Chair of the Computer Science Department from 1992 to 2002 and is currently Full Professor of Computer Science and member of the Brandeis Center for Complex Systems.

Dr. Storer's research interests include computer algorithms, data compression and archiving (including text, images, video, and multi-media), data communications, processing of large data sets, text, image, and video processing, parallel computing, machine learning.

He has obtained two U.S. patents, does computing technology and patent related consulting, and has served as Chair of the annual Data Compression Conference (proceedings published by the IEEE Computer Society Press).



Books

Hyperspectral Data Compression

Giovanni Motta, Francesco Rizzo, James A. Storer, Editors

This book provides a survey of recent results in the field of compression of remote sensed 3D data, with a particular interest in hyperspectral imagery. Remote sensed data present special challenges in the acquisition, transmission, analysis, and storage process. Perhaps most significant is the information extraction process. In most cases accurate analysis depends on high quality data, which comes with a price tag: increased data volume. For example, the NASA JPL's Airborne Visible/Infrared Imaging Spectrometer (AVIRIS, http://aviris.jpl.nasa.gov) records the visible and the near infrared spectrum of the reflected light of an area 2 to 12 kilometers wide and several kilometers long (depending on the duration of the flight) into hundreds of non overlapping bands. The resulting data volume typically exceeds 500 Megabytes per flight and it is mainly used for geological mapping, target recognition, and anomaly detection. On the other hand, ultraspectral sounders such as the NASA JPL's Atmospheric Infrared Sounder (AIRS, http://www airs.jpl.nasa.gov), which has recently become a reference in compression studies on this class of data, records thousands of bands covering the infrared spectrum and generates more than 12 Gigabytes of data daily. Chapter 1 addresses compression architecture and reviews and compares compression methods. Chapter 2 through 4 focus on lossless compression (where the decompressed image must be bit for bit identical to the original). Chapter 5 (contributed by the editors) describes a lossless algorithm based on vector quantization with extensions to near lossless and possibly lossy compression for efficient browsing and pure pixel classification. Chapters 6 deals with near lossless compression while Chapter 7 considers lossy techniques constrained by almost perfect classification. Chapters 8 through 12 address lossy compression of hyperspectral imagery, where there is a tradeoff between compression achieved and the quality of the decompressed image. Chapter 13 examines artifacts that can arise from lossy compression.
Springer-Verlag, 2005
(425 pages, 6" x 9", hard-bound)
ISBN: 0-387-28579-2
www.springer.com



An Introduction to Data Structures and Algorithms

James A. Storer

Data structures and algorithms are presented at the undergraduate and first-year graduate level. The 13 chapters are: RAM Model, Lists, Induction and Recursion, Trees, Algorithm Design, Hashing, Heaps, Balanced Trees, Sets Over a Small Universe, Graphs, Strings, Graphs, Discrete Fourier Transform, Parallel Computation. Except for the introduction and bibliographic notes for each chapter, all page breaks have been put in manually and the format corresponds to a lecture style with ideas being presented in "digestible" page-size quantities. Chapters 1 through 4 cover elementary concepts and data structures at a slower pace than the remainder of the book. A more advanced course can essentially start with Chapter 5 (algorithm design). Although it is not clear what parallel architectures will prevail in the future, the last chapter further teaches fundamental concepts in algorithm design by exploring classic models of parallel computation, including the PRAM, generic PRAM simulation, HC/CCC/Butterfly, the mesh, and parallel hardware area-time tradeoffs.

Springer-Verlag, 2002)
(600 pages, 7" x 10", hard-bound)
ISBN 0-8176-4253-6, ISBN 3-7643-4253-6
www.springer.com

More information, including table of contents and errata.



Data Compression: Methods and Theory

James A. Storer

The first two chapters contain introductory material on information and coding theory. The remaining four chapters cover some of my data compression research performed in the period 1977 - 1985 (including substantial material that has not been reported elsewhere). Chapter 3 considers on-line textual substitution methods that employ "learning" heuristics to adapt to changing data characteristics. Chapter 4 considers massively parallel algorithms for on-line methods and their VLSI implementations. Chapter 5 considers off-line methods (including the NP-completeness of certain methods). Chapter 6 addresses program size (Kolmogorov) complexity. The appendices present source code and empirical results.

Computer Science Press (a subsidiary of W. H. Freeman Press), 1988
(419 pages 6" x 9", hard-bound)
ISBN 0-88175-161-8



Image and Text Compression

James A. Storer, Editor

This is an edited volume of papers by researchers in the field; topics include: vector quantization, fractals, optical algorithms, massively parallel hardware, and coding theory. Also included is a 75 page bibliography of data compression research compiled specifically for this book.

Kluwer Academic Press (part of Springer), 1992
(354 pages, 6" x 9", hard-bound)
ISBN 0-7923-9243-4



Proceedings Compression and Complexity of Sequences

Bruno Carpentieri, Alfredo De Santis, Ugo Vaccaro, and James A. Storer, Editors

This is an edited volume of the papers presented at the International Conference on Compression and Complexity of Sequences, held in Positano, Italy in 1997. Papers address the theoretical aspects of data compression and its relationship to problems on sequences, and include contributions from the editors.

IEEE Computer Society Press, 1998
(400 pages, 6" x 9", hard-bound)
ISBN 0-8186-8132-2



Proceedings, Data Compression Conference

James A. Storer, Conference Chair and co-editor

The DCC proceedings are co-edited with the DCC program committee chair, which over the years has included J. Reif (1991), M. Cohn (1992-2006), and M. Marcellin (2007).

Each volume has ten page extended abstracts of the presentations at technical sessions and one page abstracts of presentations at the posters session. Research areas addressed at DCC: Lossless and lossy compression algorithms for specific types of data (text, images, multi-spectral and hyper-spectral images, palette images, video, speech, music, maps, instrument and sensor data, space data, earth observation data, graphics, 3D representations, animation, bit-maps, etc.), source coding, joint source-channel coding, multiple description coding, quantization theory, vector quantization (VQ), multiple description VQ, compression algorithms that employ transforms (including wavelet transforms), bi-level image compression, gray scale and color image compression, video compression, geometry compression, speech and audio compression, compression of multi-spectral and hyper-spectral data, compression of science, weather, and space data, source coding in multiple access networks, parallel compression algorithms and hardware, fractal based methods, error resilient compression, adaptive compression algorithms, string searching and manipulation used in compression applications, closest-match retrieval in compression applications, browsing and searching compressed data, content based retrieval employing compression methods, minimal length encoding and applications to learning, system issues relating to data compression (including error control, data security, indexing, and browsing), medical imagery storage and transmission, compression of web graphs and related data structures, compression applications and issues for computational biology, compression applications and issues for the internet, compression applications and issues for mobile computing, applications of compression to file distribution and software updates, applications of compression to files storage and backup systems, applications of compression to data mining, data compression standards (including the JPEG, MPEG, G.xxx, and H.XXX families).

1991 - present
(6 by 9 inch, approximately 500 pages hard-bound)

IEEE Computer Society Press
10662 Los Vaqueros Circle
Los Alamitos, CA 90720
phone: 714-821-8380
toll-free phone: 800-678-4333
fax: 714-821-4641




Patents

Computational Modeling of Rotation and Translation Capable Human Visual Pattern Recognition

U.S. Provisional Patent Application.
INVENTORS: John Lisman, James A. Storer, Martin Cohn, Antonella DiLillo
FILED: August 29, 2005
(assigned to Brandeis University)

Addresses rotation and translation invariant recognition.


In-Place Differential File Compression

U.S. Patent Application.
INVENTORS: James A. Storer and Dana Shapira
FILED: March 18, 2004
GRANTED: July 18, 2006
(23 claims, 6 of them independent)

Addresses in-place differential file compression methods that can be used in software update and backup systems.


Method and Apparatus for Data Compression

United States Patent 5,379,036
INVENTOR: James A. Storer
FILED: April 1, 1992
GRANTED: January 1, 1995
(23 claims, 3 of them independent)

Addresses high speed parallel algorithms and hardware for data compression.


System for Dynamically Compressing and Decompressing Electronic Data

United States Patent 4,876,541
INVENTOR: James A. Storer
FILED: October 15, 1987
GRANTED: October 24, 1989
(55 claims, 5 of them independent)

Addresses dictionary based adaptive data compression.



Data Compression Conference




Brandeis Data Compression Group

Data compression is the encoding of data that is typically used to reduce storage and communication bandwidth requirements. Other applications that employ minimal size representations include machine learning and information retrieval. Lossless compression is when the decompressed data must be identical to the original; lossy compression is when the decompressed data may be an acceptable approximation. Here we use the term "data" broadly to refer to most anything that is represented in digital format (text, images, video, audio, etc.). Current compression research and the subject of recent Ph.D. theses of the Brandeis Data Compression Group include:
Content Based Image Retrieval
Hyperspectral Image Compression and Processing
String Similarity, Differential Compression, and Incremental Backup
Error Resilient Communication
Adaptive Image Compression
Lossless Image Compression
Video Compression
Sub-Linear Algorithms
Systolic Algorithms
Text Compression


Content Based Image Retrieval:

We have investigated a low complexity approach for content-based image retrieval (CBIR) using vector quantization (VQ). The hope is that encoding an image with a codebook of a similar image will yield a better representation than when a codebook of a dissimilar image is used. Our most simple method ÒtagsÓ each image with a thumbnail and a small VQ codebook of only 8 entries, where each entry is a 6 element color feature vector. If we think of a web crawler where the image database is being dynamically updated, the preprocessing can be thought of as a tagging of images as they are gathered. We use a simple mean squared error computation for the distance function for retrieval. To test our method, we have used for our database a subset of the COREL collection consisting of 1,500 JPEG images, 100 in each of 15 classes; three typical images from each class are shown below. These images are already relatively small, but still larger than is needed for a thumbnail. To create the database thumbnails, the system retains the central region (256 x 256 for these images) and scales it to 128 x 128 for each image (although it is possible to use smaller thumbnails, this size has allowed us to make direct comparisons with previous work).



The graph shown below compares the precision-recall tradeoff of our simple VQ algorithm to that of a more complex method based on GMVQ from the previous literature. Using the same code, with the appropriate places modified (distance function, etc.), VQ retrieval used on the average 1/3 of the time of GMVQ (that is the total time to perform a retrieval of each of the 1,500 images on the entire 1,500 image set was more than three times greater for GMVQ than VQ), while achieving the same or slightly better performance.



We have also condsidered extending our basic VQ approach presented in the previous section by adding positional information to each feature vector. That is, the mean XY coordinates of all blocks associated with a particular codebook entry are added to that entry (so the dimension of codebook entries is increased from 6 to 8). The figure below shows some database images with the code vector positions marked on the image. The color within each black bordered box represents the color of the code vector.



In general, the addition of XY coordinate features gives an overall modest improvement in retrieval quality. However, the real benefit of adding XY to the feature vector comes as "insurance" on classes with highly contrasting regions. For example, the first figure below shows the first 10 images retrieved for one of the dinosaur images with color only and the second figure below shows the first 10 for color + XY.

Without XY features:


With XY features:


For further information and references see:
  • A. H. Daptardar and J. A. Storer [2006]. "Reduced Complexity Content-Based Image Retrieval using Vector Quantization", Proceedings Data Compression Conference, IEEE Computer Society Press, 342-351.
  • A. H. Daptardar and J. A. Storer [2005]. "Content-Based Image Retrieval Via Vector Quantization", In Advances in Visual Computing - Springer Lecture Notes in Computer Science 3804/2005 (ISBN 3-540-30750-8), 502-509; also appeared in International Symposium on Visual Computing, December 5-7, 2005.



Hyperspectral Image Compression and Processing:

Remote sensing technology has shifted from panchromatic data (a wide range of wavelengths merged into a single response), through multispectral (a few possibly overlapping bands in the visible and infrared range with spectral width of 100 200nm each), to hyperspectral imagers and ultraspectral sounders, with hundreds or thousands of bands. In addition, the availability of airborne and spaceborne sensors has increased considerably, followed by the widespread availability of remote sensed data in different research environments, including defense, academic, and commercial. In hyperspectral photography, each pixel records the spectrum (intensity at different wavelengths) of the light reflected by a specified area, with the spectrum decomposed into many adjacent narrow bands. Data compression plays a key role in the original transmission of imagery to a central location, in the archiving of data sets, and in the distribution of data to users.



We have considered a number of approaches to hyperspectral image compression, including the ones where vector quantization is applied independently to each pixel in a way that preserves uniform quality across all vector components. That is, high dimensional source vectors are partitioned into a number of subvectors of (possibly) different length and then each subvector is individually encoded with an appropriate codebook and distortion measure. Then an entropy coder that may exploit spatial correlations is employed for both the VQ indices and residual. This scheme allows efficient lossless, near lossless, or lossy compression of high dimensional data sources such as hyperspectral imagery in which each vector component is allowed to have different alphabet and distribution. A key advantage of our approach is the use of independent small VQ codebooks that allow fast encoding and decoding. Motivated by the need of maintaining uniform quality across all vector components, especially for applications needing target identification, our experiments have used the Percentage Maximum Absolute Error (PMAE) as the distortion measure.



For further information and references see:

    G. Motta, F. Rizzo, and J. A. Storer [2006], editors.. Hyperspectral Data Compression, Springer.
  • F. Rizzo, B. Carpentieri, G. Motta, and J. A. Storer [2005]. "Low Complexity Lossless Compression of Hyperspectral Imagery via Linear Prediction", IEEE Signal Processing Letters 12:2, 138-141.
  • F. Rizzo, B. Carpentieri, G. Motta, and J. A. Storer [2004]. "Compression of Hyperspectral Imagery", Proceedings International Conference on E-Business and Telecommunications (ICETE), Sebutal, Portugal, 2004; an extended version is a chapter in the book E-Business and Telecommunication Networks, Edited by J. Ascenso, L. Vasiu, C. Belo, and M. Saramago, Kluwer / Springer (2005).
  • F. Rizzo, B. Carpentieri, G. Motta, and J. A. Storer [2004]. "Real-Time Software Compression and Classification of Hyperspectral Images", 11th SPIE International Symposium on Remote Sensing Europe, Maspalomas, Gran Canaria, Spain, Sept. 13-17, 5573-17.
  • F. Rizzo, B. Carpentieri, G. Motta, and J. A. Storer [2004]. "Lossless Compression of Hyperspectral Imagery: A Real Time Approach", 11th SPIE International Symposium on Remote Sensing Europe, Maspalomas, Gran Canaria, Spain, Sept. 13-17, 2004, 5573-25.
  • G. Motta, F. Rizzo, J. A. Storer, and B. Carpentieri [2004]. "High Performance Compression of Hyperspectral Imagery with Reduced Search Complexity in the Compressed Domain", Proceedings Data Compression Conference, IEEE Computer Society Press, 479-488.
  • G. Motta, F. Rizzo, and J. A. Storer [2003]. "Partitioned Vector Quantization: Application to Lossless Compression of Hyperspectral Images", Proceedings International Conference on Acoustics, Speech, and Signal Processing (ICASSP); presented at IEEE International Conference on Multimedia and Expo (ICME2003).
  • G. Motta, F. Rizzo, and J. A. Storer [2003]. "Compression of Hyperspectral Imagery", Proceedings Data Compression Conference, IEEE Computer Society Press, 333-342.



String Similarity, Differential Compression, and Incremental Backup:

Identifying similar strings is a problem that arises in many applications. Internet search algorithms and database retrieval algorithms, including tasks in bioinformatics, can employ effective string similarity testing to process queries. The completion of the human genome as well as that for other species has produced large strings in which researchers are looking for structure. Differencing algorithms used by software updates and backup systems can be improved with string similarity algorithms. The traditional edit-distance problem is to find the minimum number of insert-character and delete-character (and sometimes change character) operations required to transform a string S (the source string) to become a string T (the target string); that is, starting with S, repeatedly perform operations until we have T, and do so using a minimum number of operations. However, there are many other natural operations that can be used to transform S to become T, including copies, moves, deletes, and reversals of blocks of characters. However the problem complexity is greater (NP-hard in many cases) and efficient approximation algorithms are needed. If the copy operation is allowed, then efficient approximation algorithms that employ data compression techniques can be employed, with applications to differential file compression. In fact, in many practical applications, differential file compression can be performed in-place with performance comparable to the best methods that work with unrestricted memory. In-place differential file compression has the advantage that the sender need not make any assumptions about the resources available to the receiver, or in the case of file back-up systems that work in the background, fewer resources must be claimed. That is, even if other operations are allowed as well, the copy operation is sufficiently powerful, that it alone can achieve efficient approximation. We have considered just the move operation (or the move operation in combination with the insert-character and delete-character operations), just delete operations (possibly with insert-character operations - which is not NP-hard), or combinations of move and delete operations (and possibly insert-character operations). One way to model the move operation is to view strings as linked lists (one character per vertex) and allow the insert and delete operations to be applied to both vertices and links. Although one would not likely represent strings this way in practice, it points out a fundamental difference between these operations and the copy operation; that is, move and delete operations can be done in O(1) time on linked lists whereas copy operations cannot. For many applications, the rearrangement implied by move operations (possibly together with delete and insert-character operations) may better reflect string similarity than the ability to compress the string with respect to another, as is in some sense implied by the copy operation. Because optimal differential compression and generalized edit distance is in general NP-hard, in past work we have studied approximation algorithms and their efficient implementation. Shapira and Storer [2002] consider approximation algorithms for generalized edit distance where block-moves are allowed. Shapira and Storer [2003] present the IPSW algorithm for in-place differential file compression (no more than MAX(m,n) + O(1) space is used where m is the size of S and n is the size of T). IPSW uses a sliding window method to decompress T by initializing the window to S, and restricting its maximum size to max(|S|,|T|); except for the beginning, copies cannot reach all the way back to the start of S, and while decoding in-place, the no longer accessible portion of S is overwritten from left to right.



IPSW is fast and the compression achieved compares well with other existing methods, including those that are not in-place. IPSW is most effective when there is a reasonable degree of alignment between S and T. That is, large matches between S and T occur in approximately the same relative order. Such alignment is typical for many applications, such as subsequent backups of the same data, where when new data is inserted, and other data is deleted, the remaining data often comprises aligned common substrings. We have also condsdered methods with good performance in-place even for those (rare) cases where S and T are not sufficiently well aligned. This could happen, for example, when for some reason large blocks of the file have been moved around to form a new version of the file. An extreme case is when the first and second halves of S have been exchanged to form T (S = uv and T = vu); to decompress T the IPSW algorithm overwrites u as it copies v, and then the ability to represent v with a single copy is lost. Rather than modify IPSW, a very fast and practical method that suffices for many and perhaps most practical inputs, is a preprocessing stage for IPSW that moves substrings within S to better align S with T. Compressed files can be preceded with an initial bit indicating whether preprocessing has occurred. The encoder can compress T with IPSW and compare that to compressing T with respect to S not in place (so at all times all substrings of S are available to be copied). If the difference is significant, this initial bit can be set, alignment preprocessing performed, and a list of moves prepended the normal IPSW encoding. The decoder can perform the moves (in-place) and then proceed in-place with normal IPSW decoding. For move pre-processing, we distinguish employ two different kinds of rearrangements of the blocks. Moves rearrange the blocks so that they occur in S and T at the same order. "Jigglings" move the junk blocks of S backwards in the file, so that, as a side affect, the matched blocks are moved forwards. Move Preprocessing Algorithm:
Step 1: Find Non Overlapping Common Substrings (NOCS) from longest to shortest down to length of a constant K.
Step 2: Compute the minimum number of rearrangements (moves and jigglings) in S so that the blocks are left copies within S and T.
Step 3: Generate S' according to step 2.
Step 4: Compute IPSW(S',T).
Under reasonable assumptions, all of the preprocessing steps 1 through 4 can be done in linear time.

For further information and references see:
  • D. Shapira and J. A. Storer [2004]. "In-Place Differential File Compression of Non-Aligned Files With Applications to File Distribution and Backups", Proceedings Data Compression Conference (DCC), IEEE Computer Society Press, 82-91.
  • D. Shapira and J. A. Storer [2003]. "In-Place Differential File Compression", Proceedings Data Compression Conference (DCC), IEEE Computer Society Press, 263-272.
  • M. Meyerovich, D. Shapira, and J. A. Storer [2003]. "Finding Non-Overlapping Common Substrings in Linear Time", Proceedings Symposium on String Processing and Information Retrieval (SPIRE), 369-377.
  • D. Shapira and J. A. Storer [2002]. "Generalized Edit Distance with Move Operations", Thirteenth Annual Symposium on Combinatorial Pattern Matching (CPM), 85-98.
  • F. Mignosi, A. Restivo, M. Sciortino, and J. A. Storer [2000]. "On Sequence Assembly", Technical Report, Computer Science Department, Brandeis University.



Error Resilient Communication:

There has been much past work concerning the detection and correction of errors on a communication line. However, with on-line data compression where a sender and receiver are in lock-step maintaining constantly changing identical local memories, a single error on the communication channel or storage medium could cause the dictionaries of the encoder and decoder to differ, which in the worst case could corrupt all data to follow. That is, since decoding is only guaranteed to be correct when the dictionary remains the same as the one used to encode the data, there is no way to bound the effects of a single error in the worst case (and, in fact, the expected case is very bad because a single error can corrupt the mapping between codewords and dictionary entries). With data storage devices or one-way communication channels, where the decoder cannot send messages to the encoder, the problem becomes even more critical. In addition, with many existing communication systems where the full bandwidth of the channel is consumed, even if a two-way channel is available to let the encoder know that an error has occurred, re-transmission of data can often be tantamount to losing new data that could have been transmitted during the time used to re-transmit the old data. Re-transmission can also introduce further error propagation problems. Even if error detection and correction methods are available that make the probability of an error getting through sufficiently low, error propagation prevention may be all that is needed for non-critical applications such as web browsing, email processing, etc. We have developed the k-error protocol, a technique for protecting a dynamic dictionary method from error propagation as the result of any k errors on the communication channel or compressed file. In addition theoretical results show that at no asymptotic loss of compression, high probability protection against error propagation resulting from a sustained error rate is also a benefit of the protocol. Experiments show that this approach is highly effective in practice with a low overhead. For LZ2-based methods that "blow up" as a result of a single error, with the protocol in place, high error rates can be sustained without error propagation (the only corrupted bytes decoded are those that are part of the string represented by a pointer that was corrupted).



For further information and references see:

  • J. A. Storer [2000]. "Error Resilient Dictionary Based Compression", Proceedings of the IEEE 88:11, 1713-1721.

  • J. A. Storer and J. H. Reif [1997]. "Low-Cost Prevention of Error-Propagation for Data Compression with Dynamic Dictionaries", Proceedings1997 IEEE Data Compression Conference, 171-180.

  • J. A. Storer and J. H. Reif [1997]. "Error Resilient Optimal Data Compression", SIAM Journal of Computing 26:3, 934-939.



Adaptive Image Compression:

We are studying adaptive vector quantization where the codebook is learned as the image is processed in a single pass (i.e., the codebook contains a changing set of variable-size vectors). Experiments with a variety of data types (gray scale and color magazine photos, medical images, space images, finger prints, handwriting, etc.) show that with no prior knowledge of the data and no training, for a given fidelity this approach compares favorably with traditional VQ that employs codebooks that have been trained on the type of data being compressed (it typically at least equals and often greatly exceeds the amount of compression achieved by the JPEG standard). We have developed efficient parallel algorithms and architectures as well as fast sequential implementations using KD-trees. Key advantages of this approach are that it is adaptive (no advance knowledge of the data or training is required), the tradeoff between the fidelity and the amount of compression can be continuously varied (and automatically controlled), precise guarantees on the amount of distortion on any small area of the image (e.g., 4x4 array of pixels) can be provided, and decoding is extremely fast because it involves primarily table lookup like traditional VQ (encoding is on the same order of complexity as for traditional VQ). The figure below shows an image of a brain on the left and a "map" of the decompressed image on the right (each rectangle has been colored with a random solid color).



For further information and references see:

  • F. Rizzo, J. A. Storer, and B. Carpentieri [2001]. "LZ-Based Image Compression", Information Sciences 135,, 107-122.

  • F. Rizzo and J. A. Storer [2001]. "Overlap in Adaptive Vector Quantization", Proceedings Data Compression Conference, IEEE Computer Society Press, 401-410.

  • G. Motta, F. Rizzo, and J. A. Storer [2000]. "Image Compression", Encyclopeda of Computer Science, Nature Publishing Group, London, 836-840.

  • F. Rizzo, J. A. Storer, and B. Carpentieri [1999]. "Improving Single-Pass Adaptive VQ", IEEE International Conference on Acoustics, Speech, and Signal Processing, IMDSP2.10; also see the article by the same authors: "Experiments with Single-Pass Adaptive Vector Quantization", Proceedings IEEE Data Compression Conference, 546.

  • C. Constantinescu and J. A. Storer [1994]. "On-Line Adaptive Vector Quantization with Variable Size Codebook Entries", Information Processing and Management 30:6, 745-758.

  • C. Constantinescu and J. A. Storer [1994b]. "Improved Techniques for Single-Pass Vector Quantization", Proceedings IEEE (June), 933-939.

  • J. Lin and J. A. Storer [1994]. "Design and Performance of Tree-Structured Vector Quantizers", Information Processing and Management 30:6, 851-862. Lin).

  • M. Cohn, E. Lin, and J. A. Storer [1992]. "Optimal Pruning for Tree-Structured Vector Quantization", Information Processing and Management 28:6, 723-733.

  • J. Lin and J. A. Storer [1992]. "Improving Search for Tree-Structured Vector Quantization", Proceedings IEEE Data Compression Conference, 339-348.

  • J. Lin and J. A. Storer [1991]. "Resolution Constrained Tree Structured Vector Quantization for Image Compression", Proceedings IEEE Symposium on Information Theory, Budapest, Hungary.

  • J. A. Storer [1990]. "Lossy On-Line Dynamic Data Compression", Combinatorics, Compression, Security and Transmission, Springer-Verlag, 348-357.



Lossless Image Compression:

Lossless image compression, particularly bi-level image compression (where pixels are 0 or 1) has often employed techniques quite separate from those used for text compression or lossy image compression; most standards today employ modeling followed by coding (e.g., the JBIG standard, the IBM Q-coder, CCITT Group 4). Constantinescu and Storer [1994] present a lossy image compression scheme that can be viewed as a generalization of lossless dynamic dictionary compression ("LZ2" type methods) to two dimensions with approximate matching; Constantinescu and Storer [1995] have experimented with this approach for lossless image compression with great success. In this research we have generalized "LZ1" type methods (that employ displacement/length pointers to the previously seen text) to lossless image bi-level compression. We make use of a generalization of the suffix trie data structure for two dimensions that was introduced by Giancarlo [1995]. In a two dimensional suffix trie, the suffix at a position (i,j) in the two dimensional input array is just the portion of the array going to the right and down from that position. We list the characters of a suffix in the order depicted on the left in the figure below. Similar to the standard convention used in the one dimensional case, we pad the right and bottom sides of the array with a special character $ as a notation convenience. On the right in the figure below is an example of a 3x3 array (padded to a 4x4 array with $'s) and its corresponding 2D suffix trie. We have generalized the LZ1 approach to two dimensions and confirmed that, like the 1-dimensional case, the complexity of the general problem is NP-complete. Motivated by this fact and that 2D suffix tries can only give information about square shaped matches, we have considered the performance of greedy parsing with square matches, and have introduced a modified parsing strategy that searches for rectangular matches based on the best square match. We have also introduced multi-shape 2D suffix tries; they are more space consuming but have the ability to store information about arbitrary rectangular shaped matches at any position.



For further information and references see:

  • G. Motta, J. A. Storer, and B. Carpentieri [2000]. "Lossless Image Coding via Adaptive Linear Prediction and Classification", Proceedings of the IEEE 88:11, 1790-1796.

  • J. A. Storer and H. Helfgott [1997]. "Lossless Image Compression by Block Matching", The Computer Journal 40:2/3, 137-145.

  • J. A. Storer and H. Helfgott [1997]. "Generalized Node Splitting and Bi-Level Image Compression", Proceedings IEEE Data Compression Conference, 443.

  • J. A. Storer [1996]. "Lossless Image Compression using Generalized LZ1-Type Methods", Proceedings IEEE Data Compression Conference, 290-299.

  • C. Constantinescu and J. A. Storer [1995]. "Application of Single-Pass Adaptive VQ to BiLevel Images", Proceedings IEEE Data Compression Conference, 423.

  • See also the paper mentioned earlier that surveys this approach as well as lossy approaches: F. Rizzo, J. A. Storer, and B. Carpentieri [2001]. "LZ-Based Image Compression", Information Sciences.



Video Compression:

Digitized video can require over 1 billion bits per second in uncompressed form. We have developed adaptive algorithms for block displacement estimation, which when combined with individual frame compression and lossless post-processing can yield high compression while maintaining acceptable fidelity. Our approach is to adaptively grow and shrink variable size and shaped collections of blocks, called "superblocks". We also consider how parallel mesh architectures can be employed for high speed real time encoding. The first figure below shows the first and last frame of a movie clip and the second figure a "map" (each superblock has been colored with a random color) of how it was partitioned into regions (with a course fidelity setting); the hatched blocks are areas where due to motion in the image, portions of superblocks are splitting off. We have also used image classification algorithms that work off-line on a sub-sampling of the frames to provide information to enhance our on-line algorithm. In addition, we have considered the optimization of codings of this type.



For further information and references see:

  • G. Motta and J. A. Storer [2000]. "An Optimal Frame Skipping Algorithm for H263+ Video Encoding", Proceedings 11th Annual International Conference on Signal Processing Applications and Technology, Dallas, TX.
  • G. Motta and J. A. Storer [2000]. "Improving Scene Cut Quality for Real-Time Video Decoding", Proceedings IEEE Data Compression Conference, 470-479.

  • B. Carpentieri and J. A. Storer [1996]. "A Video Coder Based on Split-Merge Displacement Estimation", Journal of Visual Communication and Visual Representation 7:2, 137-143.

  • B. Carpentieri and J. A. Storer [1996]. "Classification of Objects in a Video Sequence", Proceedings SPIE Symposium on Electronic Imaging, San Jose, CA.

  • B. Carpentieri and J. A. Storer [1994]. "Split-Merge Video Displacement Estimation", Proceedings of the IEEE (June), 940-947.

  • B. Carpentieri and J. A. Storer [1994]. "Optimal Inter-Frame Alignment for Video Compression", International Journal of Foundations of Computer Science 5:2, 65-177.



Sub-Linear Algorithms:

Although adaptive dictionary compression via textual substitution is P-complete (DeAgostino [1994], there are many interesting and practical issues that can be addressed. We have investigated poly-log algorithms for the static-dictionary case. In addition, we have developed fast algorithms for what is perhaps the most practical model of massively parallel computation imaginable: a linear array of processors where each processor is connected only to its left and right neighbors, together with a parallel I/O facility (see the figure below). In other words if one views the PRAM model as having two types of complexity, the inter processor communication and the input-output mechanism, one can view this architecture as having virtually eliminated the complexity of inter-processor communication. Parallel I/O is inherent in any sub-linear algorithm (and makes this model much more powerful than the systolic arrays used for our research on real-time compression).



For further information and references see:

  • D. Belinskaya, S. DeAgostino, and J. A. Storer [1995]. "Near Optimal Compression with Respect to a Static Dictionary on a Practical Massively Parallel Architecture", Proceedings IEEE Data Compression Conference, 72-181.

  • S. DeAgostino and J. A. Storer [1992]. "Parallel Algorithms for Optimal Compression Using Dictionaries with the Prefix Property", Proceedings IEEE Data Compression Conference, 152-61.



Systolic Algorithms:

We have studied algorithms for data compression for a systolic pipe; that is, a linear array of processors, each connected only to its left and right neighbors. A real-life example of a systolic pipe is an automobile assembly line that may produce a new car every few minutes even though each car is in the assembly line for a day. Although each station in the automobile assembly line performs a different task, the stations are at least conceptually identical, if we view them all as taking as input a partially built car, performing an elementary operation (such as welding), and then outputting a partially completed car to the next station. From the point of view of practical VLSI implementation, a systolic pipe is very desirable - all processors are identical, connections between adjacent processors are short, the structure can be laid out in linear area, power and ground can be routed without crossing wires, and the layout strategy can be independent of the number of chips used (a larger pipe can be obtained by placing as many processors as possible on a chip and then, using the same layout strategy, placing as many chips as possible on a board). In addition, it is typical for processors to be extremely simple (e.g., a few registers, a comparator, and some finite-state logic). For a variety of textual substitution methods, and have designed and fabricated massively parallel systolic hardware for high bandwidth real-time adaptive dictionary compression. The first figure below shows the layout of an individual processor and the second shows on the left a prototype chip containing 128 processors and on the right 30 chips on a VME board (30 instead of 32 are needed for a 4K dictionary because 256 virtual entries correspond to the raw byte values). The design is inherently fast because all processors compute simultaneously; a clock cycle needs to only be long enough for a individual processor to receive 12 bits and make a 24-bit comparison (there is no global memory or communication). With year 2000 technology, 4K processors can be placed on a single chip that forms a 4K array that can compress or decompress at over a billion bits per second with only an 8-bit data path.





For further information and references see:

  • J. A. Storer and J. H. Reif [1992]. "A Parallel Architecture for High Speed Data Compression", Journal of Parallel and Distributed Computing 13 , 222-227.

  • D. M. Royals, T. Markas, N. Kanopoulos, J. A. Storer and J. H. Reif [1993]. "On the Design and Implementation of a Lossless Data Compression and Decompression Chip", IEEE Journal of Solid-State Circuits (September) , 948-953.

  • J. A. Storer [1992]. "Massively Parallel Systolic Algorithms for Real-Time Dictionary-Based Text Compression", Image and Text Compression, Kluwer Academic Press, 159-178.

  • J. A. Storer, J. H. Reif, and T. Markas [1990]. "A Massively Parallel VLSI Design for Data Compression using a Compact Dynamic Dictionary", Proceedings IEEE VLSI Signal Processing Conference, San Diego, CA, 329-338.

  • J. Storer and J. H. Reif [1990]. "A Parallel Architecture for High Speed Data Compression", Third Symposium on the Frontiers of Massively Parallel Computation, College Park, Maryland, IEEE Press, 238-243.

  • M. Gonzalez and J. A. Storer [1985]. "Parallel Algorithms for Data Compression", Journal of the ACM 32:2, 344-373.



Text Compression:

Textual substitution methods parse a string into phrases and replace those phrases with pointers to copies, called targets of the pointers, that are stored in a dictionary. The encoded string is a sequence of pointers (some of which may represent single characters). Static methods are when the dictionary is known in advance. By contrast, with dynamic or adaptive methods, the dictionary may be constantly changing as the data is processed (these methods are often called "LZ2" methods due to the work of Ziv and Lempel [1978], and are somewhat different from sliding window methods, which are often called "LZ1" methods due to the work of Lempel and Ziv [1977]). Ziv and Lempel [1978] investigated the encoding power of a finite state one-way head machine with an unrestricted decoder and showed for each individual sequence an asymptotically attainable lower bound on the achievable compression ratio. Furthermore, they achieved this lower bound with an encoder/decoder pair where both are finite state one-way head machines (the LZ2 algorithm), proving that imposing this condition only on the encoder does not lead to better compression. Sheinwald, Lempel and Ziv [1995] show that the same asymptotically attainable lower bound holds when the encoder is unrestricted and the decoder is a finite state one-way head machine, proving that imposing this condition only on the decoder does not lead to better compression either. Hence, off-line encoding for on-line decoding is not more powerful with respect to asymptotic results for infinite strings from an information source (under some information-theoretic assumptions). We are interested in the power of off-line decoding for on-line decoding in the finite case. This models situations such as distribution of data on CD-ROM where much time can be spent encoding but decoding must be fast and simple. We have introduced the notion of on-line decodable optimal parsing based on the parsing defined by the LZ2 algorithm and have shown that optimal off-line encoding is NP-complete for both the original LZ2 technique and for some widely used practical implementations (e.g., first-character learning and next-character learning). We have also shown that approximation algorithms for these problems cannot be realized on-line.

For further information and references see:
  • J. H. Reif and J. A. Storer [2001]. "Optimal Encoding of Non-Stationary Sources", Information Sciences 135, 87-105.

  • S. DeAgostino and J. A. Storer [1996]. "On-Line versus Off-Line Computation in Dynamic Text Compression", Information Processing Letters 59, 169-174.

  • J. A. Storer [1991]. "The Worst-Case Performance of Adaptive Huffman Codes", Proceedings IEEE Symposium on Information Theory, Budapest, Hungary.

  • J. A. Storer [1985]. "Textual Substitution Techniques for Data Compression", Combinatorial Algorithms on Words, Springer-Verlag, 111-129.

  • J. A. Storer and T. G. Szymanski [1982]. "Data Compression Via Textual Substitution", Journal of the ACM 29:4, 928-951.

  • J. A. Storer [1983]. "An Abstract Theory of Data Compression", Theoretical Computer Science 24, 221-237.



Other Research

For some of Dr. Storer's past research outside the data compression group, see:


Useful Advice

Almost as much time can be wasted reading web pages as constructing them.
Get back to work!