Tutorials on Algorithms for Massive Data Sets

ICALP 2003 Satellite Event

Eindhoven, The Netherlands, June 29, 2003

* * * * The tutorials have been CANCELLED * * * *

Scope and Topics

Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting data transfer (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. During the last decade a major body of research has been devoted to the development of efficient external memory algorithms, where the goal is to exploit locality in order to reduce the I/O costs.

As a pre-workshop for ICALP 2003, S. Muthu Muthukrishnan and Erik D. Demaine will give tutorials on recent research developments within the following two areas deadling with massive data sets.

Algorithms for data streams
S. Muthu Muthukrishnan, Rutgers University and AT&T Shannon Labs

Abstract Telecommunication networks generate massive streams of traffic logs. Monitoring, auditing and mining such streams is an integral part of network management. Research over the past few years in this area has lead to certain key insights. It is difficult to find a needle in the haystack within the constraints faced in the network application. Fortunately, several interesting network traffic phenomena have the ``few good terms'' property, i.e., summarizing a few trends in the traffic - deviants, wavelet terms, histograms,correlations - is both feasible and insightful. Finally, data quality problems in network datafeeds present many new challenges and principled approach to confronting data quality problems is an implicit concern for data stream mining. In this tutorial, I will survey the application, discuss these insights and present novel algorithmic solutions that have been developed.
Cache-oblivious algorithms and data structures
Erik D. Demaine, MIT

Abstract A recent direction in the design of cache-efficient and disk-efficient algorithms and data structures is the notion of cache obliviousness, introduced by Frigo, Leiserson, Prokop, and Ramachandran in 1999. Cache-oblivious algorithms perform well on a multilevel memory hierarchy without knowing any parameters of the hierarchy, only knowing the existence of a hierarchy. Equivalently, a single cache-oblivious algorithm is efficient on all memory hierarchies simultaneously. While such results might seem impossible, a recent body of work has developed cache-oblivious algorithms and data structures that perform as well or nearly as well as standard external-memory structures which require knowledge of the cache/memory size and block transfer size. Here we describe several of these results with the intent of elucidating the techniques behind their design. Perhaps the most exciting of these results are the data structures, which form general building blocks immediately leading to several algorithmic results.

The two tutorials take place on Sunday June 29 as a satellite event of ICALP 2003 (June 30-July 4, 2003), at the Technische Universiteit Eindhoven, The Netherlands. The tutorials are aimed at graduate students and researchers. Partially supported by the Future and Emerging Technologies programme of the EU under contract number IST-1999-14186 (ALCOM-FT).

Registration and travel information

Is available at the ICALP 2003 webpage.

Program committee

Lars Arge, Duke University
Gerth Stølting Brodal (co-chair), University of Aarhus
Erik D. Demaine, MIT
Rolf Fagerberg (co-chair), University of Aarhus
Paolo Ferragina, University of Pisa
Giuseppe F. Italiano, University of Rome, "Tor Vergata"
Ulrich Meyer, Max-Planck-Institut, Saarbrücken
Rajeev Raman, University of Leicester
Jeff Vitter, Purdue University
Norbert Zeh, Dalhousie University

Page maintained by Gerth Stølting Brodal