UNIVERSITY OF AARHUS DEPARTMENT OF COMPUTER SCIENCE External Memory Algorithms and Data Structures (EMADS), Fall 2003 |
|
Lecturers · Time and place · Course description · Schedule · Literature · Projects · Evaluation · Prerequisites · Course language · Credits |
Monday 13.15-15.00 and Wednesday 11.15-12.00 (Weeks: 35 - 41 and 44 - 50) in Auditorium D4.
Computer systems usually have an entire hierarchy of memory levels, with each level having its own performance characteristics. Typical memory levels are: CPU registers, several levels of memory cache, main memory (RAM), and secondary memory (disk). Traditionally, the design of algorithms does not take this memory hierarchy into account, but assumes a model with a single level of main memory.
In an increasing number of problems, the amount of data to be processed is far too massive to fit into internal memory. Examples include data collections in astronomy, geology, meteorology, and finance, as well as web search engines, VLSI verification, computer graphics, and geographic information systems (GIS). In such applications, the amount of data may be measured in terabytes.
When dealing with data sets of sizes exceeding main memory, communication between the fast internal memory and the slow external memory is often the performance bottleneck, and the analysis of algorithms under the assumption of a single level of memory may be meaningless.
Instead, a more realistic measure of the efficiency of an algorithm is the number of I/O-operations performed between internal memory and disk. Algorithms designed to minimize this number are termed external memory algorithms.
In this course, we will study the design and analysis of efficient external memory algorithms and data structures. Different paradigms for efficiently solving problems in external memory will be presented, and a number of specific algorithms from areas like sorting and searching, computational geometry, strings, and graphs will be covered.
Week | Date | Topics | Reading |
35 | 25/8 | Introduction (slides: ps.gz). I/O model. Project 1 - Identifying Memory Hierarchies (slides: ps.gz | pdf). | [A97, Sect. 1-3.2] |
27/8 | Sorting and Permuting (slides: ps.gz | pdf). | [KL92] | |
36 | 1/9 | Lower bounds for sorting and permuting
(slides: ps.gz | pdf). Lower bounds by I/O-comparison trees (slides: ps.gz | pdf). Optimal merging (slides: ps.gz | pdf). | [KL92, Sect. 1-3.2 (excl. p.8-9), Sect. 4.1-4.2] |
3/9 | B-trees (slides: ps.gz | pdf). | [BF00] | |
37 | 8/9 |
Amortized analysis of B-trees. Trade-offs for external memory dictionaries (not in curriculum; slides: ps.gz | pdf). Introduction to buffer trees (slides: ps.gz | pdf). | [BF03],[A03] |
10/9 | Discussion of project 1. | ||
38 | 15/9 | Buffer trees: basic structure, used as priority queues, and used as range trees. | [A03] |
17/9 | Buffer trees used as range trees and as segment trees. | [A03] | |
39 | 22/9 | External Memory Interval
Trees. Deadline for project 1. | [AV96] |
24/9 | Weight Balanced B-trees. | [AV96] | |
40 | 29/9 | External Memory 3-Sided Orthogonal Range Searching. | [ASV99] |
1/10 | External Memory 4-Sided Orthogonal Range Searching. Project 2 - Sorting. | [ASV99] | |
41 | 6/10 |
R-trees (slides: ps.gz | pdf). Connected Components. | [AHVV02],[A83, Sect. 2.3],[ABW02] |
8/10 | Connected Components
(slides: ps.gz | pdf). Minimum Spanning Forest. | [ABW02] | |
42-43 | Fall break | ||
44 | 27/10 | More algorithms for MST. | [ABT00, Sections 1-2] |
29/10 | DFS | [BGVW00] | |
45 | 3/11 | BFS | [MR99, Section 5.1],[MM02] |
5/11 | Single Source Shortest Path (SSSP). Deadline for project 2. | [KS96] | |
46 | 10/11 | The String B-tree | [FG99, Sect. 1, 2.1] |
12/11 | String Sorting | [AFGV97] | |
47 | 17/11 | Project 3 - A GIS system Suffix Array Construction | [KS03] |
19/11 | Suffix Array Construction | [KS03] | |
48 | 24/11 | Cache oblivious algorithms:
model, matrix multiplication, search trees
(slides: ps.gz | pdf). | [FLPR99, Sect. 1-2],[BFJ02] |
26/11 | Funnel-Sort | [FLPR99, Sect. 4][BF02] | |
49 | 1/12 | Cache oblivious distribution sweeping | [BF02] |
3/12 | Cancelled ("Åben dag") | ||
50 | 8/12 | Streaming: counting distinct elements, point queries, range queries | [CM04],[FM85] |
10/12 | Beer & softdrinks |