UNIVERSITY OF AARHUS DEPARTMENT OF COMPUTER SCIENCE Algorithms for Web Indexing and Searching, Fall 2002 |
|
Lecturers · Time and place · Course description · Schedule · Literature · Evaluation · Prerequisites · Literature · Course language · Credits |
Mondays 13.15-14.00 and Wednesdays 9.15-11.00 in Auditorium D4.
In this course we will study the algorithmic issues involved in efficient indexing and searching of the web. Starting with classic string algorithms and data structures, we will continue with recent techniques such as indexing schemes for massive amounts of data, link-based ranking algorithms, spectral methods for graph analysis, clustering and nearest neighbor search in high-dimensional spaces, statistical methods for clustering documents, efficient near-equality testing using randomized algorithms and cache replacement strategies. This list of subjects is tentative, and may evolve during the course.
Date | Topics | Reading |
September 2 | Introduction to course
(slides ps |
pdf) Anatomy of Search Engines (slides ps | pdf) | [ACGPR01],[BP98] |
September 4 | Anatomy of Search Engines Web Crawling (slides ps | pdf) | [ACGPR01],[BP98],[NH01],[HN99],[NW01] |
September 9 | Web Crawling and Web
Dynamics
(slides ps |
pdf) | [RM02],[CG00] |
September 12 | Focused crawling Crawling the hidden web | [MPSR01],[CBD99],[RG00] |
September 16 | Project discussion | |
September 18 | I/O model
(slides ps |
pdf),
Sorting
(slides ps |
pdf),
B-trees Indexing (slides ps | pdf) | [MRYG00],[WMB99],[GT98, Pages 660-663] |
September 23 | Hashing | [WMB99] |
September 25 | Tries, Ternary trees | [BS97],[MM90] |
September 30 | Project discussion | |
October 2 | Suffix trees, Suffix arrays | [U95],[MM90] |
October 7 | Efficient computation of PageRank | [H99] |
October 9 | Efficient computation of PageRank, Authoritative sources | [H99],[K99] |
October 14 | Fall break | |
October 16 | Fall break | |
October 21 | Extensions of Kleinbergs algorithm | [CDRRGK98],[CDGKRRT98],[BH98] |
October 23 | Optimal merging
(slides ps |
pdf) Classical information retrieval - models and ranking | [BB99, Chapter 3],[BR99, Section 2.5.1-3, 3.2, 8.4] |
October 28 | Querying | [BR99, Section 4.2-3] |
October 30 | Fingerprinting, Document resemblance | [B93],[B97],[BGNZ97] |
November 4 | Clustering | [B97],[BGNZ97] |
November 6 | Clustering | [HGI00] |
November 11 | Clustering large matrices | [DFKVV99] |
November 13 | Latent semantic indexing | [BDB94],[AFKMS01] |
November 18 | Cancelled | |
November 20 | Cancelled | |
November 25 | Cancelled | |
November 27 | Finding mirrored hosts | [BBDH00] |
December 2 | The web as a graph | [GL02],[BKMRRSTW00],[KRRSTU00] |
December 4 | Cancelled ("Åben dag") | |
December 9 | Bayesian classifiers | [L98] |
December 11 | Cancelled (moved to December 12) | |
December 12 15.15-17.00 Auditorium D4 | Guest lecture René DePont Christensen, Thomas Rask Thomsen (Dreamgate) | announcement |
December 16 | SALSA | [LM01] |
December 18 | Computational game theory and mechanisms design | [P01] |