Algorithm Engineering (Q3, 2013)

Qualification description

The participants will after the course have practical experience with experimental evaluation of the performance of algorithm implementations and the comparison of experimental performance with theoretical asymptotic performance and insight into the influence of computer architecture on the design of efficient algorithms. The working method of the course will also train the participants to read and understand research papers. The participants must at the end of the course be able to:

Course objective

Algorithm engineering combines the theoretical area of algorithm design with practice. Traditional algorithm design assumes the unit cost random access machine model (RAM) as the model of computation - an reasonable adequate model for studying the worst-case performance of algorithms. For more accurate estimations of an algorithm implementations running time of actual hardware, more refined models of current computers are needed, eg. to model cache hierarchies, translation lookaside buffers (TLB), branch prediction strategies, conditional operations, pipelining, trade-offs between cache-faults and branch mispredictions. The core of algorithm engineering is the cycle of algorithmic design, analysis, implementation, and experiment. The goal of this course is to facilitate this cycle of algorithmic development among the students. The initial case studies of the course will be very basic algorithmic problems like scanning, searching sorted lists, merging, sorting, matrix multiplication, and shortest path computations.

Time and Place

January 29-February 5: Tuesdays 11.15-14.00, Nygaard 192
Febuary 11-March 11: Monday 9.15-12.00, Nygaard 192


Gerth Stølting Brodal

Course plan

Date Content Literature Slides
29/1 Introduction and discussion of Project 1
The literature [BFJ02,BM06] relates to project 1, but wil be covered more throughoutly in later lectures.
The papers [J02,S09] are general papers about Algorithm Engineering.
[J02,S09],[BFJ02,BM06] intro | cache
5/2 Memory hierarchies; Memory layout of data structures [FLPR12,BFJ02,JM13] address mapping
12/2 Lecture cancelled due to sickness    
19/2 Branch mispredictions; Trade-offs between cache-misses and branch mispredictions [KS06,BM05,BFM05,BM06] mispredictions
26/2 Certifying algorithms [MMNS11, Sect. 1-7,12-15]  
5/3 Special instructions: Predicated instructions, SIMD [SW04],[SGL09],[K+10]  
12/3 Multicores
Lecture by Darius Sidlauskas
[FM10,AMD11,MHSM09,AKN12,CRSY10,YRV11] multicores



General papers about algorithm engineering

[J02] David S. Johnson. A Theoretician's Guide to the Experimental Analysis of Algorithms, in Data Structures, Near Neighbor Searches, and Methodology: Fifth and Sixth DIMACS Implementation Challenges. Editors: M. H. Goldwasser, D. S. Johnson, and C. C. McGeoch. American Mathematical Society, 215-250, 2002.
[S09] Peter Sanders. Algorithm Engineering - An Attempt at a Definition, in Efficient Algorithms, Essays Dedicated to Kurt Mehlhorn on the Occasion of His 60th Birthday. Susanne Albers, Helmut Alt, Stefan Näher (Eds.). Lecture Notes in Computer Science, Volume 5760, Springer, 2009, ISBN 978-3-642-03455-8.
[M07] Catherine C. McGeoch. Experimental algorithmics, Communications of the ACM, 50(11), 27-31, November 2007.
[MM99] Catherine C. McGeoch and Bernard M. E. Moret. How to present a paper on experimental work with algorithms, ACM SIGACT News, 30(4), 85-90, December 1999.
[P88] Ian Parberry. How to Present a Paper in Theoretical Computer Science: A Speaker's Guide for Students. SIGACT News, 19(2), 42-47, 1988.


[B00] Jon Bentley. Programming Pearls, Second Edition. Addison-Wesley, Inc., 2000. 239+xi pages. ISBN 0-201-65788-0.
[M92] Catherine C. McGeoch. A Guide to Experimental Algorithmics, 272 pages, 2012, Cambridge University Press. ISBN: 9781107001732.

Memory layout

[FLPR12] Matteo Frigo, Charles E. Leiserson, Harald Prokop, Sridhar Ramachandran. Cache-Oblivious Algorithms. ACM Transactions on Algorithms, 8(1), Article No. 4, 2012.
[BFJ02] Gerth Stølting Brodal, Rolf Fagerberg, Riko Jacob. Cache-Oblivious Search Trees via Binary Trees of Small Height. In Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 39-48, 2002.
[JM13] Tomasz Jurkiewicz, Kurt Mehlhorn. The cost of address translation, In Proc. 15th Annual Meeting on Algorithm Engineering & Experiments (ALENEX), 148-162, 2013.

Branch mispredictions

[KS06] Kanela Kaligosi, Peter Sanders. How Branch Mispredictions Affect Quicksort. Proc. 14th Annual European Symposium on Algorithms (ESA), Lecture Notes in Computer Science, Volume 4168, 780-791, Springer, 2006.
[BM05] Gerth Stølting Brodal, Gabriel Moruz. Tradeoffs Between Branch Mispredictions and Comparisons for Sorting Algorithms. Proc. 9th International Workshop on Algorithms and Data Structures (WADS), Lecture Notes in Computer Science, Volume 3608, 385-395, Springer, 2005.
[BFM05] Gerth Stølting Brodal, Rolf Fagerberg, Gabriel Moruz. On the Adaptiveness of Quicksort. In Proc. 7th Workshop on Algorithm Engineering and Experiments (ALENEX), pages 130-140, 2005.
[BM06] Gerth Stølting Brodal, Gabriel Moruz. Skewed Binary Search Trees. In Proc. 14th Annual European Symposium on Algorithms (ESA), Lecture Notes in Computer Science, Volume 4168, 708-719, Springer, 2006.


[MMNS11] R.M. McConnell, K. Mehlhorn, S. Näher, P. Schweitzer. Certifying algorithms. Computer Science Review, 5(2), 119-161, 2011.

Special instructions

[SW04] P. Sanders and S. Winkel. Super Scalar Sample Sort. 12th Annual European Symposium (ESA), Lecture Notes in Computer Science, Volume 3221, 784-796, 2004.
[SGL09] Benjamin Schlegel, Rainer Gemulla, Wolfgang Lehner. k-ary search on modern processors. Proceedings of the Fifth International Workshop on Data Management on New Hardware (DaMoN), Pages 52-60, 2009.
[K+10] C. Kim, J. Chhugani, N. Satish, E. Sedlar, A.D. Nguyen, T. Kaldewey, V.W. Lee, S.A. Brandt, and P. Dubey. FAST: fast architecture sensitive tree search on modern CPUs and GPUs. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data (SIGMOD), Pages 339-350, 2010.


[FM10] Samuel H. Fuller and Lynette I. Millett. The Future of Computing Performance: Game Over or Next Level? The National Academies Press, 2010.
[AMD11] AMD Showcases World's Fastest CPU. 2011.
[MHSM09] Daniel Molka, Daniel Hackenberg, Robert Schone, and Mathias S. Muller. Memory performance and cache coherency effects on an intel nehalem multiprocessor system. 18th International Conference on Parallel Architectures and Compilation Techniques (PACT), 251-270, IEEE 2009.
[AKN12] Martina-Cezara Albutiu, Alfons Kemper, Thomas Neumann. Massively parallel sort-merge joins in main memory multi-core database systems. Proceedings of the VLDB Endowment, 5(10), 1064-1075, ACM 2012.
[CRSY10] John Cieslewicz, Kenneth A. Ross, Kyoho Satsumi, and Yang Ye. Automatic contention detection and amelioration for data-intensive operations. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, 483-494, ACM2010.
[YRV11] Yang Ye, Kenneth A. Ross, and Norases Vesdapunt. Scalable aggregation on multicore processors. In Proceedings of the Seventh International Workshop on Data Management on New Hardware (DaMoN), 1-9, ACM 2011.


3 group projects (2-3 persons per group) and individual oral exam, 7-point grading scale, internal examiner.


The oral exam takes 20 minutes, including evaluation. The exam will be a discussion of the reports on the projects and the material covered in class (see "course plan" above). The final grade will be based on the project reports and the oral examination.

The oral exam will take place April 8-9, 2013 in Nygaard 298.