Focus points for the Algorithms and Data Structures group

The group works on improved algorithms and data structures for fundamental problems in classical computational models, as well as in newer models such as the cache-oblivious model that takes the hierarchical memory of modern machines into account.

It especially works on so-called I/O-efficient algorithms, that is, algorithms designed to efficiently process truly massive datasets that must reside on slow secondary storage devices. It also has a focus on algorithm engineering. In 2007 the group was awarded a Center of Excellence (Center for Massive Data Algorithms, or MADALGO) by the Danish National Research Foundation.

Full speed for the blocks

The boy wonders of algorithmics can be found on the third floor of the Nygaard Building and go by the names of Lars Arge and Gerth Stølting Brodal.

Lars Arge and Gerth Stølting Brodal use I/O-efficient algorithms to make computers perform calculations at unprecedented speeds. Quite unashamedly, they call themselves world champions in this discipline.

The same goes for their research group, Algorithms & Data Structures, which is also closely associated with Peter Bro Miltersen and Gudmund Skovbjerg Frandsen’s research into complexity theory and game theory.

What makes Professor Lars Arge and Reader Gerth Stølting Brodal leaders in their field is their development and use of mathematical models of computers in which the machines have several layers of memory that become slower the further they are from the processor.

Limited memory
Many algorithms are based on the RAM model (where RAM stands for Random Access Machine), which dates back decades and is still very widespread.
One disadvantage of the RAM model is that it assumes an infinitely large, uniform working memory, so the processor is charged a single unit of time to retrieve any piece of data from memory and perform a computation on it.

However, real computers are typically equipped with four layers of memory: L1 cache, L2 cache, the working memory (RAM, as in Random Access Memory) and finally one or more hard discs.
The cache and the RAM work a million times faster than the hard disc. Since the cache is small and the RAM very much larger, the uniform-access assumption behind the Random Access Machine is often completely wrong.

Smart sorting
“We are absolutely brilliant at designing algorithms that take the memory hierarchy into account. Our I/O (Input/Output) algorithms attempt to minimise the number of reads and writes on the hard disc. We do this by sorting data cleverly on the disc, so that the computer only retrieves blocks of data it will actually use in the current calculation,” explains Lars Arge.
You see, the computer does not normally retrieve only the specific data requested, but whole blocks of data of 8, 16, 32 or 64 kilobytes at a time. You therefore typically get data that will never be used; it takes up space, and time is wasted retrieving it.
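The saving from block-aware layouts can be illustrated with a small simulation (a hypothetical sketch, not the group's code: the one-block cache, `BLOCK_SIZE` and `count_block_transfers` are illustrative assumptions):

```python
import random

BLOCK_SIZE = 8  # items per disc block; real blocks hold 8-64 KB of data

def count_block_transfers(access_order, block_size=BLOCK_SIZE):
    """Count the block fetches a one-block cache needs to serve
    the given sequence of item indices."""
    transfers = 0
    cached = None  # index of the block currently held in memory
    for i in access_order:
        block = i // block_size
        if block != cached:
            transfers += 1  # cache miss: fetch a whole block from disc
            cached = block
    return transfers

random.seed(1)  # deterministic shuffle for the demo
n = 1024
sequential = list(range(n))               # data laid out to match access order
scattered = random.sample(sequential, n)  # same accesses, unsorted layout

seq_cost = count_block_transfers(sequential)  # 1024 / 8 = 128 transfers
rnd_cost = count_block_transfers(scattered)   # typically close to 1024
```

Sorting the data so that accesses become sequential cuts the disc traffic from roughly one transfer per item to one transfer per block, which is precisely the saving I/O-efficient algorithms aim for.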

Effective on all platforms
A new hot area for the group’s research is algorithms that minimise the number of reads and writes at all layers of the memory hierarchy. The goal is to reach the point where the same algorithm is effective on all platforms – from the supercomputer to the mobile phone.
“These types of algorithms are called ‘cache-oblivious’, but for the time being the results are at a theoretical level. Experiments will show whether the theory can be used in practice at all. The jury is still out, but we are hopeful,” says Gerth Stølting Brodal.
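The cache-oblivious idea can be sketched with the classic recursive matrix transpose (a minimal textbook-style illustration, not code from the group): the recursion mentions no block or cache size at all, yet as soon as a subproblem fits in some cache level, all of its work stays within that level.

```python
def co_transpose(src, dst, r0, r1, c0, c1):
    """Write the transpose of src[r0:r1, c0:c1] into dst, so that
    dst[c][r] == src[r][c]. No block size appears anywhere: the
    divide-and-conquer adapts to every cache level automatically."""
    if r1 - r0 <= 1 and c1 - c0 <= 1:
        if r1 > r0 and c1 > c0:        # single cell: copy it over
            dst[c0][r0] = src[r0][c0]
    elif r1 - r0 >= c1 - c0:           # always split the longer dimension
        mid = (r0 + r1) // 2
        co_transpose(src, dst, r0, mid, c0, c1)
        co_transpose(src, dst, mid, r1, c0, c1)
    else:
        mid = (c0 + c1) // 2
        co_transpose(src, dst, r0, r1, c0, mid)
        co_transpose(src, dst, r0, r1, mid, c1)

matrix = [[1, 2, 3],
          [4, 5, 6]]
result = [[0] * 2 for _ in range(3)]
co_transpose(matrix, result, 0, 2, 0, 3)
# result is now [[1, 4], [2, 5], [3, 6]]
```

Because the subproblems shrink geometrically, the analysis holds for every cache size simultaneously – the property that makes such algorithms candidates for running well on all platforms.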

Work on developing new models to replace the Random Access Machine will continue. New types of memory, such as flash memory, keep appearing and demand new models and algorithms.

Mathematical theories and models are not everything. The duo are in the vanguard of algorithm engineering, where algorithms are tested with real data on real hardware. Gerth Stølting Brodal thus collaborates with the Bioinformatics Centre on algorithms for genome analysis, while Lars Arge applies his algorithms to terrain modelling and the monitoring of biodiversity.
