Aarhus University Seal

Algorithms, Data Structures and Foundations of Machine Learning


Focus points for the Algorithms, Data Structures and Foundations of Machine Learning group

The group works on a wide range of fundamental problems within algorithms, data structures and theory of machine learning. Particular focus areas include clustering, dimensionality reduction, learning theory, computational geometry and data structures. Common to all these, is that we strive to both design efficient algorithmic solutions, as well as complement them with theoretical lower bounds establishing their optimality.

The group also has a focus on algorithm engineering, and from 2007 to 2017, the group hosted a Center of Excellence (Center for Massive Data Algorithmics, or MADALGO) from the Danish National Research Foundation.


Research Directions

The research group consists of four faculty members as well as students and post docs. The faculty members are:

  • Peyman Afshani, working primarily on Computational Geometry, Data Structures, Lower Bounds and I/O-Algorithms 
  • Gerth Stølting Brodal, working primarily on Data Structures, I/O- and Cache-Oblivious Algorithms, Computational Geometry and Algorithm Engineering 
  • Kasper Green Larsen, working primarily on Data Structures, Lower Bounds, Dimensionality Reduction and Generalization Bounds 
  • Chris Schwiegelshohn, working primarily on Approximation Algorithms, Coresets, Clustering and Dimensionality Reduction. 

In the following, we have highlighted some of these research directions.
 

Computational Geometry

The area of computational geometry builds the theoretical and practical foundations of dealing with geometric data. There are diverse sources for geometric data (GIS, computer graphics, and databases among others) and thus, the discipline has broad applications. The research in the group focuses on particular aspects of computational geometry, namely, geometric data structures. It deals with fundamental questions such as how to process, store, or query geometric data, and what are provably the best possible solutions.

Data Structures

The different ways of organization data can highly influence the efficiency of algorithms accessing the data. The goal of data structure design is to develop structures that support efficient queries and updates while using limited space, i.e., memory. Unfortunately, many data structure problems have intrinsic lower bounds preventing, e.g., both efficient queries and efficient updates, i.e., there are trade-offs between the different operations. Such trade-offs can be between both query time and update time or between query time and the space used to store the data. Central to the group's research in data structures, is to identify such trade-offs by designing data structures achieving good performance, and complementing these upper bounds with formal arguments proving lower bounds for what is the best one can aim for – and ideally achieving matching upper and lower bounds.

Clustering

Clustering is one of the most important methods in unsupervised data analysis. The aim is to group a set of points such that points within the same cluster are similar, and points in different clusters are dissimilar. Computing high quality clusterings is generally considered to be hard for various reasons: the optimization problem is intractable, the algorithms are not scalable and the data sets are large and high dimensional.

To address this, we are designing a variety of algorithms suited to address clustering problems in the modern era. We focus both on the task of recovering or approximating good clusterings, as well as designing algorithms uniquely suited to large data sets, often based on compressing techniques such as coresets or dimension reduction techniques such as random projections.

Dimensionality Reduction

When processing massive data sets, an important technique for speeding up algorithms is dimensionality reduction. Put simply, such techniques allow one to greatly compress data points while approximately preserving the structure of the data set. This leads to savings in both memory consumption and running time. Understanding the possible tradeoffs between compression ratio and approximation is a strong focus point of the group's research.


In Memoriam - Professor Lars Arge

Professor Lars Arge passed away on December 23rd, 2020, at the age of just 53. 

His passing is a great loss to the Department of Computer Science, Aarhus University, as well as to us personally as colleagues who have benefitted from Lars’s excellent and internationally renowned research contributions within Algorithms and Data Structures, as well as his many ideas on how to build a world-class department

Colleagues and collaborators have shared their memories. >>>>