Aarhus University Seal / Aarhus Universitets segl

Special talk by Chris Schwiegelshohn on Dimension Reduction and Compression for Clustering

2020.01.27 | Søs Küster Markussen

Date Mon 10 Feb
Time 10:00 11:00
Location 5342-333 Åbogade 34, 8200 Aarhus N


Dimension Reduction and Compression for Clustering


One of the biggest challenges of modern algorithm design is the increasing volume of data. In the past, polynomial time algorithms were generally considered to be efficient. Nowadays, even a quadratic running time may considered infeasible. We address these issues by compressing the data sets in a preprocessing step before running a conventional algorithm. Popular paradigms for such compressions are coresets, sketches, and sparsifiers. In this talk, we will present current developments for large-scale k-means clustering. The k-means problem consists of finding a set of k centers such that the sum of squared Euclidean distances of every point to its closest center is minimized. As with many problems, k-means suffers from the curse of dimensionality in high dimensions. Dimension reduction for k-means provides some recourse, with previous bounds either achieving O(log n) dimensions via the Johnson Lindenstrauss lemma, or O(k) via Principal Component Analysis. Recently, these bounds were improved to O(log k). We will survey these results and outline future directions.

CS frontpage, Featured, Public/media