Data mining automatically extracts interesting patterns from data to provide insight into large data volumes. Clustering is a data mining task that groups data based on mutual similarity to find prevailing structures in the data. Density-based clustering defines clusters as dense areas in feature space separated by sparsely populated areas. It is known to successfully identify clusters of arbitrary shapes even in noisy data.
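As an illustration of the density-based notion of a cluster, the following is a minimal sketch in the style of DBSCAN; it is not necessarily the model presented in the talk. The function names `region_query` and `density_cluster` and the parameters `eps` (neighborhood radius) and `min_pts` (density threshold) are assumptions chosen for this example.

```python
from math import dist

NOISE = -1  # label for objects in sparsely populated areas


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def density_cluster(points, eps, min_pts):
    """Assign a cluster id to each point; NOISE marks points outside any dense area."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] lies in a dense area: start a new cluster and expand it
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a dense point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is dense too, so keep expanding
        cluster_id += 1
    return labels
```

Because a cluster grows by chaining dense neighborhoods, its shape is not restricted to spheres or other fixed forms, and isolated objects are labeled as noise instead of being forced into a cluster.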
Today, we face increasingly high-dimensional data, i.e., data objects described by many attributes. For clustering, the "curse of dimensionality" means that traditional clustering methods fail to identify meaningful clusters in high-dimensional spaces. In little more than a decade, the research field of subspace clustering has established methods for identifying clusters in subsets of the attributes of such high-dimensional spaces. Because the number of possible attribute subsets is exponential in the number of attributes, efficient algorithms are crucial. The talk presents models and algorithms for density-based subspace clustering.
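To make the exponential growth concrete: a data set with d attributes has 2^d - 1 non-empty attribute subsets, each a candidate subspace. The short sketch below counts and enumerates them; the helper names `count_subspaces` and `enumerate_subspaces` are hypothetical and serve only this illustration.

```python
from itertools import combinations


def count_subspaces(d):
    """Number of non-empty subsets of d attributes."""
    return 2 ** d - 1


def enumerate_subspaces(attributes):
    """Yield every non-empty attribute subset, ordered by subset size."""
    for k in range(1, len(attributes) + 1):
        yield from combinations(attributes, k)


if __name__ == "__main__":
    for d in (3, 10, 20, 50):
        print(d, count_subspaces(d))
```

Even at 50 attributes the count exceeds 10^15, so exhaustively clustering every subspace is infeasible, which is why efficient search strategies matter.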