Data Mining

Data mining is a discipline at the intersection of machine learning and databases/data management. Data mining research concerns the analysis of typically large amounts of data to extract interesting patterns that can lead to new knowledge and insights. In contrast to much of machine learning, data mining primarily targets unsupervised learning problems, where labeled training data is not available or not required. It focuses on the modeling, retrieval, and discovery of non-trivial patterns in data. A core tenet of data mining is to advance the efficiency of machine learning methods by leveraging smart data structures and algorithms. Data mining applies to a variety of data types, with particular emphasis on high-dimensional data, sequences, graphs, relational tables, heterogeneous data, and text.

Within data mining, we distinguish several key subfields:

Graph Mining: This area focuses on the analysis of networks and graph-structured data. Typical problems include the discovery of frequent or dense subgraphs, efficient computation of centrality measures, information propagation, influence maximization, node and edge classification, and multidimensional embeddings.
Sequence Mining: This subfield addresses pattern discovery and anomaly detection in sequential data, such as time series and data streams.
Text Mining: This area is concerned with extracting meaningful patterns from textual data, including tasks such as sentiment analysis, topic modeling, entity recognition, and information extraction.
Core Data Mining Algorithms: These include clustering, anomaly and outlier detection, dimensionality reduction, rule mining, and probabilistic modeling.
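
To make the unsupervised setting concrete, the following is a minimal sketch of one classic outlier detection idea: scoring each point by the distance to its k-th nearest neighbor. It is an illustration only, not the method of any publication listed here. The function name knn_outlier_scores, the parameter k, and the toy data are assumptions chosen for this example, and the brute-force search is quadratic in the number of points, so it would not scale to large data volumes without an index structure.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor.

    Larger scores suggest more isolated points. No labels are used.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point (brute force, O(n^2) overall).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical toy data: a tight cluster plus one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_outlier_scores(points, k=2)

# Index of the point with the highest outlier score.
most_outlying = max(range(len(points)), key=scores.__getitem__)
```

On this toy input, the distant point receives a much larger score than the four clustered points. Replacing the brute-force neighbor search with a spatial index is the kind of data-structure choice that makes such methods efficient at scale.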

In our group, we have strong expertise in graph mining (including work on semantically rich graphs) and in clustering and outlier detection methods, with partial involvement in text and sequence mining. We work both at the conceptual level, introducing tasks and solutions that capture novel patterns or handle new data types or data challenges, and at the algorithmic level, devising efficient and scalable solutions that support the analysis of large data volumes.

Social impact

Data mining can generate new knowledge and support the exploration of new data. Given the growing interest in studying social networks and in applying learning methods in science, data mining is a powerful tool for meeting these information needs. Part of our work also concerns knowledge graphs, which serve as explicit information repositories that feed into other learning tasks.

Key publications

Campos, G.O., Zimek, A., Sander, J., Campello, R.J., Micenková, B., Schubert, E., Assent, I. and Houle, M.E.
On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study (Data Mining and Knowledge Discovery'16)

Müller, E., Günnemann, S., Assent, I. and Seidl, T.
Evaluating clustering in subspace projections of high dimensional data (VLDB Endowment'09)

Tsitsulin, A., Mottin, D., Karras, P., Bronstein, A. and Müller, E.
NetLSD: hearing the shape of a graph (SIGKDD'18)

Safavi, T., Belth, C., Faber, L., Mottin, D., Müller, E. and Koutra, D.
Personalized knowledge graph summarization: From the cloud to your pocket (ICDM'19)

Galhotra, S., Arora, A. and Roy, S.
Holistic influence maximization: Combining scalability and efficiency with opinion-aware models (SIGMOD'16)