Data mining is a discipline at the intersection of machine learning and databases/data management. Data mining research concerns the analysis of typically large amounts of data to extract interesting patterns that can lead to new knowledge and insights. In contrast to machine learning more broadly, data mining primarily targets unsupervised learning problems, where labeled training data is not available or not required. It focuses on the modeling, retrieval, and discovery of non-trivial patterns in data. A core tenet of data mining is to advance the efficiency of machine learning methods by leveraging smart data structures and algorithms. Data mining applies to a variety of data types, with particular emphasis on high-dimensional data, sequences, graphs, relational tables, heterogeneous data, and text.
Within data mining, we distinguish several key subfields:
In our group, we have strong expertise in graph mining (including work on semantically rich graphs) and in clustering and outlier detection methods, with partial involvement in text and sequence mining. We work both at the conceptual level, introducing tasks and solutions that capture novel patterns or handle new data types and data challenges, and at the algorithmic level, devising efficient and scalable solutions that support the analysis of large data volumes.
Data mining can generate new knowledge and support the exploration of new data. Given the growing interest in studying social networks and in learning methods in science, data mining is a powerful tool for meeting these information needs. In particular, part of our work concerns knowledge graphs, which serve as explicit information repositories that feed into other learning tasks.
Campos, G.O., Zimek, A., Sander, J., Campello, R.J., Micenková, B., Schubert, E., Assent, I. and Houle, M.E.
On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study (Data Mining and Knowledge Discovery'16)
Müller, E., Günnemann, S., Assent, I. and Seidl, T.
Evaluating clustering in subspace projections of high dimensional data (VLDB Endowment'09)
Tsitsulin, A., Mottin, D., Karras, P., Bronstein, A. and Müller, E.
NetLSD: Hearing the shape of a graph (SIGKDD'18)
Safavi, T., Belth, C., Faber, L., Mottin, D., Müller, E. and Koutra, D.
Personalized knowledge graph summarization: From the cloud to your pocket (ICDM'19)
Galhotra, S., Arora, A. and Roy, S.
Holistic influence maximization: Combining scalability and efficiency with opinion-aware models (SIGMOD'16)