Special seminar by Emmanuel Müller, Karlsruhe Institute of Technology, Germany

Knowledge Discovery: Assisting Humans in Understanding Complex Databases

Info about event

Time

Tuesday 10 March 2015, 14:15 - 15:00

Location

5342-333 Ada

Organizer

Department of Computer Science

In my opinion, knowledge discovery, as part of many scientific and industrial applications, does not end with the execution of algorithms. Data mining algorithms discover unknown, novel, and unexpected patterns, and they should be used to assist humans in their daily decision making. To enable this, I see two fundamental research challenges in data management and knowledge discovery: (1) heterogeneous data sources and (2) results of data mining algorithms that are hard for human users to comprehend.

These challenges can be illustrated with an example: a patient surveillance system provides multivariate measurements for each patient. However, too much information is available for the automatic detection of regular patterns and unexpected events, and some measurements are irrelevant. Algorithms should automatically decide on the relevance of information and scale with the complexity of the data sources. Further, in medical diagnosis it is essential that health professionals verify all events. Therefore, algorithms should not only detect patterns but also be able to describe them, and hence assist health professionals in their decisions.

In my talk I will address theoretical challenges in correlation analysis, database schema extraction, and pattern mining, as well as practical challenges in the efficient computation of these models in large database systems. In particular, I will present novel techniques for heterogeneous data spaces that are applicable to a variety of heterogeneous data types and overcome the information loss of traditional techniques on homogeneous data sources. As an exemplary technique, I will illustrate the selection of relevant attribute combinations in high-dimensional databases and give an outlook on correlation analysis in multivariate data streams and homophile structures in attributed graphs.