

"Visual Analytics of Data Errors" (VADE) looks for better ways to identify and analyze data quality issues through the use of mixed-initiative analytics.

A core problem in data-driven science and fact-based decision making is poor data quality. The standard response is to fight it with automated error detection that rids the data of its errors. The VADE project proposes a different approach: instead of being fought, erroneous data should be systematically mined and analyzed. The project embraces erroneous data as an important source of information about underlying problems in the data management pipeline, information that should be collected and reported. The fundamental challenge is coping with the unspecific nature of errors: they do not follow a given schema or definition and, in most cases, cannot be found through a database query or standard statistics. This requires new ways of dealing with errors that go beyond what pure statistics and machine learning can provide.

To address this challenge, VADE follows the principle of mixed-initiative analytics, which combines the computational power of modern IT with the knowledge of domain experts. This allows the human user to gauge how erroneous the data are and to parametrize how those data are included in the analysis of errors. The mixed-initiative approach relies heavily on data visualization, which provides the interactive interface between the computer and the analyst: the computer adds computational results to the visualization, while the analyst uses the visualization to trigger, steer, and configure computations.

VADE advocates treating erroneous data as helpful indicators of problems in the data management practices underlying the data. By showing that this is not only possible but also beneficial, VADE will pave the way for establishing errors as a first-class data property with dedicated computation and visualization methods.

Funding source: Aarhus University Research Fund