Abstract

The thesis explores an issue characterized as the problem of answering diverse and constrained top-k combination queries. Essentially a database-like diverse top-k query for sets of records also called combinations, and where the result is defined in part by constraints and in part by a score function.

A potentially emerging problem domain of combination queries is outlined in a minor survey of 8 related papers, and a general terminology for the domain is presented. A discussion on resultset quality in a decision making context results in three quality criteria being defined, a measure indicative of how dissimilar combinations are wrt. to individual records, a measure for how dissimilar the members are in relation to attribute values, and finally a measure for the average record value per combination.

As the problem turns out to be NP-hard by reduction to the Knapsack

Decision Problem, and adequate polynomial-time restrictions are not found, a second part of the thesis focuses primarily on result-set diversity. Randomization, and informed search are two of the main components in the following algorithmic discussion. Though three algorithms are presented none are found to be unconditionally preferable to the others as each have their advantages and disadvantages.

Various discussed algorithmic characteristics are finally illustrated by experimental evaluation.