ALCOMFT-TR-02-121

ALCOM-FT
 

G. Casas-Garriga
A Multi-Step Pruning Strategy to Remove Uninteresting Association Rules
Barcelona. Work packages 1 and 4. May 2002.
Abstract: Association rule mining is a relevant task in Knowledge Discovery in Databases that aims at finding the group of items that most frequently appear with other groups of items in the database. Nowadays, many algorithmic strategies have been developed to discover all those association rules satisfying a minimum support constraint. However, the number of patterns extracted by these algorithms is often very large, leading to a rule quality problem, that is, out of the overwhelming number of discovered association rules only a small part of them are interesting.

This rule quality problem is currently addressed by specifying certain interestingness measures that are used as a single-step prune. The main problem of this solution is that not all measures are equally good at capturing the usefulness of a rule, and thus, after this pruning phase, some useless rules can still remain or also some useful rules can be removed permanently.

In this paper we investigate which properties the interestingness measures should keep in the last pruning phase. Later, we show how this prune turns out to be incomplete when only a single measure is chosen, and we introduce the proposal of a multiple-step pruning strategy that removes all useless rules and summarizes just the interesting ones. Specifically, we develop the working hypothesis that the Pearson coefficient is appropiate to guide the comparisons. The idea behind the strategy is keeping, at each pruning step, the good properties of the chosen measure, and avoiding, at the same time, its drawbacks. Finally, several experiments have been performed to show the viability and advantages of our method.

Postscript file: ALCOMFT-TR-02-121.ps.gz (156 kb).

System maintainer Gerth Stølting Brodal <gerth@cs.au.dk>