ALCOMFT-TR-03-199
Gemma Casas-Garriga
Statistical Strategies for Pruning All the Uninteresting Association Rules
Barcelona.
Work packages 1 and 4.
December 2003.
Abstract: We propose a general framework
for formally describing the problem of capturing the intensity
of implication of association rules through statistical metrics.
Within this framework we present properties that influence the
interestingness of a rule,
analyze the conditions under which a measure performs a perfect pruning in
a single step, and define a final proper order for sorting the surviving
rules. We discuss
why none of the currently employed measures can capture objective interestingness
on its own, and why only a combination of some of them, applied in a multi-step
fashion, can be reliable. In contrast, we propose a new, simple modification
of the Pearson coefficient that meets all the necessary requirements. We
statistically infer an appropriate cut-off threshold for this new metric
by empirically describing its distribution function through simulation.
Final experiments demonstrate the effectiveness of our proposal.
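As a rough illustration of the two ingredients the abstract names, the sketch below computes the standard (unmodified) Pearson phi coefficient of a rule A -> B from transaction counts, and estimates a cut-off by simulating the coefficient under independence of A and B. The abstract does not specify the proposed modification or the simulation setup, so everything here is an assumption: the function names phi_coefficient and simulated_cutoff, the parameters p_a, p_b, trials and quantile, and the choice of an independence null model are all hypothetical.

```python
import math
import random


def phi_coefficient(n, n_a, n_b, n_ab):
    """Standard Pearson (phi) coefficient for a rule A -> B.

    n: total transactions; n_a: transactions containing A;
    n_b: transactions containing B; n_ab: transactions containing both.
    This is the textbook coefficient, NOT the report's modified version.
    """
    denom = math.sqrt(n_a * (n - n_a) * n_b * (n - n_b))
    if denom == 0:
        return 0.0  # degenerate case: A or B occurs in no or all transactions
    return (n * n_ab - n_a * n_b) / denom


def simulated_cutoff(n, p_a, p_b, trials=2000, quantile=0.95, seed=0):
    """Empirical quantile of phi when A and B are drawn independently.

    Assumed null model (hypothetical): each of the n transactions contains A
    with probability p_a and B with probability p_b, independently.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        n_a = n_b = n_ab = 0
        for _ in range(n):
            a = rng.random() < p_a
            b = rng.random() < p_b
            n_a += a
            n_b += b
            n_ab += a and b
        values.append(phi_coefficient(n, n_a, n_b, n_ab))
    values.sort()
    # Index of the requested empirical quantile, clamped to the last element.
    return values[min(int(quantile * trials), trials - 1)]


if __name__ == "__main__":
    cutoff = simulated_cutoff(n=500, p_a=0.3, p_b=0.4)
    observed = phi_coefficient(n=500, n_a=150, n_b=200, n_ab=90)
    print("cut-off:", cutoff, "observed:", observed, "keep rule:", observed > cutoff)
```

Under this sketch, a rule would be kept only when its observed coefficient exceeds the simulated cut-off; the report's own metric and threshold may differ.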
Postscript file: ALCOMFT-TR-03-199.ps.gz (134 kb).