ALCOMFT-TR-01-39
|

|
J. Baixeries, G. Casas-Garriga and J.L. Balcázar
A Faster Algorithm for Finding Frequent Sets
Barcelona.
Work package 1.
April 2001.
Abstract: The association rule discovery problem consists of identifying
frequent itemsets in a database, and, then, forming conditional
implication rules from them. For this association task,
we introduce a new algorithm that reduces substantially both
the number of processed transactions and the total computing time.
Our algorithmic proposal relies on the philosophy of a current
state-of-the-art algorithm, Dynamic Itemset Counting (DIC),
which carries out an advanced incorporation of candidates
in order to reduce the number of passes through the data.
However, this advanced incorporation
of candidates only takes place at some previously specified
points during the run, which forces DIC to require an
extra parameter that balances the number of processed
transactions with a certain running time overhead.
We propose a way to incorporate candidates in a on-line fashion by
maintaining explicitly the so-called negative border of the current
maximal frequent sets at each step. The resulting algorithm,
Ready-and-Go, is competitive with previous
algorithmic proposals, beating them in terms of number of
processed transactions and, most importantly, in actual running
time. Furthermore, Ready-and-Go turns out to fit perfectly into
on-line sampling techniques, which make our algorithm still more efficient.
Additionally, in order to show its flexibility, we have
adapted it to mine generalized association rules and episodes
in sequences, obtaining good results.
Postscript file: ALCOMFT-TR-01-39.ps.gz (78 kb).
System maintainer Gerth Stølting Brodal <gerth@cs.au.dk>