ALCOMFT-TR-01-39

ALCOM-FT
 

J. Baixeries, G. Casas-Garriga and J.L. Balcázar
A Faster Algorithm for Finding Frequent Sets
Barcelona. Work package 1. April 2001.
Abstract: The association rule discovery problem consists of identifying frequent itemsets in a database, and, then, forming conditional implication rules from them. For this association task, we introduce a new algorithm that reduces substantially both the number of processed transactions and the total computing time. Our algorithmic proposal relies on the philosophy of a current state-of-the-art algorithm, Dynamic Itemset Counting (DIC), which carries out an advanced incorporation of candidates in order to reduce the number of passes through the data. However, this advanced incorporation of candidates only takes place at some previously specified points during the run, which forces DIC to require an extra parameter that balances the number of processed transactions with a certain running time overhead. We propose a way to incorporate candidates in a on-line fashion by maintaining explicitly the so-called negative border of the current maximal frequent sets at each step. The resulting algorithm, Ready-and-Go, is competitive with previous algorithmic proposals, beating them in terms of number of processed transactions and, most importantly, in actual running time. Furthermore, Ready-and-Go turns out to fit perfectly into on-line sampling techniques, which make our algorithm still more efficient. Additionally, in order to show its flexibility, we have adapted it to mine generalized association rules and episodes in sequences, obtaining good results.
Postscript file: ALCOMFT-TR-01-39.ps.gz (78 kb).

System maintainer Gerth Stølting Brodal <gerth@cs.au.dk>