ALCOMFT-TR-02-6

ALCOM-FT
 

J Baixeries and G Casas-Garriga
Sampling Strategies for Finding Frequent Sets
Barcelona. Work package 1. April 2002.
Abstract: There are many deterministic algorithms that perform the task of finding frequent itemsets in a database. All of them depend somehow on the size of the database, and when this size increases, the computational cost of the algorithms becomes higher.

Sampling strategies are a way to overcome this problem. The classical probabilistic approaches for finding frequent sets have been using Chernoff bounds and equivalents to calculate the size of the sample, although the result always overestimates. One of the proposed alternatives is on-line sampling, which adaptatively calculates the size of the sample for each itemset. But this technique has been proved difficult to implement using the current algorithmic schemes for finding frequent itemsets.

In this paper we present different implementations of the on-line scheme using our previous best-first proposal, which meets the requirements needed by online sampling. We compare these proposals with the batch one. Our final approach shows a stable behaviour in several different situations.

Postscript file: ALCOMFT-TR-02-6.ps.gz (63 kb).

System maintainer Gerth Stølting Brodal <gerth@cs.au.dk>