ALCOMFT-TR-02-41

ALCOM-FT
 

Stefan Burkhardt and Juha K\"arkk\"ainen
One-gapped q-Gram Filters for Levenshtein Distance
MPI. Work packages 1, 4 and 5. May 2002.
Abstract: We have recently shown that q-gram filters based on gapped q-grams instead of the usual contiguous q-grams can provide orders of magnitude faster and/or more efficient filtering for the Hamming distance. In this paper, we extend the results for the Levenshtein distance, which is more problematic for gapped q-grams because an insertion or deletion in a gap affects a q-gram while a replacement does not. To keep this effect under control, we concentrate on gapped q-grams with just one gap. We demostrate with experiments that the resulting filters provide a significant improvement over the contiguous q-gram filters. We also develop new techniques for dealing with complex q-gram filters.
Postscript file: ALCOMFT-TR-02-41.ps.gz (83 kb).

System maintainer Gerth Stølting Brodal <gerth@cs.au.dk>