ALCOMFT-TR-02-41
|

|
Stefan Burkhardt and Juha K\"arkk\"ainen
One-gapped q-Gram Filters for Levenshtein Distance
MPI.
Work packages 1, 4 and 5.
May 2002.
Abstract: We have recently shown that q-gram filters based on gapped q-grams
instead of the usual contiguous q-grams can provide orders of
magnitude faster and/or more efficient filtering for the Hamming
distance. In this paper, we extend the results for the Levenshtein
distance, which is more problematic for gapped q-grams because an
insertion or deletion in a gap affects a q-gram while a
replacement does not. To keep this effect under control, we
concentrate on gapped q-grams with just one gap. We demostrate with
experiments that the resulting filters provide a significant
improvement over the contiguous q-gram filters. We also develop new
techniques for dealing with complex q-gram filters.
Postscript file: ALCOMFT-TR-02-41.ps.gz (83 kb).
System maintainer Gerth Stølting Brodal <gerth@cs.au.dk>