ALCOMFT-TR-03-98
Mireille Régnier and Alain Denise
Rare Events and Conditional Events on Random Strings
INRIA.
Work packages 1 and 4.
November 2003.
Abstract: Some strings, the texts, are assumed to be randomly generated
according to a probability model that is either a Bernoulli model or a
Markov model. A rare event is the over- or under-representation of a
word or a set of words. The aim of this paper is twofold. First, given a
single word, we study the tail distribution of its number of
occurrences and derive sharp large deviation estimates. Second, assuming
that a given word is overrepresented, we study the conditional
distribution of the number of occurrences of a second word and derive
formulae for its expectation and variance. In both cases, the formulae
are precise and can be computed efficiently. These results have
applications in computational biology, where a genome is viewed as a
text.
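To make the quantities in the abstract concrete, here is a minimal sketch under a Bernoulli model only. The Markov case is not covered, and this is not the paper's large deviation or conditional formulae. It counts possibly overlapping occurrences. It computes the classical exact mean and variance of the count N_w in a text of length n. Occurrences at distance d < |w| interact only when d is a period of w. Two more helpers evaluate the tail probability P(N_w >= k) and the conditional mean E[N_w2 | N_w1 >= k] by brute-force enumeration of all texts. All function names are hypothetical. Enumeration is only feasible for very small n, and it serves here as a reference check rather than an efficient method.

```python
from itertools import product


def word_prob(w, probs):
    """P(w) under a Bernoulli model: letters are i.i.d. with probabilities `probs`."""
    p = 1.0
    for c in w:
        p *= probs[c]
    return p


def count_occurrences(text, w):
    """Number of possibly overlapping occurrences of w in text."""
    m = len(w)
    return sum(1 for i in range(len(text) - m + 1) if text[i:i + m] == w)


def exact_mean_var(w, n, probs):
    """Exact mean and variance of the occurrence count of w in a Bernoulli text of length n."""
    m = len(w)
    positions = n - m + 1  # possible starting positions of an occurrence
    if positions <= 0:
        return 0.0, 0.0
    p = word_prob(w, probs)
    mean = positions * p
    var = positions * p * (1 - p)
    # Occurrences starting d < m apart overlap: they can co-occur only if d is a
    # period of w, and then the second one needs just the last d letters of w.
    for d in range(1, min(m, positions)):
        is_period = w[d:] == w[:m - d]
        joint = p * word_prob(w[m - d:], probs) if is_period else 0.0
        var += 2 * (positions - d) * (joint - p * p)
    # Occurrences at distance >= m involve disjoint letters, hence are independent.
    return mean, var


def enumerate_texts(n, probs):
    """Yield every text of length n with its probability (exponential in n)."""
    for letters in product(sorted(probs), repeat=n):
        text = "".join(letters)
        yield text, word_prob(text, probs)


def tail_probability(w, n, probs, k):
    """P(N_w >= k) by exhaustive enumeration (small n only)."""
    return sum(pr for t, pr in enumerate_texts(n, probs) if count_occurrences(t, w) >= k)


def conditional_mean(w1, w2, n, probs, k):
    """E[N_w2 | N_w1 >= k] by exhaustive enumeration (small n only)."""
    num = den = 0.0
    for t, pr in enumerate_texts(n, probs):
        if count_occurrences(t, w1) >= k:
            den += pr
            num += pr * count_occurrences(t, w2)
    if den == 0.0:
        raise ValueError("conditioning event has probability zero")
    return num / den


if __name__ == "__main__":
    probs = {"a": 0.6, "b": 0.4}
    print(exact_mean_var("aba", 10, probs))
    print(tail_probability("aba", 10, probs, 3))
    print(conditional_mean("aba", "ab", 10, probs, 3))
```

Comparing `conditional_mean` with the unconditional mean from `exact_mean_var` shows how overrepresenting one word shifts the expected count of an overlapping word. This is the kind of dependence the paper quantifies analytically, without enumeration.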