ALCOMFT-TR-01-159
Alain Denise, Mireille Régnier and Mathias Vandenbogaert
Assessing statistical significance of overrepresented oligonucleotides
INRIA.
Work packages 1 and 4.
June 2001.
Abstract: Assessing the statistical significance of overrepresented exceptional words
is becoming an important task in computational biology. We show, on
two problems, how large deviation methodology applies.
First, when some oligomer H occurs more often
than expected, i.e. may be overrepresented,
large deviations allow for a very
efficient computation of the so-called p-value. The second problem
we address is the possible change in the oligomer distribution
induced by the overrepresentation of some pattern.
Discarding this noise allows for
the detection of weaker signals. Related algorithmic and
complexity issues are discussed and compared to previous
results. The approach is illustrated with two typical examples of
applications on biological data.
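
To make the notion of a large-deviation p-value concrete, here is a minimal sketch. It is not the method of the report. It assumes an i.i.d. (Bernoulli) sequence model and treats the occurrences of a word at different positions as independent, which ignores the correlations caused by self-overlapping words; a faithful treatment would have to account for them. Under these assumptions the count is binomial and the Chernoff bound exp(-m * KL(k/m || p)) bounds the p-value from above. The function names and the example word are hypothetical.

```python
import math

def word_probability(word, probs):
    """Probability of `word` at a fixed position under an i.i.d. letter model."""
    return math.prod(probs[c] for c in word)

def kl_bernoulli(a, p):
    """Kullback-Leibler divergence KL(Bernoulli(a) || Bernoulli(p))."""
    total = 0.0
    if a > 0:
        total += a * math.log(a / p)
    if a < 1:
        total += (1 - a) * math.log((1 - a) / (1 - p))
    return total

def chernoff_pvalue_bound(k, m, p):
    """Upper bound on P(X >= k) for X ~ Binomial(m, p).

    Simplifying assumption: the m candidate positions are independent.
    """
    a = k / m
    if a <= p:
        return 1.0  # count not above expectation: the bound is trivial
    return math.exp(-m * kl_bernoulli(a, p))

# Hypothetical example: uniform letters, word "ACGT", sequence length 1003.
probs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
word, n = "ACGT", 1003
m = n - len(word) + 1            # number of candidate positions
p = word_probability(word, probs)
expected = m * p                 # expected number of occurrences
bound = chernoff_pvalue_bound(20, m, p)
```

The bound costs a constant number of operations regardless of the sequence length, whereas summing the exact tail grows with m; this is the kind of efficiency gain that large-deviation estimates are meant to provide.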
Postscript file: ALCOMFT-TR-01-159.ps.gz (98 kb).