ALCOMFT-TR-01-145

ALCOM-FT
 

Arun Jagota, Rune Bang Lyngsø and Christian Nørgaard Storm Pedersen
Comparing Stochastic Models
Århus. Work package 4. June 2001.
Abstract: Stochastic models are commonly used in bioinformatics, e.g. hidden Markov models for modeling sequence families or stochastic context-free grammars for modeling RNA secondary structure formation. Comparing data is a common task in bioinformatics, and it is thus natural to consider how to compare stochastic models. In this paper we present the first study of the problem of comparing a hidden Markov model and a stochastic context-free grammar. We describe how to compute their co-emission - or collision - probability, i.e. the probability that they independently generate the same sequence. We also consider the related problem of finding a run through a hidden Markov model and derivation in a grammar that generate the same sequence and have maximal joint probability by a generalization of the CYK algorithm for parsing a sequence by a stochastic context-free grammar. We illustrate the methods by an experiment on RNA secondary structures.
Postscript file: ALCOMFT-TR-01-145.ps.gz (158 kb).

System maintainer Gerth Stølting Brodal <gerth@cs.au.dk>