When is it safe to use an oversimplified substitution model in tree-making?

A Rzhetsky; T Sitnikova

doi:10.1093/oxfordjournals.molbev.a025691

Mol Biol Evol. 1996 Nov;13(9):1255-65. doi: 10.1093/oxfordjournals.molbev.a025691.

Authors

A Rzhetsky¹, T Sitnikova

Affiliation

¹ Institute of Molecular Evolutionary Genetics, Pennsylvania State University, University Park 16802, USA.

PMID: 8896378

Abstract

The choice of an "optimal" mathematical model for computing evolutionary distances from real sequences is not currently supported by easy-to-use software applicable to large data sets, and an investigator frequently selects one of the simplest models available. Here we study properties of the observed proportion of differences (p-distance) between sequences as an estimator of evolutionary distance for tree-making. We show that p-distances allow for consistent tree-making with any of the popular methods working with evolutionary distances if evolution of sequences obeys a "molecular clock" (more precisely, if it follows a stationary time-reversible Markov model of nucleotide substitution). Next, we show that p-distances seem to be efficient in recovering the correct tree topology under a "molecular clock," but produce "statistically supported" wrong trees when substitution rates vary among evolutionary lineages. Finally, we outline a practical approach for selecting an "optimal" model of nucleotide substitution in a real data analysis, and obtain a crude estimate of a "prior" distribution of the expected tree branch lengths under the Jukes-Cantor model. We conclude that the use of a model that is obviously oversimplified is inadvisable unless it is justified by a preliminary analysis of the real sequences.
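
The two distance measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the input is assumed to be a pair of aligned nucleotide sequences of equal length, and sites containing a gap or an ambiguous base are simply skipped, which is a choice made here for illustration. The correction shown is the standard Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3).

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Observed proportion of differing sites between two aligned sequences.

    Assumption for this sketch: sites where either sequence has a gap or an
    ambiguous base (anything other than A, C, G, T) are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(a != b for a, b in pairs)
    return differences / len(pairs)


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor distance computed from a p-distance.

    The formula is undefined for p >= 0.75 (saturation); this sketch
    returns infinity in that case.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jukes_cantor_distance(p))
```

For the two example sequences, one of eight sites differs, so p = 0.125 and the Jukes-Cantor distance is about 0.137. The corrected distance always exceeds p for p > 0, and the gap widens as sequences diverge.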

Publication types

Research Support, U.S. Gov't, Non-P.H.S.
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Base Sequence
Computer Simulation
Models, Biological*
Models, Genetic
Models, Theoretical
Phylogeny*