A fast likelihood approach for estimation of large phylogenies from continuous trait data

Mol Phylogenet Evol. 2021 Aug:161:107142. doi: 10.1016/j.ympev.2021.107142. Epub 2021 Mar 11.

Abstract

Despite the recent availability of large-scale genomic data for many individuals, few methods for phylogenetic inference are both computationally efficient and highly accurate for trees with hundreds of taxa. Model-based methods, such as those developed in the maximum likelihood and Bayesian frameworks, are especially time-consuming: they involve both computationally intensive calculations on fixed phylogenies and searches through the space of possible phylogenies, and they are known to scale poorly as taxa are added. Here, we propose a fast approximation to the maximum likelihood estimator that directly uses continuous trait data, such as allele frequency data. The approximation works by first computing the maximum likelihood estimates of some internal branch lengths, and then inferring the tree topology from these estimates. Our approach is more computationally efficient than existing methods for such data while still achieving comparable accuracy. This method is innovative in its use of the mathematical properties of tree topologies for inference, and thus serves as a useful addition to the collection of methods available for estimating phylogenies from continuous trait data.
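
The two-step idea in the abstract (estimate internal branch lengths, then read the topology from them) can be illustrated on a single four-taxon tree. The sketch below is not the authors' algorithm, which the abstract does not specify. It assumes a unit-rate Brownian motion model, uses the mean squared trait difference as the pairwise maximum likelihood estimate of path length between two taxa, and uses the classical four-point condition as the topology property. All function names, branch lengths, and the seed are hypothetical choices made for illustration.

```python
import itertools
import math
import random


def simulate_quartet(n_traits, tips, internal, rng):
    """Simulate independent Brownian-motion traits on the unrooted tree ((A,B),(C,D)).

    Assumption: unit rate, so the variance added along a branch equals its length.
    """
    data = {name: [] for name in "ABCD"}
    for _ in range(n_traits):
        u = 0.0  # internal node joining A and B (root placement does not affect differences)
        v = u + rng.gauss(0.0, math.sqrt(internal))  # internal node joining C and D
        for name, parent in (("A", u), ("B", u), ("C", v), ("D", v)):
            data[name].append(parent + rng.gauss(0.0, math.sqrt(tips[name])))
    return data


def pairwise_distances(data):
    """Mean squared difference per pair: the ML estimate of the variance of a
    zero-mean normal difference, i.e. the path length under the unit-rate assumption."""
    d = {}
    for a, b in itertools.combinations(sorted(data), 2):
        sq = [(x - y) ** 2 for x, y in zip(data[a], data[b])]
        d[frozenset((a, b))] = sum(sq) / len(sq)
    return d


def infer_quartet(d):
    """Pick the split with the smallest distance sum (four-point condition)
    and estimate the internal branch length from the three sums."""
    a, b, c, e = sorted({name for pair in d for name in pair})
    splits = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
    sums = {s: d[frozenset(s[0])] + d[frozenset(s[1])] for s in splits}
    best = min(sums, key=sums.get)
    others = [sums[s] for s in splits if s != best]
    # Each non-best sum exceeds the best sum by twice the internal branch length.
    internal = (sum(others) / 2 - sums[best]) / 2
    return best, internal


# Hypothetical branch lengths, chosen only for this illustration.
rng = random.Random(0)
tips = {"A": 0.2, "B": 0.3, "C": 0.25, "D": 0.15}
data = simulate_quartet(20000, tips, internal=0.3, rng=rng)
split, internal_hat = infer_quartet(pairwise_distances(data))
print(split, round(internal_hat, 3))
```

With many independent traits, the estimated internal branch length should be close to the simulated value and the recovered split should match the generating tree. Scaling such an idea to hundreds of taxa, as the paper reports, requires the method described in the full text.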

Keywords: Allele frequency; Brownian motion model; Continuous trait data; Likelihood; Phylogeny.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bayes Theorem
  • Gene Frequency
  • Humans
  • Likelihood Functions*
  • Phenotype
  • Phylogeny*
  • Reproducibility of Results
  • Research Design