Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome

Bioinformatics. 2001 Oct;17(10):988-96. doi: 10.1093/bioinformatics/17.10.988.

Abstract

Motivation: Current growth in the field of genomics has provided a number of exciting approaches to the modeling of evolutionary mechanisms within the genome. Separately, dynamical and statistical analyses of networks such as the World Wide Web and social interactions between humans have shown that these networks can exhibit common fractal properties, including the property of being scale-free. This work attempts to bridge these two fields and demonstrate that the fractal properties of molecular networks are linked to the fractal properties of their underlying genomes.

Results: We suggest a stochastic model capable of describing the evolutionary growth of metabolic or signal-transduction networks. This model generates networks that share important statistical properties (so-called scale-free behavior) with real molecular networks. In particular, the frequency of vertices connected to exactly k other vertices follows a power-law distribution. The shape of this distribution remains invariant to changes in network scale: a small subgraph has the same distribution as the complete graph from which it is derived. Furthermore, the model correctly predicts that the frequencies of distinct DNA and protein domains also follow a power-law distribution. Finally, the model leads to a simple equation linking the total number of different DNA and protein domains in a genome with both the total number of genes and the overall network topology.
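The power-law claim for vertex degrees can be illustrated with a minimal sketch. It does not reproduce the stochastic model of this paper, whose details are not given in the abstract. Instead it assumes generic preferential attachment, a standard growth rule known to yield scale-free graphs, in which each new vertex links to one existing vertex chosen with probability proportional to that vertex's degree. The function names `grow_network` and `degree_frequencies`, the seed, and the network size are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def grow_network(n_vertices, seed=0):
    """Grow a graph by preferential attachment (an assumed stand-in rule,
    not the paper's model). Returns the list of vertex degrees."""
    rng = random.Random(seed)
    # Start from a single edge joining vertices 0 and 1.
    degree = [1, 1]
    # Each vertex appears in `endpoints` once per incident edge, so a
    # uniform draw from this list is a degree-proportional draw.
    endpoints = [0, 1]
    for new_vertex in range(2, n_vertices):
        target = rng.choice(endpoints)
        degree[target] += 1
        degree.append(1)  # the new vertex starts with one edge
        endpoints.extend([target, new_vertex])
    return degree


def degree_frequencies(degree):
    """Fraction of vertices connected to exactly k other vertices."""
    counts = Counter(degree)
    n = len(degree)
    return {k: c / n for k, c in sorted(counts.items())}


degrees = grow_network(20000)
freq = degree_frequencies(degrees)

# Frequencies at doubling values of k; on a log-log plot a power law
# appears as an approximately straight line.
for k in (1, 2, 4, 8, 16):
    print(k, round(freq.get(k, 0.0), 4))
```

Under this assumed rule the printed frequencies should fall off steeply and roughly geometrically as k doubles, which is the signature of a power-law tail. Repeating the run with a different `n_vertices` gives a rough check of the scale invariance described above, although this sketch grows whole networks rather than sampling subgraphs.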

Availability: MATLAB (MathWorks, Inc.) programs described in this manuscript are available on request from the authors.

Contact: ar345@columbia.edu.

MeSH terms

  • Biological Evolution
  • Computational Biology
  • DNA / genetics*
  • Databases, Nucleic Acid / statistics & numerical data
  • Databases, Protein / statistics & numerical data
  • Fractals
  • Genomics / statistics & numerical data*
  • Humans
  • Models, Genetic*
  • Proteins / genetics*
  • Stochastic Processes

Substances

  • Proteins
  • DNA