Optimal clustering for detecting near-native conformations in protein docking

Biophys J. 2005 Aug;89(2):867-75. doi: 10.1529/biophysj.104.058768. Epub 2005 May 20.

Abstract

Clustering is one of the most powerful tools in computational biology. The conventional wisdom is that events that occur in clusters are probably not random. In protein docking, the underlying principle is that clustering occurs because long-range electrostatic and/or desolvation forces steer the proteins to a low free-energy attractor at the binding region. Something similar occurs in the docking of small molecules, although in this case shorter-range van der Waals forces play a more critical role. Building on these observations, we have developed two different clustering strategies that predict docked conformations from the clustering properties of a uniform sampling of low free-energy protein-protein and protein-small molecule complexes. We report significant improvements in the automated prediction and discrimination of docked conformations when cluster size and consensus are used as ranking criteria. We show that the success of clustering depends on identifying the appropriate clustering radius for the system. The clustering radius for protein-protein complexes is consistent with the range of the electrostatic and desolvation free energies (i.e., between 4 and 9 Å); for protein-small molecule docking, the radius is set by van der Waals interactions (i.e., approximately 2 Å). Without any a priori information, a simple analysis of the histogram of pairwise distances between the docked conformations can reveal the clustering properties of the data set. Clustering is observed when the histogram is bimodal. Data clustering is optimal if one chooses the clustering radius to be the minimum after the first peak of the bimodal distribution. We show that using this optimal radius further improves the discrimination of near-native complex structures.
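The radius-selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the docked conformations are summarized by a symmetric matrix of pairwise distances (for example ligand RMSD; the toy data below uses plain 2-D points instead), that a 3-bin moving average is enough to smooth the histogram, and that clusters are ranked by a simple greedy neighbor count. The names clustering_radius, rank_clusters, and bin_width are hypothetical and chosen only for this illustration.

```python
import math


def clustering_radius(dist, bin_width):
    """Return the distance at the first minimum after the first histogram peak.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Returns None when no second peak follows the minimum, i.e. the
    histogram does not look bimodal under this simple test.
    """
    n = len(dist)
    pair_d = [dist[i][j] for i in range(n) for j in range(i + 1, n)]

    # Histogram of all unordered pairwise distances with fixed-width bins.
    counts = [0] * (int(max(pair_d) / bin_width) + 1)
    for d in pair_d:
        counts[int(d / bin_width)] += 1

    # Assumption: a 3-bin moving average to damp bin-to-bin noise.
    smooth = []
    for k in range(len(counts)):
        window = counts[max(0, k - 1):k + 2]
        smooth.append(sum(window) / len(window))

    k = 0
    # Climb to the first local maximum (the short-distance peak).
    while k + 1 < len(smooth) and smooth[k + 1] >= smooth[k]:
        k += 1
    # Descend to the local minimum that follows it.
    while k + 1 < len(smooth) and smooth[k + 1] < smooth[k]:
        k += 1
    # Require the histogram to rise again, otherwise it is not bimodal.
    if max(smooth[k:]) <= smooth[k]:
        return None
    return (k + 0.5) * bin_width  # center of the valley bin


def rank_clusters(dist, radius):
    """Greedy clustering: repeatedly take the conformation with most neighbors.

    Returns (center_index, member_indices) tuples, largest cluster first.
    This greedy scheme is an assumption made for illustration.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        center = max(
            sorted(remaining),
            key=lambda i: sum(1 for j in remaining if dist[i][j] <= radius),
        )
        members = sorted(j for j in remaining if dist[center][j] <= radius)
        clusters.append((center, members))
        remaining -= set(members)
    return clusters


# Toy data (hypothetical): a dense group of 9 poses and a smaller group of 4.
group_a = [(1.2 * x, 1.2 * y) for x in range(3) for y in range(3)]
group_b = [(20.5 + 1.2 * x, 1.2 * y) for x in range(2) for y in range(2)]
poses = group_a + group_b
dist = [[math.dist(p, q) for q in poses] for p in poses]

radius = clustering_radius(dist, bin_width=1.0)
clusters = rank_clusters(dist, radius)
print(radius, [len(members) for _, members in clusters])
```

In this toy case the short within-group distances form the first peak and the long between-group distances form the second, so the selected radius falls in the gap between them and the larger group is ranked first. On real docking output the bin width and the amount of smoothing would need tuning, and a noisy histogram can cause the simple downhill walk to stop early.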

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Binding Sites
  • Cluster Analysis
  • Computer Simulation
  • Models, Chemical*
  • Models, Molecular*
  • Molecular Sequence Data
  • Multiprotein Complexes / analysis
  • Multiprotein Complexes / chemistry
  • Protein Binding
  • Protein Conformation
  • Protein Interaction Mapping / methods*
  • Proteins / analysis
  • Proteins / chemistry*
  • Proteins / classification*
  • Sequence Analysis, Protein / methods*

Substances

  • Multiprotein Complexes
  • Proteins