Statistical control of peptide and protein error rates in large-scale targeted data-independent acquisition analyses

Nat Methods. 2017 Sep;14(9):921-927. doi: 10.1038/nmeth.4398. Epub 2017 Aug 21.

Abstract

Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is the main method for high-throughput identification and quantification of peptides and inferred proteins. Within this field, data-independent acquisition (DIA) combined with peptide-centric scoring, as exemplified by the technique SWATH-MS, has emerged as a scalable method to achieve deep and consistent proteome coverage across large-scale data sets. We demonstrate that statistical concepts developed for discovery proteomics based on spectrum-centric scoring can be adapted to large-scale DIA experiments that have been analyzed with peptide-centric scoring strategies, and we provide guidance on their application. We show that optimal tradeoffs between sensitivity and specificity require careful consideration of the relationship between proteins in the samples and proteins represented in the spectral library. We propose the application of a global analyte constraint to prevent the accumulation of false positives across large-scale data sets. Furthermore, to increase the quality and reproducibility of published proteomic results, well-established confidence criteria should be reported for the detected peptide queries, peptides and inferred proteins.
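
The idea of a global analyte constraint can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a plain target-decoy estimate of the false discovery rate (decoys divided by targets above a score threshold), ignores score ties, and uses hypothetical names (q_values, global_filter, and the dictionary keys run, peptide, score, is_decoy) and an assumed default cutoff of 0.01. A peptide is kept in a run only if it passes the cutoff both within that run and in a global context built from its best score across all runs.

```python
from collections import defaultdict


def q_values(scored):
    """Estimate q-values for a list of (score, is_decoy) pairs.

    Assumption: FDR at a score threshold is estimated as
    (#decoys at or above it) / (#targets at or above it), capped at 1.
    Returns q-values in the same order as the input.
    """
    order = sorted(range(len(scored)), key=lambda i: -scored[i][0])
    fdrs = []
    targets = decoys = 0
    for i in order:
        if scored[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))
    # q-value: smallest FDR reachable at this or any more permissive threshold.
    qs = [0.0] * len(scored)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qs[order[rank]] = running_min
    return qs


def global_filter(detections, run_cutoff=0.01, global_cutoff=0.01):
    """Keep (run, peptide) pairs that pass both run-level and global cutoffs.

    detections: list of dicts with hypothetical keys
    'run', 'peptide', 'score', 'is_decoy'.
    """
    # Global context: one entry per peptide, using its best score over all runs.
    best = {}
    for d in detections:
        key = (d["peptide"], d["is_decoy"])
        if key not in best or d["score"] > best[key]:
            best[key] = d["score"]
    keys = list(best)
    global_q = q_values([(best[k], k[1]) for k in keys])
    passed_global = {
        k[0] for k, q in zip(keys, global_q) if not k[1] and q <= global_cutoff
    }

    # Run-specific context: q-values computed separately within each run.
    by_run = defaultdict(list)
    for d in detections:
        by_run[d["run"]].append(d)

    kept = []
    for run, items in by_run.items():
        run_q = q_values([(d["score"], d["is_decoy"]) for d in items])
        for d, q in zip(items, run_q):
            if (
                not d["is_decoy"]
                and q <= run_cutoff
                and d["peptide"] in passed_global
            ):
                kept.append((run, d["peptide"]))
    return kept
```

In this sketch, a peptide that looks confident in a single run can still be rejected if decoys from other runs outscore it in the global context. This is the mechanism by which a global constraint limits the accumulation of false positives as the number of runs grows.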

MeSH terms

  • Computer Simulation
  • Data Interpretation, Statistical*
  • High-Throughput Screening Assays / methods*
  • Mass Spectrometry / methods*
  • Models, Statistical
  • Peptide Mapping / methods*
  • Proteins / analysis
  • Proteins / chemistry*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Analysis, Protein / methods*

Substances

  • Proteins