Statistical control of peptide and protein error rates in large-scale targeted data-independent acquisition analyses

George Rosenberger; Isabell Bludau; Uwe Schmitt; Moritz Heusel; Christie L Hunter; Yansheng Liu; Michael J MacCoss; Brendan X MacLean; Alexey I Nesvizhskii; Patrick G A Pedrioli; Lukas Reiter; Hannes L Röst; Stephen Tate; Ying S Ting; Ben C Collins; Ruedi Aebersold

doi:10.1038/nmeth.4398

Nat Methods. 2017 Sep;14(9):921-927. doi: 10.1038/nmeth.4398. Epub 2017 Aug 21.

Authors

George Rosenberger¹,², Isabell Bludau¹,², Uwe Schmitt³, Moritz Heusel¹,⁴, Christie L Hunter⁵, Yansheng Liu¹, Michael J MacCoss⁶, Brendan X MacLean⁶, Alexey I Nesvizhskii⁷,⁸, Patrick G A Pedrioli¹, Lukas Reiter⁹, Hannes L Röst¹, Stephen Tate¹⁰, Ying S Ting⁶, Ben C Collins¹, Ruedi Aebersold¹,¹¹

Affiliations

¹ Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.
² PhD Program in Systems Biology, University of Zurich and ETH Zurich, Zurich, Switzerland.
³ ID Scientific IT Services, ETH Zurich, Zurich, Switzerland.
⁴ PhD program in Molecular and Translational Biomedicine, Competence Center Personalized Medicine (CC-PM), ETH Zurich and University of Zurich, Zurich, Switzerland.
⁵ SCIEX, Redwood City, California, USA.
⁶ Department of Genome Sciences, University of Washington, Seattle, Washington, USA.
⁷ Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA.
⁸ Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA.
⁹ Biognosys, Schlieren, Switzerland.
¹⁰ SCIEX, Concord, Ontario, Canada.
¹¹ Faculty of Science, University of Zurich, Zurich, Switzerland.

Abstract

Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is the main method for high-throughput identification and quantification of peptides and inferred proteins. Within this field, data-independent acquisition (DIA) combined with peptide-centric scoring, as exemplified by the technique SWATH-MS, has emerged as a scalable method to achieve deep and consistent proteome coverage across large-scale data sets. We demonstrate that statistical concepts developed for discovery proteomics based on spectrum-centric scoring can be adapted to large-scale DIA experiments that have been analyzed with peptide-centric scoring strategies, and we provide guidance on their application. We show that optimal tradeoffs between sensitivity and specificity require careful consideration of the relationship between proteins in the samples and proteins represented in the spectral library. We propose the application of a global analyte constraint to prevent the accumulation of false positives across large-scale data sets. Furthermore, to increase the quality and reproducibility of published proteomic results, well-established confidence criteria should be reported for the detected peptide queries, peptides and inferred proteins.
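The abstract refers to target-decoy error-rate statistics from discovery proteomics and to a global analyte constraint across many runs. The following is a minimal Python sketch of how such a constraint could work, not the authors' implementation (which the abstract does not describe). It assumes a plain target-decoy FDR estimate with a fixed π0 (`pi0`), treats the best score of each analyte across all runs as its global score, and keeps a run-level identification only if the analyte also passes the global filter. The function names, record layout and thresholds (5% run-level, 1% global) are hypothetical.

```python
from collections import defaultdict

# Hypothetical record layout: (run_id, analyte_id, score, is_decoy)


def target_decoy_qvalues(scores, is_decoy, pi0=1.0):
    """Estimate q-values with a simple target-decoy approach (higher score = better)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr = [0.0] * len(scores)
    n_target = n_decoy = 0
    for i in order:
        if is_decoy[i]:
            n_decoy += 1
        else:
            n_target += 1
        # FDR at this score cutoff: estimated false targets / accepted targets
        fdr[i] = pi0 * n_decoy / max(n_target, 1)
    # q-value = smallest FDR at which the entry would still be accepted
    qvals = [0.0] * len(scores)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        qvals[i] = running_min
    return qvals


def global_analyte_set(records, global_fdr=0.01):
    """Return target analytes whose best score across all runs passes the global FDR."""
    best = {}
    for _run, analyte, score, decoy in records:
        key = (analyte, decoy)
        if key not in best or score > best[key]:
            best[key] = score
    keys = list(best)
    q = target_decoy_qvalues([best[k] for k in keys], [k[1] for k in keys])
    return {k[0] for k, qv in zip(keys, q) if not k[1] and qv <= global_fdr}


def filter_runs(records, run_fdr=0.05, global_fdr=0.01):
    """Keep (run, analyte) pairs passing the run-level FDR and the global constraint."""
    allowed = global_analyte_set(records, global_fdr)
    by_run = defaultdict(list)
    for rec in records:
        by_run[rec[0]].append(rec)
    kept = []
    for run, recs in by_run.items():
        q = target_decoy_qvalues([r[2] for r in recs], [r[3] for r in recs])
        for rec, qv in zip(recs, q):
            if not rec[3] and qv <= run_fdr and rec[1] in allowed:
                kept.append((run, rec[1]))
    return kept


records = [
    ("r1", "PEP_A", 10.0, False),
    ("r2", "PEP_A", 3.0, False),
    ("r1", "PEP_B", 2.0, False),
    ("r2", "DECOY_X", 5.0, True),
    ("r1", "PEP_C", 9.0, False),
]
print(filter_runs(records))
```

In this toy data `PEP_B` passes the run-level filter in run `r1` but is removed because its best score across all runs does not pass the global filter. This illustrates how a global constraint can limit the accumulation of false positives as more runs are added.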

MeSH terms

Computer Simulation
Data Interpretation, Statistical*
High-Throughput Screening Assays / methods*
Mass Spectrometry / methods*
Models, Statistical
Peptide Mapping / methods*
Proteins / analysis
Proteins / chemistry*
Reproducibility of Results
Sensitivity and Specificity
Sequence Analysis, Protein / methods*

Substances

Proteins
