Waste not, want not: revisiting the analysis that called into question the practice of rarefaction

mSphere. 2024 Jan 30;9(1):e0035523. doi: 10.1128/msphere.00355-23. Epub 2023 Dec 6.

Abstract

In 2014, McMurdie and Holmes published the provocatively titled "Waste not, want not: why rarefying microbiome data is inadmissible." The claims of their study have significantly altered how microbiome researchers control for the unavoidable uneven sequencing depths that are inherent in modern 16S rRNA gene sequencing. Confusion over the distinction between the definitions of rarefying and rarefaction continues to cloud the interpretation of their results. More importantly, the authors made a variety of problematic choices when designing and analyzing their simulations. I identified 11 factors that could have compromised the results of the original study. I reproduced the original simulation results and assessed the impact of those factors on the underlying conclusion that rarefying data is inadmissible. Throughout, the design choices of the original study caused rarefying and rarefaction to appear to perform worse than they truly did. Most important were the approaches used to assess ecological distances, the removal of samples with low sequencing depth, and not accounting for conditions where sequencing effort is confounded with treatment group. Although the original study criticized rarefying for the arbitrary removal of valid data, repeatedly rarefying data (i.e., rarefaction) incorporates all the data. In contrast, it is the removal of rare taxa that would appear to remove valid data. Overall, I show that rarefaction is the most robust approach to control for uneven sequencing effort when considered across a variety of alpha and beta diversity metrics.

IMPORTANCE: Over the past 10 years, the best method for normalizing the sequencing depth of samples characterized by 16S rRNA gene sequencing has been contentious. An often-cited article by McMurdie and Holmes forcefully argued that rarefying the number of sequence counts was "inadmissible" and should not be employed. However, I identified a number of problems with the design of their simulations and analysis that compromised their results. In fact, when I reproduced and expanded upon their analysis, it was clear that rarefaction was actually the most robust approach for controlling for uneven sequencing effort across samples. Rarefaction limits the rates of both falsely detecting and falsely rejecting differences between treatment groups. Far from being "inadmissible", rarefaction is a valuable tool for analyzing microbiome sequence data.
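
The distinction the abstract draws between rarefying (one random subsample to a common depth) and rarefaction (repeating that subsample many times and averaging the resulting metric) can be illustrated with a minimal sketch. This is not the article's code: the function names `rarefy` and `rarefaction_richness`, the use of observed richness as the metric, and the default of 100 iterations are assumptions chosen for illustration.

```python
import numpy as np


def rarefy(counts, depth, rng):
    """Rarefying: draw `depth` reads without replacement from one sample.

    `counts` holds the number of reads per taxon (hypothetical input layout).
    """
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the requested depth")
    # Sampling reads without replacement is a multivariate hypergeometric draw.
    return rng.multivariate_hypergeometric(counts, depth)


def rarefaction_richness(counts, depth, n_iter=100, seed=0):
    """Rarefaction: repeat the rarefying step and average the metric.

    The metric here is observed richness (number of taxa with >= 1 read);
    any alpha or beta diversity metric could be averaged the same way.
    """
    rng = np.random.default_rng(seed)
    richness = [
        np.count_nonzero(rarefy(counts, depth, rng)) for _ in range(n_iter)
    ]
    return float(np.mean(richness))


if __name__ == "__main__":
    # Toy sample: read counts for six taxa, one of them absent.
    sample = [500, 300, 150, 40, 10, 0]
    rng = np.random.default_rng(42)
    print("single rarefied draw:", rarefy(sample, 100, rng))
    print("mean richness over draws:", rarefaction_richness(sample, 100))
```

Because every read has a chance of being drawn in some iteration, averaging over many draws uses information from the whole sample, whereas a single draw discards whichever reads happen not to be selected.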

Keywords: 16S rRNA gene sequencing; amplicon sequencing; bioinformatics; microbial ecology; microbiome.

MeSH terms

  • Computer Simulation
  • High-Throughput Nucleotide Sequencing / methods
  • Microbiota* / genetics
  • RNA, Ribosomal, 16S / genetics
  • Sequence Analysis, DNA / methods

Substances

  • RNA, Ribosomal, 16S