Double sampling with multiple imputation to answer large sample meta-research questions: introduction and illustration by evaluating adherence to two simple CONSORT guidelines

Patrice L Capers; Andrew W Brown; John A Dawson; David B Allison

doi:10.3389/fnut.2015.00006

Front Nutr. 2015 Mar 9:2:6. doi: 10.3389/fnut.2015.00006. eCollection 2015.

Authors

Patrice L Capers¹, Andrew W Brown¹, John A Dawson², David B Allison³

Affiliations

¹ Office of Energetics and Nutrition Obesity Research Center, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA.
² Office of Energetics and Nutrition Obesity Research Center, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA; Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, AL, USA.
³ Office of Energetics and Nutrition Obesity Research Center, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA; Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, AL, USA; Department of Nutrition Sciences, University of Alabama at Birmingham, Birmingham, AL, USA; Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA.

Abstract

Background: Meta-research can involve manual retrieval and evaluation of research, which is resource intensive. The creation of high-throughput methods (e.g., search heuristics, crowdsourcing) has improved the feasibility of large meta-research questions, but possibly at the cost of accuracy.

Objective: To evaluate the use of double sampling combined with multiple imputation (DS + MI) to address meta-research questions, using as an example the adherence of PubMed entries to two simple Consolidated Standards of Reporting Trials (CONSORT) guidelines for titles and abstracts.

Methods: For the DS large sample, we retrieved all PubMed entries satisfying the filters: RCT, human, abstract available, and English language (n = 322,107). For the DS subsample, we randomly sampled 500 entries from the large sample. The large sample was evaluated with a lower-rigor, higher-throughput (RLOTHI) method using search heuristics, while the subsample was evaluated using a higher-rigor, lower-throughput (RHITLO) human rating method. Multiple imputation of the missing-completely-at-random RHITLO data for the large sample was informed by: RHITLO data from the subsample; RLOTHI data from the large sample; whether a study was an RCT; and country and year of publication.
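
The DS + MI idea described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single binary RLOTHI rating as the only predictor (the study also used RCT status, country, and year), it estimates a simple compliance proportion rather than logistic regression coefficients, and it draws imputations from a Beta posterior with a uniform prior. The function names `pool_rubin` and `ds_mi_proportion` are hypothetical. Pooling follows Rubin's rules, where the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import random
import statistics


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates with Rubin's rules.

    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)          # pooled point estimate
    within = statistics.fmean(variances)         # mean within-imputation variance
    between = statistics.variance(estimates) if m > 1 else 0.0
    total = within + (1 + 1 / m) * between
    return q_bar, total


def ds_mi_proportion(rlothi, rhitlo, m=20, seed=0):
    """Estimate the RHITLO compliance proportion in the large sample.

    rlothi: 0/1 heuristic rating for every entry (assumed always observed).
    rhitlo: 0/1 human rating, or None where the entry was not human-rated.
    """
    rng = random.Random(seed)
    # Subsample counts per RLOTHI stratum: [count of RHITLO=0, count of RHITLO=1]
    counts = {0: [0, 0], 1: [0, 0]}
    for lo, hi in zip(rlothi, rhitlo):
        if hi is not None:
            counts[lo][hi] += 1

    n = len(rlothi)
    estimates, variances = [], []
    for _ in range(m):
        # Draw P(RHITLO=1 | RLOTHI) from its Beta posterior so that the
        # imputations reflect uncertainty in the subsample estimate.
        p = {s: rng.betavariate(1 + counts[s][1], 1 + counts[s][0])
             for s in (0, 1)}
        completed = [hi if hi is not None else int(rng.random() < p[lo])
                     for lo, hi in zip(rlothi, rhitlo)]
        q = sum(completed) / n
        estimates.append(q)
        variances.append(q * (1 - q) / n)        # binomial variance of a proportion
    return pool_rubin(estimates, variances)
```

In this sketch, a call such as `ds_mi_proportion(rlothi, rhitlo, m=20)` returns the pooled proportion and its total variance; the square root of the variance gives a standard error that reflects both sampling and imputation uncertainty.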

Results: The RHITLO and RLOTHI methods in the subsample largely agreed (phi coefficients: title = 1.00, abstract = 0.92). Compliance with abstract and title criteria has increased over time, with non-US countries improving more rapidly. DS + MI logistic regression estimates were more precise than subsample estimates (e.g., 95% CI for change in title and abstract compliance by year: subsample RHITLO 1.050-1.174 vs. DS + MI 1.082-1.151). As evidence of improved accuracy, DS + MI coefficient estimates were closer to the subsample RHITLO estimates than were the large sample RLOTHI estimates.
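
For readers unfamiliar with the agreement statistic reported above, the phi coefficient for two binary ratings can be computed from the 2 x 2 table of counts. The following is a small sketch with a hypothetical function name, `phi_coefficient`, assuming both ratings are coded 0/1; it is not taken from the study's analysis code.

```python
import math


def phi_coefficient(a, b):
    """Phi coefficient between two equal-length sequences of 0/1 ratings."""
    if len(a) != len(b):
        raise ValueError("rating sequences must have the same length")
    # Cell counts of the 2 x 2 agreement table
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    # Product of the four marginal totals
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("phi is undefined when a marginal total is zero")
    return (n11 * n00 - n10 * n01) / denom
```

A value of 1 indicates that the two methods classified every entry identically, and a value of 0 indicates no association between them.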

Conclusion: Our results support our hypothesis that DS + MI would result in improved precision and accuracy. This method is flexible and may provide a practical way to examine large corpora of literature.

Keywords: CONSORT; adherence; double sampling; meta-research; modeling; multiple imputation.
