Two-sample test for correlated data under outcome-dependent sampling with an application to self-reported weight loss data

Yi Cai; Jing Huang; Jing Ning; Mei-Ling Ting Lee; Bernard Rosner; Yong Chen

doi:10.1002/sim.8346


Stat Med. 2019 Nov 10;38(25):4999-5009. doi: 10.1002/sim.8346. Epub 2019 Sep 5.

Authors

Yi Cai¹, Jing Huang², Jing Ning³, Mei-Ling Ting Lee⁴, Bernard Rosner⁵,⁶, Yong Chen²

Affiliations

¹ AT&T Services, Inc, Plano, Texas.
² Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania.
³ Department of Statistical Science, Cornell University, Ithaca, New York.
⁴ Department of Epidemiology and Biostatistics, The University of Maryland School of Public Health, College Park, Maryland.
⁵ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts.
⁶ Channing Laboratory, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts.

Abstract

Standard two-sample tests such as the t-test and the Wilcoxon rank sum test may yield incorrect type I error rates when applied to longitudinal or clustered data. Recently proposed two-sample tests for clustered data often require assumptions about the correlation structure and/or a noninformative cluster size. In this paper, based on a novel pseudolikelihood for correlated data, we propose a score test that requires neither knowledge of the correlation structure nor the assumption that data are missing at random. The proposed score test can capture differences in both the mean and the variance between two groups simultaneously. We use projection theory to derive the limiting distribution of the test statistic, whose covariance matrix can be estimated empirically. We conduct simulation studies to evaluate the proposed test and compare it with existing methods. To illustrate the usefulness of the proposed test, we use it to compare self-reported weight loss data from a friends' referral group with data from an Internet self-joining group.
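The opening claim, that pooling clustered observations into a standard two-sample test can misstate the type I error, can be checked with a small simulation. The sketch below is an illustration under assumed settings only: a normal random-intercept model, 30 clusters of size 5 per group, and an intraclass correlation of 0.5. The function names are hypothetical. It is not the authors' pseudolikelihood score test; the cluster-means comparison is just a simple baseline, and it ignores variance differences and informative cluster size, which the proposed test is designed to handle.

```python
import math
import random


def simulate_group(n_clusters, cluster_size, icc, rng):
    """Hypothetical null-model data: each cluster shares a random intercept,
    so observations within a cluster have intraclass correlation `icc`
    and total variance 1."""
    sd_between = math.sqrt(icc)
    sd_within = math.sqrt(1.0 - icc)
    clusters = []
    for _ in range(n_clusters):
        b = rng.gauss(0.0, sd_between)
        clusters.append([b + rng.gauss(0.0, sd_within) for _ in range(cluster_size)])
    return clusters


def mean(xs):
    return sum(xs) / len(xs)


def sample_var(xs):
    m = mean(xs)
    return sum((v - m) ** 2 for v in xs) / (len(xs) - 1)


def welch_pvalue(x, y):
    """Two-sided p-value of Welch's statistic with a normal reference
    distribution (an approximation that is adequate for these sample sizes)."""
    se = math.sqrt(sample_var(x) / len(x) + sample_var(y) / len(y))
    z = (mean(x) - mean(y)) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def rejection_rates(n_reps=1000, n_clusters=30, cluster_size=5,
                    icc=0.5, alpha=0.05, seed=1):
    """Empirical type I error of a pooled test vs. a cluster-means test,
    with both groups drawn from the same distribution (the null is true)."""
    rng = random.Random(seed)
    naive = cluster = 0
    for _ in range(n_reps):
        g1 = simulate_group(n_clusters, cluster_size, icc, rng)
        g2 = simulate_group(n_clusters, cluster_size, icc, rng)
        # Naive: treat every observation as independent.
        flat1 = [v for c in g1 for v in c]
        flat2 = [v for c in g2 for v in c]
        naive += welch_pvalue(flat1, flat2) < alpha
        # Baseline: one mean per cluster; cluster means are independent.
        means1 = [mean(c) for c in g1]
        means2 = [mean(c) for c in g2]
        cluster += welch_pvalue(means1, means2) < alpha
    return naive / n_reps, cluster / n_reps


naive_rate, cluster_rate = rejection_rates()
print(f"pooled test: {naive_rate:.3f}, cluster-means test: {cluster_rate:.3f}")
```

With these settings the design effect is 1 + (5 − 1)(0.5) = 3, so the pooled test's standard error is too small by a factor of about √3, and its rejection rate under the null should land far above the nominal 5%, while the cluster-means baseline should stay close to it.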

Keywords: U-statistics; correlated data; outcome-dependent sampling; pseudolikelihood; two-sample test.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Biometry / methods*
Cluster Analysis
Computer Simulation
Humans
Internet
Longitudinal Studies
Self Report*
Weight Loss*
