Estimation of an inter-rater intra-class correlation coefficient that overcomes common assumption violations in the assessment of health measurement scales

Carly A Bobak; Paul J Barr; A James O'Malley

doi:10.1186/s12874-018-0550-6

BMC Med Res Methodol. 2018 Sep 12;18(1):93. doi: 10.1186/s12874-018-0550-6.

Authors

Carly A Bobak¹, Paul J Barr², A James O'Malley³,⁴

Affiliations

¹ Department of Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, 1 Rope Ferry Road, Hanover, 03755, NH, USA.
² The Dartmouth Institute, Geisel School of Medicine, Dartmouth College, 1 Rope Ferry Road, Hanover, 03755, NH, USA.
³ Department of Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, 1 Rope Ferry Road, Hanover, 03755, NH, USA. James.OMalley@dartmouth.edu.
⁴ The Dartmouth Institute, Geisel School of Medicine, Dartmouth College, 1 Rope Ferry Road, Hanover, 03755, NH, USA. James.OMalley@dartmouth.edu.

Abstract

Background: Intraclass correlation coefficients (ICCs) are recommended for assessing the reliability of measurement scales. However, the ICC relies on a variety of statistical assumptions, such as normality and stable variance, which are rarely considered in health applications.
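For reference, the ICC in its simplest setting is usually defined through a one-way random-effects model. The notation below (encounter i, rater j, between-encounter variance σ_b², within-encounter variance σ_w²) is a standard textbook form chosen here for illustration, not necessarily the exact model used in this paper:

```latex
y_{ij} = \mu + b_i + \varepsilon_{ij}, \qquad
b_i \sim N(0, \sigma_b^2), \qquad
\varepsilon_{ij} \sim N(0, \sigma_w^2),
\qquad
\mathrm{ICC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}.
```

The normality and stable-variance assumptions enter through the distributions of b_i and ε_ij; in particular, σ_w² is assumed to be the same for every encounter i.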

Methods: A Bayesian approach using hierarchical regression and variance-function modeling is proposed to estimate the ICC, with emphasis on accounting for heterogeneous variances across a measurement scale. As an application, we review the use of an ICC to evaluate the reliability of Observer OPTION⁵, an instrument in which trained raters evaluate the level of shared decision making between clinicians and patients. Two raters evaluated recordings of 311 clinical encounters drawn from three studies that assessed the impact of using a personal decision aid compared with usual care. We focus in particular on deriving an estimate of the ICC when the data come from multiple studies.
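To illustrate why heterogeneous variances make the ICC change along a measurement scale, the sketch below assumes a hypothetical log-linear variance function in which the within-encounter (between-rater) variance grows with an encounter's true score m. The function name `icc_variance_function` and the parameters `sigma_b2`, `gamma0`, and `gamma1` are illustrative choices of ours, not the paper's model or its estimates.

```python
import math


def icc_variance_function(sigma_b2: float, gamma0: float, gamma1: float, m: float) -> float:
    """Encounter-specific ICC for an encounter with true score m.

    Assumes (hypothetically) that the within-encounter variance follows
    sigma_w^2(m) = exp(gamma0 + gamma1 * m), while the between-encounter
    variance sigma_b2 is constant.
    """
    if sigma_b2 < 0:
        raise ValueError("sigma_b2 must be non-negative")
    sigma_w2 = math.exp(gamma0 + gamma1 * m)  # variance function evaluated at m
    return sigma_b2 / (sigma_b2 + sigma_w2)


# Illustrative parameter values only; not estimates from the Observer OPTION5 data.
for m in (0.0, 25.0, 50.0):
    print(m, round(icc_variance_function(100.0, 3.0, 0.02, m), 3))
```

With `gamma1 = 0` the variance function is constant and the expression reduces to the single ICC of the standard model; with `gamma1 > 0` the ICC falls as the score rises, so one summary ICC would hide that variation.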

Results: The results demonstrate that the ICC varies substantially across studies and across patient-physician encounters within studies. Using the new framework we developed, the study-specific ICCs were estimated to be 0.821, 0.295, and 0.644. If the within- and between-encounter variances were assumed to be the same across studies, the estimated within-study ICC was 0.609. If heteroscedasticity was not properly adjusted for, the within-study ICC estimate was inflated to as high as 0.640. Finally, if the data were pooled across studies without accounting for the variability between studies, ICC estimates were further inflated by approximately 0.02, while formally allowing for between-study variation in the ICC inflated its estimated value by approximately 0.066 to 0.072, depending on the model.
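The inflation from naive pooling can be reproduced in miniature with the classical one-way ANOVA estimator of ICC(1), used here as a simple frequentist stand-in rather than the Bayesian model proposed in this paper. The data are invented toy values: two studies with identical within-study structure whose mean scores differ by 40 points. The helper name `icc_oneway` is ours.

```python
import numpy as np


def icc_oneway(ratings: np.ndarray) -> float:
    """One-way random-effects ICC(1) from an (n_encounters, k_raters) array,
    computed from the classical ANOVA mean squares (balanced data)."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    encounter_means = ratings.mean(axis=1)
    grand_mean = ratings.mean()
    # Between-encounter mean square.
    msb = k * np.sum((encounter_means - grand_mean) ** 2) / (n - 1)
    # Within-encounter (between-rater) mean square.
    msw = np.sum((ratings - encounter_means[:, None]) ** 2) / (n * (k - 1))
    return float((msb - msw) / (msb + (k - 1) * msw))


# Invented toy data: 4 encounters per study, each rated by 2 raters;
# study B is study A shifted up by 40 points.
study_a = np.array([[4, 16], [22, 10], [18, 30], [36, 24]])
study_b = study_a + 40

for name, data in (("A", study_a), ("B", study_b)):
    print(f"study {name} ICC: {icc_oneway(data):.3f}")

# Naive pooling counts the between-study shift as between-encounter variance.
pooled = np.vstack([study_a, study_b])
print(f"naively pooled ICC: {icc_oneway(pooled):.3f}")
```

On these toy data each study gives an ICC of about 0.365, whereas the naively pooled estimate is about 0.871, even though rater agreement within each study is unchanged.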

Conclusion: We demonstrated that misuse of the ICC statistic under common assumption violations leads to misleading and likely inflated estimates of inter-rater reliability. A statistical analysis that overcomes these violations by expanding the standard statistical model to account for them yields estimates that better reflect a measurement scale's reliability while maintaining ease of interpretation. Bayesian methods are particularly well suited to estimating the expanded statistical model.

Keywords: Bayesian analysis; Hierarchical regression; ICC; Observer OPTION5; Reliability; Shared decision making; Variance function modelling.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Bayes Theorem*
Data Collection / methods
Data Collection / statistics & numerical data
Data Interpretation, Statistical*
Decision Making
Humans
Models, Theoretical*
Outcome Assessment, Health Care / methods
Outcome Assessment, Health Care / statistics & numerical data*
Physician-Patient Relations