Inter-rater agreement and reliability of outcome measurement instruments and staging systems used in hidradenitis suppurativa

Br J Dermatol. 2019 Sep;181(3):483-491. doi: 10.1111/bjd.17716. Epub 2019 Jun 6.

Abstract

Background: Monitoring disease activity over time is a prerequisite for clinical practice and research. Valid and reliable outcome measurement instruments (OMIs) and staging systems provide researchers and clinicians with benchmark tools to assess the primary and secondary outcomes of interventional trials and to guide treatment selection properly.

Objectives: To investigate inter-rater reliability and agreement in instruments currently used in hidradenitis suppurativa (HS), with dermatologists experienced in HS as the rater population of interest.

Methods: In a prospective completely balanced design, 24 patients with HS underwent a physical examination by 12 raters (288 assessments) using nine instruments. The results were analysed using generalized linear mixed models.
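
The fully crossed design described above (every rater examines every patient, giving 24 × 12 = 288 assessments) can be illustrated with a minimal sketch. It is not the authors' analysis: the study used generalized linear mixed models, whereas this sketch computes a classical two-way random-effects, absolute-agreement, single-rater intraclass correlation (often written ICC(2,1)) from ANOVA mean squares. The scores are synthetic, and the names `simulate_scores` and `icc2_1` and all variance settings are hypothetical choices made for illustration only.

```python
import random


def simulate_scores(n_patients=24, n_raters=12, seed=0):
    """Synthetic scores for a fully crossed design (illustrative assumption only).

    Each score = patient effect + rater bias + random noise.
    """
    rng = random.Random(seed)
    patient = [rng.gauss(10.0, 2.0) for _ in range(n_patients)]  # true severity
    rater = [rng.gauss(0.0, 1.0) for _ in range(n_raters)]       # systematic rater bias
    return [
        [patient[i] + rater[j] + rng.gauss(0.0, 1.0) for j in range(n_raters)]
        for i in range(n_patients)
    ]


def icc2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows (one per patient), each holding one score per rater.
    """
    n = len(scores)       # number of patients
    k = len(scores[0])    # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Sums of squares for a two-way layout without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-patient mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


scores = simulate_scores()
n_assessments = sum(len(row) for row in scores)  # 24 patients x 12 raters
icc = icc2_1(scores)
print(f"assessments: {n_assessments}, ICC(2,1): {icc:.2f}")
```

In this sketch, systematic differences between raters count against reliability because the between-rater mean square appears in the denominator. An ordinal instrument such as Hurley staging would call for a different model from this continuous-score example.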

Results: For the staging systems, the study found good inter-rater reliability for Hurley staging in the axillae and gluteal region, moderate inter-rater reliability for Hurley staging in the groin and for Physician's Global Assessment, and fair inter-rater reliability for refined Hurley staging and the International HS Severity Scoring System. For all the tested OMIs, the observed intervals for limits of agreement were very wide relative to the ranges of the scales.

Conclusions: The very wide intervals for limits of agreement imply that substantial changes in scores are needed in clinical research in order to rule out measurement error. The results illustrate how difficult it is, even for experienced HS experts, to agree on the type and number of lesions when evaluating disease severity. The apparent caveats call for global efforts, such as the HIdradenitis SuppuraTiva cORe outcomes set International Collaboration (HISTORIC), to reach consensus on how best to measure physical signs of HS reliably in randomized trials.

What's already known about this topic?

  • Without valid and reliable instruments to measure outcomes, researchers and clinicians lack the necessary benchmarks to assess primary and secondary end points of interventional trials properly.
  • Hidradenitis suppurativa (HS) is a chronic inflammatory skin disease.
  • Several outcome measurement instruments exist for HS, but their validation is generally incomplete or of relatively low methodological quality.

What does this study add?

  • Using a prospective completely balanced design, this study examined inter-rater reliability with HS-experienced dermatologists as the rater population of interest.
  • The study did not find very good reliability for any included instrument or lesion counts.
  • This study illustrates the difficulty in finding agreement on the type and number of HS lesions, even among experts.
  • The results question whether physical signs are best measured by a traditional physician lesion count instrument.

What are the clinical implications of this work?

  • For staging, Hurley staging and physician global visual analogue scale proved to be acceptable instruments in terms of inter-rater reliability.
  • For the instruments designed to measure changes in health status, our study illustrates how difficult it is, even for experts, to measure the physical signs of HS using simple rater counting.
  • Consequently, other methods of assessing physical signs, such as ultrasound evaluation, require consideration.

MeSH terms

  • Adult
  • Female
  • Hidradenitis Suppurativa / diagnosis*
  • Hidradenitis Suppurativa / therapy
  • Humans
  • Male
  • Middle Aged
  • Observer Variation
  • Patient Outcome Assessment*
  • Prospective Studies
  • Randomized Controlled Trials as Topic
  • Reproducibility of Results
  • Severity of Illness Index*