One-shot distributed algorithms for addressing heterogeneity in competing risks data across clinical sites

J Biomed Inform. 2024 Feb:150:104595. doi: 10.1016/j.jbi.2024.104595. Epub 2024 Jan 18.

Abstract

Objective: To characterize the interplay between multiple medical conditions across sites while accounting for heterogeneity in patient population characteristics within a distributed research network, we develop a one-shot algorithm that efficiently uses summary-level data from multiple institutions. Applying the proposed algorithm to a large pediatric cohort from four national children's hospitals, we replicated a recently published prospective cohort study, the RISK study, and quantified the impact of risk factors associated with the penetrating or stricturing behaviors of pediatric Crohn's disease (PCD).

Methods: In this study, we introduce the ODACoRH algorithm, a one-shot distributed algorithm designed for the competing risks model with heterogeneity. Our approach accounts for variability across sites in the baseline hazard functions of multiple endpoints of interest. To accomplish this, we build a surrogate likelihood function by combining patient-level data from the local site with aggregated data from the external sites. We validated our method through extensive simulation studies and through replication of the RISK study, investigating the impact of risk factors on PCD among adolescents and children from four children's hospitals within PEDSnet, a national pediatric learning health system. To evaluate the ODACoRH algorithm, we compared its results with those from meta-analysis and with those derived from the pooled data.
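The surrogate-likelihood idea described in the Methods can be illustrated with a deliberately simplified sketch. It is not the ODACoRH implementation: a scalar least-squares loss stands in for the competing risks likelihood, within-site centering stands in for the site-specific baseline hazards, and a first-order gradient correction is assumed as the form of the surrogate. All function names and the initial value `beta_bar` are hypothetical.

```python
# Toy illustration only. Assumptions (not from the abstract): scalar
# least-squares loss, within-site centering, first-order gradient correction.

def site_summaries(xs, ys):
    """Return (n, a, b) for one site after centering x and y within the site.

    Centering removes the site-specific intercept, a stand-in for a
    site-specific baseline. Only these aggregates would leave a site.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    a = sum((x - x_mean) ** 2 for x in xs) / n
    b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    return n, a, b


def gradient(a, b, beta):
    """Gradient at beta of the site's average centered squared-error loss."""
    return a * beta - b


def surrogate_estimate(local, external, beta_bar):
    """Minimize L_local(beta) + (g_all(beta_bar) - g_local(beta_bar)) * beta.

    `local` and each entry of `external` are (n, a, b) tuples; the external
    sites contribute through their aggregates alone.
    """
    _, a_local, _ = local
    sites = [local] + list(external)
    total = sum(n for n, _, _ in sites)
    g_all = sum(n * gradient(a, b, beta_bar) for n, a, b in sites) / total
    # Zero of the surrogate's derivative: a_local * (beta - beta_bar) + g_all = 0
    return beta_bar - g_all / a_local


# Two sites share the slope 2 but have different intercepts (1 and 10).
local = site_summaries([0, 1, 2], [1, 3, 5])
external = [site_summaries([0, 1, 2], [10, 12, 14])]
estimate = surrogate_estimate(local, external, beta_bar=1.5)
```

In this toy setting the estimate recovers the shared slope despite the differing intercepts, using a single round of aggregate exchange, which is the sense in which such algorithms are called one-shot.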

Results: The ODACoRH algorithm had the smallest relative bias with respect to the gold standard method (-0.2%), outperforming the meta-analysis method (-11.4%). In the PCD association study, the estimated subdistribution hazard ratios obtained through the ODACoRH algorithms are on par with the results derived from pooled data, which demonstrates the high reliability of our federated learning algorithms. From a clinical standpoint, the identified risk factors for PCD align well with the RISK study published in The Lancet in 2017 and with other published studies, supporting the validity of our findings.
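The abstract does not define relative bias. A common convention, assumed here, is the signed difference between an estimate and the gold standard estimate, expressed as a percentage of the gold standard. The helper name and the numeric inputs below are hypothetical and are not values from the study.

```python
def relative_bias_percent(estimate, gold_standard):
    """Signed relative bias of `estimate` against `gold_standard`, in percent.

    Assumed convention: 100 * (estimate - gold_standard) / gold_standard.
    """
    if gold_standard == 0:
        raise ValueError("relative bias is undefined for a zero gold standard")
    return 100.0 * (estimate - gold_standard) / gold_standard


# Made-up coefficients, chosen only to show the scale of the reported figures.
gold = 0.500
one_shot_bias = relative_bias_percent(0.499, gold)   # about -0.2
meta_bias = relative_bias_percent(0.443, gold)       # about -11.4
```

Under this convention a negative value means the method underestimates the gold standard coefficient.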

Conclusion: With the ODACoRH algorithm, we demonstrate that data from multiple sites can be integrated effectively in a decentralized data setting while accounting for between-site heterogeneity. Importantly, our study reveals several crucial clinical risk factors for PCD that merit further investigation.

Keywords: Communication-efficient; Competing risks model; Distributed research network; Federated learning; One-shot distributed algorithm.

Publication types

  • Meta-Analysis
  • Research Support, N.I.H., Extramural

MeSH terms

  • Adolescent
  • Algorithms*
  • Child
  • Computer Simulation
  • Humans
  • Likelihood Functions
  • Proportional Hazards Models
  • Reproducibility of Results