Predicting Splicing from Primary Sequence with Deep Learning

Cell. 2019 Jan 24;176(3):535-548.e24. doi: 10.1016/j.cell.2018.12.015. Epub 2019 Jan 17.

Abstract

The splicing of pre-mRNAs into mature transcripts is remarkable for its precision, but the mechanisms by which the cellular machinery achieves such specificity are incompletely understood. Here, we describe a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing. Synonymous and intronic mutations with predicted splice-altering consequence validate at a high rate on RNA-seq and are strongly deleterious in the human population. De novo mutations with predicted splice-altering consequence are significantly enriched in patients with autism and intellectual disability compared to healthy controls and validate against RNA-seq in 21 out of 28 of these patients. We estimate that 9%-11% of pathogenic mutations in patients with rare genetic disorders are caused by this previously underappreciated class of disease variation.
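The abstract does not describe the network's architecture or how variants are scored, so the following is only a minimal sketch of the general idea of predicting a variant's splicing effect from sequence: score the reference and mutated sequences with a model and compare the per-position outputs. Everything named here is an assumption for illustration. `toy_model` is a hypothetical stand-in that merely flags GT dinucleotides (the canonical donor motif, written in the DNA alphabet), not the trained network, and `max_score_change` is not the authors' scoring method.

```python
from typing import Callable, List

BASES = "ACGT"  # DNA alphabet; pre-mRNA would carry U in place of T


def one_hot(seq: str) -> List[List[int]]:
    """Encode each base as a 4-element row; unknown bases (e.g. N) become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based index pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


def toy_model(encoded: List[List[int]]) -> List[float]:
    """Hypothetical placeholder for a trained network.

    Scores a position 1.0 if it starts a GT dinucleotide, else 0.0.
    A real model would output learned splice-site probabilities.
    """
    g, t = BASES.index("G"), BASES.index("T")
    scores = []
    for i in range(len(encoded)):
        starts_gt = (
            i + 1 < len(encoded)
            and encoded[i][g] == 1
            and encoded[i + 1][t] == 1
        )
        scores.append(1.0 if starts_gt else 0.0)
    return scores


def max_score_change(
    seq: str,
    pos: int,
    alt: str,
    model: Callable[[List[List[int]]], List[float]] = toy_model,
) -> float:
    """Largest absolute per-position change in model output caused by the variant."""
    ref_scores = model(one_hot(seq))
    alt_scores = model(one_hot(apply_snv(seq, pos, alt)))
    return max(abs(a - r) for r, a in zip(ref_scores, alt_scores))


if __name__ == "__main__":
    # A>T at index 3 turns AAGAAA into AAGTAA, creating a GT the toy model flags.
    print(max_score_change("AAGAAA", 3, "T"))
    # A>C at index 0 creates no GT, so the toy score is unchanged.
    print(max_score_change("AAGAAA", 0, "C"))
```

Passing a different callable as `model` keeps the comparison logic unchanged, which is the point of the sketch: the variant effect is read off as a difference between two predictions on otherwise identical sequence.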

Keywords: artificial intelligence; deep learning; genetics; splicing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alternative Splicing / genetics
  • Autistic Disorder / genetics
  • Deep Learning
  • Exons / genetics
  • Forecasting / methods*
  • Humans
  • Intellectual Disability / genetics
  • Introns / genetics
  • Neural Networks, Computer
  • RNA Precursors / genetics*
  • RNA Precursors / metabolism
  • RNA Splice Sites / genetics
  • RNA Splice Sites / physiology
  • RNA Splicing / genetics*

Substances

  • RNA Precursors
  • RNA Splice Sites