Mutation rate heterogeneity at the sub-gene scale due to local DNA hypomethylation

Nucleic Acids Res. 2024 May 8;52(8):4393-4408. doi: 10.1093/nar/gkae252.

Abstract

Local mutation rates in human are highly heterogeneous, with known variability at the scale of megabase-sized chromosomal domains, and, on the other extreme, at the scale of oligonucleotides. The intermediate, kilobase-scale heterogeneity in mutation risk is less well characterized. Here, by analyzing thousands of somatic genomes, we studied mutation risk gradients along gene bodies, representing a genomic scale spanning roughly 1-10 kb, hypothesizing that different mutational mechanisms are differently distributed across gene segments. The main heterogeneity concerns several kilobases at the transcription start site and further downstream into 5' ends of gene bodies; these are commonly hypomutated with several mutational signatures, most prominently the ubiquitous C > T changes at CpG dinucleotides. The width and shape of this mutational coldspot at 5' gene ends is variable across genes, and corresponds to variable interval of lowered DNA methylation depending on gene activity level and regulation. Such hypomutated loci, at 5' gene ends or elsewhere, correspond to DNA hypomethylation that can associate with various landmarks, including intragenic enhancers, Polycomb-marked regions, or chromatin loop anchor points. Tissue-specific DNA hypomethylation begets tissue-specific local hypomutation. Of note, direction of mutation risk is inverted for AID/APOBEC3 cytosine deaminase activity, whose signatures are enriched in hypomethylated regions.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • CpG Islands*
  • DNA Methylation*
  • Genetic Heterogeneity
  • Genome, Human
  • Humans
  • Mutation
  • Mutation Rate*
  • Transcription Initiation Site