A unifying network modeling approach for codon optimization

Bioinformatics. 2022 Aug 10;38(16):3935-3941. doi: 10.1093/bioinformatics/btac428.

Abstract

Motivation: Synthesizing genes to be expressed in other organisms is an essential tool in biotechnology. Although the many-to-one mapping from codons to amino acids makes the genetic code degenerate, codon usage in a particular organism is not random. This bias in codon use can substantially affect the level of gene expression. Several measures have been developed to quantify how well a given codon sequence expresses a gene in a host organism. Codon optimization aims to find a codon sequence that optimizes one or more of these measures. Efficient computational approaches are needed because the number of possible codon sequences grows exponentially with the number of amino acids.
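To make the exponential growth concrete, the sketch below counts the synonymous codon sequences that encode a given amino acid sequence. It assumes the standard genetic code (NCBI translation table 1); the function name count_codings and the example peptides are illustrative and do not come from the paper.

```python
from itertools import product
from math import prod

# Standard genetic code (assumed: NCBI translation table 1), codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with the first base varying slowest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code, '*' = stop) to its synonymous codons.
SYNONYMS = {}
for bases, aa in zip(product(BASES, repeat=3), AMINO):
    SYNONYMS.setdefault(aa, []).append("".join(bases))


def count_codings(protein):
    """Number of distinct codon sequences that translate to `protein`."""
    return prod(len(SYNONYMS[aa]) for aa in protein)


# Met and Trp have one codon each; Leu, Ser and Arg have six.
print(count_codings("MW"))    # 1 * 1
print(count_codings("MLSR"))  # 1 * 6 * 6 * 6
```

Because the count is a product of per-position choices, every additional residue with more than one codon multiplies the search space, which is why exhaustive enumeration is impractical for full-length proteins.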

Results: We develop a unifying modeling approach for codon optimization. With our mathematical formulations based on graph/network representations of amino acid sequences, any combination of measures can be optimized in the same framework by finding a path that satisfies additional constraints in an acyclic layered network. We tested our approach on bi-objective pairs commonly used in the literature, namely Codon Pair Bias versus Codon Adaptation Index and Relative Codon Pair Bias versus Relative Codon Bias. However, our framework is general enough to handle any number of objectives concurrently, along with restrictions or preferences on the use of specific nucleotide sequences. We implemented our models using Python's Gurobi interface and showed the efficacy of our approach even for the largest proteins available. Our experiments also show that highly expressed genes have objective values close to the optimized values in the bi-objective codon design problem.
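The layered-network view can be illustrated with a small sketch: each amino acid position forms a layer, each synonymous codon is a node, and arcs join codons in consecutive layers. The code below is not the authors' Gurobi-based formulation; it is a plain dynamic program that finds a best path for a single additive score (a per-codon term plus a per-codon-pair term) with no side constraints. The function best_path and all toy scores are hypothetical.

```python
def best_path(layers, node_score, pair_score):
    """Highest-scoring path through a layered acyclic network.

    layers     : list of lists; layers[i] holds the candidate codons at position i
    node_score : function codon -> float (e.g. a per-codon adaptation term)
    pair_score : function (codon, next_codon) -> float (e.g. a codon-pair term)
    Returns (total_score, list_of_codons).
    """
    # best[c] = (score, path) of the best partial path ending at codon c
    best = {c: (node_score(c), [c]) for c in layers[0]}
    for layer in layers[1:]:
        new_best = {}
        for c in layer:
            # Pick the predecessor that maximizes score-so-far plus arc score.
            prev = max(best, key=lambda p: best[p][0] + pair_score(p, c))
            score, path = best[prev]
            new_best[c] = (score + pair_score(prev, c) + node_score(c), path + [c])
        best = new_best
    return max(best.values(), key=lambda v: v[0])


# Toy instance for the peptide Met-Lys-Leu with made-up scores (assumptions).
layers = [["ATG"], ["AAA", "AAG"], ["TTA", "CTG"]]
codon_w = {"ATG": 0.0, "AAA": -0.5, "AAG": -0.1, "TTA": -1.0, "CTG": -0.2}
pair_w = {("AAA", "CTG"): 0.3, ("AAG", "CTG"): -0.6}

score, path = best_path(
    layers,
    node_score=lambda c: codon_w[c],
    pair_score=lambda a, b: pair_w.get((a, b), 0.0),
)
print(path)  # ['ATG', 'AAA', 'CTG']
```

In this toy instance, choosing the best codon independently at each position would select AAG for Lys, but the pair term makes AAA followed by CTG the better path overall. Handling several objectives at once or forbidding specific nucleotide motifs requires the constrained formulations described in the abstract rather than this simple recursion.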

Availability and implementation: http://alpersen.bilkent.edu.tr/NetworkCodon.zip.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Amino Acid Sequence
  • Amino Acids*
  • Codon
  • Genetic Code*

Substances

  • Codon
  • Amino Acids