
Synthetic DNA Sheds Light On Mysterious Difference Between Living Cells At Different Points In Evolution

By Keith Cowing
Press Release
NYU Langone Health / NYU Grossman School of Medicine
March 7, 2024
a, Schematic illustrating the strategy of reversing the HPRT1 sequence to produce the HPRT1R sequence. b, The human HPRT1 locus was cloned into an assemblon vector and flanked by lox recombination sites for Big-IN integration. HPRT1R was assembled de novo from 28 synthetic segments, shown below the locus. Vector components include centromere (CEN)–autonomously replicating sequence (ARS) and LEU2 for propagation and selection in S. cerevisiae, bacterial artificial chromosome (BAC) oriS and oriV (low copy and inducible high copy origins, respectively) and the kanamycin resistance gene (Kanr) for propagation and selection in Escherichia coli, and eGFP–T2A–BSD for transient selection in mammalian cells. Chr., chromosome. c, Genomic contexts for interrogating synthetic locus activity. Episomal (Epi) and genomically integrated (chromosome XI) in S. cerevisiae, and genomically integrated (chromosome X and chromosome 3) in M. musculus. The chromosome 3 integration is monoallelic on the BL6 locus, leaving Sox2 intact on the CAST locus. d,e, DNA sequencing coverage plots from next-generation sequencing verification of assembled and integrated synthetic loci. Yeast samples were whole-genome sequenced and mouse ES cell samples were characterized by Capture-seq. Sc Epi, episomal in S. cerevisiae; Sc chr. XI, integrated on S. cerevisiae chromosome XI; Mm chr. X, integrated on M. musculus chromosome X; Mm chr. 3, integrated on M. musculus chromosome 3. (1) and (2) indicate two independent mouse ES cell clones. GC content shown as a line plot and colour-scaled. For HPRT1 Mm chr. X (1), dotted lines on the right show 2x coverage depth for most of the synthetic locus and 1x coverage depth at the edges. The relative position of the reversed HPRT1 coding sequence is indicated above the BAC in b and below the coverage plots in e.

“Random DNA” is naturally active in yeast, a single-celled fungus, but is switched off by default in mammalian cells, even though yeast and mammals shared a common ancestor about a billion years ago and use the same basic molecular machinery, a new study finds.

The new finding revolves around the process by which DNA genetic instructions are converted first into a related molecule called RNA and then into proteins that make up the body’s structures and signals. In yeast, mice, and humans, the first step in a gene’s expression, transcription, proceeds as DNA molecular “letters” (nucleobases) are read in one direction. While 80% of the human genome, the complete set of DNA in our cells, is transcribed into RNA, less than 2% actually codes for genes that direct the building of proteins.
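To make the idea of reading in one direction concrete, here is a minimal Python sketch that is not taken from the study. It assumes the simplest textbook picture: RNA polymerase reads the template strand in one direction and produces an RNA copy of the opposite (coding) strand, with uracil (U) in place of thymine (T). Promoters, splicing, and everything else that regulates real transcription are deliberately left out, and the function name `transcribe` is hypothetical.

```python
# Map each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(template_5_to_3: str) -> str:
    """Return the RNA (written 5'->3') copied from a DNA template strand given 5'->3'.

    Simplified model: the polymerase reads the template in one direction (3'->5'),
    so the RNA matches the opposite, coding strand with U in place of T.
    """
    template = template_5_to_3.upper()
    # Complement every base, then reverse, to recover the coding strand 5'->3'.
    coding = template.translate(COMPLEMENT)[::-1]
    # RNA uses uracil (U) where DNA uses thymine (T).
    return coding.replace("T", "U")


print(transcribe("TACGCATTG"))  # CAAUGCGUA
# The same letters in reverse order give a different transcript.
print(transcribe("GTTACGCAT"))  # AUGCGUAAC
```

Because the output depends on letter order, reversing a sequence keeps its letters but changes the message that transcription produces.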

A longstanding mystery in genomics, then, is what all this transcription outside of protein-coding genes accomplishes. Is it just noise, a side effect of evolution, or does it serve a function?

A research team at NYU Langone Health sought to answer the question by creating a large synthetic gene whose DNA code runs in reverse order relative to its natural parent. The researchers then put the synthetic gene into yeast and mouse stem cells and measured transcription levels in each. Published online March 6 in the journal Nature, the new study reveals that in yeast the genetic system is set so that nearly all genes are continually transcribed, whereas in mammalian cells the “default state” is for transcription to be turned off.

Interestingly, say the study authors, reversing the order of the code meant that all of the mechanisms that evolved in yeast and mammalian cells to turn transcription on or off were absent, because the reversed code was nonsense. Like a mirror image, however, the reversed code preserved some basic patterns of the natural code: how often each DNA letter appears, which letters sit next to one another, and how often letters are repeated. Because the reversed code is roughly 100,000 molecular letters long, the team found that it included, purely by chance, many short stretches of previously unrecognized code that likely started transcription much more often in yeast, and stopped it in mammalian cells.
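The “mirror image” idea can be checked directly. The minimal sketch below uses a short made-up sequence rather than HPRT1, and it assumes the reversal simply reads the letters backwards without complementing them. It confirms that reversal keeps letter counts, GC content, and runs of repeated letters identical, while the order of neighbouring letters flips (each two-letter step XY becomes YX). All function names are illustrative.

```python
from collections import Counter
from itertools import groupby


def reverse_sequence(seq: str) -> str:
    """Read the letters backwards WITHOUT complementing (assumed reversal scheme)."""
    return seq[::-1]


def gc_content(seq: str) -> float:
    """Fraction of letters that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def run_lengths(seq: str) -> Counter:
    """Tally runs of a repeated letter by (letter, length), e.g. ('A', 5) for AAAAA."""
    return Counter((base, len(list(group))) for base, group in groupby(seq))


def dinucleotide_counts(seq: str) -> Counter:
    """Tally every overlapping two-letter step, e.g. 'CG'."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


# Toy sequence (not HPRT1) with a run of five A's and a run of four T's.
original = "ATGCG" + "A" * 5 + "C" + "T" * 4 + "GCAGT"
flipped = reverse_sequence(original)

print(Counter(original) == Counter(flipped))          # True: same letter counts
print(gc_content(original) == gc_content(flipped))    # True: same GC content
print(run_lengths(original) == run_lengths(flipped))  # True: same repeats
# Order is not preserved: each step XY becomes YX, so CG steps in the
# reversed copy correspond to GC steps in the original.
print(dinucleotide_counts(original)["CG"], dinucleotide_counts(flipped)["CG"])  # 1 2
```

This is why a reversed sequence can look statistically similar to its parent while carrying none of its evolved, order-dependent signals.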

“Understanding default transcription differences across species will help us to better understand what parts of the genetic code have functions, and which are accidents of evolution,” said corresponding author Jef Boeke, PhD, the Sol and Judith Bergstein Director of the Institute for Systems Genetics at NYU Langone Health. “This in turn promises to guide the engineering of yeast to make new medicines, or create new gene therapies, or even to help us find new genes buried in the vast code.”

The work lends weight to the theory that yeast’s highly active default transcriptional state ensures that foreign DNA, on the rare occasions it enters a yeast cell (for instance, when delivered by a virus as it copies itself), is likely to be transcribed into RNA. If that RNA is translated into a protein with a helpful function, evolution will preserve the code as a new gene. Yeast, as single-celled organisms, can afford risky new genes that drive faster evolution. Mammalian cells, by contrast, are parts of bodies made up of vast numbers of cooperating cells and are less free to incorporate new DNA every time a cell encounters a virus. Many regulatory mechanisms protect this delicately balanced code as it is.


The new study had to account for the sheer size of DNA molecules: the human genome contains 3 billion “letters,” and some genes are 2 million letters long. While well-known techniques enable changes to be made letter by letter, some engineering tasks are more efficient if researchers build DNA from scratch, making far-flung changes across large swaths of pre-assembled code that is then swapped into a cell in place of its natural counterpart. Because human genes are so complex, Boeke’s lab first developed its “genome writing” approach in yeast and has recently adapted it to the mammalian genetic code. The study authors use yeast cells to assemble long DNA sequences in a single step and then deliver them into mouse embryonic stem cells.
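The figure caption notes that HPRT1R was assembled de novo from 28 synthetic segments. Assembly in yeast generally joins pieces through homologous recombination between their overlapping ends, and the toy Python sketch below mimics only that bookkeeping. It splits a design into ordered segments whose neighbours share identical ends, then rebuilds the design by checking and merging each junction. The random stand-in design, the 60-letter overlap, and the helper names are hypothetical; this is not the lab’s actual protocol.

```python
import random


def split_with_overlaps(seq: str, n_segments: int, overlap: int) -> list[str]:
    """Cut seq into n_segments ordered pieces whose neighbours share `overlap` letters."""
    step = len(seq) // n_segments
    if not 0 < overlap <= step:
        raise ValueError("overlap must be positive and no longer than a segment step")
    segments = []
    for i in range(n_segments):
        start = i * step
        # Every piece except the last extends `overlap` letters into the next one.
        end = len(seq) if i == n_segments - 1 else start + step + overlap
        segments.append(seq[start:end])
    return segments


def join_by_overlaps(segments: list[str], overlap: int) -> str:
    """Rebuild the design, checking that each junction's shared ends agree."""
    if overlap <= 0:
        raise ValueError("overlap must be positive")
    assembled = segments[0]
    for seg in segments[1:]:
        if assembled[-overlap:] != seg[:overlap]:
            raise ValueError("neighbouring segments do not share the expected overlap")
        assembled += seg[overlap:]
    return assembled


random.seed(0)
design = "".join(random.choices("ACGT", k=101_000))  # random stand-in for a 101-kb design
pieces = split_with_overlaps(design, n_segments=28, overlap=60)  # 60 is hypothetical
print(len(pieces))                                     # 28
print(join_by_overlaps(pieces, overlap=60) == design)  # True
```

In the cell, yeast’s recombination machinery does the joining; the junction check here simply stands in for the requirement that neighbouring pieces carry matching ends.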

For the current study, the research team addressed the question of how pervasive transcription is across evolution by introducing a synthetic 101-kilobase stretch of engineered DNA: the human gene hypoxanthine phosphoribosyltransferase 1 (HPRT1) in reverse order. They observed widespread transcription of this reversed sequence in yeast, even though the nonsense code lacks promoters, the DNA snippets that evolved to signal the start of transcription.

Further, the team identified short sequences in the reversed code, repeated stretches of adenine (A) and thymine (T) bases, that are known to be recognized by transcription factors, proteins that bind to DNA to initiate transcription. Just 5 to 15 letters long, such sequences could easily arise by chance and may partly explain the very active yeast default state, the authors said.
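How easily could such short A/T stretches turn up by chance? Below is a back-of-the-envelope Python sketch, not the authors’ analysis. It assumes every position is an independent, equally likely A, C, G, or T; real DNA is neither uniform nor independent, so this is only a rough baseline. The sketch scans one strand for runs of A or of T. Because an A run on one strand pairs with a T run on the other, this counts each A/T tract once. It also computes the expected number of such runs of at least k letters in a random sequence of length N: a run begins at a given position when the preceding letter differs (probability 3/4) and the next k letters match (probability (1/4)^k), with the very first position needing only the second condition. Function names are hypothetical.

```python
import random
import re

P = 0.25  # assumed probability of each base under the uniform model


def find_at_tracts(seq: str, min_len: int = 5) -> list[tuple[int, str]]:
    """Return (start, run) for every maximal run of A, or of T, at least min_len long."""
    pattern = re.compile("A{%d,}|T{%d,}" % (min_len, min_len))
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


def expected_at_tracts(n: int, k: int, p: float = P) -> float:
    """Expected number of maximal A runs plus T runs of length >= k in n random bases."""
    if n < k:
        return 0.0
    # A run can begin at position 0 (probability p**k) or at any of the
    # n - k later positions whose preceding base differs ((1 - p) * p**k each).
    per_base = p ** k + (n - k) * (1 - p) * p ** k
    return 2 * per_base  # one term for A runs, one for T runs


toy = "GCAAAAAGTTTTTTCAAAAC"
print(find_at_tracts(toy))                        # [(2, 'AAAAA'), (8, 'TTTTTT')]
print(round(expected_at_tracts(101_000, 5), 1))   # 147.9
print(round(expected_at_tracts(101_000, 10), 2))  # 0.14

random.seed(1)
random_seq = "".join(random.choices("ACGT", k=101_000))
print(len(find_at_tracts(random_seq, 5)))  # varies with the seed, near the expected value
```

Under this uniform model, a 101,000-letter sequence is expected to hold about 148 A or T runs of five or more letters but only about 0.14 of ten or more. The shortest tracts are therefore routine, while longer ones are rare. An A/T-rich base composition would raise both numbers.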

By contrast, the same reversed code, inserted into the genome of mouse embryonic stem cells, did not cause widespread transcription. There, transcription was repressed even though the evolved CpG dinucleotides known to actively shut down (silence) genes were not functional in the reversed code. The team surmises that other basic elements in the mammalian genome may restrict transcription much more than in yeast, perhaps by directly recruiting a group of proteins (the polycomb complex) known to silence genes.

“The closer we get to introducing a ‘genome’s worth’ of nonsense DNA into living cells, the better they can compare it to the actual, evolved genome,” said first author Brendan Camellato, a graduate student in Boeke’s lab. “This could lead us to a new frontier of engineered cell therapies, as the capacity to put in ever longer synthetic DNAs enables better understanding of what insertions genomes will tolerate, and perhaps the inclusion of one or more larger, complete, engineered genes.”

Along with Boeke and Camellato, NYU Langone study authors were Ran Brosh, Hannah Ashe, and Matthew Maurano. The study was funded by the U.S. Department of Health & Human Services and by the National Human Genome Research Institute (NHGRI) through grant 1RM1HG009491.

Synthetic reversed sequences reveal default genomic states, Nature (open access)

