Generative Design Of Novel Bacteriophages With Genome Language Models
Editor’s note: As we expand outward from Earth to other worlds we are almost certainly going to encounter things we did not expect to find – things that are unlikely or impossible on Earth. Life on other worlds may arise from a totally different set of chemical pathways than was the case on Earth. Or it may follow a very similar path. Or both. How do we estimate what could exist such that we are better prepared to search for the unexpected?
The genetic code of all life on our home world (with a few minor exceptions) is written in an alphabet of four standard nucleotides. Earth life may once have used a different assortment of letters in its instructional alphabet, but the evidence suggests that the current code has been in use for quite some time. Life on other worlds, however, might use a different genetic alphabet altogether.
As we tinker with earthly genomics for commercial and health reasons, we are discovering novel ways to tweak the standard genomic model and alter the outcome of a genetic sequence. Work with Artificially Expanded Genetic Information Systems (AEGIS) has shown that the pairing of non-standard nucleotides is at least possible. The study below takes a different route, using AI models to generate entire phage genomes. Only a small fraction of the generated genomes produced working phages, but this kind of work gives us insight into how genetic sequences work and, by extension, how they might work elsewhere.
Out of about 300 phage genomes the scientists synthesized and tested in dishes full of E. coli, 16 were functional.
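To put that yield in perspective, 16 viable phages out of roughly 300 tested is a hit rate of about 5%. The sketch below computes that rate along with a Wilson score interval, a standard way to attach uncertainty to a binomial proportion. It assumes "about 300" means exactly 300 genomes, and the function name is hypothetical; the interval is illustrative, not a figure reported by the authors.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    return centre - half_width, centre + half_width

# Assumption: "about 300" genomes is treated as exactly 300.
viable, tested = 16, 300
rate = viable / tested
low, high = wilson_interval(viable, tested)
print(f"hit rate {rate:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

The Wilson interval is used here instead of the simpler normal approximation because it behaves better when the success rate is small, as it is here.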
The experiment itself wasn’t dangerous, and designing “life” is a far heavier lift than creating the simple phage (a bacteria-infecting virus) that the team produced. The scientists used Evo, a generative AI model trained on the genomes of living things. Just as large language models are trained on a massive corpus of text, the most advanced version of Evo ingested about 9 trillion letters of DNA from an atlas spanning all domains of life.
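The core idea behind generating DNA with a language model is autoregression: each new nucleotide is sampled conditioned on the letters that came before it. Evo itself is a large neural network; the sketch below deliberately swaps in a far simpler technique, an order-k Markov chain over nucleotides, only to illustrate that shared sampling loop. The function names and the toy training string are hypothetical.

```python
import random
from collections import Counter, defaultdict

def train_kmer_model(sequence: str, order: int) -> dict[str, Counter]:
    """Count which nucleotide follows each length-`order` context in the training sequence."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for i in range(len(sequence) - order):
        context = sequence[i:i + order]
        counts[context][sequence[i + order]] += 1
    return dict(counts)

def generate(model: dict[str, Counter], prompt: str, order: int, length: int, seed: int = 0) -> str:
    """Autoregressively extend `prompt`, sampling one nucleotide at a time."""
    rng = random.Random(seed)
    out = list(prompt)
    while len(out) < length:
        context = "".join(out[-order:])
        next_counts = model.get(context)
        if not next_counts:
            # Context never seen in training: fall back to a uniform choice.
            out.append(rng.choice("ACGT"))
            continue
        bases, weights = zip(*next_counts.items())
        out.append(rng.choices(bases, weights=weights, k=1)[0])
    return "".join(out)

# Hypothetical toy training data; a real genome model is trained on vastly more DNA.
training = "ATGGCTAGCTAGGCTAACGATCGATCGGCTAGCTAGCATCG" * 5
model = train_kmer_model(training, order=3)
sample = generate(model, prompt="ATG", order=3, length=60)
print(sample)
```

A Markov chain only sees the last few letters, whereas a genome language model can condition on very long contexts, which is what lets it capture whole-genome structure rather than just local sequence statistics.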
Many important biological functions arise not from single genes, but from complex interactions encoded by entire genomes.
Genome language models have emerged as a promising strategy for designing biological systems, but their ability to generate functional sequences at the scale of whole genomes has remained untested.
Here, we report the first generative design of viable bacteriophage genomes. We leveraged frontier genome language models, Evo 1 and Evo 2, to generate whole-genome sequences with realistic genetic architectures and desirable host tropism, using the lytic phage ΦX174 as our design template.
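One simple sanity check on whether a generated sequence has a "realistic genetic architecture" is whether it contains open reading frames (ORFs) of plausible length. The sketch below is a hypothetical filter, not the authors' pipeline: it scans the three forward reading frames for ATG-initiated ORFs ending at a stop codon. The function name and the 30-codon threshold are assumptions. Real annotation would also need to handle the reverse strand, alternative start codons, circular genomes, and the overlapping genes for which ΦX174 is well known.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG-initiated ORFs on the forward strand.

    `end` is exclusive and the ORF length (in codons) includes the stop codon.
    ATGs inside an already-open ORF are not reported as separate ORFs.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs

# Hypothetical test sequence: a 42-codon ORF in reading frame 2.
toy = "CC" + "ATG" + "GCT" * 40 + "TAA" + "GG"
print(find_orfs(toy))  # [(2, 2, 128)]
```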
Experimental testing of AI-generated genomes yielded 16 viable phages with substantial evolutionary novelty. Cryo-electron microscopy revealed that one of the generated phages utilizes an evolutionarily distant DNA packaging protein within its capsid.
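"Evolutionary novelty" is typically judged by how similar a generated genome is to its closest natural relative. The sketch below computes percent identity from a Needleman-Wunsch global alignment, defined here as identical columns divided by alignment length (gaps included). The scoring values and function name are assumptions, the authors' exact identity metric may differ, and this quadratic-time approach is only practical for short sequences.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Needleman-Wunsch global alignment; returns identical columns / alignment length."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, counting alignment columns and identities.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                if a[i - 1] == b[j - 1]:
                    matches += 1
                i, j = i - 1, j - 1
                columns += 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return matches / columns if columns else 1.0

natural = "ATGGCTAGCTAGGCTAACGATCG"    # hypothetical "natural" fragment
generated = "ATGGCAAGCTAGGTAACGATTCG"  # hypothetical "generated" variant
print(f"{global_identity(natural, generated):.2f}")
```

An identity of 1.0 means the sequences are identical; lower values indicate greater divergence from the natural reference.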
Multiple phages demonstrate higher fitness than ΦX174 in growth competitions and in their lysis kinetics. A cocktail of the generated phages rapidly overcomes ΦX174 resistance in three E. coli strains, demonstrating the potential utility of our approach for designing phage therapies against rapidly evolving bacterial pathogens.
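One common way to summarize a head-to-head growth competition is a selection coefficient: the change in the log ratio of the two competitors, divided by elapsed time. The sketch below implements that generic formula. The function name, the counts, and the time unit (here, hypothetical passages) are assumptions, and the authors' actual fitness measure may differ.

```python
import math

def selection_coefficient(a0: float, b0: float, at: float, bt: float, t: float) -> float:
    """Per-unit-time selection coefficient of competitor A relative to B.

    s = [ln(at / bt) - ln(a0 / b0)] / t; s > 0 means A gained ground on B.
    """
    if min(a0, b0, at, bt) <= 0 or t <= 0:
        raise ValueError("counts and time must be positive")
    return (math.log(at / bt) - math.log(a0 / b0)) / t

# Hypothetical competition: equal starting titers, A ends 8x more abundant after 3 passages.
s = selection_coefficient(a0=1e6, b0=1e6, at=8e8, bt=1e8, t=3)
print(f"s = {s:.3f} per passage")
```

In this made-up example the ratio grows eightfold over three passages, so A doubles relative to B each passage and s equals ln 2.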
This work provides a blueprint for the design of diverse synthetic bacteriophages and, more broadly, lays a foundation for the generative design of useful living systems at the genome scale.
Generative design of novel bacteriophages with genome language models, biorxiv.org (open access)