Artificial Intelligence Aided Design Of Peptides With Custom Secondary Structure Motifs And Reduced Amino Acid Alphabets
Proteins are highly diverse functional polymers in which the specific sequence of amino acids, selected from a standard genetically encoded alphabet of twenty (C20), determines the structure and ultimately the function of the resulting folded protein.
This standard alphabet has been shown to be non-randomly distributed across physicochemical properties crucial to both structure formation and function, an observation often referred to as coverage theory.
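Coverage of a property is typically described by how widely (range) and how evenly an alphabet's members span that property. The sketch below is illustrative only and is not taken from this work. It uses the standard Kyte-Doolittle hydropathy scale, and its "unevenness" score (the standard deviation of the gaps between adjacent sorted values) is a simplified proxy chosen here, not the exact statistic from the coverage-theory literature. The function name `coverage` is hypothetical.

```python
import statistics

# Kyte-Doolittle hydropathy index for the 20 genetically encoded amino acids (C20).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def coverage(alphabet, scale=KYTE_DOOLITTLE):
    """Return (range, unevenness) of one property across an alphabet.

    range:      max - min of the property values.
    unevenness: population std dev of the gaps between adjacent sorted
                values (0.0 means perfectly even spacing). This is a
                simplified proxy, assumed for illustration.
    """
    values = sorted(scale[aa] for aa in alphabet)
    if len(values) < 2:
        raise ValueError("need at least two amino acids")
    gaps = [b - a for a, b in zip(values, values[1:])]
    return values[-1] - values[0], statistics.pstdev(gaps)
```

For example, `coverage("ACDEFGHIKLMNPQRSTVWY")` reports a hydropathy range of 9.0 for C20 (arginine at -4.5 to isoleucine at 4.5). Comparing such scores against many random alphabets of the same size is one way to ask whether an alphabet's coverage is non-random.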
While machine learning models have drastically improved protein structure prediction, protein design has yet to see comparable progress. Here we therefore bridge contemporary biological theory with recent advances in artificial intelligence (AI) to develop and evaluate a generative AI protein design model for custom secondary structure motifs using reduced amino acid alphabets, trained on hundreds of thousands of proteins within the RCSB PDB.
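One simple way to make a model that predicts over all of C20 emit sequences from a reduced alphabet is to mask the logits of disallowed residues before the softmax. This is a swapped-in technique shown for illustration; the abstract does not say whether the authors mask outputs, retrain on reduced alphabets, or do something else. The column ordering, the function names `reduced_alphabet_probs` and `greedy_decode`, and the four-letter alphabet used below are all assumptions.

```python
import numpy as np

# Assumed output column order: the 20 one-letter codes, alphabetical.
C20 = "ACDEFGHIKLMNPQRSTVWY"


def reduced_alphabet_probs(logits, allowed):
    """Renormalize per-position logits over an allowed subset of C20.

    logits:  array of shape (length, 20), columns ordered as in C20.
    allowed: string of one-letter codes permitted in the design.
    Disallowed residues receive probability exactly zero.
    """
    logits = np.asarray(logits, dtype=float)
    mask = np.array([aa in allowed for aa in C20])
    if not mask.any():
        raise ValueError("allowed alphabet is empty")
    masked = np.where(mask, logits, -np.inf)
    # Subtract the row max for numerical stability; exp(-inf) becomes 0.
    masked = masked - masked.max(axis=-1, keepdims=True)
    exp = np.exp(masked)
    return exp / exp.sum(axis=-1, keepdims=True)


def greedy_decode(logits, allowed):
    """Pick the most probable allowed residue at every position."""
    probs = reduced_alphabet_probs(logits, allowed)
    return "".join(C20[i] for i in probs.argmax(axis=-1))
```

With random logits, `greedy_decode(logits, "GAVD")` only ever returns G, A, V, or D (an arbitrary subset chosen for illustration). Masking leaves the rest of the model untouched, at the cost of never learning residue statistics specific to the reduced alphabet.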
Results indicate overall success in designing novel proteins with the desired secondary structure motifs across a broad range of amino acid alphabets. Interestingly, this tool often captures the full three-dimensional tertiary structure of a target protein despite being trained only on physicochemical sequence space and DSSP secondary structure.
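Success at matching a secondary structure motif can be scored by comparing a target DSSP string with the DSSP string of a design's structure. The sketch assumes both strings are already available and of equal length. It collapses the eight DSSP states to three with one common convention (H, G, I to helix; E, B to strand; everything else to coil) and reports the fraction of matching positions, a Q3-style score. This metric and the function names are illustrative assumptions, not the evaluation reported in the paper.

```python
# One common 8-to-3 reduction of DSSP codes (conventions vary between tools).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(dssp):
    """Map a DSSP string to H (helix), E (strand), or C (other)."""
    return "".join(DSSP_TO_3.get(c, "C") for c in dssp)


def motif_agreement(target, predicted):
    """Fraction of positions whose 3-state labels match (Q3-style)."""
    if len(target) != len(predicted):
        raise ValueError("strings must have equal length")
    if not target:
        raise ValueError("strings must be non-empty")
    t, p = to_three_state(target), to_three_state(predicted)
    return sum(a == b for a, b in zip(t, p)) / len(t)
```

`motif_agreement("HHHHEEEE", "CCCCEEEE")` returns 0.5, while a 3-10 helix (`G`) predicted where an alpha helix (`H`) was requested still counts as a match under this reduction.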
The development of this model advances research across multiple disciplines, from general scientific AI/ML architecture development to protein design for biotechnology, astrobiology, and early-Earth evolutionary biology.

Major components of the bLSTMa encoder-decoder model architecture, including detailed architectures of the Encoder block (bottom right), made up primarily of LSTM encoder layers and multi-head self-attention, and of the model head (top right), where the Decoder output is fed separately through a classifier and a continuous-value output to predict sequences and their associated properties (biorxiv.org).
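Two building blocks named in the caption, multi-head self-attention and a head that splits one hidden representation into a residue classifier and continuous-value outputs, can be sketched in NumPy. This is a forward pass only, with random weights; it omits the LSTM layers, training, and every detail of the actual bLSTMa model. All function names, shapes, and the number of predicted properties are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over a (length, d_model) array."""
    length, d_model = x.shape
    if d_model % num_heads:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads

    def split(t):  # (length, d_model) -> (heads, length, d_head)
        return t.reshape(length, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, L, L)
    context = softmax(scores) @ v  # attend over keys: (heads, L, d_head)
    merged = context.transpose(1, 0, 2).reshape(length, d_model)
    return merged @ w_o


def dual_head(h, w_cls, b_cls, w_reg, b_reg):
    """Per-residue class probabilities plus continuous property values."""
    residue_probs = softmax(h @ w_cls + b_cls)  # (length, alphabet_size)
    properties = h @ w_reg + b_reg  # (length, num_properties), no squashing
    return residue_probs, properties
```

Applying `dual_head` to any `(length, d_model)` array, standing in here for the Decoder output, yields per-residue probabilities over a 20-letter alphabet alongside unconstrained regression values. The regression branch has no softmax because physicochemical properties are real-valued rather than categorical.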
- Artificial intelligence aided design of peptides with custom secondary structure motifs and reduced amino acid alphabets, PubMed
- Artificial intelligence aided design of peptides with custom secondary structure motifs and reduced amino acid alphabets, biorxiv.org
Astrobiology, Genomics