Microbiology & Virology

Structure-based Inference Of Eukaryotic Complexity In Asgard Archaea

By Keith Cowing
Press Release
July 9, 2024
Modeling the Asgard archaeal structural pangenome. (A) Number of Asgard archaeal draft genomes per group in the database used for pangenome-wide structural analyses. Fill color indicates publicly available genomes (grey) and newly added Asgard archaeal draft genomes (blue), respectively. (B) Protein sequence clustering into existing Asgard COGs and de novo clustering with unassigned proteins. X-axis indicates the number of proteins and y-axis the number of respective clusters. Fill indicates protein sequences from publicly available genomes (grey) and added Asgard archaeal draft genomes (blue), respectively. (C) Workflow for the pangenome-wide prediction of Asgard archaeal protein structures. (D) Scatter plot depicting pLDDT scores of structure predictions of 100 randomly selected ‘Candidatus Prometheoarchaeum syntrophicum’ proteins computed with the default (x-axis) and the Asgard-enriched (y-axis) ColabFold database, respectively. The diagonal black line indicates x = y; the purple line indicates the linear correlation fitted to the data. (E) Distribution of average pLDDT scores of 37,223 predicted Asgard archaeal protein structures. MSA, multiple sequence alignment. Source: biorxiv.org

Asgard archaea played a key role in the origin of the eukaryotic cell. While previous studies found that Asgard genomes encode diverse eukaryotic signature proteins (ESPs), representing homologs of proteins that play important roles in the complex organization of eukaryotic cells, the cellular characteristics and complexity of the Asgard archaeal ancestor of eukaryotes remain unclear.

Here, we used de novo protein structure modeling and sensitive sequence similarity detection algorithms within an expanded Asgard archaeal genomic dataset to build a structural catalogue of the Asgard archaeal pangenome and identify 908 new isomorphic ESPs (iESPs), representing clusters of protein structures most similar to eukaryotic proteins and that likely underwent extensive sequence divergence.

While most previously identified ESPs were involved in cellular processes and signaling, iESPs are enriched in information storage and processing functions, with several being potentially implicated in facilitating cellular complexity.

By expanding the complement of eukaryotic proteins in Asgard archaea, this study indicates that the archaeal ancestor of eukaryotes was more complex than previously assumed.

Structure-guided identification of functionally diverse iESP structural clusters. (A) Workflow to cluster protein structures and identify iESPs. (B) Identification of Asgard archaeal iESPs based on structural similarity. (C) Bar chart summarizing the clustering of previously described ESP and iESP protein structures into structural clusters, respectively. (D) Sankey diagram displaying functional categories of newly identified iESP clusters and clusters containing previously established ESPs. Categories are inferred from the EggNOG annotation of the best SwissProt hits. ‘Multiple’ indicates an association of a structural cluster with multiple functional categories. (E) Subgraph of protein structure similarity network, highlighting small GTPase (black outline) and Argonaute proteins. P, probability. Source: biorxiv.org

Asgard archaeal protein complexes implicating cellular compartmentalization. Asgard archaeal proteins related to eukaryotic (A-C) MVPs and (D-F) COMMD-containing proteins. (A) Phylogeny of prokaryotic and eukaryotic full-length MVPs. See Fig. S4A for tree based only on the shoulder domain. (B) Rat MVP complex (45) next to Lokiarchaeial MVP (predicted structure) indicating the cap helix, shoulder, and repeat domains (R). (C) Biological assembly of the rat MVP cap (left) next to a multimer model of the Asgard archaeal homodecamer (right). (D) Human COMMD2 next to Lokiarchaeial homolog indicating the HN and COMM domains. (E) Phylogeny of prokaryotic and eukaryotic COMMD-containing proteins. (F) Resolved human COMMD heterodecamer (46) next to a multimer model of the Asgard archaeal homodecamer. (G, H) Identification of Asgard archaeal iESPs of eukaryotic ubiquitin fold modifier 1 (G) and cyclin-dependent kinase 2-interacting protein (Hodarchaeales clade indicated with grey background) (H). Asgard archaeal query protein structure, best-scoring SwissProt target structural model and phylogenetic analysis of related protein sequences are indicated in the left, middle and right panel, respectively. Structural models exclude long terminal disordered regions. Additional data include Foldseek E-value, Dali Z-score, enrichment of eukaryotic structures (Fisher’s exact test, Bonferroni-corrected p-value, ‘p-EukEnr’), and amino-acid identity to best structure hit (‘AA-identity’). Phylogenetic analyses highlight sequences for query and target structures, input MSA positions, and substitution model. Scale bar: 1 amino acid substitution per position. Multimer model confidence measures (pLDDT, pTM, ipTM) are indicated. Source: biorxiv.org
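
For readers unfamiliar with the ‘p-EukEnr’ statistic named in the caption, the following is a minimal Python sketch of a one-sided Fisher’s exact test followed by a Bonferroni correction. It is an illustration only, not the authors’ pipeline: the function names, the layout of the 2x2 table, and the example counts are all assumptions made here, and the preprint does not state whether a one-sided or two-sided test was used.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed (hypothetical) layout:
        a = eukaryotic structures among a cluster's hits
        b = non-eukaryotic structures among a cluster's hits
        c = eukaryotic structures in the rest of the database
        d = non-eukaryotic structures in the rest of the database
    Returns P(X >= a) under the hypergeometric null (no enrichment).
    """
    row1 = a + b          # total hits for the cluster
    col1 = a + c          # total eukaryotic structures
    n = a + b + c + d     # all structures
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, row1)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical counts for three structural clusters (not data from the study).
    tables = [(18, 2, 400, 9580), (5, 15, 413, 9567), (1, 19, 417, 9563)]
    raw = [fisher_exact_greater(*t) for t in tables]
    for table, p, p_adj in zip(tables, raw, bonferroni(raw)):
        print(table, f"p={p:.3g}", f"p_bonferroni={p_adj:.3g}")
```

The Bonferroni step is deliberately conservative: with many clusters tested, only strongly eukaryote-skewed hit lists remain significant after correction.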

Structure-based inference of eukaryotic complexity in Asgard archaea, biorxiv.org


Explorers Club Fellow, ex-NASA Space Station Payload manager/space biologist, Away Teams, Journalist, Lapsed climber, Synaesthete, Na’Vi-Jedi-Freman-Buddhist-mix, ASL, Devon Island and Everest Base Camp veteran, (he/him) 🖖🏻