Genomics, Proteomics, Bioinformatics

GLARE: Discovering Hidden Patterns In Spaceflight Transcriptome Using Representation Learning

By Keith Cowing

Status Report

biorxiv.org

June 9, 2024

Filed under Arabidopsis, biorxiv.org, CARA, datasets, genomics, GLARE, Machine learning, Microgravity, NASA GeneLab, omics, Transcriptome

GLARE: Discovering Hidden Patterns In Spaceflight Transcriptome Using Representation Learning — Overall pipeline of GLARE: Gene LAb Representation learning pipelinE. (a) Illustration of GLARE, starting with a verification study followed by preprocessing through detecting outliers using k-means clustering. Using the clean dataset, GLARE provides options for representation learning from PCA to state-of-the-art SAE pre-trained with high-throughput single-cell data. Retrieved data representation is then processed through ensemble clustering to find the hidden patterns within the data. Results from the verification study and ensemble clustering are then used for post-pipeline analysis. (b) Model architecture illustration of employed SAE for both training with and without pre-training. (c) Ensemble clustering using three base clustering algorithms based on different statistical methodologies. Evidence accumulation clustering is used to derive consensus clusters from these algorithms. — biorxiv.org

Spaceflight studies present novel insights into biological processes through exposure to stressors outside the evolutionary path of terrestrial organisms. Despite limited access to space environments, numerous transcriptomic datasets from spaceflight experiments are now available through NASA GeneLab data repository, which allows public access to these datasets, encouraging further analysis.

While various computational pipelines and methods have been used to process these transcriptomic datasets, learning-model-driven analyses have yet to be applied to a broad array of such spaceflight-related datasets.

In this study, we propose an open-source framework, GLARE: GeneLAb Representation learning pipelinE, which consists of training different representation learning approaches from manifold learning to self-supervised learning that enhances the performance of downstream analytical tasks such as pattern recognition. We illustrate the utility of GLARE by applying it to gene-level transcriptional values from the results of the CARA spaceflight experiment, an Arabidopsis root tip transcriptome dataset that spanned light, dark, and microgravity treatments.

We show that GLARE not only substantiated the findings of the original study concerning cell wall remodeling but also revealed additional patterns of gene expression affected by the treatments, including evidence of hypoxia. This work suggests there is great potential to supplement the insights drawn from initial studies on spaceflight omics-level data through further machine-learning-enabled analyses.

Analysis of hypoxia cluster found in FLT clustering result. (a) Heatmap of normalized FPKM values on hypoxia cluster. (b) Enriched ontology on hypoxia cluster from Metascape (c) Stress Knowledge Map (SKM) on five Transcription Factors (TFs) in hypoxia cluster: ‘DREB2A’, ‘RHL41 / ZAT12’, ‘MYC2’, ‘RRTF1 / ERF109’, and ‘STZ / ZAT10’. — biorxiv.org

GLARE: Discovering Hidden Patterns In Spaceflight Transcriptome Using Representation Learning, biorxiv.org

Astrobiology, Genomics,

Keith Cowing

Explorers Club Fellow, ex-NASA Space Station Payload manager/space biologist, Away Teams, Journalist, Lapsed climber, Synaesthete, Na’Vi-Jedi-Freman-Buddhist-mix, ASL, Devon Island and Everest Base Camp veteran, (he/him) 🖖🏻

Follow on Twitter