Tricorder Tech: Machine Learning To Scan For Signs Of Extraterrestrial Life
A machine learning framework can distinguish molecules made by biological processes from those formed through non-biological processes and could be used to analyze samples returned by current and future planetary missions.
José C. Aponte, Amirali Aghazadeh, and colleagues analyzed eight carbonaceous meteorites and ten terrestrial geologic samples using two-dimensional gas chromatography coupled with high-resolution time-of-flight mass spectrometry. Using this data, the authors developed LifeTracer, a computational framework that processes mass spectrometry data and applies machine learning to identify patterns distinguishing abiotic from biotic origins.

Visualization of the distribution of compounds in meteoritic samples and terrestrial geologic samples and the regression coefficients of the logistic regression model trained in LifeTracer. — NASA/PNAS NEXUS
A logistic regression model trained on compound-level features achieved over 87% accuracy in classifying samples as meteoritic or terrestrial. The analysis identified 9,475 peaks in meteorite samples and 9,070 in terrestrial samples. The two sample types differed to a statistically significant degree in their molecular weight distributions and in their retention times, which describe how long a compound takes to move through the chromatograph’s two columns. Organic compounds in meteorite samples showed significantly lower retention times, consistent with higher volatility in abiotically formed materials.
The framework identified polycyclic aromatic hydrocarbons and alkylated variants as key predictive features, with naphthalene emerging as the most predictive compound for abiotic samples. According to the authors, the approach enables scalable, unbiased biosignature detection and could be a powerful tool for interpreting complex organic mixtures that will be returned by current and future planetary sample return missions.
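To make the classification step concrete, here is a minimal sketch of logistic regression on a presence/absence feature table, followed by a ranking of features by coefficient. It is not LifeTracer's code. The toy data, the feature names (`feature_A`, `feature_B`, `feature_C`), the hyperparameters, and the use of plain gradient descent with an L2 penalty are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Fit weights and bias by batch gradient descent (L2 penalty on weights only)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    # Label 1 = abiotic (meteoritic), label 0 = biotic (terrestrial).
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Invented toy table: one row per sample, one column per feature (1 = present).
feature_names = ["feature_A", "feature_B", "feature_C"]  # hypothetical names
X = [
    [1, 0, 1], [1, 0, 0], [1, 1, 1],   # toy "meteorite" samples
    [0, 1, 1], [0, 1, 0], [0, 0, 1],   # toy "terrestrial" samples
]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
predictions = [predict(w, b, x) for x in X]

# Most positive coefficient first: the feature pointing most strongly to the abiotic class.
ranked = sorted(zip(feature_names, w), key=lambda fw: fw[1], reverse=True)
for name, coef in ranked:
    print(f"{name}: {coef:+.2f}")
```

In this toy table `feature_A` appears only in the "meteorite" rows, so it receives the largest positive coefficient. That is the same kind of signal the article describes for naphthalene, though the real analysis involves thousands of features and held-out evaluation.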

The LifeTracer workflow for collecting, curating, and analyzing the mass spectrometry data and developing a machine learning model for classifying samples. A) The soluble nonpolar and semipolar organics in 8 meteorites and 10 terrestrial geologic samples were analyzed using untargeted 2D gas chromatography coupled to high-resolution time-of-flight mass spectrometry, resulting in total ion images (TIIs) in four dimensions corresponding to the mass-over-charge ratio (m/z), retention time in the first column (RT1), retention time in the second column (RT2), and intensity (abundance). This illustration shows the workflow for Meteorite 1 (Aguas Zarcas) and Earth Sample 1 (Iceland soil), with distinct peaks at m/z = 162 and 102 amu, respectively. The device is shown as a cartoon schematic to illustrate the instrument layout. B) High-intensity peaks in TIIs are extracted. Peaks may represent fragment ions originating from the same parent compound. C) Peaks are clustered and tabulated with rows representing features and columns representing samples. Black and gray squares indicate the presence or absence of features, respectively. In this illustration, the squares marked as A and B correspond to the peaks at m/z = 162 and 102 amu in the Aguas Zarcas and Iceland soil samples. D) A logistic regression model is trained on the processed data to classify samples into the abiotic and biotic classes based on the composition of their organic compounds. Features with large regression coefficients are analyzed to identify the organic compounds that play a key role in distinguishing between biotic and abiotic samples. We manually analyzed the fragmentation patterns and exact masses in comparison to standards to determine the identity or candidate molecule type for each discriminative compound discovered by LifeTracer. – NASA/PNAS NEXUS
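Steps B and C of the workflow (extracting high-intensity peaks, then clustering them into a feature-by-sample presence/absence table) can be sketched roughly as follows. The caption does not specify the clustering method, so the greedy tolerance-based grouping here is an assumption. The intensity threshold, the tolerances, the sample names, and all retention-time and intensity values are likewise invented for illustration; only the m/z values 162 and 102 echo the caption.

```python
def extract_peaks(points, min_intensity):
    """Keep only (m/z, RT1, RT2, intensity) points at or above the threshold."""
    return [p for p in points if p[3] >= min_intensity]

def build_feature_table(samples, min_intensity=1000.0, tol=(0.01, 5.0, 0.1)):
    """Greedily group peaks that agree in m/z, RT1, and RT2 within `tol`.

    Returns the feature list, the sample names, and a table with one row per
    feature and one column per sample (1 = present, 0 = absent).
    """
    features = []   # representative (m/z, RT1, RT2) of each cluster
    presence = []   # set of sample names in which each feature was seen
    for name, points in samples.items():
        for mz, rt1, rt2, _ in extract_peaks(points, min_intensity):
            for idx, (fmz, frt1, frt2) in enumerate(features):
                if (abs(mz - fmz) <= tol[0]
                        and abs(rt1 - frt1) <= tol[1]
                        and abs(rt2 - frt2) <= tol[2]):
                    presence[idx].add(name)
                    break
            else:
                # No existing cluster matched, so this peak starts a new feature.
                features.append((mz, rt1, rt2))
                presence.append({name})
    names = list(samples)
    table = [[1 if n in seen else 0 for n in names] for seen in presence]
    return features, names, table

# Hypothetical samples: each point is (m/z, RT1 in s, RT2 in s, intensity).
samples = {
    "meteorite_1": [(162.00, 1200.0, 2.10, 5000.0),
                    (128.06, 900.0, 1.50, 8000.0),
                    (91.05, 600.0, 1.00, 200.0)],    # below threshold, dropped
    "earth_1": [(102.00, 1500.0, 2.80, 4000.0),
                (128.065, 902.0, 1.52, 7000.0)],     # matches the 128.06 feature
}

features, names, table = build_feature_table(samples)
for feat, row in zip(features, table):
    print(feat, dict(zip(names, row)))
```

Transposing the resulting table gives one row per sample, the usual input layout for a classifier such as the logistic regression in step D.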
Discriminating abiotic and biotic organics in meteorite and terrestrial samples using machine learning on mass spectrometry data, PNAS NEXUS (open access)