
The Multimodal Universe: Enabling Large-Scale Machine Learning with 100TB of Astronomical Scientific Data

By Keith Cowing

Status Report

astro-ph.IM

December 6, 2024

Filed under AI, App, astro-ph.IM, astronomy, astrophysics, data, dataset, Machine learning, Multimodal Universe

Illustration of the main modalities included in the Multimodal Universe, along with typical associated machine learning tasks. The Multimodal Universe also includes a small amount of hyperspectral images and tabular data, not shown here. (astro-ph.IM)

We present the MULTIMODAL UNIVERSE, a large-scale multimodal dataset of scientific astronomical data, compiled specifically to facilitate machine learning research.

Overall, the MULTIMODAL UNIVERSE contains hundreds of millions of astronomical observations, constituting 100TB of multi-channel and hyper-spectral images, spectra, multivariate time series, as well as a wide variety of associated scientific measurements and “metadata”.

In addition, we include a range of benchmark tasks representative of standard practices for machine learning methods in astrophysics.

This massive dataset will enable the development of large multi-modal models specifically targeted towards scientific applications. All code used to compile the MULTIMODAL UNIVERSE and a description of how to access the data are available at https://github.com/MultimodalUniverse/MultimodalUniverse

The Multimodal Universe Collaboration. Eirini Angeloudi, Jeroen Audenaert, Micah Bowles, Benjamin M. Boyd, David Chemaly, Brian Cherinka, Ioana Ciucă, Miles Cranmer, Aaron Do, Matthew Grayling, Erin E. Hayes, Tom Hehir, Shirley Ho, Marc Huertas-Company, Kartheik G. Iyer, Maja Jablonska, Francois Lanusse, Henry W. Leung, Kaisey Mandel, Juan Rafael Martínez-Galarza, Peter Melchior, Lucas Meyer, Liam H. Parker, Helen Qu, Jeff Shen, Michael J. Smith, Connor Stone, Mike Walmsley, John F. Wu

Comments: Accepted at NeurIPS Datasets and Benchmarks track
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Astrophysics of Galaxies (astro-ph.GA); Solar and Stellar Astrophysics (astro-ph.SR)
Cite as: arXiv:2412.02527 [astro-ph.IM] (or arXiv:2412.02527v1 [astro-ph.IM] for this version)
https://doi.org/10.48550/arXiv.2412.02527
Submission history
From: Marc Huertas-Company
[v1] Tue, 3 Dec 2024 16:21:17 UTC (6,981 KB)
https://arxiv.org/abs/2412.02527


Keith Cowing

Explorers Club Fellow, ex-NASA Space Station Payload manager/space biologist, Away Teams, Journalist, Lapsed climber, Synaesthete, Na’Vi-Jedi-Freman-Buddhist-mix, ASL, Devon Island and Everest Base Camp veteran, (he/him) 🖖🏻
