
On The Salient Limitations Of The Methods Of Assembly Theory And Their Classification Of Molecular Biosignatures

By Keith Cowing
Press Release
October 13, 2022
Correlation plot between ‘Molecular Assembly’ (MA) and compression algorithms. The strongest positive correlation was found between MA and 1D run-length encoding (1D-RLE; R = 0.9001), one of the most basic compression schemes and among the most similar to the original definition of MA. Other compression algorithms, including Huffman coding (R = 0.896), also correlate strongly and positively with MA. The compression values of 1D-RLE and 1D-Huffman coding show overlapping, nearly identical medians (horizontal line at the center of each box) and ranges in the box-and-whisker plot. Our analysis reveals that MA behaves similarly to popular statistical lossless compression algorithms that are based on the same counting principles.
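To make the comparison in the caption concrete, here is a minimal Python sketch of its two ingredients: a 1D run-length encoder and the Pearson correlation coefficient R. It assumes that the RLE "compression value" of a string is simply its number of runs; the paper's exact size measure and molecular string representation are not given here, and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def rle_encode(s: str) -> list[tuple[str, int]]:
    """1D run-length encoding: collapse each run of identical symbols into (symbol, count)."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))  # start a new run
    return runs


def rle_length(s: str) -> int:
    """Size of the RLE output, counted here (an assumption) as the number of runs."""
    return len(rle_encode(s))


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(sxx * syy)


if __name__ == "__main__":
    print(rle_encode("aaabcc"))  # [('a', 3), ('b', 1), ('c', 2)]
```

In a study like the one described, `pearson_r` would be applied to per-molecule MA values and per-molecule compressed sizes; no such data is reproduced here.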

A recently introduced approach termed “Assembly Theory”, featuring a computable index based on basic principles of statistical compression, has been claimed to be a novel and superior approach for distinguishing living from non-living systems and for classifying the complexity of molecular biosignatures.
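For intuition about what such an index counts, the following is a minimal sketch of a simplified string analogue of the assembly index. It is not the authors' code or the original Assembly Theory implementation, which operates on molecular bonds. It assumes that objects are strings, that the basic building blocks are the target's single characters, and that each step joins two already-available objects, which may be reused. The exhaustive search is exponential, so it is only practical for short strings; the name `assembly_index` is hypothetical.

```python
def assembly_index(target: str) -> int:
    """Minimum number of joins needed to build `target` from its single characters,
    where every intermediate string, once built, can be reused (simplified string analogue)."""
    if len(target) <= 1:
        return 0
    # Only substrings of the target can contribute to a shortest pathway.
    substrings = {target[i:j] for i in range(len(target))
                  for j in range(i + 1, len(target) + 1)}
    basics = frozenset(target)

    def reachable(pool: frozenset, steps: int) -> bool:
        """True if `target` can be built from `pool` in at most `steps` joins."""
        if target in pool:
            return True
        if steps == 0:
            return False
        # Each join at most doubles the length of the longest available object.
        if max(len(x) for x in pool) * 2 ** steps < len(target):
            return False
        new_objects = {a + b for a in pool for b in pool
                       if a + b in substrings and a + b not in pool}
        return any(reachable(pool | {c}, steps - 1) for c in new_objects)

    # Iterative deepening: the first depth that succeeds is the minimum.
    # len(target) - 1 joins always suffice, so the loop terminates.
    steps = 1
    while not reachable(basics, steps):
        steps += 1
    return steps
```

For example, `assembly_index("abab")` is 2 (build "ab", then join it with itself), while `assembly_index("abcd")` is 3: reuse only lowers the count when substrings repeat, which is the same redundancy that counting-based compressors exploit.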

Here, we demonstrate that the assembly pathway method underlying this index is a suboptimal, restricted version of Huffman's encoding (Shannon-Fano type), widely adopted in computer science since the 1950s, and that it is comparable (or inferior) to other popular statistical and computable compression schemes. We show how simple modular instructions can mislead the assembly index, causing it to fail to capture subtleties beyond trivial statistical properties that are not realistic in biological systems.
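For comparison, the following Python sketch builds a standard Huffman code bottom-up from symbol frequencies and reports the total encoded length in bits. It is the textbook algorithm, not a reconstruction of the assembly pathway method or of the paper's 1D-Huffman pipeline, and the function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(s: str) -> dict[str, str]:
    """Build a prefix-free Huffman code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(s)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs a 1-bit codeword.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial codeword}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codewords with 0 and 1.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in c1.items()}
        merged.update({sym: "1" + code for sym, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_bits(s: str) -> int:
    """Total length in bits of `s` encoded with its own Huffman code."""
    code = huffman_code(s)
    return sum(len(code[ch]) for ch in s)
```

Frequent symbols receive shorter codewords: for "aabc", "a" gets a 1-bit code while "b" and "c" get 2-bit codes, for 6 bits in total. Like the assembly index, the result depends only on how often pieces recur.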

We present cases whose low complexity can diverge arbitrarily from a random-like appearance to which the assembly pathway method would assign arbitrarily high statistical significance, and we show that the method fails in simple cases, both synthetic and natural. Our theoretical and empirical results imply that the assembly index, whose computable nature we show confers no benefit, offers no substantial advantage over existing concepts and methods, computable or uncomputable. Alternatives are discussed.
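The gap between low complexity and random-like appearance can be illustrated with a sketch in which a fixed, short program (Python's seeded Mersenne Twister generator) emits bytes that a statistical compressor cannot shrink. Here `zlib` stands in for counting-based compression in general; this substitution is an assumption for illustration, not the assembly index itself, which is impractical to compute exactly on inputs of this size. The helper names are hypothetical.

```python
import random
import zlib


def short_program_output(seed: int, n: int) -> bytes:
    """Bytes produced by a fixed, short program: a seeded pseudo-random generator."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size, using zlib (DEFLATE) at level 9."""
    return len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    pseudo_random = short_program_output(seed=42, n=10_000)
    periodic = b"ab" * 5_000
    print(f"pseudo-random: {compression_ratio(pseudo_random):.3f}")  # close to 1.0
    print(f"periodic:      {compression_ratio(periodic):.3f}")       # far below 1.0
```

The generator's description (its code plus the seed 42) stays the same size however large `n` grows, so the output's algorithmic complexity is low even though a statistical compressor treats it as incompressible; the periodic string, by contrast, compresses almost completely.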

Abicumaran Uthamacumaran, Felipe S. Abrahão, Narsis A. Kiani, Hector Zenil

Comments: 32 pages with the appendix, 3 figures
Subjects: Information Theory (cs.IT)
Cite as: arXiv:2210.00901 [cs.IT] (or arXiv:2210.00901v2 [cs.IT] for this version)
Submission history
From: Hector Zenil
[v1] Fri, 30 Sep 2022 11:19:53 UTC (1,113 KB)
[v2] Sun, 9 Oct 2022 00:33:31 UTC (557 KB)
