Exoplanet Host Star Classification: Multi-Objective Optimisation Of Incomplete Stellar Abundance Data
The presence of a planetary companion has repeatedly been linked to the properties of its host star, which affect the likelihood that sub-stellar objects form and remain stable in the protoplanetary disc. Characterising this link is a key challenge in exoplanet science.
Furthermore, abundance and stellar parameter datasets tend to be incomplete, which limits the ability to infer distributional characteristics from the entire dataset. This work aims to develop a methodology that uses machine learning (ML) and multi-objective optimisation to impute missing values reliably, enabling subsequent comparison tests and host star recommendation. It integrates fuzzy clustering for imputation and ML classification of host and comparison stars into an evolutionary multi-objective optimisation algorithm.
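To make the imputation idea concrete, here is a minimal sketch of fuzzy-clustering imputation for a single star. It is not the authors' implementation: it assumes a standard fuzzy c-means membership formula, a partial-distance strategy (distances use only the observed features), and cluster centres supplied from outside, as they would be if an optimiser proposed them. The function names, the use of None for a missing value, and the fuzzifier default m=2.0 are all illustrative assumptions.

```python
def fuzzy_memberships(row, centres, m=2.0):
    """Fuzzy c-means memberships of one star, using observed features only.

    Hypothetical helper: `row` uses None for a missing value, and `centres`
    is a list of complete cluster-centre coordinates.
    """
    observed = [j for j, v in enumerate(row) if v is not None]
    if not observed:
        raise ValueError("row has no observed features")
    # Partial squared distance to each centre (missing features are skipped).
    d2 = [sum((row[j] - c[j]) ** 2 for j in observed) for c in centres]
    # A star lying exactly on one or more centres belongs fully to them.
    if any(d == 0.0 for d in d2):
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    # Standard FCM weighting: u_k is proportional to d_k^(-2 / (m - 1)).
    power = 1.0 / (m - 1.0)
    inv = [(1.0 / d) ** power for d in d2]
    total = sum(inv)
    return [w / total for w in inv]


def impute_row(row, centres, m=2.0):
    """Fill each missing value with the membership-weighted centre coordinate."""
    u = fuzzy_memberships(row, centres, m)
    return [
        v if v is not None else sum(uk * c[j] for uk, c in zip(u, centres))
        for j, v in enumerate(row)
    ]


# Toy example with two features and two assumed cluster centres.
centres = [[0.0, 0.0], [1.0, 1.0]]
filled = impute_row([0.5, None], centres)
```

In this toy case the star is equidistant from both centres in its one observed feature, so the memberships are equal and the missing value is filled with the midpoint of the two centre coordinates. Observed values are never altered; only the gaps are filled.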
We test several candidates for the classification model, starting with a binary classification for giant planet hosts. Upon confirmation that the XGBoost algorithm provides the best performance, we interpret the performance of both the imputation and classification modules for binary classification. The model is extended to handle multi-label classification for low-mass planets and planet multiplicity. Constraints on the model’s use and feature/sample selection are given, outlining strengths and limitations.
We conclude that the careful use of this technique for host star recommendation will be an asset to future missions and the compilation of necessary target lists.
Schematic of the chromosome encoding within the genetic algorithm (GA) design. The specific configurations of both the imputation and classification models are represented as a string of genes within the chromosome. The imputation genes consist of the clustering hyperparameters and the coordinates of all designated cluster centres. The classification genes are the values used to build and define the classification model, and therefore depend on whichever model is used in that particular run. Within the classification module, solid directional lines represent paths present in both the binary and multi-label variations of the design, while the dashed line represents those present only in the multi-label modification. (RAS Techniques and Instruments)
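The encoding described in the caption can be illustrated with a small decoding routine. This is a sketch under stated assumptions, not the layout used in the paper: it assumes a flat list of numeric genes ordered as one clustering hyperparameter (a fuzzifier), then the cluster-centre coordinates, then the classifier genes. The names DecodedChromosome and decode_chromosome, and the example classifier parameter names, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DecodedChromosome:
    fuzzifier: float          # assumed single clustering hyperparameter
    centres: list             # one coordinate list per cluster centre
    classifier_params: dict   # model-dependent values, keyed by name


def decode_chromosome(genes, n_clusters, n_features, classifier_names):
    """Split a flat gene list into imputation and classification parts.

    Assumed layout: [fuzzifier | centre coordinates | classifier genes].
    """
    n_centre_genes = n_clusters * n_features
    expected = 1 + n_centre_genes + len(classifier_names)
    if len(genes) != expected:
        raise ValueError(f"expected {expected} genes, got {len(genes)}")

    fuzzifier = genes[0]
    flat = genes[1:1 + n_centre_genes]
    # Regroup the flat coordinate genes into one list per cluster centre.
    centres = [
        flat[k * n_features:(k + 1) * n_features] for k in range(n_clusters)
    ]
    # The remaining genes define the classifier; their meaning depends on
    # whichever model is used in the run, so the caller supplies the names.
    values = genes[1 + n_centre_genes:]
    return DecodedChromosome(fuzzifier, centres, dict(zip(classifier_names, values)))


# Example: two clusters in a two-feature space, two illustrative classifier genes.
genes = [2.0, 0.0, 0.1, 1.0, 1.1, 6.0, 0.3]
decoded = decode_chromosome(genes, 2, 2, ["max_depth", "learning_rate"])
```

Keeping both modules in one chromosome means a single candidate solution fixes the imputation and the classifier together, so the optimiser can score them jointly rather than tuning each in isolation.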
Exoplanet host star classification: Multi-Objective Optimisation of incomplete stellar abundance data, RAS Techniques and Instruments (open access)