Shyue Ping Ong1, Ji Qi1, Tsz Wai Ko1
University of California, San Diego1
The biggest bottleneck to machine learning (ML) for materials science is the generation of training data. In this talk, I will discuss various approaches to efficiently generate and use materials data to develop ML models. For instance, I will demonstrate the use of universal interatomic potentials to pre-generate a large configuration space of structures, as well as a DImensionality-Reduced Encoded Clusters with sTratified (DIRECT) sampling approach to create a robust training set for an ML interatomic potential (MLIP). I will also discuss the application of multi-fidelity techniques to maximize the return on scarce, high-quality data. While a major focus of this talk will be on MLIP development, I will also highlight the applicability of these techniques to other ML-enabled applications.
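As a rough illustration of the three steps named in the DIRECT acronym (dimensionality reduction of encoded structures, clustering, and stratified sampling), the following is a minimal sketch and not the authors' implementation. It assumes each structure has already been encoded as a fixed-length feature vector. The abstract does not specify the reduction or clustering methods, so PCA and plain k-means are used here as stand-ins. The function names (`pca_reduce`, `kmeans`, `direct_like_sample`) and the random placeholder features are hypothetical.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project feature vectors onto their leading principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def assign_labels(points, centers):
    """Return the index of the nearest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the clustering step)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(points), size=n_clusters, replace=False)
    centers = points[start].copy()
    for _ in range(n_iter):
        labels = assign_labels(points, centers)
        for k in range(n_clusters):
            members = points[labels == k]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[k] = members.mean(axis=0)
    return assign_labels(points, centers), centers


def direct_like_sample(features, n_components=2, n_clusters=5,
                       per_cluster=1, seed=0):
    """Pick up to `per_cluster` structures from every cluster.

    Sampling from each cluster (stratum) keeps sparsely populated regions
    of the reduced feature space represented in the training set.
    """
    reduced = pca_reduce(features, n_components)
    labels, centers = kmeans(reduced, n_clusters, seed=seed)
    selected = []
    for k in range(n_clusters):
        idx = np.flatnonzero(labels == k)
        if idx.size == 0:
            continue
        # Prefer the structures closest to the cluster center.
        dist = np.linalg.norm(reduced[idx] - centers[k], axis=1)
        selected.extend(idx[np.argsort(dist)[:per_cluster]].tolist())
    return sorted(selected)


# Placeholder data: 200 "structures" with 8 random features each.
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 8))
chosen = direct_like_sample(features, n_components=2, n_clusters=5,
                            per_cluster=2)
print(len(chosen), "of", len(features), "structures selected")
```

In a real workflow the placeholder features would be replaced by encodings of the pre-generated structures, and the selected indices would identify the structures sent for reference calculations before MLIP training.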