Adam Krajewski¹, Zi-Kui Liu¹
¹The Pennsylvania State University
The quality of materials design has always depended on the availability and quality of the starting data. In recent years, advances in machine learning have further complicated the task of merging data from many sources into a useful, homogeneous structure. In this work, we present an implementation of a data ecosystem that alleviates many of the commonly encountered challenges.

The ULTERA database, developed under ARPA-E's ULTIMATE program, is designed to automatically integrate data produced by many methods, such as literature extraction (manual and NLP), generative modeling, predictive modeling, and experimental or computational validation, and contributed by many sources, such as project members and industry partners. The data are merged in real time, fully automatically, in the cloud, so that every project component operates on the best available dataset. Thus, at any given time, generative modeling starts from the best available dataset, and experiments and simulations can be run on the most promising candidate materials. This inherently coordinates collaboration between the research groups on the project and reduces delays in data transfer. Furthermore, in addition to dynamically presenting the best candidates, this data-processing approach allows efficient curation of the dataset by identifying and verifying the most abnormal data points.
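As a rough illustration of the merge-then-curate idea described above, the following minimal sketch pools records from several sources and flags values that disagree strongly with the other reports for the same material and property. It is not the ULTERA implementation: the record fields (`material`, `property`, `value`, `source`), the function names, the modified z-score criterion, and the threshold of 3.5 are all assumptions chosen for illustration, and the sample values are invented toy numbers.

```python
from statistics import median


def merge_sources(*sources):
    """Pool records from any number of sources, grouped by (material, property).

    Each record is assumed (hypothetically) to be a dict with the keys
    'material', 'property', 'value', and 'source'.
    """
    merged = {}
    for records in sources:
        for rec in records:
            key = (rec["material"], rec["property"])
            merged.setdefault(key, []).append(rec)
    return merged


def flag_abnormal(merged, threshold=3.5):
    """Return records whose value is an outlier within its group.

    Uses a modified z-score based on the median absolute deviation (MAD).
    This criterion is an assumption for the sketch, not the method
    described in the abstract.
    """
    flagged = []
    for recs in merged.values():
        values = [r["value"] for r in recs]
        if len(values) < 3:
            continue  # too few reports to judge
        med = median(values)
        mad = median([abs(v - med) for v in values])
        if mad == 0:
            continue  # all reports (nearly) identical; nothing to flag
        for r in recs:
            score = 0.6745 * abs(r["value"] - med) / mad
            if score > threshold:
                flagged.append(r)
    return flagged


# Invented toy values, for illustration only.
literature = [
    {"material": "alloy-A", "property": "hardness", "value": 4.5, "source": "literature"},
    {"material": "alloy-A", "property": "hardness", "value": 4.6, "source": "literature"},
]
experiments = [
    {"material": "alloy-A", "property": "hardness", "value": 4.4, "source": "experiment"},
    {"material": "alloy-A", "property": "hardness", "value": 45.0, "source": "partner"},
]

merged = merge_sources(literature, experiments)
for rec in flag_abnormal(merged):
    print("needs verification:", rec["material"], rec["value"], rec["source"])
```

In this toy case only the 45.0 entry is flagged, which is the kind of record (for example, a possible unit or transcription error) that a curator would want to verify first. A median-based score is used here because a single extreme value barely shifts the median, whereas it would inflate a mean and standard deviation.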