Rishi Kumar¹, Anubhav Jain¹
¹Lawrence Berkeley National Laboratory
The advent of machine learning and laboratory automation in materials science has reinforced the need for a performant and flexible solution for materials data storage. Self-driving laboratories in particular require that data be available and malleable for any analysis required during online learning by an AI agent. We propose a graph-based database schema and an associated Python library for the entry and retrieval of experimental and computed materials data. In contrast to rigid table-based schemes, our graph approach accommodates the evolving workflows of a research setting. Furthermore, we keep the data structure similar to that generated by recent text-mining efforts to facilitate fusion of local data with the literature. This database was developed to support self-driving labs at Lawrence Berkeley National Lab that execute diverse experiments ranging from spin coating to solid-state synthesis.

Our schema is designed to be lightweight and flexible while adhering to FAIR principles. Data is entered (either by operators or by the self-driving lab codebase) from the perspective of the experimentalist as a directed acyclic graph (i.e., a sequence) of nodes (i.e., steps) that process, measure, and analyze materials. Analysis is distinguished from measurement to establish extracted features as first-class components and to allow multiple interpretations of the same raw data. The directed graph structure naturally captures material flows within and across experiments: an experiment that uses a material generated by another experiment creates an edge that connects the two graphs. Finally, the data can be filtered by any step of the experimental life cycle (e.g., by input material, by process variables, or by analysis results) to generate tabular datasets amenable to downstream analysis.

We conclude by demonstrating this schema for the storage and manipulation of data generated by self-driving laboratories at Lawrence Berkeley National Lab. In particular, we show how text-mined recipes for solid-state synthesis are joined with experimental data in our database to inform experimental campaigns. While we show this in an automated laboratory setting, we stress that our schema has been kept lightweight to ease its adoption in traditional laboratory environments. The code to deploy this schema is publicly available on our GitHub repository.
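To make the graph description above concrete, the following is a minimal sketch in plain Python of how material, process, measurement, and analysis steps might be stored as nodes of a directed acyclic graph and then flattened into table rows. It is not the API of the library described in this abstract: the `Node` and `Graph` classes, their fields, the step names, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count()  # simple id generator for this sketch


@dataclass
class Node:
    """One step in the graph. All field names here are hypothetical."""
    kind: str                 # "material", "process", "measurement", or "analysis"
    name: str
    experiment: str = ""      # label of the experiment that created the node
    params: dict = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_ids))


class Graph:
    """Toy in-memory directed graph; edges point from upstream to downstream."""

    def __init__(self):
        self.nodes = {}
        self.edges = set()    # set of (upstream_id, downstream_id)

    def add(self, node, *upstream):
        self.nodes[node.id] = node
        for parent in upstream:
            self.edges.add((parent.id, node.id))
        return node

    def ancestors(self, node):
        """Return every node reachable by walking edges backwards."""
        seen, stack = set(), [node.id]
        while stack:
            current = stack.pop()
            for up, down in self.edges:
                if down == current and up not in seen:
                    seen.add(up)
                    stack.append(up)
        return [self.nodes[i] for i in sorted(seen)]

    def to_rows(self, analysis_name):
        """Flatten each matching analysis node and its upstream process
        parameters into one dict per row (a tabular dataset)."""
        rows = []
        for node in self.nodes.values():
            if node.kind == "analysis" and node.name == analysis_name:
                row = dict(node.params)
                for anc in self.ancestors(node):
                    if anc.kind == "process":
                        for key, value in anc.params.items():
                            row[f"{anc.name}.{key}"] = value
                rows.append(row)
        return rows


g = Graph()

# Experiment "exp1": a synthesis step turns a precursor into a powder.
# Parameter values are placeholders, not real data.
precursor = g.add(Node("material", "precursor_mix", "exp1"))
heat = g.add(Node("process", "heat", "exp1", {"temperature_C": 900, "hours": 4}), precursor)
powder = g.add(Node("material", "powder", "exp1"), heat)

# Experiment "exp2" consumes the powder from exp1; this edge joins the two graphs.
xrd = g.add(Node("measurement", "xrd", "exp2"), powder)

# Two independent analyses of the same raw measurement.
phase = g.add(Node("analysis", "phase_id", "exp2", {"target_fraction": 0.92}), xrd)
size = g.add(Node("analysis", "crystallite_size", "exp2", {"size_nm": 35}), xrd)

rows = g.to_rows("phase_id")
print(rows)
```

Keeping analyses as separate nodes downstream of a measurement is what allows two interpretations (`phase_id` and `crystallite_size` in this sketch) to coexist for one diffraction pattern, and walking the edges backwards is what lets a row for an analysis result pick up process variables recorded in an earlier experiment.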