John Gregoire¹, Joel Haber¹, Dan Guevarra¹, Lan Zhou¹, Kevin Kan¹, Ryan Jones¹, Yungchieh Lai¹, Ja'Nya Breeden¹, Michael Statt², Brian Rohr², Santosh Suram³
California Institute of Technology¹, Modelyst LLC², Toyota Research Institute³
In the quest to accelerate materials discovery via experiment automation and artificial intelligence, we recognize the challenges in emulating the human ability to contextualize data and to rapidly adapt experiments based on real-time data streams. In the development of infrastructure for next-generation workflows, these aspects of traditional research are most tightly connected to instrument control software and the management of experimental data. We will describe the evolution of these capabilities at Caltech, from automated workflows focused on throughput and consistency to workflows that embrace modularity and responsiveness to new knowledge. Techniques for the latter approach are being developed collaboratively with Modelyst LLC and Toyota Research Institute. The lessons learned in data management may be the most generalizable to the materials chemistry community, especially our development of the Event-Sourced Architecture for Materials Provenance Management (ESAMP) and the Materials Experiment Knowledge Graph (MekG), which addresses the hierarchical nature of materials data.
Regarding the representation of materials data, the chemical elements, crystal structure motifs, and types of materials properties can provide high-level descriptors, but ultimately a given piece of data must be considered in the context of its acquisition. Detailed descriptors of a piece of experimental data include not only the metadata for the experiment that generated it, but also the prior history of synthesis and metrology experiments performed on the sample. Graph databases offer an opportunity to represent such hierarchical relationships among data by organizing semantic relationships into a knowledge graph. Initial reports of knowledge graphs in materials science highlight the breadth of approaches for their development.
We describe a knowledge graph of materials experiments whose construction encodes the complete provenance of each material sample and its associated experimental data and metadata. Additional relationships among materials and experiments further encode knowledge and facilitate data exploration. MekG is sufficiently large and complex to demonstrate a path toward a global materials knowledge graph. We characterize the scalability of this approach, especially with respect to executing queries, illustrating the value that modern graph databases can provide to the enterprise of data-driven materials science.
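As a rough illustration of what "complete provenance" can mean in practice, the following minimal Python sketch records experiments as an append-only event log and recovers the context of a measurement by collecting every earlier event on the sample and its ancestors. It is not the ESAMP or MekG implementation: the class names `Event` and `ProvenanceLog`, the sample identifiers, and the technique labels are all hypothetical, and the sketch assumes each sample has at most one parent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """One immutable entry in the append-only experiment log (hypothetical schema)."""
    sample_id: str
    kind: str        # "synthesis" or "metrology"
    technique: str   # illustrative label, not a controlled vocabulary


class ProvenanceLog:
    """Toy event-sourced store: events are only appended, never edited."""

    def __init__(self) -> None:
        self._events: List[Event] = []
        self._parent: Dict[str, str] = {}  # child sample -> parent sample

    def derive(self, child: str, parent: str) -> None:
        """Declare that `child` was obtained from `parent` (assumes no cycles)."""
        self._parent[child] = parent

    def record(self, event: Event) -> int:
        """Append an event and return its position in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def lineage(self, sample_id: str) -> List[str]:
        """Return the chain of samples from the root ancestor down to `sample_id`."""
        chain = [sample_id]
        while chain[-1] in self._parent:
            chain.append(self._parent[chain[-1]])
        return chain[::-1]

    def context(self, index: int) -> List[Event]:
        """Return all earlier events on the measured sample or any of its ancestors."""
        target = self._events[index]
        ancestors = set(self.lineage(target.sample_id))
        return [e for e in self._events[:index] if e.sample_id in ancestors]


log = ProvenanceLog()
log.record(Event("plate-1", "synthesis", "sputter deposition"))
log.record(Event("plate-1", "synthesis", "anneal"))
log.derive("plate-1/sample-7", "plate-1")
log.record(Event("plate-2", "synthesis", "inkjet printing"))  # unrelated sample
xrd = log.record(Event("plate-1/sample-7", "metrology", "x-ray diffraction"))
echem = log.record(Event("plate-1/sample-7", "metrology", "electrochemistry"))

# History that preceded the electrochemistry measurement on this sample
echem_context = [e.technique for e in log.context(echem)]
print(echem_context)
```

In a graph database the same question would typically be answered by a path traversal over sample and process nodes rather than a scan of a list; the list scan here is only meant to make the notion of acquisition context concrete.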