Amanda Barnard, Australian National University
Bronwyn Fox, Swinburne University of Technology
Manyalibo Matthews, Lawrence Livermore National Laboratory
Krishna Rajan, University at Buffalo, The State University of New York
Army Research Office
CT05.01: Machine Learning I
Sunday AM, April 18, 2021
8:00 AM - *CT05.01.01
Network Theory Meets Materials Science
Northwestern University1
One of the holy grails of materials science, unlocking structure-property relationships, has largely been pursued via bottom-up investigations of how the arrangement of atoms and interatomic bonding in a material determine its macroscopic behavior. Here we consider a complementary approach, a top-down study of the organizational structure of networks of materials, based on the interaction between materials themselves. We demonstrate the utility of applying network theory to materials science in two applications: First, we unravel the complete “phase stability network of all inorganic materials” as a densely-connected complex network of 21,000 thermodynamically stable compounds (nodes) interlinked by 41 million tie-lines (edges) defining their two-phase equilibria, as computed by high-throughput density functional theory. Using the connectivity of nodes in this phase stability network, we derive a rational, data-driven metric for material reactivity, the “nobility index”, and quantitatively identify the noblest materials in nature. Second, we apply network theory to the problem of synthesizability of inorganic materials, a grand challenge for accelerating their discovery using computations. We use machine-learning of our network to predict the likelihood that hypothetical, computer generated materials will be amenable to successful experimental synthesis. ** In collaboration with V. Hegde, M. Aykol, S. Kirklin, L. Hung, S. Suram, P. Herring, and J. Hummelshoj
- Hegde, V. I., Aykol, M., Kirklin, S., & Wolverton, C. (2020). The phase stability network of all inorganic materials. Science Advances, 6(9), eaay5606.
- Aykol, M., Hegde, V. I., Hung, L., Suram, S., Herring, P., Wolverton, C., & Hummelshøj, J. S. (2019). Network analysis of synthesizable materials discovery. Nature communications, 10(1), 1-7.
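As a rough illustration of how node connectivity in a phase stability network can be turned into a ranking, the sketch below standardizes the number of tie-lines per phase. It assumes, for illustration only, that a phase with more tie-lines coexists stably with more phases and is therefore less reactive; the published nobility index may be defined differently. The phase labels and tie-lines are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def degree_z_scores(tie_lines):
    """Return the standardized tie-line count (degree) of every phase."""
    degree = defaultdict(int)
    for a, b in tie_lines:
        degree[a] += 1
        degree[b] += 1
    mu = mean(degree.values())
    sigma = pstdev(degree.values())
    if sigma == 0:
        return {phase: 0.0 for phase in degree}
    return {phase: (k - mu) / sigma for phase, k in degree.items()}


# Hypothetical toy network: each pair is one two-phase equilibrium (tie-line).
toy_tie_lines = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P1", "P5"),
    ("P2", "P3"), ("P3", "P4"),
]
scores = degree_z_scores(toy_tie_lines)
ranked = sorted(scores, key=scores.get, reverse=True)  # most connected first
```

In this toy network P1 has the most tie-lines and ranks first; on a real network the same calculation would run over millions of edges, where a graph library would be the more practical choice.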
8:25 AM - CT05.01.02
Late News: Optimizing Complex Geometries with Feed Forward Control and Machine Learning
Clara Druzgalski1,Gabe Guss1,Ava Ashby1,Simon Lapointe1,Aiden Martin1,Maria Strantza1,Zachary Reese1,Manyalibo Matthews1
Lawrence Livermore National Laboratory1
Laser powder bed fusion (LPBF) enables the fabrication of complex metal parts for many industries. However, quality control of LPBF remains a challenge due to defects that negatively impact repeatability and reliability. Complex parts contain many features that are defect-prone such as overhangs, thin walls, and channels. Process parameters must be optimized to satisfy engineering requirements and reduce the likelihood of part failure. This work describes computational methods to identify defect-prone features and apply optimized parameters using feed forward control and machine learning models. This targeted approach adapts the laser parameters to improve dimensional accuracy and reduce porosity.
8:40 AM - *CT05.01.03
Natural Language Processing for Materials Design—What Can We Extract From the Research Literature?
Lawrence Berkeley National Laboratory1
Traditionally, researchers have been able to leverage the tremendous wealth of information in the research literature only by reading and reviewing articles one at a time. This talk focuses on how advancements in natural language processing will make it possible to leverage the collective knowledge embedded in decades of previous research studies (often only represented in unstructured text documents) in novel ways. In this talk, I will provide a brief overview of natural language processing techniques and how they can be trained on the materials domain. Next, I will provide several examples of how such techniques are being leveraged to accelerate materials research. This includes the use of these algorithms to generate structured databases of materials properties (on which machine learning algorithms can be trained), to improve the ability to query and extract information from the body of research work, and even to make predictions of materials for functional applications. Finally, I will provide an outlook on how these techniques might be integrated into several different models of materials research and development.
9:05 AM - CT05.01.04
Natural Language Processing for Insensitivity Classification of Energetic Materials
Gaurav Kumar1,Allen Garcia1,Connor O'Ryan1,Peter Chung1
University of Maryland1
The vast and ever-increasing volume of published text on energetic materials, written in natural language, opens the door to applying Natural Language Processing (NLP) tools to extract information that can be used for the characterization, design, and discovery of materials. In this work, we present how NLP can be used to extract information from the open literature about energetics and their physical/chemical properties, such as h50 impact sensitivity. We combine text from ~20,000 journal articles and US patents to form our corpus. Two types of classifiers are developed: (1) a binary classifier, which categorizes the energetics into two groups, i.e., sensitive or insensitive, and (2) a multinomial classifier, which computes the likelihood of energetics' sensitivity as a function of distance within an embedding vector space to five common energetics whose insensitivities are well known. The binary classifier is evaluated by comparing its output with a reference list of energetics. The multinomial classifier is evaluated using the Kolmogorov-Smirnov test and the Jensen-Shannon, Hellinger, and Wasserstein distances between statistical distributions developed from NLP and those generated using actual h50 data. The preliminary results indicate an accuracy of ~80% and an F-score of 0.86 for the binary classifier, whereas the multinomial classifier consistently scores above a P-value of 0.90. This indicates that word embeddings can effectively capture the semantics of domain-specific language and that NLP, originally developed for natural language interpretation, can be extended to the study of materials, for instance, to semantically learn how to characterize material properties and capture relationships between chemicals, their properties, and applications.
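Two of the distribution distances named above can be written in a few lines. The sketch below assumes both distributions are given as normalized histograms over the same bins; the function names are illustrative and are not taken from the authors' code.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 to 1)."""
    return math.sqrt(
        0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    )


def _kl_bits(p, q):
    """Kullback-Leibler divergence in bits; bins with p = 0 contribute nothing."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon distance (square root of the divergence, base 2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = 0.5 * _kl_bits(p, m) + 0.5 * _kl_bits(q, m)
    return math.sqrt(max(0.0, divergence))


# Hypothetical histograms: one from embedding distances, one from measured data.
predicted = [0.1, 0.4, 0.5]
measured = [0.2, 0.3, 0.5]
d_h = hellinger(predicted, measured)
d_js = jensen_shannon(predicted, measured)
```

Both distances are 0 for identical distributions and 1 for distributions with no overlapping bins, which makes them convenient for comparing a predicted sensitivity distribution with one built from measurements.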
9:20 AM - *CT05.01.05
Active Materials Exploration and Characterization with Bayesian Optimization
Aalto University1
Data generation in materials science is often limited by the time it takes to perform experiments or simulations. To facilitate the exploration and characterization of complex materials, we have developed the Bayesian Optimization Structure Search (BOSS) code. BOSS is an active learning technique that strategically samples the parameter space of materials science tasks, whether experimental or computational. BOSS proposes new data acquisition points for maximum knowledge gain, balancing exploitation with exploration. I will demonstrate BOSS' smart and efficient data strategy for two examples: 1) sustainable biomaterials and 2) hybrid organic-inorganic electronic materials. For 1), we extract lignin from wood samples with hydrothermal treatment. Lignin is further processed by chemical modification into sustainable composite materials (e.g. carbon fibers, thermoplastics and three-dimensional printed objects). Lignin extraction and processing are coupled to BOSS to visualize process-structure-property correlations and to efficiently optimize extraction and modification conditions. For 2), we couple BOSS to density-functional theory (DFT) calculations to study the adsorption of a camphor molecule on the Cu(111) surface. We identify 8 unique stable adsorbates. By matching the stable structures to atomic force microscopy (AFM) images, we conclude that the experiments feature 3 different structures of chemisorbed camphor molecules.
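The trade-off between exploitation and exploration can be sketched with a small Gaussian-process surrogate and a lower-confidence-bound acquisition rule. This is a generic, standard-library illustration and not the BOSS implementation or its API; the kernel length scale, the weight kappa, and the one-dimensional objective are all assumptions chosen for the example.

```python
import math


def rbf(x1, x2, length=0.3):
    """Squared-exponential kernel with unit prior variance."""
    return math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    k_mat = [[rbf(a, b) + (noise if i == j else 0.0)
              for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_vec = [rbf(a, x_new) for a in xs]
    alpha = solve(k_mat, ys)
    beta = solve(k_mat, k_vec)
    mean = sum(k * a for k, a in zip(k_vec, alpha))
    var = max(0.0, 1.0 - sum(k * b for k, b in zip(k_vec, beta)))
    return mean, math.sqrt(var)


def propose_next(xs, ys, candidates, kappa=2.0):
    """Pick the candidate minimizing mean - kappa * std (we minimize)."""
    def lcb(x):
        mean, std = gp_posterior(xs, ys, x)
        return mean - kappa * std  # low mean: exploit; high std: explore
    return min(candidates, key=lcb)


def objective(x):
    """Hypothetical stand-in for one experiment or simulation."""
    return (x - 0.7) ** 2


grid = [i / 50 for i in range(51)]
xs = [0.0, 1.0]
ys = [objective(x) for x in xs]
for _ in range(6):
    x_next = propose_next(xs, ys, grid)
    xs.append(x_next)
    ys.append(objective(x_next))
best_x = xs[ys.index(min(ys))]
```

A larger kappa pushes the search toward unexplored regions, while a smaller one concentrates samples near the current best estimate.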
CT05.02: Automation and High Throughput I
Sunday PM, April 18, 2021
10:30 AM - *CT05.02.01
Materials Informatics and Manufacturing Scalability and Sustainability
Massachusetts Institute of Technology1
Data has become a fundamental ingredient for accelerating and optimizing materials design and synthesis. Advances in applying natural language processing (NLP) to materials science text have greatly increased the size and acquisition speed of materials science data from the published literature. This presentation will describe work to extract information from peer-reviewed academic literature across a range of materials, with particular focus on developing strategies for manufacturing scalability and sustainability challenges. Examples will be drawn from the use of alternative feedstocks in cement as well as solid state electrolyte development.
10:55 AM - *CT05.02.02
Automated Multimodal Manufacturing Optimization
Brian Giera1,Adam Jaycox1,Kyle DeVlugt1,Joseph Nicolino1,Brian Au1,Sam Ludwig1
Lawrence Livermore National Laboratory1
The characterization and fabrication of manufactured parts is often a serial process, where different post-processing steps and quality measurements are obtained in a non-co-located and nonautomated fashion. The traditional process for taking components from the design stage to a final, qualified component is an extremely personnel-intensive series of steps that are typically treated as siloed activities. Thus, the development cycle (e.g. part specification, fabrication, and qualification) is subject to bottlenecks, making part repeatability difficult and costly to achieve, quantify, and optimize. This is true for established manufacturing processes and especially true for emerging advanced manufacturing technologies. To address this, we adopt an “object-oriented” or modular methodology at all stages of manufacturing, fabrication, inspection, and so on. For instance, one such inspection modality is an automated metrology system that can perform a variety of inspection routines on arbitrary objects. Another module is a comprehensive NoSQL database that can log all data acquired at each step for every part fabricated by our system. We also create digital twins of each inspection and fabrication module that evolve over time as insights are extracted from the growing database. Although we exercise these capabilities using a suite of fused deposition modeling printers, the object-oriented approach is agnostic to any given manufacturing approach or inspection system. During this talk, we walk through key capabilities demonstrated with these modules and the implications for automation of process design.
11:20 AM - *CT05.02.03
Autonomous End-to-End Systems for Materials Discovery
Muratahan Aykol1,Joseph Montoya1
Toyota Research Institute1
Autonomous research platforms driven by artificial intelligence have the potential to enable rapid identification of improved material components for technological devices, such as batteries or fuel cells, at significantly reduced research costs. In this talk, we will present our recent progress on the development of an end-to-end autonomous platform that helps materials scientists efficiently find or expand the space of optimal candidate materials with minimal intervention. This platform provides a software framework for (i) the design and testing of goal-oriented research agents that can flexibly combine machine learning with physical and chemical constructs as well as heuristics to guide the experiments, and (ii) seamless deployment of the designed agents in actual closed-loop experimental settings. In particular, we will present a recent implementation of this framework for on-demand, cost-effective discovery of stable inorganic materials by automated, intelligent control of crystal structure selection and density functional theory simulations. This platform is running uninterrupted on cloud computing resources, and has found thousands of previously unreported ground state or nearly stable inorganic compounds in binary and ternary metal oxide, sulfide, phosphide and metal-alloy chemistries, notably augmenting the layout of the phase diagrams in many of these systems.
11:45 AM - CT05.02.04
Robotics-Enabled Exploration of Multicomponent Lead Halide Perovskites via Machine Learning
Kate Higgins1,Sai Valleti2,Maxim Ziatdinov3,Sergei Kalinin2,3,Mahshid Ahmadi1
Joint Institute for Advanced Materials1,University of Tennessee2,Oak Ridge National Laboratory3
Metal halide perovskites (MHPs) have attracted considerable attention due to the combination of outstanding optoelectronic properties and low fabrication cost, making them uniquely attractive for various optoelectronic and sensing applications. Despite extensive effort, the synthesis of these materials preponderantly entails modifying a single compositional or synthesis variable and observing the resulting changes in structure and functionality. Combined with lengthy processing and optimization times, this approach has largely been inefficient in its ability to explore vast design spaces. Here, we establish a workflow for the rapid synthesis and characterization of MHPs via combinatorial synthesis combined with rapid-throughput photoluminescence measurements. We adopt an approach based on multivariate statistical analysis to gain insight into the variability of the photoluminescent properties across the compositional series. We map the composition-dependent property (photoluminescence) and use the Gaussian process framework to determine associated uncertainties. From these uncertainties, we are then able to identify possible areas of interest and characterize them further. Overall, through the utilization of automated synthesis, we demonstrate how this workflow utilizes data-driven machine learning models for the accelerated discovery of large compositional spaces in MHPs with optimized properties for multifunctional optoelectronics.
CT05.03: Applications I
Sunday PM, April 18, 2021
1:00 PM - *CT05.03.01
Machine Learning for the Modeling of Complex Energy Materials
Columbia University1
The properties of materials for energy applications, such as heterogeneous catalysts and battery materials, often depend on complicated chemical compositions and complex structural features including defects and disorder. This complexity makes the direct modeling with first principles methods challenging. Machine-learning (ML) potentials trained on first principles reference data enable linear-scaling atomistic simulations with an accuracy that is close to the reference method at a fraction of the computational cost. ML models can also be trained to predict the outcome of simulations (or experiments), bypassing explicit atomistic modeling altogether.
Here, I will give an overview of recent methodological advancements of ML potentials based on artificial neural networks (ANNs) [1-5] and applications of the method to challenging materials classes including metal and oxide nanoparticles and amorphous phases. Further, I will show an example of integrating large computational and small experimental data sets for the ML-guided discovery of catalyst materials [6].
1. J. Behler and M. Parrinello, Phys. Rev. Lett. 98 146401 (2007).
2. N. Artrith, T. Morawietz, and J. Behler, Phys. Rev. B 83, 153101 (2011).
3. N. Artrith and A. Urban, Comput. Mater. Sci. 114, 135-150 (2016).
4. N. Artrith, A. Urban, and G. Ceder, Phys. Rev. B 96, 014112 (2017).
5. A. Cooper, J. Kästner, A. Urban, and N. Artrith, npj Comput. Mater. 6, 54 (2020).
6. N. Artrith, Z. Lin, and J. G. Chen, ACS Catal. 10, 9438−9444 (2020).
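ANN potentials of the kind introduced in [1] describe each atom's environment with symmetry functions that are invariant to rotation and translation. The sketch below shows one common radial form with a cosine cutoff; the parameter names eta, r_s and r_c and the example coordinates are assumptions for illustration, not values from the talk.

```python
import math


def cosine_cutoff(r, r_c):
    """Smoothly switch off contributions beyond the cutoff radius r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)


def radial_symmetry_function(center, neighbors, eta, r_s, r_c):
    """Sum of Gaussians of neighbor distances, damped by the cutoff."""
    total = 0.0
    for position in neighbors:
        r = math.dist(center, position)
        total += math.exp(-eta * (r - r_s) ** 2) * cosine_cutoff(r, r_c)
    return total


# Hypothetical environment: two neighbors inside the cutoff, one outside.
center = (0.0, 0.0, 0.0)
neighbors = [(1.5, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 9.0)]
g = radial_symmetry_function(center, neighbors, eta=0.5, r_s=1.5, r_c=6.0)
```

Because only interatomic distances enter, rotating or translating the whole environment leaves the value unchanged, which is the property that lets a neural network trained on such descriptors respect the symmetries of the energy.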
1:25 PM - CT05.03.02
Automated In Silico Screening of Nanoporous Materials for Enhanced CO2 Capture
Rodrigo Neumann1,Fausto Martelli1,Binquan Luan1,Tonia Elengikal1,Anshul Gupta1,Guojing Cong1,Mathias Steiner1,Thomas Peters2,Flor Siperstein3,Breanndan O Conchuir1
IBM Research1,University of Connecticut2,University of Manchester3
One of the strategies for carbon capture and storage is to leverage the adsorption properties of nanoporous materials. The carbon emissions of point sources, such as power plants, can be significantly reduced by applying these materials to the post-combustion capture of flue gas. Zeolites [1], Metal-Organic Frameworks [2], Zeolitic Imidazolate Frameworks [3] and Porous Polymer Networks [4] are examples of promising nanoporous materials which can efficiently trap flue gas molecules in their pores with diameters of a few (tens of) Angstroms.
Carbon dioxide, a small gas molecule with a kinetic diameter of 3.3 Angstroms, represents roughly 10% of the composition of flue gas coming out of coal-fired power station exhausts. In order to be a good carbon capture material, the nanoporous structure must not only adsorb flue gas components but preferentially adsorb CO2 as opposed to more abundant flue gas components, such as N2. Therefore, both the absolute adsorbate loading and the relative CO2/N2 selectivity are important performance figures-of-merit to be considered.
Millions of possible crystalline nanoporous materials [5] have been identified for carbon capture, extending far beyond our capability to quantify in silico the adsorption performance of each individual nanoporous structure by brute force calculations. Experimentally fabricating and measuring the adsorption properties of each framework is also unrealistic, due to time and cost constraints, leading to the requirement for a pre-screening step to improve resource allocation. In this talk, we present our work on optimising the classification mechanisms for characterizing nanoporous structures, enabling efficient high-throughput screening of materials for carbon capture.
Our automated material screening tool leverages cloud resources to spawn multiple computational experiments in parallel to rapidly explore the vast space of relevant nanoporous materials. A full computational experiment comprises not only the realisation of Grand Canonical Monte Carlo (GCMC) adsorption simulations, but also a full geometrical and topological characterisation of the material in terms of its crystalline structure as represented by a point cloud of atomic positions. These can be combined with machine learning to accelerate the estimation of adsorption properties based solely on the atomistic structure of materials.
1. Siriwardane, R. V., Ming-Shing S., and Edward P. F. Adsorption of CO2, N2, and O2 on Natural Zeolites. Energy & Fuels 17, 571-576 (2003). https://doi.org/10.1021/ef020135l
2. Saha, D., et al. Adsorption of CO2, CH4, N2O, and N2 on MOF-5, MOF-177, and Zeolite 5A. Environmental Science & Technology 44, 1820-1826 (2010). https://doi.org/10.1021/es9032309
3. Hayashi, H., Côté, A., Furukawa, H. et al. Zeolite A imidazolate frameworks. Nature Mater 6, 501–506 (2007). https://doi.org/10.1038/nmat1927
4. Lu, W., et al. Porous polymer networks: synthesis, porosity, and applications in gas storage/separation. Chemistry of Materials 22, 5964-5972 (2010). https://doi.org/10.1021/cm1021068
5. Boyd, P., Lee, Y. & Smit, B. Computational development of the nanoporous materials genome. Nat Rev Mater 2, 17037 (2017). https://doi.org/10.1038/natrevmats.2017.37
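The two figures-of-merit mentioned in the abstract, CO2 loading and CO2/N2 selectivity, can be combined into a simple pre-screening filter. The sketch below uses the usual adsorption selectivity, the ratio of adsorbed amounts divided by the ratio of gas-phase mole fractions. The framework names, loadings and thresholds are hypothetical, and the 10%/90% CO2/N2 mixture is a simplifying assumption based on the roughly 10% CO2 content quoted above.

```python
def co2_n2_selectivity(q_co2, q_n2, y_co2=0.10, y_n2=0.90):
    """Adsorption selectivity: (q_CO2 / q_N2) / (y_CO2 / y_N2)."""
    return (q_co2 / q_n2) / (y_co2 / y_n2)


def prescreen(candidates, min_loading, min_selectivity):
    """Keep frameworks that meet both thresholds, best selectivity first."""
    kept = []
    for name, q_co2, q_n2 in candidates:
        selectivity = co2_n2_selectivity(q_co2, q_n2)
        if q_co2 >= min_loading and selectivity >= min_selectivity:
            kept.append((name, selectivity))
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical simulated loadings (same units for both gases).
candidates = [
    ("framework_A", 2.0, 0.5),
    ("framework_B", 0.3, 0.01),
    ("framework_C", 3.0, 0.3),
]
shortlist = prescreen(candidates, min_loading=1.0, min_selectivity=20.0)
```

Here framework_B is highly selective but is dropped for its low loading, which illustrates why the abstract treats the two quantities as separate criteria.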
1:40 PM - CT05.03.03
Late News: Machine Learning with Persistent Homology and Chemical Word Embeddings Improves Predictive Accuracy and Interpretability in Metal-Organic Frameworks
Joseph Montoya3,Aditi Krishnapriyan1,2,Maciej Haranczyk4,Jens Hummelshoej3,Dmitriy Morozov1
Lawrence Berkeley National Laboratory1,University of California, Berkeley2,Toyota Research Institute3,IMDEA Materials Institute4
Machine learning has emerged as a powerful approach in materials discovery. Its major challenge is selecting features that create interpretable representations of materials, useful across multiple prediction tasks. We introduce an end-to-end machine learning model that automatically generates descriptors that capture a complex representation of a material’s structure and chemistry. This approach builds on computational topology techniques (namely, persistent homology) and word embeddings from natural language processing. It automatically encapsulates geometric and chemical information directly from the material system. We demonstrate our approach on multiple nanoporous metal–organic framework datasets by predicting methane and carbon dioxide adsorption across different conditions. Our results show considerable improvement in both accuracy and transferability across targets compared to models constructed from the commonly used, manually curated features, consistently achieving an average 25–30% decrease in root-mean-squared-deviation and an average increase of 40–50% in R² scores. A key advantage of our approach is interpretability: Our model identifies the pores that correlate best to adsorption at different pressures, which contributes to understanding atomic-level structure–property relationships for materials design.
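To give a flavor of persistent homology on a point cloud of atomic positions, the sketch below computes only its simplest part: the length scales at which separate connected components merge as the distance threshold grows (zero-dimensional persistence). The talk's descriptors presumably also use higher-dimensional features such as channels and voids, which need a dedicated topology library. The coordinates are hypothetical, and death values are reported as pairwise distances, which is one of several conventions.

```python
import math
from itertools import combinations


def h0_death_scales(points):
    """Distances at which connected components merge (all are born at 0)."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for distance, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge joins two components
            parent[root_i] = root_j
            deaths.append(distance)
    return deaths


# Hypothetical point cloud: two tight pairs separated by a wide gap.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
deaths = h0_death_scales(cloud)
```

The one long-lived component (merging only at distance 4) reflects the gap between the two clusters; features that persist over a wide range of scales are the ones such descriptors treat as structurally meaningful.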
1:55 PM - *CT05.03.04
Defect Detection and Uncertainty Quantification in Property Prediction with Machine Learning
Dane Morgan1,Mingren Shen1,Ryan Jacobs1,Glenn Palmer1,Kevin Field2
University of Wisconsin–Madison1,University of Michigan–Ann Arbor2
In this talk I will discuss two areas of applications of machine learning to materials science and engineering. First, I will share recent results on extracting defects automatically from electron microscopy images. Electron microscopy is widely used to explore defects in crystal structures, but human tracking of defects can be time-consuming, error prone, and unreliable, and is not scalable to large numbers of images or real-time analysis. In this work I discuss application of machine learning approaches to find the location and geometry of different defect clusters in irradiated steels. We show that performance comparable to human analysis can be achieved with relatively small training data sets. We explore multiple deep learning methods that provide various features, e.g., fast processing for video and pixel level categorization to simplify defect dimension determination.
Second, I will share some studies we have been doing to assess the accuracy of materials property prediction from machine learning models. Machine learning provides a powerful tool to predict materials properties, but relatively little attention has been paid to the critical issue of assessing the domain of the machine learning model and the accuracy of the predictions within that domain. In this talk I explore the effectiveness of some model error estimation methods, including ensemble and Bayesian methods, and consider how these might be used to obtain accurate error estimates within the domain of the model. We apply the results to a realistic problem of modeling dilute impurity diffusion coefficients in a host, demonstrating that the model can predict accurate values for new systems but that the domain and errors are essential to consider for effective use of the model.
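One ensemble approach to error estimation is to train many models on bootstrap resamples of the data and read the spread of their predictions as an uncertainty. The sketch below does this with straight-line fits on synthetic data; the model, data and ensemble size are assumptions chosen to keep the example short, not the methods or data used in the talk.

```python
import random
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def bootstrap_ensemble(xs, ys, n_models=200, seed=0):
    """Fit one line per bootstrap resample of the training data."""
    rng = random.Random(seed)
    models = []
    while len(models) < n_models:
        idx = [rng.randrange(len(xs)) for _ in xs]
        sample_x = [xs[i] for i in idx]
        if len(set(sample_x)) < 2:  # degenerate resample, cannot fit a line
            continue
        models.append(fit_line(sample_x, [ys[i] for i in idx]))
    return models


def predict(models, x):
    """Ensemble mean and standard deviation of the prediction at x."""
    predictions = [slope * x + intercept for slope, intercept in models]
    return mean(predictions), pstdev(predictions)


# Synthetic training data on [0, 1] with Gaussian noise.
noise = random.Random(1)
train_x = [i / 19 for i in range(20)]
train_y = [2.0 * x + noise.gauss(0.0, 0.1) for x in train_x]

models = bootstrap_ensemble(train_x, train_y)
_, std_inside = predict(models, 0.5)   # within the training range
_, std_outside = predict(models, 5.0)  # far outside it
```

The ensemble spread is larger at x = 5.0 than at x = 0.5, a simple picture of why predictions outside a model's domain deserve wider error bars.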
2:20 PM - CT05.03.05
Machine Learning the Quantum-Chemical Properties of Metal–Organic Frameworks for Accelerated Materials Discovery with a New Electronic Structure Database
Andrew Rosen1,Shaelyn Iyer1,Debmalya Ray2,Zhenpeng Yao3,Alan Aspuru-Guzik4,Laura Gagliardi5,Justin Notestein1,Randall Snurr1
Northwestern University1,University of Minnesota Twin Cities2,Harvard University3,University of Toronto4,The University of Chicago5
Metal–organic frameworks (MOFs) are a widely investigated class of crystalline solids with tunable structures that make it possible to impart specific chemical functionality tailored for a given application. However, the enormous number of possible MOFs that can be synthesized makes it difficult to determine which materials would be the most promising candidates, especially for applications governed by electronic structure properties that are often computationally demanding to simulate and time-consuming to probe experimentally. Here, we have developed the first publicly available quantum-chemical database for MOFs (the “QMOF database”), which consists of properties derived from density functional theory (DFT) for over 14,000 experimentally synthesized MOFs. Throughout this study, we demonstrate how this new database can be used to identify MOFs with targeted electronic structure properties. As a proof-of-concept, we use the QMOF database to evaluate the performance of several machine learning models for the prediction of DFT-computed band gaps and find that crystal graph convolutional neural networks are capable of achieving superior predictive performance, making it possible to circumvent computationally expensive calculations. We also show how unsupervised learning methods can aid the discovery of otherwise subtle structure–property relationships using the computational findings in this work. We conclude by highlighting several MOFs with low band gaps, a challenging task given the electronically insulating nature of most MOF structures. The data and predictive models generated in this work, as well as the database of MOF structures, should be highly useful to other researchers interested in the predictive design and discovery of MOFs for the many applications dictated by quantum-chemical phenomena.
CT05.04: Material Informatics I
Sunday PM, April 18, 2021
4:00 PM - *CT05.04.01
Understanding and Visualizing Hyperspectral ToF-SIMS Data Sets Using Machine Learning
Paul Pigram1,Wil Gardner1,2,3,David Winkler2,4,5,Davide Ballabio6,Benjamin Muir3
La Trobe University1,La Trobe Institute for Molecular Science, La Trobe University2,CSIRO Manufacturing3,Monash Institute of Pharmaceutical Sciences, Monash University4,School of Pharmacy, University of Nottingham5,Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca6
The application of multivariate analysis to mass spectral data sets has been thoroughly investigated in recent decades. Contemporary studies, however, frequently involve large scale and complex data collections comprising libraries of spectra or hyperspectral mass spectrometry imaging (MSI) or depth profiling. Understanding and visualizing the complex relationships between peaks, pixels and voxels embodied in these data remains a major challenge. It is now recognized that most mass spectral data contain non-linear relationships, which has led to increased application of machine learning approaches to provide unique insights into the underlying surface chemistry.
We have exemplified the use of the self-organizing map (SOM), a type of artificial neural network, for analyzing time-of-flight secondary ion mass spectrometry (ToF-SIMS) data derived from spectral libraries, hyperspectral images and depth profiles. Recently, we developed a novel methodology, SOM-RPM, which incorporates the relational perspective mapping (RPM) algorithm to improve visualization of the SOM for 2D ToF-SIMS images. We have also used SOM-RPM to characterize and interpret 3D ToF-SIMS depth profile data, voxel-by-voxel. An organic Irganox™ multilayer standard sample was depth profiled using ToF-SIMS and SOM-RPM was used to create 3D similarity maps of the depth-profiled sample, in which the mass spectral similarity of individual voxels is modelled with color similarity. We used this similarity map to segment the data into spatial features, demonstrating that the unsupervised method meaningfully differentiated between Irganox-3114 and Irganox-1010 nanometer-thin multilayer films. The method also identified unique clusters at the surface associated with environmental exposure and sample degradation. Key fragment ions characteristic of each cluster were identified, tying clusters to their underlying chemistries. SOM-RPM has the demonstrable ability to reduce vast data sets to simple 3D visualizations that can be used for clustering data and visualizing the complex relationships within.
4:25 PM - CT05.04.02
Charting the Low-Loss Region in Electron Energy Loss Spectroscopy with Machine Learning
Juan Rojo2,Laurien Roest1,Sabrya van Heijst1,Jaco ter Hoeve2,Louis Maduro1,Isabel Postmes1,2,Sonia Conesa-Boj1
Kavli Institute of Nanoscience Delft1,VU Amsterdam & Nikhef2
Electron energy-loss spectroscopy (EELS) within the transmission electron microscope (TEM) provides a wide range of valuable information on the structural, chemical, and electronic properties of nanoscale materials. A particularly important region of EEL spectra is the low-loss region, whose analysis makes it possible to chart the local electronic properties of nanomaterials, from the characterisation of bulk and surface plasmons, excitons, and phonons to the determination of their bandgap and band structure. A major challenge for EELS data interpretation in this low-loss region is the presence of the zero-loss peak (ZLP) associated with elastic scatterings. This ZLP often overwhelms the contribution from the inelastic scatterings of the TEM beam electrons with the sample, such that relevant signals of low-loss phenomena risk becoming drowned in the ZLP tail. An accurate removal of the ZLP contribution is thus crucial in order to accurately map and identify key physical information from the low-loss region in EEL spectra. Several approaches to ZLP subtraction have been put forward in the literature. These methods are however affected by significant limitations: they are based on specific, ad-hoc model assumptions about the ZLP, in particular concerning its parametric functional dependence on the electron energy loss ΔE, and they lack an estimate of the associated uncertainties.
Here we bypass these limitations by developing a model-independent strategy to realise a multidimensional determination of the ZLP with a faithful uncertainty estimate. Our approach is based on machine learning (ML) techniques originally developed in high-energy physics to study the quark and gluon substructure of protons in particle collisions. It is based on the Monte Carlo replica method to construct a probability distribution in the space of experimental data and artificial neural networks as unbiased interpolators to parametrise the ZLP. The end result is a faithful sampling of the probability distribution in the ZLP space which can be used to subtract its contribution to EEL spectra while propagating the associated uncertainties. One can also extrapolate the predictions from this ZLP parametrisation to other TEM operating conditions beyond those included in the training dataset.
By means of this approach, we construct a ML model of ZLP spectra acquired in vacuum, which is able to accommodate an arbitrary number of input variables corresponding to different operation settings of the TEM. We demonstrate how this model successfully describes the input spectra and we assess its extrapolation capabilities for other microscope operation conditions. Further, we construct a one-dimensional model of the ZLP as a function of the energy loss ΔE from spectra acquired on two different specimens of tungsten disulfide (WS2) nanoflowers characterised by a 2H/3R mixed polytypism. The resulting subtracted spectra are used to determine the value and nature of the WS2 bandgap in these nanostructures as well as to map the properties of the associated exciton peaks appearing in the ultra-low loss region.
We also present results of further applications of our ML-subtracted EEL spectra to characterise the local electronic properties of TMD-based nanostructures. First, we evaluate the complex dielectric function via the Kramers-Kronig relations. Second, we automate the data analysis of spectral TEM images, where each pixel contains an individual EEL spectrum. By using ML regression and classification methods, one can identify relevant features of the spectra (peaks, edges, shoulders) with minimal human intervention and then determine how these features vary as we move along different regions of the nanostructure.
The framework presented in this work has been implemented in an open source Python package, dubbed EELSfitter, which is available from GitHub.
4:40 PM - CT05.04.03
Discovery of Interpretable X-Ray Absorption Spectroscopy Signatures via Random Forest Machine Learning Models
Steven Torrisi1,2,Matthew Carbone3,Brian Rohr2,Joseph Montoya2,Yang Ha4,Junko Yano4,Santosh Suram2,Linda Hung2
Harvard University1,Toyota Research Institute2,Columbia University3,Lawrence Berkeley National Laboratory4
X-ray absorption spectroscopy (XAS) produces a wealth of information about the local structure of materials, but interpretation of spectra often relies on easily accessible trends and prior assumptions about the structure. Recently, researchers have demonstrated that machine learning models can automate this process to predict the coordinating environments of absorbing atoms from their XAS spectra. However, machine learning models are often difficult to interpret, making it challenging to determine when they are valid and whether they are consistent with physical theories. In this work, we present three main advances to the data-driven analysis of XAS spectra: we demonstrate the efficacy of random forests in solving two new property determination tasks (predicting Bader charge and mean nearest neighbor distance), we address how choices in data representation affect model interpretability and accuracy, and we show that multiscale featurization can elucidate the regions and trends in spectra that encode various local properties. This multiscale featurization transforms the spectrum into a vector of polynomial-fit features, and is contrasted with the commonly-used “pointwise” featurization that directly uses the entire spectrum as input. We find that across thousands of transition metal oxide spectra, the relative importance of features describing the curvature of the spectrum can be localized to individual energy ranges, and we can separate the importance of constant, linear, quadratic, and cubic trends, as well as the white line energy. This work has the potential to assist rigorous theoretical interpretations, expedite experimental data collection, and automate analysis of XAS spectra, thus accelerating the discovery of new functional materials.
 S.B. Torrisi, M.R. Carbone, B.A. Rohr, J.H. Montoya, Y. Ha, J. Yano, S.K. Suram, L. Hung; npj Computational Materials volume 6, Article number: 109 (2020) https://rdcu.be/b9n2Y
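To make the contrast between "pointwise" and multiscale featurization concrete, the sketch below splits a spectrum into energy windows and fits a cubic polynomial in each, so that every window contributes a constant, linear, quadratic and cubic coefficient. It is an illustration only: the window count, the rescaling of each window to [-1, 1], the helper names and the toy spectrum are assumptions, not details taken from the cited paper.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def polyfit(xs, ys, degree=3):
    """Least-squares polynomial coefficients [c0, c1, ...] via normal equations."""
    powers = range(degree + 1)
    ata = [[sum(x ** (i + j) for x in xs) for j in powers] for i in powers]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in powers]
    return solve(ata, aty)

def multiscale_features(energies, intensities, n_windows=4, degree=3):
    """Concatenate per-window polynomial coefficients into one feature vector."""
    size = len(energies) // n_windows
    features = []
    for w in range(n_windows):
        e = energies[w * size:(w + 1) * size]
        mu = intensities[w * size:(w + 1) * size]
        # Rescale each window to [-1, 1] so coefficients are comparable.
        lo, hi = e[0], e[-1]
        t = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in e]
        features.extend(polyfit(t, mu, degree))
    return features

# Hypothetical toy spectrum: a straight line sampled at 40 energies.
energies = [float(k) for k in range(40)]
intensities = [2.0 * e for e in energies]
features = multiscale_features(energies, intensities)
print(len(features), [round(f, 6) for f in features[:4]])
```

Because each feature is tied to one window and one polynomial order, a feature-importance score from a random forest can be read as "the curvature in this energy range matters", which is harder to do with one feature per energy point.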
4:55 PM - CT05.04.04
Late News: Machine Learning Force Fields for Understanding the Thermodynamics of Li-Ion Cathodes
Joshua Gabriel1,Juan Garcia1,Noah Paulson1,John Low1,Marius Stan1,Hakim Iddir1
Argonne National Laboratory1
Machine Learning Force Fields (MLFF) have emerged as a tool to accelerate the atomic-scale modeling of materials while preserving the accuracy of density functional theory calculations. A workflow for developing such an MLFF, which leverages the state-of-the-art high-performance computing architectures for deep learning available at the national laboratories, is presented and discussed. The MLFF is used to study the thermodynamics and structural changes of lithium nickel oxide (LNO) battery cathode material during room-temperature operation. The results emphasize the complexity of phase stability in this system and demonstrate the predictive power of the MLFF method.
5:00 PM - CT05.04.05
Improvement of Adhesion Between NiTi Alloy and Diamond-Like Carbon Film by Bayesian Optimization
Masafumi Toyonaga1,Terumitsu Hasebe1,2,Shunto Maegawa2,Tomohiro Matsumoto1,2,Atsushi Hotta1,Tetsuya Suzuki1
Keio University1,Tokai University Hachioji Hospital2
Surface coating is one of the most interesting methods for improving the mechanical, physical, chemical and biocompatible properties of materials and devices. Fluorine-incorporated diamond-like carbon (F-DLC) has received much attention as a coating material because of its outstanding blood compatibility, which suppresses fatal failure of medical devices. However, it is well known that F-DLC thin films adhere poorly to metallic alloys and that delamination or cracks occur easily after coating. Many methods have been reported for improving the adhesion of F-DLC on metallic alloys. Although some of these studies introduced silicon-containing interlayers such as silicon-incorporated DLC (Si-DLC) between the metallic alloy and the F-DLC thin film, the interlayer deposition conditions that most improve adhesion are not clear, and no method has been established for optimizing them. We therefore considered optimizing the structure of the interlayer using Bayesian optimization, a machine learning method. In this study, we optimize the structure of the Si-DLC interlayer by Bayesian optimization in order to apply F-DLC to nickel-titanium (NiTi) alloy, which has low blood compatibility but has been attracting attention as a material for medical devices due to its superelasticity and shape memory.
The purpose of this study is to evaluate the effectiveness of Bayesian optimization for determining optimal structures of interlayers between metallic substrates and F-DLC, and to develop a highly blood-compatible NiTi alloy by improving the adhesion of F-DLC.
Si-DLC and F-DLC were deposited on NiTi substrates using radio-frequency plasma-enhanced chemical vapor deposition (RF-PECVD) equipment. The adhesion between the NiTi substrates and the DLC thin films was evaluated by the scratch test, and the structure of the Si-DLC interlayer was updated successively by Bayesian optimization on the obtained data. A total of 30 Si-DLC interlayers were produced; the highest adhesion reached about 53 mN, while the lowest was about 22 mN. The interlayers with the highest and the lowest adhesion were deposited on NiTi stents, and after the crimp test and the fatigue test the surfaces were observed by scanning electron microscopy (SEM). No delamination was observed for the interlayer derived by Bayesian optimization, whereas delamination occurred in the sample whose structure was not optimized.
This study therefore shows that the adhesion between a metallic material and a DLC thin film can be improved by Bayesian optimization. In the future, applying machine learning across materials research is expected to yield materials with unprecedented properties.
5:05 PM - CT05.04.06
High-Throughput Electrochemical Screening of Deep Eutectic Solvent for Use in Redox Flow Batteries
Maria Politi1,Jaime Rodriguez1
University of Washington1
Deep eutectic solvents (DESs) are a class of materials with varied applications including catalysis and synthesis, extraction processes, drug solubilization and battery electrolytes. Their appeal stems from their broad electrochemical stability window, high electrical conductivity, low vapor-pressure, and low-flammability. These solvents present a depression in the melting point at specific molar ratios of organic components that can result in a liquid solution at moderate temperatures. A breadth of candidates with varying concentrations can be used to form DESs, leading to a vast design space. High throughput experiments and data-driven design strategies are key to accelerate the optimization of materials based on their physicochemical and electrochemical properties as well as engineering criteria (e.g. cost, safety, toxicity) for candidate DESs. The implementation of high-throughput tools allows for a rapid evaluation and screening based on metrics such as the melting point, potential stability window and ionic conductivity. Several high-throughput protocols have been designed for identifying the design space of molecules under investigation, their formulation and material characterization. Using data science principles, the molecules composing our basis set were identified using scoring based on metrics such as cost, melting point, toxicity and molecular weight. The formulation of deep eutectic solvents is automated through the use of a pipetting robot, combinatorial techniques and well-plates. Next, the melting point of the proposed mixtures is detected using an IR camera and hot-plate set-up. To screen for their electrochemical properties, 96-well plates with screen printed electrodes in combination with measurement techniques such as Cyclic Voltammetry (CV) and Electrochemical Impedance Spectroscopy (EIS) are implemented. 
Finally, to extend high-throughput principles to the analysis of the data collected through the aforementioned protocols, machine learning is leveraged for data classification and data modeling. Furthermore, open-source Python-based packages have been developed and made available on GitHub. The combination of high-throughput experimentation and data analysis can greatly accelerate the design and screening of candidate DES systems. This overall workflow could be easily adapted to other design spaces and applications.
5:10 PM - CT05.04.07
Late News: A Materials-Informatics Based Study of Solid Electrolytes and Protective Coatings for Li Batteries
Shreyas Honrao1,2,Xin Yang3,Balachandran Radhakrishnan1,3,Shigemasa Kuwata3,Hideyuki Komatsu4,Atsushi Ohma4,John Lawson1
NASA Ames Research Center1,KBR Wyle2,Nissan North America3,Nissan Motor Company4
All-solid-state batteries with Li metal anode can address the safety issues surrounding traditional Li-ion batteries as well as the demand for higher energy densities. However, the development of solid electrolytes and protective coatings simultaneously possessing high ionic conductivity and wide electrochemical stability has proven to be a challenge. Here, we present a data-driven approach to explore the Li compound space for promising solid electrolytes and coatings. This is accomplished through the generation of a large database of battery-related materials properties of Li compounds by computing Li+ migration barriers using bond-valence-based pair potentials, and stability windows using density functional theory energies. Using this database, we implement machine learning models that can accurately predict migration barriers and electrochemical stability windows for any new Li compound. Through feature engineering, we ensure that our models are both accurate and interpretable. We perform feature importance analysis on our models to highlight materials properties that can be tuned for future design of coatings/electrolytes. Our database and informatics approach provide a valuable tool for the rapid discovery of new solid-state battery chemistries.
5:25 PM - CT05.04.08
Late News: Prediction of Bulk and Grain Boundary Ionic Conductivities for Solid-State Li-Ion Conductors by Machine Learning
Yen-Ju Wu1,Takhiro Tanaka1,Tomoyuki Komori2,Mikiya Fujii2,Hiroshi Mizuno2,Satoshi Itoh1,Tadanobu Takada1,Erina Fujita1,Yibin Xu1
National Institute for Materials Science1,Panasonic Corporation2
A machine learning approach for identifying the important descriptors of the ionic conductivities of lithium solid electrolytes is proposed. This approach separates the factors governing bulk and grain boundary conductivities, which has rarely been reported. The effects of the interrelated structural, material, chemical and experimental properties on the bulk and grain boundary conductivities are investigated. The models are trained on the bulk and grain boundary conductivities of Li solid conductors at room temperature. Both the bulk conductivity and the grain boundary conductance of single grains were derived from 96 samples in three structural classes: perovskite, garnet, and NASICON. The important descriptors are elucidated by their feature importance and predictive performance, as determined by a nonlinear XGBoost algorithm: (i) the experimental descriptors of sintering conditions are significant for both bulk and grain boundary conductivities, (ii) the intrinsic bulk conductivity also changes with grain size, and (iii) the local environment affects the grain boundary conductance, in that a lower coordination number of Li gives a higher grain boundary conductance. These findings can clarify ways of improving bulk conductivities and overcoming the limiting factors of grain boundary conductivities for solid-state Li-ion conductors.
CT05.05: Machine Learning II
Sunday PM, April 18, 2021
6:30 PM - *CT05.05.01
End-to-End Differentiability and Tensor Processing Unit Computing to Accelerate Materials’ Inverse Design
Mathieu Bauchy1,Han Liu1,Yuhan Liu1
University of California, Los Angeles1
Numerical simulations have revolutionized material design. However, although simulations excel at mapping an input material to its output property, their direct application to inverse design (i.e., mapping an input property to an optimal output material) has traditionally been limited by their high computing cost and lack of differentiability—so that simulations are often replaced by surrogate machine learning models in inverse design problems. Here, taking the example of the inverse design of a porous matrix featuring targeted sorption isotherm, we introduce a computational inverse design framework that addresses these challenges. We reformulate a lattice density functional theory of sorption in terms of a convolutional neural network with fixed hard-coded weights that leverages automated end-to-end differentiation. Thanks to its differentiability, the simulation is used to directly train a deep generative model, which outputs an optimal porous matrix based on an arbitrary input sorption isotherm curve. Importantly, this pipeline leverages for the first time the power of tensor processing units (TPU)—an emerging family of dedicated chips, which, although they are specialized in deep learning, are flexible enough for intensive scientific simulations. This approach holds promise to accelerate inverse materials design.
7:10 PM - CT05.05.03
Graphical Model Parameters for Formation of 3D Nanomolecular Complexes
Minjeong Cha1,Emine Turali-Emre1,Xiongye Xiao2,Paul Bogdan2,Nicholas Kotov1
University of Michigan1,University of Southern California2
The design of new functional materials for drugs and antiviral medicines is essential in the biomaterials and pharmaceutical fields. However, because generalized descriptors for the pairwise interactions of nanoscale molecules are lacking, the automated design and prediction of nanomaterial function in biological systems remains one of the most challenging problems. Here, to understand the parameters governing the formation of nano-biomolecular complexes, we investigate protein complex information. The abundant protein topology data from nature give access to the key structural aspects for detecting the interaction levels of molecule pairs. The newly introduced features, which originate from a graph network model, provide a universal description of the local properties of any nanostructure and play crucial roles in predicting the interaction level in protein pairs, defined by distance. The feature correlation dynamics based on the distance classes validate the significant contribution of graph network features in the interaction prediction algorithms. As proof-of-concept applications of interaction-site prediction for nanomolecular complexes, the SARS-CoV-2 nucleocapsid protein dimer and experimentally proven protein and nanoparticle pairs are analyzed. The rapid and straightforward prediction of interaction sites in pairwise molecular complexes will enhance our understanding of multi-dimensional structural model parameters for the design rules.
7:25 PM - CT05.05.04
Graph Theory for Design of Complex Biomimetic Nanostructures
University of Michigan1
The main hurdle in finding relationships between the organization and properties of many biological materials is that their frameworks do not exhibit crystalline or any other long-range order, which underpins many, if not all, structure-property correlations used for metals, ceramics, polymers, and other materials. Understanding their organizational patterns, and therefore their mechanical, transport, electrical, and other properties, will be essential for their engineering and requires a new approach to quantifying their structures.
This talk will address this problem and pave the way to a comprehensive and quantitative description of structural patterns observed in different biomaterials using graph theory (GT), which is extensively used in sociology and informatics to evaluate complex network architectures.
The nanostructures, exemplified by self-assembled chiral nanoparticles, biomimetic composites, and protein complexes, will be described using GT representations based on discretizing each nanostructure into nodes connected by a network of edges. The resulting graphs can be used for the enumeration of hierarchical-architecture materials, for quantifying information content (complexity), and for structure-property correlations. Predictions of properties based on GT representations and machine learning algorithms will be presented.
W. Jiang, Z.-B. Qu, P. Kumar, D. Vecchio, Y. Wang, Y. Ma, J. H. Bahng, K. Bernardino, W. R. Gomes, F. M. Colombari, A. Lozada-Blanco, M. Veksler, E. Marino, A. Simon, C. Murray, S. Ricardo Muniz, A. F. de Moura, N. A. Kotov, Emergence of Complexity in Hierarchically Organized Chiral Particles, Science, 2020, 368, 6491, 642-648.
Wang, M.; Vecchio, D.; Wang, C.; Emre, A.; Xiao, X.; Jiang, Z.; Bogdan, P.; Huang, Y.; Kotov, N. A. Biomorphic Structural Batteries for Robotics. Sci. Robot. 2020, 5 (45), eaba1912. https://doi.org/10.1126/scirobotics.aba1912.
M. Baranwal, A. Magner, J. Saldinger, E. S. Turali-Emre, S. Kozarekar, P. Elvati, J. S. VanEpps, N. A. Kotov, A. Violi, A. O. Hero, Struct2Graph: A graph attention network for structure based predictions of protein-protein interactions, 2020, BioRxiv, https://doi.org/10.1101/2020.09.17.301200.
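The discretization of a nanostructure into nodes and edges can be sketched as follows. This is an illustrative example only: the distance-cutoff rule for placing edges, the function names and the toy coordinates are assumptions, and the speaker's own graph construction may differ.

```python
import math
from collections import deque

def build_graph(points, cutoff):
    """Connect every pair of nodes within `cutoff`; returns adjacency lists."""
    adj = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def connected_components(adj):
    """Breadth-first search over the graph; returns a list of node lists."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        queue, comp = deque([start]), []
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(comp)
    return components

# Hypothetical particle centres: a four-particle chain and a separate pair.
points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (10, 0, 0), (10, 1, 0)]
adj = build_graph(points, cutoff=1.1)
degrees = [len(adj[i]) for i in range(len(points))]
components = connected_components(adj)
print("degrees:", degrees)
print("number of components:", len(components))
```

Quantities such as the degree sequence and the number of connected components are defined without any reference to a lattice, which is why a graph description remains usable for structures that lack long-range order.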
7:40 PM - CT05.05.05
Symmetry Incorporated Graph Convolutional Neural Networks for Solid-State Materials
Weiyi Gong1,Hexin Bai1,Peng Chu1,Haibin Ling2,Qimin Yan1
Temple University1,Stony Brook University, The State University of New York2
Recently, graph convolutional neural networks (GCNs) have been applied to crystal structures through a crystal graph representation to achieve accurate predictions of material properties. However, the graph convolutions used in previous work are mostly performed in real space, based on the geometric information of crystal structures. The lack of space group symmetry information in real and reciprocal space limits the prediction accuracy for properties related to electronic structure. In this talk, we will demonstrate the development of a graph convolutional neural network that incorporates global and local symmetries in both real and reciprocal space. The newly proposed model gives accurate predictions compared to state-of-the-art atom-based graph neural network models, and provides physical insights into the correlation between orbital symmetries and electronic structure properties of solid-state crystalline systems.
CT05.06: Materials Informatics II
Monday AM, April 19, 2021
8:00 AM - *CT05.06.01
Artificial Intelligence Towards Materials Maps
Matthias Scheffler2,1,Claudia Draxl1,2
Humboldt-Universität zu Berlin1,Fritz Haber Institute of the Max Planck Society2
High-throughput studies provide much information but will never cover the vast chemical and structural space of materials. Thus, Artificial Intelligence (AI) must enhance our research of tomorrow – starting today. To predict novel candidate materials for a given application, possibly even in regions of the materials space that no-one would think of, our goal is to build a general “map of materials properties”. A FAIR (findable and AI ready; a further-reaching interpretation of the original acronym) data infrastructure will be an important step towards achieving this goal. The NOMAD Laboratory is a community effort and a living example of such an infrastructure in computational materials science, comprising the NOMAD Repository (raw data) and its Archive (normalized, i.e. code-independent data), the NOMAD Encyclopedia, and the NOMAD Analytics Toolkit. A final breakthrough is, however, only possible if the combined insights from synthesis, experiment, theory, and simulations are brought together. I’ll review where we are on this road, which requires an immense but rewarding effort from the wider community.
C. Draxl and M. Scheffler, Big-Data-Driven Materials Science and its FAIR Data Infrastructure, invited perspective in W. Andreoni and S. Yip (eds.), Handbook of Materials Modeling, Springer, Cham (2019).
C. Draxl and M. Scheffler, NOMAD: The FAIR Concept for Big-Data-Driven Materials Science, MRS Bulletin 43, 676 (2018).
A. Trunschke et al., Towards Experimental Handbooks in Catalysis, Topics in Catalysis, 1-17 (2020).
8:25 AM - CT05.06.02
The Search for New Materials
Joe Pitfield1,Steven Hepplestone1
University of Exeter1
Atomic-scale structure prediction is a significant area of focus, yielding results such as the Materials Project and tools such as AIRSS and CALYPSO. However, such approaches focus on the isolated bulk, whereas grain boundaries, interfaces and other phenomena dominate device development. Here, we demonstrate, using MgO and graphite, how interface physics can lead to unique material formation, how this differs from the bulk, and what new properties result. To do this, we have developed A-RAFFLE, our tool for predicting structure at interfaces, built upon the ARTEMIS interface prediction software. We demonstrate the base capability of RAFFLE as a structure prediction tool, highlighting its strengths and limitations compared to other approaches, and then discuss how RAFFLE has been implemented to predict structures at the interface between different materials.
Chris J. Pickard and R. J. Needs. “High-Pressure Phases of Silane”. In: Phys. Rev. Lett. 97 (4 July 2006), p. 045504. doi: 10.1103/PhysRevLett.97.045504. url: https://link.aps.org/doi/10.1103/PhysRevLett.97.045504.
Yanchao Wang et al. “Crystal structure prediction via particle-swarm optimization”. In: Phys. Rev. B 82 (9 Sept. 2010), p. 094116. doi: 10.1103/PhysRevB.82.094116. url: https://link.aps.org/doi/10.1103/PhysRevB.82.094116.
Anubhav Jain et al. “The Materials Project: A materials genome approach to accelerating materials innovation”. In: APL Materials 1.1 (2013), p. 011002. issn: 2166532X. doi: 10.1063/1.4812323. url: http://link.aip.org/link/AMPADS/v1/i1/p011002/s1%5C&Agg=doi.
Ned Thaddeus Taylor et al. “ARTEMIS: Ab initio restructuring tool enabling the modelling of interface structures”. In: Computer Physics Communications 257 (2020), p. 107515. issn: 0010-4655. doi: https://doi.org/10.1016/j.cpc.2020.107515. url: http://www.sciencedirect.com/science/article/pii/S0010465520302423
8:55 AM - *CT05.06.04
Digital Infrastructures for Materials Research and Discovery
École Polytechnique Fédérale de Lausanne1
We present our vision, implementation, and technology stack for a digital infrastructure for materials discovery. The three cornerstones are open-source quantum simulation codes tuned to the needs of pre-exascale and exascale machines (Quantum ESPRESSO, SIRIUS); an operating system for high-throughput simulation with full reproducibility and data provenance (AiiDA); and a dissemination platform for raw and curated data, simulation services, and data analytics (Materials Cloud). An example will be given, targeted at the discovery of novel two-dimensional materials. Work done in collaboration with Giovanni Pizzi and the AiiDA and Materials Cloud teams.
9:20 AM - CT05.06.05
Automated Microstructural Feature Extraction for Accelerated Materials Discovery
Olga Wodo3,Baskar Ganapathysubramanian1,Daniel Wheeler2,Jaroslaw Zola3
Iowa State University of Science and Technology1,National Institute of Standards and Technology2,University at Buffalo, The State University of New York3
Data-driven approaches provide a systematic way to develop mappings between microstructure and microstructure-sensitive properties. Combining data-driven approaches with physically meaningful descriptors enables the elucidation of the underlying physical mechanisms linking structure with property (explainable or interpretable AI). One critical aspect of such an approach is the ability to represent materials and their structure in machine-friendly formats. In this talk, we present our framework to compute a library of generic descriptors for micrographs. We combine two approaches: statistical descriptors (e.g., n-point correlations) and morphological and topological descriptors (e.g., graph-based descriptors). Using this integrated approach, we will benchmark a few problems in the arena of discovering and mining microstructure-property maps. We explain how this work lays the foundation for machine learning of microstructure-property relationships and enables information fusion between multiple scales.
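As an example of a statistical descriptor of the kind mentioned above, the two-point correlation S2(r) of a binary micrograph is the probability that two pixels separated by r both belong to the phase of interest. The sketch below evaluates it along one axis with periodic boundaries; the function name and the striped toy image are assumptions made for illustration, not part of the authors' framework.

```python
def two_point_correlation(image, max_shift):
    """S2(r) along x with periodic boundaries for a binary image (1 = phase of interest)."""
    rows, cols = len(image), len(image[0])
    s2 = []
    for r in range(max_shift + 1):
        # Count pixel pairs separated by r columns that are both in the phase.
        hits = sum(
            image[y][x] * image[y][(x + r) % cols]
            for y in range(rows)
            for x in range(cols)
        )
        s2.append(hits / (rows * cols))
    return s2

# Hypothetical 4x8 micrograph with vertical stripes two pixels wide.
micrograph = [[1, 1, 0, 0, 1, 1, 0, 0] for _ in range(4)]
s2 = two_point_correlation(micrograph, max_shift=4)
print(s2)
```

At zero shift S2 equals the volume fraction of the phase, and for this striped image it returns to that value at a shift of four pixels, which is the stripe period.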
9:35 AM - CT05.06.06
MPDD: Material-Property-Descriptor Database
Adam Krajewski1,ShunLi Shang1,Yi Wang1,Zi-Kui Liu1
The Pennsylvania State University1
Fundamentally, each ML study predicts some property and comprises three elements: a database, a descriptor, and an ML algorithm. These are combined in two steps. First, the data representation is calculated using the descriptor. Then the model is iteratively evaluated on this representation or adjusted to improve it. Both steps are nearly instantaneous compared to ab initio methods; however, with extensive databases or materials modeled with large supercells (e.g., glasses), run times can grow into days or years. We present a tool that can speed up the total process by orders of magnitude by removing the most time-intensive step, i.e., the descriptor calculation.
To accomplish that, we move from traditional sharing of only the material-properties data to sharing of the descriptors-properties data corresponding to the material as well, employing a NoSQL MongoDB database. This change not only enables orders-of-magnitude faster and effortless machine learning of materials but also serves as a tool for an automated and robust embodiment of prior knowledge about them in a graph-like fashion. Furthermore, since the descriptors are often reused for related properties, our database provides a tremendous speed-up in the design space exploration.
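The benefit of sharing descriptors alongside materials data can be illustrated with a small caching sketch. It is not the MPDD code: an in-memory dictionary stands in for the MongoDB database, and the class, the key scheme and the toy descriptor are hypothetical names chosen for this example.

```python
import hashlib
import json

class DescriptorStore:
    """In-memory stand-in for a document database keyed by structure and descriptor."""

    def __init__(self):
        self._docs = {}
        self.computed = 0  # counts how often the expensive step actually ran

    @staticmethod
    def _key(structure, descriptor_name):
        # Hash a canonical JSON form so equal structures map to the same key.
        payload = json.dumps(structure, sort_keys=True) + descriptor_name
        return hashlib.sha256(payload.encode()).hexdigest()

    def get_descriptor(self, structure, descriptor_name, compute):
        """Return the stored descriptor, computing and storing it only once."""
        key = self._key(structure, descriptor_name)
        if key not in self._docs:
            self._docs[key] = compute(structure)
            self.computed += 1
        return self._docs[key]

def composition_fractions(structure):
    """Toy descriptor: element fractions, standing in for an expensive calculation."""
    total = sum(structure["composition"].values())
    return {el: n / total for el, n in sorted(structure["composition"].items())}

store = DescriptorStore()
ws2 = {"formula": "WS2", "composition": {"W": 1, "S": 2}}
first = store.get_descriptor(ws2, "fractions", composition_fractions)
second = store.get_descriptor(ws2, "fractions", composition_fractions)
print(first, "computed", store.computed, "time(s)")
```

The second request is served from the store, so a later study of a related property that reuses the same descriptor would skip the calculation entirely.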
CT05.07: Data-Driven Chemistry I
Monday PM, April 19, 2021
10:30 AM - CT05.07.01
Inverse Design of Self-Reporting Redox-Active Materials Using Quantum Chemistry Guided Active Learning
Garvit Agarwal1,Hieu Doan1,Lily Robertson1,Lu Zhang1,Rajeev Surendran Assary1
Argonne National Laboratory1
Redox flow batteries (RFBs) are a promising technology for stationary energy storage applications due to their flexible design, easy scalability and low cost. In RFBs, energy is carried in flowable redox-active materials (anolyte and catholyte redoxmers), which are stored externally and pumped to the cell during operation. Further improvement in the energy density of RFBs requires the design of redox-active materials with optimal properties, i.e., a wider redox potential window, higher solubility, and higher stability. Additionally, designing redoxmers with fluorescence-enabled self-reporting functionality allows monitoring of the crossover of the redox-active material and the state of health of the RFBs. Here we employ high-throughput density functional theory (DFT) calculations to generate a database of reduction potentials, solvation free energies and absorption wavelengths of 1400 anolyte materials. Using the simulated data, we develop accurate machine learning (ML) models to predict properties from the simplified molecular input line-entry system (SMILES) representation of the molecular materials. The trained ML models are then used as surrogate models to drive the inverse material design loop, using multi-objective Bayesian optimization to identify materials with an optimal range of multiple properties. We demonstrate the improved efficiency of our active learning strategy compared to a brute-force random search approach for discovering promising redox-active materials with desirable properties from a vast chemical search space of 2 million molecules.
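One ingredient of any multi-objective search is deciding which candidates represent the best available trade-offs. The sketch below shows only that step, a Pareto filter applied to surrogate predictions. It does not implement Bayesian optimization itself (there is no acquisition function or model update), and the molecule names and predicted values are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the names of candidates not dominated by any other candidate."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]

# Hypothetical surrogate predictions, oriented so that larger is better:
# (redox window in V, solubility score).
predicted = {
    "mol_A": (2.1, 0.30),
    "mol_B": (1.8, 0.55),
    "mol_C": (1.7, 0.50),  # worse than mol_B in both objectives
    "mol_D": (2.4, 0.10),
}
front = pareto_front(predicted)
print(front)
```

In an active learning loop, candidates on or near this front would typically be the ones passed on for the next round of expensive calculations.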
10:45 AM - CT05.07.02
Accelerated Prediction of Atomically Precise Cluster Structures Using On-the-Fly Active Learning
Yunzhe Wang1,Shanping Liu1,Sam Norwood1,Peter Lile1,Tim Mueller1
Johns Hopkins University1
The chemical and structural properties of nanoclusters are of great interest in numerous applications -- light emitting devices, catalysis, and biomedical imaging to name a few. A systematic study of structure-property relationship necessitates the knowledge of atomically precise structures of stable clusters over a variety of sizes. However, experimental characterization of the structures of these non-crystalline materials can be challenging. Computationally searching for stable structures is a feasible solution, but can be computationally expensive, as it is a global optimization problem. In this work, we present a procedure that can accelerate prediction of low-energy nanocluster structures by combining a genetic algorithm and an interatomic potential model actively learned on-the-fly. A pool-based genetic algorithm is implemented to efficiently sample the configuration space, and moment tensor potentials are used to rapidly relax and evaluate the energies of predicted clusters. The resulting procedure significantly accelerates the process of identifying low-energy cluster structures and is demonstrated on both bare and ligated clusters. The predicted lowest-energy nanoclusters are compared with the lowest energy structures reported in literature to validate this methodology. This workflow provides a feasible way to systematically predict low-energy structures for nanoclusters at a large scale, which can greatly facilitate the discovery and design of novel nanomaterials for a wide range of applications.
11:00 AM - CT05.07.03
Screening and Understanding Li Adsorption on Two-Dimensional Metallic Materials by Learning Physics
Sheng Gong1,Shuo Wang2,Taishan Zhu1,Jeffrey Grossman1
Massachusetts Institute of Technology1,University of Maryland2Show Abstract
Two-dimensional (2D) materials have been applied on addressing challenges in Li-ion batteries (LIBs). However, it is hard to screen Li interaction with 2D materials by conventional first-principle calculations or purely data-driven machine learning. In this work, in order to screen Li interaction with 2D metallic materials, we build a high-throughput screening scheme that incorporate the process of learning physics into the screening circle. First, we use density functional theory (DFT) and graph convolutional networks (GCN) to calculate the minimum Li adsorption energies on a small set of 2D metals, then we propose a three-step adsorption mechanism based on previous understandings of charge-transfer and ionization-coupling to explain the found linear relation between minimum Li adsorption energy and work function of 2D metals. We propose that, during adsorption, a Li atom first ionizes to be a Li+ and an electron with the energy cost equal to the ionization potential of Li, then the electron transfers to the 2D metal and release the energy equal to the work function, and finally the Li+ couples with the negatively charged 2D metal with the energy change of coupling energy. We use chemisorption theory to support the proposed charge transfer direction, and we apply the linear dependence on explaining previous observation of enhanced Li adsorption by doping and functionalization and infeasibility of second-layer Li adsorption. For coupling, we find that the previously proposed image-charge coupling provides reasonable trend but fails for 0-height adsorption, and we use random forest model to predict and understand the coupling process, and find that variances of elemental properties of component elements and packing density are the most correlated features related to coupling. 
Finally, we apply our models to discover potential high-voltage materials and find that some fluorides and chromium oxides have minimum Li adsorption energies lower than -7 eV, which breaks the record. We also show that our physics-driven models extrapolate well and have higher accuracy and transferability than purely data-driven models. We hope this work can not only deepen the understanding of the nature of Li binding and promote the application of 2D materials in Li-ion batteries, but also inspire researchers to use physics to simplify learning problems, by decoupling the target property into simpler properties, in high-throughput screening for problems that are hard for conventional DFT calculations and purely data-driven machine learning.
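The three-step mechanism in the abstract above amounts to a simple energy balance, which the short sketch below writes out. It is an illustration only, not the authors' code: the sign convention (negative values mean energy released), the function name, and the example work functions and coupling energy are assumptions. Only the first ionization potential of Li, about 5.39 eV, is a known constant.

```python
# Sketch of the three-step decomposition: ionization cost, minus the energy
# released when the electron enters the metal, plus the coupling energy.
# Assumed convention: all energies in eV, negative = energy released.
LI_IONIZATION_POTENTIAL_EV = 5.39  # first ionization potential of Li


def adsorption_energy(work_function_ev, coupling_energy_ev):
    """Return IP(Li) - work function + coupling energy (all in eV)."""
    return LI_IONIZATION_POTENTIAL_EV - work_function_ev + coupling_energy_ev


# Hypothetical inputs: three work functions at a fixed, made-up coupling energy.
for work_function in (4.0, 5.0, 6.0):
    energy = adsorption_energy(work_function, coupling_energy_ev=-2.0)
    print(f"work function {work_function:.1f} eV -> E_ads {energy:.2f} eV")
```

At fixed coupling energy the result falls by 1 eV for every 1 eV increase in work function, which is the linear relation the abstract refers to.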
11:15 AM - CT05.07.04
Multi-Fidelity Information Fusion DFT Study of Doped-Graphene Single Atom Catalysts
Hud Wahab1,Gaurav Raj1,Patrick Johnson1,Lars Kotthoff1,Dilpuneet Aidhy1
University of Wyoming1Show Abstract
Cost versus accuracy trade-offs are common in materials science and engineering, where a particular property of interest can be measured or computed at different levels of accuracy or fidelity. Intuitively, the higher the accuracy, the more resource- and time-intensive the method, while the low-cost, quicker alternatives tend to be noisy. In such situations, machine-learning-based multi-information source fusion (MISF) approaches can be employed to fuse information from sources of varying fidelity and make predictions at higher levels of accuracy. In this study, we compare traditionally employed single-fidelity (SF) strategies with MISF strategies, such as multi-fidelity co-kriging (CK), in terms of their relative prediction accuracies and efficiencies for accelerated property predictions and high-throughput chemical space explorations. We perform our analysis using a DFT-computed Gibbs free energy data set of H adsorption on doped graphene for single atom catalyst applications. We discuss how Sure Independence Screening and Sparsifying Operator (SISSO) methods can be used to select relevant descriptors. Finally, we elucidate whether MISF-based learning schemes outperform the traditional SF machine learning methods, and discuss whether this can be generalized to cases involving large chemical space explorations.
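As a rough illustration of what fusing two fidelities means, the sketch below fits a linear correction that maps cheap low-fidelity values onto a few expensive high-fidelity ones. This is a deliberately simplified stand-in and not co-kriging itself, which would model the discrepancy with a Gaussian process. The function names and the data are hypothetical.

```python
# Simplified two-fidelity fusion: high ≈ rho * low + delta, with rho and delta
# fitted by ordinary least squares on the points where both fidelities exist.
def fit_linear_correction(low_values, high_values):
    """Return (rho, delta) minimising the squared error of rho * low + delta."""
    n = len(high_values)
    mean_low = sum(low_values) / n
    mean_high = sum(high_values) / n
    covariance = sum((l - mean_low) * (h - mean_high)
                     for l, h in zip(low_values, high_values))
    variance = sum((l - mean_low) ** 2 for l in low_values)
    rho = covariance / variance
    delta = mean_high - rho * mean_low
    return rho, delta


def predict_high(low_value, rho, delta):
    """Correct a cheap low-fidelity value towards the high-fidelity scale."""
    return rho * low_value + delta


# Hypothetical data: three sites computed at both fidelities (arbitrary units).
low = [0.10, 0.40, 0.80]
high = [0.30, 0.90, 1.70]
rho, delta = fit_linear_correction(low, high)
print(rho, delta, predict_high(0.60, rho, delta))
```

Once fitted, the correction lets the many cheap calculations stand in for expensive ones, which is the cost saving the comparison in the abstract is about.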
CT05.08: Applications II
Monday PM, April 19, 2021
1:00 PM - *CT05.08.01
Investigating the Shapes of Bottlebrush Polymers Using Machine Learning
Virginia Tech1Show Abstract
Bottlebrush polymers (BBPs) of thermosensitive polymers have potential applications in the field of biomedicine. They are a type of graft polymer in which thermosensitive polymer side-chains are grafted onto a polymer backbone. Above their lower critical solution temperature (LCST), the thermosensitive polymers can exhibit a coil-to-globule conformational transition. In this talk, we will discuss the effect of grafting density on the conformation of poly(N-isopropylacrylamide) (PNIPAM; LCST ~305 K) side-chains in BBPs with worm-like, cone-like, and cake-like shapes. We have performed coarse-grained (CG) molecular dynamics (MD) simulations of these BBPs at 290 K (below LCST) and 320 K (above LCST) in the presence of explicit CG water for 500 ns. The effect of temperature on the structure and shape of these BBPs was quantified by analyzing the simulation trajectories using a convolutional neural network (CNN) based machine learning approach.
1:25 PM - *CT05.08.02
Searching Order within Disorder with AI-Automation
Duke University1Show Abstract
Critical understanding of large amount of data leads to new spectral descriptors for discovering entropic ceramics (e.g. metastability, synthesizability), their phase transitions (e.g. mixing, melting) and their remarkable properties (hardness, elasticity) [Nat. Rev. Mater. 5, 295 (2020)]. Research sponsored by DOD.
1:50 PM - CT05.08.03
A Phase Mapping Algorithm to Accelerate High Throughput Experiments
Ming-Chiang Chang1,Sebastian Ament1,Maximillian Amsler1,2,Duncan Sutherland1,Carla Gomes1,R. Bruce van Dover1,Michael Thompson1
Cornell University1,University of Bern2Show Abstract
Developing high-throughput methods, leveraging both experimental automation and computer-agent-based autonomous exploration, is crucial to accelerate materials discovery. Advances continue to be necessary in advanced synthesis techniques and equally in improved materials characterization tools. In work with lateral gradient Laser Spike Annealing (lgLSA), we have demonstrated the power of spatially resolved spectroscopic and diffraction methods for rapidly assessing both composition and processing impacts on material properties. However, lgLSA dramatically increases the amount of such spectral data produced during experiments with even a single library, data which must be processed rapidly and efficiently to prevent bottlenecks during decision making in autonomous high-throughput workflows. X-ray diffraction (XRD) data is particularly difficult to manage because of the complicated behavior of spectra with changing composition, texture, and grain size distribution with processing conditions. Peak shifting, peak broadening, and modification of scattering intensities are but a few of the challenges to existing phase identification algorithms.
We introduce a general algorithm to efficiently identify constituent phases in XRD data from a library of ICSD patterns. This phase mapping algorithm was developed to be particularly appropriate for lgLSA (autonomous) explorations, exploiting the unique characteristics of the dense, temperature-dependent, diffraction data across a single laser scan experiment. While recent work has proposed deep-learning-based models for this problem, our algorithm is inspired by compressed sensing and is particularly inexpensive to run due to the physically-induced sparsity of the phases. Further, our algorithm incorporates peak-shifting due to alloying, something which even a recent deep-learning-based approach cannot handle. We demonstrate the capabilities of this algorithm using data from the Bi-Ti-O material system, resolving complex XRD patterns into constituents based on known phases from crystal structure databases. The algorithm has proven to be robust for phase identification even in the presence of doping, alloying, strain, and atomic disorder. Results from a variety of other systems demonstrate the general applicability of the algorithm. Finally, the ability to rapidly identify constituent phases in spectral data is especially valuable in an active learning context, where the phase identification can feed into experimental design. We will discuss how the algorithm is incorporated in an autonomous materials discovery workflow using in-situ experimental decisions and efficient synchrotron-based analysis. The algorithm has potential to be incorporated into many different high-throughput workflows, and can be equally used to provide physical knowledge to train machine learning models for accelerated materials discovery.
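To make the idea of sparse phase identification concrete, the sketch below decomposes a measured pattern into a small number of library patterns with a greedy, matching-pursuit style fit. It is not the algorithm presented in the talk: it ignores peak shifting and broadening, and the function names, phase names, and five-point patterns are hypothetical.

```python
# Greedy sparse fit: repeatedly pick the library pattern that best matches the
# remaining residual, subtract its contribution, and stop after a few phases.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def greedy_phase_fit(observed, library, max_phases=2, tol=1e-6):
    """Return {phase name: weight} using at most max_phases library patterns."""
    residual = list(observed)
    weights = {}
    for _ in range(max_phases):
        best_name, best_score = None, tol
        for name, pattern in library.items():
            # Normalised overlap; only positive overlaps can be selected.
            score = dot(residual, pattern) / dot(pattern, pattern) ** 0.5
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None:
            break  # nothing left that matches the residual
        pattern = library[best_name]
        coeff = dot(residual, pattern) / dot(pattern, pattern)
        weights[best_name] = weights.get(best_name, 0.0) + coeff
        residual = [r - coeff * p for r, p in zip(residual, pattern)]
    return weights


# Hypothetical library: intensities on a shared five-point angle grid.
library = {
    "phase_A": [1.0, 0.0, 0.0, 1.0, 0.0],
    "phase_B": [0.0, 1.0, 0.0, 0.0, 0.0],
    "phase_C": [0.0, 0.0, 1.0, 0.0, 1.0],
}
observed = [0.7, 0.3, 0.0, 0.7, 0.0]  # 0.7 * phase_A + 0.3 * phase_B
weights = greedy_phase_fit(observed, library)
print(weights)
```

The cap on `max_phases` plays the role of the physically induced sparsity mentioned above: only a few phases can coexist at one point of the library.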
2:05 PM - CT05.08.04
High Dimensional Model Representation - Gaussian Process Regression—A Powerful Tool to Learn Multivariate Functions from Sparse Data
Sergei Manzhos1,Mohamed Boussaidi1,Owen Ren1,2,Dmitry Voytsekhovsky2
INRS1,Purefacts Inc2Show Abstract
Machine learning approaches including neural networks (NN) and Gaussian process regression (GPR) are finding widespread use to recover functional dependencies from multidimensional data. As powerful as these approaches are, they may fail when data density is low, which is always the case in high-dimensional problems. Some methods, like GPR, also cannot easily work with large datasets. Using a modified high dimensional model representation (HDMR) to represent a multivariate function with machine-learned lower-dimensional terms allows recovering functions from very sparse data, down to ~2 data points per dimension. Sub-dimensional component functions are easier to fit and to use. Specifically, here we present an HDMR-GPR combination in which the use of GPR to represent component functions allows a nonparametric (unbiased) representation and the possibility to work only with functions of the desired dimensionality, obviating the need to build an expansion over orders of coupling. All component functions are determined from a single set of samples. We test the method by fitting potential energy surfaces of polyatomic molecules as well as by computing vibrational spectra.
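For orientation, the standard HDMR expansion that the abstract builds on can be written as below. The second expression is one reading of "functions of desired dimensionality" (a sum over d-dimensional component functions only); that form and its notation are assumptions, not taken from the abstract.

```latex
% Standard HDMR: expansion over increasing orders of coupling
f(x_1,\dots,x_D) \approx f_0 + \sum_{i=1}^{D} f_i(x_i)
  + \sum_{1 \le i < j \le D} f_{ij}(x_i, x_j) + \dots

% Assumed single-order form: only d-dimensional component functions
f(\mathbf{x}) \approx \sum_{i_1 < \dots < i_d}
  f^{(d)}_{i_1 \dots i_d}\left(x_{i_1}, \dots, x_{i_d}\right)
```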
2:20 PM - CT05.08.05
Comprehensive Comparison of Modern Sequential Design Approaches for Material Optimization—Application to Metal-Organic Frameworks
Giovanni Trezza1,Luca Bergamasco1,Matteo Fasano1,Eliodoro Chiavazzo1
Politecnico di Torino1Show Abstract
In a number of scientific contexts, the evaluation and optimization of an objective black-box function through the sequential performance of (physical and/or numerical) experiments can be prohibitively costly, especially when navigating a high-dimensional domain. Modern sequential design algorithms based on machine learning are emerging as interesting options to address this issue, making it possible to explore particularly high-dimensional parameter spaces effectively with a minimal number of evaluations. In this framework, scientists can rationally choose the next experimental (or computational) setup to be analyzed while searching for the optimal candidate, without relying on the luck of random guessing.
These general techniques find important applications in the broad area of materials science (where the number of material descriptors or features can be very high), and hold promise to accelerate materials discovery and research. Here, we apply this approach to a database of 1000 Metal-Organic Frameworks (MOFs), which are innovative compounds employable in the energy field (e.g. thermal energy storage, carbon dioxide capture). In particular, the aim is to find the best candidate in terms of a property of interest (e.g. water, carbon dioxide or other sorbate solubility) in the fewest trials, navigating within a 58-dimensional feature space of properly selected material descriptors. In this work, we provide a comprehensive comparison of several state-of-the-art methodologies. This is accomplished both for the regression models over the initial set of 60 materials supposed to be known (random forest regression, Kriging, Gaussian process regression) and for the strategies used to choose the next MOF to test (based on exploitation of high-performing candidates, exploration of poorly explored regions, or entropic search).
We observe that convergence with these "smart" trajectories occurs earlier than with average random guessing and, in many cases, within a few dozen evaluations. Differences among several strategies are shown and discussed.
 Boyd, P.G., Chidambaram, A., García-Díez, E. et al. Nature 576, 253–256 (2019).
 J. Ling, M. Hutchinson, E. Antono, S. Paradiso, and B. Meredig, Integrating Materials and Manufacturing Innovation 6, 207 (2017).
 S. N. Lophaven, H. B. Nielsen, J. Sondergaard, and A. Dace, Technical University of Denmark, Kongens Lyngby, Technical Report No. IMMTR-2002 12 (2002).
 E. Brochu, V. M. Cora, and N. De Freitas, arXiv preprint arXiv:1012.2599 (2010).
 Wang, Zi, and Stefanie Jegelka, arXiv preprint arXiv:1703.01968 (2017).
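The sequential design loop described in this abstract can be sketched in a few lines. The version below is a toy and not one of the methods compared in the talk: it uses a nearest-neighbour surrogate in place of random forest, Kriging, or Gaussian process regression, a one-dimensional integer descriptor instead of 58 features, and a made-up property. All names and numbers are hypothetical.

```python
# Toy sequential design: score each untested candidate by the value of its
# nearest tested neighbour (exploitation) plus kappa times its distance to
# that neighbour (exploration), then test the highest-scoring candidate.
def nearest_known(x, known):
    """Return (distance, value) of the closest already-tested candidate."""
    return min((abs(x - xk), yk) for xk, yk in known.items())


def sequential_search(candidates, evaluate, initial, n_trials, kappa=5.0):
    known = {x: evaluate(x) for x in initial}
    for _ in range(n_trials):
        untested = [x for x in candidates if x not in known]
        if not untested:
            break

        def score(x):
            distance, value = nearest_known(x, known)
            return value + kappa * distance

        chosen = max(untested, key=score)
        known[chosen] = evaluate(chosen)  # the costly experiment or simulation
    best = max(known, key=known.get)
    return best, known


# Hypothetical setup: 11 candidates, hidden property peaking at descriptor 7.
def hidden_property(x):
    return -(x - 7) ** 2


best, known = sequential_search(range(11), hidden_property,
                                initial=[0, 10], n_trials=3)
print(best, len(known))
```

In this toy case the optimum is reached after five evaluations out of eleven candidates. The parameter `kappa` sets the balance between exploiting good regions and exploring untested ones, which is the kind of trade-off the compared strategies handle differently.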
2:35 PM - CT05.08.06
Machine Learning Tools to Accelerate Scalable Perovskite PV Manufacturing
Nicholas Rolston1,Zhe Liu2,Austin Flick1,Thomas Colburn1,Zekun Ren2,Justin Chen1,Tonio Buonassisi2,Reinhold Dauskardt1
Stanford University1,Massachusetts Institute of Technology2Show Abstract
One of the key requirements for commercializing perovskite PV is to develop a scalable fabrication process to produce high-efficiency PV devices. We have identified that process optimization is the most time-consuming step during the development of scalable deposition technology for perovskite devices. Because the process optimization involves a large number of process variables (e.g., typically 10 – 20 variables or more), it is difficult for human intuition to solve efficiently, for example, with the trial-and-error approach or the conventional design of experiment method (e.g., one variable at a time). The state of the art for applying machine learning tools in process optimization is still at a preliminary stage. In this work, we address the challenges of ML-assisted optimization of scalable perovskite PV processing by developing the interpretable framework of sequential learning and establishing an adaptable ML model using transfer learning. The open-air rapid spray plasma process (RSPP) for depositing and curing of perovskite films is a unique platform to test and deploy the proposed ML-guided framework because RSPP is able to conduct optimization experiments at high throughput with easy access to adjusting a wide range of process variables.
In this work, we develop a novel Bayesian-optimization-based model that utilizes the visual inspection of film quality as a probabilistic constraint and conducts a local optimization of the trusted regions after the global optimization. Using the developed model, we adopt the sequential learning framework (also known as active learning), which utilizes a small dataset to initiate an ML regression model and iteratively suggests and collects new data points to achieve the optimization task. Specifically, six process conditions are identified as the key parameters to optimize systematically in RSPP perovskite devices: substrate temperature, process speed, spray flow rate, plasma height, plasma gas flow, and plasma duty cycle. With fewer than 200 experimental conditions (a tiny fraction of the 40k experimental conditions in a grid search), we achieve: (i) faster process optimization and (ii) a better understanding of the heuristic relationship between PV performance and process variables using an interpretable and transferable machine learning framework. Finally, with the help of this ML framework, we are able to produce the highest perovskite PV efficiencies ever reported in ambient conditions.
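One common way to use a probabilistic constraint in Bayesian optimization is to weight the expected improvement of a candidate by the probability that it satisfies the constraint. The sketch below shows that weighting. Whether the model in the abstract uses this exact form is not stated, so treat it as an assumption; the function names and example numbers are hypothetical.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_so_far):
    """Expected improvement for maximisation under a Gaussian prediction."""
    if std <= 0.0:
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    return (mean - best_so_far) * normal_cdf(z) + std * normal_pdf(z)


def constrained_acquisition(mean, std, best_so_far, prob_good_film):
    """Weight expected improvement by the chance the film passes inspection."""
    return expected_improvement(mean, std, best_so_far) * prob_good_film


# Hypothetical candidates: predicted efficiency (%), uncertainty, P(good film).
print(constrained_acquisition(17.0, 1.0, best_so_far=16.0, prob_good_film=0.9))
print(constrained_acquisition(18.0, 1.0, best_so_far=16.0, prob_good_film=0.1))
```

A candidate with a high predicted efficiency but a low chance of passing visual inspection is scored down, so the search spends fewer experiments on conditions likely to give poor films.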
CT05.09: Materials Informatics III
Monday PM, April 19, 2021
4:00 PM - *CT05.09.01
Towards Small-Data-Driven Materials Science
Fritz Haber Institute of the Max Planck Society1Show Abstract
The number of possible materials is practically infinite, while only a few hundred thousand (inorganic) materials are known to exist, and even basic properties are systematically known for only a few of them. In order to speed up the identification and design of new and novel optimal materials for a desired property or process, strategies for quick and well-guided exploration of the materials space are highly needed. A desirable strategy would be to start from a large body of experimental or theoretical data and, by means of artificial-intelligence (AI) methods, to identify as-yet-unseen patterns or structures in the data and, consequently, predictive (data-driven) models. This leads to the identification of maps (or charts) of materials where different regions correspond to materials with different properties. The main challenge in building such maps is to find the appropriate descriptive parameters (called descriptors) that define these regions of interest.
Here, I present novel methods for the AI-aided identification of descriptors and materials maps, based on symbolic regression, compressed sensing, and multi-task learning. The methods are tailored to yield predictive models (also) with "small data", and are shown applied to important materials-science challenges such as the prediction of the stability of perovskite materials, of novel topological insulators, and more.
I focus on the (verified) predictive power of the learned maps, which goes beyond the mere interpolation of more "traditional" AI approaches, and analyze current and future challenges.
 R. Ouyang, S. Curtarolo, E. Ahmetcik, M. Scheffler, and L. M. Ghiringhelli. "SISSO: A compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates." Physical Review Materials 2, (2018): 083802.
 R. Ouyang, E. Ahmetcik, Ch. Carbogno, M. Scheffler, and L. M. Ghiringhelli. "Simultaneous learning of several materials properties from incomplete databases with multi-task SISSO." Journal of Physics: Materials 2, (2019): 024002.
4:25 PM - CT05.09.02
Data-Driven Quantum Dot Synthesis Development in Flow
North Carolina State University1Show Abstract
Inorganic lead halide perovskite (LHP) quantum dots (QDs) have recently emerged as a promising class of semiconducting materials for next-generation, solution-processed optoelectronic devices.1 The unique properties of LHP QDs are mainly attributed to their high photoluminescence quantum yield (PLQY), high defect tolerance, facile bandgap tunability, and narrow emission linewidth.
Despite the substantial improvements in synthesis development of LHP QDs over the past five years, the conventional trial and error-based synthesis and formulation optimization methods have hindered their rapid adoption by energy technologies. Existing material development strategies very often fail to overcome the demands of colloidal QDs’ vast synthesis and processing universe, resulting in time- and cost-intensive QD development efforts.
Recent advances in modular robotic material manufacturing and artificial intelligence (AI)-guided decision-making strategies, including deep neural networks (DNNs) and reinforcement learning (RL), provide an exciting opportunity to reshape the synthesis development and formulation discovery of precision-tailored QDs through the autonomous operation of a robotic QD synthesizer.2
In this work, we have developed the second generation of our Artificial Chemist technology;3 a modular microfluidic platform 4 for data-driven, multi-step synthesis, optimization, and end-to-end manufacturing of colloidal LHP QDs. The AI-driven, multi-step chemical synthesis technique incorporates the modeling and uncertainty quantification of experimental responses and uses such models to strategically explore multivariate reaction space in a sequential, closed-loop, adaptive manner. This data-driven approach effectively couples the task of developing and training a surrogate model with the multi-objective optimization problem, which in practice results in accelerated chemical synthesis process development. The autonomous robotic QD synthesizer uses advanced DNNs and multi-stage decision-making policies trained on experimentally measured QD properties (generated in house) in tandem with state-of-the-art RL algorithms to strategically and intelligently explore QD design space, and thereby accelerate the synthesis optimization of LHP QDs for energy and chemical technologies. The reconfigurable Artificial Chemist technology utilizes a multimodal in-situ material diagnostic probe (absorption/photoluminescence (PL) spectroscopy) in conjunction with a real-time, ensemble DNN adaptive algorithm to enable simultaneous optimization of PLQY and size distribution of LHP QDs for any desired emission color. The developed autonomous robotic experimentation technology with its modular flow reactors and QD processing modules can be readily adapted for data-driven development of other solution-processed nanomaterials.
1. Akkerman, Q. A., et al., Nat. Mater. 2018, 17 (5), 394-405.
2. Abdel-Latif, K., et al., Matter 2020, 3 (4), 1053-1086.
3. Epps, R. W., et al., 2020, 32 (30), 2001626.
4. (a) Abdel-Latif, K., et al., Advanced Functional Materials 2019, 29 (23), 1900712; (b) Epps, R. W., et al., Lab Chip 2017, 17 (23), 4040-4047.
4:40 PM - CT05.09.03
Machine Learning Prediction of Creep Rupture Behavior for Metal Alloys
University of Hartford1Show Abstract
The design of alloys and composites with high creep resistance is of great interest, especially in high-temperature applications. Here we show that machine learning algorithms can predict creep behavior of steel alloys, more specifically creep-rupture lifetime and rupture stress, as a function of the alloy composition and processing methods. The machine learning approach was applied on a data set of 2000 alloys, and the predicted creep-rupture lifetime and rupture stress were in good agreement with most of the experimentally measured values. The results were mutually confirmed by two algorithms.
4:45 PM - CT05.09.04
Development of an Artificial Intelligence (AI) Based Image Processing Tool to Detect Microstructural Variations in AM Ti-6Al-4V
Rohan Casukhela1,Sriram Vijayan1,Meiyue Shao1,Matthew Jacobsen2,Joerg Jinschek1
The Ohio State University1,Air Force Research Laboratory2Show Abstract
Recent research progress has already indicated the promising potential of metal additive manufacturing (AM), e.g. when producing Ti-6Al-4V (Ti64) parts for aerospace applications. The AM process is inherently non-equilibrium in nature, caused by rapid thermal cycling and consequent steep thermal gradients, producing parts with complex, anisotropic, and metastable microstructures. As a result, there is a limited understanding of the physical metallurgy of AM, which restricts the qualification of AM parts for use in critical applications. Therefore, in order to rapidly qualify AM Ti64 parts with targeted properties, AM processes must be optimized based on a fundamental understanding of the AM Ti64 processing – microstructure – property (PMP) space. This gap in understanding can be reduced by developing statistical models that assist in mapping the PMP space. These models must be sufficiently flexible to be applicable to a vast variety of datasets obtained from processing, characterization, and testing methods across multiple length scales. However, the low volume of microstructural data obtained from characterization techniques across multiple length scales is often identified as a significant barrier when developing statistical models with low uncertainty.
In the case of AM Ti64, scanning electron microscopy (SEM) based techniques provide valuable information about the variation in α/β microstructure throughout the AM build. High-contrast SEM images require long(er) dwell times, which limits the volume of images acquired in a reasonable amount of time. Here, we also acquired large-area SEM images using low dwell times and utilized a deep-learning network to improve image quality by denoising these low-resolution images. The accuracy of the extracted microstructural data was compared with findings based on high-quality SEM images. Artificial intelligence (AI)-based image processing tools were used to extract microstructural information from these denoised images, which can serve as an input to the regression-based PMP model for AM Ti64. The uncertainty associated with model predictions will be presented. Finally, the significance of the model will be discussed and alternative designs for subsequent experimental iterations to refine the model will be proposed.
 Shao M., Vijayan S., Nandwana P., Jinschek J.R. The effect of beam scan strategies on microstructural variations in Ti-6Al-4V fabricated by electron beam powder bed fusion, Materials and Design 196, 109165 (2020)
4:50 PM - *CT05.09.05
Coupling Machine Learning and Physics-Based Simulations to Accelerate Materials Design
Citrine Informatics1Show Abstract
Machine learning (ML) and physics-based simulations are highly complementary tools in computational materials design. ML models are flexible and computationally cheap, but they require sufficient training data and their accuracy suffers under extrapolation; physics-based simulations embed deep domain knowledge and often have broad applicability, but are computationally expensive. In this talk, we discuss using ML as a "glue" layer for multiscale simulations; automating physics-based simulations with the goal of enabling ML-driven sequential learning; and using uncertainty quantification in ML and simulations to improve both approaches to materials modeling.
5:15 PM - CT05.09.06
Predicting Fracture Stress of Defective Graphene Samples Using Artificial Neural Networks
Nuwan Dewapriya1,2,Nimal Rajapakse1,3,Priyan Dias4,Ronald Miller2
Simon Fraser University1,Carleton University2,Sri Lanka Institute of Information Technology3,University of Moratuwa4Show Abstract
The computational cost associated with atomistic modeling is often a bottleneck in the design and characterization of nanomaterials. For example, first-principles methods such as density functional theory can only be used to model a few hundred atoms at a time due to the extremely high computational cost associated with them, and therefore such methods cannot be used to study atomistic processes such as crack propagation. On the other hand, computationally efficient continuum mechanical approaches (e.g., finite element method) are not directly applicable to the nanomechanical problems because such methods do not generally consider surface energy effects and the discrete nature of atomic structures, which are quite significant when the characteristic dimension of a system is a few nanometers. Molecular dynamics (MD) simulations strike a balance between the computational cost and accuracy for nanomaterial modeling. Alternatively, machine learning techniques (e.g., neural networks) provide an opportunity to extract underlying features of data obtained from MD simulations while offering novel insights into the mechanics of nanoscale structures.
In this work, we employed both shallow and deep neural networks to predict the fracture stress of graphene samples containing various numbers and distributions of vacancy defects. Data required to model the neural networks was obtained from MD simulations. First, we developed shallow neural networks to predict the fracture stress of defective graphene samples at various temperatures and vacancy concentrations. Sensitivity analysis was performed to explore the features learned by the neural networks, and their behavior under extrapolation was also investigated. Subsequently, we developed deep convolutional neural networks to predict the fracture stress of graphene samples containing random distributions of vacancy defects. Results reveal that the neural networks have a strong ability to predict the fracture stress of defective graphene under various processing conditions. This work demonstrates some opportunities and challenges in modeling of neural networks using data obtained from MD simulations to solve complex nanomechanical problems.
Acknowledgement: This work was supported by the Natural Sciences and Engineering Research Council of Canada.
5:30 PM - CT05.09.07
Explaining Neural Network Predictions of Material Strength
Terrell Mundhenk1,Ian Palmer2,Brian Gallagher1,Barry Chen1,Gerald Friedland3,Yong Han1
Lawrence Livermore National Laboratory1,Massachusetts Institute of Technology2,University of California, Berkeley3Show Abstract
We recently developed a deep learning method that can determine the critical peak stress of a material by looking at scanning electron microscope (SEM) images of the material’s crystals. However, it has been somewhat unclear what kind of image features the network is keying off of when it makes its prediction. It is common in computer vision to employ an explainable AI saliency map to tell one what parts of an image are important to the network’s decision. One can usually deduce the important features by looking at these salient locations. However, SEM images of crystals are more abstract to the human observer than natural image photographs. As a result, it is not easy to tell what features are important at the locations which are most salient. To solve this, we developed a method that helps us map features from important locations in SEM images to non-abstract textures. We do this as follows. We obtain the most salient locations from the network trained on SEM images using a method called FastCAM. We selected it because it has been empirically shown to produce saliency maps that correspond well to what is actually important in the image. We also run texture images from the Describable Texture Dataset (DTD) through the SEM image trained neural network. For both SEM and DTD images, we extract a feature vector from the most salient location in the activation tensors. This gives us a feature vector which has better definition than if we averaged across layers, which is the most common method. We then find the nearest neighbors between SEM feature vectors and DTD feature vectors. This tells us which textures are most like the salient locations in our SEM images. By correlating over the full set of SEM and DTD feature vectors we then have clear texture trends we can relate to critical peak stress. For instance, a low peak stress correlates strongly with textures described as stratified, braided or bumpy. 
High peak stress correlates with textures such as flecked, dotted or grid-like. We had previously hypothesized that flecked and dotted textures were associated with a higher peak stress. However, the other textures revealed as-yet-unknown processes that might affect material strength. We will discuss the pros and cons of using DTD to describe the textures of materials as they relate to critical peak stress. For instance, some textures in the set should probably be split into subcategories and in general can be ambiguous.
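The nearest-neighbour step described in this abstract, matching a feature vector from a salient SEM location to the closest texture feature vector, can be sketched as follows. The abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and the three-dimensional vectors are made up; real feature vectors would come from the network's activation tensors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot_product = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot_product / (norm_a * norm_b)


def nearest_texture(sem_feature, texture_features):
    """Return the texture label whose feature vector is most similar."""
    return max(texture_features,
               key=lambda name: cosine_similarity(sem_feature,
                                                  texture_features[name]))


# Hypothetical feature vectors for three texture classes and one SEM location.
texture_features = {
    "flecked": [1.0, 0.0, 0.0],
    "braided": [0.0, 1.0, 0.0],
    "bumpy": [0.0, 0.0, 1.0],
}
sem_feature = [0.9, 0.1, 0.2]
print(nearest_texture(sem_feature, texture_features))
```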
CT05.10: Deep Learning and Computer Vision
Tuesday AM, April 20, 2021
8:10 PM - CT05.10.02
Using Deep Learning to Find High Performance Phase-Change Switchable Metasurface Reflectors
Jonathan Thompson1,2,Matthew Mills2
Azimuth Corporation1,Air Force Research Laboratory2Show Abstract
Metasurfaces of 1D gratings and 2D pillar arrays of the phase-change material Ge2Sb2Te5 (GST) are investigated as an active switchable reflector. The large index contrast between the amorphous and crystalline phase-states, in conjunction with the highly tunable metasurface geometry, allows us to selectively engineer an agile device that shows both high reflectance and transmittance for 2 μm wavelengths when switched between states. While the many degrees of freedom that metasurfaces introduce are beneficial for spectral tuning, the large parameter spaces are also problematic insofar as they are often impossible to search exhaustively using traditional simulation methods like rigorous coupled wave analysis (RCWA). Nevertheless, we are able to completely search these large parameter spaces with millions of designs by employing artificial neural networks that have been trained using only a fraction of the total number of possible designs. These spectra-predicting networks (SPN) achieve an accuracy of 98% and a simulation speed up of 105x compared to the RCWA simulations that they were trained on. The exhaustive search using SPNs allows us to find optimal designs, where we discover several GST metasurface designs with ~95% reflectance and ~80% transmittance between phase-states.
8:25 PM - CT05.10.03
Late News: Automatic Characterization of Single-Walled Carbon Nanotube Film Morphologies Using Computer Vision
Phillip Williams1,Nicole Rice1,Benoit Lessard1
University of Ottawa1Show Abstract
When establishing the relationships between processing conditions, film morphologies and performance of Single-Walled Carbon Nanotube (SWNT) based devices, the device characteristics are easily measured and quantified, and the processing conditions are known and tightly controlled. On the other hand, the film morphology is often known either at a qualitative level, or through manual analysis which is both slow and error prone, as well as an inefficient use of a researcher’s time. This gap presents a significant challenge to the construction of next-generation devices, since the current approaches focus on establishing processing condition to device performance relationships without being able to control for the actual morphologies of the SWNT devices which incidentally can vary significantly even at fixed processing conditions.
The use of Artificial Intelligence (AI) and Computer Vision (CV) can augment the current approaches used for analyzing devices by providing detailed metrics on the film morphologies using a software-only approach, meaning that fast, accurate and reproducible analysis can be done at much larger scales and levels of detail than previously feasible. The analysis pipeline uses Atomic Force Microscopy (AFM) height data to calculate metrics that are currently collected manually, such as linear density, and also provides entirely new information not available from any machine or manual technique, such as the positions, orientations, and lengths of each individual carbon nanotube in a device.
There are two main steps to the data analysis procedure. First, a file containing the height of the film at various positions is ingested and segmented into a bitmap which represents the presence or absence of a nanotube at a given position in the film. The second step is the application of various algorithms to the segmentation bitmap to extract relevant metrics. The segmentation process consists of several steps which are designed to identify whether a given point in a film belongs to a nanotube or not. This is accomplished by using adaptive thresholding techniques such as Otsu's method or Yen thresholding, as well as more traditional techniques such as edge detection, image filters and data preprocessing. The final output of this segmentation process is a matrix of true and false values, indicating whether a given pixel belongs to a nanotube (true) or to the substrate that the nanotube was deposited on (false). Once the segmentation is complete, two solvers are applied to the data. The first is an algorithm that iterates through each row and column of the segmentation bitmap and detects how many nanotubes intersect with that row or column. This allows us to compute the linear density of the film at every point in the device, providing an obvious advantage over the current practice of graduate students hand-counting the intersections at a few positions in the AFM image. The other solver is a Genetic Algorithm (GA) based solver which attempts to find a set of nanotube parameters (orientations, positions and sizes) which would produce the same film morphology as observed. Once the parameters of each individual nanotube are extracted, secondary processing can extract metrics such as the distribution of sizes and orientations of the nanotubes, to give a clear set of quantifiable observations describing the film morphology.
Using our analysis methodology, we can automate characterization traditionally done manually, as well as produce entirely novel insights into the film morphology of SWNT devices in a matter of minutes or hours without the need for specialized equipment or multi-million dollar hardware.
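The two-step procedure described in this abstract (threshold the AFM height map into a bitmap, then count nanotube crossings along each scan line) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the tiny height map, the histogram bin count and the pixel size are all assumptions made for illustration, and touching nanotubes are naively counted as a single crossing.

```python
def otsu_threshold(values, bins=64):
    """Pick the histogram split that maximizes between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    best_t, best_var = lo, -1.0
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, lo + (i + 1) * width
    return best_t


def segment(heights):
    """Return a bitmap: True where the height exceeds the Otsu threshold."""
    t = otsu_threshold([h for row in heights for h in row])
    return [[h > t for h in row] for row in heights]


def count_runs(line):
    """Count contiguous runs of True pixels, i.e. crossings along one scan line."""
    runs, prev = 0, False
    for px in line:
        if px and not prev:
            runs += 1
        prev = px
    return runs


def linear_density(mask, pixel_size_um):
    """Crossings per micrometre for every row of the bitmap."""
    return [count_runs(row) / (len(row) * pixel_size_um) for row in mask]


# Hypothetical 3 x 8 height map in nm (illustrative numbers, not measured data).
heights = [
    [0.1, 2.0, 0.0, 0.0, 1.9, 2.1, 0.2, 0.0],
    [0.0, 2.1, 0.1, 0.0, 0.0, 2.0, 0.1, 0.0],
    [0.2, 1.8, 0.0, 0.1, 0.0, 1.9, 0.0, 0.1],
]
mask = segment(heights)
density = linear_density(mask, pixel_size_um=0.5)  # assumed pixel size
```

Applying the same `count_runs` to the columns of `mask` would give the column-wise counts that the abstract also mentions.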
8:40 PM - CT05.10.04
Image Deconvolution and Resolution Enhancement in Scanning Probe Microscopy Using Deep Learning
Lalith Krishna Samanth Bonagiri1,Harry Feldman1,Yingjie Zhang1
University of Illinois at Urbana-Champaign1
Machine learning has been shown to substantially improve existing imaging and characterization techniques in the field of materials science. Currently, researchers in this area are using deep learning for classification and segmentation of images and data for pattern recognition, thus enabling the automation of imaging techniques, particularly scanning probe microscopy (SPM). However, none of these well-developed deep learning techniques has focused on removing the tip convolution effect in SPM, which would truly help in enhancing the spatial resolution of the resulting images. Tip convolution in SPM is a well-known phenomenon. To model this process mathematically, set theory and mathematical morphological operators such as dilation and erosion have been used previously by many researchers. However, it is quite cumbersome to select the size of the structuring element and the morphological operation to obtain the tip shape and an accurate ground truth image. Apart from that, there are no existing learning-based algorithms, which are needed because the tip evolves with time and so does the artifact. Herein, we report a deep learning framework to remove the tip effect through a feature-pyramid-based neural network. We use nanoparticles of various shapes as model systems to perform AFM experiments and deconvolute tip-shape effects, and are able to successfully reconstruct the actual nanoparticle shape using the neural network. This work can be generalized to all scanning probe microscopy techniques and can enable image deconvolution in real-time during experimental measurements.
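The morphological picture of tip convolution mentioned in this abstract can be made concrete with a one-dimensional sketch of the classical approach (not the deep learning framework the authors report). It assumes a known tip profile `tip[u]`, the height of the tip surface above its apex at lateral offset `u`; the function names and the parabolic tip are illustrative assumptions. The image is a grayscale dilation of the surface by the tip, and eroding the image with the same tip gives an upper bound on the true surface.

```python
def dilate(surface, tip):
    """Simulated image: the apex height at x is max over u of surface[x+u] - tip[u]."""
    n = len(surface)
    image = []
    for x in range(n):
        image.append(max(surface[x + u] - t for u, t in tip.items() if 0 <= x + u < n))
    return image


def erode(image, tip):
    """Reconstruction: min over u of image[x-u] + tip[u], an upper bound on the surface."""
    n = len(image)
    recon = []
    for x in range(n):
        recon.append(min(image[x - u] + t for u, t in tip.items() if 0 <= x - u < n))
    return recon


# Hypothetical parabolic tip (tip[0] = 0 at the apex) and a single sharp spike.
tip = {u: u * u for u in range(-2, 3)}
surface = [0, 0, 0, 0, 5, 0, 0, 0, 0]

image = dilate(surface, tip)   # the spike is broadened into the inverted tip shape
recon = erode(image, tip)      # tighter than the image, but still not the true spike
```

The residual broadening in `recon` is the part of the surface the tip could never touch, which is one reason learned deconvolution is attractive.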
9:10 PM - CT05.10.05
Rapid and Flexible Classification of Scanning Transmission Electron Microscopy Data Using Few Shot Learning
Sarah Akers1,Elizabeth Kautz1,Bethany Matthews1,Le Wang1,Yingge Du1,Steven Spurgeon1
Pacific Northwest National Laboratory1
Control of property-defining materials defects for quantum computing and energy storage depends on the ability to precisely probe structure and chemistry at the highest spatial and temporal resolutions. Modern scanning transmission electron microscopy (STEM) is well-suited to this task, having yielded rich insights into defect populations in many systems. However, the dilute nature and complexity of materials defects, coupled with their varied representations in STEM data, make reliable, accurate, high-throughput statistical defect analysis a significant challenge. Possible analysis approaches include low-level pixel processing, or even the application of machine learning methods for classification and image segmentation. However, the latter requires large sets of labeled training data that are difficult to obtain for many practical materials science studies. Here, we describe the use of an emerging few shot learning capability for rapid and flexible STEM data classification. This approach requires minimal information at the start of the analysis and uses a generally pre-trained encoder network to make inferences on experimental data. Our results show drastic improvements in data annotation costs, reproducibility, and scalability in comparison to neural network training from scratch. We demonstrate how few shot techniques can quickly extract feature maps and global statistics from a variety of STEM data, enabling a new quantitative understanding of defect populations.
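The abstract does not say which few shot algorithm is used, so the following is only a sketch of one common variant, nearest-prototype classification, under stated assumptions: the embedding vectors stand in for the output of a pre-trained encoder, and the class names and numbers are invented for illustration.

```python
import math


def class_prototypes(support):
    """Average the embeddings of the few labeled examples available for each class."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return protos


def classify(embedding, protos):
    """Assign the label whose prototype is closest in Euclidean distance."""
    return min(protos, key=lambda label: math.dist(embedding, protos[label]))


# Two labeled image patches per class (hypothetical 2-D encoder outputs).
support = {
    "interface": [[0.9, 0.1], [1.1, -0.1]],
    "bulk": [[-1.0, 0.0], [-0.8, 0.2]],
}
protos = class_prototypes(support)
label = classify([0.7, 0.0], protos)
```

Because only the prototypes depend on the labeled examples, adding a new class requires a handful of annotated patches rather than retraining the encoder.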
9:25 PM - CT05.10.06
Machine Learning to Reveal Nanoparticle Dynamics from Liquid-Phase TEM Videos
Lehan Yao1,Zihao Ou1,Binbin Luo1,Cong Xu1,Qian Chen1
University of Illinois at Urbana-Champaign1
Transmission electron microscopy (TEM) has been applied to nanomaterials characterization for decades, and with the recent development of liquid-phase TEM, the nanoscale dynamics including nanoparticle diffusion and superlattice crystallization can also be revealed. However, the quantitative parameter extraction directly from image/video data is still prone to challenges such as high noise and inhomogeneous background. Conventional manual measurement produces better accuracy but suffers from extremely low efficiency, while computer algorithm-assisted segmentation is fast but is tricky in terms of parameterization. Here, we integrate the computer-based fast analysis together with the smart vision of human beings by applying machine learning algorithms. The talk will be focused on automated image analysis of conventional TEM and nanoscale dynamics study from liquid-phase TEM, both enabled by machine learning. In our customized workflow, we develop methods to simulate TEM images, serving as the training dataset for the U-Net neural network. We then apply the trained neural network to liquid-phase TEM videos of different colloidal nanoparticle systems, revealing a diversity of properties including their diffusion and interaction, reaction kinetics, and assembly dynamics. We expect our framework to push the potency of TEM to its full quantitative level in a high-throughput and statistically significant fashion.
9:30 PM - CT05.10.07
Deep Learning for Super-Resolved Atomistic Predictions from Atom Probe Tomography
Aditi Sonal1,Jith Sarker1,Baishakhi Mazumder1,Kristofer Reyes1
University at Buffalo, The State University of New York1
Atomistically resolved techniques such as Atom Probe Tomography (APT) and Molecular Dynamics simulations provide rich, complex and high-dimensional data. Often, however, such information is coarse-grained for ease of human-based analysis. Such coarse-graining results in loss of information present in the raw data. In this talk, we propose an alternative methodology using deep residual convolutional neural networks. We show how using such models, we can identify spatially-local, atomistically-resolved characterization of material structure. As an example, we apply the technique to identifying crystal phase in AlxGa1-xO3 material characterized by APT. We demonstrate a procedure for obtaining super-resolved identification of crystal structure to attenuate classification results in the face of noisy and incomplete data typical of APT. We also demonstrate how the use of synthetic atomistic simulations of crystal structure can assist in this atomistically-resolved structural characterization task, allowing the use of such DL-based techniques in situations where the presence of specific structures is ambiguous.
9:45 PM - CT05.10.08
Late News: Advances in Image Driven Machine Learning for Microstructure Recognition and Characterization
Arun Baskaran1,Elizabeth Kautz2,Wufei Ma1,Aritra Chowdhury3,Bulent Yener1,Daniel Lewis1
Rensselaer Polytechnic Institute1,Pacific Northwest National Laboratory2,GE Global Research3
Machine learning (ML) or artificial intelligence (AI)-enabled materials design and discovery has recently emerged as a new paradigm in materials science. An important sub-field within this broad area is image-driven machine learning (IDML), which has supplemented material characterization techniques beyond what was possible with conventional stereography and image analysis methods. Microstructure characterization is important for material design as it enables the development of processing-structure-property relationships that are critical to several research areas such as alloy design, assessment of corrosion resistance, failure analysis, etc. In this presentation, the state of the art of IDML for materials characterization is discussed using a set of functional modules, defined such that each module performs a specific role in the IDML workflow. Such an overview permits answering granular questions about the field such as the impact of IDML at different spatial length scales, the diversity of machine learning models adopted in the field, etc. One of the emerging techniques to have been adopted in the field is the use of generative models for constructing synthetic and novel microstructures and for data augmentation. The results from applying a progressive growing generative adversarial network (pg-GAN) to generate a dataset of synthetic microstructures of a binary U-Mo alloy are detailed. The quality of the synthetic microstructures is evaluated by comparing these images and the real images in Fourier space, and investigating the presence of correlated background noise. Finally, the work in progress on using a transfer learning strategy on GANs to generate microstructures is reported, and the effect of this strategy on the microstructure quality, convergence time, and the size of the training dataset is discussed.
10:00 PM - CT05.10.09
Leveraging Uncertainty from Deep Learning for Trustworthy Materials Discovery Workflows
Jize Zhang1,Bhavya Kailkhura1,Yong Han1
Lawrence Livermore National Laboratory1
We are witnessing a significant growth of works that integrate deep learning into material application workflows. In this context, the uncertainty (or confidence) associated with the prediction from a deep neural net is of utmost importance, because it can be further leveraged to aid decision making in cost-effective material discovery and synthesis processes. Here, we address several common challenging questions that materials scientists might encounter in such material application workflows, by leveraging the deep neural network’s predictive uncertainty information. We first show that the uncertainty information enables the user to determine the necessary amount of training data for the deep neural net to achieve the desired accuracy level. Next, we present a framework that guides the deep learning model to avoid making predictions on confusing samples based on the uncertainty information. Finally, we show that the uncertainty is also helpful in detecting out-of-distribution data. Specifically, we find that an uncertainty-based out-of-distribution (OOD) detection scheme is already accurate enough to identify a wide range of real-world shifts in data, e.g., changes in the image acquisition conditions or changes in the synthesis conditions. We demonstrate the effectiveness of the proposed approach on a real-world example, where we classify molecular crystals produced under various synthesis/processing conditions from their microstructure information (scanning electron microscope images) using deep neural networks.
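The idea of declining to predict on confusing samples can be sketched as follows. The abstract does not specify the uncertainty measure, so this example assumes the Shannon entropy of the softmax output and an arbitrary cut-off; the function names and probability vectors are hypothetical.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a predicted class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def predict_or_abstain(probs, max_entropy):
    """Return the most likely class index, or None when the prediction is too uncertain."""
    if predictive_entropy(probs) > max_entropy:
        return None
    return max(range(len(probs)), key=probs.__getitem__)


confident = predict_or_abstain([0.97, 0.02, 0.01], max_entropy=0.5)   # class 0
ambiguous = predict_or_abstain([0.40, 0.35, 0.25], max_entropy=0.5)   # abstains
```

The same score could serve as a simple out-of-distribution flag, although in practice the cut-off would have to be calibrated on held-out data.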
CT05.11: Applications III
Tuesday AM, April 20, 2021
8:00 AM - *CT05.11.01
Machine Learning Aided Discovery of Patterns in Crystal Chemistry
University at Buffalo, The State University of New York1
In this presentation we provide an overview of how we can transform the “genomics” paradigm in materials discovery to a “connectomics” approach to materials design. We provide examples of our new approach of mapping diverse information and discovering connections between structure and functionality in materials chemistry, by harnessing machine learning methods that utilize both the statistical and topological aspects of data. This can help to aid targeted materials discovery based on discovering the best pathways for future discoveries. We show how this “connectomics” approach can uncover hidden patterns associated with structure-property relationships in complex crystal chemistries.
8:25 AM - CT05.11.02
Discovering Relationships Between OSDAs and Zeolites Through Data Mining and Generative Neural Networks
Zachary Jensen1,Soonhyoung Kwon2,Daniel Schwalbe-Koda1,Rafael Gomez-Bombarelli1,Yuriy Roman-Leshkov2,Manuel Moliner3,Elsa Olivetti1
Massachusetts Institute of Technology1,Masdar Institute of Science and Technology2,Universitat Politècnica de València3
Zeolites are crystalline, microporous materials extensively used in various industrial applications including catalysis, water decontamination, and NOx abatement. Many of these zeolites are synthesized using an organic structure directing agent to guide towards that specific zeolite structure. Several strategies exist for finding suitable OSDAs for different zeolites including domain-specific heuristics, molecular dynamics simulations, and transition state mimicking. However, predicting new OSDAs for both existing and novel zeolite structures remains inexact and very challenging. To advance the goal of understanding interactions between OSDAs and zeolites, we take a data-driven approach that has been largely missing from the current state of research. We use natural language processing and text mining techniques to extract an exhaustive data set of known pairs of OSDAs and zeolites from the scientific literature. Next, we mine this data to elaborate on trends between the three-dimensional structure of the OSDA and the zeolite. Specifically, we examine several small-cage zeolites such as CHA, LTA, and AEI which exhibit strong correlations with OSDAs and have significant industrial applications. Finally, we develop a generative neural network trained on the literature data to suggest OSDA molecules for specific zeolites. We then use this model to examine several interesting zeolite systems including CHA and SFW to suggest potential replacement OSDAs.
8:40 AM - CT05.11.03
Graph-Based Deep Learning for Designing Stable Interfaces for Solid-State Batteries
Shubham Pandey1,Vladan Stevanovic1,2,Peter St. John2,Prashun Gorai1,2
Colorado School of Mines1,National Renewable Energy Laboratory2
Solid-electrolytes (SEs) offer numerous advantages over their liquid-state counterparts in solid-state batteries, including increased safety, wider operating temperatures, and higher energy densities. However, commonly used SEs and electrodes are thermodynamically unstable at the solid-solid interfaces. Decomposition phases forming at the interfaces may be ionically resistive or electronically conducting, which is undesirable for battery design. There is a pressing need to search for new SEs and cathodes that will form stable metal anode-electrolyte and electrolyte-cathode interfaces. Thermodynamic stability of solid interfaces can be assessed through a convex hull construction based on the formation enthalpies of competing phases, which can be calculated from DFT total energies. To allow fast prediction of stability, in this work, we built crystal graph convolutional neural networks (CGCNNs) as a surrogate for predicting DFT total energies. Current implementations of CGCNN models for predicting total energy/formation enthalpy of inorganic materials are inaccurate for high-energy hypothetical structures. A model that is accurate for total energy predictions of not only ground-state (GS) but also high-energy hypothetical structures is desired when considering new materials for designing stable solid interfaces. To this end, we trained CGCNN models on DFT total energies from the NREL Materials Database, which contains GS or near-GS structures, and on a unique dataset containing ~10,000 hypothetical structures, including high-energy structures. The trained models achieve mean absolute errors of ~40 meV/atom for both GS and high-energy structures. These models will enable materials discovery efforts for designing stable interfaces in solid-state batteries as well as for other functional applications.
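The convex hull construction mentioned in this abstract can be illustrated for the simplest case of a binary A-B system. The sketch below is an assumption-laden toy, not the authors' workflow: the formation energies are invented, compositions are expressed as the fraction of B, and the function names are hypothetical. A phase on the lower hull is stable; the distance above the hull measures the driving force for decomposition.

```python
def lower_hull(points):
    """Lower convex hull of (fraction of B, formation energy per atom) points."""
    best = {}
    for x, e in points:                      # keep the lowest energy per composition
        best[x] = min(e, best.get(x, e))
    hull = []
    for p in sorted(best.items()):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:                   # last hull point lies on or above the new chord
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def energy_above_hull(x, e, hull):
    """Energy of a phase relative to the hull at the same composition."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return e - (y1 + (y2 - y1) * (x - x1) / (x2 - x1))
    raise ValueError("composition outside the range spanned by the hull")


# Hypothetical formation energies in eV/atom; the elements A and B are the end points.
phases = [(0.0, 0.0), (0.25, -0.1), (0.5, -0.4), (0.75, -0.3), (1.0, 0.0)]
hull = lower_hull(phases)
e_hull_a3b = energy_above_hull(0.25, -0.1, hull)   # unstable against A + AB
```

In a surrogate-model setting, the energies fed to `lower_hull` would come from the trained network instead of from DFT, which is why accuracy for high-energy structures matters.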
8:55 AM - CT05.11.04
Machine Learning Stability Rules for Complex Ionic Compounds and Its Application in the Discovery of New NASICON Materials
Bin Ouyang1,Jingyang Wang1,Tanjin He1,Christopher Bartel1,Haoyan Huo1,Yan Wang2,Valentina Lacivita2,Haegyeom Kim3,Gerbrand Ceder3
University of California, Berkeley1,Samsung Research America2,Lawrence Berkeley National Laboratory3
Nowadays, various tools are readily available to perform high-throughput computational discovery of new materials. The next grand challenge is to bridge the gap between computational prediction and experimental accessibility. To fulfill this goal, elemental compatibility is the first thing to be understood. In the field of ionic solids, established stability rules such as the Goldschmidt tolerance factor and Pauling’s rules work reasonably well for compounds that have relatively simple compositions and bond topologies, such as perovskites. However, when it comes to complex ionic compounds, the combinatorial space becomes very large and a handy stability model is mostly absent. To establish stability rules for complex ionic solids with the aid of machine learning tools, we take the NASICON-type materials as a prototype system. The complexity of NASICONs is a challenge for ML as these compounds have four types of cation sites and are able to have both cation and polyanion mixing. In this work, we developed and applied a suite of high-throughput computation, physical interpretation, and machine learning tools to reveal the stability rules of NASICONs across all compositional and elemental space. We have decoupled the stability origin of NASICON materials into bond compatibility and site miscibility, and present a two-dimensional “tolerance factor” that can be used to estimate the synthetic accessibility of NASICON materials with only basic elemental information. With the established stability principle, we have successfully synthesized more than ten new NASICONs in unexplored chemical spaces. We hope our work can provide insights in bridging the state-of-the-art machine-learning techniques and data-mining tools to accelerate the discovery of complex ionic compounds.
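For reference, the classical one-dimensional Goldschmidt tolerance factor for an ABX3 perovskite, which the abstract contrasts with its two-dimensional NASICON descriptor, is t = (r_A + r_X) / (sqrt(2) (r_B + r_X)). The sketch below evaluates it for SrTiO3 using approximate Shannon ionic radii; the radii are quoted from memory as illustrative inputs and should be checked against a tabulated source before any real use.

```python
import math


def goldschmidt_tolerance(r_a, r_b, r_x):
    """Goldschmidt tolerance factor t for an ABX3 perovskite (radii in the same unit)."""
    return (r_a + r_x) / (math.sqrt(2.0) * (r_b + r_x))


# Approximate ionic radii in angstrom (assumed values): Sr2+ (12-coordinate),
# Ti4+ (6-coordinate), O2- (6-coordinate).
t_srtio3 = goldschmidt_tolerance(r_a=1.44, r_b=0.605, r_x=1.40)
```

Values of t close to 1 are associated with the ideal cubic perovskite structure, which is consistent with SrTiO3 being a textbook cubic perovskite.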
9:10 AM - *CT05.11.05
Combining Machine Learning and Multiscale Modeling for Accelerated Battery Manufacturing Optimization
Université de Picardie Jules Verne1,Réseau du Stockage Electrochimique de l'Energie (RS2E)2,Institut Universitaire de France3
Lithium ion batteries (LIBs) are playing a crucial role in the ongoing energy transition, in particular through the renewed emergence of electric vehicles. However, accelerating climate change requires us to develop innovative approaches to speed up the optimization of LIBs. In this lecture I will present an innovative hybrid computational approach, combining machine learning (ML) and multi-scale modeling (MSM), that allows us to predict the impact of manufacturing parameters on LIB electrode properties. Manufacturing parameters include electrode slurry composition and solid-to-liquid ratio, coating speed, slurry drying temperature, and calendering pressure, temperature and roll speed. Resulting electrode properties include mesostructure information (spatial organization of active and inactive material particles, particle percolation, tortuosity factors, porosity, etc.) and electrochemical performance indicators (overpotentials and specific capacities upon galvanostatic discharge and charge). The ML techniques are used for predictive classification and regression, based on in-house experimental databases, and for the acceleration of the parameterization of the physics-based models within the MSM workflow. The overall approach is developed in the context of the ARTISTIC project and allows performing both direct and inverse design of the optimal manufacturing conditions maximizing given electrode descriptors such as energy and power density. Concrete demonstration examples will be provided on the basis of LIB electrodes made of graphite and Nickel-Manganese-Cobalt active materials, illustrating the strong capabilities of the approach to accelerate the optimization of LIB manufacturing processes.
 ERC Consolidator Project ARTISTIC (Advanced and Reusable Theory for the In Silico-optimization of composite electrode fabrication processes for rechargeable battery Technologies with Innovative Chemistries) (https://www.u-picardie.fr/erc-artistic/).
 Ngandjong, A.C., Rucci, A., Maiza M., Shukla, G., Vazquez-Arenas J., Franco, A.A., J. Phys. Chem. Lett., 8 (23) (2017) 5966.
 Lombardo, T., Hoock, J. B., Primo, E., Ngandjong, A. C., Duquesnoy, M., & Franco, A. A. (2020). Accelerated Optimization Methods for Force Field Parametrization in Battery Electrode Manufacturing Modeling. Batteries & Supercaps. https://doi.org/10.1002/batt.202000049
 Rucci, A., Ngandjong, A. C., Primo, E. N., Maiza, M., & Franco, A. A. (2019). Tracking variabilities in the simulation of Lithium Ion Battery electrode fabrication and its impact on electrochemical performance. Electrochimica Acta, 312, 168-178.
 Chouchane, M., Rucci, A., Lombardo, T., Ngandjong, A. C., & Franco, A. A. (2019). Lithium ion battery electrodes predicted from manufacturing simulations: Assessing the impact of the carbon-binder spatial location on the electrochemical performance. Journal of Power Sources, 444, 227285.
 Shodiev, A., Primo, E. N., Chouchane, M., Lombardo, T., Ngandjong, A. C., Rucci, A., & Franco, A. A. (2020). 4D-resolved physical model for Electrochemical Impedance Spectroscopy of Li (Ni1-x-yMnxCoy)O2-based cathodes in symmetric cells: Consequences in tortuosity calculations. Journal of Power Sources, 227871.
 Cunha, R. P., Lombardo, T., Primo, E. N., & Franco, A. A. (2020). Artificial Intelligence Investigation of NMC Cathode Manufacturing Parameters Interdependencies. Batteries & Supercaps, 3(1), 60-67.
 Duquesnoy, M., Lombardo, T., Chouchane, M., Primo, E., & Franco, A. A. (2020). Data-driven assessment of electrode calendering process by combining experimental results, in silico mesostructures generation and machine learning. Journal of Power Sources, in press.
9:35 AM - CT05.11.06
Calibration of Thermal Spray Microstructure Simulations to Experimental Data Using Bayesian Optimization
David Montes de Oca Zapiain1,Theron Rodgers1,Dan Bolintineanu1,Carianne Martinez1,Aaron Olson1,Nathan Moore1
Sandia National Laboratories1
Thermal spray deposition is able to generate thick coatings of metals, ceramics and composite materials. This process is inherently stochastic and thus yields coatings that exhibit hierarchically complex internal structures that affect the overall properties of the coating. While rules-based simulations are able to model the coating process, their accuracy and efficacy are governed by the set of pre-defined rules, which are calibrated to specific material and processing conditions to accurately model particle spreading upon deposition. Nevertheless, not all parameters can be calibrated to experimental results. We present a protocol that automatically and efficiently determines the parameters that yield the synthetic microstructure with the closest statistics to the experimentally observed coating. This protocol starts by robustly quantifying the microstructure using 2-point statistics and then representing the statistical quantification in a low-dimensional space using Principal Component Analysis. Subsequently, our protocol leverages a combination of Gaussian Process Regression and Bayesian Optimization to determine the parameters that yield the minimum distance between the synthetic microstructure and the experimental coating in this low-dimensional space in an accurate and computationally efficient manner.
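The first step of the protocol, quantifying a microstructure with 2-point statistics, can be sketched for a small binary image. This brute-force version assumes periodic boundaries and a single phase of interest; the function name and the striped test pattern are illustrative, and production codes would typically use FFTs instead of the explicit double loop.

```python
def two_point_autocorrelation(micro):
    """Probability that two pixels separated by (dy, dx) both belong to phase 1.

    `micro` is a 2-D list of 0/1 phase indicators; boundaries are treated as periodic.
    """
    h, w = len(micro), len(micro[0])
    stats = [[0.0] * w for _ in range(h)]
    for dy in range(h):
        for dx in range(w):
            hits = sum(
                micro[i][j] * micro[(i + dy) % h][(j + dx) % w]
                for i in range(h)
                for j in range(w)
            )
            stats[dy][dx] = hits / (h * w)
    return stats


# Hypothetical 4 x 4 microstructure made of vertical stripes of phase 1.
stripes = [[1, 0, 1, 0] for _ in range(4)]
stats = two_point_autocorrelation(stripes)
```

The zero-shift entry `stats[0][0]` is the volume fraction of phase 1, and the alternating values along `dx` encode the stripe spacing. Flattened maps of this kind are what a dimensionality reduction such as Principal Component Analysis would then compress.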
This work was supported by the Laboratory Directed Research and Development program at Sandia National Laboratories. Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy National Nuclear Security Administration under contract DE-NA0003525. The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the United States Government. Sand no. SAND2020-11942 A
CT05.12: Data-Driven Chemistry II
Tuesday PM, April 20, 2021
11:45 AM - *CT05.12.01
Machine-Learning the Structural Stability of Intermetallic Phases with Domain Knowledge of the Interatomic Bond
Ruhr University Bochum1
The performance of machine-learning depends critically on the quality of the descriptors. In the case of learning atomic-scale properties, like formation energies obtained from density-functional theory (DFT) calculations, the descriptors are typically based on measures of the atomistic geometry and the distribution of chemical elements. Here, we construct descriptors that additionally include domain knowledge of the interatomic bond from a hierarchy of coarse-grained electronic-structure methods. In particular, we use tight-binding (TB) and analytic bond-order potentials (BOPs) that are derived from a second-order expansion of DFT. We demonstrate that a recursive solution of the TB problem and the closely related moments of the electronic density-of-states at the BOP level establish a smooth relation between the local atomic environment and the formation energy. This first level of domain knowledge of the interatomic bond shows high predictive power in machine-learning applications already with simple, qualitative TB/BOP models. We demonstrate this for the prediction of the formation energy and the band gaps of transparent conductors. As a second level of domain knowledge we include the bond chemistry in terms of bond-specific TB Hamiltonians that are obtained from downfolding the DFT eigenspectrum of molecular dimers. As a third level of domain knowledge we include the role of the valence electrons by determining approximate non-selfconsistent bond energies with BOP using bond-specific TB Hamiltonians. We demonstrate the application of these descriptors of the second and third level to the prediction of the formation energy of intermetallic phases and discuss their high relative importance as compared to other descriptors.
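The link between the local atomic environment and the moments of the density of states rests on the moments theorem: the n-th moment of the local density of states on atom i equals the diagonal element (H^n)_ii of the TB Hamiltonian, i.e. a sum over closed hopping paths of length n that start and end on atom i. The sketch below evaluates this for a toy model chosen purely for illustration (a ring of eight atoms, one orbital each, nearest-neighbour hopping of -1 and zero on-site energy); it is not the descriptor construction of the abstract.

```python
def matmul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def local_moments(hamiltonian, site, n_max):
    """Moments mu_0 .. mu_n_max of the local density of states: mu_n = (H^n)[site][site]."""
    n = len(hamiltonian)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # H^0
    moments = []
    for _ in range(n_max + 1):
        moments.append(power[site][site])
        power = matmul(power, hamiltonian)
    return moments


def ring_hamiltonian(n_sites, hopping=-1.0):
    """Nearest-neighbour tight-binding Hamiltonian of a ring (illustrative toy model)."""
    h = [[0.0] * n_sites for _ in range(n_sites)]
    for i in range(n_sites):
        j = (i + 1) % n_sites
        h[i][j] = h[j][i] = hopping
    return h


moments = local_moments(ring_hamiltonian(8), site=0, n_max=4)
```

Here the second moment counts the two nearest neighbours and the fourth moment counts the six closed four-step paths, so changing the coordination of the atom changes the moments smoothly, which is the property that makes them useful as descriptors.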
12:10 PM - CT05.12.02
Late News: Machine Learning Potentials for Copper Alloys
Angel Diaz Carral1,Xiang Xu2,Azade Yazdan Yar1,Siegfried Schmauder2,Maria Fyta1
Institute for Computational Physics, University of Stuttgart1,Institut für Materialprüfung, Werkstoffkunde und Festigkeitslehre (IMWF)2
Copper-based alloys, due to their high electrical conductivity and high strength, are of great importance for electric and electronic applications such as connectors or lead frames. We investigate Cu-Ni-Si-Cr alloys with different types and concentrations of impurities through computational means. Our simulations provide us with structural data and energetics, which are then used to train Machine Learning (ML) algorithms to build ML potentials. These will in turn be used in larger scale simulations to design alloys with desired properties. We first provide a methodological approach based on the comparison of different ML descriptors in order to better understand and model the Cu alloys. We discuss the impact of this approach in providing a framework for an optimum design of multicomponent materials, such as alloys.
12:25 PM - CT05.12.03
Late News: Investigating Representations of Local Atomic Environments with Topology Optimization
Arindam Debnath1,Wesley Reinhart1
The Pennsylvania State University1
Advances in machine learning (ML) continue to accelerate materials discovery, but key challenges remain in knowledge representation for local atomic environments. State-of-the-art predictive performance can be obtained with many different types of features, ranging from hand-crafted and human-interpretable features in the case of models like decision trees to learned features only meaningful to a computer in the case of approaches based on deep neural networks. Hand-crafted features informed by fundamental physics and human intuition often require less training to achieve robust performance, while purely learned features may provide less bias and greater predictive power since the model is free to identify the most salient descriptions of the data. In fact, many models fall in between these two extremes; a prevalent example is the class of rotation-invariant features that have been utilized to learn force fields for use in classical Molecular Dynamics (MD) simulations from ab initio calculations. While model development for these applications typically focuses on predictive performance, we are interested in the information that can be gleaned from the low-dimensional latent spaces which are learned as a by-product of these ML workflows.
We have recently developed a new representation of local atomic environments which is highly effective when combined with unsupervised manifold learning to embed the local environments of particles observed in classical MD simulations into a low-dimensional latent space. Our unsupervised scheme provides comparable performance to recent supervised methods while requiring only a fraction of the training data. However, this comes at a cost – our representation cannot be directly inverted to sample real-space structures from the latent space. Here we introduce a topology optimization scheme to generate ensembles of local atomic environments from the low-dimensional latent space. Representative samples of our rotation- and permutation-invariant features are drawn from the latent space in regions of interest such as metastable transition states. Then backpropagation is applied to optimize the real-space positions of atoms in the local environment to match the target features. This is similar in spirit to reverse Monte Carlo, except faster convergence can be achieved using gradient descent.
While less efficient for the inverse problem than deterministic, deep-neural-network-based approaches, our sampling scheme incidentally allows us to study the efficacy of different representations of local atomic environment. We investigate different choices of rotation- and permutation-invariant features by attempting to invert the environments observed in colloidal crystallization, ice nucleation, and binary mesophases back to real-space configurations. The results provide valuable information regarding advantages and limitations for different types of features, which should help select better features for predictive models in the future.
12:40 PM - CT05.12.04
Late News: Machine Learning Prediction of the Hubbard U for Materials Containing Transition Metals
Casey Brock1,Anand Chandrasekaran1,Yuling An1,Shaun Kwak1,Mathew Halls1
Schrödinger, Inc.1
Standard approximations to the exchange-correlation functional for density functional theory (DFT) calculations tend to over-delocalize d and f electrons, causing critical systematic errors in predictions of material properties. A standard approach to correcting these errors is applying a Hubbard potential via an additional parameter, U. Often, the value of U is chosen empirically to reproduce measured values of the band gap or other properties. Alternatively, the Hubbard U value can be calculated explicitly using linear response or perturbation theory approaches, but these calculations are tedious and computationally expensive. In this work, we outline a machine learning (ML) framework for predicting Hubbard U values for inorganic solids containing a single transition metal element. For this purpose, we generate a training set of over 200 periodic structures and the corresponding U values for the constituent transition metal sites based on density functional perturbation theory calculations. We then utilize descriptors for inorganic periodic systems to build ML models to predict Hubbard U values for novel structures outside the training set. The most important descriptors for predicting the Hubbard U potential for the examined materials systems are highlighted, with an analysis of outstanding trends and variations of its value across the d-block elements. Finally, we investigate the sensitivity of critical materials properties, such as the formation energy and band gap, to variations in the Hubbard U correction.
12:55 PM - CT05.12.05
Automated Training of Many-Body Machine Learned Force Fields
Jonathan Vandermause1,Boris Kozinsky1
Harvard University1
Machine learned (ML) force fields have emerged as a powerful tool for performing large-scale molecular dynamics simulations at near-DFT accuracy, but training many-body force fields that are interpretable, efficient, and uncertainty-aware remains an important open challenge. In this talk, we present Bayesian force fields that unite three popular frameworks—the Atomic Cluster Expansion (ACE), Gaussian Approximation Potentials (GAP), and Spectral Neighbor Analysis Potentials (SNAP)—opening the door to scalable, uncertainty-aware molecular dynamics simulations of complex materials. We first show how a multi-species generalization of the N-body ACE descriptor can be used to define a tunable many-body kernel similar to the Smooth Overlap of Atomic Positions kernel used in GAP models. We use this kernel to construct sparse Gaussian process (GP) force fields, and show that uncertainties derived from the predictive posterior distribution of the GP correlate with true model error on independent test sets.
The reliability of these uncertainties is the key feature of our approach that enables FLARE—Fast Learning of Atomistic Rare Events—an adaptive method for training force fields on the fly during molecular dynamics. This automated learning procedure takes an arbitrary structure as input and begins with a call to DFT, which is used to train an initial GP model on the forces acting on an arbitrarily chosen subset of atoms in the structure. The GP then proposes an MD step by predicting the forces on all atoms, at which point a decision is made about whether to accept the predictions of the GP or to perform a DFT calculation. The decision is based on the epistemic uncertainty of each GP force component prediction, which estimates the error of the prediction due to dissimilarity between the atom’s environment and the local environments stored in the training set of the GP. In particular, if any uncertainty exceeds a chosen multiple of the current noise uncertainty of the model, a call to DFT is made and the training set is augmented with the forces acting on the highest uncertainty local environments, the precise number of which can be tuned to increase training efficiency. All hyperparameters are optimized whenever a local environment and its force components are added to the training set, allowing the error threshold to adapt to novel environments encountered during the simulation. We show that the final trained GP can be mapped onto an equivalent and much faster linear model resembling SNAP and qSNAP models, which we implement in the molecular dynamics program LAMMPS to scale our ML-driven simulations to tens of thousands of atoms over nanosecond timescales.
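The accept-or-recompute decision described above can be sketched on a toy problem. This is not the FLARE implementation: it assumes a one-dimensional input, a fixed squared-exponential kernel with hand-picked hyperparameters (the abstract re-optimizes them after each update), and a cheap analytic function, `reference`, standing in for a DFT call. All names and values are hypothetical.

```python
import numpy as np

LENGTH, SIGNAL, NOISE = 0.5, 1.0, 0.05   # fixed, hand-picked hyperparameters

def rbf(a, b):
    """Squared-exponential kernel between two 1D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return SIGNAL ** 2 * np.exp(-0.5 * (d / LENGTH) ** 2)

def gp_predict(x_train, y_train, x_query):
    """GP posterior mean and epistemic standard deviation (noise excluded)."""
    K = rbf(x_train, x_train) + NOISE ** 2 * np.eye(len(x_train))
    k_star = rbf(x_train, x_query)
    mean = k_star.T @ np.linalg.solve(K, y_train)
    var = SIGNAL ** 2 - np.einsum("ij,ij->j", k_star, np.linalg.solve(K, k_star))
    return mean, np.sqrt(np.maximum(var, 0.0))

def reference(x):
    """Stand-in for the expensive reference calculation (DFT in the abstract)."""
    return np.sin(3.0 * x)

def run_on_the_fly(trajectory, threshold=2.0):
    """Walk along the trajectory; call the reference only when the GP's
    uncertainty exceeds `threshold` times the noise level of the model."""
    x_train = np.array([trajectory[0]])
    y_train = reference(x_train)
    n_calls, predictions = 1, []
    for x in trajectory[1:]:
        xq = np.array([x])
        mean, std = gp_predict(x_train, y_train, xq)
        if std[0] > threshold * NOISE:
            y = reference(xq)                  # "DFT call"
            x_train = np.append(x_train, xq)   # augment the training set
            y_train = np.append(y_train, y)
            n_calls += 1
            predictions.append(y[0])
        else:
            predictions.append(mean[0])        # accept the GP prediction
    return np.array(predictions), n_calls

trajectory = np.linspace(0.0, 3.0, 301)
predictions, n_calls = run_on_the_fly(trajectory)
print(n_calls, "reference calls for", len(trajectory), "steps")
```

The threshold multiple plays the role of the tunable criterion in the text: a larger value accepts more GP predictions and makes fewer reference calls, at the cost of tolerating larger model uncertainty.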
Our automated procedure for training fast Bayesian force fields allows a wide range of material compositions to be explored in a closed-loop fashion with minimal human supervision. As a demonstration, we apply our method to the shape memory alloy nitinol, and show that we accurately capture the martensite/austenite phase transition that occurs in this material at ambient pressure. We discuss how model uncertainties can be used to systematically expand the training set, enabling exploration of the effects of dopants, defects, and nonequiatomic compositions on the martensitic transition temperature.
CT05.13: Automation and High Throughput II
Tuesday PM, April 20, 2021
2:15 PM - *CT05.13.01
Heterogeneous Sensing and Scientific Machine Learning for Quality Assurance in Laser Powder Bed Fusion
Prahalada Rao1,Aniruddha Gaikwad1,Brian Giera2,Gabe Guss2,Jean-Baptiste Forien2,Manyalibo Matthews2
University of Nebraska–Lincoln1,Lawrence Livermore National Laboratory2
Laser Powder Bed Fusion (LPBF) is the predominant metal Additive Manufacturing (AM) technique that benefits from a significant body of academic study and industrial investment. Despite LPBF’s widespread use, there still exists a need for process monitoring to ensure reliable part production and reduce post-build quality assessments. Towards this end, we develop and evaluate machine learning-based predictive models using height map derived quality metrics for single tracks and the accompanying pyrometer and high-speed video camera data collected under a wide range of laser power and laser velocity settings. We extract physically intuitive low-level features representative of the meltpool dynamics from these sensing modalities and explore how these vary with the linear energy density. We find our Sequential Decision Analysis Neural Network (SeDANN) model – a scientific machine learning model that incorporates physical process insights – outperforms other purely data-driven black-box models in both accuracy and speed. The general approach to data curation and adaptable nature of SeDANN’s scientifically informed architecture should benefit LPBF systems with an evolving suite of sensing modalities and post-build quality measurements.
2:40 PM - CT05.13.02
High-Throughput Correlative Microscopy and Spectroscopy for Nano-Laser Development
Patrick Parkinson1,Ruqaiya Al-Abri1,Hoyeon Choi1
The University of Manchester1
Single-element monolithic lasers which can be heteroepitaxially grown or heterogeneously integrated onto a silicon platform have been long sought for photonic integrated circuitry, and as nanoscale light sources for research.
A promising solution for integrated coherent light is using semiconductor nanowires produced with III-V (1) or other emerging materials (2). These structures allow easy hetero-integration onto silicon (3), and they act as both waveguide and gain material. To date, however, silicon-integrated, room-temperature and continuous lasing has proven elusive. One key reason for this has been the challenge of repeatable characterisation – a small variation in the geometry or material properties (such as doping) of each nanowire can have an amplified effect on lasing due to the inherent non-linearity of this process, masking systematic properties of this system. In other words, ensemble measurements are meaningless, while single nanowire measurements are insufficient in the presence of heterogeneity.
High-throughput imaging and spectroscopy can be used to provide geometrical, material and functional measurements of large numbers of single nanowire lasers. By studying 100 to 100,000 nanowire lasers from each growth, we correlate performance metrics such as threshold or lasing wavelength with controllable characteristics such as wire length, diameter, doping or transfer process. Using this methodology we have demonstrated material optimization for high quantum efficiency via doping (4), cavity analysis using time-resolved interferometry (5, 6), and statistical improvements in fabrication processes (7). Each study has enabled record low optically pumped thresholds for III-V nanolasers towards room-temperature continuous lasing, and by integrating pick-and-place techniques (8), we have demonstrated a combined high-throughput analysis and fabrication platform.
Our methodology – big-data for nano-optoelectronics – makes use of a high-speed data acquisition platform to generate large amounts of single-element data. We correlate electron microscopy, optical microscopy, spectroscopy and functional measurements on each wire through a self-registration process (9) and stream this data to a commodity database. This data can be analysed through a MATLAB or Python package, maximising the potential for data reuse. We demonstrate this approach for two applications: optimization of single perovskite nanowire lasers, and an intra-material study of end-facet reflectivity. This methodology is widely transferable to other material systems where single-element correlative study is essential to separate underlying physical processes from high levels of inter-object heterogeneity.
(1) Eaton. Nat. Rev. Mater. 2016, 1 (6), 16028.
(2) Zhu. Nat. Mater. 2015, 14 (6), 636–642.
(3) Koblmüller. Semicond. Sci. Technol. 2017, 32 (5), 053001.
(4) Alanis. Nano Lett. 2018, 19 (1), 362–368.
(5) Zhang. ACS Nano 2019, 13 (5), 5931–5938.
(6) Skalsky. Light Sci. Appl. 2020, 9 (1), 43.
(7) Alanis. Nanoscale Adv. 2019, 1 (11), 4393–4397.
(8) Jevtics. Nano Lett. 2020, 20 (3), 1862–1868.
(9) Parkinson. Nano Futur. 2018, 2 (3), 035004.
2:55 PM - CT05.13.03
Implementation of Benchtop NMR as an Online, High-Throughput Sensor in Automated Synthesis Systems
Magritek Inc.1
The recent development of high-performance benchtop Nuclear Magnetic Resonance (NMR) spectrometers equipped with high-resolution permanent magnets and flow cells enables a new way to perform on-line/in-line monitoring of chemical reactions. Monitoring reactions with NMR provides not only structural information about the chemical species generated during a reaction, but also quantitative measurements of them (without the need for a calibration curve), from which the reaction kinetics can be obtained. Benchtop NMR with reaction monitoring capability can be conveniently incorporated as an online, high-throughput sensor in automated synthesis systems working with artificial intelligence (AI). This presentation will highlight the implementation of benchtop NMR in monitoring chemical reactions, as well as its use as a sensor for an organic synthesis robot using a convolutional neural network.
3:10 PM - CT05.13.04
High-Throughput and Data-Driven Strategies for the Design of Deep Eutectic Solvent Electrolytes
Jaime Rodriguez1,Maria Politi1,Lilo Pozzo1
University of Washington1
Within the framework of green chemistry, Deep Eutectic Solvents (DES) have been identified as promising candidates for use in many applications, including battery electrolytes. DES are characterized by two or three materials that associate with each other through hydrogen bond interactions, resulting in a eutectic mixture whose freezing point is below that of the individual materials. This design space is overwhelmingly large and poses a challenge for screening a vast and diverse set of materials. Here we present a strategic approach consisting of high throughput experimentation (HTE) coupled with data science driven analysis to identify exceptional DES candidates based on key physicochemical and electrochemical properties. Much of our HTE adopts methods that are already used frequently in the biotech and pharmaceutical industries, most notably performing parallel syntheses and analyses in 96-well-plate formats. DES samples are first synthesized using an open-source automated liquid handling robot. DES melting points are then determined by monitoring the melting process with an infrared camera and identifying the temperature at which the thermal conductivity of the samples changes abruptly. The solubility of battery redox-species is determined via UV-VIS well-spectrophotometers. Finally, the electrochemical stability window and cycling properties of DES electrolytes are measured in high throughput by using screen-printed electrodes on 96-well plates adapted for use with a standard potentiostat. The ability to rapidly and efficiently collect data also creates a need for automated data analysis processes, which our group has developed in an open-source format. This approach to HTE also allows for the incorporation of data science techniques, such as feature extraction and machine learning, that further aid in probing a design space that is ultimately too large for experimental methods alone.
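As an illustration of the automated analysis mentioned above, the sketch below locates an abrupt change in a per-well signal recorded during a temperature ramp. The abstract does not say how the abrupt change is detected; taking the temperature of the steepest slope is one simple possibility and is an assumption here, as are the function name `transition_temperature` and the synthetic, noise-free trace.

```python
import numpy as np

def transition_temperature(temperature, signal):
    """Return the temperature at which the signal changes most steeply."""
    slope = np.gradient(signal, temperature)   # d(signal)/dT on the ramp grid
    return float(temperature[np.argmax(np.abs(slope))])

# Synthetic trace for one well: a smooth step centered at 40 degrees C
# (made-up data, used only to exercise the function).
temperature = np.linspace(0.0, 80.0, 801)
signal = 1.0 / (1.0 + np.exp(-(temperature - 40.0) / 0.5))

print(transition_temperature(temperature, signal))
```

Real camera traces are noisy, so a smoothing step before differentiation would likely be needed. Applying the same function to each of the 96 wells gives one estimate per sample.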
3:25 PM - CT05.13.05
Machine Learning Modeling of Photodiode Signal for Selection of Laser Parameters in Laser Powder Bed Fusion Additive Manufacturing
Simon Lapointe1,Clara Druzgalski1,Gabe Guss1,Manyalibo Matthews1
Lawrence Livermore National Laboratory1
Parts produced through metal additive manufacturing suffer from irregular quality, often exhibiting dimensional inaccuracies and defects such as cracks, pores, underfill, spatter, and balling. The development of approaches to control and optimize the additive manufacturing process is essential to improve part quality. Process parameters such as the laser power, velocity, beam size, and scan path should be optimized throughout the manufacturing process. However, incorporating the combined effects of material, geometry, and complex underlying physics into the optimization strategy can be particularly challenging. In this work, a data-driven approach for the selection of laser process parameters is proposed. Stainless steel parts of different shapes and sizes are printed with varying laser power and velocity strategies while collecting photodiode signal data. Using machine learning, a forward model is built to predict the track-wise photodiode signal along the scan path using laser parameters and geometry features as inputs. This model helps understand and quantify how the photodiode signal is influenced by the laser process parameters (laser power and velocity) and the part geometry (dimensions, overhangs, corners, etc.). Additionally, an inverse model is built to predict the laser parameters corresponding to a given track-wise photodiode signal and geometry features. The inverse model allows the selection of laser parameters to maintain a desired photodiode signal. The performance of the inverse model is assessed by deploying it on a test part whose geometry differs from that of the parts used for training.
3:55 PM - CT05.13.07
Late News: High-Throughput Reaction Screening for Accelerated Materials Research
All specialty chemicals and advanced materials are produced from lower-value feedstock or precursors by specialized chemical reactions. Empirical reaction tuning is a laborious and expensive process. In contrast, reliable and efficient first-principles workflows can be employed in a high-throughput fashion to survey chemical design space and inform experimental development. Systematic evaluation of steric and electronic contributions provides an unprecedented fundamental understanding of the factors controlling a target reaction. With these structure-property relationships, one can redesign a reaction or catalyst to achieve the desired activity. Paired with automated high-throughput screening, this accelerates the rate of discovery and understanding.
● Reaction/catalyst activity and selectivity predictions from simulations
● High-Throughput automated methods for in silico reaction screening
● Examples in areas of hydroformylation, epoxy-amine thermosets
CT05.14: Data-Driven Chemistry III
Wednesday AM, April 21, 2021
9:25 PM - CT05.14.02
Unique Challenges on NNP Development and Ways to Overcome Them
Wonseok Jeong1,Sungwoo Kang1,Changho Hong1,Jeong Min Choi1,Seungwu Han1
Seoul National University1
Recently, machine-learning (ML) approaches to developing interatomic potentials have attracted considerable attention because they are poised to overcome the major shortcomings of classical potentials and density functional theory (DFT), namely the difficulty of potential development and the huge computational cost, respectively. In particular, the high-dimensional neural network potential (NNP) suggested by Behler and Parrinello is attracting wide interest, with applications demonstrated for various materials. However, because ML potentials differ fundamentally from traditional physics-based force fields, they pose unique challenges that must be overcome to achieve reliable accuracy. In this presentation, we share some of these challenges and our experiences in overcoming them, with detailed examples.
First, we discuss the challenge of modeling defective systems. We found that, to model defective systems with high accuracy, sampling bias must be treated with care. Next, we show that atomic-resolution uncertainty estimation is necessary when modeling dynamic systems with a large number of chemical reactions, and that it can be obtained efficiently through a replica NNP ensemble. Furthermore, we show that NNPs can be used as highly accurate surrogate models for exploring a large space of crystal structures, which enables finding the stable crystal structure of complicated multicomponent systems. Finally, we discuss some of the future challenges in NNP development and possible solutions.
9:40 PM - CT05.14.03
Late News: Analysis on the Strengthening Mechanism of Aluminum Alloys with Bayesian Learning for Neural Networks
Shimpei Takemoto1,Kenji Nagata2,Takeshi Kaneshita1,Yoshishige Okuno1,Junya Inoue3,Manabu Enoki3
Showa Denko K.K.1,National Institute for Materials Science2,The University of Tokyo3
We discuss the strengthening mechanism of 2000 series aluminum alloy using neural networks. To understand the process-structure-property relationship in aluminum alloys, we have constructed a linear neural network with a single hidden layer whose input, hidden, and output layer nodes correspond to process parameters, structure features, and mechanical properties, respectively. We have applied the replica-exchange Monte Carlo method, an extended Markov chain Monte Carlo (MCMC) method, for the Bayesian inference of the optimal neural network architecture. When predicting ultimate tensile strength and tensile yield strength simultaneously, the Bayesian inference suggests that the neural network with two hidden layer nodes is optimal. This approach enables us to identify dominant combinations of additive elements and heat treatments for strengthening aluminum alloys. We have also conducted thermodynamic calculations of the equilibrium phase fraction using the Thermo-Calc software for each of the hidden layer nodes to discuss the strengthening mechanism of aluminum alloys.
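The network structure described above can be written down in a few lines. The sketch below is only a shape-level illustration with random weights: the number of process parameters (six) is a made-up value, the two outputs stand for ultimate tensile strength and tensile yield strength, and the Bayesian inference of the architecture by replica-exchange Monte Carlo is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: process parameters -> structure features -> properties.
n_process, n_structure, n_property = 6, 2, 2

W1 = rng.normal(size=(n_structure, n_process))    # process -> hidden "structure" nodes
W2 = rng.normal(size=(n_property, n_structure))   # hidden nodes -> mechanical properties

def forward(x):
    """Linear network: no activation function, so y = W2 @ (W1 @ x)."""
    hidden = W1 @ x
    return W2 @ hidden, hidden

x = rng.normal(size=n_process)   # one (standardized) processing condition
y, hidden = forward(x)

# With h hidden nodes, the overall process-to-property map has rank at most h.
print(np.linalg.matrix_rank(W2 @ W1))
```

Because the network is linear, each hidden node is a fixed weighted combination of process parameters, which is what allows the hidden nodes to be read as candidate combinations of additive elements and heat treatments.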
9:55 PM - CT05.14.04
Accurate Band-Gap Database for Semiconducting Inorganic Materials—Implementation of Hybrid Functional
Sangtae Kim1,Miso Lee1,Changho Hong1,Youngchae Yoon1,Hyungmin An1,Dongheon Lee1,Wonseok Jeong1,Dongsun Yoo1,Youngho Kang2,Yong Youn1,Seungwu Han1
Seoul National University1,Incheon National University2
Among the various physical properties of materials, the band gap (Eg) is one of the most fundamental. The Eg of a material strongly influences its optoelectronic, optical, thermoelectric, and electronic properties, and is a good indicator for distinguishing classes of materials (conductors, semiconductors, and insulators). In particular, semiconductors are used in various fields because of their unique characteristics. For instance, in photovoltaic devices, materials with a direct Eg of ~1.3 eV, corresponding to the Shockley-Queisser limit, are favored as photo-absorbers that maximize the solar-cell efficiency. In power electronics, semiconductors with Eg ≥ 3 eV are employed to sustain high electric fields. Currently, several inorganic material databases provide band gaps based on the Generalized Gradient Approximation (GGA) functional, including Materials Project, the Automatic Flow of Materials Discovery Library (AFLOWLIB), the Open Quantum Materials Database (OQMD), and the Joint Automated Repository for Various Integrated Simulations (JARVIS); JARVIS additionally provides Eg based on meta-GGA, which significantly improves the accuracy. Despite the importance of the band gap, however, the calculated values are underestimated because of the limitations of the conventional density functional theory (DFT) scheme. As a related issue, many small-gap semiconductors such as Ge, InAs, PdO, Zn3As2, and Ag2O are misclassified as metals, which can affect the selection of narrow-gap semiconductors for IR sensors, for instance. (In JARVIS, some of these errors are resolved by meta-GGA.) In addition, materials containing unfilled d-bands can show an incorrect band gap because the magnetic spin configuration affects the band splitting. For instance, antiferromagnetic NiO has an experimental Eg of 4.3 eV, but the computed Eg is only 2.2-2.6 eV with ferromagnetic ordering and the GGA functional, whereas the correct antiferromagnetic ordering yields 4.5 eV with the hybrid functional.
In this study, we construct an accurate band-gap database using the Automated Ab initio Modeling of Materials Property Package (AMP2), a fully automated package for DFT calculations. AMP2 uses two dedicated techniques to obtain more accurate band gaps: a density-of-states (DOS) indicator and a ‘one-shot’ hybrid calculation. To account for the magnetic spin ordering and thereby improve the accuracy of Eg, we also introduce the Ising model, which describes the magnetic exchange energy in terms of spin pairs and exchange parameters. We devised a new method to calculate and optimize the exchange parameters by fitting. The database consists of 10,481 materials that encompass most inorganic solids with Eg ranging between 0 and 5 eV. To verify our database, we compared it with an experimental band-gap dataset and the magnetic structure database MAGNDATA. For 116 benchmark materials, the root-mean-square error (RMSE) of the band gap with respect to experimental data is 0.36 eV, significantly smaller than the 0.75-1.05 eV of the existing databases. The resulting data are available online at SNUMAT.
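The idea of obtaining Ising exchange parameters by fitting can be sketched as a least-squares problem. The example below is not the AMP2 procedure, whose details are not given here: it assumes a toy four-site ring with nearest- and next-nearest-neighbor couplings, the sign convention E = E0 - Σ J s_i s_j, and synthetic energies generated from made-up parameters in place of DFT total energies.

```python
import itertools
import numpy as np

# Toy model: four spins on a ring (hypothetical geometry).
NN_PAIRS = [(0, 1), (1, 2), (2, 3), (3, 0)]   # nearest neighbors, coupling J1
NNN_PAIRS = [(0, 2), (1, 3)]                  # next-nearest neighbors, coupling J2

def pair_sum(spins, pairs):
    return sum(spins[i] * spins[j] for i, j in pairs)

def design_row(spins):
    """Row of the linear model E = E0 - J1 * sum_NN(s_i s_j) - J2 * sum_NNN(s_i s_j)."""
    return [1.0, -pair_sum(spins, NN_PAIRS), -pair_sum(spins, NNN_PAIRS)]

# All 16 collinear spin configurations (+1 = up, -1 = down).
configs = [np.array(c) for c in itertools.product([1, -1], repeat=4)]
A = np.array([design_row(c) for c in configs], dtype=float)

# Synthetic "total energies" from made-up parameters (E0, J1, J2 in eV).
true_params = np.array([-10.0, -0.020, 0.005])
energies = A @ true_params

# Least-squares fit of E0, J1, J2, then the lowest-energy spin ordering.
fit, *_ = np.linalg.lstsq(A, energies, rcond=None)
ground_state = configs[int(np.argmin(A @ fit))]

print("fitted E0, J1, J2:", fit)
print("lowest-energy ordering:", ground_state)
```

With the negative J1 chosen here, antiparallel nearest neighbors are favored, so the lowest-energy ordering is the alternating (antiferromagnetic) one. In a real workflow the fitted model would be used to select the spin ordering for the subsequent band-gap calculation.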
10:00 PM - CT05.14.05
Developing Machine-Learning Potentials from Disordered Structures for Crystal Structure Prediction
Changho Hong1,Jeong Min Choi1,Wonseok Jeong1,Sungwoo Kang1,Suyeon Ju1,Kyeongpung Lee1,Jisu Jung1,Yong Youn2,Seungwu Han1
Seoul National University1,National Institute for Materials Science2
Crystal structure prediction (CSP) is the problem of finding the structure with the global-minimum free energy. It can be broken down into two parts: an algorithm for searching the configuration space of structures and a method for evaluating the free energy. Various heuristic algorithms, such as genetic algorithms, have been developed for efficient search. First-principles calculations based on density functional theory (DFT) are usually chosen for the energy evaluation because their non-empirical nature allows accurate energies. DFT-based CSP has been successful in identifying inorganic crystals under extreme conditions as well as organic crystals. One interesting field of application for CSP is the structure prediction of ternary or higher (simply multinary hereafter) inorganic crystals at ambient conditions, where CSP can compensate for the low throughput of experimental synthesis. Considering that major technological breakthroughs in the display and battery industries have been achieved with new materials such as InGaZnO4 and Li10GeP2S12, more rapid discovery of novel materials is very important. However, multinary materials have an enormously larger number of possible atomic arrangements, in which the permutation of different species becomes an important factor. This demands methods for evaluating the free energy that are much more efficient than DFT. Recently, machine-learning potentials such as the neural network potential (NNP) have drawn much attention because they offer the accuracy of DFT at much higher speed. However, the absence of information on the unknown structures poses a major hurdle to constructing training sets.
To address this problem, we propose a way to build an NNP as a robust surrogate model of DFT for predicting the most stable structure of multinary compounds at ambient conditions. The central idea is to train the machine-learning potential on disordered structures such as liquid and amorphous phases. A molecular dynamics (MD) simulation of the liquid phase can start from a random distribution and reach equilibrium quickly at high temperature. The liquid is then quenched to an amorphous phase. This makes it possible to build training sets without any knowledge of the unknown crystals except for the chemical composition. The short-range order in the disordered phases resembles that of the crystalline phases, and local fluctuations during MD also sample diverse local environments. To demonstrate the accuracy of the NNPs, we compare NNP and DFT energies for Ba2AgSi3, Mg2SiO4, LiAlCl4, and InTe2O5F over experimental phases as well as theoretically generated low-energy crystal structures. For every material, we find strong correlations between DFT and NNP energies, indicating that the NNPs can identify the most stable structures and properly rank low-energy crystalline structures. We also find that an evolutionary search combined with NNPs can evaluate low-energy metastable phases more efficiently than one with DFT. By providing a way to train reliable machine-learning potentials for crystal structure prediction, this research offers an efficient method for searching uncharted multinary phases.
10:05 PM - CT05.14.06
Efficient Sampling for Training Set of Machine Learning Potentials Using Metadynamics
Jisu Jung1,Dongsun Yoo1,Wonseok Jeong1,Seungwu Han1
Seoul National University1
Machine learning potentials (MLPs) are attracting enormous interest as a promising computational tool that provides the accuracy of quantum mechanical calculations at the speed of classical interatomic potentials. MLPs have been applied to increasingly complex systems, including phase-change materials, nanoclusters, and catalysts. Because MLPs are reliable only within the domain covered by the training set, the training set must include the diverse configurations that can appear during a simulation. Typically, the training set is built from specific static target structures or from DFT-based molecular dynamics (MD) simulations. However, such manual sampling requires expertise in the target system and iterative trial-and-error data augmentation, which consumes enormous computation time.
Therefore, we propose a metadynamics scheme that uses the local environment of an atom, which is the input vector of MLPs, as the collective variables. These collective variables keep the system from revisiting symmetrically equivalent structures. The scheme autonomously samples a wide range of configurations that high-temperature MD alone cannot reach. We demonstrate the metadynamics on a Pt surface with hydrogen, followed by an application to the GeTe system.
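The basic mechanism of metadynamics, depositing repulsive Gaussian hills at visited values of the collective variable, can be sketched on a toy system. The example below assumes a single particle in a one-dimensional double well with its coordinate as the collective variable, and overdamped Langevin dynamics. The abstract instead uses the atomic local-environment descriptors of the MLP as collective variables, which is not reproduced here. All function names and parameter values are hypothetical.

```python
import numpy as np

def potential_force(x):
    """Force -dV/dx for the double well V(x) = (x^2 - 1)^2."""
    return -4.0 * x * (x ** 2 - 1.0)

def bias_energy(x, hills, height, width):
    """Sum of Gaussian hills deposited so far, evaluated at x."""
    c = np.asarray(hills)
    return float(np.sum(height * np.exp(-(x - c) ** 2 / (2.0 * width ** 2))))

def bias_force(x, hills, height, width):
    """Force from the bias, -d(bias)/dx; it pushes away from visited values."""
    c = np.asarray(hills)
    g = height * np.exp(-(x - c) ** 2 / (2.0 * width ** 2))
    return float(np.sum(g * (x - c) / width ** 2))

def run_metadynamics(n_steps=20000, stride=100, height=0.1, width=0.2,
                     dt=1e-3, kT=0.1, seed=0):
    """Overdamped Langevin dynamics with a history-dependent bias."""
    rng = np.random.default_rng(seed)
    x, hills = -1.0, []
    trajectory = np.empty(n_steps)
    for step in range(n_steps):
        if step % stride == 0:
            hills.append(x)   # deposit a hill at the current CV value
        force = potential_force(x) + bias_force(x, hills, height, width)
        x += force * dt + np.sqrt(2.0 * kT * dt) * rng.normal()
        trajectory[step] = x
    return trajectory, np.array(hills)

trajectory, hills = run_metadynamics()
print("hills deposited:", len(hills))
print("range of x visited:", trajectory.min(), trajectory.max())
```

As hills accumulate in the starting well, the bias discourages the walker from returning to values it has already sampled. Whether and when the barrier is crossed depends on the hill height, width, and deposition stride. Configurations along the trajectory would be the candidates for the training set.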