Jason Hattrick-Simpers, National Institute of Standards and Technology
Barnabas Poczos, Carnegie Mellon University
Markus Reiher, ETH Zurich
Aleksandra Vojvodic, University of Pennsylvania
Machine Learning: Science and Technology | IOP Publishing
Matter & Patterns | Cell Press
MT02.01/MT03.01: Joint Session: Autonomous Science I
Monday AM, December 02, 2019
Hynes, Level 2, Room 210
8:00 AM - MT02.01.01/MT03.01.01
Autonomous Research Systems for Materials Development—2019 Workshop Summary
Benji Maruyama1,Eric Stach2,Gilad Kusne3,Jason Hattrick-Simpers3,Brian DeCost3
Air Force Research Laboratory1,University of Pennsylvania2,National Institute of Standards and Technology3
This presentation will summarize the results of our “Autonomous Research for Materials Development Workshop,” where a multidisciplinary group of materials researchers, computer scientists and AI/ML experts explored the opportunities, barriers and future investments. Closed-loop autonomous research systems are disrupting the research process.
The current materials research process is slow and expensive, taking decades from invention to commercialization. Researchers are now exploiting advances in artificial intelligence (AI), autonomy and robotics, along with modeling and simulation, to create research robots capable of doing iterative experimentation orders of magnitude faster than today.
We propose a “Moore’s Law for the Speed of Research,” where the rate of advancement increases exponentially, and the cost of research drops exponentially. We consider a renaissance in “Citizen Science” where access to online research robots makes science widely available. This presentation will highlight advances in autonomous research and consider the implications of AI-driven experimentation on the materials landscape.
8:30 AM - MT02.01.02/MT03.01.02
Self-Driving Laboratories for Accelerating Discovery of Thin-Film Materials
Curtis Berlinguette1,Jason Hein1,Alan Aspuru-Guzik2,3,Benjamin MacLeod1,Fraser Parlane1,Brian Lam1
The University of British Columbia1,Canadian Institute for Advanced Research (CIFAR)2,The University of Toronto3
This presentation will focus on our self-driving laboratory for thin film materials discovery and optimization. Discovering high-performance, low-cost materials is an integral component of technology innovation cycles, particularly in the clean energy sector. The linear methodology currently used to develop optimal materials can take decades, which impedes the translation of innovative technologies from conception to market. Our interdisciplinary team is utilizing advanced robotics and machine learning to overcome this challenge. We are closing the feedback loop in thin film materials research by enabling our self-driving robotics platform named “Ada” to design, perform, and learn from its own experiments efficiently and in real time. As a proof-of-principle set of experiments, we will show how Ada discovers and optimizes high-performance, low-cost hole transport materials for use in advanced solar cells. I will also showcase how Ada’s modular design can enable the automated and autonomous discovery of materials for other clean energy technologies.
9:00 AM - MT02.01.03/MT03.01.03
An Inter-Laboratory High Throughput Experimental and Open Materials Data Study of Sn-Zn-Ti-O
Jason Hattrick-Simpers1,Andriy Zakutayev2,Sara Barron1,Zachary Trautt1,Nam Nguyen1,Kamal Choudhary1,John Perkins2,Caleb Phillips2,Gilad Kusne1,Feng Yi1,Apurva Mehta3,Martin Green1
National Institute of Standards and Technology1,National Renewable Energy Laboratory2,SLAC National Accelerator Laboratory3
We present the results of an inter-laboratory high-throughput experimental (HTE) study which focused on measurement reproducibility and data exchange. Over the past 20 years, a great number of HTE techniques for synthesizing and characterizing thin-film oxides have been developed and reported. To date, however, there has not been a comprehensive study of how values measured for a series of properties (e.g. conductivity or optical band gap) on the same library compare across labs. Nor has there been a study that has attempted to normalize the hand-off of HTE samples and data. Here we report on the first such study using the Sn-Zn-Ti-O transparent conducting oxide system.
A series of Sn-Zn-Ti-O samples was deposited via pulsed laser deposition and magnetron co-sputtering. At each institution, a set of HTE measurements was made for typical properties including structure, thickness, conductivity, and optical bandgap. The samples were then exchanged between the two labs, the same set of properties was measured at the other lab, and the data were exchanged via an agreed-upon uniform format.
A few lessons learned and several scientific observations regarding the reproducibility of HTE results gathered during this process will be discussed. A key lesson was the need to decide upon, and use, consistent measurement grids within a lab (and during exchanges) for all measurements, as this will impact future data archiving and retrieval. It was observed that qualitative trends are well reproduced even when two labs use very different methods for measuring a property, for instance ellipsometry versus transmission-reflection UV-Vis spectroscopy. However, quantitative comparisons were found to be measurement-specific and ranged from excellent (bandgaps measured within a mean absolute error < 0.1 eV) to relatively poor (log resistivity measurements within a mean absolute error of 2). In the latter case, we believe that differences in sample probe geometries, coupled with large changes in properties over small composition regions, were the most likely source of the poor correlation. The lessons learned and best practices obtained will be discussed.
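The two agreement figures quoted above can be reproduced in form with a few lines of Python. This is only a sketch: the paired values are invented for illustration and are not data from the study, and the function name is hypothetical. It shows why resistivity is compared on a log10 scale, where a mean absolute error of 2 means the labs disagree by about two orders of magnitude on average.

```python
import math

def mean_absolute_error(a, b):
    """Mean absolute difference between paired measurements."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical paired measurements at the same library points
# (illustrative values only, not results from the abstract).
bandgap_lab_a = [3.20, 3.35, 3.50, 3.62]      # eV
bandgap_lab_b = [3.25, 3.30, 3.58, 3.60]      # eV
resistivity_lab_a = [1e-2, 5e-1, 2e1, 3e3]    # ohm cm
resistivity_lab_b = [1e0, 5e-3, 2e3, 3e1]     # ohm cm

bandgap_mae = mean_absolute_error(bandgap_lab_a, bandgap_lab_b)

# Resistivity spans decades, so compare log10 values rather than raw values.
log_rho_mae = mean_absolute_error(
    [math.log10(r) for r in resistivity_lab_a],
    [math.log10(r) for r in resistivity_lab_b],
)

print(f"bandgap MAE: {bandgap_mae:.3f} eV")
print(f"log10 resistivity MAE: {log_rho_mae:.3f} decades")
```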
9:15 AM - MT02.01.04/MT03.01.04
Automatic Microcrack Inspection in Photovoltaics Silicon Wafers by Unsupervised Anomaly Detection via Variational Auto-Encoder
Zhe Liu1,Felipe Oviedo1,Emanuel Sachs1,Tonio Buonassisi1
Massachusetts Institute of Technology1
The presence of microcracks in silicon wafers significantly reduces wafer strength, leading to wafer breakage during manufacturing, transportation and field operation. As wafer thickness decreases to reduce cost, thinner wafers become more prone to breakage in the presence of microcracks [1]. To enable a smooth transition to thin wafers for even cheaper photovoltaic modules, we recently developed a high-throughput prototype for in-line crack detection in silicon wafers [2]. This tool scans the near-edge regions of a silicon wafer for microcracks and outputs linescan signals from a linescan camera, where a crack-free region shows a smooth, undisrupted profile. As an in-line detection tool, it also requires a rapid and reliable algorithm that automatically identifies the presence of a microcrack within a second after wafer scanning. In this work, we adopted an unsupervised machine learning method for anomaly detection, because the presence of microcracks above the critical length is a statistically rare event in current PV production lines (typically less than 5%). Specifically, a generative machine learning algorithm, the variational auto-encoder (VAE), is used to identify scans with microcracks [3]. The working principle of this algorithm is as follows: (1) the VAE encodes the linescan profiles into lower-dimensional vectors of latent variables, and the latent variables are then decoded back into linescan profiles with the goal of minimizing the reconstruction error; (2) because most linescan profiles are very similar smooth curves without any cracks, the trained VAE model is biased toward linescans without cracks; (3) whenever a linescan profile containing a crack appears, the trained VAE model generates a vastly different profile with a significant reconstruction error; (4) the crack is then detected by monitoring for anomalous reconstruction error.
The advantage of this unsupervised VAE method over the previous neural network method [4] is that it does not require a large amount of labelled crack data with different crack shapes (which can be very difficult to obtain). We demonstrate successful crack detection with several different wafer types (e.g., multi, mono, as-cut, and textured) and crack shapes (e.g., line-shape, cross-star, L-shape). We show that, with statistical analysis, this VAE-based anomaly detection could be a reliable and versatile method to enable the rapid detection of microcracks in silicon wafers.
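The detection step, flagging scans whose reconstruction error is anomalously large, can be sketched without a neural network. In the Python sketch below a moving-average smoother stands in for the trained VAE, which is an assumption made purely for illustration: like a model trained mostly on crack-free scans, it reproduces smooth profiles well and sharp dips poorly. The synthetic profiles, the window size and the median-plus-MAD threshold are likewise hypothetical choices, not the authors' settings.

```python
import statistics

def smooth_reconstruct(profile, half_window=2):
    """Stand-in for a trained VAE: reproduces smooth profiles, blurs sharp dips."""
    n = len(profile)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(profile[lo:hi]) / (hi - lo))
    return out

def reconstruction_error(profile):
    """Mean squared difference between a profile and its reconstruction."""
    recon = smooth_reconstruct(profile)
    return sum((p - r) ** 2 for p, r in zip(profile, recon)) / len(profile)

def flag_anomalies(profiles, k=5.0):
    """Flag profiles whose error exceeds median + k * MAD of the batch."""
    errors = [reconstruction_error(p) for p in profiles]
    med = statistics.median(errors)
    mad = statistics.median([abs(e - med) for e in errors])
    threshold = med + k * max(mad, 1e-12)
    return [e > threshold for e in errors]

# Synthetic linescans: nine smooth ramps plus one ramp with a sharp dip ("crack").
smooth_scans = [[1.0 + 0.001 * s * i for i in range(50)] for s in range(1, 10)]
cracked_scan = [1.0 + 0.005 * i for i in range(50)]
cracked_scan[25] -= 0.5

flags = flag_anomalies(smooth_scans + [cracked_scan])
print(flags)
```

Because the threshold is derived from the batch itself, this only works when cracked scans are rare, which matches the assumption stated in the abstract.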
1. S. Wieghold, Z. Liu, S. J. Raymond, L. T. Meyer, J. R. Williams, T. Buonassisi, and E. M. Sachs, “Detection of sub-500-μm cracks in multicrystalline silicon wafer using edge-illuminated dark-field imaging to enable thin solar cell manufacturing,” Solar Energy Materials and Solar Cells, vol. 196, pp. 70–77, Jul. 2019.
2. Z. Liu, S. Wieghold, L. T. Meyer, L. K. Cavill, T. Buonassisi, and E. M. Sachs, “Design of a Submillimeter Crack-Detection Tool for Si Photovoltaic Wafers Using Vicinal Illumination and Dark-Field Scattering,” IEEE Journal of Photovoltaics, vol. 8, no. 6, pp. 1449–1456, Nov. 2018.
3. H. Xu, W. Chen, N. Zhao, Z. Li, J. Bu, Z. Li, Y. Liu, Y. Zhao, D. Pei, Y. Feng, J. Chen, Z. Wang, and H. Qiao, “Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications,” Feb. 2018.
4. M. Demant, T. Welschehold, M. Oswald, S. Bartsch, T. Brox, S. Schoenfelder, and S. Rein, “Microcracks in Silicon Wafers I: Inline Detection and Implications of Crack Morphology on Wafer Strength,” IEEE Journal of Photovoltaics, vol. 6, no. 1, pp. 126–135, Jan. 2016.
9:30 AM - MT02.01.05/MT03.01.05
Screening of High-Capacity Oxygen Storage Materials with Machine Learning Approach
Nobuko Ohba1,Takuro Yokoya2,Seiji Kajita1,Kensuke Takechi1
Toyota Central R&D Laboratories, Inc.1,Toyota Motor Corporation2
Oxygen storage materials (OSMs), such as CeO2 or pyrochlore-type CeO2-ZrO2 (p-CZ), are used as catalyst supports for three-way catalysts in automotive emission control systems. Their oxygen storage capacity (OSC) is the ability to release and store oxygen reversibly through changes in cation valence in response to reducing and oxidizing atmospheres. In this study, we explore high-capacity OSMs by using materials informatics (MI), which combines materials science with inference algorithms in machine learning.
The OSC of 60 metal-oxide-supported Pd samples was estimated experimentally from the amount of CO2 produced while switching between oxidizing (O2) and reducing (CO) atmospheres every 2 minutes at temperatures of 973, 773, and 573 K. These experimentally measured OSC data were used as training data in our MI scheme. A support vector machine regression model was trained to predict the OSC at each measured temperature. The model uses descriptors, that is, physical properties considered to represent the features relevant to the OSC. These features were selected automatically by grid search to obtain the most accurate model at each temperature. We found that features related to the stability of the oxygen atoms in the crystal and to the crystal structure itself, such as the cohesive energy obtained from first-principles calculations, are highly correlated with the OSC. The present model was used to predict the OSC of 1,300 existing oxides registered in an in-house electronic structure calculation database. Several dozen materials with promising high OSC were proposed through this virtual screening. We synthesized one of the screened materials and experimentally confirmed that it exhibits a higher OSC than the conventional OSM, p-CZ.
9:45 AM - MT02.01/MT03.01
10:15 AM - MT02.01.06/MT03.01.06
The Metaphysics of Chemical Reactivity and Materials Discovery
University of Glasgow1
Discovery in chemistry falls mainly into one of four areas: the discovery of new molecules, new reactions, new reactivity, and finally new physical properties of the resulting compounds or materials. Establishing new reactivity leads to new reactions, which in turn lead to new molecules. This is therefore the order of impact for discoveries in terms of the amount of chemical knowledge that they contribute. Such findings must, by definition, belong outside the known or predictable; they are outliers and as such can oppose conventions, assumptions and biases. By developing the metaphysics of chemistry and chemical reactivity we should be able to establish a new set of ontologies in chemistry that relate back to the practical core operations, but can also be translated into molecular structures and the discovery of function. The truth of chemistry lies with finding the intrinsic reactivity of the input chemicals, and then encouraging or enabling reactivity by process control. Whilst a new discovery or reaction should be translatable to chemical bonding theory, chemists need to grapple with the fact that the application of the current rules will not allow discovery; instead, those rules will restrict it to what is already known. So chemical discovery requires that the current rules are updated or broken, or that new ones are made where before there were none. The discoveries of the Diels-Alder and cross-coupling reactions are excellent examples of new rules that were discovered without any prior warning.
Without a deeper development of a metaphysics of chemistry, the use of big data and artificial intelligence will just tell us what we already know we know, and maybe predictable extensions, rather than enabling discovery. The challenge for the chemist is not the use of artificial intelligence, but the intelligent use of algorithms and automation for novel discoveries. In this talk I will explain how this might be possible.
J. Granda, L. Donina, V. Dragone, D.-L. Long, L. Cronin, Nature, 2018, 559, 377-381.
L. J. Points, J. W. Taylor, J. Grizou, K. Donkers, L. Cronin, Proc. Natl. Acad. Sci. USA, 2018, 115, 885-890.
V. Duros, J. Grizou, W. Xuan, Z. Hosni, D.-L. Long, H. N. Miras, L. Cronin, Angew. Chem. Int. Ed., 2017, 56, 10815-10820.
S. Steiner, J. Wolf, S. Glatzel, A. Andreou, J. Granda, G. Keenan, T. Hinkley, G. Aragon-Camarasa, P. J. Kitson, D. Angelone, L. Cronin, Science, 2019, 363, eaav2211.
A. Henson, P. S. Gromski, L. Cronin, ACS Cent. Sci., 2018, 4, 793-804.
10:45 AM - MT02.01.07/MT03.01.07
Robot-Accelerated Perovskite Investigation and Discovery (RAPID)—A High-throughput Approach Towards Metal Halide Perovskite Single Crystal Discovery
Zhi Li1,Mansoor Ani Nellikkal2,Liana Alves2,Peter Parrilla2,Ian Pendleton2,Matthias Zeller3,Joshua Schrier4,Alexander Norquist2,Emory Chan1
Lawrence Berkeley National Lab1,Haverford College2,Purdue University3,Fordham University4
Metal halide perovskites have emerged as promising materials for next-generation photovoltaic and optoelectronic devices. The discovery and full characterization of new metal halide perovskite-derived materials have been limited by the difficulty of growing the high-quality single crystals needed for single-crystal X-ray diffraction studies. The formation of large single crystals is non-trivial, owing to the vastness of the chemical search space with its enormous compositional degrees of freedom. We present the first automated, high-throughput approach for metal halide perovskite single crystal discovery based on inverse temperature crystallization (ITC) as a means to rapidly identify and optimize synthesis conditions for the formation of high-quality single crystals. Using our automated approach, we have carried out a total of over 5000 metal halide perovskite synthesis reactions spanning six chemical systems. Through this unbiased search of the experimental space, we have more than doubled the number of metal halide perovskite materials accessible by the ITC method and discovered a new perovskite structure. Combining machine learning and other statistical methods, we quantify the total experimental space and the likelihood of large single crystal formation. Moreover, machine learning models of single crystal formation have been constructed for each chemical system. This work is a proof of concept that a combined approach of high-throughput experimentation and machine learning can be effective in the study of metal halide perovskite crystallization. The approach presented here is designed to be generalizable to different synthetic routes for the acceleration of materials discovery.
11:00 AM - MT02.01.08/MT03.01.08
Optimizing Hole Transport Materials with a Self-Driving Thin-Film Laboratory
Benjamin MacLeod1,Fraser Parlane1,Thomas Morrissey1,Florian Häse2,3,4,Loïc Roch2,3,4,Kevan Dettelbach1,Raphaell Moreira1,Lars Yunker1,Michael Rooney1,Joseph Deeth1,Veronica Lai1,Gordon Ng,Henry Situ1,Ray Zhang1,Alán Aspuru-Guzik2,3,4,Jason Hein1,Curtis Berlinguette1
The University of British Columbia1,Harvard University2,University of Toronto3,Vector Institute for Artificial Intelligence4
Self-driving laboratories combine algorithmic data analysis and experiment planning with robotic workflows to autonomously optimize one or more experimental figures of merit. This approach is applicable to challenging multi-parameter and multi-objective optimization problems such as the optimization of thin film materials within the vast design space of composition, deposition, and processing conditions. Here we describe a self-driving laboratory capable of formulating inks, depositing films via spin-coating, characterizing the resulting thin films using multiple techniques, and planning new experiments based on previous experimental data using the ChemOS experiment orchestration software [1]. The utility of this self-driving laboratory is demonstrated by autonomously optimizing optical and electronic properties of hole transport materials, which are crucial to the operation of a variety of thin-film-based optoelectronic devices. The autonomous optimization manipulates the film composition and annealing protocol to maximize a hole-mobility surrogate obtained by fusing data from transmission-reflection UV-Vis-NIR spectroscopy and 4-point probe conductivity measurements.
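The plan, execute, analyze cycle described here can be sketched as a short loop. Everything in this Python sketch is an assumption made for illustration: `run_experiment` is a simulated stand-in for the robot and its characterization, the two parameters and their ranges are invented, and the planner is a simple random-then-local search rather than the ChemOS software or any optimizer used by the authors.

```python
import random

def run_experiment(dopant_fraction, anneal_minutes):
    """Hypothetical stand-in for the robot: returns a simulated figure of merit."""
    return -((dopant_fraction - 0.4) ** 2) - ((anneal_minutes - 30.0) / 60.0) ** 2

def propose(history, rng):
    """Toy planner: explore at random first, then perturb the best conditions so far."""
    if len(history) < 5:
        return rng.uniform(0.0, 1.0), rng.uniform(0.0, 60.0)
    (best_x, best_t), _ = max(history, key=lambda item: item[1])
    x = min(1.0, max(0.0, best_x + rng.gauss(0.0, 0.05)))
    t = min(60.0, max(0.0, best_t + rng.gauss(0.0, 3.0)))
    return x, t

def closed_loop(n_experiments=40, seed=0):
    """Run the plan -> execute -> record cycle and return the full history."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_experiments):
        conditions = propose(history, rng)      # plan
        merit = run_experiment(*conditions)     # execute and characterize
        history.append((conditions, merit))     # feed the result back to the planner
    return history

history = closed_loop()
best_conditions, best_merit = max(history, key=lambda item: item[1])
print(best_conditions, best_merit)
```

The point of the structure is that the planner only ever sees the history of (conditions, result) pairs, so the simulated experiment could be swapped for real hardware without changing the loop.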
1. MacLeod, B. P. et al. Self-driving laboratory for accelerated discovery of thin-film materials. arXiv [physics.app-ph] (2019)
11:15 AM - MT02.01.09/MT03.01.09
Convergence of Microfluidics, Colloidal Synthesis and Machine Learning—Real-Time Optimization of Halide Exchange Reactions of Colloidal Inorganic Perovskites Quantum Dots
Robert Epps1,Michael Bowen1,Kameel Abdel-Latif1,Milad Abolhasani1
North Carolina State University1
In the development of next-generation photovoltaics and light-emitting diodes, colloidal inorganic perovskite quantum dots (PQDs) have drawn notable attention for their highly tunable bandgap properties, high charge-carrier mobility and defect tolerance, and adaptability towards solution-phase processing. However, studies of this material group and other colloidal semiconductor nanocrystals require extensive exploration of their massive reaction parameter space within highly controlled reaction environments. Conventional flask-based, trial-and-error approaches are, therefore, highly unlikely to effectively capture the full potential and optimal synthesis conditions of these high-priority materials. Further complicating this process, optimal synthesis parameters vary significantly across the accessible bandgap range. Flow synthesis platforms have recently been demonstrated as a time- and material-efficient reaction monitoring strategy for synthesis, screening, and optimization of colloidal nanomaterials. The high sampling rate, low chemical consumption, and precise process control (automation) of flow reactors greatly reduce the challenges in exploring complex reaction spaces; however, high-throughput reaction screening technologies alone are unlikely to make significant breakthroughs, due to the massive scope of relevant colloidal synthesis conditions.
In this work, we present a modular microfluidic platform integrated with a machine learning (ML)-enhanced reaction optimization algorithm for on-demand synthesis of high-quality inorganic perovskite QDs with desired optical properties using a homogeneous anion exchange reaction. The intelligent QD synthesis platform consists of multiple computer-controlled pumps for on-demand reagent delivery/dosing, a flow path selector valve for automated selection of the halide salt source, and an in-line flow cell for automatic UV-Vis absorption and photoluminescence spectroscopy. Utilizing a utility function, an array of trained neural networks, and a global optimization algorithm, the intelligent QD manufacturing platform approaches a target emission bandgap while minimizing emission linewidth and maximizing quantum yield by tuning the concentrations of the precursors. Halide salt precursors are mixed within a highly efficient inline micromixer before combining with the perovskite QDs and gas-liquid segmentation. Monitoring each halide exchange condition requires less than 180 µL of total halide salt precursor and 170 µL of starting perovskite QDs per sample.
Integration with an ML-enhanced optimization algorithm enables the system to reach optimized synthesis conditions, across all six variables, for a target emission energy of 2.2 eV in 238 samples and 83 mL of chemicals without any prior training. More advanced optimization methods and pre-training with archived experimental data will further reduce this optimization time and cost. The versatility and modularity of the developed intelligent QD synthesis platform make it readily adaptable for on-demand synthesis of other colloidal nanomaterials.
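One way to fold the three objectives named above (distance from the target emission energy, emission linewidth, quantum yield) into a single number for an optimizer is a weighted sum. The Python sketch below is an assumed form only: the abstract does not give the actual utility function, and the weights, candidate values and function name are hypothetical. The last lines check that the quoted per-sample volumes are consistent with the reported total.

```python
def utility(peak_ev, fwhm_ev, quantum_yield, target_ev=2.2,
            w_peak=5.0, w_fwhm=1.0, w_qy=1.0):
    """Higher is better: close to target, narrow emission, high quantum yield.

    The linear form and the weights are illustrative assumptions.
    """
    peak_penalty = abs(peak_ev - target_ev)
    return -w_peak * peak_penalty - w_fwhm * fwhm_ev + w_qy * quantum_yield

# Hypothetical measured outcomes: (peak energy in eV, linewidth in eV, quantum yield).
candidates = {
    "A": (2.20, 0.10, 0.60),
    "B": (2.35, 0.08, 0.80),   # brighter and narrower, but off target
    "C": (2.21, 0.15, 0.40),
}
best = max(candidates, key=lambda name: utility(*candidates[name]))
print(best)

# Consistency check on the quoted volumes: at most 180 uL + 170 uL per sample.
samples = 238
upper_bound_ml = samples * (180 + 170) / 1000.0
print(upper_bound_ml)  # upper bound in mL, to compare with the reported 83 mL
```

With a large weight on the peak penalty, candidate B loses despite its higher quantum yield; lowering `w_peak` would reverse that, which is the trade-off a utility function has to encode.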
11:30 AM - MT02.01.10/MT03.01.10
Autonomously Optimizing Thin Film Morphologies Using Machine Vision
Fraser Parlane1,Benjamin MacLeod1,Nina Taherimakhsousi1,Alan Aspuru-Guzik2,Jason Hein1,Curtis Berlinguette1
The University of British Columbia1,University of Toronto2
The morphologies of solution-deposited thin films are frequently governed by complex combinations of processes from domains including multi-phase fluid flow, heat transfer, nucleation, solid mechanics, and interfacial phenomena. This complexity can frustrate both theoretical and empirical attempts to understand and control the morphologies of such films. Here we report an autonomous robotic system that uses machine vision feedback to determine optimal experimental parameters to achieve homogeneous, high-quality films via spin coating. An ink-formulating and spin-coating robot equipped with an imaging system [1] provides images of thin films to a computer vision algorithm which grades the quality of the thin films. This grading assessment provides input to an optimization algorithm that chooses the next ink formulation with the goal of identifying regions in the parameter space of ink formulation and spin-coating conditions that result in high-quality films.
1. MacLeod, B. P. et al. Self-driving laboratory for accelerated discovery of thin-film materials. arXiv [physics.app-ph] (2019).
MT02.02/MT03.02: Joint Session: Autonomous Science II
Monday PM, December 02, 2019
Hynes, Level 2, Room 210
1:30 PM - MT02.02.01/MT03.02.01
Quantum Machine Learning in Chemical Space
Guido Falk von Rudorff1,Anatole von Lilienfeld1
University of Basel1
Many of the most relevant chemical properties of matter depend explicitly on atomistic and electronic details, rendering a first principles approach to chemistry mandatory. Alas, even when using high-performance computers, brute force high-throughput screening of compounds is beyond any capacity for all but the simplest systems and properties due to the combinatorial nature of chemical space, i.e. all compositional, constitutional, and conformational isomers. Consequently, efficient exploration algorithms need to exploit all implicit redundancies present in chemical space. I will discuss recently developed alchemical perturbation theory and quantum machine learning based approaches for interpolating quantum mechanical observables in compositional and constitutional space. Numerical results of our models indicate controlled accuracy and favourable computational efficiency.
2:00 PM - MT02.02.02/MT03.02.02
AI for Automating Materials Discovery
Bruce van Dover1,Carla Gomes1
Cornell University1
Artificial Intelligence (AI) is a rapidly advancing field. Novel machine learning methods combined with reasoning and search techniques have led us to reach new milestones with increasing frequency, from self-driving cars to computer vision, machine translation, and computer Go trained on human play, and on to world-champion-level play in Go and chess using pure self-training strategies. These ever-expanding AI capabilities open up exciting new avenues for automating scientific discovery. I will discuss our work on using AI for accelerating and automating materials discovery. In particular, we have focused on high-throughput structure determination for combinatorial materials discovery and on solving the phase map diagram problem for composition libraries. While standard statistical and machine learning methods are important to address this challenge, they fail to incorporate relationships arising from the physics of the underlying materials. I will introduce an effective approach based on a tight integration of machine learning methods, to deal with noise and uncertainty in the measurement data, with optimization and inference techniques, to incorporate the rich set of constraints arising from the underlying physics. Finally, I will describe our vision and progress concerning the Scientific Autonomous Reasoning Agent (SARA), a multi-agent system to accelerate materials discovery that integrates, in a synergistic and complementary way, first-principles quantum physics; experimental materials synthesis, processing, and characterization; and AI-based algorithms for reasoning and scientific discovery, including the representation, planning, optimization, and learning of materials knowledge.
2:30 PM - MT02.02.03/MT03.02.03
Machine Learning Methodologies to Enhance Automated Synthesis of New Materials
Gaurav Chopra1,Jonathan Fine1,Armen Beck1
Purdue University1
Functional groups link analytical, physical, organic, and materials chemistry and are therefore central to the chemical sciences. In both analytical and organic chemistry, functional groups are used to predict the reactivity of molecules, select a solvent for a given reaction, and validate a reaction using measurable changes in the properties of a molecule. Current approaches to incorporating functional groups in the prediction, planning and verification of reaction conditions rely on human intervention and input. For example, the solvent used for a given transformation is chosen by a skilled organic chemist using intuition gained from the study of how the functional groups in a molecule dictate its solubility in a solvent. To verify that a reaction took place and produced an unknown chemical entity, the state-of-the-art method is to accurately identify all functional groups of the reactants and products. This process is time-consuming and involves manual or database-dependent analysis and interpretation of Fourier-transform infrared (FTIR) spectra or mass spectrometry (MS) data using previously established rules and the experience of a skilled spectroscopist. These processes are subject to trial and error for compounds with multiple functional groups and for compounds that are not well characterized in the literature. Such issues hinder the automated development and characterization of truly new materials with minimal human intervention. We present fast deep learning methods to select the optimal solvent for a given reaction in a transformation-free manner and to identify all the functional groups of both the products and reactants for any given reaction. Our methods do not use any database, pre-established rules or procedures to perform either task, and they use the general definition of functional groups as a ‘collection of atoms’ instead of the simple chemical groups traditionally assigned by chemists.
We use Artificial Neural Networks (ANNs) to derive patterns and correlations between these collections of atoms and the solvents used to carry out a given chemical reaction, using 2.3 million patented reactions available from the United States Patent and Trademark Office. Our methodology is the first to differentiate solvents by their precise chemical structure instead of simply choosing a solvent class, and it yields a 5-fold cross-validated average F1 score greater than 0.9. Solvent predictions obtained from this model have been validated both in silico using Density Functional Theory and using experimental in situ techniques. To verify that a reaction has occurred, we trained separate ANNs on 7393 publicly available combined FTIR and MS spectra obtained from the NIST WebBook. Instead of using the multiple binary classifiers used in previous works to assign functional groups, our approach treats the classification problem in a multi-class, multi-label fashion. The model has a cross-validated F1 score higher than 0.82 for 14 out of 17 defined functional groups. To showcase the practical utility of our method, we introduce two new metrics (Molecular F1 score and Molecular Perfection rate) to measure the performance of identifying all functional groups on molecules. The optimized model has a Molecular F1 score of 0.92 and a Molecular Perfection rate of 72%. Additionally, backpropagation through our model reveals IR patterns typically used by human chemists to identify standard groups, suggesting a convergence of the model on known spectral features that are diagnostic of particular functional groups. We further show that the introduction of additional functional groups does not decrease the performance of our model. Finally, we show redundancy in FTIR and MS data by encoding all our features in a latent space that retains the accuracy of the original model.
These results reveal the importance of using machine learning for automated identification of new reaction conditions and functional groups to achieve autonomous processes in the future.
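The abstract names two per-molecule metrics without defining them. The Python sketch below shows one plausible reading, stated here as an assumption: the Molecular F1 score as the F1 between the true and predicted sets of functional groups, averaged over molecules, and the Molecular Perfection rate as the fraction of molecules whose predicted set matches the true set exactly. The toy labels are invented.

```python
def set_f1(true_groups, predicted_groups):
    """F1 score between the true and predicted label sets of one molecule."""
    true_groups, predicted_groups = set(true_groups), set(predicted_groups)
    if not true_groups and not predicted_groups:
        return 1.0
    tp = len(true_groups & predicted_groups)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted_groups)
    recall = tp / len(true_groups)
    return 2 * precision * recall / (precision + recall)

def molecular_f1(truth, predictions):
    """Assumed definition: per-molecule F1 averaged over all molecules."""
    return sum(set_f1(t, p) for t, p in zip(truth, predictions)) / len(truth)

def molecular_perfection_rate(truth, predictions):
    """Assumed definition: fraction of molecules with every group predicted exactly."""
    return sum(set(t) == set(p) for t, p in zip(truth, predictions)) / len(truth)

# Toy multi-label example (hypothetical molecules and predictions).
truth = [{"alcohol", "ketone"}, {"amine"}, {"ester", "alkene"}, {"nitrile"}]
predictions = [{"alcohol", "ketone"}, {"amine", "alkene"}, {"ester"}, {"nitrile"}]

mf1 = molecular_f1(truth, predictions)
perfection = molecular_perfection_rate(truth, predictions)
print(f"Molecular F1: {mf1:.3f}, Molecular Perfection rate: {perfection:.0%}")
```

Under these definitions the perfection rate is the stricter metric, since one extra or one missing group makes a molecule count as a failure even though its per-molecule F1 stays well above zero.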
2:45 PM - MT02.02.04/MT03.02.04
Autonomous Research Systems—Phase Mapping & Materials Optimization
National Institute of Standards and Technology1
The last few decades have seen significant advancements in materials research tools, allowing researchers to rapidly synthesize and characterize large numbers of samples - a major step toward high-throughput materials discovery. Machine learning has been tasked to aid in converting the collected materials property data into actionable knowledge, and more recently it has been used to assist in experiment design. In this talk we present the next step in machine learning for materials research - autonomous materials research systems. We first demonstrate autonomous measurement systems for phase mapping, followed by a discussion of ongoing work in building fully autonomous systems. For the autonomous measurement systems, machine learning controls X-ray diffraction measurement equipment both in the lab and at the beamline to identify phase maps from composition spreads with a minimum number of measurements. The algorithm also capitalizes on prior knowledge in the form of physics theory and external databases, both theory-based and experiment-based, to more rapidly home in on the optimal phase map. The phase map is then exploited for functional material optimization.
3:00 PM - MT02.02/MT03.02
3:30 PM - MT02.02.05/MT03.02.05
Information Extraction and Learning by Large-Scale Text-Mining of the Scientific Literature
University of California, Berkeley1
The overwhelming majority of scientific knowledge is published as text, which is difficult to analyze by either traditional statistical analysis or modern machine learning methods. In contrast, the main source of machine-interpretable data for the materials research community has come from structured property databases which encompass only a small fraction of the knowledge present in the research literature. Beyond property values, publications contain valuable knowledge regarding the connections and relationships between the data items as interpreted by the authors. I will show multiple ways to extract useful information from scientific text in both supervised and unsupervised ways. I will show that materials science knowledge present in the published literature can be efficiently encoded as information-dense word embeddings (vector representations of words) without human labelling or supervision. These embeddings capture complex materials science concepts such as the underlying structure of the periodic table and structure-property relationships in materials. Furthermore, we demonstrate that an unsupervised method can recommend materials for functional applications several years before their discovery. This suggests that latent knowledge regarding future discoveries is to a large extent embedded in past publications.
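Ranking materials against an application word with embeddings comes down to a cosine-similarity lookup. The Python sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings would be learned from the literature and have hundreds of dimensions, and the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_materials(query, materials, embeddings):
    """Sort candidate materials by similarity of their embedding to the query word."""
    return sorted(
        materials,
        key=lambda m: cosine_similarity(embeddings[m], embeddings[query]),
        reverse=True,
    )

# Hypothetical toy embeddings (not learned from any corpus).
embeddings = {
    "thermoelectric": [0.9, 0.1, 0.0],
    "Bi2Te3": [0.8, 0.2, 0.1],
    "PbTe": [0.7, 0.1, 0.3],
    "NaCl": [0.0, 0.9, 0.4],
}

ranking = rank_materials("thermoelectric", ["NaCl", "PbTe", "Bi2Te3"], embeddings)
print(ranking)
```

In the unsupervised setting described above, a material that ranks highly for an application word without having been reported for that application is the kind of candidate the method would recommend.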
In a more supervised way, we have also demonstrated the extraction of codified synthesis recipes from text. Extracting the details of synthesis, including precursor compounds, synthesis operations and their numerical details, requires very high precision of information extraction and a tolerance for imprecise and non-standard language. I will show how a large data set of codified solid-state synthesis reactions has been obtained and how it can be queried to reveal how the choice of synthesis operations and precursors is related to the target material.
4:00 PM - MT02.02.06/MT03.02.06
Autonomous Scanning Droplet Cell for On-Demand Alloy Electrodeposition and Characterization
Brian DeCost1,Howie Joress1,Trevor Braun1,Zachary Trautt1,Gilad Kusne1,Jason Hattrick-Simpers1
National Institute of Standards and Technology1
We are developing an autonomous scanning droplet cell (ASDC) capable of on-demand electrodeposition and real-time electrochemical characterization for investigating multicomponent alloy systems for favorable corrosion-resistance properties. The ASDC consists of a millimeter-scale electrochemical cell and an array of programmable pumps that can be used to electrodeposit an alloy film and immediately acquire polarization curves to obtain electrochemical quantities of interest, such as the passive current density and oxide breakdown potential. We model these quantities using Gaussian Process regression to select the most informative series of alloys to synthesize and characterize, continuously updating the model as new electrochemical data is acquired. Our initial studies focus on systems that are likely to form corrosion-resistant metallic glasses (MGs) and single-phase multi-principal element alloys (MPEAs).
The ASDC is an open exemplar autonomous system that provides insight into both technical and methodological aspects of building and deploying robust closed-loop synthesis and characterization platforms. Our approach is to build loosely-coupled modular experimental, automation, and communications systems to 1. support rapid prototyping, debugging, and verification while producing meaningful scientific output and 2. enable integration into future large-scale multi-user and multi-instrument distributed laboratory systems. We address both of the main challenges that autonomous science systems face: learning to reliably synthesize materials and mapping material specification and processing to structure and properties. We will discuss the incorporation of prior knowledge in the form of theoretical and data-driven predictive models, as well as the integration of online and offline multi-modal experimental data streams. Ultimately, closed-loop automated materials synthesis and characterization platforms offer much more than a means of engineering materials properties and performance through black-box optimization algorithms: they offer the potential to develop and deploy new algorithms for generating and testing scientific hypotheses.
4:30 PM - MT02.02.07/MT03.02.07
Autonomous Electrolyte Discovery for Batteries with Experimentally Informed Bayesian Optimization
Adarsh Dave1,Sven Burke1,Jared Mitchell1,Kirthevasan Kandasamy1,Biswajit Paria1,Barnabas Poczos1,Venkatasubramanian Viswanathan1,Jay Whitacre1
Carnegie Mellon University1
An autonomous battery electrolyte experimental platform capable of mixing multi-component electrolyte systems and characterizing the transport and electrochemical properties in a high-throughput manner is disclosed. A Bayesian optimization software package found novel electrolyte compositions through optimization of the high-dimensional electrolyte design space over key design objectives like electrochemical stability and conductivity.
Electrolyte optimization is difficult because 1) electrolyte evaluation is expensive and takes time, and 2) the space of possible electrolytes is expansive, formed by many possible choices for solvents (often in ternary blends), salts (often in binary mixtures), and trace additives. Bayesian optimization methods are well suited for the optimization of high-dimensional functions with costly evaluations, often producing an efficient design-of-experiments to converge on multi-objective optimal formulations in few experiments. To expedite the optimization over the expansive design space, theoretical predictions of electrolyte properties via the Advanced Electrolyte Model were utilized as “priors” in the statistical model.
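One way to picture the use of theoretical predictions as "priors" is a Gaussian process whose prior mean is the theory prediction, so that the model only has to learn the deviation between theory and experiment. The sketch below is a one-dimensional toy under that assumption, with expected improvement as the acquisition function. The functions `theory_prior` and `run_experiment`, the kernel settings, and the choice of acquisition function are hypothetical stand-ins and are not the Advanced Electrolyte Model or the software described in the abstract.

```python
import math
import numpy as np

PRIOR_VAR = 1.0   # assumed GP signal variance
LENGTH = 0.15     # assumed kernel length scale
NOISE = 1e-4      # assumed measurement noise variance


def rbf(a, b):
    """Squared-exponential kernel between two 1D arrays of compositions."""
    d = a[:, None] - b[None, :]
    return PRIOR_VAR * np.exp(-0.5 * (d / LENGTH) ** 2)


def theory_prior(x):
    """Hypothetical stand-in for a theory-based property prediction."""
    return 1.0 - 4.0 * (x - 0.5) ** 2


def run_experiment(x):
    """Hypothetical stand-in for a measurement: theory plus a systematic deviation."""
    return theory_prior(x) + 0.3 * np.sin(6.0 * x)


def gp_posterior(x_obs, y_obs, x_new):
    """GP fitted to the residuals (experiment minus theory), theory added back."""
    resid = y_obs - theory_prior(x_obs)
    K = rbf(x_obs, x_obs) + NOISE * np.eye(len(x_obs))
    Ks = rbf(x_obs, x_new)
    mean = theory_prior(x_new) + Ks.T @ np.linalg.solve(K, resid)
    var = PRIOR_VAR - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


erf = np.vectorize(math.erf)


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


grid = np.linspace(0.0, 1.0, 101)   # candidate compositions
x_obs = np.array([0.1, 0.9])        # two initial experiments
y_obs = run_experiment(x_obs)
for _ in range(8):
    mean, std = gp_posterior(x_obs, y_obs, grid)
    x_next = grid[np.argmax(expected_improvement(mean, std, y_obs.max()))]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, run_experiment(x_next))

print("best composition:", x_obs[np.argmax(y_obs)], "value:", y_obs.max())
```

Because the prior mean already captures the broad trend, the surrogate starts from a sensible guess in unexplored regions, which is the intuition behind using theory to reduce the number of costly evaluations.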
The platform was implemented as two novel test stands developed to automate the mixing and characterization of electrolytes: Otto (for aqueous systems) and Clio (for aprotic/organic systems). The test stands characterized the ionic conductivity and electrochemical stability of electrolyte systems, featuring a four-electrode conductivity probe, pH meter, and a flow-through three-electrode cell and potentiostat. Clio also integrated electrochemically active electrodes for optimization of electrolyte/electrode systems. The active electrode systems used were common functional ceramic metal oxides.
A software orchestration and data layer linked the test-stands to human experimenters and machine-learning packages through a web-services architecture; all experiment data and meta-data is saved in a database. Additional out-of-the-loop characterization was conducted on cathodic systems to validate composition, structure, and oxidation state.
The aqueous design space consisted of aqueous blends of lithium and sodium salts, including nitrates, sulfates, and other commonly-used battery salts. High-concentration aqueous electrolyte candidates were discovered by optimizing electrochemical voltage stability and conductivity, including low-cost, high-performing alternatives to known but costly aqueous electrolytes (e.g. LiTFSI). Clio’s design space includes blends of both aprotic organic solvents and solutes in addition to various compositions of electrochemically active electrodes. The test-stands are demonstrated to be significantly faster than common human experimentation techniques, converging on novel, optimized electrolyte mixtures in mere hours or days of experimentation.
MT02.03: Poster Session I: Autonomous Science
Tuesday AM, December 03, 2019
Hynes, Level 1, Hall B
8:00 PM - MT02.03.01
Autonomous Experimental Phase Analysis of Oxide Systems Demonstrated via Optical Imaging and Spectroscopy
Aine Connolly1,Duncan Sutherland1,Max Amsler1,Sebastian Ament1,Michael Thompson1,Bruce van Dover1,Carla Gomes1
Cornell University1
Efficient autonomous exploration of phase behavior requires real-time dynamically coupled experiments and characterization that can lead to intelligent search trajectories through high-dimensional system parameter space. Employing lateral gradient Laser Spike Annealing (lg-LSA) with a combination of optical microscopy and optical spectroscopy, we demonstrate such a closed-loop experimental protocol that enables efficient and rapid mapping of transition boundaries across a wide range of thin film oxide stoichiometry and synthesis conditions. lg-LSA rapidly heats thin oxide films, which quench to various metastable phases depending on the peak temperature and cooling rate. With a lateral annealing temperature gradient of 2 – 3 K/μm, spatially localized probes (e.g. microbeam X-ray diffraction or optical spectroscopy) can readily characterize the resulting structures. However, only a few temperatures near transitions are necessary to establish the phase behavior, and an autonomous search must efficiently probe those neighborhoods. By coupling immediate post-exposure imaging and spectroscopy of critical temperature bands after each individual lg-LSA exposure, subsequent experiments can be optimized to guide the search with loop-times of one to two minutes. This autonomous closed-loop experimental process thus enables rapid identification of synthesis condition boundaries at phase transitions. Results are shown for a variety of oxide thin films including Bi2O3 and MnTiO3.
8:00 PM - MT02.03.02
Accelerating Materials Discovery with Autonomous Job Control Systems Aided by Machine Learning
Chenru Duan1,Jon Paul Janet1,Aditya Nandy1,Fang Liu1,Heather Kulik1
Massachusetts Institute of Technology1
Computational high throughput data generation is essential in materials discovery. Although automation of first-principles simulation has enabled rapid data generation, challenges remain in the most compelling materials spaces (e.g., open-shell transition metal chemistry) to ensure that the data is of sufficient quality, either for automated determination of design rules or in training of machine learning property prediction models. For inorganic chemistry, two key challenges remain at the stage of first-principles data generation: i) chosen ligands and metals may not form a stable complex and ii) calculations may fall outside the domain of applicability of the first-principles method. When such challenges are encountered, they can erode the value of data points in a data set, and such points may need to be removed. Typically, one detects these failures only after a calculation is completed, wasting computational resources. To address this problem, we incorporated machine learning (ML) models into our high-throughput discovery tools for inorganic chemistry. These ML models are capable of predicting the outcomes of a calculation. We built a calculation outcome classifier directly from topological, heuristic features prior to simulation to prescreen a large pool of candidate materials without requiring any first-principles calculations. Inspired by the data distribution in the latent space, we designed an uncertainty quantification metric specifically for classification tasks, lowering the risk of terminating jobs that are likely to be fruitful. To achieve higher transferability in inorganic complex discovery, an ML model that utilizes the information generated during simulation is also developed, which directs the on-the-fly decision of whether to abandon an in-progress calculation. 
Upon implementing this “dynamic” classifier in current high-throughput screening workflows, we achieve around two-fold acceleration in data generation with no loss of feasible lead compounds. We expect our meta-decision approach to be broadly useful in data set generation with first-principles calculations for the accelerated design of materials.
8:00 PM - MT02.03.03
Autonomous Experimentation for Mechanical Design
Aldair Gongora1,Bowen Xu1,Wyatt Perry1,Chika Okoye1,Patrick Riley2,Kristofer Reyes3,Elise Morgan1,Keith Brown1
Boston University1,Google Research2,University at Buffalo, The State University of New York3
The high level of control afforded by additive manufacturing presents innumerable possibilities for design, with each design choice potentially influencing mechanical performance. While tools exist for optimizing many facets of mechanical performance, improving failure properties poses a challenge due to the stochastic nature of failure and the difficulty in reliably predicting the role of the microstructure introduced during manufacturing. As a result, design for failure performance typically occurs manually through iterative manufacture and testing. In this work, we present the realization and use of an autonomous research system (ARS) for designing and optimizing a model family of additively manufactured structures for mechanical performance. This novel approach to mechanical design combines high-throughput automated experimentation with active learning-guided experimentation to enhance the speed of experimental campaigns and the knowledge gained from each experiment. In order to evaluate this approach, the ARS was tasked with exploring and optimizing a parametric model family of structures for energy absorption. Not only was the ARS able to identify higher performing structures with an order of magnitude fewer experiments than a grid-based design of experiments approach, but it was able to do so in 36 hours. Additionally, evaluating the performance of different learning approaches on the ARS showed fascinating deviations from purely computational studies, highlighting the importance of experimental validation in the active learning community. The use of autonomous research systems for the design of structures for properties that cannot be effectively simulated represents a shift in the conventional design process and could have an impact in materials development and mechanical design in a manner that facilitates the convergence of machine learning, physical experimentation, and design.
8:00 PM - MT02.03.04
Efficient Selection of Categorical Process Variables for Autonomous Experimentation
Florian Häse1,2,3,Loïc Roch1,2,3,Alan Aspuru-Guzik3,2
Harvard University1,Vector Institute for Artificial Intelligence2,University of Toronto3
Self-driving laboratories provide promising opportunities to enable autonomous experimentation for a substantial acceleration of scientific discovery. By combining automated experimentation platforms with algorithmic experiment planning strategies based on machine learning, self-driving laboratories implement a closed-loop approach to experimentation. The performance and acceleration of a self-driving laboratory critically depend on the efficiency of its experiment planning strategies to operate in low-data regimes while executing only the most informative experiments.
Experimentation procedures can be governed by gradually changeable parameters such as temperature or residence time, but can also involve distinct choices, for example the selection of a catalyst or a solvent. Although a plethora of algorithms have been developed to determine the optimal values of continuous variables, the efficient selection of categorical variables still remains challenging under the constraints posed by typical experimentation settings.
Here, we introduce Gryffin, a framework developed to facilitate autonomous experimentation for processes involving categorical variables. Gryffin is built upon concepts from Bayesian optimization and Bayesian kernel density estimation, which have previously shown promising performance for continuous parameter selection. Contrary to existing methods, Gryffin can accelerate the optimization process by leveraging domain knowledge in the form of user-provided descriptors for the categorical choices. Balancing the provided domain knowledge with the collected experimental feedback on-the-fly, Gryffin is capable of learning more informative representations of the categorical experimentation parameters. These learned representations can evidence relevant properties of the categorical parameters to enable the derivation of design choices.
We highlight the applicability and performance of Gryffin on three different problems bridging materials science and chemistry: (i) the design of hybrid organic-inorganic perovskites, (ii) the discovery of non-fullerene acceptor candidates for organic photovoltaics, and (iii) the optimization of Suzuki-Miyaura cross-coupling reactions. Gryffin outperforms existing methods in all three applications, and identifies the best performing material compositions or experimental conditions by probing only a fraction of the search space. Finally, we outline our recent results with Gryffin as an experiment planning strategy in self-driving laboratories for thin-film materials and reaction optimization.
MT02.04: Machine Learning for Potentials
Tuesday AM, December 03, 2019
Hynes, Level 2, Room 210
9:00 AM - MT02.04.02
Understanding the Atomic Scale Dynamics in Materials with Unsupervised Learning from Molecular Dynamics
Tian Xie1,Arthur France-Lanord1,Yanming Wang1,Yang Shao-Horn1,Jeffrey Grossman1
Massachusetts Institute of Technology1
Understanding the dynamical processes that govern the performance of functional materials is essential for the design of next generation materials to tackle global energy and environmental challenges. Many of these processes involve the dynamics of individual atoms or small molecules in condensed phases, e.g. lithium ions in electrolytes, water molecules in membranes, molten atoms at interfaces, etc., which are difficult to understand due to the complexity of local environments.
In this talk, we will present Graph Dynamical Networks (GDyNets) [1], an unsupervised learning approach to learn atomic scale dynamics from molecular dynamics trajectories for arbitrary phases and environments. We demonstrate that learning local dynamics is exponentially easier than global dynamics in material systems. Then, we apply this approach to two complex systems: silicon atoms at liquid-solid interfaces and lithium ions in amorphous polymer electrolytes. We show that our approach gains important dynamical information for both materials that is otherwise difficult to obtain, and provides atomic scale explanations for some experimentally observed phenomena. With the large amounts of molecular dynamics data generated every day in nearly every aspect of materials design, this approach has the potential to provide a broadly applicable, automated tool to understand atomic scale dynamics in material systems.
[1] Xie, Tian, et al. "Graph Dynamical Networks for Unsupervised Learning of Atomic Scale Dynamics in Materials." Nature Communications, in press.
9:15 AM - MT02.04.03
Fast and Accurate Interatomic Potentials by Genetic Programming
Alberto Hernandez1,Adarsh Balasubramanian1,Fenglin Yuan1,Simon Mason1,Tim Mueller1
Johns Hopkins University1
In recent years there has been great progress in the use of machine learning algorithms to develop interatomic potential models. Machine-learned potential models are typically orders of magnitude faster than density functional theory but also orders of magnitude slower than physics-derived models such as the embedded atom method. We demonstrate that machine learning, in the form of genetic programming, can be used to develop accurate and transferable many-body potential models that are as fast as the embedded atom method, making them suitable to model materials on extreme time and length scales. The key to our approach is to explore a hypothesis space of models based on fundamental physical principles and to select models from this hypothesis space based on their accuracy, speed, and simplicity. We demonstrate our approach by developing fast and accurate interatomic potential models for copper that generalize well to properties they were not trained on. Our approach requires relatively small sets of training data, making it possible to generate training data using highly accurate methods at a reasonable computational cost. We discuss the open-source implementation of our algorithm and additional features including extensions to systems with multiple elements.
9:30 AM - MT02.04.04
On-the-Fly Bayesian Active Learning of Interpretable Force Fields for Atomistic Rare Events
Jonathan Vandermause1,Steven Torrisi1,Simon Batzner1,Yu Xie1,Lixin Sun1,Alexie Kolpak2,Boris Kozinsky1
Harvard University1,Massachusetts Institute of Technology2
Recent machine learning (ML) approaches to modeling the Born-Oppenheimer potential energy surface have been shown to approach first principles accuracy for a number of molecular and solid-state systems. However, most ML potentials return point estimates of the quantities of interest (typically energies, forces, and stresses) rather than a predictive distribution that reflects model uncertainty. Without uncertainty estimates, a laborious fitting procedure is required, which usually involves selecting thousands of reference structures from a database of first principles calculations. Here, we show that active learning based on Gaussian process (GP) regression can accelerate the training of high-quality force fields by making use of accurate Bayesian estimates of model error.
Through a series of tests on randomly generated bulk aluminum structures, we demonstrate that the Bayesian error of our GP potentials correlates strongly with the true error of the model on out-of-sample structures. We then use the Bayesian error to implement an active learning molecular dynamics method, in which DFT is called only when the Bayesian error of the GP model rises above an adaptive threshold based on the optimized noise hyperparameter of the GP. All model hyperparameters are optimized whenever an atomic environment and its force components are added to the training set, allowing the error threshold to adapt to novel environments encountered during the simulation.
We discuss applications of this on-the-fly learning method to crystal melts, vacancy diffusion, and adatom diffusion for a range of single- and multi-component systems. By combining DFT with GP regression in a single molecular dynamics simulation, an accurate multi-phase force field for bulk aluminum is obtained with fewer than 100 DFT calls. Moreover, we demonstrate that our Bayesian potentials can be flexibly and automatically updated when the system deviates from previous training data. Because of the low-dimensional structure of our kernel function, we show that our potentials can be mapped to a much cheaper regression model approaching the efficiency of classical potentials. Such a reduction in the computational cost of training, updating, and applying machine-learned potentials promises to extend ML modeling to a wider class of materials than has been possible to date.
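The on-the-fly loop described above can be pictured with a one-dimensional toy: a Gaussian process predicts the force at each molecular dynamics step, and the expensive reference calculation is called only when the predictive uncertainty exceeds a threshold tied to the noise hyperparameter. This is a simplified sketch, not the authors' implementation. The hyperparameters are fixed rather than re-optimized, the threshold rule is a hypothetical choice, and `reference_force` stands in for a DFT call.

```python
import numpy as np

NOISE_STD = 0.01               # assumed noise hyperparameter (fixed in this sketch)
THRESHOLD = 5.0 * NOISE_STD    # hypothetical rule: a multiple of the noise level
LENGTH = 0.5                   # assumed kernel length scale


def rbf(a, b):
    """Squared-exponential kernel with unit signal variance."""
    d = np.asarray(a, dtype=float)[:, None] - np.asarray(b, dtype=float)[None, :]
    return np.exp(-0.5 * (d / LENGTH) ** 2)


def reference_force(x):
    """Stand-in for an expensive first-principles force evaluation."""
    return -np.sin(2.0 * x)


def predict(x_train, y_train, x):
    """GP predictive mean and standard deviation of the force at position x."""
    K = rbf(x_train, x_train) + NOISE_STD**2 * np.eye(len(x_train))
    k = rbf(x_train, [x])[:, 0]
    mean = k @ np.linalg.solve(K, np.asarray(y_train))
    var = 1.0 - k @ np.linalg.solve(K, k)
    return mean, np.sqrt(max(var, 0.0))


# Seed the training set with one reference calculation at the starting point.
x_train = [0.0]
y_train = [reference_force(0.0)]
reference_calls = 1

x, v, dt = 0.0, 1.0, 0.05      # toy particle with unit mass
for step in range(400):
    force, std = predict(x_train, y_train, x)
    if std > THRESHOLD:
        # The model is extrapolating: fall back to the reference and learn from it.
        force = reference_force(x)
        x_train.append(x)
        y_train.append(force)
        reference_calls += 1
    v += force * dt            # semi-implicit Euler step
    x += v * dt

print("MD steps: 400, reference calls:", reference_calls)
```

Most reference calls occur early, while the particle visits new positions; once the visited region is covered, the surrogate carries the trajectory on its own.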
9:45 AM - MT02.04.05
Predicting Potential Energy Surfaces with Machine Learning
Software for Chemistry & Materials BV1
Atomistic simulations of molecules and materials require a reliable way to evaluate the underlying potential energy surface. Furthermore, realistic simulations of materials with complexities such as grain boundaries, vacancies, impurities, and interfaces often imply large-scale models comprising many thousands or even millions of atoms and thus the use of very efficient computational chemistry methods. Machine learning (ML) methods can bridge the gap between often prohibitively expensive electronic structure calculations and efficient but often not sufficiently accurate force fields. In this contribution, we describe how an ML method based on artificial neural networks can be applied to describe chemical reactions in several complex systems, such as electrolyte solutions [1] and solid/liquid interfaces [2]. These simulations provide unprecedented detail and insight into the working and degradation mechanisms of batteries, fuel cells, and other electrochemical systems.
Although neural networks are ideally suited for describing complex non-linear functions like potential energy surfaces, they are still not routinely employed for this purpose [3]. Here, we will present our work for overcoming the main obstacles for widespread adoption, relating to (i) construction of suitable training and validation sets, (ii) automation of the featurization of different molecules and materials, and (iii) the choice of loss function for the neural network optimization.
The neural network method is implemented as part of the Amsterdam Modeling Suite, a software package containing a sophisticated molecular dynamics engine and first-principles, semi-empirical, and atomistic potential methods. This software environment allows for seamless transitions between the different levels of theory and greatly simplifies the construction of the neural network potential. Our ultimate goal is to provide the chemistry, biochemistry, and materials science communities with an all-purpose computationally inexpensive method that can be used in simulations related to solving challenging problems in energy, renewables, climate change reduction, and health care.
[1] Hellström, M., & Behler, J. (2016). Concentration-Dependent Proton Transfer Mechanisms in Aqueous NaOH Solutions: From Acceptor-Driven to Donor-Driven and Back. The Journal of Physical Chemistry Letters, 7, 3302-3306.
[2] Hellström, M., Quaranta, V., & Behler, J. (2019). One-dimensional vs. two-dimensional proton transport processes at solid–liquid zinc-oxide–water interfaces. Chemical Science, 10, 1232-1243.
[3] Hellström, M., & Behler, J. (2018). Neural Network Potentials in Materials Modeling. In Handbook of Materials Modeling: Methods: Theory and Modeling, 1-20, Springer: Cham.
10:30 AM - MT02.04.06
Data-Driven Materials Design and Machine Learning Using the Materials Project
University of California, Berkeley1,Lawrence Berkeley National Laboratory2
The powerful combination of supercomputing resources, robust algorithms for solving the laws of physics and state-of-the-art software infrastructure is enabling rapid, systematic calculations of real materials properties from quantum mechanics across chemistry and structure. One result of this paradigm change is databases like the Materials Project (www.materialsproject.org), which is charting the properties of all known inorganic materials and beyond, designing novel materials and offering the data free of charge to the community together with online analysis and design algorithms. The growing body of available, reliable data has reached the stage where automated learning algorithms can be effectively trained and utilized to accelerate all aspects of the materials design cycle: from property prediction to materials synthesis and characterization. To exemplify the approach of data-driven materials design, we will survey a few in-house case studies and the application of accelerated learning - from prediction to synthesis and characterization - showcasing rapid iteration between ideas, computations, insight and new materials development.
11:00 AM - MT02.04.07
Constructing Reliable Machine-Learning Potential for Solid-State Reaction: Example of Ni Silicidation
Wonseok Jeong1,Dongsun Yoo1,Kyuhyun Lee1,Seungwu Han1
Seoul National University1
Molecular dynamics (MD) using classical interatomic potentials can provide valuable information at the atomistic scale. However, when the simulation involves chemical reactions with bond breaking and forming along with mixed bonding characters, it is challenging, and sometimes practically impossible, to develop an accurate force field for the system. In this respect, machine-learning potentials (MLPs) are highly anticipated since they are based on flexible mathematical structures with no pre-fixed form. The parameters in the machine-learning model are automatically optimized on the reference data, often from density functional theory. Nevertheless, an MLP has a critical drawback: it can produce unphysical results when applied to configurations that are not included in the training set. This is particularly problematic in solid-state reactions that involve constant bond breaking and formation. In this presentation, we discuss the process of developing a reliable neural network potential (NNP) for a challenging solid-state reaction - thermal silicidation of nickel at the Ni-Si interface. We present a systematic way to build up the training set that can describe the interface reaction. In order to obtain the prediction uncertainty for certain local configurations, we utilize replica NNPs that are trained directly on the atomic energy of the reference NNP. We find that the temporal variation of compositions across the interface agrees well with the experiment.
11:15 AM - MT02.04.08
Gaussian Process-Based Refinement of Dispersion Corrections
Stefan Gugler1,Markus Reiher1,Jonny Proppe2
ETH Zürich1,University of Goettingen2
We employ Gaussian process (GP) regression to account for systematic errors in D3-type dispersion corrections. We refer to the associated, statistically improved model as D3-GP. It is trained on differences between interaction energies obtained from PBE-D3(BJ)/ma-def2-QZVPP and DLPNO-CCSD(T)/CBS calculations of 1,248 molecular dimers, which resemble the dispersion-dominated systems contained in the S66 data set.
To train our D3-GP model, we used features derived from the matrix of atom-pairwise D3(BJ) interaction terms:
(a) a distance-resolved interaction energy histogram, histD3(BJ), and (b) eigenvalues of the interaction matrix ordered according to their decreasing absolute value, eigD3(BJ).
We demonstrate that the posterior variance can be approximately updated from only the input variables (features) of the new training system, which are obtained efficiently from D3(BJ) calculations. In this way, we can collect a batch of new training systems before the corresponding electronic-structure calculations are carried out at the same time. We refer to this selection approach as batchwise variance-based sampling (BVS).
BVS-guided active learning is an essential component of our D3-GP workflow, which is implemented in a black-box fashion.
Once provided with reference data for new molecular systems, the underlying GP model automatically learns to adapt to these and similar systems.
This approach leads overall to a self-improving model (D3-GP) that predicts system-focused and GP-refined D3-type dispersion corrections for any given system of reference data.
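The batchwise selection described above rests on a general property of Gaussian process regression: for fixed hyperparameters, the posterior variance depends only on the input features, not on the reference energies. The sketch below uses that property to pick a batch greedily, adding each chosen candidate's features to the training inputs before the next pick. It is a minimal illustration with random two-dimensional features and an assumed squared-exponential kernel. It computes the exact variance rather than the approximate update mentioned in the abstract, and it is not the D3-GP implementation.

```python
import numpy as np


def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two sets of feature vectors (rows)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length**2)


def posterior_variance(x_train, x_cand, noise=1e-6):
    """GP posterior variance at the candidates; note that no labels are needed."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_cand)
    return 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)


def select_batch(x_train, x_cand, batch_size):
    """Greedily pick the candidates with the largest posterior variance."""
    chosen = []
    x_aug = x_train.copy()
    for _ in range(batch_size):
        var = posterior_variance(x_aug, x_cand)
        if chosen:
            var[chosen] = -np.inf          # never pick the same candidate twice
        idx = int(np.argmax(var))
        chosen.append(idx)
        # Add only the features; the reference calculation can be run later.
        x_aug = np.vstack([x_aug, x_cand[idx]])
    return chosen


rng = np.random.default_rng(0)
x_train = rng.normal(size=(5, 2))      # features of systems already in the training set
x_cand = rng.normal(size=(50, 2))      # features of candidate systems
batch = select_batch(x_train, x_cand, batch_size=4)
print("candidates selected for reference calculations:", batch)
```

Because each pick lowers the variance of its neighbours, the greedy batch spreads over the feature space instead of clustering around a single uncertain region, and all reference calculations for the batch can then be run at the same time.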
 J. Chem. Theory Comput. 2019, in press, DOI: 10.1021/acs.jctc.9b00627
11:30 AM - MT02.04.09
Doing Less for More—Multi-Information Bayesian Optimization and the Computational Sciences
Henry Herbol1,Matthias Poloczek2,Paulette Clancy1
Johns Hopkins University1,The University of Arizona2
For half a century, computational methods such as Density Functional Theory (DFT) and Molecular Dynamics (MD) have allowed scientists to study atomic-scale systems. These methods and approaches have consistently been characterized as either “slow, small, and accurate” or “fast, large, and inaccurate” and virtually any combination in-between (with the exception of the most desirable combination: “fast, large, and accurate”). For instance, DFT is well established in modeling electronic properties of materials; however, due to scaling issues, DFT is typically limited to smaller atomic systems, or fully periodic crystals. For phenomena, such as nanoparticle nucleation, which demand large system sizes, MD is typically the method of choice. Adding to the complexity, within each method the user has to select a choice of functional or force field, which possess different degrees of computational complexity.
In recent years, Machine Learning (ML) methods have begun to be used to ameliorate these computational costs, while maintaining acceptable levels of accuracy. From Neural Nets, to Random Forests, to Gaussian Process Regressions, machine learning approaches are increasingly being used to replace DFT and MD as objective functions. Methods such as Bayesian Optimization are of particular interest in situations where data is scarce and expensive, which covers a number of important materials processes. As a result, ML is becoming an attractive tool in the computational sciences as a means to minimize the number of required expensive calculations.
In this work, we discuss the merging of these fields, ML and computational sciences, with a primary focus on showing how to utilize multiple information sources in the context of computational science problems. Multi Information Source Optimization using a Knowledge Gradient (misoKG) as an acquisition function was developed in 2017 by Poloczek and Frazier for a Gaussian Process Bayesian Optimization approach. Their initial design, tested against the well-known Rosenbrock benchmark, used a noisy alternative with a presumed lower cost to minimize the Rosenbrock function at a smaller total cost than that of standard (“workhorse”) Expected Improvement and Knowledge Gradient methods. The statistical model in the original paper involved replicating the kernel for each information source, potentially scaling the number of hyperparameters to unreasonably high levels. Here, we extend their original statistical model to produce a more scalable approach and apply misoKG to several computational problems, showcasing the benefits of using cheap, but at times inaccurate, calculations to model atomic systems. Our new method employs an Intrinsic Coregionalization Model that avoids the addition of hyperparameters, and is shown to minimize the Rosenbrock benchmark at considerably lower cost than the alternatives. We test the base method against the interatomic DFT binding energy of metal halide perovskite salts and solvents, using different solvents and different levels of theory as alternative information sources. A more direct study of functionals and their uses as alternative information sources is then made with the study of carbon monoxide.
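For orientation, the textbook form of an intrinsic coregionalization model and of the Rosenbrock benchmark are written out below. The symbols (information sources s and s', input kernel k_x, coregionalization matrix B) are generic notation assumed here, and the exact parameterization used in this work may differ.

```latex
% Intrinsic coregionalization model: one input kernel k_x shared by all
% information sources, scaled by a positive semi-definite matrix B.
k\big((x, s), (x', s')\big) = B_{s s'} \, k_x(x, x'),
\qquad B = W W^{\top} + \operatorname{diag}(\kappa), \quad \kappa \ge 0 .

% Two-dimensional Rosenbrock function (commonly a = 1, b = 100),
% with its global minimum f = 0 at (x, y) = (a, a^2).
f(x, y) = (a - x)^2 + b \, (y - x^2)^2 .
```

Sharing a single input kernel means that the kernel hyperparameters do not multiply with the number of information sources; only the entries of B describe how strongly the sources are correlated.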
11:45 AM - MT02.04.10
Machine Learning-Assisted Acceleration of DFT without Machine-Learning Errors
Skolkovo Institute of Science and Technology1
Machine-learning potentials are a promising alternative that combines the efficiency of empirical interatomic potentials with the accuracy of quantum-mechanical interatomic interaction models. Unlike empirical potentials, which are reused for solving a new problem with the old materials, machine-learning potentials are usually re-parametrized for each new problem at hand. Hence a major bottleneck of such potentials: it takes large amounts of manual work to construct a database and fit a new potential.
I will present an approach, based on moment tensor potentials [Shapeev (2016)] and active learning, for automatically constructing machine-learning potentials [Gubaev (2019)]. For example, in the problem of computing the vibrational free energy of a compound, an actively learning potential will select those configurations for which predicting energy and forces results in extrapolation during molecular dynamics. These configurations will be evaluated with DFT and added to the training set. This procedure will run until the potential can run molecular dynamics completely without extrapolation. Special care is taken to make sure that changing the potential on-the-fly does not hinder the accuracy of statistical averaging. Active learning thus closes the loop and makes the algorithm for computing free energy fully automatic.
The resulting automatic algorithm would still contain small systematic and often uncontrollable errors due to the approximation of quantum mechanics with a surrogate model. However, in some applications, this error can be corrected for at the last stage of the algorithm. For instance, the last step of the free energy computation can be a thermodynamic integration that computes the difference between the free energy given by the machine-learning potential and the DFT free energy. This results in an automatic algorithm without the systematic error in the final answer. Only the statistical error remains, which is still much smaller compared to using DFT alone.
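One textbook way to write such a correction is thermodynamic integration along a linear mixing path between the two energy models. The form below is an assumed, generic expression; the abstract does not specify the exact scheme used. Here \lambda is a hypothetical coupling parameter and the angle brackets denote an ensemble average sampled with the mixed energy E_\lambda.

```latex
F_{\mathrm{DFT}} - F_{\mathrm{ML}}
  = \int_0^1 \left\langle E_{\mathrm{DFT}} - E_{\mathrm{ML}} \right\rangle_{\lambda}\, d\lambda,
\qquad
E_{\lambda} = (1-\lambda)\, E_{\mathrm{ML}} + \lambda\, E_{\mathrm{DFT}}
```

If the machine-learning potential is already close to DFT, the integrand is small and varies little with \lambda, so relatively few DFT evaluations are needed to converge the correction.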
The proposed approach of accelerating DFT calculations by automatically constructing machine-learning potentials and correcting for their systematic errors will also be illustrated on the problems of constructing convex hulls of stable alloy structures and computing high-temperature elastic constants of crystals.
[Shapeev (2016)] Shapeev, A. V. (2016). Moment tensor potentials: A class of systematically improvable interatomic potentials. Multiscale Modeling & Simulation, 14(3), 1153-1173.
[Gubaev (2019)] Gubaev, K., Podryabinkin, E. V., Hart, G. L., & Shapeev, A. V. (2019). Accelerating high-throughput searches for new alloys with active learning of interatomic potentials. Computational Materials Science, 156, 148-156.
MT02.05: Machine Learning from Theory
Anatole von Lilienfeld
Tuesday PM, December 03, 2019
Hynes, Level 2, Room 210
1:30 PM - MT02.05.01
Machine-Learning Framework for the Discovery of MOFs for Enhanced Hydrogen Storage
Sanket Deshmukh1,Samrendra Singh1,Abhishek Sose1,Karteek Bejagam1
Virginia Tech1
We have developed a novel computational framework that integrates an in-house MOF structure generation code with machine-learning (ML) and optimization algorithms. Initially, hypothetical structures of MOFs (H-MOFs) with multiple functional groups were generated using the in-house structure generation code. The use of multiple functional groups of different lengths allowed us to generate structures with diverse structural and chemical features. These H-MOFs were screened for hydrogen adsorption by performing grand canonical Monte Carlo (GCMC) simulations at 1 atm and 77 K. The primary structural features of H-MOFs that were responsible for hydrogen adsorption, and the corresponding values of hydrogen adsorption, were used as inputs and outputs, respectively, to train an artificial neural network (ANN) based ML model. In the next stage, this ML model was integrated with the in-house code and an optimization algorithm to discover new structures that maximize the H-MOF features responsible for higher adsorption. The structures discovered by this approach were validated by performing GCMC simulations.
1:45 PM - MT02.05.02
Prediction of Microstructure Stress-Strain Curves Using Convolutional Neural Networks
Charles Yang1,Youngsoo Kim2,Seunghwa Ryu2,Grace Gu1
University of California, Berkeley1,Korea Advanced Institute of Science and Technology2
Stress-strain curves are a foundational characterization tool, from which important material properties such as elastic modulus, strength, and toughness modulus are defined. However, obtaining stress-strain curves from numerical methods such as the finite element method (FEM) is computationally intensive, especially when considering the entire failure path. As a result, it is difficult to use high throughput computation to predict material behavior beyond the elastic limit, and ultimately optimize for strength or energy dissipation of materials with large design spaces, such as composites. In this work, a combination of principal component analysis (PCA) and convolutional neural networks (CNN) was used to predict the stress-strain behavior of composites evaluated over the entire failure path, motivated by the significantly faster inference speed of empirical models. By visualizing the eigenbasis learned by principal component analysis, we show that principal component analysis transforms the stress-strain curves into an effective latent space for the CNN to learn in. Several novel methodological approaches were developed, including using the derived material descriptors from the stress-strain curves as interpretable metrics for model performance and dimensionality reduction techniques applied to stress-strain curves. Results show that the CNN is able to effectively predict latent space representations of stress-strain curves, with a training set orders of magnitude smaller than the design space. These results demonstrate the potential to use machine learning to accelerate composite design characterization and optimization.
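The PCA step described above can be sketched as follows: curves are projected onto a few principal directions, and those low-dimensional coordinates are what a network would be trained to predict. This is a minimal sketch under stated assumptions; the synthetic curves, the number of components, and all variable names are hypothetical stand-ins, and no CNN is included.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 synthetic "stress-strain curves",
# each sampled at 50 strain points (not real FEM output).
strain = np.linspace(0.0, 1.0, 50)
stiffness = rng.uniform(0.5, 2.0, size=(200, 1))
peak = rng.uniform(0.3, 0.9, size=(200, 1))
curves = stiffness * strain * np.exp(-strain / peak)

# PCA via SVD of the mean-centered curves.
mean_curve = curves.mean(axis=0)
U, S, Vt = np.linalg.svd(curves - mean_curve, full_matrices=False)

n_components = 5
basis = Vt[:n_components]                  # principal directions (the eigenbasis)
latent = (curves - mean_curve) @ basis.T   # low-dimensional regression targets
reconstructed = latent @ basis + mean_curve

# Fraction of variance kept, and worst-case reconstruction error.
explained = (S[:n_components] ** 2).sum() / (S ** 2).sum()
max_error = np.abs(reconstructed - curves).max()
```

A model that predicts the `latent` coordinates from a microstructure image can be mapped back to a full curve with `latent @ basis + mean_curve`, so the reconstruction error of the basis sets a floor on the achievable curve accuracy.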
2:00 PM - MT02.05.03
Accelerating Discovery in Inorganic Chemistry with Machine Learning
Heather Kulik1,Jon Paul Janet1,Chenru Duan1,Aditya Nandy1,Naveen Arunachalam1,Daniel Harper1
Massachusetts Institute of Technology1
Although the discovery and synthesis of new materials, catalysts, and functional molecules represents the foremost effort that unifies thousands of researchers in the chemical science community, presently characterized compounds represent a minute fraction of chemical space. The highly tunable electronic structure properties of inorganic complexes (i.e., variable spin, oxidation state, and coordination number) make them attractive targets for applications in energy storage, functional materials, and catalysis but present a daunting combinatorial challenge. This vast transition metal compound space cannot be fully enumerated by any traditional Edisonian approach. In order to advance quantitative structure-activity relationships, reveal emergent phenomena, and accelerate design of materials and catalysts, smarter and faster computational approaches are needed. I will outline our efforts to accelerate first-principles (i.e., with density functional theory, or DFT) screening of inorganic complexes for catalysis and materials science applications. We develop machine learning (ML) models (e.g., artificial neural networks) that predict both properties and simulation outcomes. We integrate these tools into an automated design workflow that can make essential decisions on which simulations are best to carry out and why, with awareness of ML model and DFT model uncertainty. I will describe applications of these tools for advancing understanding in catalysis and functional spin crossover materials.
2:30 PM - MT02.05.04
Advances in Interatomic Potentials for Materials
University of Cambridge1
Modelling the deformation of metals is one of the success stories of atomic scale modelling over the past four decades. Increasingly complex functional forms, from pair potentials to embedded atom models and bond order potentials, allowed the quantitative description of different crystal structures, point and line defects, shedding light on many elementary processes governing failure, phase stability, surface phenomena etc. Interestingly, the accuracy with which these models describe the potential energy surface corresponding to the electronic ground state has not changed over the decades and is rather limited. The success is thus largely empirical in nature - and follows from the sophistication of the modeller and the judicious compromises made in order to solve specific problems. The parallel developments in electronic structure theory on the other hand provided exquisite quantitative agreement with experiments e.g. for thermomechanical properties, phase stability, and defect energetics. I will report on recent work of a growing community, who have managed to bring these two worlds together, and construct extremely accurate functional representations of the interatomic potential. These developments rely on a very large amount of highly accurate electronic structure data, on non-parametric function fitting, and on sophisticated representation theory that brings with it guarantees of completeness and convergence.
3:30 PM - MT02.05.05
Polymer Informatics—Current Status and Critical Next Steps
Georgia Institute of Technology1
The Materials Genome Initiative (MGI) has heralded a sea change in the philosophy of materials design. In an increasing number of applications, the successful deployment of novel materials has benefited from the use of computational, experimental and informatics methodologies. Here, we describe the role played by computational and experimental data generation and capture, polymer fingerprinting, machine-learning based property prediction models, and algorithms for designing polymers meeting target property requirements. These efforts have culminated in the creation of an online Polymer Informatics platform (https://www.polymergenome.org) to guide ongoing and future polymer discovery and design [1-3]. Challenges that remain will be examined, and systematic steps that may be taken to extend the applicability of such informatics efforts to a wide range of technological domains will be discussed. These include strategies to deal with the data bottleneck, new methods to represent polymer morphology and processing conditions, and the applicability of emerging algorithms for design.
[1] C. Kim, A. Chandrasekaran, T. D. Huan, D. Das, R. Ramprasad, “Polymer Genome: A Data-Powered Polymer Informatics Platform for Property Predictions,” Journal of Physical Chemistry C 122, 31, 17575-17585 (2018).
[2] A. Mannodi-Kanakkithodi, A. Chandrasekaran, C. Kim, T. D. Huan, G. Pilania, V. Botu, R. Ramprasad, “Scoping the Polymer Genome: A Roadmap for Rational Polymer Dielectrics Design and Beyond,” Materials Today 21, 785 (2018).
[3] R. Ramprasad, R. Batra, G. Pilania, A. Mannodi-Kanakkithodi, C. Kim, “Machine Learning and Materials Informatics: Recent Applications and Prospects,” npj Computational Materials 3, 54 (2017).
4:00 PM - MT02.05.06
Machine Learning Augmented Polymer Design—Challenging the Edisonian Status Quo
Institute of Materials Research and Engineering1
Present polymer synthesis techniques excessively strain research resources, as they are largely Edisonian in approach. Coupled with largely manual experimentation, the timelines for new polymer innovation are prohibitively slow.
While the global materials science community has recognized that machine learning could be a viable technique to accelerate materials discovery, its application to experimental polymer design has been lacking. One of the reasons could be the inherent difficulty of modelling polymer systems due to their complexity across multiple length scales. There have been efforts in fingerprinting, which have shown success on DFT-generated data. However, these are gross over-simplifications which do not generalize well to experimental data.
Through a case study, we demonstrate the critical steps that are needed when building machine learning algorithms in polymer property prediction and polymer design. Building on this, we present the future of polymer design via a combination of intelligent data management, high throughput experimentation, high speed diagnostics and machine learning towards closed-loop autonomous experimentation.
 Nature, 2016, 536, 266-268, doi:10.1038/536266a
 J Phys Chem C, 2018, 122, 17575-17585, doi:10.1021/acs.jpcc.8b02913
4:30 PM - MT02.05.07
A Machine-Learning based Hierarchical Screening Strategy to Expedite Search of Novel Scintillator Chemistries
Anjana Talapatra1,Blas Uberuaga1,Chris Stanek1,Ghanshyam Pilania1
Los Alamos National Laboratory1
Scintillators have a wide variety of applications, ranging from medical imaging to radiation detection for global security. Despite a pressing need for new and improved scintillators for these diverse applications, the discovery and design of new scintillator materials relies on a laborious, time-intensive, trial-and-error approach, yielding little physical insight and leaving a vast space of potentially revolutionary materials unexplored. To accelerate the discovery of optimal scintillator materials with targeted properties and performance, we are currently pursuing an adaptive design framework that closely couples high throughput experiments, first principles computations and machine learning to (1) quickly screen a large chemical space of potentially promising scintillator chemistries and (2) optimize a selected chemistry via direct and iterative inputs from theory and experiments, enabling further tuning of the underlying electronic structure for band edge and defect engineering. This talk will focus on the details of the screening strategy. Specifically, we will present a novel hierarchical down-selection approach that employs non-traditional structure maps (empirical screening), DFT-based stability analysis, multi-fidelity machine learning models for accurate bandgap predictions and a physics-based classification to efficiently predict minimal favorable electronic structure for a viable scintillator. We not only validate our approach by demonstrating a successful screening of many already-known elpasolite scintillators, but also make predictions for potentially new double perovskite halide scintillators with mixed anion chemistries. The developed framework is general and has implications beyond scintillator discovery.
4:45 PM - MT02.05.08
Predicting Densities and Elastic Moduli of SiO2-Based Glasses by Machine Learning
Yong-Jie Hu1,Ge Zhao2,Tyler Del Rose1,Maarten de Jong3,Liang Qi1
University of Michigan1,The Pennsylvania State University2,University of California, Berkeley3
Chemical design of SiO2-based glasses with high elastic moduli and low weight is of great interest. However, it is difficult to find a universal expression to predict the elastic moduli from the glass composition before synthesis, since the elastic moduli are a complex function of interatomic bonds and their ordering at different length scales. Here we show that the densities and elastic moduli of SiO2-based glasses can be efficiently predicted by machine learning (ML) techniques across a complex compositional space with multiple (>10) types of additive oxides besides SiO2. Our machine learning approach relies on a training set generated by high-throughput molecular dynamics (MD) simulations, a set of elaborately constructed descriptors that bridges the empirical statistical modeling with the fundamental physics of interatomic bonding, and a statistical learning/predicting model developed by implementing the least absolute shrinkage and selection operator with a gradient boost machine (GBM-LASSO). Trained on a dataset composed only of binary and ternary glass samples, our model shows remarkable learning accuracy and outstanding extrapolation ability in predicting the density and elastic moduli of k-nary glasses that lie beyond the training set. Finally, as an example to illustrate the potential applications of our ML model, we perform a rapid screening of ~100,000 compositions of a quinary glass system to construct a composition-property database that allows for a fruitful overview of the glass density and elastic properties.
MT02.06: Poster Session II: Machine Learning for Potentials and from Theory
Wednesday AM, December 04, 2019
Hynes, Level 1, Hall B
8:00 PM - MT02.06.01
Combining Polymorphism and Machine Learning for Materials Discovery
Fadwa El Mellouhi1
The computer-aided design of materials has witnessed important progress over the past few years. That said, it depends crucially on the crystal structure and the polymorphs considered. I will show how we considered various polymorphs to obtain new stable and undiscovered compounds based on the assessment of the relative stability of various phases with respect to a reference structure. These assessments rely on computing the thermodynamic stability via the convex hull energy, aided by open-source computational databases.
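The convex hull stability test mentioned above can be illustrated for a binary system: keep the lowest-energy polymorph at each composition, build the lower convex hull of formation energy versus composition, and measure how far each phase sits above it. This is a minimal sketch with made-up numbers; the compositions, energies (taken to be in eV/atom), and function names are hypothetical and do not come from the talk or from any database.

```python
def lower_hull(points):
    """Lower convex hull of (composition, formation energy) points."""
    # Keep only the lowest-energy polymorph at each composition.
    lowest = {}
    for x, e in points:
        lowest[x] = min(e, lowest.get(x, e))
    hull = []
    for p in sorted(lowest.items()):
        # Drop the last vertex while it lies on or above the line
        # joining the vertex before it to the new point p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def energy_above_hull(x, energy, hull):
    """Vertical distance from a phase to the hull at composition x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            hull_energy = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
            return energy - hull_energy
    raise ValueError("composition outside the hull range")

# Hypothetical phases: (fraction of element B, formation energy in eV/atom).
# Two polymorphs are listed at x = 0.5.
phases = [(0.0, 0.0), (0.25, -0.10), (0.5, -0.40),
          (0.5, -0.25), (0.75, -0.15), (1.0, 0.0)]

hull = lower_hull(phases)
above = [energy_above_hull(x, e, hull) for x, e in phases]
```

Phases with zero energy above the hull are predicted stable against decomposition into neighboring phases; a small positive value marks a polymorph or compound that is metastable with respect to the hull.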
I will also summarize some of our recent findings using DFT combined with machine learning to perform a systematic analysis of the structure-to-property relations, exploring fully inorganic ABC3 chalcogenide (I-V-VI3), halide (I-II-VII3) and some hybrid perovskites. The analysis focused on the role of BC6 octahedral deformations, rotations and tilts in the thermodynamic stability and optical properties of the compounds. Machine learning algorithms helped to estimate the relations between the octahedral deformation and the bandgap, and established a similarity map among all the calculated compounds. We propose that compositions grouped together on the similarity map are amenable to forming mixed-ion compounds, offering interesting guidelines on how to engineer mixed-phase perovskites.
This work has been supported by the Qatar National Research Fund (QNRF) through the National Priorities Research Program (NPRP8-090-2-047).
8:00 PM - MT02.06.02
Data-Driven Accurate Positioning of the Band Edges of MXenes
Avanish Mishra1,2,Arunkumar Rajan2,Rinkle Juneja2,Abhishek Singh2
University of Connecticut1,Indian Institute of Science, Bangalore2
MXenes are a vast class of two-dimensional (2D) materials exfoliated from corresponding MAX phases, which get functionalized due to the availability of unsaturated surface charges. A total of 25,000 MXenes have already been generated and stored in a functional materials database named aNANt; they are metallic or semiconducting depending upon the surface termination. MXenes possess variability in their properties and are considered promising for electronic, photovoltaic, and photocatalytic applications. However, other than the band gap, these properties rely on the accurate position of the band edges. Hence, to synthesize MXenes for various applications, prior knowledge of the accurate position of their band edges on an absolute scale is essential; computing these with conventional methods would take years for all the MXenes. A local or semilocal functional-based approach within density functional theory always underestimates the band gap. Furthermore, it fails to predict the accurate position of the band edges. We develop a machine learning model for positioning the band edges with GW level of accuracy, having a minimum error of 0.12 eV. An intuitive model is proposed based on the combination of the Perdew−Burke−Ernzerhof band edge and the vacuum potential, having a correlation of 0.93 with GW band edges, which is able to capture the physical origin behind the shift of the reference level and unravel the role of surface functionalization in controlling it. These models can be utilized to identify MXenes for the desired application in an accelerated manner.
8:00 PM - MT02.06.03
Predicting Nanoscale Static Friction of 2D Materials via Machine Learning Techniques
Behnoosh Sattari Baboukani1,Kristofer Reyes1,Zhijiang Ye2,Prathima Nalam1
University at Buffalo1,Miami University2
Shear properties of two-dimensional (2D) materials such as graphene, molybdenum disulfide (MoS2) or boron nitride exhibit significant dependence on interlayer interactions and lattice orientations. The static friction of 2D materials originates from the stick-slip pattern generated during shearing of the layers. Within the framework of the Prandtl-Tomlinson model, the corrugated potential energy surface (PES) barrier of the shearing interface results in stick-slip instabilities. Tuning the interlayer interactions, such as van der Waals and electrostatic forces, and identifying the appropriate lattice orientations between two similar 2D materials can enable the identification of promising ‘candidates’ with superlubricating properties. Currently, over a few dozen 2D materials have been successfully synthesized and a further thousand 3D materials have been identified with potential exfoliation properties; machine learning tools are therefore highly valuable for identifying the 2D material with the highest lubricity.
In this study, 15 different 2D materials from two different families, i.e. the graphene family (includes graphene, hydrogenated graphene, fluorographene and hexagonal boron nitride) and the transition metal dichalcogenide family (TMDs, includes MX2, M: Mo, Nb, W, Ti - X: S, Se, Te), were selected. PES for five 2D materials, estimated via molecular dynamics (MD) simulations or density functional theory (DFT) approximations, were extracted from the literature and employed as the training data. We use a combination of geometric, electronic, mechanical, phonon vibration-related and physical descriptors of 2D materials in Bayesian modeling and transfer learning techniques to predict the maximum PES for 10 different 2D materials. Strong pairwise correlations were observed among the 2D materials within the same family. Posterior predictions showed that hydrogenated graphene presented the lowest corrugation barrier, i.e. ~45% smaller than graphene. To validate the model, potentials for hydrogenated graphene were established and molecular dynamics simulations were performed to estimate a PES value of 1.8 meV/Å², which was only ~10% smaller and within the range of the uncertainty as estimated by the Bayesian modeling. Further, the band gap energies for 3 TMDs were predicted using the same descriptor set and the predicted values were found to be similar to the DFT-calculated band gap values. Descriptor sensitivity analysis indicates that the maximum energy barrier values were controlled mostly by structural, mechanical and phonon vibrational properties of the system, not by the electronic properties. Finally, the PES for TMDs were found to be highly correlated to the polarizability and the size of the chalcogen atom. The robust model generated in our current study creates a platform for predicting and estimating the lubrication properties of other novel and unexplored 2D materials, as well as of 2D-based heterostructures.
8:00 PM - MT02.06.04
Linking Predictions of Protein Structure and Disorder through Molecular Simulation
Claire Hsu1,Anna Tarakanova2,Markus Buehler1
Massachusetts Institute of Technology1,University of Connecticut2
Intrinsically disordered proteins (IDPs) and intrinsically disordered regions within proteins (IDRs) have been shown to serve an increasingly expansive list of biological functions, including regulation of transcription and translation, protein phosphorylation, cellular signal transduction, regulation of self-assembly of large protein complexes, as well as mechanical roles. The link between function and protein disorder has motivated several recent advances in experimental techniques for identifying disordered regions. Common current techniques include X-ray crystallography, NMR spectroscopy, mass spectrometry, electron microscopy, and small-angle X-ray scattering; recent advances have extended the length and time-scales of such methods, often combining multiple techniques together to better capture dynamic regions that signal disorder. However, experimental methods for disorder classification still do not scale to the level of experimentally curated structural information in folded protein structure databases. In addition, disorder predictors rely on several different definitions of disorder and fail to account for the continuous order-disorder spectrum. To better capture disorder in protein structure, we link disorder predictions to the performance of secondary structure predictor algorithms developed for folded proteins and conduct molecular dynamics simulations on representative proteins from the Protein Data Bank to determine regions of high motion. We find that secondary structure predictor performance can be leveraged to confirm regions of disorder identified by disorder predictors. In addition, low accuracy secondary structure predictions coupled with high dynamics as suggested by molecular simulations suggest a lack of static structure and, thus, a presence of disorder in regions that may not have a consensus disorder prediction. 
While disorder databases scale to size and experimental techniques continue to develop, secondary structure predictors and molecular simulations can improve disorder predictor performance, which can lead to the discovery of novel functions of IDPs and IDRs.
8:00 PM - MT02.06.05
Metadynamics Sampling for Training Machine-Learning Interatomic Potential
Dongsun Yoo1,Wonseok Jeong1,Kyuhyun Lee1,Seungwu Han1
Seoul National University1
Machine-learning interatomic potentials (MLIPs) are attracting considerable attention as a promising computational tool that can give the accuracy of quantum mechanical calculations at the cost of classical force fields. Among the many forms of MLIP suggested to date, the neural network potential and the Gaussian approximation potential are the two most popular models.
The construction of a reliable MLIP requires a careful selection of the training set. The training set is usually selected by intuition and experience; practitioners select a set of structures that are expected to appear during the simulations. However, unexpected structures can emerge during the simulation, giving unreliable results or even catastrophic failures. This is a major obstacle to the wide use of MLIPs. Several methods have been proposed to remedy this problem: the detection of out-of-training-set structures, e.g. the query-by-committee method, and on-the-fly training. Still, the selection of the training set is the major bottleneck in the construction of an MLIP.
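The query-by-committee idea mentioned above can be sketched in a few lines: several independently trained models predict the same quantity, and a large spread among their predictions flags a structure as lying outside the training set. This is a toy sketch; the committee members, the one-dimensional descriptor, and the threshold are hypothetical stand-ins for real trained potentials.

```python
from statistics import pstdev

def make_model(c):
    """Toy committee member: all members agree for descriptor <= 1
    (the assumed training region) and diverge beyond it."""
    def model(x):
        return x**2 + c * max(0.0, x - 1.0) ** 2
    return model

def committee_disagreement(models, descriptor):
    """Population standard deviation of the committee's predictions."""
    return pstdev([model(descriptor) for model in models])

# Hypothetical committee of three models and four candidate structures,
# each summarized here by a single descriptor value.
models = [make_model(c) for c in (-0.5, 0.0, 0.5)]
candidates = [0.2, 0.8, 1.5, 2.0]
threshold = 0.05  # assumed tolerance on the spread

# Structures whose predictions disagree too much would be sent to a
# reference calculation (e.g. DFT) and added to the training set.
flagged = [x for x in candidates
           if committee_disagreement(models, x) > threshold]
```

In practice the committee members would be potentials trained on different data subsets or from different initializations, and the threshold would be calibrated against known errors rather than picked by hand.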
In this presentation, we suggest a new form of metadynamics to efficiently sample a wide range of local environments, enabling easier construction of MLIP. We use a descriptor vector, which is a direct input to machine-learning model, as collective variables. That is to say, collective variables represent atomistic local environment rather than the state of the entire system. Metadynamics simulations starting from certain structures can sample a wide range of local environments so that it is easier to construct general-purpose potential. Metadynamics sampling also increases the stability of trained potential. We will demonstrate its performance with neural network potential.
8:00 PM - MT02.06.07
Computational Exploration of Near-Infrared Absorbing Polymethine Dyes
Daniele Padula1,Roland Hany1,Frank Nuesch1,Mark Waller2
Empa–Swiss Federal Laboratories for Materials Science and Technology1,Pending.ai2
In this contribution, we present a Machine Learning/Quantum Chemical approach aimed at identifying new organic molecules able to absorb in the NIR region of the spectrum.
We obtain a pool of potential candidates exploiting Machine Learning models able to generate text representations of molecules. The molecules obtained through this generative model are biased towards classes of compounds that are well known to absorb in the NIR region, such as squaraines, cyanines, croconates and other organic semiconductors. The generated molecules are then clustered with Cheminformatics approaches, and their synthetic accessibility assessed through Monte Carlo Tree Search approaches to retrosynthesis.
Finally, their electronic properties are screened through quantum chemical methods of increasing complexity to progressively reduce the pool of candidate materials. Additionally, to reduce the computational cost of future quantum chemical screenings, we will assess whether it is possible to formulate new descriptors that are good predictors of the electronic properties of a molecule.
8:00 PM - MT02.06.09
A General Machine Learning Framework for Impurity Level Prediction in Semiconductors
Arun Kumar Mannodi Kanakkithodi1,Maria Chan1
Argonne National Laboratory1
Electronic levels introduced by impurities and defects in the middle of the band gap are critically important in semiconductors for optoelectronic, photovoltaic (PV) and quantum sensing applications. While “deep” defect levels can prove catastrophic for PV performance by causing non-radiative carrier recombination, impurity levels in the band gap could also be entangled for quantum sensing or lead to increased absorption of sub-gap photons which can enhance efficiencies. Predicting formation energies and charge transition levels for defects in semiconductors is thus paramount; density functional theory (DFT) calculations have been widely applied for such studies to overcome experimental bottlenecks. However, the requirement of large supercells, advanced levels of theory, and inclusion of charge corrections make these computations very expensive, and trends and knowledge from previous calculations are not exploited in subsequent ones.
In this work, we combine high-throughput DFT and machine learning (ML) to develop general predictive models for the formation enthalpy and charge transition levels of impurities in two broad semiconductor classes: (a) ABX3 halide perovskites, and (b) group IV, III-V and II-VI semiconductors. DFT data is generated for impurity atoms selected from across the periodic table and simulated in various possible cation, anion or interstitial sites. Any “semiconductor + impurity” combination is converted into a unique feature vector based on the identity and electronic properties of the semiconductor, the tabulated elemental properties of the impurity atom, information about the defect site coordination environment, as well as some electronic and energetic properties computed using low cost unit cell defect calculations. State-of-the-art neural networks, random forest and ridge regression methods are applied to train models for (a) impurity formation enthalpy, and (b) impurity charge transition levels, based on the input feature vector. Model performance is evaluated for different ML techniques, sets of features, and size and nature of training dataset; the best predictive models thus obtained are deployed for comprehensive prediction and design purposes. It is seen that models trained on defects and impurities in end point compositions are applicable to intermediate compositions as well: for instance, by training on data from pure canonical AB semiconductors like CdTe, ZnO, GaAs and SiC, the ML models can make accurate predictions for impurities in alloyed compositions of the same compounds, such as CdTe1-xSex, Cd1-xZnxO, Al1-xGaxAs1-ySby, etc. This versatility of the machine learned-models provides an avenue to access the energetic and optoelectronic impact of any atomic impurity in any possible pure or mixed composition semiconductor belonging to the same chemical space. 
We use the predictive models to quickly screen for dominating impurities, that is, impurities that shift the equilibrium Fermi level in the semiconductor as determined by dominant native defects, in hundreds of possible compositions in the halide perovskite and group IV, III-V and II-VI semiconductor spaces. The quick and accurate estimation of interesting dopants/impurities in semiconductors would not be possible without machine-learned models, and the strategy applied in this work is applicable to any class of semiconductors.
8:00 PM - MT02.06.10
Machine Learning Study of Magnetic Two-Dimensional Materials
Trevor David Rhone1,Shaan Desai1,Wei Chen1,Amir Yacoby1,Efthimios Kaxiras1
Harvard University1
The discovery of intrinsic magnetism in monolayer CrI3 and bilayer Cr2Ge2Te6 created great interest in two-dimensional (2D) materials with intrinsic magnetic order. How many of these materials exist? What are their properties? We present a study of 2D materials with intrinsic magnetic order, materials at the forefront of physics research. We use materials informatics (machine learning applied to materials science) to study the magnetic and thermodynamic properties of 2D materials. Crystal structures based on monolayer Cr2Ge2Te6, of the form A2B2X6, are studied using density functional theory (DFT) calculations and machine learning tools. Magnetic properties, such as the magnetic moment, are determined. The formation energies are also calculated and used to estimate the chemical stability. We show that machine learning, combined with DFT, provides a computationally efficient means to predict properties of 2D magnets. In addition, data analytics provides insights into the microscopic origins of magnetic ordering in 2D. This novel approach to materials research paves the way for the rapid discovery of chemically stable 2D magnetic materials.
8:00 PM - MT02.06.11
Machine-Learning-Based Band Gap Predictions of Functionalized MXenes
Abhishek Singh1,Arunkumar Rajan1,Avanish Mishra1,Swanti Satsangi1
Indian Institute of Science Bangalore1
MXene is a recent addition to the ever-growing family of two-dimensional (2D) materials, promising for optical, electronic, energy storage and photocatalytic applications. Using a statistical learning-based approach, we electronically characterize this vast class of materials by predicting their band gaps with GW-level accuracy. Using a classification model, MXenes having finite band gaps are filtered out, and a few of them are selected to build a machine learning model. The model is built by correlating easily available elemental and computed properties, used as features, with the calculated GW band gaps of the selected MXenes. Depending on the feature combination, the Gaussian process regression method resulted in an optimized model with a low root-mean-squared error of 0.14 eV, which can be employed to estimate accurate GW band gaps of tens of thousands of MXenes [1] within minutes. Our results demonstrate that a machine learning model can bypass the band gap underestimation problem of local and semi-local functionals used in DFT calculations, without subsequent correction using the time-consuming GW approach [2].
[1] aNANt: a functional materials database, http://anant.mrc.iisc.ac.in
[2] A. C. Rajan, A. Mishra, S. Satsangi, R. Vaish, H. Mizuseki, K. R. Lee, A.K. Singh. Chem. Mater. 2018, 30, 4031-4038
8:00 PM - MT02.06.12
Machine Learning the Fundamental Tradeoffs between Conductivity and Voltage Stability in Solid State Electrolytes
Karun Kumar Rao1,Michael Nikolaou1,Yan Yao1,Lars Grabow1
University of Houston1
All-solid-state batteries provide many safety advantages over traditional lithium-ion batteries by replacing the combustible organic liquid electrolyte with a ceramic solid-state electrolyte. However, reported superionic conductors with conductivities approaching that of liquid electrolytes are unstable in contact with a lithium anode, leading to increased internal cell resistance and poor cyclability. Conversely, compounds stable at the anode or cathode interfaces often do not exhibit useful bulk ionic conductivities. Although ab initio methods exist to study ionic conductivity and voltage stability range independently of each other, there is no established theory to connect these two properties. Here, we leverage machine learning (ML) to investigate the role of crystal structure in the tradeoff between voltage stability and ionic conductivity. To this end, we trained a partial least squares (PLS) machine learning algorithm using the valence electronic density as a descriptor of 60 known solid-state electrolytes along with their corresponding ionic conductivity, anodic voltage limit, and cathodic voltage limit. In addition to exploring electron density, we also evaluate translationally invariant Fourier-based descriptors. The trained model has an 80% prediction accuracy and suggests that within the search space of crystal structures, the voltage stability and ionic conductivity are inherently inversely correlated. A multi-objective optimization also suggests that materials with positively correlated ionic conductivity and voltage stability may be highly anisotropic. Compared with neural networks and similar models, our PLS model has the benefit of being able to predict and explore the relationship between multiple properties while retaining a high level of interpretability relative to ‘black box’ machine learning models.
Moreover, we successfully quantify the uncertainty and confidence intervals in our model predictions which are often overlooked in other methods. The PLS model successfully identifies and quantifies the BCC anion substructure and channels as effective descriptors, which is in good agreement with prior work.1 Using this model, we screened through a database of ca. 14,000 materials and identified five new promising solid state electrolyte candidates to have conductivities greater than 16 mS/cm. The model predictions were subsequently verified with ab initio molecular dynamics simulations. The proposed ML model and electron density based descriptors may be used in future studies to elucidate other complicated structure-property relationships for other applications with high accuracy and without sacrificing interpretability.
1. Wang, Y. et al. Design principles for solid-state lithium superionic conductors. Nat. Mater. 14, 1026–1031 (2015).
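The conductivity versus stability tradeoff discussed in this abstract can be illustrated with a simple Pareto-front filter. The sketch below is not the authors' PLS model or their multi-objective optimization; the candidate labels and numbers are hypothetical, and the "stability window" is assumed to be a single width in volts, purely for illustration.

```python
from typing import List, Tuple

# (label, ionic conductivity in mS/cm, stability window width in V)
# All entries are hypothetical values chosen only to illustrate the filter.
Candidate = Tuple[str, float, float]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates.

    A candidate is dominated if another one is at least as good in both
    objectives (higher is better) and strictly better in at least one.
    """
    front = []
    for name, sigma, window in candidates:
        dominated = any(
            (s2 >= sigma and w2 >= window) and (s2 > sigma or w2 > window)
            for _, s2, w2 in candidates
        )
        if not dominated:
            front.append((name, sigma, window))
    return front


candidates = [
    ("A", 20.0, 0.5),  # fast conductor, narrow window
    ("B", 5.0, 2.0),
    ("C", 1.0, 4.0),   # slow conductor, wide window
    ("D", 4.0, 1.5),   # dominated by B
    ("E", 0.5, 3.0),   # dominated by C
]

front = pareto_front(candidates)
print([name for name, _, _ in front])
```

If the two properties are inversely correlated, as the abstract suggests, the front forms a descending curve: each gain in conductivity costs stability window, and candidates lying above that curve are the interesting exceptions.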
8:00 PM - MT02.06.13
Spectral Optimization and Temperature Control for Electronic and Optoelectronic Devices Using Machine Learning
Po-Ying Chen1,Quang-Tuyen Le1,Nan-Yow Chen2,An-Cheng Yang2,Yu-Chieh Lo1
National Chiao Tung University1,National Center for High-Performance Computing2
The temperature and cooling power of electronic components are important to many kinds of commercial electronic devices. Many factors can raise the temperature, and sunlight is one of them. However, when a solar cell is the power source of a device, sun exposure cannot be avoided. Finding a material with the best ability to reduce the temperature therefore becomes essential. Machine learning has recently gained significant traction in materials design, so we use it to help identify better cooling materials. TensorFlow is the machine learning framework used in this work, and the optical simulation software DiffractMOD is used to produce training data. First, the refractive index and extinction coefficient of different materials, together with a variety of geometric structures, were set as inputs. Next, DiffractMOD output the transmittance, reflectance, and absorptance at the corresponding wavelengths. Furthermore, to calculate the equilibrium temperature and cooling power, we performed Fourier-transform infrared spectroscopy. By feeding these data into an autoencoder based on a convolutional neural network, we could train the model and predict the optical coefficients and geometries for optimal performance, that is, those that cool the devices most effectively.
8:00 PM - MT02.06.14
Deep Learning for Multiscale Atomistic Modeling of Multicomponent Crystal Chemistries Coupled with Hirshfeld Surface Analyses
Arpan Mukherjee1,Aparajita Dasgupta1,Tianmu Zhang1,Scott Broderick1,Krishna Rajan1
University at Buffalo1
Hirshfeld Surfaces are coupled to machine learning to map information on the impact of each pair-wise interaction between bond chemistry and bond geometry in multicomponent systems. The Hirshfeld Surfaces encode both chemical bonding and molecular geometry information and are extremely effective in providing a multiscale electronic structure signature for accelerated materials selection and design, in the form of an electronic fingerprint that captures both bond geometry and bond chemistry. Based on their geometric and chemical bonding interactions, we have developed libraries of fingerprints which are computationally ready to be analyzed with various machine learning methods. Using the new library of Hirshfeld Surface calculations, coupled with a new analysis approach, we rapidly define similarity between compounds computationally. We show the application of machine learning approaches, such as convolutional neural networks, to quantitatively find and extract characteristics in the material fingerprints to develop rapid classifications across multiple material classes, chemistries and properties.
8:00 PM - MT02.06.15
Phase-Field Modeling and Machine Learning of Electric-Thermal-Mechanical Breakdown of Polymer-Based Dielectrics
Jianjun Wang1,Zhonghui Shen2,Yang Shen2,Long-Qing Chen2
The Pennsylvania State University1,Tsinghua University2
Polymer nanocomposites are attracting rapidly increasing attention due to their many advantages and promising potential, arising from the designable and optimizable synergetic interaction between the polymer matrix and the functional filler nanoparticles. For example, they can be used in capacitors, which are crucial components in energy storage for advanced electrical and power systems, such as hybrid electric vehicles and solar power generators. To achieve high-energy-density storage in a dielectric capacitor, a combination of high dielectric constant and high breakdown strength is required. However, normal dielectrics with high dielectric constants tend to have low breakdown strength, while those with high breakdown strength tend to have low dielectric constants. Therefore, optimizing a combination of dielectric constant and breakdown strength has been a grand challenge. A common approach to overcoming this dilemma is to make polymer-ceramic nanocomposites that combine the high breakdown strength of the polymer with the high dielectric constant of the ceramic, so as to give rise to a high energy density, which is proportional to the product of the dielectric constant and the square of the breakdown strength. In my presentation, I will show a computational approach that combines high-throughput phase-field simulations and machine learning to understand the breakdown mechanisms of polymer nanocomposite dielectrics under different operating conditions, as well as to design novel microstructures to achieve optimized breakdown strength, maximized energy storage and operating temperature. An analytical expression of the breakdown strength as a function of the dielectric constant, electrical conductivity, and Young’s modulus was obtained from machine learning, which can be used to semiquantitatively predict the breakdown strength of the P(VDF-HFP)-based nanocomposites.
I will also talk about some targeted experiments which were designed to verify the high-throughput phase-field simulations and machine learning results.
References: (* and # represent corresponding author and co-first author, respectively)
 Zhong-Hui Shen, Jian-Jun Wang (*, #), Jian-Yong Jiang, Sharon X Huang, Yuan-Hua Lin, Ce-Wen Nan, Long Qing Chen, Yang Shen, “Phase-field modeling and machine learning of electric-thermal-mechanical breakdown of polymer-based dielectrics”, Nature Communications 10, 1843 (2019).
 He Li, Ding Ai, Lulu Ren, Yao Bin, Zhubing Han, Zhonghui Shen, Jian-Jun Wang, Long-Qing Chen, Qing Wang, “Scalable Polymer Nanocomposites with Record High Temperature Capacitive Performance Enabled by Rationally Designed Nanostructured Inorganic Fillers”, Advanced Materials (2019), doi.org/10.1002/adma.201900875.
 Zhong-Hui Shen, Jian-Jun Wang (*, #), Jian-Yong Jiang, Yuan-Hua Lin, Ce-Wen Nan, Long-Qing Chen, Yang Shen, “Phase Field Model of Electrothermal Breakdown in Flexible High Temperature Nanocomposites under Extreme Conditions”, Advanced Energy Materials 2018, 1800509.
 Zhong-Hui Shen, Jian-Jun Wang (*, #), Yuan-Hua Lin, Ce-Wen Nan, Long Qing Chen, Yang Shen, “High-Throughput Phase-Field Design of High-Energy-Density Polymer Nanocomposites”, Advanced Materials 30, 1704380 (2017).
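The proportionality stated in the abstract above (energy density scaling with the dielectric constant times the square of the breakdown strength) matches the textbook expression for an ideal linear dielectric. That linearity is an assumption made here for illustration; real nanocomposites can be nonlinear and lossy. The symbols are chosen for this note: $U_{\max}$ is the maximum stored energy density, $\varepsilon_0$ the vacuum permittivity, $\varepsilon_r$ the dielectric constant, and $E_b$ the breakdown strength. Primed quantities denote a modified material.

```latex
U_{\max} = \tfrac{1}{2}\,\varepsilon_0\,\varepsilon_r\,E_b^{2}
\qquad\Longrightarrow\qquad
\frac{U_{\max}'}{U_{\max}}
  = \frac{\varepsilon_r'}{\varepsilon_r}\left(\frac{E_b'}{E_b}\right)^{2}
```

As a worked example under the same assumption, a filler that doubles $\varepsilon_r$ but lowers $E_b$ by 30% gives a ratio of $2 \times 0.7^2 = 0.98$, a slight net loss. This is why the two properties have to be optimized together rather than separately.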
8:00 PM - MT02.06.16
Using Data Driven Models to Gain Insight on Spin- and Oxidation-State Dependent Behavior of Reaction Energetics for Light Alkane Oxidation
Aditya Nandy1,Jon Paul Janet1,Chenru Duan1,Heather Kulik1
Massachusetts Institute of Technology1
Biological systems readily catalyze difficult chemical transformations such as direct methane to methanol conversion with high selectivity using earth abundant transition metals (e.g. Fe and Cu). Computational high-throughput virtual screening (HTVS) with first-principles density functional theory (DFT) can play a valuable role in unearthing design rules for scalable, industrially viable synthetic analogues that preserve this selectivity and activity. Single-site catalysts represent the most promising analogues to these enzymes, often enabling atom-economy and selectivity not possible with bulk heterogeneous catalysts. Simultaneously, a wide chemical space must be explored to uncover potential single-site catalysts to simultaneously address other constraints, such as high turnover number or robustness, earth abundance, and synthesizability. Single-site catalysts have the added dimensionality of spin- and oxidation-state—which can drastically impact the structure-property relationships of molecular complexes—and remain unexplored for catalyst reaction energetics. The large dimensionality of the single-site catalyst chemical space becomes quickly intractable with DFT. Thus, we demonstrate our developments on data-driven models for key steps in single-site light alkane oxidation catalysts, which enable the prediction of reaction energetics in a spin- and oxidation-state dependent manner close to the accuracy of DFT. We first compare prediction of reaction energetics to other quantum mechanical properties that we have predicted, such as spin-splitting energetics, ionization potential, and frontier orbital energetics. We then discuss insights on representative catalytic reaction steps such as oxo formation and hydrogen atom transfer, including their dependence on spin- and oxidation-state. 
Lastly, we demonstrate the power of artificial neural-network and kernel ridge regression models for purposes of screening, enabling screens of large design spaces that would be infeasible to screen by DFT, even with the reduction in number of calculations through the use of descriptor energies. We uncover cases of molecules that would be overlooked by quantum mechanical descriptors, or would be missed by chemical intuition. Having separate data driven models for different reaction energy steps allows predictions of reaction energies that contain weak or nonexistent linear free energy relationships (LFERs), commonly employed in heterogeneous catalysis, but less prevalent in homogeneous catalysis.
8:00 PM - MT02.06.17
Self-Evolving Neural Network Potentials for Supramolecular Interactions
Wujie Wang1,William Harris1,Rafael Gomez-Bombarelli1
Massachusetts Institute of Technology1
Neural networks based on atomic embeddings have demonstrated accurate force and energy evaluations to simulate physical systems [1-5]. However, training neural potentials requires expensive sampling of configurational space, and one needs to perform ab initio MD simulations to generate training data. For the purpose of high throughput discovery, such brute force sampling is computationally infeasible for screening thousands of molecules. To this end, we propose a self-consistent strategy that accelerates the sampling and training of transferable neural networks over the molecular space of interest, particularly to capture supramolecular interactions that are critical for applications like designing chelating agents and electrolytes. The proposed method combines deep neural network training and active sampling of both chemical and configurational space in an end-to-end fashion to 1) bypass expensive ab initio MD sampling and 2) leverage the transferable configurational information across species. We show that this active learning method can improve neural network potentials in a self-consistent way and demonstrate its use in performing high throughput screening of lithium binding molecules.
1. Zhang, L., Han, J., Wang, H., Car, R. & E, W. Deep Potential Molecular Dynamics: a scalable model with the accuracy of quantum mechanics. (2017).
2. Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A. & Müller, K.-R. SchNet–A deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).
3. Behler, J. & Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 1–4 (2007).
4. Yao, K., Herr, J. E., Toth, D. W., Mckintyre, R. & Parkhill, J. The TensorMol-0.1 model chemistry: a neural network augmented with long-range physics. Chem. Sci. 9, 2261–2269 (2018).
5. Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).
6. Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. data 1, 140022 (2014).
7. Smith, J. S., Nebgen, B., Lubbers, N., Isayev, O. & Roitberg, A. E. Less is more: Sampling chemical space with active learning. J. Chem. Phys. 148, 241733 (2018).
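The abstract above does not state how new configurations are chosen for labeling. One common choice in active learning for neural network potentials is ensemble disagreement (query by committee), and the sketch below assumes that rule purely for illustration. The three toy "models" and the one-dimensional pool are hypothetical stand-ins for trained potentials and molecular configurations.

```python
from statistics import pstdev
from typing import Callable, List, Sequence

# A "model" here is any callable mapping a configuration (a float in this
# toy example) to a predicted energy. Real potentials would take atomic
# positions and species instead.
Model = Callable[[float], float]


def select_for_labeling(
    models: Sequence[Model], pool: Sequence[float], k: int
) -> List[float]:
    """Return the k pool points where the ensemble predictions disagree most."""
    scored = [(pstdev([m(x) for m in models]), x) for x in pool]
    # Largest standard deviation first: these points are the least well learned.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in scored[:k]]


# Toy ensemble: the members agree near x = 1 and diverge away from it.
models = [
    lambda x: (x - 1.0) ** 2,
    lambda x: 1.2 * (x - 1.0) ** 2,
    lambda x: 0.8 * (x - 1.0) ** 2,
]
pool = [0.0, 0.5, 1.0, 1.5, 3.0]

picked = select_for_labeling(models, pool, k=2)
print(picked)
```

In a full loop, the selected points would be sent to the reference method for labeling, added to the training set, and the ensemble retrained before the next selection round.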
8:00 PM - MT02.06.18
Artificial Intelligence Design of Tunable Nanocomposites for Crack Resistance
Chi-Hua Yu1,Zhao Qin1,Markus Buehler1
Massachusetts Institute of Technology1
Here we report a new design approach for nanocomposite materials using artificial intelligence (AI). The algorithm consists of a machine learning predictor conjoined with an AI improved genetic algorithm, applied to discover materials designs in a vast space of possible solutions. Facilitated by a generative neural network that is trained with a dataset of thousands of combinations of soft and brittle materials to generate high resolution tessellate geometries, we design the material properties of novel graphene nanocomposites without running conventional simulations.
Through the algorithm, we extend the capability of physical simulations beyond property predictions to optimize the fracture toughness by altering the material distribution. The solutions are generated by our AI model at a dramatically lower computational cost compared to brute-force searching methods. We further investigate the physical mechanism for improving the performance behind the AI approach and demonstrate the ability of AI to search for optimal designs with very limited sampling. Molecular dynamics simulations of the nanocomposite designs show that our AI design improves the performance by effectively decreasing the stress concentration at crack tips.
The AI approach reported here can be easily applied to other nanocomposites, biomaterials, and other material classes, and provides a transferable, efficient and reliable design approach.
MT02.07/MT03.08: Joint Session: Machine Learning Augmented High-Throughput Experimentation I
Bruce van Dover
Wednesday AM, December 04, 2019
Hynes, Level 2, Room 210
8:00 AM - MT02.07.01/MT03.08.01
Automating Experiments and Data Interpretation in Solar Fuels and Catalysis Research
California Institute of Technology1
Automating critical steps of synthesis and screening experiments enables a variety of modes of materials exploration. High throughput experimentation comprises a family of techniques wherein materials systems can be comprehensively explored, and the resulting data relationships, e.g. composition-property and composition-structure-property relationships, are emblematic of the knowledge obtained from the experiments. Application of high throughput experimentation for solar fuels technology, in particular (photo)electrocatalysis of the oxygen evolution reaction, has led to a breadth of discoveries, many of which are based on high throughput computational screening. The resulting database of experiments, which is publicly released as the JCAP Materials Experiment and Analysis Database (MEAD) containing 6.5 million measurement files collected on 1.5 million materials samples, is a key resource for developing and evaluating algorithms that automate data interpretation. Successes to date include application of machine learning techniques to learn, identify, and communicate hidden data relationships. At a high level, these algorithms automatically generate answers to human-identified research questions, moving the frontier of artificial intelligence in materials discovery to the automatic identification of the interesting research questions.
8:30 AM - MT02.07.02/MT03.08.02
Cooperative Learning for Materials Systems
University of Maryland, College Park1
Recently, materials scientists have started to utilize machine learning to accelerate experimental research. Active learning, an AI field dedicated to optimal experimental design, is a particularly promising tool; it provides systematic means to identify the shortest path toward a material with some desired properties or the experiments that maximize knowledge of the explored space. In many materials science tasks, however, the goal is to obtain a mapping between two or more experimentally measured quantities. Standard active learning algorithms may not be optimal for such complex scientific problems. The experimental effort needed to obtain such mappings can be reduced not by independently running several active learning tasks but by a strategy that coordinates different experiments performed simultaneously. In this talk I will present the idea of cooperative learning, and illustrate it with examples from several different high-throughput studies.
8:45 AM - MT02.07.03/MT03.08.03
Exploring Catalyst Chemistries beyond Scaling Laws using Statistical Learning
Scott Broderick1,Aparajita Dasgupta1,Thaicia Stona1,Krishna Rajan1
University at Buffalo1
We have significantly expanded the knowledge-base of metal catalysts through a unique combination of manifold learning, Gaussian process regression and clustering approaches. Given the complexity in performing catalytic measurements, the amount of data available for selecting ‘optimal’ catalysts for specific reactions is limited. The work described here develops an analysis framework suitable to the small number of measurements available, while also developing a large relevant descriptor-base. We have performed the foundational work needed to develop a catalyst discovery toolkit. Using volcano plots as a platform, we have fused manifold learning methods and graph network methods from which one can rapidly explore new chemistries for single atom alloy (SAA) catalysts. We use single atom systems for testing our robustness, with the added benefit that prior work on single atom systems has not utilized machine learning. Using SAAs allows for a rapid screening of the combinatorial design space. We developed a machine learning logic for screening chemistries to define necessary detailed DFT calculations and have identified 28 alloys which are most promising for further exploration.
9:00 AM - MT02.07.04/MT03.08.04
Graph Theory and Machine Learning Uncover Zeolite Transformation Pathways
Daniel Schwalbe Koda1,Wujie Wang1,Rafael Gomez-Bombarelli1
Massachusetts Institute of Technology1
Zeolites are inorganic nanoporous materials with broad industrial applications as catalysts, ion exchangers, and separators. Despite sustained research, controlling polymorphism is still a critical challenge in their design, relying mostly on trial-and-error. First-principles calculations could aid the search for new frameworks, but the number of theoretically accessible topologies and the complexity of their kinetic mechanisms render this approach computationally prohibitive. Here, we employ a suite of computational tools such as big-data, graph theory, structural kernels, density functional theory (DFT), and machine learning to explain and predict zeolite transformations. We first relate solid-state transformations to materials descriptors by combining crystallography with a graph-theoretical metric. Supported by exhaustive literature, we then show that interzeolite diffusionless transformations occur only between graph-similar pairs. Moreover, all known instances of intergrowth take place between either structurally- or topologically-similar structures. Our metric suggests hundreds of low-distance pairs between known frameworks and thousands of hypothetical frameworks for realizing novel transformations and intergrown crystals. Such insights are further refined by atomistic simulations. Building on millions of DFT calculations, we parameterize interatomic interactions in pure-silica zeolites using neural network models and active learning. The method enables accurate structural optimizations and off-equilibrium energy sampling with low computational cost, allowing the selection of favorable graph-driven phase transitions between frameworks and uncovering new synthetic pathways for zeolites.
9:15 AM - MT02.07.05/MT03.08.05
Automatic Processing of the Scientific Literature to Accelerate Nanomaterials Design and Discovery
Anna Hiszpanski1,Brian Gallagher1,Karthik Chellappan1,Peggy Pk Li1,Shusen Liu1,Hyojin Kim1,Jinkyu Han1,Bhavya Kailkhura1,David Buttler1,T. Yong-Jin Han1
Lawrence Livermore National Laboratory1
A significant challenge in utilizing machine learning approaches to accelerate materials development is the lack of large and structured data sets. While there are ongoing community efforts to create collaborative materials databases and repositories for this purpose, the diversity and breadth of data types, length scales, and applications still make it challenging to create all-encompassing materials databases that are of broad practical use. However, if tools are developed to automatically process the vast scientific literature and extract and structure information of interest to a given user, then such tools can enable the easy creation of personalized databases with user-specified relevant information to which data mining approaches can then be applied.
We developed such tools for the automated extraction of a suite of information from the text of articles pertaining to nanomaterials synthesis and demonstrate their utility for nanomaterials synthesis. Attaining nanomaterials of desired composition, dimension, and morphology is critical for end-use applications but challenging to do, often requiring time-consuming iterations of synthesis and characterization. Using a corpus of 35k nanomaterials-related articles, we first use a simple unsupervised classification algorithm based on the frequency of occurring terms to identify the primary nanomaterial composition and morphology in each article. Classifying and analyzing articles based on their targeted nanomaterial composition and morphology by itself provides a bird’s eye view and can help identify “hot topics” in the field or, alternatively, under-studied or challenging-to-synthesize nanomaterials. Next, we apply a supervised machine learning approach to our corpus to identify and extract from the articles’ text the sentences related to the nanomaterials’ synthesis protocols, thereby yielding a useful synthesis reference library. Interestingly, we find that function words (i.e., to, in, for, of, at) commonly omitted in natural language processing of non-scientific text are in fact a characteristic trait in discriminating between synthesis- and non-synthesis-related sentences in scientific text. With synthesis protocols in hand, we further process these via chemical entity recognition (CER) to identify and extract the chemicals used in various nanomaterials’ syntheses. We evaluate a variety of open-access CER tools, as well as our own in-house developed CER tool, each of which uses different tokenizers for parsing the text and different techniques for identifying chemicals, and we find that, despite the variety of approaches undertaken, most tools have comparable performance, with a peak F1 score of 87%.
After normalizing the chemical names extracted from articles, we can compare the frequency of use of chemicals for various nanomaterial morphologies. We demonstrate how such analysis provides useful insights into the importance of chemicals in directing the growth of nanoparticles during synthesis to desired morphologies, such as nanowires versus nanospheres versus nanocubes. We package this database, created entirely by extracting information from existing nanomaterials literature, into a browser-based visualization tool we developed that enables easy exploration of the data, thereby helping guide hypothesis generation and reduce the potential parameter space during experimental design.
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-ABS-777678
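The F1 score quoted for the chemical entity recognition tools is the harmonic mean of precision and recall. The abstract does not say how matches were counted, so the sketch below assumes exact matching of extracted chemical names against a hand-labeled set. The example names are hypothetical and are not taken from the authors' corpus.

```python
from typing import Set, Tuple


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Score a set of extracted entities against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold annotations and tool output for one synthesis paragraph.
gold = {"HAuCl4", "sodium citrate", "CTAB", "AgNO3"}
predicted = {"HAuCl4", "CTAB", "AgNO3", "water"}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Here the tool finds three of the four labeled chemicals and adds one spurious entity, so precision, recall and F1 are all 0.75.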
9:30 AM - MT02.07/MT03.08
10:00 AM - MT02.07.07/MT03.08.07
High Throughput Experimental Materials Research Methods at NREL
National Renewable Energy Laboratory1
Bridging the gap between computational predictions and industrial applications requires acceleration and automation of experimental synthesis, characterization and data analysis. High Throughput Experimentation (HTE), also known as combinatorial experiments, is one possible approach to accelerate materials research. Thus, HTE combinatorial methods have been regarded as a promising approach to fulfill the promise of the materials genome initiative (MGI), complementary to high throughput computations and industrial research and development.
This presentation will focus on the state of high throughput experimental materials research methods at National Renewable Energy Laboratory (NREL). First, I will discuss methods for creating lateral gradients of thin film sample properties, in particular going beyond chemical composition of metals [1]. Then, I will talk about spatially-resolved characterization methods, including interlaboratory exchange of samples [2]. Finally, I will highlight our efforts in streamlining data analysis in combinatorial materials science, including a recently published COMBIgor software package [3]. These methods will be illustrated by a broad range of materials examples, including oxides, chalcogenides, and nitrides.
[1] ACS Combinatorial Science (2018) 20 436
[2] ACS Combinatorial Science (2019) 21 350
[3] ACS Combinatorial Science (2019), DOI: 10.1021/acscombsci.9b00077
10:30 AM - MT02.07.08/MT03.08.08
Machine Learning-Assisted High Throughput Synthesis and Characterization of Hybrid Polymer-Carbon Nanotubes Composites for Thermoelectric Application
Daniil Bash1,2,Anas Abutaha2,Yang Xu2,Yee Fun Lim2,Vijila Chellappan2,Zekun Ren3,Isaac Tian3,1,Pawan Kumar2,Swee Liang Wong2,Jose Recatala Gomez2,4,Jayce Cheng2,Tonio Buonassisi5,3,Kedar Hippalgaonkar2
National University of Singapore1,Institute of Materials Research and Engineering2,Singapore-MIT Alliance for Research and Technology (SMART)3,University of Southampton4,Massachusetts Institute of Technology5
The so-called 4th revolution in science began with the advent of machine learning (ML), as well as high-throughput (HT) experimentation and robotization. Herein, we describe a workflow that enables rapid screening of a parameter space of hybrid composites, comprised by carbon nanotubes and poly-3-hexylthiophene (CNTs:P3HT) for thermoelectric (TE) applications, with Bayesian optimization embedded in the feedback loop in order to explore the space in a more efficient fashion.
The parameter space under scrutiny includes 4 types of nanotubes (both single- and multiwall), 16 CNTs:P3HT ratios, 2 solvents (o-DCB and chloroform) and 3 doping conditions, which result in 384 unique combinations of synthesis parameters. The setup uses a robotic pipettor and a microfluidic flow reactor with an XY stage for automatic drop casting. We synthesize more than 150 samples per hour, compared with 4 samples per hour using traditional manual procedures. Characterization was done by means of hyperspectral imaging and 4-point-probe measurements to ascertain optical properties and electrical conductivity.
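As a rough illustration of the size of this parameter space, the sketch below enumerates the full factorial grid. Only the counts and the two solvent names come from the abstract; the nanotube, ratio, and doping labels are hypothetical placeholders, and the throughput figures are back-of-the-envelope estimates from the stated rates.

```python
from itertools import product

# Placeholder labels: the abstract gives only the counts (and the two solvents),
# so the nanotube, ratio, and doping names below are hypothetical.
nanotube_types = [f"CNT_type_{i}" for i in range(1, 5)]     # 4 nanotube types
ratios = [f"ratio_{i}" for i in range(1, 17)]               # 16 CNTs:P3HT ratios
solvents = ["o-DCB", "chloroform"]                          # 2 solvents
doping_conditions = [f"doping_{i}" for i in range(1, 4)]    # 3 doping conditions

# Full factorial grid: one tuple per unique synthesis recipe.
design_space = list(product(nanotube_types, ratios, solvents, doping_conditions))
n_recipes = len(design_space)

# Time to synthesize every recipe once at the two stated rates (samples per hour).
hours_automated = n_recipes / 150   # upper bound, since the rate is "more than 150"
hours_manual = n_recipes / 4

print(n_recipes, round(hours_automated, 2), hours_manual)
```

Under these assumptions the whole grid could be synthesized once in under three hours on the automated setup, against roughly four days of continuous manual work.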
The conductivity data from the initial experiments were used to train the ML algorithm. After training, the algorithm inferred new experimental conditions for achieving the highest possible conductivity, closing the feedback loop. In the end, 3 iterations of experiments yielded a conductivity higher than 150 S cm⁻¹.
Further work includes optimization of the experimental setup as well as the ongoing effort to use hyperspectral imaging to bypass profilometry, which is both the bottleneck step and the main source of error.
10:45 AM - MT02.07.09/MT03.08.09
Data-driven Materials Design of Halide Perovskites for Photovoltaic Applications
Shijing Sun1,Noor Titan Putri Hartono1,Felipe Oviedo1,Zekun Ren1,Janak Thapa1,Zhe Liu1,Armi Tiihonen1,Ian Marius Peters1,Juan Pablo Correa Baena2,Tonio Buonassisi1,Savitha Ramasamy3
Massachusetts Institute of Technology1,Georgia Institute of Technology2,Institute of Infocomm Research3
To meet increasing global energy demand, it is critical yet challenging to explore new methods to accelerate the development of novel energy materials. In recent years, high-throughput experimentation (HTE) and machine-learning techniques have become increasingly accessible to scientific researchers. We herein demonstrate a case study on the data-driven design of perovskite-inspired materials for photovoltaic applications, where we employed machine-learning techniques to guide the synthesis of new halide perovskites. Halide perovskites (ABX3, where A = Cs, methylammonium (MA), formamidinium (FA); B = Pb, Sn; and X = Cl, Br, I) have shown great promise as light absorbers. Solar cells based on perovskites have surprised the energy community as an emerging low-cost photovoltaic technology with a record power conversion efficiency (24.2%) now exceeding that of polycrystalline Si cells (22.3%). In this study, we developed a high-throughput experimental platform for thin-film synthesis and characterization, and investigated 75 unique perovskite compositions of interest for energy-harvesting applications in a two-month period. To achieve desired optoelectronic properties, we established a set of selection criteria for screening. A deep neural network is employed to classify compounds into 0D, 2D, and 3D perovskite structures via analysis of X-ray diffraction patterns. The combination of fast synthesis and machine-learning assisted data diagnostics achieves an acceleration of over an order of magnitude per experimental learning cycle over our laboratory baseline. Among the 41 Pb-free perovskite compositions studied, we identified the optimised doping level in a multi-site alloy series, Cs3(Bi1-xSbx)2(I1-xBrx)9, that exhibits the desired structural (2D) and optical (< 2 eV) properties.
Our work contributes to the prospect of automated materials discovery and we envision an accelerated development in functional materials in the next decade aiming to provide new energy solutions.
[1] NREL. National Renewable Energy Laboratory, Best Research Cell Efficiencies. http://www.nrel.gov/ncpv/images/efficiency_chart.jpg (accessed June 14, 2019).
[2] Oviedo, F.; Ren, Z.; Sun, S.; Settens, C.; Liu, Z.; Hartono, N. T. P.; Ramasamy, S.; DeCost, B. L.; Tian, S. I. P.; Romano, G.; et al. Fast and Interpretable Classification of Small X-Ray Diffraction Datasets Using Data Augmentation and Deep Neural Networks. npj Comput. Mater. 2019, 5 (1), 60.
[3] S. Sun, N. T. P. Hartono, Z. D. Ren, F. Oviedo, A. M. Buscemi, M. Layurova, D. X. Chen, T. Ogunfunmi, J. Thapa, S. Ramasamy, C. Settens, B. L. DeCost, A. G. Kusne, Z. Liu, S. I. P. Tian, I. M. Peters, J. P. Correa-Baena and T. Buonassisi, Joule, DOI: 10.1016/j.joule.2019.05.014.
11:00 AM - MT02.07.10/MT03.08.10
Application of Variational Autoencoders to Create Thin Film Structure Zone Diagrams
Lars Banko1,Yury Lysogorskiy1,Ralf Drautz1,Alfred Ludwig1
Ruhr-Universität Bochum1
Structure zone diagrams (SZD) are frequently used to estimate thin film microstructures based on a few chosen synthesis parameters. Despite their usefulness, the predictive power of classical SZD is very limited due to the complexity of the synthesis-microstructure relationship of thin films. Furthermore, the complicated interplay of many synthesis parameters and compositional complexity hinders generalisation. Classical SZD have in common that they are based on a small number of observations. Underlying trends were extracted through the scientists' expertise and abstracted, in a creative process, into a diagram representation of microstructural features. Several refined SZD were proposed, which incorporated more physical parameters. With emerging developments in combinatorial thin film synthesis and high-throughput characterization, fast, high-quality acquisition of microstructure data is now possible. This, together with progress in machine learning on images, now provides tools to handle complex image data and improve SZD. We present a dataset containing > 100 samples of SEM surface images from Cr-Al-O-N material libraries, each featuring a different chemical composition and synthesis condition such as deposition temperature, ion energy and sputter frequency of high power impulse magnetron sputtering (HiPIMS). We train convolutional variational autoencoders (VAE) on this dataset of augmented SEM surface image data. Results show that VAEs can cluster microstructure data through latent space representations. The performance of different neural network architectures is discussed. The VAEs' generative capabilities to predict SEM surface images from chemical composition and synthesis parameters are investigated. By sampling the latent representation, we are able to generate SZDs for different variations and combinations of input parameters such as temperature, ion energy and chemical composition.
The qualitative trends which we observe demonstrate the prediction of microstructure by generative deep learning models.
11:15 AM - MT02.07.11/MT03.08.11
Generative Adversarial Networks with Molecular Graph Convolution for Learning Secondary Structures of Functional Biomolecules
Siddharth Rath1,Oliver Nakano-Baker1,Jonathan Francis-Landau1,Ximing Lu1,Kevin Jamieson1,Burak Ustundag1,2,Mehmet Sarikaya1
University of Washington1,Istanbul Teknik Universitesi2
Generative models, a recent paradigm in machine learning, have revolutionized the industry by generating ‘natural looking’ data. While such models have found limited applications in the domain sciences, they display untapped potential in generating materials or molecular structures commensurate with target properties and desired functionalities. While the protein folding problem has been addressed previously by multilevel computational methods and various deep convolutional neural networks, the key step of encoding atomic structures for computational treatment remains a challenge. Historical efforts have focused on pre-process featurization that relies upon traditional string representation without any structural information, expert-designed heuristics-based inputs, or on volumetric modeling that presumes a specific predetermined conformation without associated functionality. Here we demonstrate the first implementation of generative models, more precisely, generative adversarial networks with graph encoding of atomic connectivity within the biomolecules, for data-driven prediction of peptide and protein conformations associated with particular functionalities such as binding to atomically flat surfaces and biomineralization. In the graph input, atoms are considered as nodes and bonds as edges, while angles in the molecule are encoded as a third-order tensor between any three nodes. The generator tries to output secondary structures in terms of the bond edges and angle tensors, while the discriminator network learns from existing sequences and their secondary structures from pdb files and MD simulations. We test the predicted results with MD simulations as well as circular dichroism experiments.
Results show that the generative model developed herein is generalizable to any functionality and more accurate than existing methodologies for predicting functionality-associated peptide conformations for practical implementations in disease diagnostics, drug screening, biosensing and bioelectronic devices. As part of the Materials Genome Initiative, the research is supported by the NSF-DMREF program through the grant DMR-1629071.
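The graph input described above (atoms as nodes, bonds as edges, angles as a third-order tensor) can be pictured with a small data-structure sketch. This is only an illustration on a water molecule, not the authors' encoding: the index convention (first atom, central atom, last atom) and the use of plain nested lists are assumptions made here.

```python
# Toy molecule: water. Atom 0 is the oxygen, atoms 1 and 2 are the hydrogens.
atoms = ["O", "H", "H"]
bonds = [(0, 1), (0, 2)]          # O-H bonds as undirected edges
angles = {(1, 0, 2): 104.5}       # H-O-H angle in degrees, keyed (end, centre, end)

n = len(atoms)

# Second-order structure: symmetric adjacency matrix of bonds.
adjacency = [[0] * n for _ in range(n)]
for i, j in bonds:
    adjacency[i][j] = 1
    adjacency[j][i] = 1

# Third-order structure: angle_tensor[i][j][k] holds the angle i-j-k
# (j is the central atom); entries without a defined angle stay 0.0.
angle_tensor = [[[0.0] * n for _ in range(n)] for _ in range(n)]
for (i, j, k), theta in angles.items():
    angle_tensor[i][j][k] = theta
    angle_tensor[k][j][i] = theta  # the angle is the same read in either direction

print(adjacency)
print(angle_tensor[1][0][2])
```

For real peptides the tensor is very sparse, so a dictionary keyed by atom triples (as in `angles`) would usually be the more economical storage.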
MT02.08/MT03.09: Joint Session: Machine Learning Augmented High-Throughput Experimentation II
Wednesday PM, December 04, 2019
Hynes, Level 2, Room 210
1:30 PM - MT02.08.01/MT03.09.01
Prediction Interpretability in Data-Driven Materials Development
Julia Ling1,Astha Garg1,James Peerless1,Erin Antono1,Edward Kim1,Yoolhee Kim1,Nils Persson1,Malcolm Davidson1
Citrine Informatics1
Sequential learning is a data-driven workflow for accelerating materials development. In this iterative workflow, machine learning models are used to explore a “design space,” the set of possible experiments that could be performed, to surface promising candidate materials. Experimental data for those candidate materials are used to retrain the models so that they can provide successively better-informed suggestions.
For sequential learning to be effective, a relevant design space of candidate materials must first be constructed. These design spaces often include complex constraints, as well as a mix of continuous and categorical variables. The machine learning model can be used to sift through the design space to surface the most promising candidates. For these top candidates, it is valuable to have insights into how the model made its predictions and why it predicts high performance. Interpretability analysis can increase confidence in the model predictions, uncover sample bias in the underlying training data, and provide information on the robustness of the predicted performance. This talk will discuss approaches for constructing relevant design spaces and for interpreting model predictions, and show how these approaches fit into the overall sequential learning workflow.
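The iterative workflow described above can be sketched in a few lines. This is a minimal toy, not the speakers' implementation: `run_experiment` is a hypothetical stand-in for a real measurement, and the nearest-neighbour surrogate with a distance bonus is a deliberately simple substitute for a trained machine learning model with uncertainty estimates.

```python
import random

def run_experiment(x):
    """Hypothetical stand-in for a real measurement: a hidden toy objective."""
    return -(x - 0.7) ** 2

def predict(x, data):
    """Toy surrogate: value of the nearest measured point plus a distance bonus
    that stands in for model uncertainty (it favors unexplored regions)."""
    nearest_x, nearest_y = min(data, key=lambda d: abs(d[0] - x))
    return nearest_y + 0.5 * abs(nearest_x - x)

def sequential_learning(design_space, n_initial=3, n_rounds=10, seed=0):
    rng = random.Random(seed)
    untested = list(design_space)
    rng.shuffle(untested)
    # Seed the loop with a few initial experiments.
    data = [(x, run_experiment(x)) for x in untested[:n_initial]]
    untested = untested[n_initial:]
    for _ in range(n_rounds):
        # "Retrain" (here: reuse the data directly) and rank untested candidates.
        best = max(untested, key=lambda x: predict(x, data))
        untested.remove(best)
        # Run the suggested experiment and feed the result back in.
        data.append((best, run_experiment(best)))
    return data

# One-dimensional design space: 51 evenly spaced candidate settings in [0, 1].
design_space = [i / 50 for i in range(51)]
history = sequential_learning(design_space)
best_x, best_y = max(history, key=lambda d: d[1])
print(len(history), best_x, best_y)
```

A realistic design space would mix continuous and categorical variables and carry constraints, as the abstract notes; the loop structure (suggest, measure, retrain) stays the same.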
2:00 PM - MT02.08.02/MT03.09.02
Network Theory Meets Materials Science
Muratahan Aykol2,Vinay Hegde1,Christopher Wolverton1
Northwestern University1,Toyota Research Institute2
One of the holy grails of materials science, unlocking structure-property relationships, has largely been pursued via bottom-up investigations of how the arrangement of atoms and interatomic bonding in a material determine its macroscopic behavior. Here we consider a complementary approach, a top-down study of the organizational structure of networks of materials, based on the interaction between materials themselves. We demonstrate the utility of applying network theory to materials science in two applications: First, we unravel the complete “phase stability network of all inorganic materials” as a densely-connected complex network of 21,000 thermodynamically stable compounds (nodes) interlinked by 41 million tie-lines (edges) defining their two-phase equilibria, as computed by high-throughput density functional theory. We find that the node connectivity in the materials network has a lognormal distribution, and the connectivity decreases with the number of elemental constituents in a material. Analyzing the topology of this network of materials has the potential to uncover new knowledge inaccessible from traditional atoms-to-materials paradigms. Using the connectivity of nodes in this phase stability network, we derive a rational, data-driven metric for material reactivity, the “nobility index”, and quantitatively identify the noblest materials in nature. Second, we apply network theory to the problem of synthesizability of inorganic materials, a grand challenge for accelerating their discovery using computations. We combine the above phase stability network with timelines for the first experimental synthesis of each compound from literature citations. This allows us to create a time-dependent network, and from the time-evolution of the underlying network properties, we use machine-learning to predict the likelihood that hypothetical, computer generated materials will be amenable to successful experimental synthesis. ** In collaboration with S. Kirklin, L. Hung, S. Suram, P. Herring, and J. Hummelshoj
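To make the notion of node connectivity concrete, the sketch below counts tie-lines per phase in a tiny made-up network and then estimates the mean connectivity implied by the figures quoted in the abstract. The toy phases and tie-lines are hypothetical, and the code does not attempt to reproduce the "nobility index", whose definition is not given here.

```python
from collections import defaultdict

# Hypothetical toy tie-lines: each pair of phases is in two-phase equilibrium.
tie_lines = [("A", "B"), ("A", "AB2"), ("B", "AB2"), ("A", "C"), ("AB2", "C")]

def node_connectivity(edges):
    """Return the number of distinct tie-line partners (degree) of each phase."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {node: len(adjacent) for node, adjacent in neighbors.items()}

degree = node_connectivity(tie_lines)

# Mean connectivity of the full network from the quoted (rounded) figures:
# every tie-line contributes to the degree of two nodes.
n_nodes = 21_000
n_edges = 41_000_000
mean_degree = 2 * n_edges / n_nodes

print(degree)
print(round(mean_degree))
```

With the rounded numbers from the abstract, each stable compound has on the order of 3,900 tie-lines on average, which is what "densely-connected" means in practice.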
2:30 PM - MT02.08/MT03.09
3:30 PM - MT02.08.03/MT03.09.03
A Database to Enable the Discovery and Design of Atomically Precise Nanoclusters
Sukriti Manna1,Peter Lile1,Alberto Hernandez1,Tim Mueller1
Johns Hopkins University1
Atomically precise nanoclusters can be used for numerous applications due to the unique properties they possess. Despite their wide range of applications, the structures and properties of many small elemental clusters remain unknown. We present “The Quantum Cluster Database,” an open-access source containing the structures and properties of tens of thousands of cluster structures of up to 55 atoms for 55 elements. The structures are compared against previous computational and experimental data where available. We discuss the methods that are being used to accelerate the construction of the database and describe how the database can be accessed for cost-effective, data-driven materials design.
3:45 PM - MT02.08.04/MT03.09.04
Data Driven Experimental Discovery of New Nitride Materials
Andriy Zakutayev1,Sage Bauers1,Elisabetta Arca1,Wenhao Sun2,Chris Bartel3,John Perkins1,Aaron Holder3,Stephan Lany1,Gerbrand Ceder2
National Renewable Energy Laboratory1,University of California, Berkeley2,University of Colorado Boulder3
New materials enable new technologies, so discovery of new materials is one of the most important directions in materials research. Oxides and some other materials chemistries, which have been extensively explored in the past, yielded many spectacular properties. Other chemistries, such as nitrides, have been barely touched: for every 14 documented oxides there is only 1 known nitride.
We will present data-driven experimental discovery of new nitride materials, focusing on experimental synthesis and characterization, while also featuring computational predictions and machine learning. Data mining followed by first-principles calculations and machine learning analysis indicated that there are 93 unexplored ternary metal nitride chemical spaces, with 244 new predicted stable ternary materials, and explained the stability trends among these and other nitrides [1].
Experimental synthesis using high-throughput combinatorial methods realized 7 of these compounds, including Zn-M-N (M = Sb, Mo, W) with wurtzite-derived crystal structures and Mg-TM-N (TM = Nb, Ti, Zr, Hf) with rocksalt-derived crystal structures. Physical property characterization of the ternary rocksalts indicates that they are semiconductors with 1.8-2.1 eV optical absorption onsets and large dielectric constants [2]. The Zn-Sb-N wurtzite is the first reported Sb-containing nitride, with Sb in an unusually high 5+ valence state, and shows measured room-temperature photoluminescence near the solar-matched band gap of 1.6-1.7 eV [3].
Overall, these results both demonstrate the power of data-driven materials discovery and suggest that many previously unreported nitride materials remain to be synthesized.
[1] W. Sun et al., Nature Materials (2019), DOI: 10.1038/s41563-019-0396-2
[2] S. Bauers et al., Proc. Nat. Acad. Science (2019), DOI: 10.1073/pnas.1904926116
[3] E. Arca et al., Materials Horizons (2019), DOI: 10.1039/c9mh00369j
4:00 PM - MT02.08.05/MT03.09.05
Active Learning for Nanophotonic Design via Multi-Fidelity Physical Models
Katherine Fountaine2,Harry Atwater1,Jialin Song1,Yury Tokpanov1,Yuxin Chen1,Dagny Fleischman1,Yisong Yue1
California Institute of Technology1,Northrop Grumman Corporation2
We have explored the design of nanophotonic structures, such as subwavelength-scale spectral filters, using an advanced active machine learning algorithm that efficiently explores multiple physical models with different approximation fidelities and costs. Our method, which is applicable to a variety of nanophotonics optimization problems, employs a novel strategy consisting of a mutual information based multi-fidelity Gaussian process optimization algorithm (MF-MI-Greedy). It consists of two components: an exploratory procedure to gather information about the target (i.e., the highest fidelity) function via querying lower fidelity functions, followed by an exploitative procedure to optimize the target level fidelity with the previously gathered information. Our results on several pre-collected nanophotonics datasets demonstrate the compelling performance of the multiple-fidelity Bayesian optimization approach. These experiments suggest that there is a significant potential in utilizing cheap, multi-fidelity simulations to aid the discovery of optimal photonic nanostructures.
4:30 PM - MT02.08.06/MT03.09.06
Accelerating Materials Discovery through Rapid Construction of Processing Phase Diagrams
Duncan Sutherland1,Aine Connolly1,Sebastian Ament1,Michael Thompson1,Carla Gomes1,Bruce van Dover1
Cornell University1
Exhaustive experimental mapping of non-equilibrium processing phase diagrams demands a prohibitively large allocation of resources for even a single realistic system with more than two compositional degrees of freedom, even with current state-of-the-art high-throughput techniques. Advanced data analysis methods are thus called for to accelerate such explorative efforts, focusing on multimode analysis of critical boundary points in the phase diagram where transitions are observed. Here, we present a hierarchical, prioritized data analysis structure to optimize usage of costly experimental resources. By combining data analysis methods based on optical characterization and x-ray diffraction with sophisticated active learning algorithms, we can efficiently map phase boundaries in composition-time-temperature processing phase diagrams. We demonstrate the utility of our approach by constructing processing phase diagrams for spike-annealed multicomponent oxide materials.
4:45 PM - MT02.08.07/MT03.09.07
High-Throughput Screening of Perovskite-Inspired Materials Using Steady-State Photoconductivity and Bayesian Optimization
Felipe Oviedo1,Jose Perea1,Han Yin1,Janak Thapa1,Armi Tiihonen1,Zhe Liu1,Ian Marius Peters1,Shijing Sun1,Rafael Jaramillo1,Tonio Buonassisi1
Massachusetts Institute of Technology1
Hybrid organic-inorganic perovskite solar cells have recently attracted increased scientific interest for their manufacturing simplicity and high performance, challenging the best thin-film photovoltaic devices. Perovskite solar cells have broad and strong light absorption along with excellent transport properties that partly explain their record power conversion efficiency above 24% [1, 2, 3]. Compositional engineering of perovskites is a time-consuming effort. Reaching high efficiency in compositions with complex and diverse dopants usually requires significant trial and error and hundreds of measurements of full solar cells [4]. In addition to the abundance of applied research on this subject, there is great interest in understanding the fundamentals of transport and photophysical properties of various perovskite compositions [5]. High-throughput methods for screening compositions at the film level could be a powerful alternative for investigating the complex perovskite composition space efficiently. Nevertheless, screening for high-efficiency perovskite compositions with high-throughput methods is not yet firmly established, in part due to the complexity of photophysical characterization experiments at the film level. In this work, we report for the first time a combination of a high-throughput conventional steady-state photoconductivity (SS-PC) method to determine diffusion lengths and a Bayesian optimization methodology. This approach allows us to investigate the complex compositional space efficiently by just making films and not full solar cells. By using SS-PC as a proxy for efficiency, we use Bayesian optimization to guide compositional changes and obtain the best solar cell efficiency for a given material, accelerating material screening by 10X.
[1] Silver-Hamill Turren-Cruz et al. Energy Environ. Sci., 2018, 11, 78, DOI: 10.1039/c7ee02901b
[2] Best Research-Cell Efficiencies (NREL, accessed 02 January 2019); https://www.nrel.gov/pv/assets/pdfs/pv-efficiency-chart.20181221.pdf
[3] Jiajun Peng et al. Chem. Soc. Rev., 2017, 46, 5714, DOI: 10.1039/c6cs00942e
[4] Ian L. Braly et al. J. Phys. Chem. Lett. 2018, 9, 3779-3792, DOI: 10.1021/acs.jpclett.8b11520
[5] Y. Chen et al. Nature Communications 7:12253, DOI: 10.1038/ncomms12253
MT02.09: Poster Session III: Machine Learning Augmented High-Throughput Experimentation
Thursday AM, December 05, 2019
Hynes, Level 1, Hall B
8:00 PM - MT02.09.01
Machine Learning for Revealing Aging Mechanisms of Perovskite Solar Cells
Armi Tiihonen1,Shijing Sun1,Jose Perea1,Felipe Oviedo1,Zhe Liu1,Noor Titan Putri Hartono1,Janak Thapa1,Tonio Buonassisi1
Massachusetts Institute of Technology1
Perovskite solar cells are among the most promising new photovoltaic technologies, boasting rapidly rising efficiency records and attracting interest as a component of tandem solar modules. Although the stability of perovskite solar cells is steadily improving and the most stable perovskite devices already pass 1000-hour aging tests under operating conditions, insufficient lifetime still remains a major bottleneck of the technology.
The aging mechanisms of perovskite solar cells need to be understood better in order to suppress them. Several aging mechanisms, such as perovskite decomposition or ion migration in the device, have been identified in the literature, and the observed aging mechanisms depend strongly on the environmental conditions the devices face, such as temperature, humidity, and visible or ultraviolet illumination. More information about the activation and root causes of device degradation is still needed, but the majority of aging tests in the literature do not demonstrate clear degradation of the devices. The next step is to shift the focus of the research community towards longer aging tests that produce clear degradation, accompanied by detailed analysis [1-2]. New working methods, such as machine learning, are required for extracting all the possible information from these laborious aging tests.
In this contribution, we create a comprehensive mapping of the degradation of perovskite solar cells by aging devices until clear degradation under several different combinations of environmental stress conditions, such as illumination, increased humidity, and high temperature. Automated measurement systems are used to collect densely sampled aging data with sample sizes that are statistically significant. We use machine learning methods to analyze the relatively large data set our approach produces. In this way, we are able to extract more refined information on the activation and interlinking of the detected aging mechanisms.
[1] C. C. Boyd, R. Cheacharoen, T. Leijtens, and M. D. McGehee, “Understanding Degradation Mechanisms and Improving Stability of Perovskite Photovoltaics,” Chemical Reviews, vol. 119, no. 5, pp. 3418–3451, Mar. 2019.
[2] A. Tiihonen, K. Miettunen, J. Halme, S. Lepikko, A. Poskela, and P. D. Lund, "Critical analysis on the quality of stability studies of perovskite and dye solar cells," Energy & Environmental Science, vol. 11, no. 4, pp. 730-738.
8:00 PM - MT02.09.02
High-Throughput Discovery of Next Generation Sequencing-Based Peptide-Guided New Materials via Machine Learning
Jacob Rodriguez1,Deniz Yucesoy1,Siddharth Rath1,Jason Stephany1,Doug Fowler1,Mehmet Sarikaya1
University of Washington1
A crucial problem in the development of nanobiotechnology is control of the interface between the organic molecule and the substrate. Mastery of the field would enable highly sensitive biosensors capable of detecting disease biomarkers, affinity-tunable drugs, and efficient synthesis of solid biomaterials, among many other applications. Peptides are short amino acid sequences (4-40 long) that can have diverse conformational states when placed near a bias (e.g., 2-D materials such as MoS2, electrical fields, etc.) and/or in solution (e.g., PBS, deionized water, etc.). These sequences can be found in larger proteins or designed in silico when the conformational behavior is often well-defined. Machine learning approaches may be used in many applications, including drug design, and continue to produce successful results. However, large data sets are required to train the ML algorithms so that they can identify the dominant features. In a convergent science approach, here we identified three biological replicas of ~2.5 million unique 12-amino acid long peptide sequences with affinity for MoS2 using a novel method that combines Next-Generation Sequencing and Phage-Display directed evolution approaches. The massive size of this dataset grants a much greater understanding of why peptides bind to MoS2 upon development of the correct model or a large amount of experimental validation. We have developed a multitude of machine learning applications for the prediction of binding behavior, including linear regression and recurrent neural network methods. Both methods seek to predict the functional affinity of this massive MoS2 binding set using the data from the combinatorial libraries that are indirectly related to the solid-binding affinity. Further, the diversity in our approach yielded informative conclusions about the types of ML applications appropriate for this genetic dataset.
Our results show that the functionality of peptides can be predicted using linear regression within and across replicates with high accuracy (0.75 to 0.9 Pearson/Spearman Rank Correlation). Encouraged by the preliminary results, we are currently implementing ML-assisted directed evolution experiments along with recurrent neural networks to determine the evolution path for high throughput labelling towards creating custom libraries. The research is supported by the NSF-DMREF program through the grant DMR-1629071 as part of the Materials Genome Initiative.
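For readers unfamiliar with the two accuracy metrics quoted above, the following sketch computes the Pearson correlation and the Spearman rank correlation between predicted and measured values. The five data points are invented purely for illustration and have no connection to the peptide dataset.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Invented example values (not from the study).
predicted = [0.1, 0.4, 0.35, 0.8, 0.9]
measured = [0.2, 0.3, 0.5, 0.7, 1.0]

print(pearson(predicted, measured))
print(spearman(predicted, measured))
```

Pearson measures how well a straight line relates the two quantities, while Spearman only asks whether their orderings agree, so it is less sensitive to outliers and to monotonic but nonlinear relationships.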
8:00 PM - MT02.09.03
Data Driven Analysis of Dielectric Constants in Inorganic Materials
Kazuki Morita1,Daniel Davies1,Keith Butler2,1,Aron Walsh1,3
Imperial College London1,Rutherford Appleton Laboratory2,Yonsei University3
Dielectric constants are crucial to understanding the optical and electric properties of materials. A low-cost predictor of dielectric behaviour would be highly valuable in materials science. Models such as Clausius-Mossotti and Penn's model are well known and have been used intensively; however, they are known to hold only for limited types of compounds. Previously, Han and co-workers screened ~2000 compounds to find high dielectric constant materials with a large band gap [1,2]. They reported several materials that do not follow the existing models. Simple physical models are generally inadequate for capturing the trends in dielectric constants, which derive from many-body interactions. We propose the use of statistical approaches to describe them. We train multiple machine learning models using a database of 1636 compounds. Our analysis shows that some models were successful in capturing the complex trends in dielectric behaviour. Compared with conventional models, the machine learning model gave an order of magnitude improvement in predictive power. The chemical trends in dielectric constants will also be discussed.
[1] Novel high-κ dielectrics for next-generation electronic devices screened by automated ab initio calculations. NPG Asia Mater. 7, e190 (2015); https://doi.org/10.1038/am.2015.57
[2] High-throughput ab initio calculations on dielectric constant and band gap of non-oxide dielectrics. Sci. Rep. 8, 14749 (2018); https://doi.org/10.1038/s41598-018-33095-6
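As background on the simplest of the conventional models named above, the textbook Clausius-Mossotti relation in SI units reads (ε_r − 1)/(ε_r + 2) = Nα/(3ε₀), with N the number density of polarizable units and α their polarizability. The sketch below simply solves this relation for ε_r; the function name and the input values in the usage lines are illustrative assumptions, not quantities from the abstract.

```python
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in F/m

def clausius_mossotti_permittivity(number_density, polarizability):
    """Relative permittivity from the Clausius-Mossotti relation (SI units).

    number_density: polarizable units per cubic metre (N)
    polarizability: polarizability of one unit in C m^2 / V (alpha)
    """
    # x is the right-hand side of (eps_r - 1) / (eps_r + 2) = N * alpha / (3 * eps_0).
    x = number_density * polarizability / (3.0 * EPSILON_0)
    if x >= 1.0:
        # The relation diverges at x = 1 and gives no physical answer beyond it.
        raise ValueError("N * alpha / (3 * eps_0) must be below 1")
    # Solving (eps_r - 1) / (eps_r + 2) = x for eps_r.
    return (1.0 + 2.0 * x) / (1.0 - x)

# Illustrative inputs chosen so that x = 0.5, which gives eps_r = 4.
print(clausius_mossotti_permittivity(1.0, 1.5 * EPSILON_0))
```

The divergence as x approaches 1 is one reason the relation holds only for limited classes of compounds.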
8:00 PM - MT02.09.04
A Semi-Automatic Pipeline for Efficient and Sustained Polymer Data Capture
Pranav Shetty1,Rampi Ramprasad1
Georgia Institute of Technology1
Machine Learning (ML) has enabled huge strides in fields as diverse as Computer Vision, Natural Language Processing, and Robotics. Materials Science has also seen a flurry of work in recent years involving the use of statistical techniques to predict material properties. ML models are data-hungry, and their predictive accuracy improves asymptotically with the amount of data fed into them. As most materials property data is in journal papers and not in an easy-to-use database, prospective researchers must painstakingly extract this data manually. We hope to address this issue in the polymer space by building a framework that can automatically extract polymer property data from published literature. Data can be extracted systematically from thousands of papers to generate insight that would be very difficult to obtain manually.
Our framework aims to capture data from text and from tables in published literature. We generate word vectors, a mathematical way of representing words in a high dimensional space, from our corpus of polymer papers and use those to tokenize the text and thus extract polymer property information. We extract properties such as glass transition temperature, polymer melting temperature, dielectric constant, and refractive index from tens of thousands of papers to generate on the order of thousands of data points for each property. Similar work has been done on extracting synthesis parameters for inorganic materials from the literature, but this is the first such framework for the polymer domain. With the extracted data, we can power ML models that can be used to accelerate materials discovery and design new polymers.
[1] Pedro Ballester and Ricardo Matsumura Araujo. On the performance of googlenet and alexnet applied to sketches. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[2] Damodar Reddy Edla, Pawan Lingras, and K Venkatanareshbabu. Advances in Machine Learning and Data Science: Recent Achievements and Research Directives, volume 705. Springer, 2018.
[3] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013.
[4] Edward Kim, Kevin Huang, Adam Saunders, Andrew McCallum, Gerbrand Ceder, and Elsa Olivetti. Materials synthesis insights from scientific literature via text extraction and machine learning. Chemistry of Materials, 29(21):9436–9444, 2017.
[5] Zach Jensen, Edward Kim, Soonhyoung Kwon, Terry ZH Gani, Yuriy Roman-Leshkov, Manuel Moliner, Avelino Corma, and Elsa Olivetti. A machine learning approach to zeolite synthesis enabled by automatic literature data extraction. ACS Central Science, 2019.
MT02.10: High-Throughput Experimentation and Machine Learning I
Thursday AM, December 05, 2019
Hynes, Level 2, Room 210
8:30 AM - MT02.10.01
Beyond Just Fitting Numbers—Artificial Intelligence for Identifying Statistically Exceptional Materials
Luca Ghiringhelli1,Matthias Scheffler1,2
Fritz-Haber-Institut der MPG1,Humboldt-Universität zu Berlin2
Several issues hamper progress in data-driven computational science. In particular, these are the lack of a FAIR data infrastructure and of appropriate data-analytics methodology.
Significant efforts are still necessary to fully realize the A and I of FAIR. Here the development of metadata, their intricate relationships, and data ontology need more attention. Obviously, a FAIR data infrastructure, to be accepted by the community, should work without bureaucratic hurdles or the need for special training. In this talk, I will discuss the challenges and progress, focusing on computational materials science.
Concerning the data analytics, we note that the number of possible materials is practically infinite, but only 10 or 100 of them may be relevant for a certain science or engineering purpose. In simple words, in materials science and engineering, we are often looking for “needles in a haystack”. Fitting or machine-learning all data (i.e. the hay) with a single, global model may average away the special features of the interesting minority (i.e. the needles). I will discuss methods that identify statistically exceptional subgroups in a large amount of data, and I will discuss how one can estimate the domains of applicability of machine-learning models.
1) FAIR stands for Findable, Accessible, Interoperable and Re-usable. The FAIR Data Principles; https://www.force11.org/group/fairgroup/fairprinciples
2) C. Draxl and M. Scheffler, Big-Data-Driven Materials Science and its FAIR Data Infrastructure. Plenary Chapter in Handbook of Materials Modeling (eds. S. Yip and W. Andreoni), Springer (2019). https://arxiv.org/ftp/arxiv/papers/1904/1904.05859.pdf
3) Ch. Sutton, M. Boley, L. M. Ghiringhelli, M. Rupp, J. Vreeken, M. Scheffler, Domains of Applicability of Machine-Learning Models for Novel Materials Discovery, to be published.
9:00 AM - MT02.10.02
Crystallographic Information and Thermoelectric Properties Obtained from High-Throughput Experiments of Ca1-xBixMnO3 Powder
Kenjiro Fujimoto1,Yusuke Yamada1,Akihisa Aimi1,Keishi Nishio1,Shingo Maruyama2
Tokyo University of Science1,Tohoku University2
Materials prediction by machine learning requires large amounts of data, which in many cases are supplemented by text mining and computational chemistry simulation. I think that the development of high-throughput experimental tools should also continue, in order to fill gaps in the diversity of data and to construct new databases.
In high-throughput materials preparation and evaluation, we handle several hundred samples in one day. For example, in the conventional method for synchrotron powder X-ray diffraction, we have to fill fine capillaries (ca. 0.2 mm in diameter) with well-ground powder. Sample filling alone takes at least 10 hours when we measure 100 samples in one day. Therefore, we have made a prototype for efficient, high-throughput evaluation in synchrotron X-ray powder diffraction measurements. Tools made with 3D printers make it possible to continuously measure powder libraries transferred to tape.
In this study, as an example, a powder library of perovskite-type Ca1-xBixMnO3 was prepared, and X-ray diffraction data were obtained using our high-throughput evaluation tool. We then studied which parameters contributed to the improvement of the thermoelectric performance, using crystallographic information obtained from our automated Rietveld analysis program.
As the amount of Bi substitution increased, the lattice constant increased linearly, following Vegard's law. From the changes in the Mn-O bond length and the tilt angle of the MnO6 octahedron with Bi substitution, a decrease in electrical conductivity would be expected. However, the power factor (PF = S² × σ) increased as the substitution amount increased. These results suggest that the increase in PF is related more strongly to the carrier concentration than to the other parameters.
These XRD and XAFS experiments were conducted at the BL5S1 and BL5S2 of Aichi Synchrotron Radiation Center, Aichi Science & Technology Foundation, Aichi, Japan (Approval No.2018P0104).
9:15 AM - MT02.10.03
A High-Throughput Study of Refractory High-Entropy Alloys Guided by Machine Learning
Howie Joress1,Nils Persson1,Brian DeCost1,Jason Hattrick-Simpers1
National Institute of Standards and Technology1
While refractory high-entropy alloys (HEAs) have shown great promise as candidates for the next generation of high-temperature materials, identifying the alloys with the best properties is challenging due to the vast composition-processing space they occupy. In this work we are particularly interested in HEAs with corrosion resistance at high temperature. Even if we only consider quaternary alloys with the 10 elements typically included in refractory HEAs (RHEAs), there are still nearly 2.5 million alloy combinations. Here we will discuss our work combining high-throughput combinatorial thin-film synthesis, rapid characterization, and machine learning to quickly explore the RHEA space. We begin by using available materials datasets to train a machine learning model using the Citrine machine learning platform. The model will correlate alloy composition with phase as well as mechanical properties. We then fabricate continuous-spread combinatorial libraries by magnetron sputtering, each library having a range of alloy compositions (effectively nearly 200 discrete compositions). We rapidly characterize these alloys for phase using X-ray diffraction, followed by high-temperature oxidation and corrosion resistance. For oxidation resistance we oxidize the films at temperatures up to 1000 °C and examine the oxidation products. We perform rapid electrochemical corrosion screening using a scanning droplet cell. As we measure alloy properties we catalog the results and use them to expand the model to predict oxidation and corrosion resistance.
9:30 AM - MT02.10.04
Accelerating Generation of Fundamental Materials Insights by Analyzing Machine Learning Models
Mitsutaro Umehara1,2,Helge Stein1,Dan Guevarra1,Paul Newhouse1,David Boyd1,John Gregoire1
California Institute of Technology1,Toyota Motor North America2
Machine learning has been applied in several research fields of materials science. A primary role of scientists is the extraction of fundamental knowledge from data, and we demonstrate that this extraction can be accelerated using neural networks via analysis of the trained model itself rather than its application as a prediction tool. We trained a convolutional neural network (CNN) model that predicts the photoelectrode performance (photogeneration power for the light-driven oxygen evolution reaction) of BiVO4:(Mo, W, Tb, Gd, Dy) from Raman spectra and composition data obtained by high-throughput experiments (HTE). The dataset includes different Bi:V ratios as well as single and co-alloying compositions of each of the 5 alloying elements, resulting in a high-dimensional dataset that is not amenable to manual analysis. Interpretation of this high-dimensional dataset is facilitated by analyzing the trained CNN model through evaluation of local partial derivatives (gradient analysis), revealing key data relationships that are not readily identified by human inspection or traditional analyses. Statistical analysis of the ensemble of ~1 million gradients provides quantitative composition-structure-property and structure-property relationships as well as the similarity of the chemical role of different alloying elements, which collectively provide insight into the fundamental materials science. Furthermore, automated reporting of these key data relationships illustrates a key mechanism by which machine learning methods accelerate scientific discovery.
 Mitsutaro Umehara, Helge S. Stein, Dan Guevarra, Paul F. Newhouse, David A. Boyd & John M. Gregoire, npj Comput. Mater. 5 (2019) 34.
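The gradient analysis described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a tiny fixed-weight tanh network stands in for the trained CNN, the three inputs stand in for composition variables, and the partial derivatives are estimated by central finite differences rather than by whatever method the authors used.

```python
import math
import random

def model(x, w1, b1, w2):
    """Toy stand-in for a trained network: one tanh hidden layer, scalar output."""
    hidden = [math.tanh(sum(xi * wi for xi, wi in zip(x, unit)) + b)
              for unit, b in zip(w1, b1)]
    return sum(h * w for h, w in zip(hidden, w2))

def local_gradient(f, x, eps=1e-5):
    """Central-difference estimate of the partial derivatives of f at point x."""
    grad = []
    for j in range(len(x)):
        up = list(x)
        up[j] += eps
        down = list(x)
        down[j] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

random.seed(0)
n_in, n_hidden = 3, 8  # hypothetical sizes
w1 = [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [random.gauss(0, 1) for _ in range(n_hidden)]
w2 = [random.gauss(0, 1) for _ in range(n_hidden)]

def f(x):
    return model(x, w1, b1, w2)

# Hypothetical "dataset": 200 random input points in the unit cube.
samples = [[random.random() for _ in range(n_in)] for _ in range(200)]
gradients = [local_gradient(f, x) for x in samples]

# Ensemble statistics: mean sensitivity and how often each input raises the output.
mean_gradient = [sum(g[j] for g in gradients) / len(gradients) for j in range(n_in)]
fraction_positive = [sum(g[j] > 0 for g in gradients) / len(gradients)
                     for j in range(n_in)]
```

Summaries such as `mean_gradient` and `fraction_positive` are one simple way to turn a large ensemble of local gradients into statements about which inputs tend to raise or lower the predicted property.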
9:45 AM - MT02.10.05
A Data Driven Approach for the Accelerated Discovery of Photocathode Materials
Evan Antoniuk1,Yumeng Yue1,Yao Zhou2,Bruce Dunham3,Piero Pianetta3,Theodore Vecchione3,Evan Reed1
Stanford University1,Google2,SLAC National Accelerator Laboratory3
In this work, we utilize a data-driven approach for the development of photocathode materials, which have played an integral part in developing modern x-ray light sources. In turn, these high-brightness light sources have enabled exciting discoveries. Recently, the hard x-ray free electron laser (FEL) at SLAC National Accelerator Laboratory has enabled researchers to monitor bond formation in the active site of proteins, optically tune the interlayer interactions in two-dimensional materials and probe the formation of diamonds from laser-compressed hydrocarbons. Through the further development of ultra-high brightness light sources, previously unthinkable experiments may be imagined.
Perhaps the most cost-effective method for improving the performance of FELs is through the discovery of new photocathode materials that can produce higher-brightness beams. However, past efforts to discover photocathode materials have primarily used trial-and-error approaches with very low throughput. To date, fewer than 30 materials have been reported in the literature for photocathode applications. In this work, we screen over 10,000 bulk crystals in the Materials Project database to discover candidate photocathode materials. We utilize the Materials Project calculated electronic band structures as well as a newly developed photoemission model to rapidly identify materials with ideal photoemission properties. To ensure our candidate materials can be readily integrated into photocathode devices, we then filter out materials that have not yet been synthesized.
Following this filtering process, we discover over 300 candidate photocathode materials with a predicted brightness that is 10x larger than the current state-of-the-art photocathodes. We further characterize these high-brightness photocathode materials by performing high-throughput DFT work function calculations including multiple surface terminations and Miller indices. The photoemission properties of the photocathode materials are then further explored by utilizing DFT to calculate the photoexcitation probability of all possible optical excitations in a material. Through close partnerships with experimental collaborators, we discuss the possibilities for these newly discovered photocathode materials to shape the next generation of FELs.
10:30 AM - MT02.10.06
Adding Domain Knowledge and Causality to Materials Informatics
Colorado School of Mines1
Today’s materials science deals with data sets that are much smaller than contemporary statistical methods would require. Historically, this problem has been dealt with using the “modern” scientific method: formulation, via induction, of causally related hypotheses and their measurement-based testing through deduction. However, the complexity of materials’ behavior and the insufficient knowledge of all factors influencing experimental outcomes, in combination with the successes of AI in other fields (e.g. targeted advertising), have motivated a recent surge of activity in applying machine learning, neural networks, random forests and other methods in materials science. While of unquestionable practical value, the statistically obtained relationships suffer from the absence of causality and from non-uniqueness. In this talk I will discuss ways of adding domain knowledge and causality to data-driven materials discovery and design. The approach we recently adopted starts from the appropriate theory and the analytically deduced relationships between quantities of interest. In the next step we replace hard-to-access physical quantities appearing in those relationships with physically motivated proxies that are easily accessible from materials databases or are inexpensive to compute from first principles. The price for this replacement is the introduction of free parameters into the models, which are then obtained by fitting to existing, usually experimental, data. In this way one integrates domain knowledge with materials informatics and creates causal and interpretable models of relevant materials properties. I will review successes in applying such an approach to the discovery and design of novel semiconductors for thermoelectric, photovoltaic and power electronics applications.
11:00 AM - MT02.10.07
Mapping and Understanding Large-Scale Stability Trends across the Ternary Metal Nitrides
Wenhao Sun1,2,Chris Bartel2,3,Elisabetta Arca4,Sage Bauers4,Bethany Matthews5,Janet Tate5,Bor-Rong Chen6,Michael Toney6,Laura Schelhas6,Andriy Zakutayev4,Stephan Lany4,Aaron Holder3,Gerbrand Ceder2
University of Michigan–Ann Arbor1,Lawrence Berkeley National Labs2,University of Colorado Boulder3,National Renewable Energy Laboratory4,Oregon State University5,SLAC National Accelerator Laboratory6
Exploratory synthesis in novel chemical spaces is the essence of solid-state chemistry. However, uncharted chemical spaces can be difficult to navigate, especially when materials synthesis is challenging. Nitrides represent one such space, where stringent synthesis constraints have limited the exploration of this important class of functional materials. Here, we employ a suite of computational materials discovery and informatics tools to survey, visualize, and explain stability relationships across the inorganic ternary metal nitrides. First, we use crystal structure prediction algorithms to probe the stability landscapes of previously unexplored ternary nitride spaces, identifying hundreds of promising new ternary nitrides for further exploratory synthesis. Next, we use unsupervised machine-learning algorithms to cluster together cations with a similar propensity to form stable or metastable ternary nitrides. To visualize these clustered nitride families, we construct a large and comprehensive stability map of the inorganic ternary metal nitrides. Our map reveals broad overarching relationships between nitride chemistry and thermodynamic stability, and inspires us to rationalize these trends from their underlying chemical origins. To do so, we extract from the DFT-computed electron density the mixed metallicity, ionicity, and covalency of solid-state bonding, which we use to formulate data-driven insights into the thermochemical and electronic origins of ternary nitride stability.
 W. Sun et al., “A Map of the Inorganic Ternary Metal Nitrides,” Nature Materials (2019)
11:15 AM - MT02.10.08
Combinatorial Synthesis and High-Throughput Characterization of Microstructure and Phase Transformation in NiTiCu-X Quaternary Thin-Film Libraries for Elastocaloric Cooling
Naila Al Hasan1,Huilong Hou1,Jonathan Counsell2,Tieren Gao1,Suchismita Sarkar3,Sigurd Thienhaus4,Apurva Mehta3,Alfred Ludwig4,Ichiro Takeuchi1
University of Maryland1,Kratos Analytical Ltd.2,SLAC National Accelerator Laboratory3,Ruhr-Universität Bochum4
Ni-Ti based shape memory alloys (SMAs) have found widespread use in the last 70 years, but improving their functional stability remains a key quest for more robust and advanced applications. Named as such due to their ability to retain their processed shape as a result of a reversible martensitic transformation, they are highly sensitive to compositional variations. Alloying with ternary and quaternary elements to fine-tune the lattice parameters and the thermal hysteresis of an SMA therefore becomes a challenge in materials exploration. Combinatorial materials science allows streamlining of the synthesis process and of data management from multiple high-throughput characterization techniques. In this study, composition spreads of Ni-Ti-Cu-X (X = V, Fe, Co) thin film libraries were synthesized by magnetron co-sputtering on thermally oxidized Si wafers. Composition-dependent phase transformation temperature and microstructure were investigated and determined using high-throughput wavelength dispersive spectroscopy, synchrotron x-ray diffraction, and temperature-dependent resistance measurements. Composition-structure-property phase maps for the quaternary systems are used to discuss correlations of functional properties with respect to local microstructure and composition of the thin film libraries. This work was supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE 1322106.
11:30 AM - MT02.10.09
Discovery of Promising Salt Hydrates for Thermal Energy Storage Using High Throughput Computation and Machine Learning
Steven Kiyabu1,Donald Siegel1
University of Michigan1
Salt hydrates demonstrate promise as heat storage materials as they possess high energy densities and reversibility at moderate temperatures. Despite their promise, a great number of salt hydrate compositions have not been explored. The goal of this work is to identify new salt hydrates that can outperform known materials in terms of energy density and are predicted to be thermodynamically stable. A total of 5,292 hypothetical salt hydrates were generated by systematically substituting cations and halides into 76 salt hydrate crystal structures mined from the Inorganic Crystal Structure Database. These hypothetical hydrates were characterized according to their enthalpy of dehydration (from which stability, energy density, and heat-storing temperature are derived) in one of two ways. 2,954 of these hydrates were characterized using high-throughput density functional theory calculations. A machine learning model with extensive feature selection was then developed and trained on these hydrates. The initial set of structural and ionic features was expanded to a set of several thousand physically meaningful combinations of the original features. A two-step feature selection process using LASSO and a genetic algorithm was then employed before the final model was trained. The remaining 2,338 hypothetical salt hydrates were then characterized using this predictive model. Several promising hypothetical hydrates were identified with higher energy densities than experimentally-known salt hydrates. Furthermore, the machine learning model illuminated several new property-performance relationships in salt hydrates.
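The LASSO stage of the two-step feature selection mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic (five hypothetical features, of which only two influence the target), the solver is a plain cyclic coordinate descent written for this example, and the genetic-algorithm stage is not shown.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_sweeps=200):
    """Minimize (1/2n)*||y - X b||^2 + alpha*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_norm = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + coef[j] * X[i][j])
                      for i in range(n)) / n
            new = soft_threshold(rho, alpha) / col_norm[j]
            delta = new - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                coef[j] = new
    return coef

# Synthetic data: only features 0 and 2 carry signal (an assumption for the demo).
random.seed(1)
n_samples, n_features = 200, 5
X = [[random.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [3.0 * row[0] - 2.0 * row[2] + random.gauss(0, 0.1) for row in X]

coef = lasso_coordinate_descent(X, y, alpha=0.05)
selected = [j for j, c in enumerate(coef) if c != 0.0]
```

The L1 penalty drives the coefficients of uninformative features to exactly zero, so `selected` gives a reduced feature set that a second-stage search, such as a genetic algorithm, could then refine.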
11:45 AM - MT02.10.10
Accelerating the Search for Lithium-Ion Conductors with Machine Learning Interatomic Potentials
Koutarou Aoyagi1,2,Chuhong Wang1,Tim Mueller1
Johns Hopkins University1,Toyota Motor Corporation2
All-solid-state lithium-ion batteries are leading candidates for the next generation of batteries, but interfacial resistance needs to be improved for commercialization. Interfacial impedance can be improved through the use of cathode coating materials between the active electrode material and the solid state electrolyte. As experimentally exploring candidate coating materials is time-consuming and resource-intensive, the development of new coating materials can be accelerated by computationally screening candidate materials in a high-throughput manner. One of the most important criteria for coating materials is lithium-ion conductivity. As the mechanism for lithium-ion conduction is generally not known for a candidate coating material, computational assessment of lithium-ion conductivity is best accomplished through molecular dynamics simulations. Sufficiently accurate simulations can be performed using density functional theory, but the computational cost of this approach limits its use to materials that are fast ionic conductors. We demonstrate that this problem can be addressed through the use of machine learning to develop interatomic potentials on the fly. We present a method for using moment tensor potentials, a recently developed class of machine learning interatomic potentials, for automatically calculating lithium-ion conductivity with molecular dynamics simulations and demonstrate that this approach results in improved agreement with experiments over ab initio molecular dynamics while requiring less computing time.
 A. V. Shapeev, Multiscale Model. Simul. 2016, 14, 1153.
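One common way to turn a molecular dynamics trajectory into an ionic conductivity, not necessarily the procedure used in this work, is to fit the diffusivity from the mean squared displacement (MSD = 6Dt in three dimensions) and apply the Nernst-Einstein relation, σ = n q² D / (k_B T), which neglects correlations between ions. The sketch below assumes a synthetic, perfectly linear MSD and hypothetical values for the lithium number density and temperature.

```python
BOLTZMANN = 1.380649e-23             # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C

def fit_slope(times, values):
    """Ordinary least-squares slope of values versus times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den

def diffusivity_from_msd(times, msd, dimensions=3):
    """Einstein relation: MSD = 2 * dimensions * D * t in the diffusive regime."""
    return fit_slope(times, msd) / (2 * dimensions)

def nernst_einstein_conductivity(diffusivity, number_density, temperature,
                                 charge_number=1):
    """sigma = n * q^2 * D / (k_B * T), in S/m for SI inputs."""
    q = charge_number * ELEMENTARY_CHARGE
    return number_density * q ** 2 * diffusivity / (BOLTZMANN * temperature)

# Synthetic MSD (m^2) sampled every picosecond; a real one would come from MD.
true_d = 1.0e-11  # m^2/s, hypothetical
times = [k * 1.0e-12 for k in range(1, 101)]
msd = [6.0 * true_d * t for t in times]

d_li = diffusivity_from_msd(times, msd)
sigma = nernst_einstein_conductivity(d_li, number_density=2.0e28,  # Li per m^3, hypothetical
                                     temperature=300.0)
```

Because slow conductors need long trajectories before the MSD becomes linear, the cost of each MD step largely determines which materials can be assessed this way.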
MT02.11: Experimentation and Machine Learning I
Thursday PM, December 05, 2019
Hynes, Level 2, Room 210
1:30 PM - MT02.11.01
Active Machine Learning for Automating Materials Discovery
Shali Jiang1,Roman Garnett1
Washington University in St. Louis1
In materials science, conducting synthesis and characterization of a proposed material can be extremely expensive, rendering searches for novel materials with a desired property fundamentally difficult. In such situations, it is critical that we allocate limited resources effectively. In "active" machine learning, we consider how to collect the most useful data to achieve our goals effectively with a limited experimental budget. We will introduce active machine learning and discuss a particularly important setting: "active search," where we seek to discover rare, valuable points from a large space of alternatives -- this serves as a simple mathematical model of scientific discovery. We will discuss the surprising difficulty of this problem and introduce efficient, nonmyopic policies to solve it, demonstrating our method on large-scale materials and drug discovery experiments. Nonmyopic active search increases search efficiency dramatically across a wide range of settings, suggesting it may be a promising approach to automating discovery pipelines.
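To make the active search setting concrete, the sketch below implements the simplest one-step (myopic, greedy) policy, which is the baseline that nonmyopic policies aim to improve on and is not the authors' method. The one-dimensional candidate pool, the hidden "hit" region, and the kernel-smoothed probability model are all assumptions chosen for illustration.

```python
import math

def estimate_probability(x, observed, prior=0.1, prior_weight=1.0,
                         length_scale=0.05):
    """Kernel-smoothed estimate that candidate x is a hit, given labeled points."""
    weight_sum = prior_weight
    hit_sum = prior * prior_weight
    for x_obs, label in observed:
        k = math.exp(-((x - x_obs) / length_scale) ** 2)
        weight_sum += k
        hit_sum += k * label
    return hit_sum / weight_sum

def greedy_active_search(pool, oracle, budget):
    """Repeatedly query the unlabeled candidate with the highest hit probability."""
    observed = []
    remaining = list(pool)
    for _ in range(budget):
        best = max(remaining, key=lambda x: estimate_probability(x, observed))
        remaining.remove(best)
        observed.append((best, oracle(best)))  # one "experiment"
    return observed

# Hypothetical search space: 200 candidates on a line, hits are rare.
pool = [i / 199 for i in range(200)]

def oracle(x):
    """Stand-in for an expensive experiment; returns 1 for a hit, else 0."""
    return 1 if 0.6 <= x <= 0.7 else 0

observed = greedy_active_search(pool, oracle, budget=30)
hits = [x for x, label in observed if label == 1]
```

A greedy policy only maximizes the expected reward of the next query; a nonmyopic policy also accounts for how each query changes what can be found with the remaining budget.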
2:00 PM - MT02.11.02
Pan-Sharpening Algorithm for Spectral Map Reconstruction
Nikolay Borodinov1,Natasha Bilkey2,Alison Pawlicki1,Marcus Foston2,Anton Ievlev1,Alex Belianinov1,Stephen Jesse1,Rama Vasudevan1,Sergei Kalinin1,Olga Ovchinnikova1
Oak Ridge National Laboratory1,Washington University in St. Louis2
Recent advances in chemical imaging allow material composition characterization at the nanoscale. Such methods (optical spectroscopy, secondary ion spectrometry, and mass spectrometry, to name a few) vary in data acquisition, sample preparation, and spatial resolution, and thus offer overlapping but not identical applicability for different samples.
Coupling atomic force microscopy with infrared spectroscopy is a recent addition to the chemical imaging toolkit. A pulsed infrared laser triggers periodic thermal expansion of the sample, which is then detected by an AFM tip. Recent advances in AFM-IR allow imaging of polymer blends and nanocomposites, biological tissue, and ion migration, which makes it a highly relevant technique. Currently there are two operational modes: acquisition of a single wavenumber map, which has full AFM resolution, and acquisition of sparse arrays at full spectral resolution. A direct attempt to measure the full spectral response at each point of an AFM scan would be prohibitively long, and very likely to be disturbed by sample drift. In order to yield a full-resolution dataset that could be used for correlative analysis, the two types of data, a spectral array and a set of single wavenumber maps, have to be combined in a physically meaningful way. Methods for combining multiple data channels with different spectral and spatial resolution have been intensively explored and are commonly referred to as pan-sharpening (PS). One of these approaches, which is very suitable for the case of AFM-IR, relies on coupled non-negative matrix factorization (CNMF) of the data.
We demonstrate the applicability of the CNMF-PS algorithm to synthetic data, experimental AFM-IR data with known ground truth, and experimental AFM-IR data with unknown ground truth. We analyze the effect of the algorithm parameters (downsampling rate, number of NMF components, number of single wavenumber maps used) on the quality metrics of the PS operation. We use our method for the analysis of plant cell wall materials and derive the correlation between the spatial distributions of chemically dissimilar components provided by CNMF-PS and a local physical property (mechanical stiffness measured as the shift in AFM tip contact resonance). In addition, we use this method for the chemical characterization of tribofilms, which are easily destroyed by exposure to the IR. These examples highlight the utility of the PS algorithm for in-depth nanoscale characterization of various materials. This work can be readily adopted by other chemical imaging techniques generating spectral images.
 A Belianinov et al, ACS Nano 12 (12) (2018), pp 11798–11818
 A Dazzi and C B Prater, Chem. Rev. 117 (7) (2017), pp 5146–5173
 The authors acknowledge the Center for Nanophase Materials Sciences, which is a US DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725, (N.B., A.P., A.V.I, A.B., S.V.K, O.S.O). A portion of algorithm development was a part of the AI Initiative, sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory (S.J., R.K.V), managed by UT-Battelle, LLC, for the U.S. Department of Energy (DOE). The plant sciences portion of this work was supported by the Center for Engineering MechanoBiology (CEMB), an NSF Science and Technology Center, under grant agreement CMMI: 15-48571 (N.B. and M.F.).
2:15 PM - MT02.11.03
Accelerated Catalyst Discovery through Gaussian Processes and Active Learning
Kiran Vaddi1,Olga Wodo1,Krishna Rajan1
University at Buffalo1
We introduce an active learning procedure to query and uncover highly catalytic regions within a design space. We have identified key challenges related to the high dimensional responses of cyclic voltammetry experiments and derived a Bayesian Model Selection (BMS) method to efficiently label the responses and to guide the active learning within the identified design space. Applying the proposed methodology on a simulated kinetic zone diagram as the design space, we show that actively learning within the design space can improve the design of experiments for catalyst discovery and characterization. We have reduced the total number of experiments required to discover the high catalytic zones in the design space by an order of magnitude, thereby significantly accelerating design and minimizing experimental requirements. Using an active batch search, we derive a mechanism to actively learn highly catalytic zones in a high-throughput combinatorial search for catalyst.
2:30 PM - MT02.11.04
Insights in Chemical Features Impacting the Quality and Lifecycle of 3D Printed Model System—An Integrated Experimental, Modeling and Data Sciences Approach
Amra Peles1,William Rosenthal1,Francesca C Grogan1,Yulan Li1,Erin I Barker1,Zachary C Kennedy1,Timothy Pope1,Christopher Barrett1,Marvin Warner1
Pacific Northwest National Laboratory1
Additive manufacturing is causing fundamental changes in the way complex 3D objects are produced from their digital designs. It is well known that product quality, ease of reproducibility, and the often open-source nature of digital model designs raise pressing concerns about using this fast-developing technology to produce safety- and security-critical parts. It is therefore critical to understand the additive manufacturing process, from conceptual design to the lifelong performance of the printed part, and to uncover the critical features that best indicate future failure of the part. Here the elements forming a part of the holistic integrated approach are presented, which are also the backbone for the materials discovery framework.
We employ the polymer nylon 12 as the feedstock to print specimens for mechanical testing and as a model system. We report on the initial characterization of the polymer feedstock, covering its compositional chemistry and powder-specific qualities; a mathematical model for representing powder geometry and material properties; the implementation of phase-field models to study the sintering and aging properties of the selective laser sintering (SLS) process; a few-shot learning strategy to train on and characterize micro-CT images; and, coupled to uncertainty quantification, an implementation of a long short-term memory (LSTM) algorithm to optimize error backflow in a time-dependent aging study. Insights from sensitivity analysis and adaptive inference modeling used to guide the design of build experiments, the aging of printed specimens, and a set of static and dynamic mechanical tests will be presented. The micro-CT studies are used to examine structural features and their changes due to variable processing and aging of printed specimens. The experimental micro-CT images and details of experimentally aged specimens serve as the ground truth for the structural and aging features from the phase-field model.
3:15 PM - MT02.11.05
The Machine Learning Route to Accelerated Discovery and Inverse Design
University of New York, Buffalo1
The use of modern machine learning, informatics, and data mining approaches is a relatively new development in the chemical and materials domain. These techniques have been exceedingly successful in other application fields, and since there is no fundamental reason why they should not have a similarly transformative impact on chemical and materials research, there is now a concerted effort by the community to introduce data science in this new context. They hold tremendous promise for the practical realization of accelerated discovery and inverse design. However, adapting techniques from other application domains for the study of chemical and materials systems requires a substantial rethinking and redevelopment of the existing methods.
In this presentation, we will discuss our work on designing advanced, physics-infused neural network architectures, the fusion of unsupervised clustering with supervised regression for local ensemble models, active and transfer learning techniques, bootstrapping approaches to minimize our training data footprint, methods to increase the applicability domain of data-derived models, and automated hyperparameter optimization.
3:45 PM - MT02.11.06
Data-Driven Inquiry into Materials Synthesis
Elsa Olivetti1,Edward Kim1,Alexander van Grootel1,Zach Jensen1
Massachusetts Institute of Technology1
Predictive materials modeling can provide properties of real and virtual compounds and will be available on demand, thereby enabling rapid iteration time in materials design. However, the allure and necessity of accelerated discovery that motivates computational materials design is diminished by the prevalent heuristic approaches to materials synthesis and optimization. This presentation will outline our collaborative work to extract information from peer reviewed academic literature across a range of materials science texts. We have demonstrated not only the potential of the natural language processing (NLP) approach to assemble materials data from the literature, but we have also shown that one can develop hypotheses for what synthesis conditions drive a particular target material outcome using learning approaches. The presentation will describe application of NLP and machine learning to a few cases of materials discovery as well as issues with current and historical writing conventions in materials science literature to propose a structured way to facilitate reproducibility, clarity, and machine readability.
4:15 PM - MT02.11.07
Chemical Dynamics Analysis Pipeline at PNNL
Mathew Thomas1,Malachi Schram1,Jan Strube1,Robert Rallo1,Christopher Barrett1,Kevin Fox1,Noah Oblath1
Pacific Northwest National Laboratory1
We present a computing and data science effort at PNNL for an end-to-end analysis pipeline for chemical dynamics studies using High-Performance Computing (HPC) resources. The current computing model uses DIRAC (Distributed Infrastructure with Remote Agent Control) for its workflow and data management. A detailed metadata assignment using the DIRAC File-Catalog is used to automate the stages of data processing. The DIRAC system is deployed on containers managed using a Kubernetes cluster to provide a scalable infrastructure. A modified DIRAC agent provides the ability to submit jobs using Singularity on dedicated and opportunistic HPC sites. The data products from this pipeline are fed into a graphics processing unit (GPU) cluster that runs various Machine Learning (ML) tasks, such as 3D convolution networks and/or physics-aware temporal models.
4:30 PM - MT02.11.08
AI Driven Microstructure Exploration—The Case of Organic Electronics
Baskar Ganapathysubramanian1,Balaji Sesha Sarath Pokuri1,Sambuddha Ghosal1,Prerna Ritesh1,Soumik Sarkar1
Iowa State University of Science and Technology1
The performance of most organic electronics is critically dependent on the bulk morphology. Consequently, it becomes important to identify distributions of morphologies that have high performance. While computational studies and high-throughput analysis promise drastic improvement of time to delivery, they are nonetheless limited in two main respects: (a) the size of the morphology space, even for simplistic descriptions (2D, binary), is practically unlimited, and (b) the resources required for quantification are still limited (limited physics models) and time-consuming. In this work, we address the first issue of efficiently exploring the practically infinite morphology space using state-of-the-art machine learning and artificial intelligence algorithms. More specifically, we take a completely data-driven approach to generate morphologies that show high performance. We used a combination of invariance-obeying generative adversarial networks along with a robust and interpretable CNN-based structure-property map to generate extremely high-performing morphologies. The generated morphologies consistently showed high performance and simultaneously sampled remote regions of the morphology space that were previously not reported. Finally, we evaluated the generated morphologies against full physics simulations and found that the novel AI approach predicts high-performing morphologies in most cases.
4:45 PM - MT02.11.09
Practical Implementation of Materials Informatics for Discovery of Superionic Conductors
Ryoji Asahi1,Nobuko Ohba1,Masato Matsubara1,Akitoshi Suzumura1,Shin Tajima1,Yumi Masuoka1,Joohwi Lee1,Seiji Kajita1
Toyota Central R&D Labs., Inc.1
Materials informatics (MI) has attracted a great deal of interest over the past decade in efforts to accelerate materials discovery. Virtual screening is a frequently employed MI procedure to link a target property to an explanatory feature of materials, i.e., a descriptor. The model is then applied to predict and rank the target property in materials registered in the database. However, a major bottleneck of the MI model is an insufficient amount of supervised data. To overcome this issue, we took three approaches, namely, high-throughput (HT) computations for an extension of the database [1,2], a newly developed ensemble-scope descriptor for virtual screening with limited supervised data [3], and an HT experiment that facilitates efficient materials discovery within a certain chemical search space suggested by MI [4]. In this presentation, we demonstrate applications of the method to an exploration of oxygen ion conductors, aiming to find new materials superior to the conventional high-temperature conductor, yttria-stabilized zirconia (YSZ), which is used for solid oxide fuel cells.
The ensemble-scope descriptor includes physical and chemical knowledge, and short- and long-range representations of the crystal structure. This multifaceted feature augments information acquired from a relatively small number of supervised data to improve the prediction power; in addition, it provides physical insights into the prediction, which help generate new knowledge and extend the search space. Given only 29 supervised data points, we successfully discovered more than 5 compounds, such as EuKGe2O6 and Ca3Fe2Ge3O12, which were verified by experiment. The MI prediction was then used to focus the chemical search space in which the HT experiment was performed. We implemented HT combinatorial synthesis, HT-XRD measurement, and HT conductivity measurement. The application of the method to oxygen ion conductors led to the discovery of materials in the Ca-(Nb,Ta)-Bi-O system that exhibited high conductivity and durability.
[1] Jinnouchi, Asahi, J. Phys. Chem. Lett. 8, 4279 (2017).
[2] Kajita, Ohba, Jinnouchi, Asahi, Sci. Rep. 7, 16991 (2017).
[3] Lee, Ohba, Asahi, RSC Advances 8, 25534 (2018).
[4] Matsubara, Suzumura, Tajima, Asahi, ACS Comb. Sci. 21, 400 (2019).
MT02.12: Poster Session IV: Experimentation and Machine Learning
Friday AM, December 06, 2019
Hynes, Level 1, Hall B
8:00 PM - MT02.12.01
Effect of Dielectric Particle Heterogeneity on Capacitance—A Machine Learning Biased Genetic Algorithm Approach
Venkatesh Meenakshisundaram1,2,David Yoo1,2,Andrew Gillman1,Phil Buskohl1
Air Force Research Laboratory1,UES, Inc.2
Microscale spatial and material heterogeneities in 3D printed electrical devices present significant challenges to predictable electrical performance and device reliability. Dielectric particles are often added to dielectric inks to tailor the macro-level permittivity of printed dielectric substrates and coatings. In these inks, the combined role of particle morphology, discrete spatial arrangement and material properties on variance is difficult to distinguish experimentally and hence poorly understood. This is primarily due to the large parameter space of processing variables as well as electrical sensitivity to local heterogeneities. We address this challenge by combining a finite element capacitor model with a neural network biased genetic algorithm (NBGA) to optimize the volume fraction, particle size and permittivity distributions of dielectric particles to identify systems with high capacitance variance. Comparison of the optimized system to an equivalent system with monodisperse particles revealed that heterogeneity in particles could be key to achieving larger variance that is not available to a system with monodisperse particles. A closer look at the optimized system revealed that the variance in distance between particles and the electrodes was larger than in the monodisperse system. This can be attributed to the larger set of packing configurations accessible to the system with polydisperse particles as compared to a system with monodisperse particles. Classification-based machine learning techniques were also applied to the NBGA-created database to extract correlations between the spatial/material distributions of the dielectric particles and the capacitance variance. Collectively, the study provides a useful framework to correlate electrical performance with both macro- and microstructural variation sources, which is key to accelerating the development of 3D printing materials.
8:00 PM - MT02.12.02
Machine Learning Based Data Driven Approach for Optimized Inkjet Printed Electronics
Fahmida Pervin Brishty1,Ruth Urner1,Gerd Grau1
York University1
Machine Learning (ML) has not been explored extensively to optimize printed electronics manufacturing. As a predictive methodology, it has the potential to efficiently minimize printing configuration workload. Inkjet printing is a promising method of additive manufacturing due to its attractive attributes including low cost, scalability, non-contact printing and microscale on-demand customization. It generates droplets of electronic materials with a piezo-electrically actuated dispenser controlled through voltage pulse and timing parameters. This will enable novel electronics development on flexible plastic and paper substrates such as wearable sensors, RFID tags or flexible displays in a drop-on-demand (DOD) way. A major challenge in inkjet printing is the rapid optimization of stable jetting conditions. Several problems can occur: no ejection, perturbation, satellite drops, multiple drops, drop breaking, nozzle clogging, etc. These non-idealities can lead to significant uncertainty in the behavior of mass-produced electronic devices and circuits. Generating drops with steady speed and volume requires numerous time- and material-consuming trial-and-error experiments. Theoretical modeling and prediction are limited and difficult due to the complexity of the printing process. Here, an intelligent ML algorithm is demonstrated to forecast the jetting window based on machine and material properties.
Optimal inkjet system parameters vary from material to material and printer to printer, so stable jetting is interpreted here as a data-driven optimization problem. A major challenge was the lack of an established dataset in this context. Data was extracted from academic papers as well as experimentally collected in our lab. Printers from different manufacturers and different inks (pure solvents and nanoparticle inks) were included in the dataset. The input factors were printer-dependent (printer make, nozzle size), material-dependent (density, viscosity, surface tension) and printing control parameters (dwell time, echo time, rise time, fall time, jetting frequency, dwell voltage, echo voltage and waveform shape). The measured outputs were drop velocity and volume.
The merged dataset has 13 features, of which the 9 most important were identified during a first stage of exploratory data analysis. A detailed analysis was then performed to compare various (linear and non-linear) regression models with the goal of identifying a type of model with high predictive capacity that at the same time allows for interpretation of the underlying dependencies among the features involved. The models were trained on 80% of the data and the mean absolute error was calculated on the remaining 20% test data. Assuming a simple linear relationship between the input and output features did not yield accurate predictions. Instead, small ensembles of decision trees (boosted decision trees and random forests) were explored further to estimate jet drop velocity and volume. The models were applied to an experimentally collected data set with a material that was not included in the training set. The learned regression model predicted drop velocity with a test root mean square error (RMSE) of 0.53 m/s. Drop volume was forecast for the same dataset with an RMSE of 14.88 picoliters.
In conclusion, we demonstrated that machine learning for drop behavior prediction can be used to forecast drop formation for new fluids that were not available at training time. This will enable more efficient materials selection as well as tuning of printing parameters. It has the potential to considerably speed up the development of novel materials and inks for printed electronics by eliminating extensive jetting experiments that are costly in terms of money, time and material.
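The evaluation protocol described in this abstract (an 80/20 train/test split, with mean absolute error and RMSE computed on held-out data) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code; the function names and the velocity values are hypothetical and are not data from the study.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and hold out test_fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical sample indices standing in for dataset rows.
train, test = train_test_split(range(100))

# Hypothetical measured and predicted drop velocities in m/s (illustration only).
measured = [4.0, 5.5, 6.1, 3.2, 7.0]
predicted = [4.5, 5.0, 6.1, 3.0, 6.2]

mae = mean_absolute_error(measured, predicted)
rmse = root_mean_square_error(measured, predicted)
```

RMSE penalizes large errors more heavily than the mean absolute error, so it is never smaller than the MAE on the same predictions.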
8:00 PM - MT02.12.03
Scientific Data Infrastructure for Combinatorial Material Science
Lars Banko1,Sigurd Thienhaus1,Alfred Ludwig1
Ruhr-Universität Bochum1
Data mining by statistical/machine learning methods is an emerging topic in material science. Advanced algorithms are able to find patterns in large datasets beyond human capabilities. Additionally, these techniques can accelerate the analyses of complex data. Combinatorial material science generates large, comparable data sets of materials libraries that are well suited to data mining applications. Aggregation of those data sets within a research group or even within a certain scientific community provides the opportunity to generate knowledge based on non-trivial correlations. The basis for this approach is solid data management, which ensures a high degree of reusability through appropriate data curation. Here, we demonstrate our recent achievements in the development of a customized scientific data infrastructure. The solution consists of a commercially available, customizable document management system, a terminal server-based IT infrastructure and in-house developed software tools. The main purpose of this data infrastructure is to track all data and information about a materials library throughout the whole sample lifecycle, from experimental planning and synthesis through processing to characterization and analyses. It is demonstrated that standardization of data acquisition, pre-processing and storage promotes time-efficient, machine-assisted data analyses. The use of terminal servers guarantees access from various devices (computers, tablets, smartphones) and operating systems (Windows, Linux, OS X, iOS, Android, etc.) and improves data security at the same time. Maintenance is reduced by remote applications which are easy to deploy and update. An additional benefit is the structured storage of knowledge, which counteracts the rapid personnel turnover in university research.
8:00 PM - MT02.12.04
Machine Learning Prediction of Glass-Forming Ability and Elastic Modulus for Bulk Metallic Glasses
Jie Xiong1,San Qiang Shi1,Tong-Yi Zhang2
The Hong Kong Polytechnic University1,Shanghai University2
There is a genuine need to shorten the development period for new materials with desired properties. Bulk metallic glasses (BMGs) are a unique class of materials that are gaining attention in a wide variety of applications due to their attractive physical properties. One limitation to the wide-scale use of these materials is the lack of predictive tools for understanding the relationship between alloy composition and ideal properties. In this work, a machine learning (ML) approach was applied to a dataset of 6312 alloys. The resulting ML model predicted the glass-forming ability and elastic moduli of unseen alloys in good agreement with most experimentally measured values. It will promote the development of basic theories of metallic glasses to reveal the intrinsic correlation of physical properties through material big-data mining. This work indicates the great potential of ML in the design of advanced materials with target properties.
8:00 PM - MT02.12.05
Accelerated Development of High-Performance Nanocomposite Solar Absorbers Using Bayesian Optimization
TieJun Zhang1,Qiangshun Guan1,Afra Alketbi1,Aikifa Raza1
Khalifa University of Science and Technology1
A machine learning-based approach is desirable for accelerating materials design, development and discovery, especially when it is coupled with high-throughput experiments and simulations. In this work, we propose to apply Bayesian optimization to design ultrathin multilayer W-SiC nanocomposite absorbers for high-temperature solar power generation. The optical properties of the nanocomposite depend on the filling factors as predicted by the effective medium theory. The spectrally averaged solar absorptance of various absorber designs is evaluated with a semi-analytical scattering matrix method. The design of the spectrally selective nanocomposite absorber is optimized over a range of filling factors and layer thicknesses to maximize the overall solar absorptance. Our nanofabrication and experimental characterization results demonstrate the capability of the proposed closed-loop approach for solar energy materials development. Comparison with other global optimization methods (Random Search, Simulated Annealing and Genetic Algorithm) shows that Bayesian optimization can expedite the design of multilayer nanocomposite absorbers and hence reduce their development cost significantly. This work sheds light on the high-throughput discovery of materials for solar energy and sustainability applications.
8:00 PM - MT02.12.06
Recommender System of Processing Conditions for Inorganic Compounds Based on a Parallel Experimental Dataset
Hiroyuki Hayashi1,2,3,Atsuto Seko1,2,3,Isao Tanaka1,3
Kyoto University1,PRESTO2,National Institute for Materials Science3
Studies on high-throughput materials screening through calculations based on density functional theory and/or machine-learning methods have increased rapidly in the last decade. However, data-centric approaches for successful processing conditions are still at a very early stage. Few works have been reported except for literature-data-based approaches [1,2]. In this study, we propose a machine-learning approach for successful processing conditions for inorganic compounds based on parallel experiments [3]. Initially, an experimental database was constructed for 67 pseudobinary oxides registered in the Inorganic Crystal Structure Database (ICSD) [4] by parallel experiments using 23 starting materials and 23 cation mixing ratios. Precursor powders were obtained by four synthesis methods (solid-state reaction, polymerized complex, cyclic ether sol-gel, and spray co-precipitation), which were fired at five different temperatures. This resulted in 1,648 unique chemical synthesis conditions and database entries. The reactants were characterized sequentially using powder X-ray diffraction equipment with an automatic sample exchanger. The synthesis results were rated as a score, which was placed into a fifth-order tensor with 243,340 elements. The Tucker decomposition method was used to predict yet-to-be-rated scores for processing conditions that had not been tested experimentally. Good predictive performance of the present model was demonstrated by leave-one-experimental-composition-out cross validation. It was further evaluated by examining the presence of highly rated compositions in another database, ICDD-PDF (International Center for Diffraction Data-Powder Diffraction File) [5]. The prediction performance was about twice that obtained when the compositions are chosen randomly and was as good as the existing method of the composition-recommender system [6].
Moreover, the present method can recommend not only the compositions that are likely to exist stably but also successful synthesis conditions for them at the same time. The chemical similarities regarding the chemical processing conditions evaluated through the tensor decomposition seem to be consistent with our heuristics. The superiority of the recommender system with the synthesis data in accelerating the discovery of as-yet-unknown compounds can be demonstrated.
[1] E. Kim et al., npj Comput. Mater. 3, 53 (2017).
[2] E. Kim et al., Chem. Mater. 29, 9436 (2017).
[3] H. Hayashi et al., submitted.
[4] G. Bergerhoff and I. Brown. In Crystallographic Databases, International Union of Crystallography: Chester, U.K., 1987.
[5] PDF-4+ 2019; ICDD: Newtown Square, PA, 2018 (accessed Apr 20, 2019).
[6] A. Seko et al., Phys. Rev. Mater. 2, 013805 (2018).
8:00 PM - MT02.12.07
Designing Stretchable MoS2 Kirigami Using Deep Reinforcement Learning
Pankaj Rajak1,Beibei Wang2,Ken-ichi Nomura2,Aiichiro Nakano2,Rajiv Kalia2,Priya Vashishta2
Argonne National Laboratory1,University of Southern California2
Mechanical properties of 2-D materials such as MoS2, MoSe2, WS2 and WSe2 can be tuned by the ancient art of kirigami. Experiments and atomistic simulations show that these 2-D materials can be stretched more than 50% by strategic insertion of cuts. However, the mechanical and thermal properties of kirigami structures are highly sensitive to the pattern and location of kirigami cuts on a flat sheet, which makes designing structures with desired properties difficult. Furthermore, the search space of kirigami design increases exponentially with an increase in the system size. We have used a reinforcement learning (RL) model, which after training can generate a wide range of MoS2 kirigami structures with high stretchability. Our model consists of an RL agent, whose goal is to design kirigami cut patterns, and a convolutional neural network-based reward model. The latter is trained using molecular dynamics simulation data and gives no reward to patterns generated by the RL agent if the stretchability is less than 30%. The remaining structures are given rewards proportional to their stretchability. The RL model is trained for a MoS2 kirigami structure state space of 4,826,809 candidates, and after training the model is able to synthesize structures with more than 50% stretchability from a space consisting of up to 6 cuts.
This work was supported as part of the Computational Materials Sciences Program funded by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, under award number DE-SC0014607. The simulations were performed at the Argonne Leadership Computing Facility under the DOE INCITE and Aurora ESP programs and at the Center for High Performance Computing of the University of Southern California.
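The reward rule described in this abstract (no reward below 30% stretchability, otherwise a reward proportional to stretchability) can be written as a small function. This is a sketch under stated assumptions: in the abstract the stretchability comes from a neural-network reward model trained on simulation data, whereas here it is simply passed in as a number, and the function name and the `scale` parameter are hypothetical.

```python
def kirigami_reward(stretchability, threshold=0.30, scale=1.0):
    """Return the reward for a cut pattern given its stretchability.

    stretchability is a fraction (0.30 means 30%). Patterns below the
    threshold receive zero reward; all others receive a reward that is
    proportional to their stretchability.
    """
    if stretchability < 0:
        raise ValueError("stretchability must be non-negative")
    if stretchability < threshold:
        return 0.0
    return scale * stretchability


# Hypothetical stretchability values for three candidate cut patterns.
rewards = [kirigami_reward(s) for s in (0.10, 0.30, 0.55)]
```

The hard cutoff means that weakly stretchable patterns give the agent no learning signal, so the proportional term only ranks candidates that already clear the 30% bar.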
8:00 PM - MT02.12.08
Materials Discovery by Machine Learning and Single Particle Diagnosis
Yukinori Koyama1,Atsuto Seko2,1,Isao Tanaka2,1,Shiro Funahashi1,Naoto Hirosaki1
National Institute for Materials Science1,Kyoto University2
Discovery of new materials often leads to breakthroughs in a variety of applications, and thus exploration of new materials is a major subject in materials research. The space of chemical compositions is too large to search for novel compounds without convincing guidelines. Therefore, it is important to establish a reasonable methodology for materials exploration. In this study, we propose an approach to materials exploration that uses a machine-learning technique to estimate the relevance of chemical compositions, followed by single-particle-diagnosis experiments. We also demonstrate the discovery of novel nitrides by this approach.
First, we developed a machine-learning model of a descriptor-based recommender system to estimate the relevance of chemical compositions, i.e., whether the compositions form stable structures or not. The training dataset includes chemical compositions registered in the Inorganic Crystal Structure Database (ICSD) as positive cases, and compositions not registered in the ICSD as negative cases. A set of descriptors composed of means, standard deviations, and covariances of elemental representations is used to define the similarity of chemical compositions. A random forest classifier is used to estimate the probabilities that chemical compositions are positive cases. The predicted probabilities can be used as indicators of relevance.
After screening by the machine-learning model, we tried to synthesize compounds of the relevant chemical compositions. Then, we analyzed the crystal structures of the obtained samples by the single-particle-diagnosis approach, as follows. The samples are first analyzed by conventional powder X-ray diffraction (XRD). If the XRD patterns are not fitted well by known crystals, the samples might contain unknown structures. Then, particles several micrometers in size, which behave like single crystals, are picked up. Single-crystal XRD data of the individual particles are collected and their crystal structures are refined.
The proposed approach is efficient for materials exploration. We have found novel nitrides having new crystal structures.
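The composition descriptors mentioned in this abstract (means, standard deviations and covariances of elemental representations) can be illustrated with a short sketch. Several choices here are assumptions not stated in the abstract: the statistics are weighted by atomic fraction, only two elemental features are used (atomic number and Pauling electronegativity), and the function and table names are hypothetical.

```python
import math
from itertools import combinations

# Illustrative elemental representations: (atomic number, Pauling electronegativity).
# A real descriptor set would use many more elemental features.
ELEMENT_FEATURES = {
    "Li": (3.0, 0.98),
    "Si": (14.0, 1.90),
    "N": (7.0, 3.04),
}


def composition_descriptor(composition):
    """Return [means..., standard deviations..., covariances...] for a composition.

    composition maps element symbols to amounts; statistics are weighted by
    atomic fraction (an assumption made for this sketch).
    """
    total = sum(composition.values())
    weights = {el: amount / total for el, amount in composition.items()}
    n_features = len(next(iter(ELEMENT_FEATURES.values())))

    means = [
        sum(w * ELEMENT_FEATURES[el][k] for el, w in weights.items())
        for k in range(n_features)
    ]

    def covariance(i, j):
        # Fraction-weighted covariance between elemental features i and j.
        return sum(
            w * (ELEMENT_FEATURES[el][i] - means[i]) * (ELEMENT_FEATURES[el][j] - means[j])
            for el, w in weights.items()
        )

    stds = [math.sqrt(max(covariance(k, k), 0.0)) for k in range(n_features)]
    covs = [covariance(i, j) for i, j in combinations(range(n_features), 2)]
    return means + stds + covs


# Example nitride composition Li:Si:N = 1:2:3.
descriptor = composition_descriptor({"Li": 1, "Si": 2, "N": 3})
```

Because the statistics depend only on atomic fractions, two formulas with the same element ratios map to the same descriptor, so a classifier trained on such vectors sees compositions rather than formula units.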
8:00 PM - MT02.12.09
Predicting Carbon Nanotube Forest Synthesis-Structure-Property Relationships Using Physics-Based Simulation and Deep Learning
Taher Hajilounezhad1,Zakariya Oraibi1,Ramakrishna Surya2,Filiz Bunyak1,Kannappan Palaniappan1,Prasad Calyam1,Matthew R. Maschmann1
University of Missouri-Columbia1,University of Cincinnati2
Carbon nanotubes (CNTs) are promising candidate materials for numerous advanced applications due to their significant mechanical, thermal, optical and electrical properties. CNT forests represent populations of CNTs that self-assemble into arrays, with CNTs oriented perpendicular to their growth substrate. Unfortunately, a marked performance degradation is observed when individual CNTs self-assemble into CNT forests. Structural disorder plays a major role in this performance gap. To date, there is a lack of understanding about how numerous CNT synthesis processing parameters influence structural disorder and about the relationship of CNT structural morphology to CNT forest properties. Further, physical experiments can sample only a limited number of CNT forest growth parameter combinations because of time and financial restrictions. A comprehensive and multidimensional exploration of all synthesis growth parameters is impractical.
The recent advances in high-performance computing, parallel computing and artificial intelligence have opened a new door to employing simulation modeling and computational approaches to prevent costly and frequently biased interpretation of physical experiments. The integration of artificial intelligence and high-throughput physical experiments has given rise to autonomous materials research systems that can direct smart material synthesis experiments via reinforcement learning algorithms. This approach may vastly accelerate material discovery, understanding, and adoption into various industrial applications. We foresee that numerical simulations that complement physical experiments will provide a vital new data pipeline to train deep learning (DL) algorithms for such systems moving forward. By utilizing coupled physical and numerical experimental campaigns, autonomous research systems may operate faster and at a reduced cost.
In this work, we develop DL models that analyze CNT forest image data prepared by a time-resolved and physics-based numerical simulation. The DL models are used to predict both the CNT synthesis attributes (CNT diameter, CNT areal density) and the resultant mechanical stiffness based on the structural morphology of the CNT forest. The physics-based finite element simulation first models the growth and self-assembly of CNT forests based on stochastic CNT physical parameters and then models the compression of the resultant CNT forest. The simulated CNT forest structural morphology images train the DL models to predict synthesis parameters. Different machine learning algorithms and deep learning architectures are used to find underlying process-structure correlations. The trained model then predicts CNT synthesis attributes based on forest characteristics. Classification accuracies of up to 95.5% are achieved by applying deep convolutional neural networks. This study represents an early step to implement a high-throughput computational and experimental synthesis set-up that yields application-tailored CNT forests with prescribed properties.
8:00 PM - MT02.12.10
Automation of Electron Microscopy to Enable Atomic Datasets for Machine Learning
Matthew Hauwiller1,Abinash Kumar1,James LeBeau1
Massachusetts Institute of Technology1
Theory, synthesis, and characterization form the essential components of the materials informatics cycle. While significant emphasis has been placed on theory and synthesis, comparatively limited progress has been made in the area of materials characterization. Preliminary efforts to apply machine learning to characterization have focused largely on the backend data analysis, but capturing reproducible data in a statistically significant way remains a major challenge. Although electron microscopy can provide atomic-level structural measurements of materials to unravel structure-property relationships, there are some critical limitations to the current workflow that severely restrict its current usage and future potential for materials informatics. The largely manual nature of the technique requires a significant amount of time to characterize an extremely small volume of material. This makes it difficult to collect large enough datasets that truly represent the material, especially when the material is inhomogeneous or contains various defects. The human input in collecting images and spectroscopic data inherently contains both bias and random error, which are undesirable for any comprehensive and systematic study.
In this presentation, we will present a core component to enable the autonomous electron microscope, the Universal Scripting Engine for Transmission Electron Microscopy (USETEM). We will show that this scripting engine is widely applicable and simplifies scripting to enable high-throughput atomic-level imaging of materials. The object-oriented code allows the microscope, detector, and any other components to be controlled through either a visual build tool (similar to LabVIEW) or simple Python scripts. We will discuss how the USETEM framework can make it easier to foster collaborations between microscopists at different universities, national laboratories, and industry as researchers develop creative ways to use their instruments to study materials. Machine learning for instrument control will also be discussed with insight into how it opens new opportunities in autonomous image acquisition. As a first step towards this vision, a deep convolutional neural network is demonstrated that can be used to automate convergent beam electron diffraction pattern analysis. The process enables, for example, autonomous determination of sample thickness to within 1 nm and tilt to within a fraction of a milliradian, at real-time speeds. Automating the electron microscope using artificial intelligence will address data size, bias, and documentation concerns, providing improved inputs for machine learning algorithms for faster discovery of emergent materials.
8:00 PM - MT02.12.11
Small-Data Driven Machine Learning Screening Framework for Accelerated Discovery of Ferroelectric Oxides
Achintha Ihalage1,Yang Hao1
Queen Mary University of London1
The application of machine learning (ML) to accelerate materials discovery, synthesis, optimization and characterization has significantly reduced the time and resource consumption of first-principles calculations or experimental measurements. Especially after the launch of programs such as the Materials Genome Initiative and the recent popularity of data science, many open-access materials databases have now reached “big data” status, resulting in an exponentially growing trend in ML-based materials discovery. However, some technologically important materials such as ferroelectrics are scarce in nature and therefore the available datasets do not reveal much information about these materials. Ferroelectric materials can be electrically, mechanically or thermally excited, which makes them intriguing candidates for many device applications. The shortage of data has hindered the applicability of ML, or more specifically deep learning (DL), in discovering new ferroelectric materials. The use of "small data" to train a machine learning algorithm while still achieving state-of-the-art results has been a popular research topic. Specifically, in the field of ferroelectrics and tunable materials, where data are very scarce, machine learning algorithms that perform well even with a small amount of data can serve as an effective platform for new material discovery. The available material datasets, whether first-principles calculated (MP, OQMD), experimental (ICSD, HTEM) or both (CSD, Citrination), are large but lack ferroelectric information.
Most of the regularly updated databases contain over 100,000 materials, which makes it tedious to search for and discover new ferroelectric materials manually. Another important point to note is that a significant portion of the first-principles-calculated materials available in the datasets have not yet been synthesized and their ferroelectric behaviour is not reported in the calculations. Hence, we adopt two machine learning methods for accelerated screening of large databases to find previously unreported ferroelectric oxides. The emphasis of this work is to use an extremely small dataset to train the two machine learning algorithms and later combine the results to discover new ferroelectric oxide materials.
In this work, we propose a combined ML-DL framework for accelerated discovery of ferroelectric oxide materials by learning from an extremely small dataset, thus breaking the shackle between ML and big data. The proposed classification algorithms are trained on a small literature-collected database and the classification is performed on the Materials Project (MP) database. By combining the results, we report 24 promising ferroelectric candidate materials along with their structure, band gap, chemical stability status and ferroelectric likelihood information. Our results suggest that the developed framework is able to identify ferroelectric materials with an accuracy of 89%, and the results were mutually confirmed by the two algorithms. Materials are synthesized, processed and measured, with results to be presented at the conference.
The authors acknowledge the Engineering and Physical Sciences Research Council (EPSRC) for funding the Software Defined Materials for Dynamic Control of Electromagnetic Waves (ANIMATE) project (Grant No. EP/R035393/1). A.I. acknowledges the IET AF Harvey Research Prize for funding the PhD studentship.
8:00 PM - MT02.12.12
Creating Glasswing Butterfly-Inspired Durable Antifogging Superomniphobic Supertransmissive, Superclear Nanostructured Glass through Bayesian Learning and Optimization
Sajad Haghanifar1,Paul Leu1
University of Pittsburgh1
The creation of durable superomniphobic surfaces with optical functionality has been extremely challenging. Major challenges have included low optical transmission, low optical clarity, lack of scalable fabrication, condensation failure, and inability to self-heal. Inspired by recent research on the transmission advantages of the random nanostructures on the glasswing butterfly, we report on a strategy to create self-healing, random re-entrant nanostructured glass with high liquid repellency and antifogging properties with supertransmission (99.5% at 550 nm wavelength for double-sided glass) and superclarity (haze under 0.1%). Our approach to creating these random nanostructures is to use multiobjective learning and Bayesian optimization to guide the glass substrate fabrication experiments. The surface demonstrates static water and ethylene glycol contact angles of 162.1 ± 2.0° and 155.2 ± 2.2°, respectively. The glass resists condensation, with an antifogging efficiency of more than 90%, and demonstrates the departure of water droplets smaller than 2 μm. The surface can restore liquid repellency after physical damage through heating for 15 minutes. We envision that these surfaces will be useful in a variety of optical applications where self-cleaning, antifouling, and antifogging functionalities are important.
8:00 PM - MT02.12.13
Machine Learning and Optimization in Shape Memory Alloys Using a Large Experimental Database
William Trehern1,Ibrahim Karaman1
Texas A&M University1
Shape memory alloys (SMAs) are martensitically transforming materials that exhibit interesting functional properties when undergoing transformation. One of the best-known SMA functional properties, the shape memory effect, can be observed when an SMA is deformed in the lower-temperature martensitic phase and then heated to the austenitic phase, recovering the original shape and reversing the deformation strains. Upon cooling the material back to martensite, a twinned martensitic structure forms to accommodate Bain strains that would otherwise change the shape of the sample. Another functional property, superelasticity, occurs when the SMA is deformed in the higher-temperature austenitic phase. In this case, the austenite transforms to martensite under the applied strain, forming a detwinned martensite structure that allows for large amounts of deformation. Upon unloading, the material returns to the austenitic phase, recovering the deformation strains and returning to the pre-deformed shape.
Both of these functional properties are subject to many different variables (such as material composition, production method, forming processes, and heat treatments) that influence the property characteristics (transformation temperatures, transformation strain, irrecoverable strain, enthalpy of transformation, and fatigue life, to name a few). The test parameters also have a great influence on these properties and thus should also be accounted for. In this study, more than 80 independent variables coupled with nearly 30 different response variables are used to find hidden correlations or patterns in 6,000 raw experimental data entries. Through exploitation of this database, important and highly influential variables were extracted and used to predict a targeted material property for a given application. Candidate predictions are then selected based on expected improvement and experimentally validated. I will discuss the methodology used for database development, necessary data cleaning steps, model development and feature extraction, model prediction, prediction selection using optimization, and experimental procedures for prediction validation. Assessment of related research and possibilities of this informatics approach in other systems will also be discussed.
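As an illustration of selecting predictions by expected improvement, the following is a minimal sketch rather than the authors' implementation. It assumes that each candidate comes with a predicted mean and standard deviation for a property to be maximized and that the predictive distribution is normal; the function names, the candidate labels and the `xi` exploration offset are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_so_far, xi=0.0):
    """Expected improvement over best_so_far for a maximization problem.

    Assumes a normal predictive distribution N(mean, std**2).
    xi is an optional (hypothetical) offset that favors exploration.
    """
    gain = mean - best_so_far - xi
    if std <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(gain, 0.0)
    z = gain / std
    return gain * normal_cdf(z) + std * normal_pdf(z)


def rank_candidates(candidates, best_so_far):
    """Sort (name, mean, std) candidates by decreasing expected improvement."""
    scored = [(expected_improvement(m, s, best_so_far), name)
              for name, m, s in candidates]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical candidates: (label, predicted mean, predicted std).
ranking = rank_candidates(
    [("A", 1.0, 0.0), ("B", 0.9, 0.5), ("C", 1.2, 0.01)],
    best_so_far=1.0,
)
```

In this toy ranking, a candidate with a slightly lower predicted mean but a large uncertainty ("B") can still outrank one predicted to merely match the current best with no uncertainty ("A"), which is the trade-off the criterion is designed to capture.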
8:00 PM - MT02.12.14
Discovery Paradigm for Novel Organic-Inorganic Halide Perovskites for Optoelectronic Applications through Automated Synthesis
Mahshid Ahmadi1,Katherine Higgins1,Maxim Ziatdinov2,Rama Vasudevan2,Sergei Kalinin2
University of Tennessee, Knoxville1,Oak Ridge National Laboratory2
Hybrid organic-inorganic perovskites (HOIPs) are rapidly emerging as one of the most fascinating classes of materials for photovoltaic, light emission, lasing, and sensing applications [1-4]. In general, three-dimensional (3D) HOIPs adopt the typical perovskite crystal structure of ABX3, where A, B, and X denote monovalent organic or inorganic cations (e.g., CH6N+ (MA+), CH5N2+ (FA+), guanidinium (GA+), Cs+ and Rb+), divalent inorganic cations (Pb2+, Sn2+), and halide anions (I−, Br−, Cl−), respectively. These compounds are among more than one thousand perovskite-inspired candidate compounds that have been theoretically predicted during the last few years [5, 6]. However, despite extensive theoretical studies, only a small fraction of the predicted compounds has been experimentally realized, since the synthesis of each new material involves a complex and time-consuming optimization cycle. In addition, optimizing these materials for specific applications requires a careful balance between intrinsic properties such as bandgap, defect chemistry, charge transport and crystal structure that affect material microstructure, and poorly understood parameters such as the chemical stability of surfaces and interfaces. In this presentation we will demonstrate the first results on the automated synthesis and characterization of combinatorial libraries of HOIPs. Using automated laboratory synthesis, we demonstrate formation of a library of HOIP compositions. Next, an automated characterization tool capable of UV-Vis absorption and photoluminescence (PL) spectroscopy is used to rapidly measure the band gap energy and PL properties across the composition library. Finally, machine learning applied to the optical properties allows rapid elucidation of property evolution along 2- and 3-dimensional phase fields.
We further discuss the application of Gaussian process optimization for evolutionary search in high-dimensional composition spaces, and the balance between exploratory and exploitative searches targeting individual figures of merit and their combinations. The synergy of compositional and optical properties, such as band gap and photoluminescence, developed here allows a comprehensive picture of the functionality evolution across the composition series and attempts to establish predictive relationships across compositions in these material systems.
1. Ahmadi, M.; Wu, T.; Hu, B., A Review on Organic–Inorganic Halide Perovskite Photodetectors: Device Engineering and Fundamental Physics. 2017, 29 (41), 1605242.
2. Lukosi, E.; Smith, T.; Tisdale, J.; Hamm, D.; Seal, C.; Hu, B.; Ahmadi, M., Methylammonium lead tribromide semiconductors: Ionizing radiation detection and electronic properties. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 2019, 927, 401-406.
3. Park, N.-G.; Grätzel, M.; Miyasaka, T.; Zhu, K.; Emery, K., Towards stable and commercially available perovskite solar cells. Nature Energy 2016, 1, 16152.
4. Zhao, B.; Bai, S.; Kim, V.; Lamboll, R.; Shivanna, R.; Auras, F.; Richter, J. M.; Yang, L.; Dai, L.; Alsari, M.; She, X.-J.; Liang, L.; Zhang, J.; Lilliu, S.; Gao, P.; Snaith, H. J.; Wang, J.; Greenham, N. C.; Friend, R. H.; Di, D., High-efficiency perovskite–polymer bulk heterostructure light-emitting diodes. Nature Photonics 2018, 12 (12), 783-789.
5. Zhao, X.-G.; Yang, D.; Ren, J.-C.; Sun, Y.; Xiao, Z.; Zhang, L., Rational Design of Halide Double Perovskites for Optoelectronic Applications. Joule 2018, 2 (9), 1662-1673.
6. Nakajima, T.; Sawada, K., Discovery of Pb-Free Perovskite Solar Cells via High-Throughput Simulation on the K Computer. The Journal of Physical Chemistry Letters 2017, 8 (19), 4826-4831.
MT02.13: Experimentation and Machine Learning II
Friday AM, December 06, 2019
Hynes, Level 2, Room 210
8:30 AM - MT02.13.01
ML-Aided Thermal Management Materials Design and Small Data Strategy
National Institute for Materials Science1
Materials informatics is expected to accelerate the process of materials design and development. A growing number of studies have recently reported material property prediction and materials design using materials data and machine learning methods. We have developed several machine learning models to predict thermal properties and design new materials with ultra-high or ultra-low thermal conductivities. In this talk, our recent work on the development of thermally insulating coatings and highly thermally conductive polymers will be introduced. These results demonstrate the effectiveness of machine learning, while also showing that data availability and quality are the key issues for this method. A “small data” strategy is proposed to decrease the amount of data needed in machine learning and to make the best use of existing data.
9:00 AM - MT02.13.02
Using Advanced Decision Policies in Bayesian-Optimized Machine Learning to Control Carbon Nanotube Growth
Benji Maruyama3,Rahul Rao1,Pavel Nikolaev1,Ahmad Islam1,Kristofer Reyes2
UES Inc Air Force Research Laboratory1,University at Buffalo, The State University of New York2,Air Force Research Laboratory3
Control over the properties (length, defect densities) and structure (diameter, semiconducting/metallic type) of carbon nanotubes (CNTs) is highly desirable for a number of applications. In this regard, we developed ARES (Autonomous Research System), which utilized a Random Forest algorithm to optimize carbon nanotube growth rate by evaluating in situ feedback from Raman spectroscopy during the growth process [1]. Here we utilize Bayesian Optimization (BO) to control carbon nanotube diameters, which are critical for electronics applications. Previous implementations of BO in materials development often use statistical models such as Gaussian Processes (GPs) to represent the experimental response function of interest. GPs make it difficult to specify fine-grained structure in the response function other than through the specification of kernel functions, which are often used to encode the smoothness and periodicity of the function. In materials applications, however, other important structures of the response function must be considered. For example, the kinetics of materials synthesis can be parameterized through input variables such as temperature and gas flow rates, which affect certain kinetic processes over others. Hence, such a kinetic response is best modeled locally so that each local model captures only the relevant physics specific to input variables such as temperature. Here we demonstrate the use of local approximation belief models to control CNT diameters. We show how to use such Bayesian beliefs inside the Knowledge Gradient (KG) decision policy to select information-rich experiments to run inside a closed-loop experimental system. Through this combination of local approximation belief models and the KG decision policy, we show how specific CNT properties such as diameter can be tuned and optimized over a small number of experiments.
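To make the idea of a Knowledge Gradient policy concrete, here is a minimal sketch for the simplest textbook setting: a finite set of candidate experiments with independent normal beliefs and known measurement noise. This is an assumption made for illustration only; it is not the local approximation belief model described in the abstract, and the function and argument names are hypothetical.

```python
import math


def _pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knowledge_gradient(means, variances, noise_variance):
    """KG score of each alternative under independent normal beliefs.

    means, variances: current belief about each candidate experiment
    (at least two candidates). noise_variance: variance of one measurement.
    The score is the expected increase in the best posterior mean
    after measuring that alternative once.
    """
    scores = []
    for i, (mu, var) in enumerate(zip(means, variances)):
        if var <= 0.0:
            # A perfectly known alternative yields no new information.
            scores.append(0.0)
            continue
        # Standard deviation of the change in the posterior mean.
        sigma_tilde = var / math.sqrt(var + noise_variance)
        best_other = max(m for j, m in enumerate(means) if j != i)
        zeta = -abs(mu - best_other) / sigma_tilde
        scores.append(sigma_tilde * (zeta * _cdf(zeta) + _pdf(zeta)))
    return scores


def next_experiment(means, variances, noise_variance):
    """Index of the alternative with the largest KG score."""
    scores = knowledge_gradient(means, variances, noise_variance)
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal belief means, the policy prefers the alternative whose measurement is expected to move the belief the most, which is how it steers a closed loop toward information-rich experiments.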
9:15 AM - MT02.13.03
Predicting Solid-Solution Formation—Machine-Learning and a New Physics-Based Rule
Zongrui Pei1,Junqi Yin2,Jeffrey Hawk1,David Alman1,Michael Gao1
National Energy Technology Laboratory1,Oak Ridge National Laboratory2
Various empirical rules have been proposed to predict the formation of single-phase solid solutions, but they are based on very small datasets and have very limited predictive power. In the present work, we perform a machine-learning (Gaussian Process Classification) study on a large dataset consisting of 1252 alloys, including binary and high-entropy alloys, and achieve a success rate of 93% in predicting single-phase solid solutions. More importantly, the machine-learning results also identify the most important features, among which are the molar volume and bulk modulus. Inspired by this machine-learning insight, a new physics-based thermodynamic rule is constructed. The new rule is less accurate (73%) than the machine-learning algorithm, but it employs only elemental properties, which is in line with the spirit of the Hume-Rothery rules. Therefore, it has the advantages of simplicity and efficiency, which make it very useful for high-throughput application.
9:30 AM - MT02.13.04
freud—Powerful Particle Simulation Analysis Tools for Machine Learning and Materials Design
Bradley Dice1,Vyas Ramasubramani1,Eric Harper1,Matthew Spellings1,Joshua Anderson1,Sharon Glotzer1
University of Michigan1
Although particle simulations span a wide range of length and time scales, most well-established analysis tools are strongly focused on biomolecular simulations, and few tools exist for studying colloidal and coarse-grained material simulations. This presentation will showcase freud, a Python package that aids in calculating quantities that are frequently of interest in colloidal and nanoparticle simulations. We will discuss machine learning applications, including crystal structure identification in both supervised and unsupervised settings, which are enabled by the wide range of particle environment descriptors that freud implements. The freud library scales to extremely large systems, which has been crucial to studying complex phenomena such as the hexatic phase transition in systems of hard polygons that becomes most evident in simulations of over one million particles. It is also well-suited for building user-defined, computationally efficient analysis methods that can be adapted for analyzing new systems. We will show how the package’s efficiency and flexibility have enabled its use in such diverse applications as the inverse design of isotropic pair potentials and the optimization of shape-driven solid-solid phase transitions. Finally, we will demonstrate how output from freud can be coupled with visualization tools such as OVITO to render per-particle quantities in complex systems. Its flexibility allows freud to be used to engineer materials by design, including in cutting-edge applications such as analyzing simulations on-the-fly to study complex phenomena as they occur.
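As a rough illustration of what a per-particle environment descriptor looks like, the sketch below computes a hexatic order parameter for 2D points using only the Python standard library. It is not freud's implementation or API: it assumes open (non-periodic) boundaries and a fixed number of nearest neighbors, uses a brute-force neighbor search that would not scale to large systems, and the function name is hypothetical.

```python
import cmath
import math


def hexatic_order(points, n_neighbors=6):
    """Complex hexatic order parameter psi_6 for every 2D point.

    psi_6(i) is the average of exp(6j * theta_ij) over the n_neighbors
    nearest neighbors j of particle i, where theta_ij is the bond angle.
    |psi_6| is 1 for perfect sixfold bond order.
    """
    psi = []
    for i, (xi, yi) in enumerate(points):
        # Brute-force neighbor search: sort all other particles by distance.
        others = sorted(
            (math.hypot(xj - xi, yj - yi), xj - xi, yj - yi)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        bonds = others[:n_neighbors]
        if not bonds:
            psi.append(0j)
            continue
        total = sum(cmath.exp(6j * math.atan2(dy, dx)) for _, dx, dy in bonds)
        psi.append(total / len(bonds))
    return psi


def ring(n, offset_degrees=0.0):
    """Central particle at the origin plus n particles on a unit circle."""
    step = 360.0 / n
    return [(0.0, 0.0)] + [
        (math.cos(math.radians(step * k + offset_degrees)),
         math.sin(math.radians(step * k + offset_degrees)))
        for k in range(n)
    ]
```

For the central particle of `ring(6)` the magnitude of psi_6 is 1, while rotating the ring changes only the phase, which is why the magnitude is commonly used as a rotation-invariant feature for structure identification.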
9:45 AM - MT02.13.05
Revealing the Spectrum of Unknown Layered Materials with Super-Human Predictive Abilities
Gowoon Cheon1,Ekin D. Cubuk2,Evan Antoniuk1,Joshua Goldberger3,Evan Reed1
Stanford University1,Google Brain2,The Ohio State University3
We use semi-supervised learning to identify over 1000 new two-dimensional layered materials that have yet to be reported or synthesized. We accomplish this by combining physics with machine learning on experimentally obtained data and verify a subset of candidates using density functional theory. Our model accelerates the discovery of layered materials by 13 times compared to random trial-and-error approaches. It is five times better at identifying layered materials than practitioners working in the field of two-dimensional materials, and comparable to or better than professional solid-state chemists. As expected, the model is also orders of magnitude faster than any human.
To achieve super-human performance, we employ semi-supervised learning techniques for the first time in materials discovery. Semi-supervised learning utilizes unlabeled data in addition to labeled data, which is powerful in cases where labels are expensive to obtain or are noisy. We find that semi-supervised learning provides benefits over supervised learning in identifying layered materials. In the field of materials physics, labeled data can be scarce, such as rare materials known to possess certain properties; they can also be noisy, such as property measurements with large errors. Semi-supervised learning may be applicable to a wide range of problems in materials science.
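The way unlabeled data can supplement scarce labels can be illustrated with self-training, one generic semi-supervised scheme. The sketch below is not the authors' model: it assumes two classes, a nearest-centroid classifier, and a hypothetical confidence rule (`margin`) that decides when an unlabeled point is trusted enough to receive a pseudo-label. It requires Python 3.8 or later for `math.dist`.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Assign pseudo-labels (0 or 1) to unlabeled points by self-training.

    labeled: list of (point, label) pairs containing both classes.
    Returns one entry per unlabeled point; None means the classifier
    was never confident enough to label it.
    """
    pseudo = [None] * len(unlabeled)
    for _ in range(max_rounds):
        # Refit class centroids on true labels plus accepted pseudo-labels.
        known = list(labeled) + [
            (p, y) for p, y in zip(unlabeled, pseudo) if y is not None
        ]
        centers = [centroid([p for p, y in known if y == c]) for c in (0, 1)]
        changed = False
        for i, point in enumerate(unlabeled):
            if pseudo[i] is not None:
                continue
            d0, d1 = (math.dist(point, c) for c in centers)
            # Accept only confident predictions (clear distance margin).
            if abs(d0 - d1) >= margin:
                pseudo[i] = 0 if d0 < d1 else 1
                changed = True
        if not changed:
            break
    return pseudo


# Hypothetical 2D feature vectors: one labeled example per class.
labels = self_train(
    labeled=[((0.0, 0.0), 0), ((4.0, 0.0), 1)],
    unlabeled=[(0.5, 0.2), (-0.3, 0.1), (3.6, -0.2), (4.4, 0.3),
               (1.8, 0.0), (2.3, 0.0)],
)
```

Points close to a class centroid are absorbed into the training set, while ambiguous points near the decision boundary stay unlabeled, which limits the propagation of noisy pseudo-labels.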
10:30 AM - MT02.13.06
Microstructure Informatics—Expanding Descriptors from Molecular to Microstructural Level
State University of New York at Buffalo1
The holy grail of materials science is to discover the mechanisms governing material properties and to describe them in terms of a small set of physically meaningful descriptors. The discovery and exploration of materials and their properties critically depend on the availability of easily computable descriptors. At the atomistic level, descriptors have played a key role in materials design via first-principles combinatorial methods for photovoltaic, battery, or catalytic materials. More than a dozen software packages are available to calculate descriptors at the electronic and atomistic level. Descriptors at the next scale, the microstructure, are relatively less explored and consist of various application-specific and disparate clusters of descriptors. Most frequently, descriptors are tailored for characterizing specific mechanisms. In this talk, we present our unified framework to compute a library of generic descriptors. We describe our microstructure representation that facilitates the descriptor calculation. Our representation is based on the graph and skeleton and enables microstructure characterization in terms of shape (i.e., morphology), geometry, and connectedness (i.e., topology). We explain how this work lays the foundation for machine learning of microstructure-property relationships and enables information fusion between multiple scales. We showcase our framework using examples from organic electronics.
11:00 AM - MT02.13.07
High-Throughput Electron Microscopic Analysis of Nanomaterials Based on Machine Learning Techniques
Byoungsang Lee1,Seokyoung Yoon2,Jung Heon Lee1,2
Sungkyunkwan University1,SKKU Advanced Institute of Nanotechnology (SAINT)2
Many important physical and chemical properties of nanomaterials, such as their optical, electronic, and catalytic properties, are strongly influenced by morphological characteristics such as shape, structure, and size [1-5]. Although it is desirable to synthesize monodisperse nanomaterials for many applications, polydisperse nanomaterials with substantially different shapes, structures, and sizes are generally obtained. Therefore, a method suitable for precise characterization of the morphological properties of nanomaterials is in high demand. Furthermore, it is very important for this method to support statistical analysis of the morphological characteristics of the nanomaterials as well.
Spectroscopy techniques such as UV-vis spectrometry, small-angle X-ray scattering (SAXS), and dynamic light scattering (DLS) are commonly used to assess morphological characteristics of nanomaterials. However, it is very difficult to determine the exact morphological characteristics of nanomaterials using these techniques.
On the other hand, as electron microscopy (EM) allows images of individual nanomaterials to be taken, it is one of the best ways to analyze the morphological characteristics of nanomaterials. Furthermore, because images of hundreds of nanomaterials can easily be obtained at the same time when EM characterization is performed at low magnification, it can be a powerful way to acquire morphological characteristics of nanomaterials in a statistical manner. However, morphological analyses of nanomaterials are still carried out manually by individual researchers: sizes are generally measured from only a few of the hundreds of nanomaterials imaged by the researcher, and shapes and structures can only be described in a qualitative manner. Thus, an innovative method to analyze the morphological characteristics of nanomaterials in a quantitative manner is in high demand.
Here we have developed a highly precise method to analyze the morphological characteristics of nanomaterials from EM images. Among the more than 50,000 nanomaterials we tested, this method was able to identify the size and shape of nanomaterials with a high precision of 99.7% and a low false discovery rate of 0.2%. The method was also able to automatically identify and either separate or exclude overlapping nanomaterials to minimize misidentification of their morphological characteristics. To achieve this, we processed individual EM images of nanomaterials through various routes, including diverse pre- and post-treatments and varied edge detection (multiple thresholds), using machine learning, and stacked all the processed images.
On the other hand, it is impossible to measure the entire population of nanoparticles with EM equipment. Therefore, we tried to find the number of samples needed through repeated-measures multivariate analysis, but it was difficult to find the saturation point. The significance (p-value) in the multivariate analysis only tests the null hypothesis (effect size = 0), and it depends not only on the correlation but also on the sample size. The effect size, in contrast, is expressed in standard deviation units, so it can be compared between studies and used in meta-analysis. Therefore, we calculated the effect size using Hedges' g as a function of the number of repeated measurements and the number of particles per measurement, and found the point where the decreasing slope saturates.
In addition, nanoparticles could be clustered effectively using the Fourier transform and Gaussian mixture models, and, by comparing the far-field optical responses of the clustered nanoparticles, optical effects could be classified according to the morphological characteristics analyzed by EM. As a result, we propose the need for new analytical techniques using EM.
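For reference, Hedges' g between two groups of measurements can be computed as in the following minimal sketch. It uses the common small-sample correction factor 1 - 3/(4(n1 + n2) - 9); the sample values are hypothetical particle sizes, and how the authors grouped their repeated measurements is not specified in the abstract.

```python
import math
from statistics import mean, variance


def hedges_g(sample_a, sample_b):
    """Hedges' g: bias-corrected standardized mean difference.

    Each sample needs at least two values so that the sample
    variance (n - 1 denominator) is defined.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance weights each group by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0.0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    cohen_d = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled_var)
    # Approximate small-sample bias correction.
    correction = 1.0 - 3.0 / (4.0 * (n_a + n_b) - 9.0)
    return correction * cohen_d


# Hypothetical particle diameters (nm) from two measurement sets.
g = hedges_g([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0, 6.0])
```

Because g is expressed in units of the pooled standard deviation, values computed for different numbers of particles per measurement can be compared directly to look for the point where adding more particles no longer changes the estimate.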
11:15 AM - MT02.13.08
Inverse Learning of Material Physics Through In Situ Image Data and Continuum Modeling
Hongbo Zhao1,Martin Bazant1
Massachusetts Institute of Technology1
With the availability of spatio-temporal imaging data of energy materials at the nano to micron scale, there is a tremendous amount of information about material properties that can be extracted. As an example, we show that the thermodynamic model, or free energy landscape, of a phase-separating system can be learned through inversion of models such as the Cahn-Hilliard or Allen-Cahn equations. Other models that involve reaction kinetics can also be used to model the dynamics of imaged particles when they are chemically driven. Here, we demonstrate that the inverse problem technique can be applied to lithium iron phosphate (LFP). By imaging the evolution of lithium concentration in the particle, we are able to extract its free energy and reaction kinetics, which are typically difficult to obtain through electrochemical measurements alone due to mosaic phase separation at the porous electrode scale.
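An inversion of this kind needs a forward model whose free energy can be varied until simulated concentration fields match the images. The sketch below shows only such a forward model, under strong simplifying assumptions that are not taken from the abstract: one spatial dimension, periodic boundaries, constant mobility, explicit Euler time stepping, and a generic double-well free energy f(c) = c^2 (1 - c)^2 standing in for the unknown free energy. All names and parameter values are hypothetical, and this is not a model of LFP.

```python
import random


def laplacian_periodic(field, dx):
    """Second-order finite-difference Laplacian with periodic boundaries."""
    n = len(field)
    return [
        (field[(i - 1) % n] + field[(i + 1) % n] - 2.0 * field[i]) / dx**2
        for i in range(n)
    ]


def double_well_derivative(c):
    """df/dc for the placeholder free energy f(c) = c**2 * (1 - c)**2."""
    return 2.0 * c * (1.0 - c) * (1.0 - 2.0 * c)


def cahn_hilliard_step(c, dfdc, kappa, mobility, dx, dt):
    """One explicit Euler step of dc/dt = M * lap(df/dc - kappa * lap(c)).

    dfdc is the free-energy derivative; in an inverse problem this
    function is the unknown that would be fitted to image data.
    """
    lap_c = laplacian_periodic(c, dx)
    mu = [dfdc(ci) - kappa * li for ci, li in zip(c, lap_c)]
    lap_mu = laplacian_periodic(mu, dx)
    return [ci + dt * mobility * li for ci, li in zip(c, lap_mu)]


def simulate(n_cells=64, n_steps=5000, seed=0):
    """Evolve a slightly perturbed uniform mixture; return (initial, final)."""
    rng = random.Random(seed)
    c0 = [0.5 + 0.01 * rng.gauss(0.0, 1.0) for _ in range(n_cells)]
    c = list(c0)
    for _ in range(n_steps):
        c = cahn_hilliard_step(c, double_well_derivative,
                               kappa=1.0, mobility=1.0, dx=1.0, dt=0.01)
    return c0, c
```

The scheme conserves the mean concentration by construction, so any candidate free energy passed as `dfdc` can be compared against image sequences on equal footing. The explicit time step must stay small relative to dx^4 / (kappa * mobility) for stability, which is one reason practical implementations often prefer implicit or spectral solvers.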
11:30 AM - MT02.13.09
Evaluating Machine Learning as a Tool for Segmentation of In Situ TEM Data
James Horwath1,Dmitri Zakharov2,Eric Stach1
University of Pennsylvania1,Brookhaven National Laboratory2
In situ and operando experimental techniques afford scientists the ability to observe dynamic processes in material systems with high spatial and temporal resolution. However, such experiments can generate enormous amounts of data in very short amounts of time. Without frameworks to efficiently and quickly analyze data, the benefits of these novel experimental techniques cannot be fully realized. Simultaneously, while the field of deep learning for image classification and segmentation has developed to a point where non-experts can implement and extend models to a variety of applications, we face a need for reproducibility and physical understanding of the results these models provide. Here, we evaluate the use of machine learning as a segmentation tool for the measurement and localization of supported nanoparticles in a high temperature Environmental Transmission Electron Microscopy (ETEM) experiment. Our methods are capable of segmenting high-resolution images at rates greater than one image per second, with only minimal post-processing steps required for quantitative analysis.
To begin, we have developed a method to automate the creation of reliable training labels for high-resolution ETEM images using traditional computer vision techniques. With this training data in hand, we study the role of convolutional neural network (CNN) architecture in the practical utility of different models, and the learnability of our dataset. We also demonstrate the impact of image resolution on the ability of a standard CNN model to learn seemingly simple image features. Our experiments show the importance of regularization, particularly for image analysis, and we expose the danger in engineering machine learning architectures to better fit data rather than learning features of physical importance. In a final test case, we exhibit the use of a simple, single-layer CNN to process high-resolution images, which confirms that important image features are learned in the earliest layers of a deep learning model and promotes the notion of scaling back the depth of a model in favor of increased interpretability and extensibility. The use of deep learning to solve physical problems is an area of active research; however, a bridge is needed to relate learning theory and development to practical approaches for applying these techniques in other fields. In this work we attempt to systematically determine and investigate important features of machine learning models and datasets so that general tools for high-throughput data analysis can be developed.
11:45 AM - MT02.13.10
Robust Microstructure Representation
Devyani Jivani1,Olga Wodo1
SUNY Buffalo1
Microstructure quantification and design require a detailed and informative representation that uses a small number of design variables and at the same time allows large microstructure spaces to be explored. The representation should capture the material features that are critical for establishing quantitative microstructure-property relationships. This work introduces a skeletal-based microstructure representation that is capable of addressing all the above requirements. The modeling with skeletons is analogous to vertebrate animals and trees, where the skeleton represents the underlying structure. Skeletons allow for the distillation of shape, size, and topological information. In principle, this paradigm allows the creation of models of arbitrary geometry and topology. The shape recognition algorithm SpeedSeg is used to segment the skeleton into basic primitives to further decrease the design/storage space. In this set-up, a combination of arcs and line segments is used.
The research demonstrates that, using this representation, shapes can be constructed incrementally and stored compactly. Moreover, the flexibility of this representation is demonstrated by modeling microstructures with a high level of spatial heterogeneity. The Cahn-Hilliard equation is used to model microstructure evolution between two polymers in a thin film. The microstructures are synthetically generated for varying blend ratios and interaction parameters. The study reveals that this representation leads to a two-orders-of-magnitude reduction in the storage required and the associated number of variables. A similar analysis is performed for several other types of microstructures, including fibrous and cellular structures.