Elsa Olivetti, Massachusetts Institute of Technology
Muratahan Aykol, Toyota Research Institute
Logan Ward, University of Chicago
Jason Hattrick-Simpers, National Institute of Standards and Technology
GI01.01: Knowledge Discovery in Materials Science—Methods and Applications I
Tuesday AM, April 23, 2019
PCC West, 100 Level, Room 102 C
10:30 AM - *GI01.01.01
Data-Driven Molecular Engineering of Functional Materials
University of Cambridge1,STFC Rutherford Appleton Laboratory2,Argonne National Laboratory3
Large-scale data-mining workflows are increasingly able to successfully predict new chemicals that possess a targeted functionality. The success of such materials discovery approaches is nonetheless contingent upon having the right data source to mine, adequate supercomputing facilities and workflows to enable this mining, and algorithms that suitably encode structure-function relationships as data-mining workflows which progressively shortlist data toward the prediction of a lead material for experimental validation.
This talk describes how to meet these data science requirements via a large-scale data-mining case study that aims to discover materials with panchromatic optical absorption for solar-cell applications. In particular, the presentation shows how to auto-generate large material databases of photovoltaic-relevant experimental information from documents, using natural language processing and machine learning, via our ChemDataExtractor tool [2-4]. A workflow that executes large-scale electronic structure calculations to afford a computational counterpart to these experimental data is then described. These wavefunction calculations are used to extend knowledge beyond experiment. The resulting large database of chemical structures and their optical properties is then mined for materials discovery using custom-built algorithms that are encoded forms of structure-function relationships. These molecular design rules progressively filter the parent set of chemicals until a lead candidate appears, which is experimentally validated. The highly promising photovoltaic device outputs afforded experimentally from our predicted lead materials demonstrate the power of data-driven materials discovery.
The talk closes by translating the generic data-science aspects of this case study into the wider perspective of an overarching template for data-driven materials design, prediction and experimental validation of functional materials.
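The progressive filtering described in this abstract can be illustrated with a minimal sketch. It assumes each candidate carries a few already-extracted properties; the `Candidate` fields, the rule thresholds, and the dye names are all hypothetical and are not taken from the talk or from ChemDataExtractor.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    # All fields are hypothetical stand-ins for mined or computed properties.
    name: str
    absorption_peak_nm: float
    extinction_coeff: float
    has_anchor_group: bool


Rule = Callable[[Candidate], bool]


def shortlist(candidates: Iterable[Candidate],
              rules: List[Rule]) -> Tuple[List[Candidate], List[int]]:
    """Apply each design rule in order; record how many candidates survive."""
    remaining = list(candidates)
    history = [len(remaining)]
    for rule in rules:
        remaining = [c for c in remaining if rule(c)]
        history.append(len(remaining))
    return remaining, history


# Illustrative rules only; real structure-function rules would be far richer.
RULES: List[Rule] = [
    lambda c: c.has_anchor_group,
    lambda c: 400.0 <= c.absorption_peak_nm <= 800.0,
    lambda c: c.extinction_coeff >= 2.0e4,
]

# Made-up demo data.
DEMO = [
    Candidate("dye-A", 520.0, 3.1e4, True),
    Candidate("dye-B", 350.0, 5.0e4, True),
    Candidate("dye-C", 610.0, 1.0e4, True),
    Candidate("dye-D", 700.0, 4.2e4, False),
]

if __name__ == "__main__":
    leads, history = shortlist(DEMO, RULES)
    print([c.name for c in leads], history)
```

Recording the survivor count after every rule makes it easy to see which rule removes the most candidates, which is useful when deciding the order in which to apply cheap and expensive filters.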
 Cole et al, Advanced Energy Materials, (2018) doi.org/10.1002/aenm.201802820
 Swain & Cole, J. Chem. Inf. Model. 56 (2016) 1894–1904
 Court & Cole, Scientific Data 5 (2018) 180111.
11:00 AM - GI01.01.02
Inorganic Materials Synthesis Planning with Literature-Trained Neural Networks
Edward Kim1,Zach Jensen1,Alexander van Grootel1,Kevin Huang1,Matthew Staib1,Sheshera Mysore2,Haw-Shiuan Chang2,Emma Strubell2,Andrew McCallum2,Stefanie Jegelka1,Elsa Olivetti1
Massachusetts Institute of Technology1,University of Massachusetts Amherst2
Leveraging new data sources is a key step in accelerating the pace of materials design and discovery. To complement the strides in synthesis planning driven by historical, experimental, and computed data, we present an automated method for connecting scientific literature to synthesis insights. Starting from natural language text, we apply word embeddings from language models, which are fed into a named entity recognition model, upon which a conditional variational autoencoder is trained to generate syntheses for arbitrary materials. We show the potential of this technique by predicting precursors for two perovskite materials, using only training data published over a decade prior to their first reported syntheses. We demonstrate that the model learns representations of materials corresponding to synthesis-related properties, and that the model's behavior complements existing thermodynamic knowledge. Finally, we apply the model to perform synthesizability screening for proposed novel perovskite compounds.
11:15 AM - GI01.01.03
Teaching a Computer Synthesis—Obtaining “Codified Synthesis Recipes” by Machine Reading Millions of Papers
Olga Kononova1,Haoyan Huo1,Tanjin He1,Ziqin Rong2,Tiago Botari1,Vahe Tshitoyan2,Wenhao Sun2,Gerbrand Ceder1,2
University of California, Berkeley1,Lawrence Berkeley National Laboratory2
Nowadays, materials discovery is significantly facilitated and accelerated by high-throughput computations, and for many applications we are able to predict specifically which material we would like to make. Still, when it comes to synthesizing a predicted material, experimental approaches often go through many attempts before the new material can be made (if at all), limiting the pace of materials discovery. Moreover, no universal theory of synthesis exists that would help to predict the synthesizability of a material based on its computed parameters.
We aim to approach predictive synthesis of materials by learning it from existing data. This requires a comprehensive data set of inorganic materials synthesis rules stored in a structured, machine-readable format. Unfortunately, for inorganic materials, no such database exists so far. Therefore, our first step is to create a collection of so-called “codified synthesis recipes”, and then to data mine it to predict synthesis routes and guidelines for novel materials.
A large amount of information about materials synthesis has accumulated in scientific publications over decades. Hence, we used these publications as the primary data source for our collection. We downloaded and parsed over 3 million scientific articles, classified them according to the synthesis type they describe, and processed the paragraphs for “recipe” extraction. We created an “extraction pipeline” consisting of algorithms and models that use various machine learning and natural language processing techniques to find material entities, operation steps, and synthesis conditions in the corresponding synthesis paragraphs. Using these tools, we converted each of ~44,000 solid-state synthesis paragraphs into a “codified recipe” that includes the target material, starting precursors, firing step conditions (temperature, time, and atmosphere), and the entire synthesis graph built from synthesis operations. The resulting dataset of ~15,000 recipes is used to construct chemical reactions representing the syntheses and to predict synthesis guidelines.
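To make the idea of a “codified recipe” concrete, here is a minimal sketch of one possible machine-readable layout covering the fields the abstract lists (target, precursors, firing conditions, and an operations graph). The class names, field names, and the example values are assumptions for illustration; they are not the authors' schema and the example is not drawn from their dataset.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FiringStep:
    # Any condition may be missing from the source paragraph, hence Optional.
    temperature_c: Optional[float] = None
    time_h: Optional[float] = None
    atmosphere: Optional[str] = None


@dataclass
class CodifiedRecipe:
    target: str
    precursors: List[str]
    firing_steps: List[FiringStep] = field(default_factory=list)
    # Synthesis graph: operation names plus directed edges between their indices.
    operations: List[str] = field(default_factory=list)
    edges: List[Tuple[int, int]] = field(default_factory=list)

    def max_firing_temperature(self) -> Optional[float]:
        """Highest reported firing temperature, or None if none was extracted."""
        temps = [s.temperature_c for s in self.firing_steps
                 if s.temperature_c is not None]
        return max(temps) if temps else None


# Illustrative example with made-up conditions.
EXAMPLE = CodifiedRecipe(
    target="BaTiO3",
    precursors=["BaCO3", "TiO2"],
    firing_steps=[FiringStep(900.0, 4.0, "air"), FiringStep(1200.0, 12.0, "air")],
    operations=["mix", "grind", "calcine", "sinter"],
    edges=[(0, 1), (1, 2), (2, 3)],
)

if __name__ == "__main__":
    print(asdict(EXAMPLE))  # plain dict, ready for JSON serialization
    print(EXAMPLE.max_firing_temperature())
```

Keeping every condition optional reflects the fact that text-mined paragraphs often omit a temperature, time, or atmosphere, so downstream analysis has to tolerate missing values.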
11:30 AM - GI01.01.04
Materials Property Datasets with Minimal Effort Using Hybrid Human-Machine Text Extraction
Roselyne Tchoua1,Aswathy Ajith1,Zhi Hong1,Logan Ward2,Kyle Chard2,Debra Audus3,Shrayesh Patel1,Juan de Pablo1,2,Ian Foster1,2
The University of Chicago1,Argonne National Laboratory2,National Institute of Standards and Technology3
Despite significant progress in natural language processing and machine learning, there remains a gap between the current data extraction needs in fields such as materials science and the capabilities of state-of-the-art tools. In this talk we describe our efforts to develop human-machine methods for automatically extracting scientific facts from literature. Our overarching goal is to decrease the amount of manual extraction (a tedious, time-consuming, and error-prone process) by automating extraction activities where possible. With the assumption that our automated approaches require some supervision, we seek to prioritize human involvement while optimizing overall extraction accuracy. We focus initially on the task of extracting polymer names and properties with the aim to create an outline database of polymer properties. In this context, we have explored hybrid human-machine information extraction systems with varying amounts of human involvement.
We describe our recent machine learning-based approaches for automatically identifying scientific named entities in text. To circumvent the need for a large annotated corpus, we use an ensemble of word embedding models and limited domain-specific knowledge to propose candidate entities (candidates are words that our models deem similar to reference entities). We assign the labeling of these candidates (identifying strings that were not actually target entities) to an expert materials scientist. This task is more straightforward than reading and recognizing the entities in documents. Finally, we train a semi-supervised named entity word vector classifier to select actual target names from the candidates proposed by the word embedding model. Our preliminary results are promising, as they are comparable (within 10% recall or number of entities retrieved) to results extracted by a state-of-the-art, domain-specific natural language processing toolkit. However, our approach requires only minimal human input and does not rely on an exhaustively annotated corpus.
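The candidate-proposal step can be sketched as a cosine-similarity search around a few reference entities. This is a simplified illustration under stated assumptions: it uses a single toy embedding table rather than an ensemble of trained models, and the function names, vectors, and threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def propose_candidates(embeddings: Dict[str, Sequence[float]],
                       references: Set[str],
                       threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Rank non-reference words by their best similarity to any reference entity."""
    ref_vecs = [embeddings[r] for r in references if r in embeddings]
    if not ref_vecs:
        return []
    scored = []
    for word, vec in embeddings.items():
        if word in references:
            continue
        best = max(cosine(vec, ref) for ref in ref_vecs)
        if best >= threshold:
            scored.append((word, best))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors, invented for this example.
TOY_EMBEDDINGS = {
    "polystyrene": [0.90, 0.10, 0.00],
    "polyethylene": [0.80, 0.20, 0.10],
    "PMMA": [0.85, 0.15, 0.05],
    "temperature": [0.00, 0.10, 0.90],
    "measured": [0.10, 0.90, 0.20],
}

if __name__ == "__main__":
    for word, score in propose_candidates(TOY_EMBEDDINGS, {"polystyrene"}):
        print(f"{word}: {score:.3f}")
```

The ranked list is what a human expert would then review, rejecting strings that are not actual target entities; those labels can in turn train the downstream classifier.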
11:45 AM - GI01.01.05
A Classifier for Identifying Materials with Metal-Insulator Transitions
Nicholas Wagner1,Peiwen Ren1,James Rondinelli1
Northwestern University1
We have assembled the largest dataset to date of resistivity-temperature measurements on temperature-activated metal-insulator transitions (MITs), covering 45 unique compounds. We supplemented this dataset with additional entries on metals and insulators with known transport behavior, i.e., compounds that do not undergo temperature-driven MITs, for comparison. We then collected features for the 147 compounds that describe chemical composition (e.g., mean electronegativity, atomic radii, and elemental heat of fusion); overall and local atomic structure; and estimates of the on-site electron repulsion, charge transfer energy, and compound polarizability. From these data, we constructed a machine-learning classifier to predict whether a material would undergo an MIT. Our model achieves a cross-validation AUC score of 88.24 +/- 11.63 and a mean accuracy of 79.23 +/- 9.23% on this metal-insulator transition prediction task. We also conducted a survey of 51 graduate students, faculty, and staff scientists to estimate the ability of scientists to classify metals vs. insulators vs. MIT. The mean accuracy for humans was 59.8%.
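For readers unfamiliar with how a "mean +/- spread" cross-validation AUC is obtained, the following sketch computes a rank-based AUC per fold and then summarizes across folds. It is a generic illustration: the labels and scores are invented, the classifier itself is not reproduced, and the use of the sample standard deviation is an assumption (the abstract does not say which spread measure was reported).

```python
import statistics
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(fold_scores: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over cross-validation folds."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)


if __name__ == "__main__":
    # Made-up fold: label 1 = undergoes an MIT, 0 = does not.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
    print(summarize([0.75, 1.0, 0.5]))
```

With small folds, as is likely for a dataset of this size, per-fold AUC values vary widely, which is consistent with a large reported spread.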
GI01.02: Knowledge Discovery in Materials Science—Getting More Out of Characterization
Tuesday PM, April 23, 2019
PCC West, 100 Level, Room 102 C
1:30 PM - *GI01.02.01
Knowledge from Atomically Resolved Images—Deep Learning Meets Statistical Physics
Sergei Kalinin1,Stephen Jesse1,Christopher Nelson1,Maxim Ziatdinov1,Rama Vasudevan1,Ondrej Dyck1,Andrew Lupini1
Oak Ridge National Laboratory1
Atomically resolved imaging techniques, including scanning transmission electron microscopy and scanning tunneling microscopy, are almost routine by now and provide atomically resolved pictures of static structures and their evolution with time, as well as insight into local electronic properties. However, the wealth of information stored within these images is still not fully harnessed. The use of these data for predictive materials design can often be broken into a two-part problem, with the first being feature extraction, including mapping all atomic coordinates, isolating the defects, classifying the symmetry, etc. In many or all of these areas, deep learning provides a robust tool that can be used in near real-time on atomically resolved images and can be trained on simulated images without the need for expensive experiments to capture large training sets [1-3]. In parallel, once the local atomic configurations are identified, the question becomes how to use such information to understand the system and its interactions, and ultimately use them to build models with predictive capabilities. One answer is based on mesoscopic model matching, where the material properties are described by the corresponding Ginzburg-Landau (GL) free energy. The corresponding analytical solutions for well-defined defects including domain boundaries, surfaces, or interfaces can then be fitted to STEM data, providing information on (poorly known) gradient terms and boundary conditions. For discrete systems, we propose that by studying the characteristic structural and chemical fluctuations that exist within a single chemical composition, we can infer the relevant interactions and produce a generative model that can predict properties over a range of scales in a finite region of chemical and temperature space. We use the compositional and structural fluctuations in the quenched (static) system to build a generative model encoding the effective interactions in the system.
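As a textbook illustration of fitting an analytical Ginzburg-Landau solution to a measured profile, consider a one-dimensional, single-component order parameter. This is a standard simplified case chosen for clarity; the free-energy form and the symbols below are assumptions and need not match the functional used in the talk.

```latex
% Free energy per unit area for a 1D order parameter \eta(x)
F[\eta] = \int \left[ \frac{\alpha}{2}\,\eta^{2} + \frac{\beta}{4}\,\eta^{4}
        + \frac{g}{2}\left(\frac{d\eta}{dx}\right)^{2} \right] dx ,
\qquad \alpha < 0,\quad \beta > 0,\quad g > 0

% Euler-Lagrange equation
g\,\frac{d^{2}\eta}{dx^{2}} = \alpha\,\eta + \beta\,\eta^{3}

% Domain-wall solution connecting the two bulk states \pm\eta_{0}
\eta(x) = \eta_{0}\,\tanh\!\left(\frac{x - x_{0}}{w}\right), \qquad
\eta_{0} = \sqrt{-\alpha/\beta}, \qquad
w = \sqrt{2g/|\alpha|}
```

In this simplified picture, fitting the tanh profile to the order parameter measured across a domain wall yields the wall width w, and hence the ratio g/|α|, which is how a gradient term can be constrained from image data.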
Finally, we extend this machine learning approach to the mapping of solid-state reaction mechanisms. We developed a deep learning approach that allows fully automated identification of individual atoms in STEM images, using theoretical or labeled images as a training set. We extend this approach to construct reaction pathways for point defects in 2D materials, trace the structural evolution of atomic species during electron beam manipulation, and create a library of defect configurations in Si- and vacancy-doped graphene.
This research is supported by the U.S. Department of Energy, Basic Energy Sciences, Materials Sciences and Engineering Division and the Center for Nanophase Materials Sciences, which is sponsored at Oak Ridge National Laboratory by the Scientific User Facilities Division, BES DOE.
[1] M Ziatdinov, O Dyck, A Maksov, X Li, X Sang, K Xiao, R Unocic, R Vasudevan, S Jesse and SV Kalinin, ACS Nano 11 (2017), p. 12742.
[2] R Vasudevan, N Laanait, EM Ferragut, K Wang, DB Geohegan, K Xiao, M Ziatdinov, S Jesse, O Dyck and SV Kalinin, npj Computational Materials 4 (2018), p. 30.
[3] M Ziatdinov, O Dyck, A Maksov, B Hudak, A Lupini, J Song, P Snijders, R Vasudevan, S Jesse and SV Kalinin, arXiv (2018), p. 1801.05133.
[4] L Vlcek, RK Vasudevan, S Jesse and SV Kalinin, J. Chem. Theor. Comp. 13 (2017), p. 5179.
[5] L Vlcek, M Pan, RK Vasudevan and SV Kalinin, ACS Nano 11 (2017), p. 10313.
2:00 PM - *GI01.02.02
Artificial Intelligence for Knowledge Generation in Materials Science
Carnegie Mellon University1
The process of scientific inquiry involves observing a signal (data) and interpreting it to generate information (knowledge). For example, in electron microscopy the signal may be a diffraction pattern from which information on crystal orientation may be deduced by applying diffraction theory. Science advances both through improvements in gathering data and in techniques for extracting knowledge from it. Artificial intelligence (AI) – a broad term comprising data science, machine learning (ML), neural network computing, computer vision, and other technologies – opens new avenues for extracting information from high-dimensional materials data. In that sense, AI offers the possibility to advance materials science in the same way as a new imaging modality or a new theoretical model. The applications of AI in materials science cut a broad swath, from large, labelled data sets that fit naturally in the Big Data paradigm to small, sparse, multimodal data sets that test the limits of cutting-edge AI. This presentation will focus on AI applications in the context of multimodal image-based data, including experimental and simulated micrographs that include composition, processing, or properties metadata. Computer vision (CV) representations are developed to numerically encode the visual information contained in images. ML tools are then selected based on the characteristics of the data set and the desired outcome. For example, a large, homogeneous data set of steel inclusions is best suited to a Deep Learning approach involving a purpose-built convolutional neural network. In contrast, a random-forest method can find significant trends in a small, multi-modal data set that includes microstructural, crystallographic, and micromechanical data. Complex image segmentation leverages a convolutional neural network that has been trained using images very different from those it is applied to.
These case studies will motivate a discussion of AI method selection based on data set characteristics and desired outcomes. The ultimate goal is to develop AI as a new tool for information extraction and knowledge generation in materials science.
2:30 PM - GI01.02.03
Metric Learning of Composition-Current Mapping from High-Throughput Experiments to Accelerate Catalyst Discovery for Fuel Cells and Metal-Air Batteries
Olga Wodo1,Kiran Vaddi1,Surya Vamsi Devaguptapu1,Fei Yao1,Brian Hayden2,Krishna Rajan1
University at Buffalo, The State University of New York1,University of Southampton2
The high-throughput exploration of materials space has been recognized as a new paradigm in materials design and discovery. However, typical high-throughput exploration methods deliver high-dimensional, sparse, and very diverse datasets that pose the challenge of extracting the key features and trends that could guide the discovery process. This is a non-trivial task, as quite often the underlying physical phenomena are uncertain and the latent variables governing the performance are largely unknown. To address this challenge, we propose metric learning tools that are able to rapidly extract information from high-throughput exploration and material characterization experiments. We introduce a metric-learning-based methodology for high-throughput exploration of bi-functional catalysts for electrochemical systems (e.g., fuel cells and metal-air batteries). The key aspects of our methodology are (i) learning the similarity measures, as opposed to using fixed similarity measures (e.g., Euclidean distance, dynamic time warping), and (ii) imposing continuity in the composition space. In particular, using this methodology, we discover the relationship between the composition of multi-metal oxide catalysts (LaMnNiO, NiCeCoFeO) and the cyclic voltammetry (CV) curves. We demonstrate how our approach discovers the natural groups in the high-dimensional datasets. This is of high importance, as the discovery of these natural groups can guide the identification of potential electrochemical regimes and underlying physical phenomena that are uncertain or unknown a priori (e.g., pure physical adsorption, redox reaction).
2:45 PM - GI01.02.04
Performance Assessments from Low-Cost Surrogate Measurements
Helge Stein1,Dan Guevarra1,Joel Haber1,John Gregoire1
California Institute of Technology1
Despite advances in high-throughput electrochemistry that enable the characterization of thousands of materials a day, the extraction of certain performance figures of merit requires time-intensive measurements that diminish throughput, particularly when performance varies over time. One example is the performance and stability evaluation of oxygen reduction reaction (ORR) catalysts, where gradual corrosion processes inherently require long-duration experiments to assess a catalyst's stability and activity. We present methods of predicting key performance and stability figures of merit for ORR catalysts solely from analyzing fast CV scans as a surrogate measurement. To navigate high-dimensional composition spaces amended by low-dimensional representations of CV curves, we employ multidimensional scaling. The visualization schemes presented are discussed as a template example of cooperatively harnessing artificial and human intelligence.
GI01.03: Automation of Materials Research—From Robots to Software
Tuesday PM, April 23, 2019
PCC West, 100 Level, Room 102 C
3:30 PM - *GI01.03.01
A Self-Driving Laboratory for Accelerating Materials Discovery
Curtis Berlinguette1,Jason Hein1,Alán Aspuru-Guzik2,Ben MacLeod1,Fraser Parlane1,Brian Lam1
The University of British Columbia1,The University of Toronto2
This presentation will focus on our self-driving laboratory for thin film materials discovery and optimization. Discovering high-performance, low-cost materials is an integral component of technology innovation cycles, particularly in the clean energy sector. The linear methodology currently used to develop optimal materials can take decades, which impedes the translation of innovative technologies from conception to market. Our interdisciplinary team is utilizing advanced robotics, machine learning, and computational screening to overcome this challenge. We are closing the feedback loop in thin film materials research by enabling our self-driving robotics platform named “Ada” to design, perform, and learn from its own experiments efficiently and in real time. As a proof-of-principle set of experiments, I will show how Ada discovers and optimizes high-performance, low-cost hole transport materials for use in advanced solar cells. I will also showcase how Ada’s modular design can enable the automated and autonomous discovery of materials for other clean energy technologies.
4:00 PM - *GI01.03.02
Robot-Enabled Halide Perovskite Discovery—A Case Study in Autonomous Materials Exploration
Fordham University1
Halide perovskites—such as methylammonium lead iodide—are an emerging class of solution processable materials exhibiting a tremendous range of structural diversity, and resulting optoelectronic properties. Growing large single crystals suitable for crystallographic structure determination is an important step in understanding these new compounds, but identifying the narrow parameter windows needed for crystal growth is typically performed by trial and error. Data-driven approaches can accelerate the discovery of new materials. But to achieve this promise, we need a comprehensive data capture of both success and failure that is unavailable in the published literature.
To obtain an unbiased exploration of chemical space, we’ve built a robotic system for halide perovskite synthesis. Our general software infrastructure enables remote users to specify electronic experiment plans to be conducted in the laboratory and captures the resulting data in machine-readable form. This in turn allows human scientists and algorithms to remotely conduct synthesis experiments, enabling closed-loop autonomous discovery of new materials. In the lab, we have developed “robot-ready” halide perovskite synthesis methods that avoid the corrosive reagents and extreme temperature conditions typically used to make these materials, and use only liquid-phase reagent solutions so as to avoid slow solid dispensing steps. This allows us to use commercial liquid handling robots to perform automated experimentation. By collecting a comprehensive dataset of thousands of reactions, we have been able to apply machine learning approaches to optimizing novel reactions and learning about the physicochemical requirements for crystal formation.
4:30 PM - GI01.03.03
ChemOS—Orchestrate Self-Driving Laboratories for Next-Generation Experimentation
Loïc Roch1,2,Florian Häse3,1,2,Alán Aspuru-Guzik2,1,4
Vector Institute for Artificial Intelligence1,University of Toronto2,Harvard University3,Canadian Institute for Advanced Research (CIFAR)4
Current approaches to materials discovery require up to two decades of fundamental and applied research for materials technologies to reach the market. This slow and capital-intensive turnaround calls for disruptive strategies to expedite innovation. Self-driving laboratories can provide the means to revolutionize experimentation by empowering automation with artificial intelligence (AI) to enable autonomous discovery. However, the lack of adequate software solutions to enable autonomous experimentation significantly impedes the development of the self-driving laboratories.
In self-driving laboratories, AI algorithms continuously learn, model, design, and recommend experiments to be executed on the automated platforms, in a closed-loop approach. This procedure differs from combinatorial or high-throughput experimentation, in which experimental campaigns are designed prior to starting the experimentation process. By carrying out an optimal set of experiments, and by interconnecting geographically distributed laboratories into a single self-driving laboratory, the suggested approach has the potential to significantly lower the costs associated with the initial discovery, thus reducing the price of the finalized products (e.g., solar panels, batteries, etc.).
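The difference between a closed loop and a pre-planned campaign can be shown with a minimal sketch: each new experiment is chosen only after the previous results are known. The proposer below is a deliberately naive local search and the "experiment" is a simulated function; both are assumptions for illustration and do not represent the algorithms or hardware interfaces discussed in the talk.

```python
import random
from typing import Callable, List, Optional, Tuple

History = List[Tuple[float, float]]  # (experimental setting, measured response)


def closed_loop(measure: Callable[[float], float],
                propose: Callable[[History], float],
                n_rounds: int,
                seed_x: float) -> History:
    """Alternate between proposing a setting and measuring it."""
    history: History = [(seed_x, measure(seed_x))]
    for _ in range(n_rounds):
        x = propose(history)              # decision uses all results so far
        history.append((x, measure(x)))   # then the experiment is executed
    return history


def make_local_proposer(step: float = 0.1, lo: float = 0.0, hi: float = 1.0,
                        rng: Optional[random.Random] = None
                        ) -> Callable[[History], float]:
    """Naive strategy: perturb the best setting seen so far, clipped to [lo, hi]."""
    generator = rng if rng is not None else random.Random(0)

    def propose(history: History) -> float:
        best_x, _ = max(history, key=lambda item: item[1])
        x = best_x + generator.uniform(-step, step)
        return min(hi, max(lo, x))

    return propose


def simulated_experiment(x: float) -> float:
    """Hypothetical stand-in for a robotic measurement; optimum at x = 0.7."""
    return -(x - 0.7) ** 2


if __name__ == "__main__":
    runs = closed_loop(simulated_experiment, make_local_proposer(), 50, 0.1)
    best_x, best_y = max(runs, key=lambda item: item[1])
    print(f"best setting {best_x:.3f} with response {best_y:.4f}")
```

In a pre-planned campaign, by contrast, the full list of settings would be fixed before the first measurement, so no result could influence which experiment runs next.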
This talk addresses the challenges of engineering a software package that facilitates the development and deployment of self-driving laboratories. Recently, we made significant progress toward addressing these challenges and have implemented ChemOS, a versatile, flexible, and modular orchestration software that contains the layers essential for operating self-driving laboratories. ChemOS coordinates the overall computational and experimental workflow, monitors experiments, and administers data collection, storage, and sharing, as well as details about the configurations of the available automated laboratory equipment, potentially distributed across different physical laboratories. The functional design of ChemOS and its modular structure allow for the global control of complex heterogeneous automation platforms. ChemOS facilitates interaction between researchers, AI algorithms, and robotic hardware by providing various intuitive interfaces via natural language processing. One of the crucial components of ChemOS is the set of AI algorithms encapsulated in the learning module. Notably, this module includes Phoenics and Chimera, two in-house developed AI algorithms specifically designed to enable an optimal use of automated equipment for Pareto optimization in chemistry and materials science. These algorithms learn procedures on the fly, with no prior assumptions, allowing them to follow unprecedented routes to scientific discovery.
We demonstrate the performance of ChemOS on the optimization of blends consisting of hole-transport materials, stabilizers, and dopants, with the goal of maximizing both the conductivity and the stability of spin-coated films on glass substrates. In this example, the experimental procedure is remotely orchestrated by ChemOS, and the experiments run fully autonomously.
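When two objectives such as conductivity and stability are maximized together, the outcome is usually a set of trade-off points rather than a single optimum. The sketch below extracts that non-dominated (Pareto) set from a list of measured results. It is a generic post-hoc filter with invented data, and it does not reproduce Phoenics, Chimera, or any ChemOS interface.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(q: Sequence[float], p: Sequence[float]) -> bool:
    """True if q is at least as good as p everywhere and strictly better somewhere."""
    return (all(a >= b for a, b in zip(q, p))
            and any(a > b for a, b in zip(q, p)))


def pareto_front(points: List[Point]) -> List[Point]:
    """Points not dominated by any other point (all objectives maximized)."""
    return [p for i, p in enumerate(points)
            if not any(dominates(q, p)
                       for j, q in enumerate(points) if j != i)]


# Made-up (conductivity, stability) pairs in arbitrary units.
RESULTS: List[Point] = [
    (1.0, 5.0),
    (2.0, 4.0),
    (1.5, 3.0),
    (3.0, 1.0),
    (0.5, 0.5),
]

if __name__ == "__main__":
    print(pareto_front(RESULTS))
```

This pairwise check costs O(n²) comparisons, which is fine for the tens to hundreds of blends an autonomous campaign typically produces.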
4:45 PM - GI01.03.04
Data Services to Increase Data Accessibility and Adoption of Data-Driven Materials Science Research
Marcus Schwarting1,Ben Blaiszik1,Logan Ward1,Jonathon Gaff1,Ryan Chard1,Zhuozhao Li2,Kyle Chard2,Yadu Nand2,Evan Pike3,Michael Franklin2,Steve Tuecke1,Ian Foster1
Argonne National Laboratory1,The University of Chicago2,Cornell University3
Increasing the accessibility, application, and adoption of data-driven materials science research techniques requires a software and data service infrastructure to enable simple and broad access to large sources of materials data and state-of-the-art machine learning and physical models. For these data and models to be most useful, researchers must be able to find, access, gather, and use them. Yet datasets may be large (i.e., many terabytes in size or comprising millions of files), of varying quality, heterogeneous (i.e., many file types), or located on distributed storage resources or behind varying service layers. Similarly, models of interest to a researcher may be difficult to find, completely unavailable, out of date with current data, or require a high level of technical proficiency or significant computational resources to recreate and run. Thus, building a data service infrastructure capable of simplifying and automating aspects of data and model discovery, access, and usage remains a key challenge toward speeding the pace of discovery and innovation in materials science. Here, we present a set of such data services: (1) the Materials Data Facility (MDF) to streamline broad data sharing regardless of size and location, automate materials-specific indexing of dataset contents, and enable data discovery to spur new analyses and machine learning efforts; and (2) the Data and Learning Hub for Science (DLHub) to enable publication of models, facilitate invocation of state-of-the-art models on new data through a hosted service, and automate the retraining of models when new data are available. We present the DLHub and MDF services as a means to bring data-driven materials capabilities to a much wider set of research users. These capabilities are shown via a set of examples highlighting the ways MDF and DLHub features can be combined to enable data discovery and model usage with far less effort than previously required.
GI01.04: Poster Session: Advancing Materials Discovery with Data-Driven Science
Tuesday PM, April 23, 2019
PCC North, 300 Level, Exhibit Hall C-E
5:00 PM - GI01.04.01
Program for Three-Dimensional Quantification of Elemental Segregation to Surfaces in Large APT Datasets
Linqing Peng1,Jonathan Poplawsky2
Grinnell College1,Oak Ridge National Laboratory2
Atom Probe Tomography (APT) has gained increasing popularity in characterizing a wide range of materials such as alloys, semiconductors, and oxides. The three-dimensional distribution of millions of atoms of all elements can be identified with sub-nanometer resolution within hours. With efficient analysis tools, APT exhibits significant potential to produce big data for materials design and applications in machine learning. However, the size of APT datasets makes data analysis challenging for a normal desktop computer. One common demanding task is analyzing the elemental distribution near a curved interface of interest in a large APT dataset. I will present a Python data mining tool that we developed to efficiently quantify and visualize elemental segregation at interfaces derived from the 3D elemental distribution. The tool can parse the widely used VRML file format generated by the commercially available CAMECA software IVAS and can readily read the iso-concentration surface produced by the software to define the interface of interest. The time complexity of the searching algorithm was optimized to O(N) to reduce the overall calculation time to minutes for a dataset containing millions of atoms. The program was further optimized for elements of low concentration by allowing users to reduce statistical errors by extending the bin size used for counting near-surface elemental concentration. Developed in Python, this open-source program will be available to the whole scientific community. The program uses the Visualization Toolkit (VTK), an open-source and freely available 3D visualization package, which allows users to readily customize the visualization and to further process the data.
The function and method of the program will be demonstrated with a successful application to microalloying elemental segregation in the Al-Cu alloy system that is responsible for the alloy’s high-temperature properties. The segregation of microalloying elements Zr and Cu that synergistically stabilize interfaces between the strengthening θ' phase Al2Cu precipitate and the α phase Al matrix has been quantified and visualized. The concentration color-map revealed Mn segregation differences between the highly oriented precipitate interfaces, with more Mn segregating to the less stable semi-coherent interface. Zr atoms, in contrast, show the opposite behavior. The results of our study were compared with density functional theory (DFT) calculations to reveal the fundamental segregation mechanism. Overall, our program for analyzing APT data can be used to efficiently quantify and visualize the atomic percentage or excess of elements at every point on particle interfaces within any standard APT dataset and to prepare big data for future machine learning applications. The research was conducted at ORNL's Center for Nanophase Materials Sciences (CNMS), which is a U.S. DOE Office of Science User Facility.
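The effect of the adjustable bin size can be illustrated with a single-pass (O(N)) concentration profile. The sketch assumes that each atom's signed distance to the interface has already been computed, which is the difficult part handled by the actual tool; the function name, data layout, and toy atoms below are hypothetical and are not taken from the program described above.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Atom = Tuple[str, float]  # (element symbol, signed distance to interface in nm)


def concentration_profile(atoms: Iterable[Atom], element: str,
                          bin_width: float) -> Dict[float, float]:
    """Atomic fraction of `element` per distance bin, keyed by the bin's lower edge.

    Each atom is visited exactly once, so the cost grows linearly with N.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    total: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for symbol, distance in atoms:
        index = math.floor(distance / bin_width)
        total[index] += 1
        if symbol == element:
            hits[index] += 1
    return {index * bin_width: hits[index] / total[index]
            for index in sorted(total)}


# Toy data, invented for this example.
TOY_ATOMS = [
    ("Al", -0.9), ("Al", -0.4), ("Mn", -0.1),
    ("Al", 0.2), ("Mn", 0.3), ("Al", 0.8), ("Al", 1.4),
]

if __name__ == "__main__":
    print(concentration_profile(TOY_ATOMS, "Mn", 0.5))
    print(concentration_profile(TOY_ATOMS, "Mn", 1.0))  # wider bins, more atoms per bin
```

Widening the bins puts more atoms into each one, which lowers the counting error for dilute elements at the price of spatial resolution normal to the interface.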
5:00 PM - GI01.04.02
Natural Language Processing for Materials Discovery and Design
University of California, Berkeley1
The majority of all materials data is currently scattered across the text, tables, and figures of millions of scientific publications. We present progress on our use of natural language processing (NLP) and machine learning techniques to extract materials knowledge by textual analysis of the abstracts of several million journal articles. To date, we have extracted more than 60 million mentions of materials, structures, properties, applications, synthesis methods, and characterization techniques from our database of over 3 million materials science abstracts. With these data, we can augment conventional materials informatics techniques with NLP-derived features, showing significantly improved performance for materials discovery and design. In particular, we illustrate that many commonly used chemical features in property prediction models are outperformed by features constructed from text-mined chemical word embeddings. In addition, we demonstrate that new functional materials, such as thermoelectrics and topological insulators, can be identified using only contextualized word embeddings for materials. These embeddings also prove useful as inputs for more advanced machine learning models.
5:00 PM - GI01.04.03
Augmenting Machine Learning of Energy Landscapes with Local Structural Information
Shreyas Honrao1,Stephen Xie2,Richard Hennig2,1
Cornell University1,University of Florida2
We present a machine learning approach to calculate formation energies of compounds relative to the ground state crystal structure of the pure components in the context of structure predictions. Typical methods for structure predictions such as genetic algorithms often rely on density-functional theory codes to perform such calculations at a relatively high computational cost. Here, we explore commonly used learning algorithms such as kernel ridge regression, support vector regression, and artificial neural networks. The efficiency of machine learning approaches relies on suitable data representations that encode the relevant physical information about the crystal structures. We illustrate a novel representation using local radial and angular distribution functions. We apply the machine learning approaches to binary systems and show that these methods provide small root-mean-square prediction errors of a few meV/atom across the composition and structure space. This high accuracy makes our machine learning models strong candidates for the exploration of energy landscapes.
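As a rough illustration of the idea in this abstract, the sketch below fits a kernel ridge regression model to energies of small atomic clusters using a Gaussian-smeared radial distribution function as the descriptor. It is not the authors' representation (which also includes angular terms and treats periodic crystals): the toy trimer geometry, the Lennard-Jones reference energy, the grid, and all hyperparameters are assumptions chosen only to keep the example self-contained.

```python
import numpy as np

def pair_distances(positions):
    """Distances between all unique atom pairs of a finite cluster."""
    pos = np.asarray(positions, dtype=float)
    i, j = np.triu_indices(len(pos), k=1)
    return np.linalg.norm(pos[i] - pos[j], axis=1)

def smeared_rdf(positions, r_grid, sigma=0.2):
    """Radial descriptor: sum of Gaussians centred on each pair distance."""
    d = pair_distances(positions)
    return np.exp(-0.5 * ((r_grid[:, None] - d[None, :]) / sigma) ** 2).sum(axis=1)

def lj_energy(positions):
    """Toy reference energy (reduced Lennard-Jones units), standing in for DFT."""
    d = pair_distances(positions)
    return float(np.sum(4.0 * (d ** -12 - d ** -6)))

def gaussian_kernel(A, B, gamma):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def krr_fit(X, y, gamma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights alpha."""
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma):
    return gaussian_kernel(X_new, X_train, gamma) @ alpha

def make_trimer(d):
    """Hypothetical structure family: right-angle trimer with leg length d."""
    return [(0.0, 0.0, 0.0), (d, 0.0, 0.0), (0.0, d, 0.0)]

GAMMA, LAM = 0.5, 1e-3
r_grid = np.linspace(0.5, 3.5, 16)
train_d = np.linspace(0.95, 2.0, 12)
X_train = np.array([smeared_rdf(make_trimer(d), r_grid) for d in train_d])
y_train = np.array([lj_energy(make_trimer(d)) for d in train_d])

alpha = krr_fit(X_train, y_train, GAMMA, LAM)
X_test = np.array([smeared_rdf(make_trimer(1.3), r_grid)])
pred = krr_predict(X_train, alpha, X_test, GAMMA)
print("predicted:", pred[0], "reference:", lj_energy(make_trimer(1.3)))
```

The descriptor depends only on interatomic distances, so it is invariant to rotations and translations of the structure, which is the property that makes distribution-function representations attractive for energy models.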
5:00 PM - GI01.04.04
Predicting Material Properties Using a Novel Descriptor “Elemental Fingerprints” with Neural Networks
Jaekyun Hwang1,Satoshi Watanabe1
The University of Tokyo1
Thanks to the rapid growth of machine learning (ML) techniques, it has become possible to predict various material properties, screen promising candidate materials having desired properties, and select important features.1 In many cases, however, a model for ML is applicable to a specific family of materials or often requires information about atomic arrangements that are difficult to obtain in real experimental situations. On the other hand, it is well known that the insufficiency in information degrades ML accuracy severely. In this study, we focus on cases where only limited information, such as a concentration of raw materials or chemical formula, is available for ML. We propose a novel descriptor set named “elemental fingerprints set” for such a situation.
The elemental fingerprints set is made from the frequency distribution of properties of the elements in the target material. This set has the advantage that no information on atomic arrangements, such as crystal structure and coordination numbers, is necessary for its construction. This feature fits well with the real situation of experiments where actual atomic arrangements are not known, and only the experimental conditions and target materials are controllable. To demonstrate the effectiveness of this elemental fingerprints set, we trained models to predict the standard formation energy, band gap energy, and volume per atom (or density) using data taken from the Open Quantum Materials Database (OQMD).2 For materials having more than one atomic structure in OQMD, we used the most stable crystals to establish a one-to-one correspondence between materials and their properties.
The models for ML were constructed using the neural network ensemble with adversarial training.3 This method enabled us to reduce the prediction error by more than 25% compared with a naive single neural network model. It is also worth noting that this ensemble method is suited for modern parallel computing, enabling fast training and prediction.
The elemental fingerprints set consistently outperforms the two descriptor sets suggested previously.4 The best performance in our study was obtained when all three descriptor sets, i.e. the previous two and ours, were used together. The standard formation energy is predicted with a mean absolute error of 31.7 meV/atom and a coefficient of determination of 0.981 on a test set of 25,315 compounds after training with 227,838 compounds.
[1] Y. Liu, T. Zhao, W. Ju, S. Shi, J. Materiomics 3, 159 (2017).
[2] S. Kirklin, J. E. Saal, B. Meredig, A. Thompson, J. W. Doak, M. Aykol, S. Rühl, C. Wolverton, npj Comput. Mater. 1, 15010 (2015).
[3] B. Lakshminarayanan, A. Pritzel, C. Blundell, Adv. Neural Inf. Process. Syst. 30, 7219 (2017).
[4] L. Ward, A. Agrawal, A. Choudhary, C. Wolverton, npj Comput. Mater. 2, 16028 (2016).
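To make the descriptor in the abstract above concrete, here is a minimal sketch of a composition-only fingerprint built as a frequency distribution of one elemental property. The abstract does not give the property list or binning, so the choice of Pauling electronegativity, the bin edges, the fraction weighting, and the function name `elemental_fingerprint` are all assumptions for illustration, not the authors' definition.

```python
import numpy as np

# Pauling electronegativities for a few elements (small illustrative table).
ELECTRONEGATIVITY = {
    "H": 2.20, "Li": 0.98, "O": 3.44, "Na": 0.93,
    "Cl": 3.16, "Ti": 1.54, "Fe": 1.83,
}

# Assumed binning: seven bins of width 0.5 spanning 0.5 to 4.0.
BIN_EDGES = np.linspace(0.5, 4.0, 8)

def elemental_fingerprint(composition, prop=ELECTRONEGATIVITY, bin_edges=BIN_EDGES):
    """Histogram of an elemental property, weighted by atomic fraction.

    `composition` maps element symbols to amounts, e.g. {"Ti": 1, "O": 2}.
    Only the chemical formula is needed; no structural information is used.
    """
    elements = list(composition)
    amounts = np.array([composition[e] for e in elements], dtype=float)
    values = np.array([prop[e] for e in elements])
    hist, _ = np.histogram(values, bins=bin_edges, weights=amounts / amounts.sum())
    return hist

tio2 = elemental_fingerprint({"Ti": 1, "O": 2})
nacl = elemental_fingerprint({"Na": 1, "Cl": 1})
print(tio2)
print(nacl)
```

A full descriptor set would presumably concatenate such histograms for several elemental properties; the resulting fixed-length vector can be fed to any regression model regardless of how many elements the material contains.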
5:00 PM - GI01.04.05
Machine Learning Accelerates the Characterization of Functional Materials
Massachusetts Institute of Technology1
During a typical R&D cycle of learning, approximately a quarter of the time is directly invested in diagnosing the root cause(s) of underperformance, and additional time is indirectly wasted due to faulty diagnoses resulting in less productive experimental cycles. In this presentation, I demonstrate a >10x acceleration of diagnosis, with improved accuracy, using classification, regression, and advanced statistics algorithms. I’ll dive deep into how classification and analysis of spectra, including X-ray diffraction data, can be performed within seconds using fully convolutional neural networks (FCNs) with a global average pooling layer, leading to an accuracy improvement of ~5% over conventional neural networks. We identify and distinguish between sources of error using class activation maps (CAMs). This allows us to visualize the output of our algorithm and observe which features in the spectra are used more heavily by the FCN to perform the analysis. Using t-SNE, we show that characterization constitutes a non-trivial classification problem, as different classes are often not linearly separable. In closing, I’ll highlight several other cases where neural networks and Bayesian inference have shed light on characterization, aiding scientists in more rapidly improving their materials.
5:00 PM - GI01.04.06
Optimization of Transparent Hole-Conducting Materials Via Machine Learning
Lingfei Wei1,2,Xiaojie Xu3,James Bullock4,5,Gurudayal Gurudayal1,Joel Ager1,5
Lawrence Berkeley National Laboratory1,Southeast University2,Lawrence Livermore National Laboratory3,The University of Melbourne4,University of California, Berkeley5
P-type transparent conducting thin films (p-TCMs) are essential components of optoelectronic devices including solar cells, ultraviolet detectors, displays, and flexible sensors.1 Cu-Zn-S (CZS) thin films prepared by chemical bath deposition (CBD) can have both high transparency in the visible range and excellent hole conductivity (>1000 S cm^-1).2 However, the interplay between the deposition parameters in the CBD process (metal and sulfur precursor concentrations, temperature, pH, complexing agents, etc.) creates a multi-dimensional parameter space such that optimization for a specific application can be time-consuming. Here, we show that fractional factorial design of experiment (DoE) combined with machine learning allows for efficient optimization of p-TCM performance. The approach is guided by a figure of merit (FOM) related to the film conductivity and transmission T in the desired spectral range for the application (FOM, ΦH = T^10/Rs), where Rs is the sheet resistance. A specific example will be shown with 4 CBD deposition factors, leading to 62 experiments including repetitions. The machine learning model is based on Support Vector Machine Regression (SVR) employing a radial basis function (RBF) as the kernel function.3 A 10-fold cross-validation scheme was used to mitigate overfitting. Predicted areas in the parameter space with maximal FOMs were selected for a second round of optimization (48 experiments). Performance of optimized films as hole contacts in solar cells and in UV photodiodes will be presented. The optimization approach shown here will be generally applicable to any materials synthesis process with multiple parameters.
1. Morales-Masis, M.; De Wolf, S.; Woods-Robinson, R.; Ager, J. W.; Ballif, C. Transparent Electrodes for Efficient Optoelectronics. Adv. Electron. Mater. 2017, 3, 1600529.
2. Xu, X.; Bullock, J.; Schelhas, L. T.; Stutz, E. Z.; Fonseca, J. J.; Hettick, M.; Pool, V. L.; Tai, K. F.; Toney, M. F.; Fang, X.; Javey, A.; Wong, L. H.; Ager, J. W. Chemical Bath Deposition of P-Type Transparent, Highly Conducting (CuS)x:(ZnS)1-x Nanocomposite Thin Films and Fabrication of Si Heterojunction Solar Cells. Nano Lett. 2016, 16, 1925–1932.
3. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
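The workflow described in the abstract above (figure of merit, RBF-kernel SVR, 10-fold cross-validation, selection of promising regions) can be sketched with scikit-learn, which the abstract cites. Everything numerical here is an assumption: the 62 rows are synthetic stand-ins for the four deposition factors, the response surface is invented, the model is fitted to log10 of the FOM for convenience, and the hyperparameters are untuned.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def haacke_fom(transmission, sheet_resistance):
    """Figure of merit from the abstract: Phi_H = T**10 / Rs."""
    return transmission ** 10 / sheet_resistance

# Synthetic stand-in for 62 experiments with 4 normalised deposition factors.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(62, 4))
T = 0.6 + 0.3 * np.exp(-((X[:, 0] - 0.5) ** 2) / 0.1)         # invented transmission
Rs = 50.0 + 400.0 * (X[:, 1] - 0.3) ** 2 + 20.0 * X[:, 2]    # invented sheet resistance
y = np.log10(haacke_fom(T, Rs))                               # model the log of the FOM

# RBF-kernel support vector regression on standardised inputs.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))

# 10-fold cross-validation to check for overfitting.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
print("mean CV MAE (log10 FOM):", -scores.mean())

# Fit on all data and rank untested conditions for a second round.
model.fit(X, y)
candidates = rng.uniform(0.0, 1.0, size=(2000, 4))
best = candidates[np.argmax(model.predict(candidates))]
print("suggested next condition:", best)
```

Standardising the inputs matters for an RBF kernel because a single length scale is shared across all factors; without it, a factor with a large numeric range would dominate the kernel distance.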
5:00 PM - GI01.04.07
Structural Evaluation of Ca1-xBixMnO3-δ Using Combination of Newly High-Throughput Data Collection Tool for Synchrotron Powder X-Ray Diffraction and Automatic Structural Refinement Software
Kenjiro Fujimoto1,Akihisa Aimi1,Yusuke Yamada1,Shingo Maruyama2
Tokyo University of Science1,Tohoku University2
High-throughput materials exploration requires handling several hundred samples in one day. In the conventional method for synchrotron powder X-ray diffraction (XRD), fine capillaries (0.2 mm in diameter) must be filled with well-ground powder, and sample filling alone takes more than 10 hours when 100 samples are measured in one day. In this study, we built a prototype tool for efficient, high-throughput synchrotron powder XRD measurements.
A library of well-ground perovskite-type Ca1-xBixMnO3-δ powders was set on a reaction plate (35 × 35 × 5 mm thick) with 36 reaction wells (diameter 4 mm), which was developed in our group for high-throughput materials exploration. The powders deposited in the evenly arranged wells were picked up with polyimide tape and mounted in the prototype data collection tool, developed as an alternative to fine capillaries. The tool was fabricated with 3D printers. The powder adhering to the polyimide tape was fed continuously into the X-ray irradiation position, like a cassette tape.
Ideal diffraction data were obtained by swinging the tape several degrees around the irradiation position while the powder fixed to the tape was irradiated with X-rays. The change in lattice constant with the amount of Bi substitution can be calculated in a short time with the automatic structural analysis program developed by our group. The results were confirmed to be roughly the same as those of conventional manual Rietveld analysis.
These XRD and XAFS experiments were conducted at the BL5S1 and BL5S2 of Aichi Synchrotron Radiation Center, Aichi Science & Technology Foundation, Aichi, Japan (Approval No.2017P0202).
5:00 PM - GI01.04.08
Distribution of Zr Atoms in Σ3(1-12)/ Ce1-xZrxO2 Grain Boundary Using Genetic Algorithm and Substitution Region Restriction Method
Yeong-Cheol Kim1,Ki-Yung Kim1,Young-Bok Kim1
Since most technical materials are polycrystalline, it is very important to study how grain boundaries affect the atomic distribution in crystalline materials. When Zr is added to the Σ3(1-12)/ CeO2 grain-boundary structure to make a CeO2-ZrO2 solid solution, the number of ways in which Zr can substitute on Ce sites becomes very large; in a grain-boundary structure composed of 48 Ce and 96 O atoms, the number of configurations for 9 Zr atoms is about 10^9.
Because added atoms usually segregate to the grain boundary, the Mizoguchi group restricted substitution to atomic sites around grain boundaries and obtained an optimum structure at high speed [1, 2]. When more than a certain amount of atoms is added, however, they may start to occupy the bulk region. We used lattice statics and a genetic algorithm to study the Zr distribution in the grain-boundary structure; Zr atoms gathered around the grain boundary. We then increased the structure size to reduce the effect of grain-boundary interaction and to include the bulk region in the structure. To reduce the computation time, some of the Zr atoms were first substituted on atomic sites near the grain boundary, and the remaining Zr atoms were then substituted on the remaining Ce sites of the whole structure, excluding the sites already substituted. This substitution region restriction method helps reduce the number of configurations and find optimum structures at high speed.
[1] S. Kiyohara and T. Mizoguchi, Physica B, 2018, 532, 9-14.
[2] S. Kikuchi, H. Oda, S. Kiyohara, and T. Mizoguchi, Physica B, 2018, 532, 24-28.
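The configuration count quoted in the abstract above is a binomial coefficient, and the benefit of restricting the substitution region can be estimated the same way. The first number below follows directly from the abstract (9 Zr on 48 Ce sites). The staged split is purely hypothetical: the number of near-boundary sites (16), the number of Zr placed there first (6), and the assumption that the stage-one result is fixed before stage two are illustrative choices, not values from the abstract.

```python
from math import comb

n_ce_sites = 48   # Ce sites in the grain-boundary cell (from the abstract)
n_zr = 9          # Zr atoms to place (from the abstract)

# Unrestricted search: choose any 9 of the 48 Ce sites.
unrestricted = comb(n_ce_sites, n_zr)
print(f"unrestricted configurations: {unrestricted:.2e}")

# Hypothetical two-stage restriction (illustrative numbers only).
near_gb_sites = 16   # assumed number of Ce sites close to the boundary
zr_stage_one = 6     # assumed number of Zr placed near the boundary first

# Stage 1: place some Zr only on near-boundary sites.
stage_one = comb(near_gb_sites, zr_stage_one)
# Stage 2: with stage 1 fixed, place the rest on any remaining Ce site.
stage_two = comb(n_ce_sites - zr_stage_one, n_zr - zr_stage_one)

staged_total = stage_one + stage_two
print(f"stage 1: {stage_one}, stage 2: {stage_two}, total: {staged_total}")
print(f"reduction factor: {unrestricted / staged_total:.1e}")
```

The unrestricted count is about 1.7 × 10^9, consistent with the order of magnitude stated in the abstract. In practice a genetic algorithm samples only a small part of either space, so these counts indicate search-space size rather than the number of energy evaluations.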
5:00 PM - GI01.04.09
Construction of Neural Network Potential to Investigate Interface Structures, Ion Migration Under Electric Fields and Phonon Properties
Koji Shimizu1,Takanori Moriya1,Masayoshi Ogura1,Wei Liu1,Wenwen Li2,Yasunobu Ando2,Emi Minamitani1,Satoshi Watanabe1,3
The University of Tokyo1,National Institute of Advanced Industrial Science and Technology2,National Institute for Materials Science3
Recently, the construction of interatomic potentials from first-principles calculation data using machine-learning techniques has been widely pursued because of their high reliability and low computational cost. Our group previously constructed interatomic potentials for amorphous-Li3PO4 using a neural network (NN), and showed that the calculated Li-ion conductivities agree well with experimental data. In the present study, we have tried to extend the application range of NN potentials (NNPs) in the following three directions: (1) metal/solid electrolyte interface structures (Au/Li3PO4, for the development of all-solid-state Li-ion batteries and novel memory devices), (2) ion dynamics under electric fields (Li migration in amorphous-Li3PO4), and (3) phonon properties (wurtzite GaN for power semiconductor devices).
(1) The construction of an NNP for Au/Li3PO4 is challenging because it is a four-element system that involves a large number of input structures and parameters. We have explored the possibility of accelerating the construction process with the following procedure: first, we constructed two separate NNPs optimized for Au and Li3PO4, respectively, and then constructed the NNP for the Au(111)/Li3PO4 system by adding the Au−Li, Au−P, and Au−O interactions to the two NNPs. We have found that the resulting NNP shows accuracy comparable to that of a conventionally constructed one, while requiring less computational time for NN potential optimization.
(2) The charge state is of critical importance for evaluating the change in atomic forces due to applied electric fields. By examining the forces acting on atoms in amorphous-Li3PO4 under electric fields using density functional theory (DFT) calculations, we have found a proportional relationship between the changes in atomic forces and the electric field, and a strong correlation between local atomic structures and Born effective charges. Based on these findings, we have constructed an NNP that can predict the Born effective charges. In the presentation, we will also show the molecular dynamics simulation results under electric fields.
(3) Since predicting phonon behavior requires higher-order derivatives of the energy than predicting forces does, the accuracy of NNPs constructed with the conventional procedure is often insufficient for phonon properties. Therefore, we have optimized the NNPs so as to reproduce the atomic forces obtained by DFT calculations. The phonon dispersion calculated using this NNP agrees well with DFT results. In the presentation, we will also discuss the thermal conductivities of wurtzite GaN obtained by non-equilibrium molecular dynamics simulations.
This work was supported by CREST, JST and JSPS KAKENHI, Japan.
[1] J. Behler et al., Phys. Rev. Lett. 98, 146401 (2007).
[2] W. Li et al., J. Chem. Phys. 147, 214106 (2017).
[3] I. Sugiyama et al., APL Mater. 5, 046105 (2017).
GI01.05: Accelerating Materials Research with Machine Learning I
Wednesday AM, April 24, 2019
PCC West, 100 Level, Room 102 C
8:00 AM - GI01.05.01
Inverse Design of Thermoelectric Materials—Results and the Case for a Database of Charge Scattering Times
Institute of Materials Research and Engineering1,Nanyang Technological University2
Deep learning algorithms such as neural networks have recently emerged with the potential to enable data-driven discovery of new material properties. Functional properties are especially difficult to predict as they depend not only upon ground state properties that are routinely calculated by first principles DFT, but also upon transport properties such as scattering times and energy dependence. We present the first demonstration of fully connected neural networks used not only to predict thermoelectric properties of the known database of materials, but also for inverse design and feature selection for both atomic and material descriptors. The limitation of this technique is that data on non-equilibrium descriptors such as scattering times do not exist; I will talk about our foray into creating and learning from such a dataset, the Singapore Materials database (singmat), enabled by high performance computing.
8:15 AM - GI01.05.02
Pursuing the Next-Generation of High-Efficiency Phosphors with Machine Learning
University of Houston1
The development of new phosphors that are necessary for the next generation of high efficiency LED lighting requires a unique approach for materials discovery. Researchers often rely on chemical substitution or serendipity to identify new materials; however, this inevitably leads to slow, incremental advances in technology development. Our work has recently created a new approach that uses computational chemistry and machine learning to identify new materials, guiding our experimental efforts. By predicting the vibrational properties and electronic structure of potential phosphor compounds, high-efficiency materials can be screened a priori, ensuring that only the best materials are experimentally explored. Following this methodology, our research has developed a number of materials ranging from borates to nitrides with high efficiency and thermal stability at elevated temperatures. Moreover, the complementary use of computation, machine learning, and synthesis provides a fundamental understanding of the composition, structure, and property relationship necessary for the continued advancement of optical materials.
8:30 AM - GI01.05.03
Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals
Chi Chen1,Weike Ye1,Yunxing Zuo1,Chen Zheng1,Shyue Ping Ong1
University of California, San Diego1
Fast, accurate and transferable surrogate models for property prediction have the potential to rapidly accelerate materials design and discovery. However, classical machine learning models typically depend on feature engineering, and their transferability is limited in vast chemical space. Graph networks are a new ML paradigm that supports both relational reasoning and combinatorial generalization. Graphs are a natural representation for a system of atoms and the bonds between them. In addition, graph networks employ graph-level attributes to include structure-independent states. Here, we develop, for the first time, MatErials Graph Network (MEGNet) for accurate property predictions in molecular and crystalline materials. We show that the MEGNet models outperform existing ML models in 11 out of 13 properties of the QM9 molecule data set. Furthermore, a single-task MEGNet model can accurately predict internal energy, enthalpy and free energy using temperature, pressure and entropy as graph-level inputs. Similarly, the MEGNet models trained on ~60,000 crystalline materials achieved significantly lower errors compared to the state-of-the-art models on formation energy, band gap and elastic moduli. Such MEGNet models are interpretable, and well-known chemical trends of elements can be extracted from the model-learnt elemental embeddings. Lastly, we demonstrate that transfer learning of elemental embeddings trained from a larger data set can accelerate the training of property models with smaller amounts of data, addressing one of the critical bottlenecks to application of machine learning in materials science.
8:45 AM - *GI01.05.04
Automated Machine Learning Applied to Diverse Materials Design Problems
Lawrence Berkeley National Laboratory1
There have been many recent machine learning efforts aimed at determining composition-property or structure-property relationships. Typically, each application and data set requires fitting its own model. In this presentation, I will describe a general algorithm called "automatminer" that automatically determines a machine learning model for composition-property or structure-property relationships given a data set. Automatminer generates materials science descriptors, performs feature selection, and conducts model and hyperparameter optimization all as a "black box" process. With automatminer, no user intervention is necessary to form a machine learning model. I will report automatminer's performance on a diverse array of materials data sets reported in the literature, showing performance that is comparable to or exceeds that of hand-tuned models for many different types of problems. In addition to its applications as a quick and easy way to generate machine learning models for materials, automatminer can serve as a consistent benchmark against which to evaluate the predictive power of new methods.
9:15 AM - *GI01.05.05
JARVIS-ML—Physics Inspired AI for Fast and Accurate Screening of Materials
Francesca Tavazza1,Kamal Choudhary1,Brian DeCost1
National Institute of Standards and Technology1
One of the main difficulties in applying AI to Materials Science is choosing effective descriptors for materials. In this work we developed a complete set of chemo-structural descriptors to significantly extend the applicability of machine learning (ML) in material screening for multicomponent systems. These new descriptors allow differentiating between structural prototypes, which is not possible using the commonly used chemical-only descriptors. We developed ML models for formation energies, bandgaps, static refractive indices, magnetic properties, modulus of elasticity, k-point integration grid and plane-wave cutoffs for 3D materials as well as exfoliation energies of two-dimensional (2D) layered materials. We used a gradient boosting decision tree (GBDT) algorithm, and the training data consisted of 24549 bulk and 616 monolayer materials taken from the JARVIS-DFT database. JARVIS-ML enables on-the-fly predictions using machine learning models trained on our JARVIS-DFT database.
GI01.06: Automation of Materials Research—Synthesis and Characterization
Wednesday AM, April 24, 2019
PCC West, 100 Level, Room 102 C
10:15 AM - *GI01.06.01
Active Learning Driven Mapping of Combinatorial Libraries of Functional Materials
University of Maryland1
Over the years, the challenges in high-throughput combinatorial experimentation have evolved from the synthesis of large numbers of disparate compounds, to the development of quantitatively accurate rapid characterization tools, to the analysis and digestion of the large amounts of data churned out by the methodology. To address the last challenge, we have been increasingly relying on machine learning techniques, including pattern recognition within diffraction data to construct phase diagrams and mining of experimental databases to look for trends in materials properties for future predictions. We have previously demonstrated on-the-fly analysis of synchrotron diffraction data, so that a rough picture of the structural phase diagram is attained immediately after all the measurements have been carried out. We are now developing techniques to let the algorithm dictate the sequence of experiments in order to maximize attainable knowledge, minimize experimental resources, and as a result further speed up the materials discovery procedure. In this active learning exercise, a Gaussian process is used to steer the mapping of combinatorial libraries. Examples of dynamic mapping of structural phase diagrams performed at a synchrotron beamline as well as with an in-house diffractometer with a variable temperature stage will be discussed. This work is performed in collaboration with A. Gilad Kusne, V. Stanev, A. Mehta, B. DeCost, J. Hattrick-Simpers, and Y. Liang, and it is funded by NIST and ONR.
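The active-learning loop mentioned in this abstract, in which a Gaussian process decides which library position to measure next, can be sketched as follows. This is a generic uncertainty-sampling toy, not the presenters' algorithm: the one-dimensional composition grid, the RBF kernel and its length scale, the noise level, and the `measure` function (a stand-in for a quantity derived from a diffraction pattern) are all assumptions.

```python
import numpy as np

def rbf(a, b, length=0.15):
    """Squared-exponential kernel between two 1-D arrays of positions."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(x_obs, y_obs, x_all, noise=1e-4):
    """Posterior mean and variance of a zero-mean, unit-variance GP."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_all, x_obs)
    mean = Ks @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.clip(var, 0.0, None)

def measure(x):
    """Hypothetical experiment: a sharp change near x = 0.4 mimics a phase boundary."""
    return np.tanh(20.0 * (x - 0.4))

grid = np.linspace(0.0, 1.0, 51)   # candidate compositions on the library
measured = [0, 50]                 # start from the two end members

for _ in range(8):
    x_obs = grid[measured]
    mean, var = gp_posterior(x_obs, measure(x_obs), grid)
    var[measured] = -1.0           # never repeat a measurement
    measured.append(int(np.argmax(var)))   # measure where the model is least certain

print("measurement order (grid indices):", measured)
```

Pure variance-based selection spreads measurements evenly; an acquisition rule that also weights the predicted gradient would concentrate points near the phase boundary, which is closer in spirit to phase-diagram mapping.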
10:45 AM - *GI01.06.02
Exploring the Materials Genome Through Nanomaterial Megalibraries
Northwestern University1
Throughout history, the materials we use and rely on have evolved, slowly becoming more and more complex. The progression from the stone tools used by early man to the polyelemental materials used today has been relatively slow due to the massive parameter space that materials encompass. Indeed, when one considers the 91 metal elements in the periodic table, and all possible combinations, a nearly infinite number of possible materials exist. This is particularly true at the nanoscale where small changes in size or shape, even at a fixed composition, can dramatically change a material’s properties. Computational methods, or data mining of the current materials genome, can narrow the parameter space to areas of interest for a specific reaction, but experimental methods for fabricating and analyzing these nanomaterials in a high throughput manner are still required, as they often exhibit properties different from their bulk-scale counterparts. In this presentation, an approach to combinatorial nanoscience relying on “megalibraries” consisting of as many as 5 billion positionally encoded nanoparticles will be described. The libraries can be tailored to encompass a wide variety of alloy and phase-separated nanoparticles that are comprised of as many as 8 different elements. Importantly, one megalibrary contains more new inorganic materials than scientists cumulatively have produced and characterized to date. From these libraries, important insight into how thermodynamic phases form in polyelemental nanoparticles has been obtained, and design rules for engineering heterostructures in a polyelemental nanoparticle have been established. Methods to use megalibraries to identify new materials and catalysts for important chemical transformations will be presented.
The resulting data sets created by this platform are enormous and require new methods of analyzing them in order to decipher the implications of polyelemental nanomaterials for a wide range of applications. Therefore, this novel approach lays the foundation for creating an inflection point in the pace at which we both explore the breadth and discover the capabilities of the materials genome.
11:15 AM - *GI01.06.03
Generating the Largest Experimental Materials Database and Initial Findings on the Science It Enables
Joint Center for Artificial Photosynthesis, California Institute of Technology1
In an era of rapid advancement of algorithms that extract knowledge from data, data and metadata management is more important than ever. The largest and most well annotated solid state materials databases are based on computational results, which have enabled successful implementations of machine learning in materials research. Experimental materials databases for specific data types, such as powder diffraction patterns, have been developed, but the challenges in managing and linking data across disparate synthesis and characterization experiments have hampered establishment of large databases with heterogeneous data. The recently published HTEM (Zakutayev 2018) represents a notable advancement in this area with its well-organized and searchable database of primarily optoelectronic and structural properties of thin film materials. We present a complementary effort with measurements based on photoelectrochemistry research using methods including XRD, XRF, Raman, UV-vis, and electrochemistry. By developing a lightweight data management framework that is generally applicable for experimental science and beyond, we have compiled 5 years of experiments to produce the Materials Experiment and Analysis Database (MEAD). MEAD contains raw data and metadata from ~15 million experiments on ~1 million materials, as well as the analysis and distillation of that data into property and performance metrics via software in an accompanying open source repository. The unprecedented quantity and diversity of experimental data is searchable by experiment and analysis attributes generated by both researchers and data processing software. The search web interface allows users to visualize their search results and download zipped packages of data with full annotations of their lineage.
As the world’s largest open source materials database, MEAD provides substantial challenges and opportunities for incorporating data science in the physical sciences, and the associated data and algorithm management framework will foster increased incorporation of automation and autonomous discovery in materials and chemistry research.
11:30 AM - GI01.06.04
Reversible Perovskite Electrocatalysts for Oxygen Reduction / Oxygen Evolution for Fuel Cells and Metal-Air Batteries
Brian Hayden1,Kieren Bradley2,Kyriakos Giagloglou2,Hugo Jungius2,Chris Vian2
University of Southampton1,Ilika Technologies2
The identification of electrocatalysts mediating both the oxygen reduction reaction (ORR) and oxygen evolution reaction (OER) is a prerequisite for the development of reversible fuel cells and rechargeable metal-air batteries. The question remains as to whether a bifunctional catalyst, or a single catalyst site, will exhibit potentials converging to the equilibrium potential of +1.23 VRHE. Transition metal-based perovskites provide tunable catalysts where site substitution can influence both the ORR and OER catalytic activity. However, substitution in the pseudo-binary phases results in an anti-correlation in ORR and OER activities. We reveal that for LaxMnyNi1-yO3-δ, compositions with lanthanum A-site sub-stoichiometry exhibit reversible activity correlating with the appearance of the Mn3+/Mn4+ redox couple. The Mn3+/Mn4+ couple is associated with Mn4+ co-existing with Mn3+ in the bulk, as La3+ is substituted by Ni2+ at the A-site to create a mixed valent system. We also show that a direct A-site substitution by the Ca2+ cation in LaxCa1-xMnyO3-δ perovskites also results in the creation of Mn4+, the appearance of the Mn3+/Mn4+ redox couple, and a concomitant reversible activity. These results would only have been accessible with an effective combinatorial synthetic and comprehensive high throughput screening strategy, and highlight a general strategy of optimizing oxide electrocatalysts with reversible activity.
GI01.07: Accelerating Materials Research with Machine Learning II
Wednesday PM, April 24, 2019
PCC West, 100 Level, Room 102 C
1:30 PM - *GI01.07.01
Predicting Properties is not Enough—Realizing the Full Potential of Machine Learning in Materials Discovery
Citrine Informatics1
Much attention in the materials informatics community has centered on developing machine learning (ML) models to predict materials properties, based on training data derived from experiments and simulations. While such models can be helpful tools, they are (on their own) not sufficient to enable real-world materials discovery. Furthermore, the ongoing focus on property prediction creates the risk that we miss out on the value ML can provide across other critical aspects of the materials discovery process. In this talk, we discuss ML considerations beyond property prediction for materials discovery, including ML model trustworthiness and applicability, automated data analysis, and substitution of simulations or quick experiments for time-consuming, expensive characterization.
2:00 PM - GI01.07.02
Design of Molecules with High Hole Mobility by Applying Machine-Learning Technologies
Nobuyuki Matsuzawa1,Hideyuki Arai1,Masaru Sasago1,Eiji Fujii1,Erin Antono2,James Saal2
Panasonic Corporation1,Citrine Informatics2
Materials exhibiting higher mobilities than conventional organic semiconducting materials, such as fullerenes and fused thiophenes, are in high demand for applications such as printed electronics. For hole-conducting materials, derivatives of benzothieno[3,2-b]benzothiophene are known to exhibit the highest hole mobility [1], yet their carrier transfer performance is still not satisfactory. The FUELS sequential learning framework [2] was used to explore new molecules that might show improved mobility using a machine-learning (ML) guided approach. An initial ML model was trained on a set of DFT-calculated hole mobilities of molecules in the amorphous state, based on the percolation treatment as derived by Evans et al. [3] In the sequential learning process, the ML model is used to identify promising candidate molecules, the hole mobilities of the candidate molecules are then calculated with DFT, and the new data are added to the training set to improve the ML model. This iterative loop was repeated several tens of times. Through this process, over 1 million candidate structures were evaluated, and about 100 DFT mobility calculations were performed. New candidate molecules having a fused thiophene structure were identified whose DFT-calculated mobility exceeded the maximum DFT mobility in the training data set by 25%. Further details will be presented on the extrapolative design of molecules with improved carrier transport properties.
[1] K. Takimiya et al., Acc. Chem. Res. 47 (2014) 1493.
[2] J. Ling et al., Integrating Materials and Manufacturing Innovation 6 (2017) 207.
[3] D.R. Evans et al., Org. Electronics 26 (2016) 50.
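The sequential learning loop described in this abstract can be sketched in a few lines. This is a minimal illustration and not the FUELS framework: the one-dimensional candidate pool, the `expensive_calc` stand-in for a DFT mobility calculation, and the nearest-neighbor surrogate with a distance-based exploration bonus are all hypothetical choices made only to show the select, calculate, retrain cycle.

```python
import random

def expensive_calc(x):
    """Hypothetical stand-in for a costly DFT mobility calculation."""
    return -(x - 0.7) ** 2  # toy objective with its maximum at x = 0.7

def predict(x, train):
    """Toy surrogate (an assumption, not the authors' model): value of the
    nearest training point plus a bonus that grows with distance from it."""
    nearest_x, nearest_y = min(train, key=lambda p: abs(p[0] - x))
    return nearest_y + abs(nearest_x - x)

def sequential_learning(candidates, n_init=3, n_iter=10, seed=0):
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)
    # Initial training set from a few expensive calculations.
    train = [(x, expensive_calc(x)) for x in pool[:n_init]]
    pool = pool[n_init:]
    for _ in range(n_iter):
        # 1. Use the surrogate to pick the most promising candidate.
        best = max(pool, key=lambda x: predict(x, train))
        pool.remove(best)
        # 2. Run the expensive calculation on that candidate only, and
        # 3. add the result to the training set used in the next round.
        train.append((best, expensive_calc(best)))
    return train

candidates = [i / 100 for i in range(101)]
history = sequential_learning(candidates)
best_x, best_y = max(history, key=lambda p: p[1])
print(f"evaluated {len(history)} of {len(candidates)} candidates; best x = {best_x}")
```

The point of the loop is the ratio the abstract reports: the surrogate scores every candidate cheaply, while the expensive calculation is run only on the few that it selects.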
2:15 PM - GI01.07.03
Machine Learning Electronic Transport Properties of Complex Semiconductor Architectures
Sanghamitra Neogi1,Artem Pimachev1
University of Colorado1
Modern semiconductor architectures enabling electronic, optical, sensing, robotic, bio-system or energy transport devices involve the aggressive insertion of computing components to perform a multitude of data-centric operations at high rates. Advances in fabrication techniques have reached the length scales of the quantum confinement regime, making it possible to model the electronic properties of device components with first-principles calculations. The contact interfaces between these components dictate the device performance, especially as the device dimensions approach the nanoscale. These interfaces are often marked by point defects, dislocations and additional strains due to lattice mismatch between the components. Ab initio methods become expensive and infeasible for predicting the electronic properties of integrated architectures with such a large number of compositional and configurational degrees of freedom. In recent years, there has been a large effort in the materials science community to employ data-driven methods to accelerate materials discovery or to develop new understanding of materials behavior. However, efforts that employ first-principles-based data-driven methods to predict device performance while incorporating processing variability are almost non-existent.
In this study, we employ machine learning (ML) algorithms to predict the electronic structure and transport properties of non-ideally fabricated multilayered thin-film Si/Ge nanostructures. The ML model is trained on ~200 inexpensive DFT calculations of SixGe1-x substitutional alloys: the training data set is populated by exploiting the relationship between geometrical features or local atomic environments in these systems and their electronic structure properties. The predictor variables are obtained with a Voronoi tessellation approach and the response variables are calculated with the decision tree regression algorithm [1]. This approach has successfully predicted formation energies to expedite materials discovery [2]. Our ML model, trained on random alloys, has shown a remarkable ability to predict the electronic band structures and Onsager electronic transport coefficients of large non-ideal thin-film Si/Ge superlattices. We show the predictive power of our model by comparing the predicted band structures learned from small 16-atom alloy unit cells with the electronic states of large Si/Ge superlattices unfolded to 4x4 monolayer superlattice Brillouin zones [3]. The ML framework has been especially effective in capturing crucial trends in electronic properties for a range of multilayered structures. Our ML framework will facilitate the development of an inverse design approach to engineer the interface profiles of integrated semiconductor architectures and achieve the desired device performance and functionalities. The project is funded by the DARPA (DSO) MATRIX program. This work used XSEDE, which is supported by NSF grant number ACI-1053575.
[1] L. Breiman, Mach. Learn. 45, 5 (2001).
[2] L. Ward, R. Liu, A. Krishna, V. I. Hegde, A. Agrawal, A. Choudhary, and C. Wolverton, Phys. Rev. B 96, 024104 (2017).
[3] V. Popescu and A. Zunger, Phys. Rev. B 85, 085201 (2012).
GI01.08: Integrated Materials Research with Data-Driven Methods and Machine-Learning
Wednesday PM, April 24, 2019
PCC West, 100 Level, Room 102 C
3:30 PM - *GI01.08.01
Accelerating Materials Design and Discovery by Combining High-Throughput Computations, Experiments and Machine Learning
Toyota Research Institute1
Rapid discovery of functional materials is essential for efficient realization of several technologies. High-throughput computations and experiments have been successful in exploring select material spaces for specific properties. The relatively large-scale materials datasets produced by these methods can potentially be used as input to machine learning or artificial intelligence methods to further accelerate materials discovery for wider material spaces and properties. In this talk, I shall discuss several examples of combining high-throughput materials databases with machine learning to accelerate materials discovery.
In one example, we demonstrate prediction of materials synthesis from the dynamics of the “materials stability network” constructed by combining the convex free-energy surface of inorganic materials computed by high-throughput density functional theory with their experimental discovery timelines extracted from citations. Here, machine learning methods are used to capture the time evolution of the underlying network properties and predict the likelihood that hypothetical, computer-generated materials will be amenable to successful experimental synthesis.
In another example, we demonstrate prediction of a wide range of material properties by applying deep learning methods that learn structure-property relationships from an open quantum materials database. Specifically, a novel neural network architecture that captures crystal structure information and is capable of predicting properties for each element in the structure will be discussed. Further, the ability to predict properties of materials from a new database via transfer learning will be demonstrated.
Finally, we demonstrate accelerated discovery of catalyst materials by combining active learning methods with high-throughput electrochemistry experiments. Here, active learning is a cyclic loop of training, prediction, selection, and acquisition that allows dynamic identification of the next best experiment. We demonstrate the role of selecting appropriate machine learning algorithms for training and prediction in accelerating the rate of identifying optimal materials. Specifically, we demonstrate the ability to identify highly active catalysts within multi-metal oxide systems an order of magnitude faster than with high-throughput experimentation alone, further accelerating the rate of materials discovery.
4:00 PM - *GI01.08.02
D3BATT—Data-Driven Design of Li-Ion Batteries
Peter Attia1,William Chueh1
Stanford University1
Lithium-ion batteries are complex electrochemical devices that span multiple time and length scales. Such complexity presents challenges when engineering batteries. On a scientific level, establishing a physical picture that spans atomic, particle and device scales requires integration of many types of data, physics/chemistry, and equations. On an engineering level, given the degrees of freedom, design of experiments can involve a daunting number of permutations. The Center for Data-Driven Design of Li-Ion Batteries (D3BATT) at MIT, Stanford and Purdue aims to address these challenges by integrating a wide range of experiments, modeling, and data analytic approaches. In this talk, I will highlight the example of optimizing extreme fast charging, that is, charging batteries in under ten minutes. We combined machine learning and optimal experimental design methods to rapidly identify an effective extreme fast charging method on a time scale that is otherwise not possible.
4:30 PM - GI01.08.03
Semantic Segmentation of X-Ray Tomography and Serial Sectioning Images Using Convolutional Neural Networks
Tiberiu Stan1,Zachary Thompson1,Peter Voorhees1
Northwestern University1
Upon solidification, in nearly all cases from castings to additive manufacturing, metallic alloys freeze via the formation of dendrites. The tree-like microstructures are highly diverse and have a strong impact on the physical, chemical, and mechanical properties of the subsequent macroscale material. Machine learning was used to identify dendrites in two fundamentally different materials datasets: synchrotron-based time-resolved x-ray tomography images of dendrite growth in Al-Zn alloys, and serial sectioning images of dendrites in Pb-Sn alloys. The tomography and sectioning datasets have unique artifacts that hinder conventional segmentation techniques such as Otsu’s method or Canny edge detection. Thus, SegNet-based convolutional neural networks (NNs) were trained to perform semantic segmentation. Using only 30 2D tomography images and 6 serial section images for training, we show that NNs perform better if the training images are split into smaller sections. The NNs trained on tomography data achieved >99% pixel-wise global accuracy and >95% BF1 class boundary accuracy. The NNs were also tested on diverse datasets including Al-Cu solid-liquid mixtures from 4D tomography datasets, Sn-Pb sectioning data with low dendrite volume fractions, and x-ray tomographic images of feline spinal cords. Ways to increase NN performance using limited training data, general best-practice NN training methods, and NN transferability are discussed.
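Two ideas in this abstract are easy to make concrete: splitting a training image into smaller sections, and pixel-wise global accuracy. The sketch below assumes images stored as nested lists and a non-overlapping tiling. The function names, the tile size, and the toy mask are illustrative assumptions, since the abstract does not say how the authors split their images.

```python
def split_into_tiles(image, tile):
    """Split a 2D image (list of rows) into non-overlapping tile x tile
    sections, discarding partial tiles at the right and bottom edges."""
    rows, cols = len(image), len(image[0])
    tiles = []
    for r in range(0, rows - tile + 1, tile):
        for c in range(0, cols - tile + 1, tile):
            tiles.append([row[c:c + tile] for row in image[r:r + tile]])
    return tiles

def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the ground truth."""
    total = sum(len(row) for row in truth)
    correct = sum(p == t
                  for pred_row, truth_row in zip(pred, truth)
                  for p, t in zip(pred_row, truth_row))
    return correct / total

# Toy 4x4 label mask: 1 = dendrite, 0 = background.
truth = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 1, 1, 1],
         [0, 0, 0, 1]]
pred = [row[:] for row in truth]
pred[2][1] = 0  # introduce one misclassified pixel

tiles = split_into_tiles(truth, 2)
print(len(tiles))                   # 4 tiles of 2x2 pixels
print(pixel_accuracy(pred, truth))  # 15 of 16 pixels correct
```

Tiling turns each annotated image into several training samples, which is one plausible reason smaller sections help when only a few dozen labeled images are available.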
4:45 PM - GI01.08.04
Segmentation in 3D Atom Probe Tomography Using Deep Learning-Based Edge Detection
Sandeep Madireddy1,Ding-Wen Chung2,Troy Loeffler1,Olle Heinonen1,3,Prasanna Balaprakash1,David Seidman2
Argonne National Laboratory1,Northwestern University2,Argonne National Laboratory3
Atom probe tomography (APT) facilitates nano- and atomic-scale characterization and analysis of microstructural features. APT is well suited to study the interfacial properties of granular or heterophase systems. Traditionally, the identification of the interface between, e.g., precipitate and matrix phases, in APT data has been obtained by extracting iso-concentration surfaces. These surfaces are constructed based on the marching cubes algorithm, which extracts an iso-surface from a discrete scalar field with a user-supplied concentration value, or by manually perturbing the concentration value until the iso-surface qualitatively matches the interface. These approaches are rather subjective, not scalable, and may lead to inconsistencies due to local composition inhomogeneities.
We propose a digital image segmentation approach based on holistically-nested edge detection (HED), an end-to-end edge detection approach that performs image-to-image prediction (i.e., takes an image as input and outputs the prediction at each pixel). This is achieved by means of a deep learning model that leverages fully convolutional networks (FCN) and deeply supervised nets. A key challenge in using a deep learning approach for our task is the lack of a large amount of ground-truth segmentation data for training. We mitigate this by adopting a transfer-learning approach, in which the weights in the convolution layers are initialized using a VGGNet model pre-trained on ImageNet data, and the HED model is then trained on the Berkeley Segmentation Dataset and Benchmark (BSDS500); both of these datasets consist of annotated natural images. The trained HED model is then used to automatically segment the data obtained from APT into different phases. Thus, this approach not only provides an efficient way to segment the data and extract interfacial properties, but also does so without the need for expensive interface labeling for training the segmentation model.
The APT data are prepared for segmentation by converting them from a 3D point cloud to a regular voxel grid composed of relative atomic concentrations of species. The trained HED model is used for interface detection on this 3D concentration space by extracting 2D slices of it in each of three orthogonal directions, detecting the 2D edges on them, and then merging all of them to obtain an edge map in 3D. The obtained edge map serves as the interfacial surface between the two phases.
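The data flow in this paragraph (point cloud to voxel grid, 2D slices in three orthogonal directions, merged 3D edge map) can be sketched as follows. This is only an illustration under stated assumptions: a simple neighbor-difference threshold stands in for the trained HED model, the per-slice edges are merged with a logical OR, and the grid size, threshold, and function names are all hypothetical.

```python
def voxelize(points, n, size):
    """Convert a point cloud [(x, y, z, species), ...] in a cube of edge
    length `size` into an n x n x n grid of the fraction of species 'B'."""
    counts = [[[[0, 0] for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for x, y, z, species in points:
        i, j, k = (min(int(v / size * n), n - 1) for v in (x, y, z))
        counts[i][j][k][0] += 1               # all atoms in the voxel
        counts[i][j][k][1] += species == "B"  # atoms of species B
    return [[[b / t if t else 0.0 for t, b in row] for row in plane]
            for plane in counts]

def edges_2d(img, threshold):
    """Stand-in edge detector (NOT HED): mark a pixel when the jump to its
    right or lower neighbor exceeds the threshold."""
    n, m = len(img), len(img[0])
    out = [[False] * m for _ in range(n)]
    for a in range(n):
        for b in range(m):
            right = abs(img[a][b + 1] - img[a][b]) if b + 1 < m else 0.0
            down = abs(img[a + 1][b] - img[a][b]) if a + 1 < n else 0.0
            out[a][b] = max(right, down) > threshold
    return out

def edge_map_3d(grid, threshold=0.5):
    """Detect edges on every slice normal to x, y, and z, then merge the
    2D results into one 3D edge map (merged here with a logical OR)."""
    n = len(grid)
    edge = [[[False] * n for _ in range(n)] for _ in range(n)]
    for s in range(n):
        slice_x = [[grid[s][a][b] for b in range(n)] for a in range(n)]
        slice_y = [[grid[a][s][b] for b in range(n)] for a in range(n)]
        slice_z = [[grid[a][b][s] for b in range(n)] for a in range(n)]
        ex, ey, ez = (edges_2d(sl, threshold) for sl in (slice_x, slice_y, slice_z))
        for a in range(n):
            for b in range(n):
                edge[s][a][b] |= ex[a][b]
                edge[a][s][b] |= ey[a][b]
                edge[a][b][s] |= ez[a][b]
    return edge

# Toy layered system: one atom per voxel, species B for x < 2, A otherwise.
points = [(i + 0.5, j + 0.5, k + 0.5, "B" if i < 2 else "A")
          for i in range(4) for j in range(4) for k in range(4)]
grid = voxelize(points, n=4, size=4.0)
edge = edge_map_3d(grid)
n_edge = sum(v for plane in edge for row in plane for v in row)
print(n_edge)  # 16 voxels: the single layer at the B/A interface
```

In the toy layered case the detected edge voxels form one flat sheet at the composition step, which is the 3D analogue of the interfacial surface the abstract describes.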
We demonstrate the proposed segmentation approach using three material systems with inclusions of a precipitate phase in a matrix, each with different interface modality (layered, isolated, and interconnected, respectively), that correspond to different relative geometries of the precipitate and matrix phases. We demonstrate the accuracy of our segmentation approach through qualitative visualization of the interfaces as well as through quantitative comparisons with proximity histograms obtained using traditional approaches. We also note that the edge detection on multiple 2D slices and hence the 3D edge map extracted using the trained HED model for each of these cases is near real-time, taking only a few seconds.
Our approach demonstrates the power of machine learning techniques in the analysis of APT data. It should also be readily applicable to the analysis of other tomographic data of multi-phase or multi-grain systems, such as X-ray tomography of alloy systems or transmission electron microscopy tomography. By using transfer learning, the fully convolutional network can be trained in advance of experiments and applied in real time while they are being conducted. This may be especially valuable not only for 3D APT and transmission electron microscopy tomography but also for X-ray tomography, where rapid data analysis during the experiment may provide valuable real-time feedback.