Symposium Organizers
Kristofer Reyes, University at Buffalo
John J. Boeckl, Air Force Research Laboratory
Keith A. Brown, Boston University
Stephane Gorsse, Bordeaux INP · ENSCBP
GI01.01: Data Driven Design and Inference for Materials Systems
Session Chairs
John J. Boeckl
Cindy Bowers
Kristofer Reyes
Monday PM, November 26, 2018
Hynes, Level 1, Room 110
8:30 AM - *GI01.01.01
Informatics Driven Design of Chemically Complex Materials
Krishna Rajan1
State University of New York at Buffalo1
The aim of this presentation is to show how materials informatics can transform the existing paradigm of accelerated materials discovery and multiscale design. We discuss ways of approaching materials design that search for the best pathways to future discoveries rather than being limited to searching only among massive data libraries built on high-throughput computation and/or experiments. We show that these pathways help to bridge the gap between fundamental materials properties and structure and materials performance. This presentation will focus on how data science methods can discover new pathways for the chemical design of glasses/ceramics as well as uncover hidden information in the molecular-scale structure of amorphous solids.
9:00 AM - GI01.01.02
Predicting Colloidal Crystals from Shapes via Inverse Design and Machine Learning
Yina Geng1,Greg Anders1,Sharon Glotzer1
University of Michigan1
A fundamental challenge in materials design is linking building block attributes to crystal structure. Addressing this challenge is particularly difficult for systems that exhibit emergent order, such as entropy-stabilized colloidal crystals. We combine recently developed techniques in inverse design with machine learning to construct a model that correctly classifies the crystals of more than ten thousand polyhedral shapes into 13 different structures with a predictive accuracy of 96% using only two geometric shape measures. With three measures, 98% accuracy is achieved. We test our model on previously reported colloidal crystal structures for 71 symmetric polyhedra and obtain 92% accuracy. Our findings (1) demonstrate that entropic colloidal crystals are controlled by surprisingly few parameters, (2) provide a quantitative model to predict these crystals solely from the geometry of their building blocks, and (3) suggest a prediction paradigm that easily generalizes to other self-assembled materials.
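The kind of few-descriptor classification described in this abstract can be illustrated with a minimal sketch (not the authors' code): a scikit-learn random forest classifier trained on synthetic stand-ins for two geometric shape measures and the known structure labels.

# Minimal sketch: classify self-assembled crystal structure from two
# geometric shape descriptors. Synthetic data stand in for the real
# shape library; the descriptor names are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 10000
X = rng.uniform(size=(n, 2))                   # e.g. [sphericity, facet-count-like measure]
y = (3 * X[:, 0] + 2 * X[:, 1]).astype(int)    # synthetic stand-in for structure labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))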
9:15 AM - GI01.01.03
Machine Learning Augmented Polymer Design—Mapping the Size and Composition of Poly(Oxazolines) to Cloud Points
Jatin Kumar1,Karen Tang1,Qianxiao Li2,Anibal Gonzalez Oyarce2,Jun Ye2
Institute of Materials Research & Engineering1,Institute of High Performance Computing2
It has long been suggested that present synthetic techniques for polymers place an excessive strain on resources, and that efforts should be undertaken to devise new methods to predictively design polymers based on their desired properties.[1] One such property is phase behavior. Phase behavior is a unique characteristic of polymers that affords interesting assemblies and microstructures in polymer solutions, blends and films,[2] which are in turn influenced by the four key fundamentals of polymer architecture: (a) topology; (b) molecular weight; (c) composition; (d) functionality. Efforts were made by Hoogenboom and co-workers[3],[4] to systematically vary composition and molecular weight for poly(oxazoline)s so as to ascertain their relationship with cloud points, a temperature-dependent phase behavior. A 3D curve of best fit was obtained correlating two of the numerous design parameters to cloud point.
However, a 3D curve is an inaccurate correlation, as there is no model that suitably fits the data, resulting in a poor fit and a loss of detail. Most importantly, the 3D curve does not cover all the parameters and is incapable of identifying the feature importance of the design parameters. In this paper, we will demonstrate the use of modern machine learning techniques that correlate the various parameters describing polymer architecture to the associated cloud point. We will demonstrate how the algorithm was trained using polymers synthesized at varying levels of precision. Finally, we will demonstrate an attempt to build an inverse design model capable of designing a poly(oxazoline) architecture based on a required cloud point.
[1] Nature, 2016, 536, 266-268, doi:10.1038/536266a
[2] Nature Mat, 2010, 9, 101-113, doi:10.1038/nmat2614
[3] Chem Commun, 2008, 44, 5758-5760, doi:10.1039/B813140F
[4] Polym Chem, 2014, 52, 3118-3122, doi: 10.1002/pola.27364
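A regression of architecture descriptors onto cloud point, with feature importances of the kind the abstract above calls for, can be sketched as follows (synthetic data; the descriptor names are hypothetical placeholders, not the authors' actual parameter set).

# Minimal sketch: map polymer architecture descriptors to cloud point and
# rank feature importances. Data and column names are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
features = ["degree_of_polymerization", "mol_frac_hydrophobic_monomer",
            "end_group_logP", "dispersity"]
X = rng.uniform(size=(200, len(features)))
cloud_point = 40 + 30 * X[:, 1] - 10 * X[:, 0] + rng.normal(0, 1, 200)  # synthetic response (deg C)

model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, cloud_point)
for name, imp in sorted(zip(features, model.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:32s} importance = {imp:.2f}")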
9:30 AM - *GI01.01.04
Accelerating Additive Manufacturing via Synergy of Machine Design, Process Control and Toolpath Optimization
A. John Hart1
Massachusetts Institute of Technology1
The true potential of additive manufacturing (AM) as a next-generation production technology will only be realized by synergy of automated machinery with predictive models of process and material performance. I will discuss our recent efforts to improve the performance of extrusion AM technology, including the design of a desktop system that achieves ~10X greater build rate than commercial benchmarks, implementation of feed-forward motion control algorithms to mitigate trajectory errors, and development of graph-based toolpath planning algorithms enabling continuous printing of lattice structures. In-process thermal imaging is used to calibrate toolpath planning algorithms for large-scale extrusion AM, mitigating deformation and strength loss due to local thermal gradients during printing. I will close with commentary on the opportunities and challenges toward autonomous manufacturing systems wherein AM is a cornerstone, and machine learning is an essential, yet currently primitive, component.
10:30 AM - *GI01.01.05
Accelerating the Discovery of New Energy Materials
Brian Storey1
Toyota Research Institute1
Toyota Research Institute (TRI) was founded to develop technology for a new world of mobility, with an overall mission to use artificial intelligence (AI) to improve the quality of human life. In the coming decades, our society will continually need better and more environmentally sustainable sources of energy storage for mobility and transportation. At TRI, the Accelerated Materials Design and Discovery (AMDD) program aims to accelerate the discovery of energy materials through new tools, technologies, and methodologies that leverage advances in AI. The research program consists of efforts to 1) create extensive and dynamic databases of materials knowledge via high-throughput computing, high-throughput experimentation, and mining of existing literature data, 2) develop software tools to aid the human expert in extracting knowledge from large databases and 3) develop novel automated materials discovery systems by integrating simulation, machine learning, artificial intelligence, and robotics. The AMDD program comprises our research scientists, software engineers, and an extensive network of funded collaborators at a number of universities and research institutions. This talk will describe the vision of our research program, progress to date, challenges and early successes in applying AI to materials research for the discovery of catalysts and battery materials, and future opportunities at the intersection of materials and computer science.
11:00 AM - GI01.01.06
Damage Nucleation and Growth Prediction from Microstructure Spatial Characteristics
Benjamin Cameron1,C. Cem Tasan1
Massachusetts Institute of Technology1
The vast compositional space of metallic materials provides ample opportunity to design stronger, more ductile and cheaper alloys. However, the substantial complexity surrounding microstructural deformation makes simulating new microstructures exceedingly difficult. Instead, tedious experiments are conducted to optimize properties without tools to predict how microstructures will perform. Here, we exploit recent developments in data collection capabilities and develop a purely empirical model to forecast microstructural performance in advance, entirely bypassing the challenges associated with physics-based simulations.
In-situ experiments are conducted using a scanning electron microscope in order to obtain detailed microstructural information. A total of 6400 microstructural regions, each 12.5 μm square, are tracked using a quasi digital image correlation algorithm and imaged throughout deformation in order to obtain sufficient data to train the model. Detailed analysis is conducted on three experimental datasets, two of dual-phase steel and one of bearing steel. The models are trained to predict crack nucleation and crack growth, a complex process with an intricate relationship to microstructural geometry. Multiple machine learning approaches are compared and contrasted, including an n-point statistics and principal component analysis algorithm based on grain boundaries, a support vector machine approach, and a convolutional neural network.
Using this approach, significant predictive ability is achieved and the model can accurately predict crack nucleation and crack size on a new microstructure. For example, 84.8% of microstructures predicted to crack actually crack. Artificial microstructures with idealized properties are constructed and fed into the model in order to understand how it makes its predictions. For example, there is a major difference in cracking probability between grain boundaries aligned with the loading direction (12%) and those inclined at 45° to the loading direction (72%). These results demonstrate the success of this approach and pave the way for use of this framework in alloy design to predict a range of behaviors.
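The simplest version of such a crack-probability model, reduced to a single geometric feature, might look like the sketch below (synthetic data standing in for the SEM-derived dataset; the real work uses richer descriptors, support vector machines and convolutional networks).

# Minimal sketch: per-region crack probability from one microstructural
# feature (grain-boundary angle to the loading axis). Data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
angle = rng.uniform(0, 90, size=2000)                 # degrees
p_true = 0.12 + 0.60 * np.clip(angle, 0, 45) / 45     # toy trend: higher risk toward 45 degrees
cracked = rng.uniform(size=2000) < p_true

clf = LogisticRegression().fit(angle.reshape(-1, 1), cracked)
for a in (0, 45):
    print(f"predicted crack probability at {a} deg:",
          round(clf.predict_proba([[a]])[0, 1], 2))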
11:15 AM - GI01.01.07
Extraction of Capacitance Variance Mechanisms via a Neural-Net Accelerated Genetic Algorithm
Venkatesh Meenakshisundaram1,2,David Yoo1,2,Andrew Gillman1,2,Clare Mahoney1,2,James Deneault1,3,Nicholas Glavin1,Phil Buskohl1
Air Force Research Laboratory1,UES, Inc.2,UTC3
Microscale spatial and material heterogeneities in 3D printed electrical devices present significant challenges to predictable electrical performance and device reliability. High-permittivity particles are often added to dielectric inks to tailor the macro-level permittivity of printed dielectric substrates and coatings. However, the combined role of particle morphology, discrete spatial arrangement and material properties on variance is difficult to distinguish experimentally, due to the large parameter space of processing variables and electrical sensitivity to local heterogeneities. To address this challenge, we developed a 3D, high-fidelity finite element model of an interdigitated capacitor (IDC) with a dielectric coating of distinct spatial distributions of discrete dielectric particles. The volume fraction, particle size and permittivity distributions for each IDC were then encoded into the genome of a genetic algorithm (GA) with the objective of identifying combinations of these properties that generate high variance in the capacitance of the IDC. In addition, the genetic algorithm was guided by an artificial neural network (ANN) that recursively trains on the data produced by the GA and sends suggestions back to the GA to accelerate the identification of optimized properties. This autonomous and robust optimization approach allows us to build a database with a strong likelihood of capturing high-variance mechanisms, while saving on computational costs. Preliminary results indicate that large particles in dilute volume fractions generate high variability in capacitance, due to the heightened influence of these particles' positions on exclusion of the electric field. Classification-based machine learning techniques were also applied to the spatial distributions of the dielectric particles to determine whether a critical spacing parameter or local clustering was a component in driving up the variance. Collectively, the study provides a useful framework to correlate electrical performance with both macro- and microstructural variation sources, which is key to accelerating the development of 3D printing materials.
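A surrogate-assisted genetic algorithm of the kind described above can be sketched generically as follows. This is not the authors' code: the objective function, parameter names and GA operators are toy placeholders, with an MLP surrogate pre-screening candidates before the expensive evaluation.

# Minimal sketch of a neural-net-accelerated genetic algorithm: an MLP trained
# on previously evaluated genomes filters offspring before the expensive
# objective (a toy function standing in for the FEM capacitance-variance model).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)

def expensive_objective(x):                 # stand-in for the 3D FEM IDC simulation
    vol_frac, particle_size, permittivity = x
    return particle_size / (vol_frac + 0.05) + 0.1 * permittivity

pop = rng.uniform(0, 1, size=(20, 3))       # genomes: [vol_frac, size, permittivity]
scores = np.array([expensive_objective(x) for x in pop])

for generation in range(10):
    surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=3000,
                             random_state=0).fit(pop, scores)
    parents = pop[np.argsort(scores)[-10:]]                     # keep best genomes
    children = np.clip(parents + rng.normal(0, 0.1, parents.shape), 0, 1)  # mutate
    predicted = surrogate.predict(children)
    best_children = children[np.argsort(predicted)[-5:]]        # surrogate pre-screen
    new_scores = np.array([expensive_objective(x) for x in best_children])
    pop = np.vstack([pop, best_children])
    scores = np.concatenate([scores, new_scores])

print("best genome found:", pop[np.argmax(scores)], "score:", round(scores.max(), 2))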
11:30 AM - GI01.01.08
Machine Learned Defect Level Predictor for Semiconductors—The Example of Hybrid Perovskites
Arun Kumar Mannodi Kanakkithodi1,Michael Davis1,Maria Chan1
Argonne National Laboratory1
Electronic levels introduced by impurities and defects in the middle of the band gap are critically important in semiconductors for optoelectronic, photovoltaic (PV) and quantum sensing applications. While “deep” defect levels can prove catastrophic for PV performance by causing non-radiative carrier recombination[1], impurity levels in the band gap could also be entangled for quantum sensing or lead to increased absorption of sub-gap photons which can enhance efficiencies[2]. Predicting formation energies and charge transition levels for defects in semiconductors is thus paramount; density functional theory (DFT) calculations have been widely applied for such studies[3] to overcome experimental bottlenecks. However, the requirement of large supercells and inclusion of charged states make these computations very expensive, and trends and knowledge from previous calculations are not exploited in subsequent ones. In this work, we use DFT to generate a substantial computational dataset of the formation energies and transition levels of vacancy, interstitial and substitutional point defects in many lead-based pure and mixed halide hybrid perovskites (MAPbX3, where MA = methylammonium and X = Cl/Br/I)[4]. These computations help determine the dominant intrinsic defects in any perovskite, the equilibrium Fermi level (and thus the nature of conductivity) for different chemical potential conditions, and the stable extrinsic substituents (atoms selected by screening across the periodic table) that compensate for intrinsic defects and change the equilibrium Fermi level. Further, we apply machine learning to extract crucial design rules and predictive models from the data, via the intermediate process of numerically representing any defect in terms of easily calculated structural and electronic properties of defect atoms, their known compounds and their hypothetical 12-atom perovskite unit cells. Using coefficients of linear correlation and popular regression techniques like ridge regression and LASSO, we obtain accurate predictive models for formation energy and every relevant transition level that require only a handful of descriptors as input. For example, the 0/-1 transition level of any substitution defect can be predicted using only the bond length and band edge information from an inexpensive unit cell calculation, bringing down computational cost by nearly two orders of magnitude, while simultaneously providing physical insights for the origin of these defect levels. The machine learning results derived here can lead to accelerated prediction of defect states in lead-based hybrid perovskites and allow efficient materials design of defect-tolerant semiconductors as well as semiconductors with suitably placed defect levels.
REFERENCES
[1] Y. Yan et al., Springer IP. 79-105 (2016).
[2] M.D. Sampson et al., J. Mater. Chem. A. 5, 3578 (2017).
[3] Freysoldt et al., Rev. Mod. Phys. 86, 253 (2014).
[4] T. Shi et al., Appl. Phys. Lett. 106, 103902 (2015).
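The sparse linear modeling step named in the abstract above (ridge regression and LASSO on a few descriptors) can be sketched generically; the descriptor list and data here are hypothetical placeholders, not the DFT dataset of the study.

# Minimal sketch: ridge and LASSO models mapping simple descriptors to a
# defect charge transition level, plus the descriptors LASSO retains.
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
descriptors = ["bond_length", "electronegativity", "ionization_energy",
               "host_VBM", "host_CBM"]
X = rng.normal(size=(300, len(descriptors)))
level = 1.2 * X[:, 0] - 0.8 * X[:, 4] + rng.normal(0, 0.1, 300)   # synthetic 0/-1 level (eV)

for model in (Ridge(alpha=1.0), Lasso(alpha=0.05)):
    score = cross_val_score(model, X, level, cv=5, scoring="r2").mean()
    print(type(model).__name__, "mean CV R^2 =", round(score, 3))

lasso = Lasso(alpha=0.05).fit(X, level)
print("LASSO-selected descriptors:",
      [d for d, c in zip(descriptors, lasso.coef_) if abs(c) > 1e-3])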
11:45 AM - GI01.01.09
Interplay of Electronic and Structural Features in the Prediction of Organic Solar Cells Efficiency
Daniele Padula1,Jack Simpson1,Alessandro Troisi1
University of Liverpool1
In this contribution we present a machine learning based approach to predict organic solar cell efficiency.
We present a data set of small-molecule donor-acceptor pairs gathered from the literature (between 2013 and 2017), for which equilibrium geometries and electronic properties have been computed at the DFT level. We used the electronic data in combination with Scharber's model to calculate photovoltaic parameters of the organic solar cells. Comparison with experimental data reveals a disappointing performance of DFT. It has been shown that DFT data can be refined through machine learning to improve the ability of Scharber's model to predict experimental photovoltaic parameters.
In this contribution, we adopt a similar procedure to predict photovoltaic parameters from structural information only, expressed in terms of molecular fingerprints. This allows us to bypass a large number of DFT calculations while obtaining results similar to those reported in the literature based exclusively on electronic data.
We then use a similar approach to obtain direct predictions of solar cell efficiencies. We show that considering only electronic or only structural parameters again leads to similar results, while considering both at the same time yields improved predictions, allowing direct prediction of solar cell efficiency with a reasonable level of accuracy.
Finally, we critically assess the usefulness of the proposed approach for the discovery of new materials that have not been synthesized yet.
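A fingerprint-based prediction of the kind described above can be sketched as follows. This assumes RDKit is available; the SMILES strings and efficiency values are toy placeholders, and Morgan fingerprints stand in for whatever structural representation the authors actually used.

# Minimal sketch: predict a photovoltaic parameter from molecular fingerprints
# only, bypassing DFT. Molecules and target values are invented placeholders.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.linear_model import Ridge

smiles = ["c1ccccc1", "c1ccc2ccccc2c1", "CC(=O)Oc1ccccc1C(=O)O"]   # toy molecules
pce = np.array([2.1, 3.4, 1.2])                                     # toy efficiencies (%)

def fingerprint(smi, n_bits=1024):
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    return np.array(list(fp))

X = np.vstack([fingerprint(s) for s in smiles])
model = Ridge(alpha=1.0).fit(X, pce)
print("predicted efficiency for the first molecule:", round(model.predict(X[:1])[0], 2))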
GI01.02: High-Throughput Methods for Enumerating and Searching Combinatorial Spaces I
Session Chairs
John J. Boeckl
Cindy Bowers
Brian Storey
Monday PM, November 26, 2018
Hynes, Level 1, Room 110
1:30 PM - *GI01.02.01
Quantum Machine Learning in Chemical Space
Anatole von Lilienfeld1
University of Basel1
Many of the most relevant chemical properties of matter depend explicitly on atomistic and electronic details, rendering a first principles approach to chemistry mandatory. Alas, even when using high-performance computers, brute force high-throughput screening of compounds is beyond any capacity for all but the simplest systems and properties due to the combinatorial nature of chemical space, i.e. all compositional, constitutional, and conformational isomers. Consequently, efficient exploration algorithms need to exploit all implicit redundancies present in chemical space. I will discuss recently developed machine learning approaches for interpolating quantum mechanical observables in compositional and constitutional space. Results for our models indicate remarkable performance in terms of accuracy, speed, universality, and size scalability.
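Interpolating a quantum-mechanical observable across chemical space is commonly done with kernel ridge regression on a vectorized molecular representation; the sketch below illustrates that general idea only, using synthetic descriptor vectors rather than the speaker's actual representations or data.

# Minimal sketch: kernel ridge regression on molecular descriptor vectors
# (e.g. a sorted Coulomb-matrix-style representation) to interpolate a
# quantum-mechanical property. Data are synthetic placeholders.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 50))                                  # one descriptor vector per molecule
energy = X[:, :5].sum(axis=1) + 0.05 * rng.normal(size=500)     # toy atomization energy

X_tr, X_te, y_tr, y_te = train_test_split(X, energy, test_size=0.2, random_state=0)
model = KernelRidge(kernel="laplacian", alpha=1e-6, gamma=1e-2).fit(X_tr, y_tr)
mae = np.mean(np.abs(model.predict(X_te) - y_te))
print("test MAE:", round(mae, 3))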
2:00 PM - GI01.02.02
Landscape of Phosphorescent Light-Emitting Energies of Homoleptic Ir(III)-Complexes Predicted by a Graph-Based Enumeration and Deep Learning
Jiho Yoo1,Inkoo Kim1,Dongseon Lee1,Youngmin Nam1,Kyungdoc Kim1,Youngchun Kwon1,Yongsik Jung1,Hasup Lee1,Jhunmo Son1,Youn-Suk Choi1,Sunghan Kim1,Hyo Sug Lee1,Jaikwang Shin1
Samsung Advanced Institute of Technology (SAIT)1
Phosphorescent organic light-emitting diodes (OLEDs) have attracted attention due to their theoretical internal quantum efficiency of nearly 100%, compared with just 25% in fluorescent OLEDs. For the commercialization of phosphorescent OLEDs, the discovery of promising organometallic complexes as phosphorescent emitters is critical, not only to facilitate the intersystem crossing between singlet and triplet states but also to secure sufficient durability. Although many studies have been conducted to find potent emitters for red, green, and blue colors, they still depend on prior knowledge about specific coordinating ligand structures and their atomic composition with respect to the target photophysical properties, and the overall potential and limits of candidate materials in the feasible chemical space have not yet been mapped through a systematic investigation. To answer this question and to shed light on the energy landscape of the emitters, we have built a complete library of all possible core structures of homoleptic iridium(III) complexes, which are representative phosphorescent OLED dopants possessing a short triplet lifetime, good thermal stability, and high quantum yields, by devising a novel graph-based enumeration method. Whereas each atom is treated as a node and each bond as an edge in conventional graph-based molecular generation, here a ring structure is considered as a node and the fusion between rings as an edge, since the cores of cyclometallated complexes consist of polycyclic rings attached to the metal atom. By referring to mathematically established planar graphs, the entire set of polycyclic carbon ring structures was enumerated, and further substitution of the carbons with up to five nitrogen atoms generated about 10 million molecular structures consisting of 3-5 rings. To examine the distribution of light-emitting wavelengths, the T1 energy level of each entry was predicted with the help of a deep neural network (DNN) model. The iterative execution of first-principles excited-state calculations for a carefully selected set of molecules (5.6×10^4), followed by updates of the DNN model, allowed accurate and efficient prediction of the triplet energies with a mean absolute error of 0.061 eV, and we successfully identified subsets of candidate structures in terms of spectral ranges. For example, the number of molecules in the blue color regime accounted for only 1.04% of the total, compared with 22.4% in the red and green regimes. We expect that this database can serve as a complete baseline for the development of high-performance OLED emitters, and that the proposed graph-based enumeration will be a versatile tool for systematic exploration of the whole chemical space in the development of various organic materials.
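The ring-as-node enumeration idea can be illustrated in a few lines with networkx; this sketch only generates non-isomorphic connected planar graphs of 3-5 "ring" nodes and does not attempt the nitrogen substitution or DNN screening steps described above.

# Minimal sketch of ring-as-node enumeration: each node stands for a ring,
# each edge for a ring fusion; keep connected, planar, non-isomorphic graphs.
import itertools
import networkx as nx

cores = []
for n_rings in range(3, 6):
    possible_edges = list(itertools.combinations(range(n_rings), 2))
    for k in range(n_rings - 1, len(possible_edges) + 1):   # need >= n-1 edges to connect
        for edges in itertools.combinations(possible_edges, k):
            g = nx.Graph()
            g.add_nodes_from(range(n_rings))
            g.add_edges_from(edges)
            if not nx.is_connected(g):
                continue
            if not nx.check_planarity(g)[0]:
                continue
            if any(nx.is_isomorphic(g, h) for h in cores):
                continue
            cores.append(g)

print("non-isomorphic connected planar ring-fusion cores with 3-5 rings:", len(cores))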
2:15 PM - GI01.02.03
Predicting Hydrogen Storage in Half-a-Million MOFs via Machine Learning
Alauddin Ahmed1,Donald Siegel1
University of Michigan, Ann Arbor1
Metal-organic frameworks (MOFs) are promising solid-state adsorbents thanks to their high gravimetric capacities. However, realizing a high volumetric H2 adsorption capacity, balanced with a high gravimetric density, is one of the main barriers to the successful application of MOFs as solid-state adsorbents. A large database of half a million MOFs consisting of real and hypothetical compounds was compiled and screened using semi-empirical and atomistic (grand canonical Monte Carlo) techniques. Several machine learning (ML) algorithms were benchmarked for their ability to predict hydrogen storage in MOFs at multiple conditions. The top-performing algorithm, extremely randomized trees, was applied to rapidly identify MOFs with high usable H2 storage capacities across the entire database. A combinatorial approach was then used to understand the importance of crystallographic properties and training set size. This approach identifies the number and combination of crystallographic features needed to achieve the most accurate predictions. Finally, we assess multilinear regression models for their ability to outperform the well-known Chahine rule for predicting H2 uptake.
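The benchmarking step described above (tree ensembles versus a multilinear baseline on crystallographic features) can be sketched generically; the feature names and data below are hypothetical placeholders, not the compiled MOF database.

# Minimal sketch: compare extremely randomized trees with a multilinear model
# for predicting usable H2 capacity from crystallographic features.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
features = ["surface_area", "pore_volume", "void_fraction", "density", "pore_diameter"]
X = rng.uniform(size=(5000, len(features)))
uptake = 40 * X[:, 0] * X[:, 1] + 5 * X[:, 2] + rng.normal(0, 1, 5000)   # toy capacity

for model in (ExtraTreesRegressor(n_estimators=200, random_state=0), LinearRegression()):
    r2 = cross_val_score(model, X, uptake, cv=3, scoring="r2").mean()
    print(type(model).__name__, "CV R^2 =", round(r2, 3))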
2:30 PM - *GI01.02.04
Data-Driven Materials Design Using the Materials Project
Kristin Persson1,2
University of California, Berkeley1,Lawrence Berkeley National Laboratory2
Thanks to tremendous improvements in computational resources, coupled with software development over the last decades, real materials properties can now be calculated from quantum mechanics, much faster than they can be measured. One result of this paradigm change is databases like the Materials Project (www.materialsproject.org), which is harnessing the power of supercomputing together with state-of-the-art quantum mechanical theory to compute the properties of all known inorganic materials and beyond, design novel materials, and offer the data for free to the community together with online analysis and design algorithms. The software infrastructure carries out thousands of calculations per week, enabling screening, predictions, characterization and even synthesis suggestions, for both novel solid as well as molecular species with target properties. This growing body of data has finally reached the stage where automated learning algorithms can be effectively trained and utilized to accelerate analyses. To exemplify the approach of data-driven materials design, we will survey a few case studies, from prediction to synthesis and characterization, showcasing rapid iteration between ideas, computations, insight and new materials development.
3:30 PM - GI01.02.05
Functional Defects by Design—A High-Throughput Approach to Energy Materials Discovery
Panchapakesan Ganesh1,Janakiraman Balachandran1,Jilai Ding1,2,Xiahan Sang1,Wei Guo1,Shreyas Muralidharan1,Jonathan Anchell1,Gabriel Veith1,Craig Bridges1,Yongqiang Cheng1,Christopher Rouleau1,Jonathan Poplawsky1,Lianshan Lin1,Nazanin Bassiri-Gharb2,Raymond Unocic1
Oak Ridge National Laboratory1,Georgia Institute of Technology2
Defects and impurities introduce localized heterogeneities in solids and decisively control the behavior of a wide range of energy technologies. Fuel cell materials, especially proton-conducting fuel cells, are a quintessential example in this regard. Designing and developing solid oxide materials that can selectively transport protons will enable us to develop the next generation of proton-conducting solid oxide fuel cells. Protons require less activation energy compared to oxygen ions, which results in a lower operating temperature, higher operating efficiency and better material reliability. In this work [1,2,3,4] we focus on obtaining fundamental insights into how the properties of the host material structure, along with dopants, disorder and strain, influence proton transport in solid oxides by coupling high-throughput computations with functional imaging, neutron spectroscopy and transport measurements.
We initially focus on the perovskite family of compounds (such as doped BaZrO3). We benchmark our calculations against a wide range of experimental measurements such as kelvin probe force microscopy (KPFM), inelastic neutron scattering (INS) and atom probe tomography (APT). To obtain better insights into why certain cubic perovskite/dopant combinations are better at conducting protons than others, we developed a high-throughput framework to perform ab initio calculations. The high-throughput framework can scale massively to tens of thousands of nodes to fully exploit the computational capability of Titan at the OLCF supercomputing facility. We employ this approach to calculate proton transport properties in several cubic perovskite materials with different host atoms and dopants. The results obtained from these calculations enable us to gain better insights into how material structure, such as atomic properties (electronegativity, ionic radius) and lattice properties (sub-lattice distortion), influences proton transport. The results obtained from this high-throughput analysis are being employed to develop a machine learning framework to predict structure-property correlations for a larger set of perovskite materials. Finally, we explore the role of disorder in proton transport by studying, for example, fluorite-based lanthanum tungstate materials.
[1] “Defect Genome of Cubic Perovskites for Fuel Cell Applications”, Journal of Physical Chemistry C, 121, 26637 (2017)
[2] “The Influence of Local Distortions on Proton Mobility in Acceptor Doped Perovskites”, (Chemistry of Materials, accepted)
[3]“The Influence of the Local Structure on Proton Transport in a Solid Oxide Proton Conductor La0.8Ba1.2GaO3.9”, J. Mat. Chem. A, 5, 15507 (2017)
[4] “Influence of Non-Stoichiometry on Proton Conductivity in Thin Film Yttrium-doped Barium Zirconate”, ACS Appl. Mater. Interfaces, 2018, 10 (5), pp 4816–4823
3:45 PM - GI01.02.06
Polymer Genome—A Data-Powered Polymer Informatics Platform for Property Predictions
Chiho Kim1,Anand Chandrasekaran1,Huan Tran2,Deya Das1,Rampi Ramprasad1
Georgia Institute of Technology1,University of Connecticut2
The recent successes of the Materials Genome Initiative have opened up new opportunities for data-centric informatics approaches in several subfields of materials research, including polymer science and engineering. Polymers, being inexpensive and possessing a broad range of tunable properties, are widespread in many technological applications. The vast chemical and morphological complexity of polymers, though, gives rise to challenges in the rational discovery of new materials for specific applications. The nascent field of polymer informatics seeks to provide tools and pathways for accelerated property prediction (and materials design) via surrogate machine learning models built on reliable past data. We have carefully accumulated a dataset of organic polymers whose properties were obtained either computationally (bandgap, dielectric constant, refractive index and atomization energy) or experimentally (glass transition temperature, solubility parameter and density). A fingerprinting scheme that captures atomistic to morphological structural features was developed to numerically represent the polymers. Machine learning models were then trained by mapping the fingerprints (or features) to properties. Once developed, these models can rapidly predict properties of new polymers (within the same chemical class as the parent dataset) and can also provide uncertainties underlying the predictions. Since different properties depend on different length-scale features, the prediction models were built on an optimized set of features for each individual property. Furthermore, these models are incorporated in a user-friendly online platform named Polymer Genome (www.polymergenome.org). Systematic and progressive expansion of both the chemical and property spaces is planned to extend the applicability of Polymer Genome to a wide range of technological domains.
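A fingerprint-to-property surrogate that also reports prediction uncertainty, as described above, is commonly built with Gaussian process regression; the sketch below illustrates that pattern with synthetic fingerprints and a placeholder target (it is not the Polymer Genome model).

# Minimal sketch: Gaussian process surrogate returning a property prediction
# and its uncertainty for a fingerprinted polymer. Data are synthetic.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(7)
X = rng.uniform(size=(150, 10))                                   # polymer fingerprints
tg = 350 + 80 * X[:, 0] - 40 * X[:, 3] + rng.normal(0, 2, 150)    # toy glass transition (K)

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(length_scale=1.0),
                              alpha=1e-2, normalize_y=True).fit(X, tg)
x_new = rng.uniform(size=(1, 10))
mean, std = gp.predict(x_new, return_std=True)
print(f"predicted Tg = {mean[0]:.1f} K +/- {std[0]:.1f} K")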
4:00 PM - *GI01.02.07
High-Throughput Materials Discovery and Development—Breakthroughs and Challenges in the Mapping of the Materials Genome
Ilaria Siloi1,Marco Buongiorno Nardelli1
University of North Texas1
High-throughput quantum-mechanical computation of materials properties by ab initio methods has become the foundation of an effective approach to materials design, discovery and characterization. This data-driven approach to materials science currently presents the most promising path to the development of advanced technological materials that could solve or mitigate important social and economic challenges of the 21st century. In particular, the rapid proliferation of computational data on materials properties presents the possibility to complement and extend materials property databases where experimental data are lacking and difficult to obtain.
Enhanced repositories such as AFLOWLIB open novel opportunities for structure discovery and optimization, including the uncovering of unsuspected compounds, metastable structures and correlations between various properties. The practical realization of these opportunities depends almost exclusively on the design of efficient algorithms for electronic structure simulations of realistic material systems beyond the limitations of the current standard theories. In this talk, I will review recent progress in theoretical and computational tools for data generation and advanced characterization, and in particular, discuss the development and validation of novel functionals within density functional theory and of local basis representations for effective ab initio tight-binding schemes.
4:30 PM - GI01.02.08
Feature-Based Data Analysis for Localized Characterization of Electroactive Materials
Nikolay Borodinov1,Anton Ievlev1,Jan-Michael Carrillo1,Andrea Calamari2,Marc Mamak2,John Mulcahy2,Gabe Velarde3,Joshua Agar3,Lane Martin3,Bobby Sumpter1,Sergei Kalinin1,Olga Ovchinnikova1,Petro Maksymovych1
Oak Ridge National Laboratory1,Procter & Gamble2,University of California Berkeley3
Effective research and development efforts in the field of electroactive materials are critically dependent on characterization methods that can generate information about physical properties at the relevant scale. When the scope of these efforts includes engineering materials, fundamentally new challenges arise, as the intrinsic complexity of industrial-grade samples requires decoupling of the observed phenomena; the materials response in this case cannot be directly reflected by a single parameter. In order to tackle this challenge, we adopted a different strategy that includes an expanded representation of the analyzed materials with subsequent multivariate analysis. Its results were used to extract relevant features that may be intrinsic to a specific composition or may instead span across the sample library. We demonstrate this feature-centric approach for the rapid identification of a material's ability to develop triboelectric charge, observation of electric charge migration on industrial-grade polyethylene terephthalate (PET) samples, and property-driven analysis of effective piezoelectric coefficients observed across the PZT-PZO (lead zirconate titanate - lead zirconate) phase diagram. We have employed multivariate adaptive regression splines, principal component analysis, large-scale molecular dynamics simulations, finite element analysis, and a multivariate Naïve Bayes classifier as parts of our method. The resulting insights allow for the isolation of common features of the dataset reflecting specific piezoelectric and electrostatic phenomena, as well as identification of samples satisfying certain criteria. We believe that the feature-based analytics demonstrated in this work can be successfully applied in other materials science fields where the nanoscale behavior of the relevant materials needs to be understood.
References:
1) Nikolay Borodinov et al, Probing static discharge of polymer surfaces with nanoscale resolution, available at arxiv.org/abs/1806.05169
This research was carried out at the Center for Nanophase Materials Sciences, a US Department of Energy Office of Science User Facility. The scope of the work was under a CRADA between Procter & Gamble Co. and Oak Ridge National Laboratory. This research also used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
4:45 PM - GI01.02.09
The Search for P-Type Transparent Conducting Chalcogenides
Rachel Woods-Robinson1,Shyam Dwaraknath1,Kristin Persson1
Lawrence Berkeley National Laboratory1
Despite our increasing demand for renewable energy materials over the past decade, as well as recent advances in photovoltaics and transparent electronics, progress in the field of high-performing p-type transparent conductors (TCs) has been relatively slow. All TCs used in industry are still n-type, since state-of-the-art n-type TCs have figures of merit orders of magnitude higher than p-type analogues due to the localized 2p character of the valence bands and doping difficulties in wide-gap oxides. Both n-type and p-type TCs are conventionally oxides but, due to the energetics of the chalcogen p orbitals, chalcogenide (S, Se, Te) semiconductors have shown promise of higher hole transport and greater p-type doping propensity than oxides (in exchange for a decreased band gap).
Here, we use high-throughput computation to screen a large database of over 10,000 chalcogenide compounds for binary, ternary, and quaternary chalcogenides likely to have a high hole conductivity and high optical transparency. The first round of screening uses GGA and GGA+U functionals from the Materials Project database to calculate band gaps and hole effective masses, the latter derived using the BoltzTraP code. Refined screenings use additional transport and HSE calculations. Our ultimate goal is to synthesize promising compounds in the laboratory for use in stable devices, so we also apply a proxy for structural thermodynamic stability.
From these criteria, we discover a large set of computationally stable multi-anionic compounds. Several compounds studied previously as TCs emerge from our screening, including ZnS and sulvanites TaCu3X4 (X = S, Se, Te). We further pare down this list for synthesis by selecting only single anionic compounds, removing compounds with toxic and highly reactive elements, and estimating p-type dopability. A refined list of ten top experimentally-favorable candidates emerges and includes spinel ZnAl2S4, distorted rocksalt BaSnS2, and several other rocksalt structures.
We delve deeper into the bonding characteristics of the valence band that give each predicted candidate its low calculated effective mass, and discuss these computed structures and our screening metrics in the context of state-of-the-art p-type and n-type transparent conductors. Additionally, we discuss defect calculations and dopant selection for the most promising structures, present our initial attempts to combinatorially synthesize and characterize a few of the candidates, and lay out a roadmap for a future high-throughput screening, synthesis, characterization, and device paradigm for new p-type transparent conducting chalcogenides.
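The screening logic described in this abstract amounts to filtering candidates on computed properties; the sketch below shows that logic only, with illustrative thresholds and made-up entries rather than the study's actual cutoffs or database queries.

# Minimal sketch of the screening step: filter candidate chalcogenides by
# band gap, hole effective mass and energy above the convex hull.
candidates = [
    {"formula": "ZnS",     "band_gap_eV": 3.6, "m_hole": 0.8, "e_above_hull_eV": 0.00},
    {"formula": "TaCu3S4", "band_gap_eV": 2.7, "m_hole": 0.9, "e_above_hull_eV": 0.01},
    {"formula": "FeS2",    "band_gap_eV": 0.9, "m_hole": 1.5, "e_above_hull_eV": 0.00},
]

def passes(entry, min_gap=2.5, max_mass=1.0, max_hull=0.05):
    # transparency proxy, mobility proxy, and stability proxy, respectively
    return (entry["band_gap_eV"] >= min_gap
            and entry["m_hole"] <= max_mass
            and entry["e_above_hull_eV"] <= max_hull)

shortlist = [c["formula"] for c in candidates if passes(c)]
print("candidate p-type transparent conductors:", shortlist)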
GI01.03: Heterogeneous Domain and Prior Knowledge Representation, Extraction and Utilization
Session Chairs
Keith A. Brown
Aldair Gongora
Benji Maruyama
Tuesday AM, November 27, 2018
Hynes, Level 1, Room 110
8:30 AM - *GI01.03.01
The Role of Theoretical Prediction in Data-Driven Materials Development
Ruth Pachter1
Air Force Research Laboratory1
Development of two-dimensional materials, including graphene and 2D transition metal dichalcogenides (TMDs), has greatly advanced. For example, engineered defective structures are of interest, where graphene meshes were used for sensing, while defective monolayer TMDs exhibited single-photon emission, potentially useful for quantum information processing. Here, we first discuss the theoretical prediction of defect-induced Raman signatures in graphene by a combined first principles-tight binding approach, to be integrated within a data-driven experimental system for defect engineering. Our predictions of D and D’ Raman band intensities demonstrated that it is possible to distinguish between defect types, which also assists in characterization of realistic materials. Next, we note that to overcome limitations in photon-extraction efficiency and integration within photonic circuits when using bulk solid-state materials, the 2D geometry emerged as potentially useful as a single-photon source. In aiming to achieve localized excitons in monolayer WSe2 due to defects, we found theoretically that calculated defect excitons red-shift significantly for experimentally observed patterns. However, although this computational analysis will inspire experimental defect engineering, materials informatics could uncover further improvements. Yet, although development of databases of 2D materials beyond graphene is progressing, so far the focus is on discovery of materials with low exfoliation energies. We provide a perspective on database development and machine learning for 2D materials with defects.
9:00 AM - GI01.03.02
Rapid Screening of Potential Inorganic Scintillator Chemistries Using Physics-Informed Machine Learning
Ghanshyam Pilania1,Kenneth McClellan1,Chris Stanek1,Blas Uberuaga1
Los Alamos National Laboratory1
Applications of inorganic scintillators activated with lanthanide dopants, such as Ce, are found in diverse fields. As a strict requirement to exhibit scintillation, the 4f ground state (with the electronic configuration [Xe]4f^n5d^0) and 5d^1 lowest excited state (with the electronic configuration [Xe]4f^(n-1)5d^1) levels induced by the activator must lie within the host bandgap. This talk will discuss a new machine learning (ML) based screening strategy that relies on a high-throughput prediction of the lanthanide dopants' ground and excited state energy levels with respect to the host valence and conduction band edges for efficient chemical space exploration to discover novel inorganic scintillators [1]. Building upon well-known physics-based chemical trends for the host-dependent electron binding energies within the 4f and 5d^1 energy levels of lanthanide ions and available experimental data [2,3], the developed ML model can rapidly and reliably estimate the positions of the activator's energy levels relative to the valence and conduction band edges of any given host chemistry. Using a set of perovskite oxides and elpasolites (a class of double perovskite halides) as examples, it will be demonstrated that the developed approach is able to (i) capture systematic chemical trends across host chemistries and (ii) effectively screen promising compounds in a high-throughput manner. While a number of other application-specific performance requirements need to be considered for a viable scintillator, the present scheme can be a practically useful tool to systematically down-select the most promising candidate materials in a first line of screening for a subsequent in-depth investigation.
[1] G. Pilania, K. J. McClellan, C. R. Stanek, and B. P. Uberuaga, J. Chem. Phys. 148, 241729 (2018).
[2] P. Dorenbos, Phys. Rev. B 85, 165107 (2012).
[3] P. Dorenbos, J. Lumin. 151, 224 (2014).
9:30 AM - GI01.03.04
Machine Learning vs Physical Insight in the Discovery of Molecular Materials for Organic Photo-Voltaics
Alessandro Troisi1
University of Liverpool1
A number of proposals have been put forward to describe the properties required to improve the efficiency of organic solar cells. The amount of experimental data now available allows an evaluation of these proposals in a strict statistical sense, e.g. a good sampling of the space of the experiment can provide support for a given physical hypothesis. Alternatively, one can abandon any hope of physical insight and use a battery of machine learning approaches to try to correlate descriptors with photovoltaic efficiency. This lecture compares the two approaches and considers two specific problems: (i) identification of high-efficiency electron acceptors and (ii) identification of high-efficiency electron donors.
[1] Alina Kuzmich, Daniele Padula, Haibo Ma, Alessandro Troisi, Trends in the electronic and geometric structure of non-fullerene based acceptors for organic solar cells, Energy and Environmental Science, 2017,10, 395-401
[2] Harikrishna Sahu, Weining Rao, Alessandro Troisi,and Haibo Ma, Towards Predicting Efficiency of Organic Solar Cells via Machine Learning and Improved Descriptors, Advanced Energy Materials (in press 2018).
9:45 AM - GI01.03.05
Combining Artificial and Human Intelligence to Accelerate Photovoltaic-Materials Development
Tonio Buonassisi1
Massachusetts Institute of Technology1
The convergence of high-throughput computing, automation, and machine learning is poised to disrupt traditional fields including medical diagnosis, investment banking, legal discovery, and shipping. In this talk, I'll discuss the potential impacts these technologies may have on energy materials RD&D within the next ten years, using my background in photovoltaics (PV) as a testbed and emphasizing future applications in other domains.
Across many disciplines, it historically takes 15–25 years to bring a new material to market, in part because of the lengthy and inefficient feedback loop linking material, process, and characterization. We have successfully combined high-throughput computing, Bayesian inference, and a non-destructive test (a variant of traditional current-voltage measurements) for PV devices, to reduce the diagnosis time for novel devices by 10–100x, often with an increase in precision. We have recently begun expanding this methodology to other energy systems with collaborators, including thermoelectrics. Already, this diagnostic technique has enabled progress toward a closed process optimization feedback loop. In the future, in combination with machine learning and automation, it may be possible to reduce the timeline to novel materials development to 3–5 years, within the time horizon for value capture of most investors.
10:30 AM - *GI01.03.06
Knowledge Representation for Robot Scientists ‘Adam’ and ‘Eve’
Larisa Soldatova1
Goldsmiths, University of London1
Robotics and autonomous systems require formal knowledge representations of the environment they function in, the entities they operate with, and the processes they are involved with. I will discuss knowledge representations for the robot scientists 'Adam' and 'Eve'. These robotic systems are designed to run (semi-)autonomous biomedical experiments. Adam was the first system to autonomously discover new scientific knowledge, and Eve has identified potential drugs for treating neglected tropical diseases. The underpinning technology of robot scientists is transferable to other application domains. I will present several ontologies: the Eve ontology for the description of Eve experiments, the EXACT ontology for the description of experimental protocols, and others. I will also demonstrate the importance of such formal knowledge representations of complex systems using the example of AdaLab, a (semi-)autonomous system for knowledge discovery. AdaLab is based on the robot scientist Eve, and includes several distributed machine learning components for hypothesis generation, experiment planning, and refining systems models. The interaction between AdaLab components and the interpretation of experimental results rely on the underpinning knowledge representations. The AdaLab system has the potential to advance biomedical knowledge.
11:00 AM - GI01.03.07
Automated Extraction of Phase Diagram Features for Identifying Candidate Binary Alloy Systems for Metallic Glasses
Aparajita Dasgupta1,Connor Mack1,Bhargava Kota1,Ramachandran Subramanian1,Scott Broderick1,S. Setlur1,Venugopal Govindaraju1,Krishna Rajan1
University at Buffalo, The State University of New York1
The use of machine learning techniques to expedite the discovery and development of new materials is an essential step towards the acceleration of a new generation of domain-specific, highly functional material systems. In this paper, we use the test case of bulk metallic glasses to highlight the key issues in the field of high-throughput predictions and propose a new probabilistic analysis of rules for glass-forming ability using rough set theory. We demonstrate the use of automated machine learning methods that go far beyond text recognition approaches by also being able to interpret phase diagrams. When combined with structural descriptors, this approach provides the foundations to develop a hierarchical probabilistic prediction tool that can rank the feasibility of glass formation.
11:15 AM - GI01.03.08
Machine Learning of “Codified Synthesis Recipes”—Making Steps Toward Predictive Synthesis of New Materials
Olga Kononova1,Haoyan Huo1,Tanjin He1,Ziqin Rong2,Tiago Botari1,Vahe Tshitoyan2,Gerbrand Ceder1,2
University of California, Berkeley1,Lawrence Berkeley National Laboratory2
In the past decade, first-principles methods for high-throughput computational screening and design of new materials have proven to be effective and indispensable for various applications [Curtarolo et al. Nature, 2013; Curtarolo et al. Phys. Rev. Lett. 2003]. Moreover, the design and optimization of novel materials have been transformed by the emergence of genomic approaches, where materials properties of many tens of thousands of materials can be modeled, catalogued in searchable databases, and analyzed for trends.
However, knowing the structure and properties of a novel material is not enough for its successful production. Often, even when it is known what material to make, the obstacle of how to make it remains, resulting in a slow and tedious experimental process of trial and error due to the high complexity of materials synthesis procedures. Predicting the conditions under which a specific compound or crystal structure can be synthesized is therefore an unresolved and fundamental problem in materials science, which imposes significant constraints on the facile design and production of novel advanced materials. Yet no universal theory of synthesis exists that relates the synthesizability of materials to their observed and computable parameters.
In our work, we approach the predictive synthesis problem by trying to answer the question of how well synthesis can be learned from existing data. To address it, we developed a data mining pipeline that extracts information about the synthesis of inorganic compounds from available scientific publications using machine learning techniques. The most important steps of the pipeline include: i) collecting research articles available online, ii) extracting experimental sections and identifying paragraphs describing ceramics synthesis, iii) extracting so-called “codified recipes” of the synthesis procedures, and iv) accumulating all recipes in a database and mining them.
In this work, we focused specifically on solid-state synthesis of materials and used various natural language processing techniques [Collobert et al. J. Mach. Learn. Res. 2011] to a) find relevant synthesis paragraphs, b) resolve materials entities referring to starting compounds, environment and final products, and c) obtain synthesis operations and firing temperatures. As a preliminary result of our work, we extracted 17,500 papers describing solid-state synthesis procedures and obtained ~16,000 unique codified recipes. We analyzed these recipes using deep learning algorithms to relate materials features to their synthesis conditions. Our results present a first step toward a predictive synthesis theory and provide insights for further development.
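Step ii) of such a pipeline, identifying synthesis paragraphs, is a standard text-classification task; the sketch below illustrates it with TF-IDF features and logistic regression on a few invented snippets (the actual pipeline and training corpus are not shown in the abstract).

# Minimal sketch: classify whether a paragraph describes a solid-state
# synthesis. Snippets and labels are tiny invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

paragraphs = [
    "The powders were ball milled, pressed into pellets and calcined at 900 C for 12 h.",
    "Stoichiometric amounts of Li2CO3 and TiO2 were ground and sintered at 850 C.",
    "The optical absorption spectra were recorded on a UV-vis spectrophotometer.",
    "Electrochemical impedance was measured between 1 Hz and 1 MHz.",
]
labels = [1, 1, 0, 0]   # 1 = synthesis paragraph, 0 = other

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(paragraphs, labels)
print(clf.predict(["The precursors were mixed and annealed at 700 C in air."]))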
11:30 AM - GI01.03.09
Automated Extraction of Material Synthesis Information from Literatures via Machine Learning
Yixing Wang1,Wei Chen1,Linda Schadler2,L. Catherine Brinson3
Northwestern University1,Rensselaer Polytechnic Institute2,Duke University3
Polymer nanocomposites have great advantages in achieving improved performance at relatively low cost compared with traditional materials. By integrating materials science, informatics and information technology, efforts under the Materials Genome Initiative (MGI) have produced database infrastructures, material data schemas and ontologies, as well as material design tools, across different materials fields including polymer nanocomposites. Under the concept of the MGI, we have developed a nanocomposite data resource, NanoMine, which consists of an online material database, data-driven analysis tools and physics-based simulation tools for polymer nanocomposite data sharing, analysis and design. To appropriately capture the full suite of possible data for nanocomposites, a nanocomposite schema was designed and serves as the template for ingesting data from the scientific literature and in-house experiments. The data schema consists of six major sections: data source, materials, processing, characterization, property and microstructure. The current database population and curation mainly rely on manual data abstraction from the scientific literature by humans with expert knowledge, which is an expensive, labor-intensive and error-prone process.
In order to accelerate data curation, reduce possible errors and provide more insight into nanocomposite material synthesis, we take a step toward fully automated data extraction by applying recent machine learning and natural language processing methods and develop an end-to-end framework to extract material processing and synthesis information from full-length journal articles. Our method starts by building a paragraph classifier that selects the relevant paragraphs containing material processing information from the whole paper. Individual sentences from those relevant paragraphs are then further classified into different categories (e.g., material characteristics, experimental action, irrelevant information, etc.) using a hierarchical attention neural network based on their semantic meanings. The network is trained on a set of over one hundred human-annotated articles created by materials scientists while reading through the material synthesis sections. Lastly, the learning from the attention network is combined with the outputs of a grammar parser, and different heuristic rules are applied in order to extract the material processing actions and parameters. The extracted material processing steps and parameters are then postprocessed to create a machine-readable and compatible data structure (XML in our case) so that the data are ready to be populated into the database.
11:45 AM - GI01.03.10
Natural Language Processing for Materials Discovery
John Dagdelen1,2,Leigh Weston2,Vahe Tshitoyan2,Gerbrand Ceder1,2,Kristin Persson1,2,Anubhav Jain2
University of California, Berkeley1,Lawrence Berkeley National Laboratory2
The majority of all materials data is currently scattered across the text, tables and figures of millions of scientific publications. We present recently developed natural language processing and machine learning techniques to extract materials knowledge by textual analysis of the abstracts of several million journal articles. We describe our use of Word2Vec to map words in our corpus to vector representations, which we then use as inputs to named entity recognition (NER) classifiers to extract materials, structures, properties, applications, synthesis methods, and characterization techniques from the abstracts in our database. With this information, we have created new tools for materials literature review such as: searching within chemical systems, filtering articles by experiment/theory, summarizing the known attributes of a material, or finding similar materials to a target. Furthermore, we report how these techniques can be used not only to automatically summarize existing knowledge, but enable new ways of discovering novel materials such as thermoelectrics or ion-conductors by revealing previously undiscovered relationships between materials and their properties.
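The Word2Vec step named above can be illustrated with gensim (version 4 or later assumed); the three-sentence corpus here is a toy placeholder for the millions of abstracts the authors describe.

# Minimal sketch: train word embeddings on tokenized abstracts and query for
# terms related to a materials concept.
from gensim.models import Word2Vec

corpus = [
    ["thermoelectric", "materials", "with", "low", "thermal", "conductivity"],
    ["bi2te3", "is", "a", "classic", "thermoelectric", "material"],
    ["solid", "electrolytes", "enable", "fast", "lithium", "ion", "conduction"],
]
model = Word2Vec(corpus, vector_size=50, window=5, min_count=1, epochs=50, seed=0)
print(model.wv.most_similar("thermoelectric", topn=3))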
GI01.04: Autonomous Research I
Session Chairs
Aldair Gongora
Kristofer Reyes
Larisa Soldatova
Tuesday PM, November 27, 2018
Hynes, Level 1, Room 110
1:30 PM - *GI01.04.01
Automating Science Using Robot Scientists
Ross King1,2
University of Manchester1,The National Institute of Advanced Industrial Science and Technology2
A Robot Scientist is a physically implemented robotic system that applies techniques from artificial intelligence to execute cycles of automated scientific experimentation. A Robot Scientist can automatically execute cycles of hypothesis formation, selection of efficient experiments to discriminate between hypotheses, execution of experiments using laboratory automation equipment, and analysis of results. The motivation for developing Robot Scientists is to better understand the scientific method, and to make scientific research more efficient. The Robot Scientist ‘Adam’ was the first machine to autonomously discover scientific knowledge: it formed and experimentally confirmed novel hypotheses. Adam worked in the domain of yeast functional genomics. The Robot Scientist ‘Eve’ was originally developed to automate early-stage drug development, with specific application to neglected tropical diseases such as malaria, African sleeping sickness, etc. More recently we have adapted Eve to work on yeast systems biology, and cancer. We are also teaching Eve to autonomously extract information from the scientific literature.
2:00 PM - *GI01.04.02
Autonomous Experimentation Applied to Carbon Nanotube Synthesis
Benji Maruyama1,Pavel Nikolaev2,Daylond Hooper3,Fred Webber1,Kevin Decker2,Jason Poleski4,Michael Krein4,Richard Barto4,Ahmad Islam2,Rahul Rao2,Abigail Juhl1
Air Force Research Laboratory1,UES Inc.2,InfoScitex, Inc.3,Lockheed Martin Corp.4
We have developed a first-of-its-kind Autonomous Research System, ARES, capable of designing, executing, and analyzing its own experiments autonomously using artificial intelligence (AI) and machine learning (ML). The closed-loop, iterative method enables ARES to design new experiments dynamically, based on prior results, after each experiment, a first for materials research.
We are applying this method to understand and control the synthesis of single-wall carbon nanotubes, in this case optimizing growth rate in a seven-dimensional parameter space. We use automated in situ Raman spectroscopy characterization of growth rate during CVD synthesis of carbon nanotubes as the metric for the target objective used by our AI planner. We use a random forest learning approach to model experimental results, and a genetic algorithm planner to propose new experiments expected to achieve the targeted growth rate.
We expect ARES to be a disruptive advance in the near future, combining advances in robotics, AI, data sciences and operando methods to enable us to attack high dimensional research problems that were previously intractable with current research processes. We are applying the ARES method to multiple problems, including Additive Manufacturing and defect engineering in graphene. Human-robot research teams have the potential to redefine the research process and lead to a Moore’s Law for the speed of research.
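A minimal sketch of the closed-loop idea described above (a surrogate model coupled to a genetic planner), not the actual ARES software: the growth-rate function below is a made-up stand-in for running a real reactor experiment.

```python
# Illustrative sketch: a random forest is fit to completed "experiments" and a
# simple genetic algorithm proposes the next growth condition to try.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
DIM = 7  # e.g. temperature, pressure, gas flows, ... (seven process parameters)

def run_experiment(x):
    """Hypothetical growth-rate response; a real loop would run the reactor here."""
    return -np.sum((x - 0.6) ** 2) + 0.01 * rng.normal()

X = rng.uniform(0, 1, size=(10, DIM))            # initial experiments
y = np.array([run_experiment(x) for x in X])

for iteration in range(20):
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

    # Genetic planner: evolve a candidate population against the surrogate model.
    pop = rng.uniform(0, 1, size=(64, DIM))
    for _ in range(30):
        fitness = model.predict(pop)
        parents = pop[np.argsort(fitness)[-32:]]            # selection
        children = parents[rng.integers(0, 32, 64)].copy()  # reproduction
        mask = rng.random(children.shape) < 0.5             # uniform crossover
        children[mask] = parents[rng.integers(0, 32, 64)][mask]
        children += 0.05 * rng.normal(size=children.shape)  # mutation
        pop = np.clip(children, 0, 1)

    best = pop[np.argmax(model.predict(pop))]
    X = np.vstack([X, best])                     # run the proposed experiment and
    y = np.append(y, run_experiment(best))       # fold the result back into the model

print("best observed growth rate:", y.max())
```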
2:30 PM - GI01.04.03
Experimental Bayesian Optimization of a 3D Printed Mechanical Structure
Aldair Gongora1,Bowen Xu1,Wyatt Perry1,Chika Okoye1,Kristofer Reyes2,Elise Morgan1,Keith A. Brown1
Boston University1,University at Buffalo2
Show AbstractAdditive manufacturing presents numerous possibilities for design due to the high level of control afforded by the 3D fabrication processes. However, each new design choice introduced by the flexibility of the processing represents a decision that could have important implications on performance. While structures can be optimized for many types of mechanical performance, improving failure properties is particularly challenging due to the stochastic nature of failure and the difficulty in reliably predicting the influence of the microstructure introduced by processing. Here, we explore the degree to which machine learning can guide physical experimentation to produce tough 3D-printed structures in as few experiments as possible. Specifically, we present an experimental optimization of a parametric structure to maximize specific toughness as a model figure of merit that is central to the realization of energy-absorbing structures. Utilizing Bayesian optimization with Gaussian process regression, the mechanical performance of each new design is predicted based on previous experiments in order to continuously predict the optimal design with all available data. Of particular importance, we study the influence of the decision policy, or the algorithm by which subsequent experiments are selected, on convergence to an optimum design. We investigate the effectiveness of policies such as pure exploration, expected improvement, maximum variance, and Bayesian D-optimality in optimizing specific toughness. These approaches are benchmarked against classical design of experiments. The application of these machine learning approaches to an experimental system forces the accommodation of real-world artifacts such as defects introduced through printing, variability between printers, and the stochastic nature of failure. In addition to aiding in the development of high-performance materials and structures, the lessons described will assist in the selection of decision policies for broader classes of experimental optimizations in a manner that facilitates the convergence of machine learning, physical experimentation, and design.
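A minimal sketch of one of the decision policies discussed above, Gaussian process regression with expected improvement; the "toughness" function is a hypothetical stand-in for printing and mechanically testing a structure.

```python
# Illustrative sketch: Bayesian optimization of a single design parameter using a
# GP surrogate and an expected-improvement acquisition function.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)

def measure_toughness(x):
    """Placeholder for printing a structure with design parameter x and testing it."""
    return np.sin(3 * x) * (1 - x) + 0.02 * rng.normal()

X = rng.uniform(0, 1, size=(4, 1))                 # initial designs
y = np.array([measure_toughness(x[0]) for x in X])
grid = np.linspace(0, 1, 200).reshape(-1, 1)       # candidate designs

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=RBF(0.2) + WhiteKernel(1e-3),
                                  normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)

    # Expected improvement over the best toughness observed so far.
    improvement = mu - y.max()
    z = improvement / np.maximum(sigma, 1e-9)
    ei = improvement * norm.cdf(z) + sigma * norm.pdf(z)

    x_next = grid[np.argmax(ei)]                   # next structure to print and test
    X = np.vstack([X, x_next])
    y = np.append(y, measure_toughness(x_next[0]))

print("best design:", X[np.argmax(y)], "toughness:", y.max())
```

Swapping the acquisition line for the predictive standard deviation alone would give the maximum-variance (pure exploration) policy mentioned above, which is why the choice of policy is worth benchmarking.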
2:45 PM - GI01.04.04
Machine Learning Based Monitoring of Advanced Manufacturing
Brian Giera1,Bodi Yuan1,2,Albert Chu1,Philip DePond1,Gabe Guss1,Du Nguyen1,Congwang Ye1,William Smith1,Nikola Dudukovic1,Sara McMains2,Manyalibo Matthews1
Lawrence Livermore National Lab1,University of California, Berkeley2
Show AbstractAs with most advanced manufacturing (AM) systems, analysis of AM sensor data currently occurs post-build, rendering process monitoring and rectification impossible. Supervised machine learning offers a route to convert sensor data into real-time assessments; however, this requires a wealth of labeled sensor data that traditionally is too time-consuming and/or expensive to assemble. In this work, we solve this critical issue in a variety of AM systems. We develop and implement machine learning (ML) algorithms for the purposes of automated quality assessment and, in some cases, rectification. We discuss ML-based algorithms capable of automated detection in a host of AM technologies such as Laser Powder Bed Fusion and Direct Ink Write and also microfluidic platforms that are used for feedstock production. The common thread within these systems is that routinely collected sensor data (e.g. high-speed video, pressure gauges, etc.) contains pertinent information about the state of the system that can be converted into actionable information in real-time via ML. Successful implementation of these machine learning algorithms will reduce time and cost during processing by automating quality assessment and will enable process control.
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
3:30 PM - *GI01.04.05
Autonomous Integration of Materials Theory, Experiment and Computation—SARA
R. Bruce van Dover1,John Gregoire2,Carla Gomes1,Bart Selman1,Christopher Wolverton3,Alex Zunger4
Cornell University1,California Institute of Technology2,Northwestern University3,University of Colorado Boulder4
Show AbstractMany new materials have been suggested, either by first principles theory or by heuristic inference, as having critical enabling properties for technologies such as Li-ion batteries, transparent conductors, and photochemical energy capture. Yet theory has not provided adequate guidance regarding the conditions, if any, under which they can be synthesized. Traditional synthesis of novel compounds often involves laborious and slow manual iterative exploration of composition and processing space. We are radically transforming the ability to identify and synthesize new materials using innovative AI-based strategies for reasoning and conducting science, including the representation, planning, optimization, and learning of materials knowledge. The logical structure of our approach, SARA (Scientific Autonomous Reasoning Agent), is based on a community of software agents that cooperatively generate hypotheses and autonomously test them through autonomous execution of the materials discovery/development process, and is enabled through concomitant development of robotic processing and characterization tools, on-the-fly DFT calculations, and AI-based algorithms. Our approach is further augmented with human insights so that the artificial intelligence leverages the human intelligence of expert scientists, creating an unprecedented platform for human-machine collaboration. SARA includes methods and methodologies for the rational design of functional materials and for discovering the requisite synthesis parameters for both stable and metastable materials.
4:00 PM - *GI01.04.06
Autonomous Research System for Biology (ARES-B)—A Machine Learning Approach to Optimizing Materials Synthesis Using Synthetic Biology
Maria Torculas1,Colleen Reynolds1,David Coar1,Jeffrey Stuart1,Michael Jewett2,Ashty Karim2,Michael Krein1
Lockheed Martin1,Northwestern University2
Show AbstractSynthetic biology is an emerging method for manufacturing materials and precursors, such as those typically derived from fossil fuels. However, controlling synthesis pathways to optimize product yield remains a significant challenge in process maturation and scale up. Current strategies rely on laboratory experimentation. Depending on the design of experiments, this can be time and resource intensive, and results are not guaranteed. A metabolic synthesis pathway is typically a multistep process with dozens of degrees of freedom, such as cofactor concentrations and homologue variants. This describes a parameter space that is intractable to explore exhaustively through experiments. Machine learning provides a means to intelligently and efficiently explore this parameter space by building and optimizing models which describe the synthesis process and guide state-of-the-art cell-free experiments. We adapted principles from our previous work on the Autonomous REsearch System (ARES), in which an autonomous system iteratively performed experiments, analyzed results, and applied machine learning models to automatically design and execute a new set of experiments, drastically reducing the experimental time needed to optimize a chemical synthesis process. Here, we applied our approach to synthetic biology pathway modeling and optimization. We used two machine learning approaches – neural networks and parameter-fit ordinary differential equations (ODEs) – to build models of a 5-step metabolic process for butanol production. These models were used to inform a series of cell-free experiments, resulting in a 3-fold improvement in butanol yield over prior trial-and-error experimentation. We found a trade-off between model interpretability and optimization ability: the ODEs required very little initial information and were interpretable, but fell short of the neural networks’ ability to suggest optimized conditions.
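A minimal sketch of the "parameter-fit ODE" approach named above, using a toy two-step cascade rather than the actual five-step butanol pathway; the rate constants and the synthetic "measurements" are hypothetical.

```python
# Illustrative sketch: fit the rate constants of a small kinetic ODE model to
# observed product concentrations, so the fitted model can guide new experiments.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit

def cascade(t, y, k1, k2):
    """Toy pathway A -> B -> product with first-order steps."""
    a, b, p = y
    return [-k1 * a, k1 * a - k2 * b, k2 * b]

def product_curve(t, k1, k2):
    """Product concentration vs. time for given rate constants (initial A = 1)."""
    sol = solve_ivp(cascade, (0, t[-1]), [1.0, 0.0, 0.0], t_eval=t, args=(k1, k2))
    return sol.y[2]

# Synthetic noisy "experimental" time series generated with known constants.
t_obs = np.linspace(0, 10, 25)
rng = np.random.default_rng(2)
p_obs = product_curve(t_obs, 0.8, 0.3) + 0.01 * rng.normal(size=t_obs.size)

# Fit the rate constants to the data; the calibrated model can then be queried
# to suggest which condition to test in the next round of cell-free experiments.
(k1_fit, k2_fit), _ = curve_fit(product_curve, t_obs, p_obs, p0=[0.5, 0.5])
print(f"fitted rate constants: k1={k1_fit:.2f}, k2={k2_fit:.2f}")
```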
GI01.05: Poster Session: Machine Learning and Data-Driven Materials Development and Design
Session Chairs
Keith A. Brown
Kristofer Reyes
Wednesday AM, November 28, 2018
Hynes, Level 1, Hall B
8:00 PM - GI01.05.01
Database-Driven Materials-Selection Framework for Semiconductor Heterojunction Design
Andre Schleife1,Ethan Shapera1
University of Illinois at Urbana-Champaign1
Show AbstractAt the interface of two semiconductors, where bulk band structures merge into each other, an electronic transition region forms, with band-edge discontinuities that are confined to not more than a few atomic layers near the interface. These discontinuities, also known as valence- and conduction-band offsets, naturally occur at the interface of materials with different band gaps. While their signs decide whether the interface acts as a barrier or conductor for electrons or holes, their magnitudes determine how good of a barrier/conductor the interface is. Such heterojunctions are at the heart of many modern semiconductor devices with tremendous societal impact: Light-emitting diodes shape the future of energy-efficient lighting, solar cells are promising for renewable energy, and photo-electrochemistry seeks to optimize efficiency of the water-splitting reaction.
Unfortunately, design of heterojunctions, e.g. to find optimal electron- and hole-transport layers for new active-component materials, is difficult due to the limited number of materials for which band alignment is known and the experimental as well as computational difficulties associated with obtaining this data. At the same time, the dependence of band alignment on intrinsic properties of the involved materials turns the design of heterojunctions with specific alignment into an interesting materials-design or materials-selection optimization problem. In order to tackle this problem, we show that band alignment based on branch-point energies is a good and efficient approximation that can be obtained exclusively using data from existing electronic-structure databases. To this end, we show that errors associated with this approach are comparable to those of expensive first-principles computational techniques as well as experiment.
We then incorporate branch-point energy alignment into a framework that is capable of rapidly screening existing online databases to design semiconductor heterojunctions. We showcase our technique for different prototype cases, including successful predictions of electron- and hole-transport layers for CdSe- and InP-based LEDs as well as for novel CH3NH3PbI3- and nanoparticle PbS-based solar absorbers. From our results we show that our framework addresses the challenge of accomplishing fast materials selection for heterostructure design by tying together first-principles calculations and existing online materials databases. We show that it can be used to directly design desired semiconductor heterostructures, or, at least, to reduce the vast candidate search space.
8:00 PM - GI01.05.02
GCMCWorkflow—Fully Autonomous Grand Canonical Sampling of Microporous Materials
Richard Gowers1,Craig Chapman1
University of New Hampshire1
Show AbstractThe study of gas adsorption in microporous solids is vitally important for addressing the crises facing modern society including climate change (carbon dioxide capture) and energy production (hydrogen storage). Whilst databases of possible materials[1] and software for performing Grand Canonical Monte Carlo (GCMC) simulation of these[2] have been created, to date there is no framework for performing the fully autonomous computational screening of these materials.
In this contribution we present a freely available software package, GCMCWorkflow [3], which allows fully autonomous computational modeling of materials in a reliable, repeatable and reproducible way. The package is built upon the Fireworks[4] package and acts as a controller of existing GCMC tools, in this case Raspa[5].
Building upon previous work into the quantification of uncertainty of these simulations and what constitutes “enough” sampling[6], the package is able to adaptively start more simulations as required to achieve the defined sampling. Other features, such as restarting simulations after crashes and the massively parallel scaling enabled by Fireworks, make performing large scale screening on high performance computing clusters far less user intensive.
Finally we show how the abstraction layer this automation provides allows multiple GCMC simulations to be programmed into larger machine learning pipelines. In the first example we show how genetic algorithms can be used to derive force field parameters in a top down approach to match data gathered from experiments. In another example, we show how we can search for adsorption capacities of materials where the exploration of sampling space is automated through data-driven algorithms.
References:
[1] Yongchul G. Chung et al. (2014) Computation-Ready, Experimental Metal–Organic Frameworks: A Tool To Enable High Throughput Screening of Nanoporous Crystals, Chemistry of Materials, 26 (21), pp 6185–6192 DOI: 10.1021/cm502594j
[2] David Dubbeldam et al. (2013) On the inner workings of Monte Carlo codes, Molecular Simulation, 39:14-15, 1253-1292 DOI: 10.1080/08927022.2013.819102
[3] Richard J. Gowers (2018) GCMCWorkflow https://github.com/richardjgowers/GCMCworkflow DOI: 10.5281/zenodo.1289897.
[4] Anubhav Jain et al. (2015) FireWorks: a dynamic workflow system designed for high throughput applications, Concurrency Computat.: Pract. Exper., 27: 5037–5059. DOI: 10.1002/cpe.3505
[5] David Dubbeldam et al. (2015) RASPA: molecular simulation software for adsorption and diffusion in flexible nanoporous materials, Molecular Simulation, 42:2, 81-101 DOI: 10.1080/08927022.2015.1010082
[6] Richard J. Gowers et al. (2017) Automated analysis and benchmarking of GCMC simulation programs in application to gas adsorption, Molecular Simulation, 44:4, 309-321 DOI: 10.1080/08927022.2017.1375492
8:00 PM - GI01.05.04
Dodecahedral Inorganic Cage Discovery Assisted by Machine Learning Algorithms
Kai Ma1,2,Yunye Gong1,Tangi Aubert1,3,Melik Turker1,Teresa Kao1,Peter Doerschuk1,Ulrich Wiesner1
Cornell University1,Current Affiliation: Elucida Oncology2,Ghent University3
Show AbstractArtificial intelligence (AI) is beginning to show significant potential for accelerating research in materials science and engineering. In particular, single-particle three-dimensional (3D) reconstruction of cryo-electron microscopy (cryo-EM) images using machine learning computer algorithms has recently emerged as a powerful tool in structural biology for revealing high-resolution information of biological structures, including virus, protein, and DNA assemblies. However, to the best of our knowledge such AI-based approaches have not yet been successfully applied to synthetic materials discovery. In this contribution, we describe how such machine learning approaches resulted in the discovery of an ultrasmall (~10 nm diameter) dodecahedral silica nanocage, or ‘silicage’ [1].
In the study, tens of thousands of single-particle cryo-EM images were collected on early formation stages of surfactant self-assembly directed silica nanostructures. However, the well-defined structures could not be identified from regular TEM and cryo-EM observations. In contrast, feeding the cryo-EM images into machine learning algorithms for single-particle 3D reconstruction revealed a previously unidentified dodecahedral silica cage. The dodecahedron is the highest symmetry structure of the five Platonic solids known since antiquity and extensively studied by the ancient Greeks. Details of the reconstruction provided insights into possible formation mechanism of these cages around self-assembled surfactant micelles. Cage structures were not limited to silica, but were also observed from other materials including metals and transition metal oxides. Such materials may find a plethora of applications in areas including catalysis and nanomedicine. This discovery, facilitated by artificial intelligence, not only provides novel insights into the fundamental understanding of molecular self-assembly, but also paves the way for the search of related structures with different symmetry and from different materials.
[1] K. Ma, Y. Gong, T. Aubert, M. Z. Turker, T. Kao, P. C. Doerschuk, U. Wiesner, Nature 2018, in press.
8:00 PM - GI01.05.05
Software Tools, Methods and Applications of Machine Learning in Functional Materials Design
Anubhav Jain1
Lawrence Berkeley National Laboratory1
Show AbstractIn this talk, I will describe our group's efforts to build a general framework for performing data mining on materials properties based on structure and composition. I will introduce matminer (https://hackingmaterials.github.io/matminer/), an open-source code capable of extracting materials data, generating thousands of crystal structure and compositional descriptors, and quickly reproducing and extending existing machine learning studies. I will demonstrate how new structural features implemented in matminer based on local environment can be used for machine learning and classification. I will also describe our group's effort to create a "black-box" machine learning model that can be applied to any property and that can be used as a benchmark comparison for machine learning efforts. Finally, I will describe multiple applications, including the prediction of electronic properties, bulk metallic glass behavior, and an effort to learn the properties of materials from unstructured text.
8:00 PM - GI01.05.06
Data Analytics for Mapping of Catalytic Performance From High Throughput Cyclic Voltammetry Experiments
Kiran Vaddi1,Surya Devaguptapu1,Tianmu Zhang1,Xiaozhou Shen1,Scott Broderick1,E. Bruce Pitman1,Fei Yao1,Olga Wodo1,Krishna Rajan1
University at Buffalo, The State University of New York1
Show AbstractThe high throughput exploration of materials space has been recognized as a new paradigm in materials design and discovery. However, typical high throughput exploration methods deliver high dimensional datasets that pose the challenge of extracting the key features and trends that could guide the discovery process. To address this challenge, we develop data analytics tools to extract irreducible representation of the high throughput exploration data.
We develop computational tools to perform feature extraction from high throughput electro-chemistry experiments. We leverage non-linear dimensionality reduction techniques to project a high dimensional set of cyclic voltammetry (CV) experiments into a lower dimensional space. The main advantage of our approach is the mathematically rigorous extraction of the inherent dimensionality of a given data set. This is in contrast to the traditional Engineering Figure of Merit used to interpret CV curves. To demonstrate the performance of our approach, we analyze CV curves for a compositional library of perovskite devices subjected to a series of electrochemical tests. Each device's current response to the cyclic voltammetry test is recorded over several time points (considered as dimensions from here on). In all these cases, we discovered the inherent dimensionality to be as low as 3 (compared to the 4200 input dimensions) with over 90% of the variance explained. Moreover, we observed a strong compositional dependency of the current responses. Finally, we unraveled the dominant structure hidden in the data as belonging to two types of manifolds regardless of the noise present in the data.
Our method has the capability to inform the iterative design process of linking theory, modeling, and experiments for the improvement and acceleration of materials design.
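A minimal sketch of the idea above: treat each CV curve as a high-dimensional vector, estimate its inherent dimensionality with PCA, and then embed it with a non-linear method (here Isomap, as one possible choice). The synthetic curves and their three hidden parameters are hypothetical placeholders for the measured library.

```python
# Illustrative sketch: dimensionality estimation and non-linear embedding of
# cyclic-voltammetry-like responses.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

rng = np.random.default_rng(3)
n_curves, n_points = 200, 4200

# Synthetic CV-like responses controlled by three hidden parameters (e.g. composition).
t = np.linspace(0, 2 * np.pi, n_points)
latent = rng.uniform(0.5, 1.5, size=(n_curves, 3))
curves = (latent[:, [0]] * np.sin(t) + latent[:, [1]] * np.sin(2 * t)
          + latent[:, [2]] * np.cos(t) + 0.01 * rng.normal(size=(n_curves, n_points)))

# Linear estimate of dimensionality: how many components explain >90% of the variance?
pca = PCA().fit(curves)
n_dim = int(np.searchsorted(np.cumsum(pca.explained_variance_ratio_), 0.90) + 1)
print("components explaining 90% of variance:", n_dim)

# Non-linear embedding into that low-dimensional space for further analysis.
embedding = Isomap(n_components=n_dim).fit_transform(curves)
print("embedded data shape:", embedding.shape)
```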
8:00 PM - GI01.05.07
Automated Bayesian Optimization of the CANDLE Implicit Solvation Model
Yuxi Chen1,Henry Herbol1,Paulette Clancy1
Cornell University1
Show AbstractHybrid organic-inorganic perovskite thin films are frequently created in solution to promote energy-inexpensive, sustainable processing. Unfortunately, the inclusion of solvent molecules greatly increases the computational expense during simulations of the solubilization and subsequent crystallization of said thin films. Implicit solvation models represent a solvent as a continuous medium instead of individual atomically “explicit” solvent molecules, and are commonly used in order to allow for larger-scale studies of solvent-solute interactions in DFT. JDFTx is a joint density functional theory (DFT) software package written by Prof. Tomás Arias (Physics, Cornell). JDFTx specializes in its implementation of solvation models, such as the charge-asymmetric nonlocally-determined local-electric solvation model (CANDLE), which has been shown to be particularly efficient and accurate [R. Sundararaman et al., 2014].
Despite the efficiency and accuracy of the model, the solvent parametrization process is tedious. We are working to apply an automated Bayesian optimization algorithm to the parametrization of CANDLE models for other organic solvents. This novel approach, as applied to molecular simulations, will make the parametrization process more readily accessible to the scientific community at large. Bayesian optimization searches for the combination of parameters that minimizes an objective function, here set to be the mean absolute error (MAE) between the experimental and calculated solvation energies of a set of solute molecules. To allow for robustness, the solute molecules studied include neutral, anionic, and cationic species.
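A minimal sketch of Bayesian optimization of solvation-model parameters, assuming a hypothetical helper `candle_mae(params)` that would run the DFT solvation calculations and return the MAE against experimental solvation energies; scikit-optimize's `gp_minimize` is used here as one off-the-shelf implementation, not the authors' actual workflow, and the parameter names and ranges are placeholders.

```python
# Illustrative sketch: minimize the solvation-energy MAE over two model parameters.
from skopt import gp_minimize

def candle_mae(params):
    """Stand-in objective: replace with JDFTx/CANDLE calculations in practice."""
    cavity_scale, charge_asymmetry = params
    return float((cavity_scale - 1.1) ** 2 + (charge_asymmetry - 0.4) ** 2)

search_space = [(0.5, 2.0),   # hypothetical cavity-size parameter range
                (0.0, 1.0)]   # hypothetical charge-asymmetry parameter range

result = gp_minimize(candle_mae, search_space, n_calls=25, random_state=0)
print("best parameters:", result.x, "MAE:", result.fun)
```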
8:00 PM - GI01.05.08
Semiconducting Materials from Analogy and Chemical Theory (SMACT)
Daniel Davies1,Keith Butler2,Adam Jackson3,Jonathan Skelton1,Aron Walsh4
University of Bath1,ISIS Facility2,University College London3,Imperial College London4
Show AbstractWe present an open-source code SMACT[1] to perform rapid screening of known and hypothetical materials. It combines elemental descriptors and chemical heuristics to navigate large numbers of compounds, which can be fed into machine learning procedures or other materials design workflows.
Forming a four-component compound from the first 103 elements of the periodic table results in more than 10^12 combinations. Such a materials space is intractable to high-throughput experiment or first-principles computation. We introduce a framework to address this problem and quantify how many materials can exist. We apply principles of valency and electronegativity to filter chemically implausible compositions, which reduces the inorganic quaternary space to 10^10 combinations [2]. We demonstrate that estimates of bandgaps and absolute electron energies can be made simply based on the chemical composition and apply this to search for new semiconducting materials to support the photoelectrochemical splitting of water [3].
The applicability to crystal structure prediction by analogy with known compounds is shown, including exploration of the phase space for ternary combinations that form a perovskite lattice. Computer screening reproduces known perovskite materials and predicts the feasibility of thousands more. Due to the simplicity of the approach, large-scale searches can be performed on a single workstation. For example, we have been able to assign likely crystal structures to all hypothetical quaternary oxide compositions produced by SMACT, constituting a database of over 2 million compounds. The stability and properties of these compounds are then assessed methodically using high-throughput workflows for first-principles calculations.
1. https://github.com/WMD-group/SMACT
2. D. W. Davies et al, Chem, 1, 617 (2016)
3. D. W. Davies et al, Chem. Sci., 9, 1022 (2018)
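An illustrative screening filter in the spirit of the approach described above, not the SMACT API itself: enumerate ternary A-B-O combinations and keep those that can be made charge neutral with both cations less electronegative than oxygen. The element data are a small hand-typed subset for demonstration only.

```python
# Illustrative sketch: chemical-heuristic filtering of hypothetical compositions.
from itertools import combinations, product

elements = {           # symbol: (allowed oxidation states, Pauling electronegativity)
    "Na": ([1], 0.93), "K": ([1], 0.82), "Ca": ([2], 1.00), "Sr": ([2], 0.95),
    "Ti": ([2, 3, 4], 1.54), "Zr": ([4], 1.33), "Al": ([3], 1.61), "O": ([-2], 3.44),
}

def neutral_ratio(charges, max_count=8):
    """Return the first integer ratio giving overall charge neutrality, if any."""
    for counts in product(range(1, max_count + 1), repeat=len(charges)):
        if sum(q * n for q, n in zip(charges, counts)) == 0:
            return counts
    return None

candidates = []
cations = [el for el in elements if el != "O"]
for a, b in combinations(cations, 2):
    for qa, qb in product(elements[a][0], elements[b][0]):
        counts = neutral_ratio([qa, qb, -2])
        # Keep only combinations where both cations are less electronegative than O.
        if counts and max(elements[a][1], elements[b][1]) < elements["O"][1]:
            candidates.append((a, b, "O", counts))
            break

for c in candidates:
    print(c)
```

Filters of this kind are cheap enough to run over billions of combinations, which is what makes the subsequent structure prediction and first-principles screening tractable.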
8:00 PM - GI01.05.09
Predicting the Glass Forming Ability of Bulk Metallic Glasses Using Random Forests
Vanessa Nilsen1,Michael Hibbard1,Logan Ward2,Dane Morgan1
University of Wisconsin - Madison1,The University of Chicago2
Show AbstractPredicting the glass forming ability of metallic alloys is an active area of research due to the many existing and potential uses of metallic glasses. Many descriptors accounting for the thermodynamics and kinetics of the vitrification process have been proposed to quantify the likelihood of glass formation for an alloy. A particularly widely used class of descriptors involve simple functions of the glass transition temperature (Tg), the liquidus temperature (Tl), and the crystallization temperature (Tx). The descriptors we focus on in this work include Trg = Tg/Tl, γ = Tx/(Tg + Tl), and ω = Tg/Tx - 2Tg/(Tg + Tl). It is of interest to predict these descriptors quickly for new alloy systems. We note that these transition temperatures are also of interest in their own right, separate from their ability to predict glass forming ability, as they control important design properties of alloys. Accurately predicting these transition temperatures could also prevent the need to first make a glass to calculate the value of any of the descriptors. In this work we have used a random forest machine learning method to model these descriptors as a function of simple elemental properties. We used two approaches to model the value of the glass forming descriptors. First, we modeled Trg, γ, and ω directly. Second, we modeled the transition temperatures and used the resulting predictions to calculate the glass forming descriptors. Interestingly, the latter approach was significantly more accurate. In addition to predicting the glass forming descriptors, we assessed the applicability of the model in various compositional domains by examining the error and uncertainty on predictions of alloys with varying degrees of representation in the training data set.
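A minimal sketch of the second (more accurate) approach described above: predict the transition temperatures with a random forest and compute the glass-forming descriptors from the predictions. The feature matrix and temperatures below are random placeholders standing in for elemental-property features of real alloys.

```python
# Illustrative sketch: multi-output random forest for Tg, Tx, Tl followed by
# computation of Trg, gamma and omega from the predicted temperatures.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 20))                       # placeholder elemental features
T = np.column_stack([600 + 50 * X[:, 0],             # placeholder Tg (K)
                     700 + 60 * X[:, 1],             # placeholder Tx (K)
                     1000 + 80 * X[:, 2]])           # placeholder Tl (K)

X_tr, X_te, T_tr, T_te = train_test_split(X, T, random_state=0)
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, T_tr)
Tg, Tx, Tl = model.predict(X_te).T

# Glass-forming descriptors computed from the predicted temperatures.
Trg = Tg / Tl
gamma = Tx / (Tg + Tl)
omega = Tg / Tx - 2 * Tg / (Tg + Tl)
print(Trg[:3], gamma[:3], omega[:3])
```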
8:00 PM - GI01.05.11
A Data Driven Analysis of Volcano Plots and Prediction of New Binary Catalysts
Aparajita Dasgupta1,Yingjie Gao1,Thaicia Stona de Almeida1,Scott Broderick1,E. Bruce Pitman1,Krishna Rajan1
University at Buffalo, The State University of New York1
Show Abstract
In this study, we describe a computational approach in which we ingrain the underlying theoretical constructs behind volcano plots and subsequently use the underlying physical descriptors to expand and extend the database of known catalytic reactions. To this end, we use spectral clustering methods to identify key physical regimes underlying volcano plots, using the volcano plot for ammonia synthesis rate as a function of nitrogen adsorption energy as our case study. The application of dimensionality reduction techniques and graph networks to these catalytic systems is illustrated by applying them to elemental and binary transition metal alloys to predict the positions of these systems on volcano plots.
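A minimal sketch of spectral clustering applied to volcano-plot-style data (adsorption energy vs. catalytic rate); the two-branch synthetic "volcano" below is a hypothetical stand-in for the ammonia-synthesis dataset discussed above, and the specific clustering settings are illustrative choices.

```python
# Illustrative sketch: separate the two limbs of a volcano plot, i.e. the two
# rate-limiting physical regimes, with graph-based spectral clustering.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(5)
e_ads = rng.uniform(-2.0, 2.0, 300)                         # nitrogen adsorption energy
rate = -np.abs(e_ads) + 0.05 * rng.normal(size=e_ads.size)  # volcano-shaped activity

X = np.column_stack([e_ads, rate])
labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            n_neighbors=10, random_state=0).fit_predict(X)

# Each cluster should roughly correspond to one limb of the volcano
# (too-weak vs. too-strong binding).
for k in range(2):
    print(f"cluster {k}: mean adsorption energy {e_ads[labels == k].mean():+.2f} eV")
```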
8:00 PM - GI01.05.12
Pycroscopy—A Community-Driven Approach for Analyzing and Storing Materials Imaging Data
Rama Vasudevan1,Suhas Somnath1,Christopher Smith1,Stephen Jesse1
Oak Ridge National Laboratory1
Show AbstractMaterials science is undergoing profound changes, driven by continual improvements to instrumentation that have resulted in an explosion in the data volume, dimensionality, complexity, and variety, in addition to increased accessibility to high-performance computing (HPC) resources, and more sophisticated computer algorithms. These changes are especially prominent in the functional imaging of materials. However, current software packages typically do not provide access to advanced or user-defined data analysis routines, and store measurement data in proprietary file formats. These proprietary software packages and file formats not only impede data analysis but also hinder continued research and instrument development, especially in the era of “big data”. Therefore, moving to the forefront of data-intensive materials research requires general and unified data curation and analysis platforms that are HPC-ready and open source.
We have developed a free and open-source python package called Pycroscopy for analyzing, visualizing and storing data. Pycroscopy is freely available via popular software repositories, and therefore lifts any financial burden for interrogating data. Pycroscopy describes data in an instrument-agnostic structure that allows it to represent data of any size, dimensionality, or complexity acquired on a regular grid of positions or random positions as in compressed sensing. This data model is stored in open hierarchical data format files (HDF5) that can be interrogated using any programming language, scale well from kilobyte to terabyte sized datasets, and can readily be used in HPC environments unlike proprietary data formats. As a consequence, Pycroscopy-formatted data files are curation-ready and therefore both meet the guidelines for data sharing issued to federally funded agencies and satisfy the implementation of digital data management as outlined by the United States Department of Energy. The generalized data format allows data processing and analysis algorithms to be generalized in-turn allowing a single version of the algorithm to be applied to data collected from instruments from different brands or even modalities. The simple structure and comprehensive documentation in pycroscopy enable even novice programmers to easily translate physical or chemical problems into computational problems. Unlike many other open-source packages that focus on analytical or processing routines specific to an instrument, the general definition of the Pycroscopy data format can be readily adopted for different microscopy techniques. Furthermore, the generality of Pycroscopy provides material scientists access to a vast and growing library of community-driven data processing and analysis routines that far exceed those provided by instrument manufacturers and are desperately needed in the age of big data. This research was conducted at the Center for Nanophase Materials Sciences, which is a US DOE Office of Science User Facility.
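An illustrative sketch of the general idea (instrument-agnostic HDF5 storage of data acquired on a grid of positions), written with plain h5py; this is a simplified stand-in to show why open hierarchical files suit such data, not Pycroscopy's actual file schema or API.

```python
# Illustrative sketch: store one spectrum per pixel plus the ancillary datasets
# that describe what the rows and columns physically mean.
import h5py
import numpy as np

rng = np.random.default_rng(6)
n_positions, n_spectral = 64 * 64, 256
data = rng.normal(size=(n_positions, n_spectral))     # e.g. one spectrum per pixel

with h5py.File("measurement.h5", "w") as f:
    grp = f.create_group("Measurement_000")
    dset = grp.create_dataset("Main_Data", data=data, compression="gzip")
    # Ancillary datasets describing how rows/columns map to physical quantities.
    grp.create_dataset("Position_Values",
                       data=np.stack(np.meshgrid(np.arange(64), np.arange(64)),
                                     axis=-1).reshape(-1, 2))
    grp.create_dataset("Spectroscopic_Values", data=np.linspace(0, 10, n_spectral))
    dset.attrs["quantity"] = "cantilever deflection"
    dset.attrs["units"] = "nm"

with h5py.File("measurement.h5", "r") as f:   # any language or HPC tool can read this
    print(f["Measurement_000/Main_Data"].shape)
```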
8:00 PM - GI01.05.13
Theoretical Classification for Softmagnetic Compounds Applying Regression-Based Model Selection
Masakuni Okamoto1,Masami Yamasaki2
Hitachi Ltd.1,Yokohama City University2
Show AbstractIn order to improve the magnetic properties of the compounds forming the main phase of soft-magnetic materials, we study the magnetization (Ms) and the magnetostriction constant (λ001) of ternary alloy compounds consisting of 3d transition-metal elements using first-principles calculations and machine learning techniques.
First, we prepare a total of 1393 training data points using first-principles calculations for bcc/fcc-(Fe,Co,Ni)-based ternary compounds. We obtain 36 promising compounds after manual screening of the 1393 compounds under several conditions, such as | λ001 | < 2 x 10^-5.
We expect the magnetic permeability (μ) of these compounds to be large owing to the correlation between μ and λ001.
Next, we classify the 1393 calculated data points using machine learning. We use new descriptors based on Voronoi polygons, in addition to the descriptors commonly used in materials informatics, such as lattice constants, ionization energies, and the atomic angular momenta L, S, and J. Nonlinear regression with a Gaussian kernel and sparse modelling accurately reproduces the Ms values of the 1393 compounds. The relevant descriptors for the prediction of Ms are L, S, J, and the Voronoi volumes. On the other hand, the prediction of λ001 is poor at present; new descriptors may be necessary.
We demonstrate how first-principles calculations and the regression-based model selection technique can be utilized for classifying new materials.
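A minimal sketch of the modeling step described above: Gaussian-kernel (RBF) regression of the magnetization against descriptors, with a sparse linear model used to flag which descriptors matter. The descriptor matrix is random placeholder data, not the authors' 1393 first-principles results.

```python
# Illustrative sketch: kernel regression for accuracy, sparse (Lasso) modelling
# for descriptor selection.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = rng.normal(size=(400, 12))      # placeholder descriptors (L, S, J, Voronoi volumes, ...)
Ms = 2.0 * X[:, 0] + 1.0 * X[:, 3] + 0.1 * rng.normal(size=400)   # placeholder target

Xs = StandardScaler().fit_transform(X)

# Nonlinear regression with a Gaussian (RBF) kernel.
krr = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.1)
print("kernel ridge R^2:", cross_val_score(krr, Xs, Ms, cv=5).mean())

# Sparse modelling: coefficients driven to zero flag irrelevant descriptors.
lasso = Lasso(alpha=0.1).fit(Xs, Ms)
print("relevant descriptor indices:", np.nonzero(lasso.coef_)[0])
```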
8:00 PM - GI01.05.14
Predicting the Charge Density of Organic Molecules and Polymers Using Deep Learning
Deepak Kamal1,Anand Chandrasekaran1,Rampi Ramprasad1
Georgia Institute of Technology1
Show Abstract
Over the past few decades, the Kohn-Sham scheme of density functional theory (KS-DFT) has become the customary method to probe solid-state properties of materials. Despite its preeminence, the computational cost of this methodology renders it intractable for large-scale calculations involving thousands/millions of atoms. The simulation of mesoscale properties of polymers is one such example where this limitation is most apparent. Here we propose a method to accurately predict charge densities of organic systems (both molecules and polymers) by learning from pre-calculated examples of smaller systems. We start with creating a database of organic compounds which cover a wide range of configurational and bonding environments. Further, we introduce a novel fingerprinting scheme which maps the charge density to the local atomic environment using deep neural networks. The model thus obtained is systematically improved (both in terms of accuracy and transferability) by selectively training on poorly predicted local environments encountered in new configurations. Following the proposed methodology, we develop a robust model that can rapidly predict the charge density of large organic systems. This charge density can then be used as input to orbital free density functional theory scheme to swiftly access a broad range of materials properties, thus bypassing the need to explicitly solve the Kohn-Sham equation.
8:00 PM - GI01.05.15
Accelerated Informatics-Based Design of Multi-Component Systems
Xiaozhou Shen1,Tianmu Zhang1,Scott Broderick1,E. Bruce Pitman1,Krishna Rajan1
University at Buffalo, The State University of New York1
Show AbstractWe introduce an integrated computational chemistry/informatics approach to accelerate the design and discovery of new complex materials. Our approach couples topological data analysis (TDA) classification methods with Hirshfeld surface analysis (HSA). Three-dimensional Hirshfeld surfaces encode both chemical bonding and molecular geometry information, while TDA captures the “shape” of data in a multiscale manner to probe for hidden correlations between crystallographic structure and electronic structure. When applied collectively, we show how new classifications and insights into crystal chemistry can be revealed.
8:00 PM - GI01.05.16
Fast Evaluation of Microstructure-Property Relation in Duplex Alloys Using SEM Images
Thantip Krasienapibal1,Yasuhiro Shirasaki1,Momoyo Enyama1,Akiko Kagatsume1,Minseok Park1,Sayaka Tanimoto1
Hitachi Ltd.1
Show AbstractRecently, fast materials development and design has been in high demand due to the need for high-performance materials. The microstructure-property relation, a key element of materials development, has long been investigated qualitatively. For example, controlling crystal structure and crystal size can manipulate the strength and anti-corrosion properties of alloys. To achieve highly efficient experimental planning, such as process tuning and controlling trade-offs among multiple properties, quantitative and large-volume datasets of microstructure-property relations are required. However, a method that accurately and automatically provides microstructure features in a short time remains a challenge.
For alloys, the microstructure, i.e. crystal phase and orientation, is usually evaluated by electron back-scattered diffraction (EBSD) measurement [1]. The measurement requires highly skilled technicians and long acquisition times, limiting automation for large numbers of samples. Backscattered electron (BSE) images acquired with a scanning electron microscope (SEM) also provide crystal information. Using the Z-contrast in BSE images, crystal phase information can be obtained simply from differences in average atomic density [2]. In comparison to EBSD, using BSE images is promising since they can be acquired within a few minutes and the operation is straightforward. Hence, an approach that realizes microstructure-property evaluation using BSE images should be considered.
In this research, we propose a method to utilize microstructure features of alloys extracted from BSE images. By applying machine learning, i.e. DNN-based segmentation, microstructure features such as the amount of each phase present are extracted and used to evaluate the microstructure-property relation. We evaluated the relation between microstructure and mechanical properties in a Cr-duplex alloy. In addition, we studied in detail the preparation of training data for the DNN-based segmentation under different measurement conditions of the BSE images.
The results suggest that applying DNN-based segmentation to BSE images of the Cr-duplex alloy achieves accurate phase distinction, with 90% accuracy on average, resulting in reliable extraction of microstructure features, i.e. the amount of each phase. The extracted amount of each phase agrees well with the result of the EBSD measurement. The relation between the microstructure features and mechanical properties such as Vickers hardness, abrasion loss, and 0.2% yield strength of the Cr-based duplex alloy was evaluated using a regression model. Using this relation, the mechanical properties were predicted with an error of less than 10%. From these results, we have demonstrated a method to utilize microstructure features from BSE images, leading to fast and easy evaluation of the microstructure-property relation in duplex alloys.
References
[1] A.J. Wilkinson et.al., Mater. Today 15, 9 (2012) 366.
[2] L. Reimer, Scanning Electron Microscopy, 2nd ed., Springer (1988).
8:00 PM - GI01.05.17
Elastic Strain Engineering Reaches Six Dimensions via Machine Learning
Zhe Shi1,Evgenii Tsymbalov2,Alexander Shapeev2,Ju Li1
Massachusetts Institute of Technology1,Skolkovo Institute of Science and Technology2
Show AbstractThe controllable incorporation of strain into materials holds untapped scientific and technological potential, yet poses the challenge of exploring the vast six-dimensional space of admissible elastic strains. Here we demonstrate that systematic machine learning can make the problem of representing the electronic structure as a function of six-dimensional strain computationally tractable. Specifically, we develop a number of general methods for surrogate modeling of elastic strain engineering which, relying on a limited amount of data from ab initio calculations, can be used to fit the required properties with sufficient accuracy. In particular, an artificial neural network predicts the band structure within an accuracy of 19 meV in the case of three-dimensional strain.
8:00 PM - GI01.05.18
Oxygen Removal from Ce1-xZrxO2 Solid Solution Under Reducing Condition Using Genetic Algorithm
Ki-Yung Kim1,Yurie Kim2,Jason Kim2,Dong-Gung Shin1,Jun-Yeong Jo1,Yeong-Cheol Kim1
KoreaTech1,Pohang University of Science and Technology2
Show AbstractMean field approach has been employed to consider random distribution of atoms in solid solutions [1]. The mean field approach, however, shows averaged characteristics of the solid solutions, and, therefore, cannot represent energetically favorable solid solutions. We studied Ce0.75Zr0.25O2 solid solutions using genetic algorithm and density functional theory (DFT) [2]. Initial population, fitness function, selection, crossover, and mutation were varied in genetic algorithm to find an energetically favorable solid solution. In this study, we employed lattice dynamics to increase the calculation speed and to consider temperature effect. The increased calculation speed also allowed us to consider bigger systems. The bond strengths of the oxygen atoms in the favorable solid solution structure should vary because each oxygen atom is surrounded by Ce and/or Zr atoms differently; the oxygen atom with weak bond strength would be removed first under reducing conditions. We employed genetic algorithm again to find easily removable oxygen atoms in the solid solution under reducing conditions.
References
[1] G. Balducci, M. S. Islam, J. Kaspar, P. Fornasiero, and M. Graziani, Chem. Mater. 2003, 15, 3781.
[2] J. Kim, D.-H. Kim, J.-S. Kim, and Y.-C. Kim, Comput. Mat. Sci., 2017, 138, 219.
8:00 PM - GI01.05.19
Machine Learning Approach to Discover the Correlation Between Core-Loss Spectra and Materials Information via Clustering and Decision Trees
Shin Kiyohara1,Teruyasu Mizoguchi1
University of Tokyo1
Show AbstractSpectroscopy is one of the most promising techniques for revealing the atomic and electronic structure of the interior and surface of materials. Among the variety of spectroscopies, core-loss spectroscopy using electrons (ELNES) or X-rays (XANES) offers nano- or sub-nanometer spatial resolution and nano- to femtosecond temporal resolution, enabling the analysis of lattice defects and chemical reactions. However, interpreting ELNES/XANES spectra is not straightforward, and therefore comparing experimental spectra with those in a database, the so-called “experimental fingerprint” approach, has generally been used. While large databases have been constructed, we often come across an unknown spectrum that is not contained in the database. To overcome the infeasibility of interpreting an unknown spectrum, theoretical calculation can be effective. Considerable effort has made it possible to calculate ELNES/XANES spectra for most elements and atomic configurations, but calculating even a single spectrum requires substantial computation time. Since the spatial and temporal resolution of ELNES/XANES can generate thousands of spectra in a single experiment, such a one-by-one “theoretical fingerprint” approach is impracticable. In that situation, utilizing correlations between the spectra and atomic information discovered from the database can be an alternative.
Here, we developed a new interpretation approach based on machine learning that can deal with big data and be implemented without further theoretical calculation.
First, a spectral database was constructed by theoretical calculation. Then, hierarchical clustering was performed on the database, grouping similar spectra into clusters. Spectral similarity was measured by the cosine distance. Cutting the hierarchical tree at an arbitrary threshold produces clusters of spectra. We successively lowered the cutting threshold to each branch point, producing two clusters at every branch point. At each branch point, we repeatedly applied decision tree analysis, using the two newly generated clusters of spectra as the training labels and their structural information as the descriptors.
The queries of the constructed decision trees provide the best (most characteristic) features for distinguishing each of the clusters. These features are also believed to be physically important, and detailed comparison of the two types of spectra in the clusters can reveal the origins of the differences between them.
We applied this approach to a large number of O-K edge ELNES spectra of SiO2 polymorphs. The results will be discussed in my presentation.
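A minimal sketch of the two-stage analysis described above: hierarchically cluster spectra using cosine distance, then train a decision tree to find which structural descriptors separate the two clusters at a branch point. The spectra and descriptors below are synthetic placeholders for the calculated ELNES database.

```python
# Illustrative sketch: clustering of spectra followed by decision-tree analysis
# of which descriptor distinguishes the resulting clusters.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(8)
n_spectra, n_energy = 120, 300
energy = np.linspace(520, 560, n_energy)

# Two synthetic families of O-K-edge-like spectra differing in peak position.
peak = np.where(rng.random(n_spectra) < 0.5, 532.0, 538.0)
spectra = (np.exp(-(energy - peak[:, None]) ** 2 / 4)
           + 0.02 * rng.normal(size=(n_spectra, n_energy)))
descriptors = np.column_stack([peak + 0.1 * rng.normal(size=n_spectra),  # e.g. bond-length proxy
                               rng.normal(size=n_spectra)])              # irrelevant descriptor

# Hierarchical clustering on cosine distance; cut the tree into two clusters.
Z = linkage(pdist(spectra, metric="cosine"), method="average")
clusters = fcluster(Z, t=2, criterion="maxclust")

# Decision tree: which descriptor best distinguishes the two clusters?
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(descriptors, clusters)
print(export_text(tree, feature_names=["bond_length_proxy", "unrelated_descriptor"]))
```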
8:00 PM - GI01.05.20
Analytic Continuation via “Domain-Knowledge Free” Machine Learning
Hongkee Yoon1,Jae-Hoon Sim1,Myung Joon Han1
KAIST1
Show AbstractWe present a machine-learning approach to a long-standing issue in quantum many-body physics, namely, analytic continuation. This notoriously ill-conditioned problem of obtaining the spectral function from the imaginary-time Green's function has been a focus of new method development for the past decades, including many numerical approaches such as the maximum entropy method [1,2], the stochastic method [3], and the Padé approximation [4]. Here we demonstrate the usefulness of modern machine-learning techniques, including convolutional neural networks and variants of the stochastic gradient descent optimizer. A machine-learning continuation kernel is successfully realized without any 'physical domain knowledge', and outstanding performance is achieved for both insulating and metallic band structures [5]. Unlike other methods, our method is an additional-parameter-free, fully automatic approach with no further human intervention. Our machine-learning-based approach not only provides more accurate spectra than the conventional methods in terms of peak positions and heights but is also more robust against noise, which is the key feature required for any continuation technique to be successful. Furthermore, the ML-based kernel is 10^4-10^5 times faster than conventional analytic continuation algorithms and more robust to noise in the Green’s function. Our approach of tackling ill-posed problems with statistical, data-based ML shows the applicability of ML to other ill-posed physical problems.
[1] Jarrell, M. & Gubernatis, J. E. Bayesian inference and the analytic continuation of imaginary-time quantum Monte Carlo data. Physics Reports 269, 133–195 (1996).
[2] Bergeron, D. & Tremblay, A.-M. S. Algorithms for optimized maximum entropy and diagnostic tool for analytic continuation. Phys. Rev. E 94, 023303 (2016).
[3] Sandvik, A. W. Stochastic method for analytic continuation of quantum Monte Carlo data. Phys. Rev. B 57, 10287 (1998).
[4] Vidberg, H. J. & Serene, J. W. Solving the Eliashberg equations by means of N-point Padé approximants. J. Low Temp. Phys. 29, 179 (1977).
[5] Yoon, H., Sim, J.-H. & Han, M. J. Analytic continuation via ‘domain-knowledge free’ machine learning. ArXiv1806.03841 Cond-Mat Physics (2018).
8:00 PM - GI01.05.21
First-Principles Theory Based Machine Learning Force Fields for 2D Materials
Xiaofeng Qian1,Yang Yang1,2,Hongxiang Zong2,Hua Wang1,Xiangdong Ding2
Texas A&M University1,Xi’an Jiaotong University2
Show AbstractMolecular dynamics simulation is a powerful tool to understand the underlying physics of dynamic behavior of materials. However, its application is greatly limited by the availability of accurate force fields. Here we demonstrate machine learning as a strong tool to combine the accuracy of first-principles density functional theory and low-cost MD simulations. By learning the database generated from density functional theory calculations, we are able to generate accurate and effective force fields. We will show a few interesting examples of our method for elemental materials and 2D materials where the generated force fields can capture both energetic and structural properties.
8:00 PM - GI01.05.23
Machine Learning Induction of Parameters in Numerical Models of Capacity-Voltage Curves in Lithium-Ion Batteries
Takuya Hiramoto1,Masahiro Soeno1,Misato Nakamura1,Takashi Kusachi1,Hiromitsu Takaba1
Kogakuin University1
Show AbstractA numerical model of battery performance is indispensable for the development and optimization of battery materials and structures. Since the Newman model established a one-dimensional description based on electrochemical reactions and ion transport, numerical simulations that evaluate the charge/discharge characteristics and capacity-voltage curves of lithium-ion batteries have been widely applied in battery research. It is difficult, however, to obtain all of the parameters needed in such models from experiments alone. Moreover, the induction of unknown parameters mostly depends on the experience of expert researchers. We have studied a numerical model describing the charge/discharge characteristics of a lithium-air battery in which the precipitation of the discharge product, lithium peroxide, on the carbon surface is modeled, with the deposition rate and its distribution on the carbon surface treated as parameters [1]. There are more than 20 parameters in this numerical model, and it would take a long time to obtain all of them from experiments alone. In this study, we induced the parameters needed by the numerical models describing the charge/discharge or capacity-voltage curves of lithium-ion and lithium-air batteries using machine learning. The learning method used is the neural network (NN), which has shown remarkable progress in the field of machine learning. An NN has a layered structure in which the connections between layers are formed by multiply-add operations with weighting factors. In this research, NN training is carried out to induce the parameters used in the numerical model. Training data (capacity-voltage or discharge-capacity curves) are generated by running the numerical model with various input parameters. For instance, we constructed an NN induction model using 24 charge and discharge curves as a training set and confirmed reasonable accuracy. In addition, we checked the accuracy of the induction of multiple parameters, such as the reaction rate and temperature. The accuracy of the induction tends to depend on the number of unknown parameters; this dependence, as well as the influence of the detailed NN algorithm, will be presented and discussed. Consequently, machine learning is a useful tool for quickly determining unknown parameters in numerical models that are sometimes difficult to measure directly.
[1] Wataru Yamamoto, Md. Khorshed Alam, Hiromitsu Takaba, ECS Transactions, 61(13) (2014) 55-61.
8:00 PM - GI01.05.24
Bayesian Inference Enabled Experimental Determination of Materials and Transport Descriptors in Thermoelectrics
D V Repaka1,Ady Suwardi1,Zekun Ren2,Tonio Buonassisi2,3,Kedar Hippalgaonkar1
IMRE A-STAR1,Singapore-MIT Alliance for Research and Technology2,Massachusetts Institute of Technology, Cambridge3
Show AbstractThermoelectrics, which can convert heat into electrical power based on the Seebeck effect and vice-versa based on the Peltier effect can be a very useful alternative source of both electrical and thermal power. So far, machine learning approaches have only been used to predict promising new thermoelectric compositions from density functional theory calculations, building upon open-source databases towards the discovery of high performance materials. Further, high-throughput materials screening approaches are still rudimentary, limited by the lack of universally defined material and thermoelectric transport descriptors. Finally, rapid and accurate materials characterization that can directly measure these descriptors are arduous (for example, method of four coefficients) or do not exist. Our work discloses a rapid and accurate way to determine the material and transport descriptors of thermoelectric performance by feeding simple single-leg power-load experimental data to a Bayesian machine learning algorithm using Boltzmann transport theory. The accurate generalized forward model we have developed allows the use of Bayesian inference demonstrating its utility as an ideal machine-learning (ML) tool for material diagnostics. Employing only two input parameters (temperature gradient and external load resistance) and the observed power output, the Bayesian inference algorithm is able to extract thermoelectric parameters ranging from material-layer properties (Seebeck coefficient, electrical resistivity) to transport-layer characteristics (energy-dependent scattering parameter, band gap offset, etc.) as well as extrinsic contributions such as parasitic contact resistance. In addition, systematic error from measurement can also be identified and corrected. This is made possible since we have devised an experimental setup developed in-house that can generate temperature-dependent data for ML-enabled characterization. Hence, we are able to predict band and transport descriptors that can be measured directly using experiments for the first time. While these reveal the complex dynamics of scattering in thermoelectric materials, we envision that they will also provide universal screening criteria for high performance thermoelectrics in the near future.
8:00 PM - GI01.05.25
Accelerating Molecular Dynamics with On-the-Fly Machine Learning
Jonathan Vandermause1,Steven Torrisi1,Simon Batzner2,Alexie Kolpak2,Boris Kozinsky1
Harvard University1,Massachusetts Institute of Technology2
Show AbstractAb initio molecular dynamics (MD) is a powerful tool for accurately probing the dynamics of molecules and solids, but it is limited to system sizes on the order of 1000 atoms and time scales on the order of 10 ps. We present a scheme for rapidly training a machine learning (ML) model of the interatomic force field that approaches the accuracy of ab initio force calculations but can be applied to larger systems over longer time scales. Gaussian Process (GP) models are trained “on-the-fly”, with density-functional theory (DFT) calculations of the atomic forces performed whenever the model encounters atomic configurations sufficiently far outside of the training set. This active learning scheme includes a principled means of deciding when to run additional DFT calculations, accelerating the model's exploration of parameter space while reducing the time spent training the model. Furthermore, we demonstrate that additional ML models can be trained in parallel to predict other quantities of interest, including the ground state energy and charge density, making it possible to efficiently capture with ML the wealth of information provided by full DFT calculations. We demonstrate the flexibility of our approach by testing it on a range of single- and multi-component molecular and solid-state systems, including benzene, silicon, and silicon carbide.
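A minimal sketch of the on-the-fly idea described above: a Gaussian process force model is queried during the dynamics, and a reference calculation is triggered whenever the predictive uncertainty exceeds a threshold. The one-dimensional "DFT" force below is a hypothetical stand-in for a real electronic-structure call, and the GP settings are illustrative.

```python
# Illustrative sketch: active learning of a 1D force field during a toy MD run.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def dft_force(x):
    """Placeholder for an expensive ab initio force evaluation."""
    return -np.sin(x)

threshold = 0.05                     # maximum tolerated force uncertainty
X_train, F_train = [0.0], [dft_force(0.0)]
gp = GaussianProcessRegressor(kernel=RBF(0.5) + WhiteKernel(1e-4))
gp.fit(np.reshape(X_train, (-1, 1)), F_train)

x, v, dt, n_dft = 0.1, 1.2, 0.05, 1
for step in range(500):
    f_pred, f_std = gp.predict([[x]], return_std=True)
    if f_std[0] > threshold:         # model is unsure: call "DFT" and retrain on-the-fly
        f = dft_force(x)
        X_train.append(x)
        F_train.append(f)
        gp.fit(np.reshape(X_train, (-1, 1)), F_train)
        n_dft += 1
    else:
        f = f_pred[0]
    v += f * dt                      # simple symplectic Euler update (unit mass)
    x += v * dt

print(f"{n_dft} reference force calls made over 500 MD steps")
```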
8:00 PM - GI01.05.26
Prediction of Repeat Unit of Optimal Polymer by Bayesian Optimization
Takuya Minami1,2,Masaaki Kawata3,Toshio Fujita1,2,Katsumi Murofushi1,Hiroshi Uchida1,Kazuhiro Omori1,Yoshishige Okuno1
Showa Denko K. K.1, Research Association of High-Throughput Design and Development for Advanced Functional Materials2,National Institute of Advanced Industrial Science and Technology3
Show AbstractIn recent years, an inverse analysis that can predict materials with desired physical properties has attracted attention as a method that enables rapid design of materials [1].
Bayesian optimization is one of the approaches that can efficiently design optimal materials [2]. For example, numerous successful studies have been reported on Bayesian optimization in inorganic materials design [3] and automated laboratory robots [4]. However, although it is important for industry, there are only a few reports on Bayesian optimization in polymer design. In this study, to confirm the effectiveness of Bayesian optimization for polymer design, we performed a case study on the prediction of the repeat unit of a polymer with an optimal glass transition temperature. To evaluate the number of trial-and-error iterations needed to achieve the optimal property, a trial-and-error test was conducted using a known dataset. Here, the prediction model was constructed from part of the prepared dataset, and the optimal polymer was searched for within the rest of the dataset. The prediction model was built from the features of the structural formulas [5] and the glass transition temperatures of the polymers, using Gaussian process regression. As a result, we found that Bayesian optimization could significantly reduce the number of trial-and-error iterations in the prediction of the polymer repeat unit, compared to random search.
[1] A. Agrawal and A. Choudhary, APL Materials, 4, 053208 (2016).
[2] J. Snoek, et. al., Advances in Neural Information Processing Systems, 25, 2951 (2012).
[3] A. Seko, et. al., Phys. Rev. Lett. 115, 205901 (2015).
[4] D. P. Tabor, et. al., Nature Reviews Materials, 3, 5 (2018).
[5] T. Minami, Y. Okuno, MRS Advances. DOI: 10.1557/adv.2018.454.
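A minimal sketch of Bayesian optimization over a discrete pool of candidate repeat units, in the spirit of the study above: a Gaussian process surrogate trained on already-"measured" polymers proposes which candidate to evaluate next via an upper-confidence-bound rule. The fingerprints and Tg values are random placeholders, not the authors' dataset, and UCB is used here as one simple acquisition choice.

```python
# Illustrative sketch: pick the next polymer to "measure" from a fixed candidate pool.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(9)
fingerprints = rng.random((500, 30))                               # placeholder structural features
tg_true = 300 + 80 * fingerprints[:, 0] - 40 * fingerprints[:, 1]  # placeholder Tg (K)

measured = list(rng.choice(500, size=10, replace=False))           # initially known polymers
for _ in range(30):
    gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1.0),
                                  normalize_y=True).fit(
        fingerprints[measured], tg_true[measured])
    mu, sigma = gp.predict(fingerprints, return_std=True)

    ucb = mu + 2.0 * sigma                                  # optimistic estimate of Tg
    ucb[measured] = -np.inf                                 # do not re-measure
    nxt = int(np.argmax(ucb))
    measured.append(nxt)                                    # "synthesize and measure" it

best = max(measured, key=lambda i: tg_true[i])
print("best Tg found:", tg_true[best], "after", len(measured), "measurements")
```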
8:00 PM - GI01.05.28
Realizing Real-Time Crystallographic and Materials-Based Analysis Using Deep Learning Tools
Jeffery Aguiar1,2,Matthew Gong2,1,Tolga Tasdizen2
Idaho National Laboratory1,The University of Utah2
Show AbstractExtending from the micron to the atomic scale transmission electron microscopy is a powerful research tool for structural and chemical analysis for materials research. The breadth of data collected simultaneously in the latest generation of scanning transmission electron microscopes presents challenges and opportunities for advancements in microscopy, multi-modal data analytics, image-based forensics, and materials research. Recent advancements in deep learning have made it possible to analyze these massive data sets and perform complex imaging tasks. However, deep learning and augmented analysis have not yet disrupted the microscopy and microanalysis community like they have the computer vision community. Breakthroughs in automating and augmenting microscopy data collection and analysis could more than halve research cycle times in fields that rely on microscopy including materials and biological research.
The goal of recent technological developments and research is to create a suite of tools that expand the real-time analytic capabilities of microscopy as well as post-hoc analysis for diffraction-based tools. By applying cutting-edge deep learning, computer vision, and signal processing techniques, our team aims to make real-time event tracking and automation of imaging, diffraction, and spectroscopy acquisition a reality. This suite of computational tools and analytical packages is being developed in collaboration with commercial partners, national laboratories, and universities. The software has now been publicly released and is designed to draw from standard materials libraries, including the Materials Project database and the Crystallography Open Database, but it can also be further enhanced by research and experimental data from the greater materials community.
In this late-breaking poster, we will present our developments and look to the community to further evaluate our emerging real-time augmented feedback framework for materials prediction. This includes pending developments that utilize hybridized first-principles and deep learning models for augmented analysis of material properties, spectroscopy, and diffraction patterns. We further look forward to discussing the growing potential of automating data collection from materials-centric data feeds for real-time event tracking and, potentially, prediction.
8:00 PM - GI01.05.29
Application of Deep Learning Methods to the Analysis of Mass Spectra in Atom Probe Tomography
Scott Broderick1,Arpan Mukherjee1,Krishna Rajan1
University at Buffalo, The State University of New York1
Show AbstractWe present the use of deep learning methods to develop automated methods for the deconvolution of time of flight spectra in atom probe tomography. This work describes the nature of the algorithms underlying the deep learning methods. This new automated process of labeling of atoms replaces the manual process of ranging and shows high accuracy.
8:00 PM - GI01.05.30
Statistical Learning and Prediction of Electronic Transport in Multilayered Non-Ideal Semiconductor Architectures
Sanghamitra Neogi1,Artem Pimachev1
University of Colorado Boulder1
Show AbstractComputing components are being aggressively inserted into electronic, optical, sensing, robotic, bio-system and energy transport devices to perform multitudes of data-centric operations at high rates. Modern fabrication techniques have reached the quantum confinement regime, making it possible to model the electronic properties of device components with first-principles calculations. The contact interfaces between these components dictate device performance, especially as device dimensions approach the nanoscale. These interfaces are often marked by point defects, dislocations and additional strains due to lattice mismatch between the components. Ab initio methods become prohibitively expensive for predicting the electronic properties of integrated architectures with such a large number of compositional and configurational degrees of freedom. In recent years, there has been a large effort in the materials science community to employ data-driven methods to accelerate materials discovery or to develop new understanding of materials behavior. However, efforts employing first-principles-based data-driven methods to predict device performance while incorporating processing variability are almost non-existent.
In this study, we employ machine learning (ML) algorithms to predict the electronic structure and transport properties of non-ideally fabricated multilayered thin-film Si/Ge nanostructures. The ML model is trained on ~200 inexpensive DFT calculations of SixGe1-x substitutional alloys: the training data set is populated by exploiting the relationship between geometrical features, or local atomic environments, in these systems and their electronic structure properties. The predictor variables are obtained with a Voronoi tessellation approach, and the response variables are predicted with a decision-tree regression algorithm [1]. This approach has successfully predicted formation energies to expedite materials discovery [2]. Our ML model, trained on random alloys, has shown remarkable ability to predict electronic band structures and Onsager electronic transport coefficients of large non-ideal thin-film Si/Ge superlattices. We show the predictive power of our model by comparing the predicted band structures, learned from small 16-atom alloy unit cells, with the electronic states of large Si/Ge superlattices unfolded to 4x4 monolayer superlattice Brillouin zones [3]. The ML framework has been especially effective in capturing crucial trends in electronic properties for a range of multilayered structures. Our ML framework will facilitate the development of an inverse design approach to engineer interface profiles of integrated semiconductor architectures, to accomplish desired device performance and functionalities.
[1] L. Breiman, Mach. Learn. 45, 5 (2001)
[2] L. Ward, R. Liu, A. Krishna, V. I. Hegde, A. Agrawal, A. Choudhary, and C. Wolverton, Phys. Rev. B 96, 024104 (2017)
[3] V. Popescu and A. Zunger, Phys. Rev. B 85, 085201 (2012)
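The abstract above maps local geometric features to electronic properties with a tree-based regressor. Below is a hedged sketch of that feature-to-property pattern: Voronoi-cell statistics computed with scipy feed a random forest. The coordinates and the target values are random placeholders, not the Si/Ge training data, and the specific features are illustrative assumptions.

```python
"""Hedged sketch of the workflow described above: geometric features of local
atomic environments (simple Voronoi-cell statistics) are mapped to a target
property with a tree-ensemble regressor. All data are synthetic placeholders."""
import numpy as np
from scipy.spatial import Voronoi
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)

def voronoi_features(points):
    """Per-structure features: statistics of Voronoi neighbor counts."""
    vor = Voronoi(points)
    counts = np.zeros(len(points))
    for a, b in vor.ridge_points:          # each ridge separates two neighboring sites
        counts[a] += 1
        counts[b] += 1
    return [counts.mean(), counts.std(), counts.max(), counts.min()]

# Toy dataset: 150 "structures" of 40 sites each with a fake band-gap-like target.
X, y = [], []
for _ in range(150):
    feats = voronoi_features(rng.random((40, 3)))
    X.append(feats)
    y.append(0.5 * feats[0] - 0.2 * feats[1] + rng.normal(scale=0.05))
X, y = np.array(X), np.array(y)

model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
print("CV RMSE:", -scores.mean())
```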
8:00 PM - GI01.05.31
Machine Learning for Perovskite Solar Cells
John Howard1,Marina Leite1
University of Maryland1
Show AbstractPerovskite solar cells have recently reached power conversion efficiencies > 22%, representing a promising option for high-performance and low-cost photovoltaic (PV) devices [1]. Yet, their performance is dynamic, varying as a function of time under intrinsic (bias, temperature, and light) and extrinsic (water and oxygen) parameters [2,3]. Because the influence of each stressor on device response depends on the perovskite chemical composition (>9000 options with potential application in PV) and on the order of exposure and the value range of each parameter, an extremely large number of possible combinations is expected. Therefore, we propose a machine learning (ML) paradigm based on an artificial neural network (ANN) to determine the optimal conditions for perovskite operation, including their ‘reap’, ‘rest’ and ‘recovery’ phases [3]. ‘Reap’ is required for harvesting energy (PV device operation), while both the ‘rest’ and ‘recovery’ stages are needed to ensure that the changes in the optical and electrical responses are reversible, preventing material/device degradation and maximizing long-term performance [4]. The successful implementation of the ANN routine for ultimate unsupervised learning relies on our suggested ‘knowledge-shared’ tactic, where researchers from academia, national laboratories and companies gather positive and negative experimental results in a shared data repository. We highlight that while most ML efforts applied to perovskites focus on screening thermodynamically stable options, our artificial intelligence approach targets the dynamic response of this promising class of materials, which has never been exploited before. Our ML strategy will accelerate the monitoring and control of device performance recovery, paving the way for the fast development of stable perovskites and their commercialization.
[1] E. M. Tennyson et al., ACS Energy Letters 2, 1825 (2017). Invited Perspective
[2] J. M. Howard et al., J. Phys. Chem. Letters 9, 3463 (2018)
[3] J. L. Garrett et al., Nano Letters 17, 2554 (2017)
[4] J. M. Howard et al., Submitted (2018)
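To make the proposed ANN paradigm concrete, here is an illustrative-only sketch in which stressor conditions map to a performance-recovery metric via a small neural network. The architecture, the stressor ranges, and all data are synthetic assumptions, not the authors' model or the shared-repository data the abstract envisions.

```python
"""Illustrative sketch of an ANN relating stressor conditions to a recovery
metric, in the spirit of the abstract above. Everything here is synthetic."""
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Columns: bias (V), temperature (C), light (suns), relative humidity (%), rest time (h)
X = rng.uniform([0.0, 20.0, 0.0, 10.0, 0.0], [1.2, 85.0, 1.0, 80.0, 12.0], size=(2000, 5))
# Fake "recovered efficiency fraction": hurt by humidity/temperature, helped by rest time.
y = (1.0 - 0.004 * X[:, 3] - 0.002 * (X[:, 1] - 20) + 0.02 * X[:, 4]
     + rng.normal(scale=0.02, size=2000))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)

ann = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
ann.fit(scaler.transform(X_tr), y_tr)
print("Held-out R^2:", ann.score(scaler.transform(X_te), y_te))
```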
GI01.06: Machine Learning for Imaging, Characterization and Inverse Problems
Session Chairs
Aldair Gongora
Noa Marom
Olga Wodo
Wednesday AM, November 28, 2018
Hynes, Level 1, Room 110
8:30 AM - GI01.06.01
Helium Ion Microscopy for Imaging and Quantifying Porosity at the Nanoscale
Alex Belianinov1,Matthew Burch1,Kyle Mahady1,Holland Hysmith1,Philip Rack1,Olga Ovchinnikova1
Oak Ridge National Laboratory1
Show AbstractNanoporous materials play a key role as components in a vast number of applications, from energy to drug delivery and agriculture. However, comprehensive analytical approaches to measure and quantify salient features, e.g., surface structure, pore shape, and pore size, remain limited or prohibitively expensive. The most common approach is gas adsorption, where volumetric gas adsorption and desorption are measured. The gas adsorption approach has a few fundamental drawbacks, such as low sample throughput and a lack of direct surface visualization. In this work, we demonstrate Helium Ion Microscopy (HIM) for imaging and quantification of pores in industrially relevant SiO2 catalyst supports. We start with the fundamental principles of ion-sample interaction and expand to experiment, where we observe and quantify pores on the surface of the catalyst support using the HIM and image data analytics. We contrast our experimental results with gas adsorption and demonstrate full statistical agreement between the two techniques. The principles behind the theoretical, experimental, and analytical framework presented herein offer an automated framework for visualization and quantification of pore structures in a wide variety of materials, and offer data processing solutions to automate these types of imaging workflows.
Acknowledgements
The HIM imaging, image analytics, and simulations portion of this research was conducted at the Center for Nanophase Materials Sciences, which is a DOE Office of Science User Facility. This research was funded by the Center for Nanophase Materials Sciences, which is a U.S. Department of Energy Office of Science User Facility (H.H., A.V.I., P.D.R., O.S.O.), part of the data analytics work was supported by the Laboratory Directed Research and Development Program (A.B.), and the ExxonMobil Chemical Company (M.J.B.). The authors acknowledge Robert Colby, David Abmayr, Sergey Yakovlev, Lubin Luo, and Bill Lamberti from the Exxon Mobil Corporation for much appreciated input and helpful discussions.
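A minimal sketch of the image-based pore quantification step described above, using scikit-image on a synthetic grayscale micrograph. The Otsu threshold and the synthetic image are assumptions for illustration; the actual HIM analytics pipeline is not reproduced here.

```python
"""Sketch of pore detection and quantification from a micrograph-like image."""
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops

rng = np.random.default_rng(4)
# Synthetic "micrograph": bright matrix with dark circular pores plus noise.
img = np.full((256, 256), 0.8)
yy, xx = np.mgrid[0:256, 0:256]
for _ in range(40):
    cy, cx = rng.integers(10, 246, size=2)
    r = rng.integers(3, 12)
    img[(yy - cy) ** 2 + (xx - cx) ** 2 < r ** 2] = 0.2
img += rng.normal(scale=0.03, size=img.shape)

# Pores are dark: threshold so that pore pixels are True, then label connected regions.
pores = img < threshold_otsu(img)
props = regionprops(label(pores))

areas = np.array([p.area for p in props])
diam = np.array([p.equivalent_diameter for p in props])
print(f"Detected {len(props)} pores")
print(f"Porosity (area fraction): {pores.mean():.3f}")
print(f"Mean equivalent diameter: {diam.mean():.1f} px, median area: {np.median(areas):.0f} px^2")
```

Histograms of the detected areas and diameters would give the pore-size statistics that the abstract compares against gas adsorption.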
8:45 AM - *GI01.06.02
Thoughts on Automated Electron Microscopy
Mark Ruemmeli1,2
SIEMIS1,CMPW2
Show AbstractThe importance of materials in society cannot be overstated. Indeed, it is often said that our ability to use materials has shaped civilizations throughout human history, and this is even reflected in how we name historical ages, for example the Stone Age and the Bronze Age. As our scientific understanding has developed, so has our knowledge base on materials and the ways to use them. We are now at an age where we are beginning to deal with materials in terms of their processing and properties at the atomic scale. At the same time, there is an important drive to discover complex materials and to manipulate their properties with ever greater efficiency. This is leading to an immense drive for instrument automation and data interpretation, and for the development of new ways to do this with both existing and novel instruments and equipment.
In our quest to achieve these goals, particularly at the atomic level, it is clear that one of the more important techniques will be microscopy, with electron-microscopy-based techniques probably playing key roles. The automation and interpretation of data from electron microscopy is enormously challenging. That said, approaches toward these goals are being developed and are expected to grow rapidly in the coming decades.
In this presentation, some of the advances in automation and data interpretation are presented. Numerous challenges in achieving automation, data interpretation, sample preparation and measurement are discussed. The importance of this for new materials discovery (viz. synthesis, including combinatorial materials science, and structure-property relationships at the atomic scale) is also discussed, and early ideas and approaches, such as the development of in situ/in operando electron microscopy, are presented.
9:15 AM - GI01.06.03
Real-Time Tomography with Interactive 3D Visualization Using tomviz
Robert Hovden1,Jonathan Schwartz1,Chris Harris2,Cory Quammen2,Shawn Waldon2,Yi Jiang3,Peter Ercius4,Marcus Hanwell2
University of Michigan1,Kitware Inc.2,Argonne National Laboratory3,Lawrence Berkeley National Laboratory4
Show AbstractThree-dimensional (3D) characterization at the nano- and meso-scale using the scanning/transmission electron microscope (S/TEM) is now possible [1,2], but high-throughput tomography still requires innovative tools for reconstruction and visualization of large datasets. Currently, the best tomographic reconstructions are obtained from algorithms that are slow and iterative and will run from hours to days depending upon the size of the data set and the algorithm(s) employed. Thus, it has been a longstanding desire to see a reconstruction and begin 3D analysis before it completes. Continuous feedback provides high throughput, early diagnosis of 3D structure, and the opportunity to optimize experimental parameters for maximal reconstruction quality.
Here we demonstrate interactive 3D visualization displayed in real time as tomographic reconstructions proceed and as new data arrives using the open-source tool, tomviz. In the actual software, the 3D visualizations are dynamically updated throughout the computation. This means that scientists need not wait for a reconstruction to complete, or all data to be collected before beginning the interpretation of results. The iterative nature of tomographic methods allows tomviz to show intermediate results with minimal impact on performance. This enables interactive 3D analysis of the current reconstruction state while the reconstruction proceeds on a separate thread. A robust graphical interface allows objects to be rendered as shaded contours or volumetric projections and these objects can be rotated, cropped, or sliced[3]. Experimental nanomaterial datasets were made public and used to validate live reconstruction[4].
Additionally, tomviz visualizes new data as it arrives. As aligned projections are provided from each new specimen tilt, the tomographic reconstruction quality improves in real time. tomviz accomplishes this by monitoring data directories for changes; upon arrival of new data, the data is imported and all associated 3D data visualizations are dynamically updated. If data processing routines are present in the pipeline—such as alignment, preprocessing, and reconstruction—all steps will automatically rerun. This capability opens radically new possibilities for developing high-throughput, real-time tomographic reconstruction algorithms. Ultimately, interactive real-time visualization will allow researchers to make early judgments to best answer scientific questions or guide experimental design.
tomviz is publicly available for download at www.tomviz.org
[1] D. De Rosier and A. Klug, Nature 217, 130-134 (1968)
[2] P.A. Midgley, et al., Chemical Communications, 10, 907-908 (2001)
[3] B. D.A. Leven, Y. Jiang, E. Padgett, S. Waldon, et al. Microscopy Today, 1, 12-16 (2018)
[4] E. Padgett, R. Hovden, J. C. DaSilva, et al, Microscopy and Microanalysis 23, 1150-1158 (2017)
[5] tomviz is supported from DOE Office of Science contract DE-SC0011385.
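To illustrate why iterative reconstruction lends itself to live inspection, the sketch below runs a few SART passes on the Shepp-Logan phantom with scikit-image and reports an intermediate result after each pass. This is a generic demonstration of the idea, not the tomviz pipeline; the angle set and iteration count are assumptions.

```python
"""Sketch of incremental tomographic reconstruction with intermediate results."""
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon_sart, rescale

phantom = rescale(shepp_logan_phantom(), 0.4)           # small test object
theta = np.linspace(0.0, 180.0, 60, endpoint=False)     # tilt angles
sinogram = radon(phantom, theta=theta)                  # stand-in for measured projections

reconstruction = None
for it in range(5):
    # Each SART pass refines the previous estimate; an interactive viewer
    # could render `reconstruction` after every pass instead of waiting.
    reconstruction = iradon_sart(sinogram, theta=theta, image=reconstruction)
    err = np.sqrt(np.mean((reconstruction - phantom) ** 2))
    print(f"iteration {it + 1}: RMS error vs. phantom = {err:.4f}")
```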
9:30 AM - *GI01.06.04
Data Science Solution to Unlock Information from Dynamic Nanoscale Imaging
Yu Ding1
Texas A&M University1
Show AbstractDynamic imaging instruments are transforming the landscape of manufacturing. They impart an unprecedented capability to directly observe and control the structuring of matter down to the atomistic scale. A critical bottleneck is the lack of data science frameworks capable of converting this high-volume, high-velocity, and complex dynamic data into machine-intelligible information. The National Nanotechnology Initiative identifies the same issue as one major technical roadblock impeding the design and discovery of new nanoscale materials, namely that "existing methods are time-consuming, expensive, and require high-tech infrastructure and high skill levels to perform." What is pressingly needed is to establish a mathematical and statistical foundation, to formulate relevant data science problems, and to devise machine learning solutions for enabling reliable and automated processing of dynamic image data, leading to in-process control.
In this talk, we will discuss some recent efforts in formulating dynamic imaging as a foundational data science problem, which is to model, estimate, track, and update online a time-varying, nonparametric probability density function. Materials scientists care more about collective changes in the distribution of nano objects, because a collective change can be connected with the governing dynamics based on foundational physical principles, whereas the change exhibited by any single nano object may not be representative. The collective changes observed by dynamic imaging instruments are expected to be reflected in this time-varying, nonparametric distribution function. Both retrospective and prospective analyses have been studied in our effort. The retrospective analysis centers on off-line video data, to signal possible change points delineating the stages of growth; doing so produces strong clues about where to concentrate one’s effort to understand the highly stochastic material evolution processes and enables discovery of the unexpected. The prospective analysis is to develop a dynamic and forward-looking model that can track the growth trajectory of a material characteristic, anticipate an upcoming change, and design specific interventions to steer the course of the material production towards the designed target through “on the fly” changes of process variables. These data science methods, when successfully developed and validated, should benefit materials scientists and manufacturing engineers in their practice of discovery, design, and control.
10:30 AM - GI01.06.05
Machine Learning Enhanced Pair Distribution Function (PDF) Analysis for Local Structure Discovery
Simon Billinge1,2,Chia-Hao Liu1,Yuanzhe Tao1,Ji Xu1,Ran Gu1,Qiang Du1,Daniel Hsu1
Columbia University1,Brookhaven National Laboratory2
Show AbstractDefects, nanoscale structures, interfaces, surfaces, and multi-scale heterogeneities in materials have important impacts on materials properties but are difficult to study. Pair Distribution Function (PDF) analysis of x-ray and neutron powder diffraction data is a powerful approach for studying such non-periodic structural signals. However, the degraded information obtained in experiments on defective materials introduces challenges in both modeling and experimental interpretation during PDF analysis. From the modeling aspect, we need a way of generating models that capture signals emerging from the imperfections, and from the experimental aspect, a systematic approach to quantify the information content encapsulated in the measured signals is needed. In this presentation, we will introduce a machine-learning-aided approach for generating candidate structures used in PDF analysis, and we will also propose a statistical approach for quantifying uncertainties of information extracted from PDF analysis.
10:45 AM - GI01.06.06
Automated Materials Classification from Spectroscopy/Diffraction Through Deep Neural Networks
Nicola Ferralis1
Massachusetts Institute of Technology1
Show AbstractSpectroscopy and diffraction techniques are essential for understanding the structural, chemical and functional properties of complex and heterogeneous materials systems. Beyond data collection, quantitative insight relies on experimentally assembled or computationally derived spectra. Inference on the chemical or physical properties (such as crystallographic order, chemical functionality, etc.) of a heterogeneous material (for example, inorganic vs organic phases, polytypes, etc.) is based on fitting unknown spectra and comparing the fit with consolidated databases. The complexity of fitting highly convoluted spectra often limits the ability to infer chemical characteristics, and limits the throughput for extensive datasets. A particularly complex example is the identification of phases and facies in both natural and synthetic mineral systems. With the emergence of heuristic approaches to pattern recognition through machine learning, in this work we investigate the possibility and potential of using supervised deep neural networks, trained on available public spectroscopic databases, to directly infer and classify materials phases and inferred properties from unknown spectra. Using Raman spectra of minerals from publicly available databases, we train neural network models to classify mineral and organic compounds (pure or mixtures). As expected, the accuracy of the inference is strongly dependent on the quality and extent of the training data. We will identify a series of requirements and guidelines for the training dataset needed to achieve consistently high-accuracy inference, along with methods to compensate for limited data. The flexibility of the approach and the tools developed can be extended beyond Raman spectra, to be applied to any spectroscopy or diffraction technique.
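A hedged sketch of the kind of network the abstract describes: a small 1D convolutional classifier trained on spectra. The synthetic Raman-like spectra, peak positions, and architecture are assumptions for illustration, not the authors' trained models or the public mineral databases.

```python
"""Sketch of a 1D convolutional classifier for spectra (synthetic data only)."""
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(5)
N_BINS, N_CLASSES = 256, 4
centers = rng.uniform(40, 216, size=(N_CLASSES, 3))   # characteristic peak positions per class

def make_spectrum(cls):
    x = np.arange(N_BINS)
    spec = sum(np.exp(-0.5 * ((x - c) / 4.0) ** 2) for c in centers[cls])
    return spec + rng.normal(scale=0.05, size=N_BINS)

X = np.stack([make_spectrum(i % N_CLASSES) for i in range(2000)])
y = np.arange(2000) % N_CLASSES

model = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
    nn.Conv1d(8, 16, kernel_size=9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
    nn.Flatten(), nn.Linear(16 * (N_BINS // 16), N_CLASSES),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

Xt = torch.tensor(X, dtype=torch.float32).unsqueeze(1)   # (batch, channel, bins)
yt = torch.tensor(y, dtype=torch.long)
for epoch in range(20):
    opt.zero_grad()
    loss = loss_fn(model(Xt), yt)
    loss.backward()
    opt.step()

accuracy = (model(Xt).argmax(dim=1) == yt).float().mean().item()
print(f"training accuracy: {accuracy:.2f}")
```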
11:00 AM - GI01.06.07
Machine Learning Clustering Technique Applied to X-Ray Diffraction Patterns to Distinguish Alloy Substitutions
Ryo Maezono2,Rutchapon Hunkao1,Keishu Utimula2,Masao Yano3,Hiroyuki Kimoto3,Shogo Kawaguchi4,Sujin Suwanna1,Kenta Hongo2
Mahidol University1,JAIST2,Toyota Motor Corporation3,JASRI4
Show AbstractSmFe12 is one of the candidate main phases for rare-earth permanent magnets [1]. The origin of the intrinsic properties emerging at high temperature, as well as that of the phase stability, has not yet been clarified well. Introducing Ti and Zr to substitute for Fe and Sm is found to improve the magnetic properties and the phase stability. To clarify the mechanism by which the substitutions improve these properties, it is desirable to identify the substituted sites and their amounts quantitatively, preferably with high-throughput efficiency to accelerate the 'materials tuning'. Motivated by the above, we have developed a machine learning clustering technique to distinguish powder XRD patterns and obtain such microscopic identifications of the atomic substitutions.
Ab initio calculations are used to generate supervising references for the machine learning of XRD patterns: we prepared several possible model structures with substituents located on different sites over a range of substitution fractions. Geometrical optimization of each model gives slightly different structures from one another. We then generated many XRD patterns calculated from each structure. We found that DTW (dynamic time warping) analysis can capture the slight shifts in XRD peak positions corresponding to the differences between the relaxed structures, distinguishing the fractions and positions of substituents. We have established a clustering technique using Ward's analysis on top of the DTW, capable of sorting simulated XRD patterns based on this distinction.
The established technique can hence learn the correspondence between XRD peak shifts and microscopic substituted structures from many simulated reference data. Since the ab initio simulations can also give several properties, such as the magnetization of each structure, the learned correspondence can further predict functional properties of materials when it is applied to experimental XRD patterns, beyond distinguishing the atomic substitutions. The 'machine learning technique for XRD patterns' developed here therefore has a wide range of applications, not limited to magnets, but extending to those materials whose properties are tuned by atomic substitutions.
[1] K. Kobayashi et al., J. Magn. Magn. Mater. 426, 273 (2017).
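The sketch below illustrates the DTW-plus-Ward-clustering idea described above on synthetic XRD-like patterns whose peaks are slightly shifted to mimic different substitution models. The DTW here is a plain textbook dynamic program, and the patterns and cluster count are assumptions; the authors' data and full workflow are not reproduced.

```python
"""Sketch of dynamic-time-warping distances plus hierarchical clustering."""
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def dtw_distance(a, b):
    """Classic O(n*m) dynamic-time-warping distance between two 1D signals."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(6)
x = np.linspace(0, 1, 200)

def pattern(shift):
    # Two Gaussian "Bragg peaks" whose positions shift with the substitution model.
    return (np.exp(-0.5 * ((x - 0.3 - shift) / 0.01) ** 2)
            + 0.6 * np.exp(-0.5 * ((x - 0.7 - 2 * shift) / 0.01) ** 2))

shifts = [0.000, 0.000, 0.012, 0.012, 0.025, 0.025]   # two patterns per simulated structure type
patterns = [pattern(s) + rng.normal(scale=0.01, size=x.size) for s in shifts]

n = len(patterns)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw_distance(patterns[i], patterns[j])

# Ward linkage on the DTW distances (note: Ward formally assumes Euclidean distances).
Z = linkage(squareform(dist), method="ward")
print("cluster labels:", fcluster(Z, t=3, criterion="maxclust"))
```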
11:15 AM - GI01.06.08
Deep Learning for Inverse Imaging Problems
Rama Vasudevan1,Nouamane Laanait1,Ondrej Dyck1,Maxim Ziatdinov1,Stephen Jesse1,Andrew Lupini1,Junqi Yin1,Mark Oxley1,Sergei Kalinin1
Oak Ridge National Laboratory1
Show AbstractInverse problems in imaging constitute some of the most interesting and challenging tasks, with examples ranging from tomographic reconstruction to protein structure determination. Here, we present an approach to tackling the inverse problem in imaging via large-scale forward modeling of many physically realizable examples, and then using these simulations to train deep neural networks to learn the inverse mapping. We show this method with two test cases: in the first, we introduce a deep convolutional neural network trained on simulated 2D diffraction patterns to perform a classification task, sorting each diffraction pattern into one of the five possible Bravais lattice types. The DCNN is tested on experimental imaging data from both scanning transmission electron microscopy and scanning tunneling microscopy, and is found to be robust, providing advantages over more traditional thresholding-based methods, and is largely insensitive to both scale and rotations of the lattice. By using Monte-Carlo dropout on the predictions, a confidence estimate is obtained for each symmetry classification.
In the second example, we train a 3D convolutional neural network on hundreds of gigabytes of simulated convergent beam electron diffraction patterns of the LaAlO3-SrTiO3 interface. The network is trained on simulated data to classify whether the interface is diffuse or stepped in nature and shows extremely high accuracy on a simulated validation set. The approach outlined here is general to any situation where forward simulations exist, but wherein the inverse is difficult to obtain using traditional means.
The work was supported by the U.S. Department of Energy, Office of Science, Materials Sciences and Engineering Division. Research was conducted at the Center for Nanophase Materials Sciences, which also provided support and is a US DOE Office of Science User Facility. This research used resources of the Compute and Data Environment for Science (CADES) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
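A generic sketch of the Monte-Carlo-dropout confidence estimation mentioned above: dropout is kept active at inference, and the spread of repeated stochastic forward passes gives a per-prediction confidence. The tiny network and random "diffraction" inputs are placeholders, not the authors' DCNN or simulated training data.

```python
"""Sketch of Monte-Carlo-dropout confidence estimation for a classifier."""
import torch
import torch.nn as nn

N_CLASSES = 5   # e.g., the five 2D Bravais lattice types

model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(p=0.5),               # kept active at inference for MC dropout
    nn.Linear(8 * 16 * 16, N_CLASSES),
)

def mc_dropout_predict(model, x, n_samples=50):
    """Repeated stochastic forward passes: return mean probabilities and their spread."""
    model.train()                    # keep dropout layers stochastic
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(n_samples)])
    return probs.mean(dim=0), probs.std(dim=0)

x = torch.randn(4, 1, 32, 32)        # four fake diffraction patterns
mean_p, std_p = mc_dropout_predict(model, x)
pred = mean_p.argmax(dim=1)
for i in range(4):
    print(f"pattern {i}: class {pred[i].item()}, "
          f"p = {mean_p[i, pred[i]].item():.2f} +/- {std_p[i, pred[i]].item():.2f}")
```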
11:30 AM - GI01.06.09
Accelerating Materials Development Through Rapid Analysis of Experimental Data Using Machine Learning Based Tools
Marcus Schwarting1,Caleb Phillips1,Andriy Zakutayev1,Robert White1,Kristin Munch1,Magali Ferrandon2,Deborah Myers2,John Perkins1
National Renewable Energy Laboratory1,Argonne National Laboratory2
Show AbstractEffective use of high-throughput experiments (HTE), as envisioned in the Materials Genome Initiative (MGI), requires not only rapid experiments but also the ability to efficiently transform the resultant large data sets into usable knowledge. To address this issue, we are developing machine learning based tools to address the specific challenge of rapid analysis of large amounts of x-ray diffraction (XRD) data, including factoring and clustering analysis along with custom data visualizations. These capabilities will be demonstrated by application to both combinatorially grown Co-Zn-Ni-O thin film and bulk powdered Fe-N-C catalyst materials. The composition-gradient thin-film libraries were grown on glass substrates using off-axis co-sputtering, and the powdered Fe-N-C materials were synthesized by pyrolysis of Fe-substituted ZIFs. XRD patterns were measured using a commercial (Bruker D-8 Discover) diffractometer equipped with either a 2D detector (thin film libraries) or a point detector (bulk powder samples). The developed tools are built in Python and leverage the machine learning algorithms available in scikit-learn (scikit-learn.org). After testing a variety of algorithms, we are currently using Orthogonal Matching Pursuit (OMP) for factoring analysis and Spectral Clustering for clustering analysis. Reference XRD patterns for factoring are pulled from the ICSD for known materials, as well as from MatDB (materials.nrel.gov) and the Materials Project (materialsproject.org) for unknown or unmeasured materials. Experimental data can be pulled from HTEM-DB (htem.nrel.gov), a recently launched public-facing database of experimental synthesis and property data for inorganic thin film materials with more than 50,000 entries. Select data visualization and machine learning based tools have been implemented in unmix_xrd, our custom Python package, which combines the analysis and visualization into simple single-line command calls to facilitate use. For deployment, we are testing locally hosted and cloud-hosted Jupyter notebooks, with the latter aimed at creating free-standing analysis tools intended for deployment on the Energy Materials Network (EMN), such as via the ElectroCat datahub (datahub.electrocat.org). In addition, this analytics package can be run from within any program capable of issuing a system command, which we will demonstrate using Igor Pro, a commercial analysis program widely used at NREL. The overall result is a scientist-friendly, extensible analysis environment with project-specific machine learning analysis and data visualizations.
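The abstract names two scikit-learn estimators; the hedged sketch below shows how they could be used on synthetic XRD-like data: OrthogonalMatchingPursuit expresses a measured pattern as a sparse combination of reference patterns, and SpectralClustering groups a composition library. The reference and sample patterns are synthetic placeholders, and this is not the unmix_xrd package itself.

```python
"""Sketch of OMP factoring and spectral clustering on synthetic XRD-like data."""
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(7)
x = np.linspace(0, 1, 300)

def peak(center, width=0.008):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

# Reference patterns (e.g., pulled from ICSD / Materials Project entries).
references = np.stack([peak(0.2) + peak(0.55), peak(0.35) + peak(0.8), peak(0.5) + peak(0.65)])

# Measured library: mixtures of phases 0+1 or 1+2, plus noise.
samples = []
for i in range(40):
    w = rng.uniform(0.3, 0.7)
    if i < 20:
        samples.append(w * references[0] + (1 - w) * references[1])
    else:
        samples.append(w * references[1] + (1 - w) * references[2])
samples = np.array(samples) + rng.normal(scale=0.01, size=(40, x.size))

# Factoring: sparse phase fractions for one sample.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=2)
omp.fit(references.T, samples[0])
print("estimated phase weights for sample 0:", np.round(omp.coef_, 2))

# Clustering: group the library into regions of similar phase content.
labels = SpectralClustering(n_clusters=2, random_state=0).fit_predict(samples)
print("cluster labels:", labels)
```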
11:45 AM - GI01.06.10
Learning Phase Diagrams from Local Measurements—A Statistical-Mechanical Autoencoder Approach
Rama Vasudevan1,Lukas Vlcek1,Sergei Kalinin1
Oak Ridge National Laboratory1
Show AbstractIn the traditional materials optimization paradigm, a material system is characterized by forming many different samples across different compositions and measuring their properties as a function of, e.g., temperature or pressure to produce a phase diagram. As such, each measurement represents a single point in chemical space, and spanning the phase diagram requires many samples and measurements, which are time-consuming and expensive. At the same time, small fluctuations in the samples are considered a hindrance and a limiting factor in the interpretation of the results. Here, we present an alternative approach that instead exploits this fact, leveraging statistical physics to take atomic observations of structural and chemical fluctuations and map them into generative lattice models that have predictive power. We use a newly developed statistical distance framework to perform model optimization based on imaging data, and then use the model to produce configurations for a range of compositions and temperatures. We apply the method to understand segregation in a FeSexTe1-x single crystal, attempting to identify the segregation tendency of the chalcogen atom, as well as divalent cation segregation in a manganite thin film. The generative model produces configurations that closely approximate those observed, and it is further run for different temperatures and compositions. A variational autoencoder is then applied to the simulated configurations to map the observations to a single latent parameter, allowing easy visualization of any anomalies in the phase diagram. In this way, a measurement of a single point in chemical space is turned into a prediction across a finite range of chemical space. This method is general and can be used to add to the current materials design paradigm, providing more information on microscopic driving forces and reducing the need to finely sample the chemical space.
The work was supported by the U.S. Department of Energy, Office of Science, Materials Sciences and Engineering Division (R. K. V., S. V. K., L.V). Research was conducted at the Center for Nanophase Materials Sciences, which is a DOE Office of Science User Facility.
GI01.07: High-Throughput Methods for Enumerating and Searching Combinatorial Spaces II
Session Chairs
John J. Boeckl
Keith A. Brown
Aldair Gongora
Wednesday PM, November 28, 2018
Hynes, Level 1, Room 110
1:30 PM - *GI01.07.01
Tools for Automated, High Throughput Exploration of Process-Structure-Property Relationships
Olga Wodo1
University at Buffalo1
Show AbstractThe microstructure of a material intimately affects the performance of a device made from this material. The microstructure, in turn, is affected by the processing pathway used to fabricate the device. This forms the process–structure–property triangle that is central to materials science. There has been growing interest in comprehensively understanding and subsequently exploiting process–structure–property (PSP) relationships to design processing pathways for optimal microstructures. However, unraveling process–structure–property relationships usually requires a systematic and tedious combinatorial search of process and system variables to identify the microstructures, while generating high-dimensional datasets. This is further complicated by the necessity to interrogate the properties of the huge set of corresponding microstructures. Motivated by this challenge, we focus on developing a generic methodology to establish and explore PSP pathways.
We leverage recent advances in cloud computing to execute high throughput exploration of PSP spaces. Our key idea is that PSP exploration can be naturally formulated in terms of a standard paradigm in cloud computing, namely the MapReduce programming model. We show how reformulating PSP exploration into a MapReduce workflow enables us to take advantage of advances in cloud computing while requiring minimal specialized knowledge of high performance computing.
To address the high-dimensionality challenge related to microstructure evolution during processing, we develop a novel nonlinear manifold learning methodology that overcomes the presence of strong temporal correlation among observations belonging to the same process pathways. Although the methodology is applicable to any dynamic process, in this talk I show how mapping data obtained through simulations onto a low-dimensional manifold facilitates better understanding of the underlying processes, and ultimately their optimization.
Both key ingredients of the general methodology will be illustrated through relevant questions in the area of organic electronics.
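A schematic sketch of the MapReduce-style formulation described above, using Python's multiprocessing as a stand-in for a cloud MapReduce service: the map step runs one (toy) processing simulation per parameter set, and the reduce step aggregates the resulting microstructure descriptors into a process-structure map. The "process model" and "domain size" descriptor are illustrative assumptions.

```python
"""MapReduce-style sketch of process-structure-property exploration."""
from multiprocessing import Pool
from itertools import product
import random

def simulate_microstructure(params):
    """Map step: run one (toy) processing simulation and return a descriptor."""
    temperature, blend_ratio = params
    random.seed(hash(params) % (2**32))
    # Fake "domain size" descriptor with some process dependence plus noise.
    domain_size = (10 + 0.05 * temperature
                   + 20 * blend_ratio * (1 - blend_ratio)
                   + random.gauss(0, 0.5))
    return params, domain_size

def reduce_results(results):
    """Reduce step: aggregate descriptors into a process-structure map."""
    return {params: round(size, 2) for params, size in results}

if __name__ == "__main__":
    process_space = list(product(range(300, 501, 50), [0.2, 0.4, 0.6, 0.8]))  # (T, blend ratio)
    with Pool(processes=4) as pool:
        mapped = pool.map(simulate_microstructure, process_space)             # embarrassingly parallel map
    psp_map = reduce_results(mapped)
    best = max(psp_map, key=psp_map.get)
    print(f"largest domain size {psp_map[best]} at T={best[0]} K, ratio={best[1]}")
```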
2:00 PM - GI01.07.02
Large Scale Data Mining of Coordination Environments in Oxides
Geoffroy Hautier1,David Waroquiers1,Xavier Gonze1,Gian-Marco Rignanese1,Catherin Welker-Nieuwoudt2,Frank Rosowski2,Stephan Schenk2,Peter Degelmann2,Robert Glaum3
University Catholique de Louvain1,BASF Corporation2,Bonn University3
Show AbstractCoordination environments (e.g., tetrahedra and octahedra) are powerful descriptors of the structure of a solid. Automatic and robust detection of these environments is an important step towards data mining of the large databases (experimental or theoretical) currently available to materials scientists. In this work, we present a tool to automatically determine coordination environments in a given structure. The identification is performed based solely on geometrical knowledge of the structure. Distortions are taken into account, and we allow the description of an environment as a mixture of several environments. After outlining our algorithm, we will illustrate the approach by presenting a statistical analysis of coordination environments for all oxides in the Inorganic Crystal Structure Database (ICSD). We will discuss the implications of our study for the understanding of crystal chemistry in oxides and outline how this tool can be used to accelerate the materials design process.
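A purely geometric sketch of the first step toward environment identification as described above: counting neighbors within a cutoff, including periodic images. The rock-salt test structure and the cutoff value are assumptions for illustration; the authors' tool additionally handles distortions and environment mixtures, which are not reproduced here.

```python
"""Sketch of coordination-number detection in a small periodic cell."""
import numpy as np
from itertools import product

a = 4.2                                              # lattice parameter (angstrom), MgO-like assumption
lattice = a * np.eye(3)
# Rock-salt conventional cell: four cations and four anions.
frac = np.array([[0, 0, 0], [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5],
                 [0.5, 0, 0], [0, 0.5, 0], [0, 0, 0.5], [0.5, 0.5, 0.5]], dtype=float)
species = ["Mg", "Mg", "Mg", "Mg", "O", "O", "O", "O"]

cart = frac @ lattice
cutoff = 2.5                                         # first-shell cutoff (angstrom), an assumption

def coordination_number(i):
    """Count neighbors of atom i within the cutoff, including periodic images."""
    images = [np.array(s, dtype=float) @ lattice for s in product([-1, 0, 1], repeat=3)]
    count = 0
    for j in range(len(cart)):
        for shift in images:
            if i == j and not shift.any():
                continue                             # skip the atom itself
            if np.linalg.norm(cart[j] + shift - cart[i]) < cutoff:
                count += 1
    return count

for i, sp in enumerate(species):
    cn = coordination_number(i)
    label = "octahedral-like" if cn == 6 else "other"
    print(f"{sp}{i}: CN = {cn} ({label})")
```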
2:15 PM - GI01.07.03
Materials Discovery for Thermal Energy Storage—A High Throughput Computational and Machine Learning Approach
Steven Kiyabu1,Alauddin Ahmed1,Jeffrey Lowe1,Donald Siegel1
University of Michigan1
Show AbstractSalt hydration reactions show great promise for thermal energy storage (TES) due to their high energy densities, cost effectiveness, and potential for reversible operation at moderate temperatures. While a number of salt hydrate compositions have been investigated previously for TES, many have yet to be explored. The goal of this work is to identify hypothetical salt hydrate structures that are thermodynamically stable and can out-perform known materials. All 25 distinct crystal structures of the form MXm·nH2O (where Mm+ is a metal cation, X1- is a halide, and n ≥ 6) found in the Inorganic Crystal Structure Database were used as structural templates. A total of 1,824 hydrate structures were generated from systematic cation and anion substitution and were characterized by density functional theory calculations according to their energy densities and operating temperature ranges. A variety of classification and regression machine learning (ML) algorithms were trained on the data to predict stability as well as TES performance. Thousands of mathematical combinations of the basic ionic and structural properties were generated, and the most promising features were selected from this list for use in the ML algorithms. In addition to identifying new, promising materials for TES, our study identified which features are relevant to TES performance, as well as predictive models that can be used to further accelerate our screening of hypothetical salt hydrate structures for TES.
3:30 PM - *GI01.07.04
Accelerated Search for Materials with Targeted Properties
Turab Lookman1
Los Alamos National Laboratory1
Show AbstractFinding new materials with targeted properties with as few experiments as possible has been a goal of the Materials Genome Initiative. The enormous complexity due to the interplay of structural, chemical and microstructural degrees of freedom in materials makes the rational design of new materials rather difficult. Machine learning and optimization, used in industry for solving complex problems, are increasingly being adapted for the design of new materials by learning from past data and making smart decisions about what to test next. However, the number of well-characterized samples available as sources of data to learn from is often relatively small. I will review how we have utilized Bayesian Global Optimization to iteratively guide experiments to discover new alloys and ceramics.
4:00 PM - GI01.07.05
Accelerated Search for Ultra-Incompressible, Superhard Materials Through Machine Learning
Aria Mansouri Tehrani1,Jakoah Brgoch1
University of Houston1
Show AbstractIn the search for materials with exceptional mechanical properties, we have developed a machine-learning model to predict the elastic moduli of inorganic materials, which act as a proxy for hardness. The Materials Project database of elastic moduli was used as the training set, and the machine learning model was developed using support vector regression with 150 compositional and structural variables. Further, a genetic-algorithm-based variable selection was performed using partial least squares regression, resulting in cross-validated root mean square errors (RMSE) of 17.2 GPa and 16.5 GPa for the bulk and shear modulus, respectively. Subsequently, 118,287 compounds from crystalline databases were screened, regardless of their chemical composition and atomic disorder, for compounds with high bulk and shear moduli having potential for superhardness. We then identified two compounds of interest, a ternary rhenium tungsten carbide and a quaternary molybdenum tungsten borocarbide, for experimental investigation. These materials were synthesized using arc melting and characterized with high-pressure diamond anvil cell measurements, confirming the machine learning predictions with <10% error. Vickers microhardness measurements revealed the extremely high hardness of these compounds, making them the hardest transition metal carbide and borocarbide reported. The successful identification of these superhard materials using state-of-the-art machine learning and materials screening techniques emphasizes the effectiveness of these methods in materials discovery and development.
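A hedged sketch of the modeling and screening steps described above: support vector regression with a cross-validated RMSE, followed by ranking of unseen candidates. The 150-descriptor set and the Materials Project training data are not reproduced; random placeholders stand in for both.

```python
"""Sketch of SVR modulus modeling with cross-validated RMSE and candidate ranking."""
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
n_compounds, n_descriptors = 500, 150
X = rng.normal(size=(n_compounds, n_descriptors))        # placeholder descriptors
true_weights = rng.normal(size=n_descriptors) * (rng.random(n_descriptors) < 0.1)
bulk_modulus = 120 + X @ true_weights * 10 + rng.normal(scale=5, size=n_compounds)   # GPa, synthetic

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=1.0))
rmse = -cross_val_score(model, X, bulk_modulus, cv=10,
                        scoring="neg_root_mean_squared_error")
print(f"10-fold cross-validated RMSE: {rmse.mean():.1f} GPa")

# Screening step: rank unseen candidates by predicted modulus.
model.fit(X, bulk_modulus)
candidates = rng.normal(size=(50, n_descriptors))
top = np.argsort(model.predict(candidates))[::-1][:5]
print("indices of the five most promising candidates:", top)
```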
4:15 PM - GI01.07.06
Machine Learning for Searching the Stable Structure of Crystal Interface
Teruyasu Mizoguchi1,Shin Kiyohara1
The University of Tokyo1
Show AbstractInterfaces are lattice defects inside materials and influence the overall material properties. For instance, interfaces in polycrystalline materials, i.e., grain boundaries (GBs), determine ion transport properties and high-temperature mechanical properties. The fact that interfaces have different properties from the bulk is a consequence of the fact that they have different atomic configurations from those inside the bulk. Thus, for a comprehensive understanding of interface properties, determination of the atomic structure of the interface is crucial.
However, extensive calculations are necessary to determine even one interface structure because of the geometrical freedom of the interface. The number of atomic configurations to be considered often reaches 10,000 even for the simplified coincidence-site-lattice grain boundaries, namely Σ grain boundaries. To accelerate interface structure searching, very efficient methods based on machine learning techniques, including virtual screening and Bayesian optimization, have been proposed by the present authors.
We applied the virtual screening method to [001] symmetric tilt GBs of Cu. The predictor was constructed using two Σ5 and two Σ17 GBs, and a total of 83 descriptors related to geometrical data, such as bond length and atom density, were used. The constructed predictor successfully determined 12 other GBs of Cu, from Σ13 to Σ125 [1].
In addition to virtual screening, we have developed an alternative and powerful method with the aid of a geostatistics approach called kriging [2]. Kriging is an effective interpolation method based on Bayesian optimization and a Gaussian process governed by a prior covariance. The kriging method has been applied and demonstrated to determine the grain boundaries of fcc-Cu, bcc-Fe, MgO, rutile-TiO2, and CeO2 GBs [3-4].
The details of those studies will be shown in my presentation.
[1] S. Kiyohara, H. Oda, T. Miyata, and T. Mizoguchi, Sci. Adv. 2(11), e1600746 (2016).
[2] S. Kiyohara, H. Oda, K. Tsuda, and T. Mizoguchi, Jpn. J. Appl. Phys. 55(4), 2 (2016).
[3] H. Oda, S. Kiyohara, K. Tsuda, and T. Mizoguchi, J. Phys. Soc. Japan 86(12), 123601 (2017).
[4] S. Kikuchi, H. Oda, S. Kiyohara, and T. Mizoguchi, Physica B, 532 (2018) 24-28.
[5] This study is supported by JST-PRESTO, MEXT, and Tenkai-kenkyu.
4:30 PM - GI01.07.07
Closed-Loop Optimization of Battery Fast-Charging Protocols
Peter Attia1,Aditya Grover1,Norman Jin1,Kristen Severson2,Bryan Cheong1,Jerry Liao1,Michael Chen1,Zi Yang3,Nicholas Perkins1,Muratahan Aykol4,Patrick Herring4,Stephen Harris5,Richard Braatz2,Stefano Ermon1,William Chueh1
Stanford University1,Massachusetts Institute of Technology2,University of Michigan–Ann Arbor3,Toyota Research Institute4,Lawrence Berkeley National Laboratory5
Show AbstractFast charging protocols for lithium-ion batteries are critical for widespread adoption of electric vehicles. However, a limited understanding of battery degradation modes during fast charging and the large manufacturing variability of commercial lithium-ion batteries are major challenges to the development of high-performing fast charging protocols. In this work, we optimize a three-step charging protocol for commercial 18650 lithium-ion batteries that achieves 80% state of charge in ten minutes. We employ two key elements to reduce the optimization cost: early prediction of failure, which uses cycling data from the first 100 cycles to predict cycle lives that reach up to 1200 cycles, and adaptive Bayesian optimal experimental design, which significantly reduces the number of experiments required. Out of a candidate pool of nearly 180 protocols, we identify promising fast charging protocols with identical charging times but lifetimes that exceed that of the baseline charging protocol. This method can be extended to accelerate development of other tasks in battery manufacturing and deployment, such as formation cycling and state-of-health estimation.
4:45 PM - GI01.07.08
Artificial Intelligence in Accelerated Materials Development—Case Study on Stober Process
Mikhail Kovalev1,2,Shuyun Chng1,Joachim Yam1,Linda Wu1
Singapore Institute of Manufacturing Technology1,A*STAR2
Show AbstractRecent advancements in data science for materials development and in automated chemical synthesis methods will soon lead to dramatic improvements in accelerated materials development. Machine learning algorithms work efficiently with high-dimensional spaces, and this can be used for optimization and prediction of innovative material properties faster than traditional materials screening [1]. The challenge is to develop algorithms for rapid exploration of a gigantic materials functional space while rapidly converging onto ‘manufacturable’ materials that adhere to constraints of cost and environmental impact and provide a step-change in performance.
A number of examples have demonstrated the successful application of machine learning algorithms to the optimization of novel materials exploration. For example, Wolf et al. [2] used an evolutionary approach to optimize catalysts based on eight oxide components. This demonstrated the possibility of accelerating catalyst development compared with traditional screening methods. The successful application of genetic algorithms to formulation optimization was later used in a number of research papers to show that it can be effective for other catalyst designs as well [3,4].
In this work, we demonstrate how machine learning may improve development speed, using the traditional Stober process [5] for silica nanoparticle synthesis as a model system. The Stober process is a well-known technique for silica nanoparticle synthesis; it can be carried out with different silica precursors, pH media and solvents. This system is highly dimensional, and the challenge is to predict how components and synthesis parameters may influence the resulting particle size, or whether the system will lead to gelation of the mixture.
A systematic study of the synthesis of silica nanoparticles (SiNPs) via the base-catalysed sol-gel method demonstrated the versatility of the method, with the controlled synthesis of silica nanoparticles of specific size ranges, namely sub-10 nm, 20-30 nm, 50-100 nm and 150-200 nm. The initial set of experiments was used as a priori knowledge for model building and training. The machine learning models developed to fit the multidimensional parameter space for particle size prediction using ensemble methods showed good results when DoE optimization and genetic algorithms were used.
References:
[1] T. C. Le, D. A. Winkler. Chem. Rev. 2016, 116 (10), 6107-6132.
[2] D. Wolf, O. V. Buyevskaya, M. Baerns. Appl. Catal., A. 2000, 200, 63– 77.
[3] Y. Yamada, A. Ueda, K. Nakagawa, T. Kobayashi. Res. Chem. Intermed. 2002, 28,397– 407
[4] Y. Kobayashi, K. Omata, M. Yamada. Ind. Eng. Chem. Res. 2010, 49, 1541– 1549
[5] W. Stober, A. Fink, E. J. Bohn. Colloid Interface Sci., 1968, 26, 62
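A generic, hedged sketch of genetic-algorithm optimization over synthesis parameters in the spirit of the case study above. The particle-size function is a toy stand-in for the trained ensemble model (not a real Stober correlation), and the parameter bounds, target size, and GA settings are arbitrary assumptions.

```python
"""Sketch of a genetic algorithm over synthesis-recipe parameters."""
import random

random.seed(0)
TARGET_NM = 50.0
BOUNDS = {"teos_M": (0.1, 0.6), "nh3_M": (0.1, 2.0), "h2o_M": (1.0, 15.0), "temp_C": (20.0, 60.0)}

def predicted_size(p):
    """Toy surrogate for the trained size model (nm); not a real correlation."""
    return 20 + 60 * p["nh3_M"] / 2.0 + 3 * p["h2o_M"] - 0.3 * (p["temp_C"] - 20) - 10 * p["teos_M"]

def fitness(p):
    return -abs(predicted_size(p) - TARGET_NM)           # closer to the target size is better

def random_recipe():
    return {k: random.uniform(*b) for k, b in BOUNDS.items()}

def crossover(a, b):
    return {k: random.choice((a[k], b[k])) for k in BOUNDS}

def mutate(p, rate=0.2):
    q = dict(p)
    for k, (lo, hi) in BOUNDS.items():
        if random.random() < rate:
            q[k] = min(hi, max(lo, q[k] + random.gauss(0, 0.1 * (hi - lo))))
    return q

population = [random_recipe() for _ in range(30)]
for generation in range(40):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                            # simple truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(20)]
    population = parents + children

best = max(population, key=fitness)
print({k: round(v, 2) for k, v in best.items()}, "->", round(predicted_size(best), 1), "nm")
```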
GI01.08: Machine Learning Enhanced Computational Materials
Session Chairs
Soojung Baek
Keith A. Brown
Patrick Riley
Thursday AM, November 29, 2018
Hynes, Level 1, Room 110
8:00 AM - *GI01.08.01
Soft Matter Design and Characterization in the Era of Machine Learning
Nicholas Jackson1,Juan de Pablo1
University of Chicago1
Show AbstractThe advent of innovative molecular modeling algorithms, optimization strategies, and machine learning techniques is ushering in a new era of materials science and engineering in which computational tools are routinely used to probe, design, and interrogate matter and functional materials systems. In this presentation I will illustrate some of these ideas in the context of a variety of examples taken from chemical engineering, physics, biology and materials science. In the first, I will discuss the simultaneous interpretation of scattering data from multiple sources by relying on molecular models. In the second, I will present models of biological systems that use machine learning to integrate experimental and computational information from a wide range of sources, and to discover collective variables that can be used to enhance sampling. In a third demonstration, I will explain how machine learning can by itself be used to improve and accelerate enhanced sampling algorithms. In a fourth example, I will discuss how evolutionary optimization and machine learning can be used to create new mechanical metamaterials, and to predict electronic transport in organic conductors.
8:30 AM - GI01.08.02
Pathway to Systematic Improvement of Machine Learning Based Force Fields Using Active Learning
James Chapman1,Rohit Batra2,Huan Tran2,Ghanshyam Pilania3,Blas Uberuaga3,Rampi Ramprasad1
Georgia Institute of Technology1,University of Connecticut2,Los Alamos National Laboratory3
Show AbstractEmerging machine learning (ML)-based atomic force fields provide a powerful tool to accurately model a variety of large-scale (length-scale > nm and time-scale > ns) physical and chemical processes, which remain outside the realm of quantum mechanical methods—such as density functional theory (DFT)—owing to their computational cost [1,3,4]. In the past, we proposed one such approach to construct a ML-based force field, where only the vectorial force on an atom is “learned” directly from its atomic environment [1]. The capability and transferability of this approach was demonstrated by accurately reproducing the structural, mechanical, transport and vibrational properties of elemental Al [1,2], and correctly modeling atomic forces for diverse elemental solids such as Cu, Ti, W, Si and C [5]. In this contribution, we showcase the true power of such ML-based force fields, the capability to systematically improve by actively learning from “failed” cases. The general strategy is based on the idea of repeated augmentation of poorly predicted configurations to the reference dataset, and re-training the model in a cyclic manner. Such a pathway for the iterative improvement of these force fields is straightforward, efficient, and universal. Here, we demonstrate this approach for Cu, Al and Pt using actively trained ML-based force fields that capture an array of phenomena such as surface diffusion mechanisms, stacking fault energies, di-vacancy behaviors and screw dislocation core structure. We conclude that this simple strategy can be adopted to construct targeted, application-specific ML force fields, extending our ability to capture materials phenomena which lie beyond the current reach of classical and/or quantum mechanical methods.
[1] V. Botu et al, J. Phys. Chem. C, (2017)
[2] V. Botu et al, Phys. Rev. B. (2015)
[3] J. Behler et al, Phys. Rev. Lett. (2007)
[4] A. Bartok et al, Phys. Rev. Lett. (2010)
[5] T. Huan et al, npj Comp. Mat. (2017)
8:45 AM - GI01.08.03
Reliable and Transferable Machine-Learning Potentials by Overcoming Sampling Bias
Wonseok Jeong1,Kyuhyun Lee1,Dongsun Yoo1,Seungwu Han1
Seoul National Univ1
Show AbstractMachine-learning potentials (MLPs) are anticipated to be promising next-generation interatomic potentials owing to their self-learning capability and bonding-character-invariant nature. While MLPs have been demonstrated to be very accurate in describing the potential energy surface (PES) of various atomic systems, we found that an MLP can suffer from an inaccurate description of the PES even for structures included in the training set. This training failure is caused by a highly inhomogeneous distribution of training data points within the feature space. Due to this problem, MLPs may cause catastrophic simulation failures and/or inaccurate descriptions of physical properties. In this presentation, we discuss the sampling bias problem with the neural network potential (NNP), one of the most studied and powerful MLPs. To overcome the problem, we suggest a novel indicator, the Gaussian density function (GDF), that quantifies the sparsity of training points, and propose a weighting scheme that can effectively rectify the sampling bias. With various examples, we show that the GDF weighting significantly improves the reliability of NNPs compared to the conventional training method. We also confirm that the transferability and atomic-energy mapping accuracy of NNPs improve when the weighting scheme is applied. The present method can be equally applied to other machine-learning potentials and will extend the application range of machine-learning potentials.
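A generic sketch of density-based training-set weighting in the spirit of the GDF scheme described above: estimate the density of training points in feature space and up-weight sparse regions. The Gaussian kernel density estimate, toy feature vectors, and weighting formula are illustrative assumptions, not the authors' exact GDF definition.

```python
"""Sketch of density-based per-sample weighting for an imbalanced training set."""
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(9)
# Toy symmetry-function-like feature vectors: a dense cluster plus a few rare environments.
dense = rng.normal(loc=0.0, scale=0.3, size=(500, 4))
rare = rng.normal(loc=2.5, scale=0.3, size=(20, 4))
features = np.vstack([dense, rare])

kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(features)
density = np.exp(kde.score_samples(features))             # estimated local density of training points

weights = 1.0 / (density + 1e-3)                           # sparse regions get larger training weights
weights *= len(weights) / weights.sum()                    # normalize to a mean weight of 1

print(f"mean weight, dense cluster: {weights[:500].mean():.2f}")
print(f"mean weight, rare environments: {weights[500:].mean():.2f}")
# These weights could multiply the per-sample loss when training the potential.
```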
9:00 AM - GI01.08.04
Neural Network Force Field for Molecular Dynamics of Multi-Element Polymer System
Jonathan Mailoa1,Mordechai Kornbluth1,Georgy Samsonidze1,Boris Kozinsky1,2
Bosch1,Harvard University2
Show AbstractA typical polymer molecular dynamics (MD) simulation usually involves at least several thousand atoms, which is necessary because the motion and dynamics of polymer chains strongly depend on chain length. This in turn makes it impossible to use ab initio MD on systems that are representative of real polymer systems, making it necessary to fit ab initio data from separate fragments of organic functional groups in the gas phase to classical force fields with complex functional forms (such as the OPLS force field). These functional forms typically do not describe chemical reactions, due to their definition of explicit unbreakable atomic bonds. In addition, they do not accurately describe ionic interactions and other properties that involve electronic polarizability and charge transfer of atoms, which only emerge under quantum mechanics in real bulk atomic systems.
In this work, we develop a multi-element neural network force field fitting procedure suitable for simulating arbitrarily large bulk atomic systems such as polymers. We show that atomic snapshots within a certain radius of the central atom contain sufficient information to describe quantum mechanical atomic forces accurately. Consequently, small atomic cluster snapshots can be simulated with quantum mechanical methods on a reasonable time scale to generate the training set (on the order of several tens of thousands of small atomic snapshots). Force vector projection onto internal structural axes, in combination with a Behler-Parrinello-style fingerprinting scheme, enables direct training on the atomic force vectors of individual atomic elements, as opposed to training on the aggregate scalar energy of the entire ab initio ensemble as in the original Behler-Parrinello scheme. The neural network force field fitting algorithm will be presented, as well as accuracy results for the poly(ethylene oxide) (PEO) polymer typically used in lithium battery polymer electrolytes.
9:15 AM - GI01.08.05
A Direct and Local Deep Learning Model for Atomic Forces in Solids
Nataly Kuritz1,Goren Gordon1,Amir Natan1
Tel Aviv University1
Show AbstractThe evaluation of atomic forces and total energies is a key challenge for large-scale atomistic simulations of materials. Ab initio molecular dynamics (AIMD) is a successful approach, but it becomes computationally too expensive for large systems. In this work, we demonstrate a direct and local deep learning (DL) model for atomic forces. We demonstrate this model for bulk aluminum, silicon and sodium and show that the model errors are comparable to other state-of-the-art algorithms. Our model allows the calculation of forces in large cells using training data that we built from smaller cells calculated with density functional theory (DFT). In addition, we examine the question of temperature transferability of the model and show that we can train the model with data produced at a high temperature and then test it on data produced at lower temperatures. Finally, we show that the physical properties of the system (e.g., the number of nearest neighbors) are manifested in the model convergence with respect to some of its parameters.
10:00 AM - *GI01.08.06
Molecular Crystal Structure Prediction with GAtor and Genarris
Noa Marom1
Carnegie Mellon University1
Show AbstractMolecular crystals are bound by dispersion interactions, whose weak nature produces potential energy landscapes with many local minima. Hence, molecular crystals often exhibit polymorphism, whereby the same molecule crystallizes in several structures. Polymorphs may exhibit markedly different physical and chemical properties. Crystal structure prediction is challenging due to the high accuracy required for the small energy differences between polymorphs and the high dimensionality of the configuration space. We present the genetic algorithm (GA) code, GAtor [J. Chem. Theory Comput., 14, 2246 (2018)], and its associated structure generation package, Genarris [J. Chem. Phys.,148, 241701 (2018)]. Both rely on dispersion-inclusive density functional theory (DFT) for geometry relaxations and energy evaluations.
Genarris generates random structures with physical constraints and uses a Harris approximation to construct the electron density of a molecular crystal by superposition of single molecule densities. The DFT energy is then evaluated for the Harris density without performing a self-consistent cycle, enabling fast screening of initial structures with an unbiased first-principles approach. Genarris creates a maximally diverse initial pool of structures by using machine learning for clustering based on structural similarity with respect to a relative coordinate descriptor (RCD) designed for molecular crystals.
GAs rely on the evolutionary principle of survival of the fittest to perform global optimization. GAtor offers a variety of crossover and mutation operators, designed for molecular crystals, to create offspring by combining/modifying the structural genes of parent structures. GAtor achieves massive parallelization by spawning several GA replicas that run in parallel and read/write to a common pool of structures. GAtor performs evolutionary niching [Faraday Discussions, DOI: 10.1039/C8FD00067K (2018)] by using machine learning for dynamic clustering on the fly. A cluster-based fitness function is then used to steer the GA to under-sampled low-energy regions of the potential energy landscape. This helps overcome initial pool biases and selection biases.
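The following toy Python sketch (not the GAtor code) illustrates the idea of a cluster-based fitness function steering a genetic algorithm toward under-sampled low-energy regions; the descriptor vectors, surrogate energy function, and GA operators are placeholders.

# Schematic sketch of evolutionary niching: candidate "structures" are abstract
# descriptor vectors, clustered on the fly, and a cluster-based fitness
# penalizes over-sampled regions. The "energy" stands in for a DFT evaluation.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

def energy(x):                       # placeholder for a DFT total energy
    return np.sum((x - 0.3)**2)

pop = rng.uniform(0, 1, size=(40, 6))          # initial pool of "structures"
for generation in range(20):
    energies = np.array([energy(x) for x in pop])
    labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(pop)
    counts = np.bincount(labels, minlength=5)
    # Fitness: low energy is good; membership in a crowded cluster is penalized.
    fitness = -energies - 0.5 * counts[labels]
    parents = pop[np.argsort(fitness)[-10:]]   # survival of the fittest
    children = []
    for _ in range(len(pop)):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, 6)
        child = np.concatenate([a[:cut], b[cut:]])     # crossover
        child += rng.normal(0, 0.05, size=6)           # mutation
        children.append(np.clip(child, 0, 1))
    pop = np.array(children)

print("best surrogate energy:", min(energy(x) for x in pop))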
10:30 AM - GI01.08.07
Machine Learned Models for Transition Metal Dichalcogenides
Henry Chan1,Mathew Cherukara1,Badri Narayanan1,Subramanian Sankaranarayanan1
Argonne National Laboratory1
Show AbstractTransition metal dichalcogenides (TMDs) are novel nanomaterials that can behave like conductors, semiconductors, or insulators depending on the type of transition metal used. With a thickness as small as 3 atoms and size-dependent properties, TMDs have great potential in applications such as flexible and wearable electronics. Despite the early discovery of TMDs and their synthesis via vapor deposition, fundamental understanding of their growth mechanisms remains limited, which has hindered the preparation of these materials on a larger scale. Molecular simulations can be used to address this problem, but the lack of accurate interatomic potentials and the large effort required to develop them present a major barrier. Here, we demonstrate the use of a machine-learning-based framework in the development of a reactive force field model for tungsten diselenide. Our data-driven procedure led to a model that accurately captures various properties, including structures, dynamical stability, phonon spectra, and various energetics, and that was obtained without the need to rely heavily on human intuition or lengthy development time. With the model, we perform molecular dynamics simulations to investigate the growth of 2D tungsten diselenide under different vapor deposition conditions as well as to study its mechanical properties.
10:45 AM - GI01.08.08
Developing Computationally Efficient Potential Models by Machine Learning
Alberto Hernandez1,Fenglin Yuan1,Tim Mueller1
Johns Hopkins University1
Show AbstractInteratomic potential models, or force fields, are used in materials science and engineering, chemistry, biology and physics to perform atomistic calculations at extended length and time scales. As the computational cost of a potential model limits the length and time scales of a simulation, we have developed a machine learning algorithm based on genetic programming to discover computationally efficient and parsimonious potential models. Our approach was validated by rediscovering the Lennard-Jones potential and the Sutton-Chen embedded atom model from training data generated using these models. By using training data generated from density functional theory calculations, we found simple and fast potential models for elemental systems. We present our approach, the forms of the discovered models, and assessments of their transferability, accuracy and speed.
11:00 AM - GI01.08.10
Using Machine Learning for Efficient Extraction of Higher Order Force Constants in Solids
Fredrik Eriksson1,Erik Fransson1,Paul Erhart1
Chalmers University of Technology1
Show AbstractHigher-order force constants are essential for the description of, e.g., thermal transport and metastable materials. They originate in the theory of lattice vibrations and can be used in perturbative approaches as well as atomistic simulations. Usually, the force constants of second and third order are obtained systematically by enumeration. The underlying crystal symmetry is exploited to constrain the force constants and reduce the number of independent calculations. This approach, however, scales poorly with increasing order and for systems with low symmetry, resulting in a steep increase in the number of reference calculations (typically based on density functional theory) that limits the reach of the systematic approach.
In this contribution we demonstrate how techniques from machine learning can be exploited to dramatically reduce the number of reference calculations and break the unfavorable scaling with system size and symmetry. Our implementation enables us to extract force constants (up to fourth order and beyond) even for systems with low symmetry and large primitive unit cells. This is demonstrated by applications to, e.g., transition metal dichalcogenides, clathrates and the metastable phases of transition metals.
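A toy Python sketch of the underlying regression idea is given below: second-order force constants are recovered from displacement-force data by solving F = -Φu as a sparse linear regression, so far fewer reference calculations are needed than one-displacement-at-a-time enumeration would require. Real implementations also impose crystal-symmetry constraints; everything here is synthetic.

# Toy sketch: recover a sparse harmonic force constant matrix Phi from
# random-displacement "reference calculations" by regularized regression.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n = 12                                   # degrees of freedom in a toy cell

# Ground-truth sparse, symmetric force constant matrix.
Phi_true = rng.normal(0, 1, (n, n)) * (rng.random((n, n)) < 0.2)
Phi_true = 0.5 * (Phi_true + Phi_true.T)

# "Reference calculations": random displacements and the resulting forces.
U = rng.normal(0, 0.01, size=(30, n))                    # 30 displaced snapshots
F = -U @ Phi_true + rng.normal(0, 1e-4, size=(30, n))    # small noise

# Fit each row of Phi from the data; sparsity means far fewer snapshots are
# needed than the number of unknowns would naively suggest.
Phi_fit = np.zeros((n, n))
for i in range(n):
    reg = Lasso(alpha=1e-6, fit_intercept=False, max_iter=50000)
    reg.fit(U, -F[:, i])
    Phi_fit[i] = reg.coef_

print("max abs error in recovered force constants:",
      np.abs(Phi_fit - Phi_true).max())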
11:15 AM - GI01.08.11
Are There “Genes” for Fluorescent DNA-Stabilized Silver Clusters?
Alexander Gorovits1,Petko Bogdanov1,Stacy Copp2,Steven Swasey3,Elisabeth Gwinn3
University at Albany, State University of New York1,Los Alamos National Laboratory2,University of California, Santa Barbara3
Show AbstractDNA-stabilized silver clusters (Ag-DNAs) are composed of ~10-30 silver atoms wrapped in short strands of DNA [1]. Ag-DNAs can be highly fluorescent and are finding use for chemical sensing and DNA detection [2]. The nucleobase sequence that stabilizes a cluster controls silver cluster size, leading to well-known yet little-understood sequence-tuned Ag-DNA fluorescence colors. The large space of possible stabilizing sequences challenges the understanding of how sequence selects Ag-DNA properties, limiting the design of new applications for Ag-DNA. Are there subsequences, akin to genes, that are important for stabilization of products with desired properties? Can these subsequences be used for predictive Ag-DNA design?
To answer these questions, we develop a closed-loop framework combining machine learning and experimentation. We determine the fluorescence spectra of Ag-DNAs stabilized by over 1000 distinct DNA oligomers of fixed length using high-throughput synthesis and fluorimetry. Then, we mine for DNA subsequences, or motifs, that are discriminative of fluorescence brightness and color. As a result, each DNA template is represented by a feature vector encoding the inclusion/exclusion of discriminative motifs. We train classification models based on the above training data and further employ those to generate and screen new sequences, improving both fluorescence brightness [3] and the selectivity for a desired Ag-DNA color [4]. Our methodology improves color selectivity by 330% for Ag-DNAs with peak emission beyond 660 nm. The discovered motifs also provide physical insights into how DNA sequence controls silver cluster size and color. This data-driven design approach for color of DNA-stabilized silver clusters demonstrates the potential of machine learning and data mining to increase precision and efficiency of nanomaterials design.
In our ongoing work, we extend these methods to varying-length templates and Ag-DNAs emissive at unobserved wavelengths. Are the motifs detected in a training set of fixed-length DNA templates predictive when templating with DNAs of different length? Furthermore, is it possible to employ our data-driven framework to predict and design for unobserved Ag-DNA colors in the near-infrared? Strong evidence for the utility of motifs in the above tasks will support the hypothesis of “motifs serving as genes” for Ag-DNAs and further direct the study of these important materials.
[1] E. Gwinn, D. Schultz, S. Copp, and S. Swasey, Nanomaterials 5, 180 (2015).
[2] J. T. Del Bonis-O’Donnell, D. Vong, S. Pennathur, and D. K. Fygenson, Nanoscale 8, 14489 (2016).
[3] S. M. Copp, P. Bogdanov, M. Debord, A. Singh, and E. Gwinn, Adv. Mater. 26, 5839 (2014).
[4] S. M. Copp, A. Gorovits, S. M. Swasey, S. Gudibandi, P. Bogdanov, and E. G. Gwinn, ACS Nano 12, 8240 (2018).
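A minimal Python sketch of the motif-based featurization described above is shown below; the sequences, labels, and 3-mer motif set are synthetic placeholders, not the experimental Ag-DNA dataset or the actual discriminative-motif mining procedure.

# Sketch: encode DNA templates by presence/absence of short motifs and train a
# classifier to discriminate a (synthetic) fluorescence class.
import itertools
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
bases = "ACGT"
motifs = ["".join(m) for m in itertools.product(bases, repeat=3)]  # all 3-mers

def featurize(seq):
    return np.array([1 if m in seq else 0 for m in motifs])

# Synthetic 10-base templates; label is 1 if a chosen "bright-like" motif occurs.
seqs = ["".join(rng.choice(list(bases), 10)) for _ in range(500)]
y = np.array([1 if ("CCC" in s or "GGA" in s) else 0 for s in seqs])
X = np.array([featurize(s) for s in seqs])

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top = np.argsort(clf.feature_importances_)[-5:]
print("most discriminative motifs:", [motifs[i] for i in top])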
11:30 AM - GI01.08.12
Data Generated Formula for Creep and Stress Relaxation
Tong-Yi Zhang1,Sheng Sun1
Shanghai University1
Show AbstractAn analytic formula is the ideal means of clearly describing time-, stress-, and temperature-dependent deformation behavior and of understanding the deformation mechanism. To obtain such a formula for creep and stress relaxation, multi-temperature creep, stress-relaxation, and indentation creep tests were conducted on nanograined copper and nanotwinned copper under various loading levels to generate a sufficiently large body of data, which allows one to develop an analytic formula for plastic strain rate versus temperature and stress (or hardness in indentation creep) by using symbolic regression with domain knowledge. The analytic formula explicitly involves the athermal stress (or hardness) exponent, activation energy, and activation volume, whose values are determined reliably from the experimental data from the multi-temperature tests.
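A hedged Python sketch of symbolic regression on synthetic creep-like data is shown below, using the gplearn package (assumed available); the SyR/SpR workflow and copper datasets described above are not reproduced, and the Arrhenius-type strain-rate law used to generate the data is only a stand-in.

# Sketch: symbolic regression rediscovering an assumed strain-rate law from
# synthetic temperature/stress data (gplearn assumed installed).
import numpy as np
from gplearn.genetic import SymbolicRegressor

rng = np.random.default_rng(4)
T = rng.uniform(300, 600, 500)           # temperature (K)
sigma = rng.uniform(50, 300, 500)        # stress (MPa)
# Synthetic "measured" strain rate: power-law stress times a Boltzmann factor.
rate = 1e-3 * sigma**3 * np.exp(-5000.0 / T)

X = np.column_stack([T, sigma])
est = SymbolicRegressor(population_size=2000, generations=20,
                        function_set=("add", "sub", "mul", "div", "log", "inv"),
                        parsimony_coefficient=0.001, random_state=0)
est.fit(X, np.log(rate))                 # fit log-rate for numerical stability
print(est._program)                      # best symbolic expression found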
GI01.09: Deep Learning and Neural Networks for Materials
Session Chairs
Soojung Baek
John J. Boeckl
Kristofer Reyes
Thursday PM, November 29, 2018
Hynes, Level 1, Room 110
1:30 PM - *GI01.09.01
Promise and Perils of Machine Learning in the Sciences
Patrick Riley1
Google1
Show AbstractThere is widespread excitement about the potential of machine learning to make a meaningful change in a variety of scientific disciplines. Sometimes lost in this discussion is a clear understanding of the risks, expected utility, and proper application of these powerful techniques. Through vignettes of real applications of machine learning to scientific problems by the Google Accelerated Science group, this talk will illustrate important lessons for scientists who want to make machine learning part of their toolkit.
2:00 PM - GI01.09.02
Automatic Classification of Hypothetical Zeolites with Unsupervised Machine Learning
Daniel Schwalbe-Koda1,Rafael Gomez-Bombarelli1
Massachusetts Institute of Technology1
Show AbstractZeolites are inorganic nanoporous materials with wide industrial applications owing to their catalytic, adsorbent and ion-exchange properties. Because of their broad applications, selectivity, robustness and cost-effectiveness, intense basic and applied research is devoted to the discovery of new zeolitic materials. Whereas hundreds of thousands of hypothetical zeolite-like frameworks are theoretically possible, only ~235 different known geometries have been synthesized to date [1]. The virtual space of possible zeolites has been explored by algorithmic generation and later filtered by physics-based simulations of the thermodynamic driving force [2]. Several hand-made criteria based on geometrical or thermodynamic analysis have been proposed to differentiate synthetically accessible zeolites, but the underlying aspects that define zeolite feasibility remain unknown.
Aiming to accelerate the discovery and design of realizable zeolites, we have applied unsupervised machine learning methods to explore the topological and geometrical degrees of freedom responsible for the synthetic accessibility of zeolitic frameworks. Similar deep learning models have been successfully applied to map molecules onto a continuous space [3]. A generative model comprising a Wasserstein autoencoder [4] was coupled to graph convolutional neural networks customized for crystalline solids [5]. Using a large dataset of hypothetical zeolites, the model was trained to reconstruct the original unit cells from a low-dimensional representation. Passing through this information bottleneck allows us to automatically identify the key collective variables that describe zeolitic frameworks. In addition, the model was jointly trained to predict physics-based labels in a semisupervised fashion using existing simulations, thus helping to create a smooth and robust latent representation. In a last step, hypothetical and known feasible zeolites were projected onto the latent space. By exploring common patterns, the deep learning toolset allows us to single out hypothetical zeolite frameworks that are likely to be feasible.
[1] Zimmermann, N. E. R. and Haranczyk, M. History and Utility of Zeolite Framework-Type Discovery from a Data-Science Perspective. Cryst. Growth Des. 16, 3043-3048 (2016).
[2] Deem, M. W. et al. Computational Discovery of New Zeolite-Like Materials. J. Phys. Chem. C 113, 21353–21360 (2009).
[3] Gómez-Bombarelli, R. et al. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Cent. Sci. 4 (2), 268-276 (2018).
[4] Tolstikhin, I. et al. Wasserstein Auto-Encoders. arXiv preprint. arXiv:1711.01558 (2017).
[5] Xie, T., and Grossman, J. C. Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Phys. Rev. Lett. 120 (14), 145301 (2018).
[6] Engel, J. et al. Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models. arXiv preprint arXiv:1711.05772 (2017).
2:15 PM - GI01.09.03
CGCNN—A Graph Representation of Materials for Property Prediction and Materials Design
Tian Xie1,Jeffrey Grossman1
Massachusetts Institute of Technology1
Show AbstractEmerging machine learning (ML) based methods provide powerful tools to extract structure-property relations from the ever-growing materials simulation and experimental databases. However, most existing ML methods depend on feature vectors to represent different materials, which are limited to specific material groups and to some computationally expensive features. In this talk, we present a crystal graph convolutional neural network (CGCNN) framework [1] that represents an arbitrary periodic structure with an arbitrary number and type of atoms in a unit cell. We achieve state-of-the-art prediction performance for 8 different material properties using only the crystal structure information. We also extract materials design knowledge from the neural networks, which can help accelerate the screening of materials. Finally, we show several examples of how the method can be applied to the design of Li metal battery electrolytes [2] and other material systems.
[1] Xie, T., & Grossman, J. C. (2018). Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Physical review letters, 120(14), 145301.
[2] Ahmad, Z., Xie, T., Maheshwari, C., Grossman, J. C., & Viswanathan, V. (2018). Machine Learning Enabled Computational Screening of Inorganic Solid Electrolytes for Dendrite Suppression with Li Metal Anode. arXiv preprint arXiv:1804.04651.
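To make the graph-convolution idea concrete, the following numpy sketch applies a CGCNN-style gated update in which each atom's feature vector is refined from its neighbors' features and the corresponding bond features, followed by pooling into a crystal vector; it is schematic, not the reference CGCNN implementation, and all features and weights are random placeholders.

# Sketch of a crystal-graph convolution step: gated neighbor messages plus a
# residual update, then mean pooling to a fixed-length crystal representation.
import numpy as np

rng = np.random.default_rng(5)
n_atoms, f_atom, f_bond = 6, 8, 4

atom_feats = rng.normal(size=(n_atoms, f_atom))
neighbors = [[(j, rng.normal(size=f_bond)) for j in range(n_atoms) if j != i][:3]
             for i in range(n_atoms)]               # 3 toy neighbors per atom

W_gate = rng.normal(size=(2 * f_atom + f_bond, f_atom))
W_core = rng.normal(size=(2 * f_atom + f_bond, f_atom))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_step(feats):
    new = np.zeros_like(feats)
    for i in range(n_atoms):
        msg = np.zeros(f_atom)
        for j, bond in neighbors[i]:
            z = np.concatenate([feats[i], feats[j], bond])
            msg += sigmoid(z @ W_gate) * np.tanh(z @ W_core)   # gated update
        new[i] = feats[i] + msg                                # residual sum
    return new

h = conv_step(conv_step(atom_feats))          # two convolution layers
crystal_vector = h.mean(axis=0)               # pooled crystal representation
print("pooled feature vector:", crystal_vector.round(2))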
2:30 PM - GI01.09.04
Deep Transfer Learning for Active Optimization of Functional Materials Properties in the Data-Limited Regime
Brian DeCost1,Gilad Kusne1
National Institute of Standards and Technology1
Show AbstractThe recent development of learned representations for molecules and crystals via variants of graph convolutional neural networks (GCNs) has enabled significant improvements in the performance of data-driven models of chemical and physical properties of materials. This class of neural networks learns complex hierarchical representations of molecules and crystals using only a small number of elemental properties and the topology of the molecule or crystal. The representations learned by GCNs are built up by sequentially performing graph convolutions (i.e. local weighted sums) over the bonding network, where the graph convolution weights are optimized by stochastic gradient descent via backpropagation. We explore a deep transfer learning approach to leverage this representational power in solid state systems for which available data is limited and especially where additional data is expensive to acquire. In this proof of concept study, we hold out entire classes of functional materials (e.g. perovskite photovoltaic candidates, Heusler and Heusler-related compounds) during the GCN training process, using large density functional theory (DFT) calculation datasets for both training and active learning test sets. We then perform de novo active learning using GCN-derived features to optimize the relevant functional materials properties for each held out materials system. We compare the computational efficiency of the active learning process using GCN features to that using competitive engineered features, such as the Magpie feature set. Future efforts will focus on directly driving experimental efforts and DFT calculations using this deep transfer learning strategy.
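A schematic Python sketch of such an active-learning loop is shown below, with random feature vectors standing in for GCN-derived (or Magpie) features, a Gaussian-process surrogate, and an upper-confidence-bound acquisition choosing which "DFT result" to reveal next; all data are synthetic.

# Sketch: surrogate-driven active learning over a fixed candidate pool.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(6)
features = rng.normal(size=(300, 10))              # candidate materials
property_true = features @ rng.normal(size=10) + 0.1 * rng.normal(size=300)

labeled = list(rng.choice(300, size=5, replace=False))   # small seed set
for step in range(20):
    gp = GaussianProcessRegressor(normalize_y=True)
    gp.fit(features[labeled], property_true[labeled])
    mu, std = gp.predict(features, return_std=True)
    acq = mu + 1.0 * std                           # upper-confidence-bound
    acq[labeled] = -np.inf                         # don't re-query known points
    labeled.append(int(np.argmax(acq)))            # "run DFT" on best candidate

print("best property found:", property_true[labeled].max(),
      "of possible", property_true.max())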
3:15 PM - *GI01.09.05
Applying Deep Learning to Proteins and Small Molecules
Lucy Colwell1
University of Cambridge1
Show AbstractThe evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. The explosive growth in the number of available protein sequences raises the possibility of using machine learning and artificial intelligence to exploit the natural variation present in homologous protein sequences to infer these constraints and thus identify sequences with different protein phenotypes.
Because in many cases phenotypic changes are controlled by more than one amino acid, the mutations that separate one phenotype from another may not be independent, requiring us to understand the higher order structure of the data. We show that models learned from data are capable of predicting key aspects of protein function. These include (i) the inference of residue pair interactions that are accurate enough to predict all atom 3D structural models; and predictions of (ii) binding interactions between different proteins and (iii) binding between protein receptors and their target ligands. Finally, I will discuss current efforts to search the space of possible sequences for those that correspond to proteins with specific functional properties that exploit recent advances in deep learning.
3:45 PM - GI01.09.06
Design of Bioinspired Hierarchical Systems Using Machine Learning, Additive Manufacturing and Experiment
Grace Gu1,Chun-Teh Chen1,Deon Richmond1,Markus Buehler1
Massachusetts Institute of Technology1
Show AbstractDevelopments in the materials science community are tending more and more to a materials-by-design approach, no longer accepting the status quo with the insight that we can tailor-make materials with desired properties for specific applications. Biological materials, such as bone, wood, and seashells, have multifunctional properties that often surpass that of their synthetic counterparts. Composed of a limited set of building blocks, nature intelligently organizes these building blocks into hierarchical architectures, allowing them to overcome the limitations of their constituents while combining their best attributes. These complex architectures pose a challenge for traditional manufacturing techniques due to their multi-material construction, hierarchy, and potential voids. Additive manufacturing is a new tool that can overcome these limitations. In this work, we use convolutional neural networks (CNN) to cast the natural process of evolution into a computational framework to study bioinspired hierarchical structures. With an integrated approach of simulation, machine learning, additive manufacturing, and experimental testing, we investigate the optimal microstructural patterns that lead to tougher and stronger materials. In the future, this bioinspired machine learning approach will enable materials-by-design of complex architectures to tackle demanding engineering challenges.
4:00 PM - GI01.09.07
Neural Network Analysis of Dynamic Fracture in a Layered Material
Pankaj Rajak1,Rajiv Kalia1,Aiichiro Nakano1,Priya Vashishta1
University of Southern California1
Show AbstractDynamic fracture of a two-dimensional MoWSe2 membrane is studied with molecular dynamics (MD) simulation. The system consists of a random distribution of WSe2 patches in a pre-cracked matrix of MoSe2. Under strain, the system shows toughening due to crack branching, crack closure and strain-induced structural phase transformation from the 2H to the 1T' crystal structure. The different structures generated during the MD simulation are analyzed using a three-layer, feed-forward neural network (NN) model. A training data set of 36,000 atoms is created in which each atom is represented by a 50-dimensional feature vector consisting of radial and angular symmetry functions. Hyperparameters of the symmetry functions and the network architecture are tuned to minimize model complexity while retaining high predictive power, using feature learning and Bayesian optimization, which increases model accuracy from 67% to 95%. The NN model classifies each atom into one of six phases: transition metal or chalcogen atoms in the 2H phase, the 1T' phase, or defects. Further t-SNE analyses of the learned representations of these phases in the hidden layers of the NN model show that the separation of all phases becomes clearer in the third layer than in layers 1 and 2.
4:15 PM - GI01.09.08
Revealing Ferroelectric Switching Character Using Deep Recurrent Neural Networks
Joshua Agar1,2,Brett Naul2,Shishir Pandya2,Stefan van der Walt2,Joshua Maher2,Ren Yao3,Tess Schmidt2,Jeffrey Neaton2,Sergei Kalinin4,Rama Vasudevan4,Ye Cao3,Joshua Bloom2,Lane Martin2
Lehigh University1,University of California, Berkeley2,The University of Texas at Arlington3,Oak Ridge National Laboratory4
Show AbstractThe ability to create and manipulate domain structures in ferroelectrics allows control of the phase and polarization orientation, imparting deterministic changes to the local and macroscale susceptibilities (e.g., electrical, thermal, mechanical, optical, etc.) and providing a foundation for next-generation devices. While there have been demonstrations of nanoscale manipulation and control of such structures, the majority of this work has focused on the static creation of desired domain structures and thus lacks the internal self-regulating feedback loop required for automatic operation in functional devices. Here, we develop an unsupervised sequence-to-sequence neural network, which considers the temporal dependence in the data, to extract inference from band-excitation piezoresponse spectroscopy (BEPS). To test our approach, we conducted BEPS on a tensile-strained PbZr0.2Ti0.8O3 thin film wherein strain drives the formation of a hierarchical c/a and a1/a2 domain structure. We develop and train a deep-learning-neural-network-based sparse autoencoder on the piezoresponse hysteresis loops to demonstrate parity with conventional approaches. We then apply this approach to extract insight from the resonance response, which has a form too complex to be properly analyzed using conventional techniques. Using the information learned, we identify geometrically driven differences in the switching mechanism that are related to charged-domain-wall nucleation and growth during ferroelastic switching. This insight could not have been extracted using conventional machine-learning approaches and provides unprecedented information about the nature of the specific domain-structure geometries that should be explored to enhance local and macroscale susceptibilities. Furthermore, the ability to automate the extraction of inference regarding ferroelectric switching from multichannel nanoscale spectroscopy provides a route for real-time controlled creation and verification of interconversion of functional domain structures and interfaces. The developed approach is extensible to other forms of multi-dimensional, hyperspectral (wherein there is a spectrum at each pixel) images which are commonly acquired in time-of-flight secondary-ion mass spectrometry, scanning Raman, electron energy loss spectroscopy, etc. To promote the utilization of this approach, we provide open access to all data and codes in the form of a Jupyter notebook. Ultimately, this work represents an example of how unsupervised deep learning can highlight features relating to ferroelectric physics overlooked by human-designed machine-learning algorithms, and how such approaches can be broadly adapted to analyze hyperspectral data.
4:30 PM - GI01.09.09
Deep Learning Bandgaps of Topologically Doped Graphene
Jian Lin1,Yuan Dong1,Chuhan Wu1,Chi Zhang1,Jianlin Cheng1
University of Missouri-Columbia1
Show AbstractManipulation of the physical and chemical properties of materials via precise doping has afforded an extensive range of tunable phenomena to explore. Recent advances show that, at the atomic and nano scales, the topological states of dopants play crucial roles in determining their properties. However, this dependence remains largely unexplored because of the enormous number of possible topological states, which makes it almost impractical to search by experiments or ab initio calculations. Here, we present a case study of using deep learning algorithms to predict bandgaps of boron-nitrogen pair doped graphene with random dopant topologies. In the study, the bandgaps are calculated by accurate first-principles calculations and, together with the structure information, are fed as datasets to train three types of convolutional neural networks (CNNs): a VGG16 convolutional network (VCN), a residual convolutional network (RCN), and a concatenate convolutional network (CCN). All three CNNs afford good prediction accuracy, outperforming a non-convolutional support vector machine (SVM). We further perform transfer learning by leveraging data generated from smaller systems to improve the prediction for large supercell systems. The success of this work provides a cornerstone for future investigation of topological doping in graphene and other 2D materials beyond graphene. Furthermore, it will stimulate widespread interest in applying DL algorithms to the topological design of materials across atomic, nano-, meso- and macro-scales.
GI01.10: Autonomous Research II
Session Chairs
Soojung Baek
Keith A. Brown
Kristofer Reyes
Friday AM, November 30, 2018
Hynes, Level 1, Room 102
9:00 AM - *GI01.10.01
Autonomous Materials Research Systems—Phase Mapping
Gilad Kusne1,2,Brian DeCost1,Jason Hattrick-Simpers1,Ichiro Takeuchi2
National Institute of Standards and Technology1,University of Maryland2
Show AbstractThe last few decades have seen significant advancements in materials research tools, allowing researchers to rapidly synthesize and characterize large numbers of samples - a major step toward high-throughput materials discovery. Machine learning has been tasked with aiding in converting the collected materials property data into actionable knowledge, and more recently it has been used to assist in experiment design. In this talk we present the next step in machine learning for materials research - autonomous materials research systems. We first demonstrate autonomous measurement systems for phase mapping, followed by a discussion of ongoing work in building fully autonomous systems. For the autonomous measurement systems, machine learning controls X-ray diffraction measurement equipment both in the lab and at the beamline to identify phase maps from composition spreads with a minimum number of measurements. The algorithm also capitalizes on prior knowledge in the form of physics theory and external databases, both theory-based and experiment-based, to more rapidly home in on the optimal results. Materials of interest include Fe-Ga-Pd, TiO2-SnO2-ZnO, and Mn-Ni-Ge.
9:30 AM - GI01.10.02
Streaming Data Analysis Software Taking Us Towards Autonomous Experimentation—SHED and Streamz
Christopher Wright1,Jason Hattrick-Simpers2,Simon Billinge1,3
Columbia University1,National Institute of Standards and Technology2,Brookhaven National Laboratory3
Show AbstractAutonomous experiments have three major parts: data acquisition, data analysis, and feedback. While all three are crucial to performing the experiment, the data analysis portion may be the most challenging. The data analysis must take raw data, produce meaningful quantities of interest, and translate them into actionable experimental parameters. Importantly, all of this must be done live. In this talk I will discuss our approach to streaming data analysis for x-ray total scattering measurements, using the analysis pipelining programs we have written, streamz and SHED. Streamz and SHED provide a simple, powerful, and easy-to-use way to build streaming data processing protocols in Python. The SHED system also provides seamless data provenance with minimal user input. Finally, we will discuss the application of these pipelines to kriging-driven autonomous experimentation for discovering novel glass-forming alloys via atomic pair distribution function analysis.
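A hedged Python sketch of a streaming pipeline in this spirit, using the streamz package (assumed installed), is shown below; SHED's provenance layer and the actual total-scattering reduction steps are not reproduced, and the processing functions are illustrative placeholders.

# Sketch: a live pipeline that maps raw frames to a quantity of interest.
import numpy as np
from streamz import Stream

def dark_subtract(img):
    return img - img.min()                 # placeholder detector correction

def integrate(img):
    return img.sum(axis=0)                 # placeholder azimuthal integration

def to_quantity_of_interest(pattern):
    return float(pattern.max())            # e.g. a peak height to feed back

source = Stream()
(source.map(dark_subtract)
       .map(integrate)
       .map(to_quantity_of_interest)
       .sink(lambda q: print("new value for the decision engine:", q)))

# Live data would be emitted as frames arrive from the detector:
for _ in range(3):
    source.emit(np.random.rand(64, 64))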
10:15 AM - *GI01.10.03
Industrial Waste Gas Mixture as a Feedstock in Efficient Carbon Nanotube Growth—Using an Autonomous Research System to Probe Growth Kinetics and Mechanisms
Placidus Amama1,Brian Everhart1,Pavel Nikolaev2,3,Rahul Rao2,3,Benji Maruyama2
Kansas State University1,Air Force Research Laboratory2,UES Inc.3
Show AbstractFischer–Tropsch synthesis (FTS) is an environmentally friendly process used in industry for the conversion of syngas (CO and H2) – usually obtained from low-value biomass, natural gas, and coal – to high-value hydrocarbon liquid fuels over transition metal catalysts (typically Fe or Co). The gaseous product mixture (FTS-GP) usually consists of C1-C4 hydrocarbons and unconverted CO and H2, which we use as a feedstock for the chemical vapor deposition (CVD) growth of carbon nanotubes (CNTs). A comparison of the growth curve of FTS-GP CVD using an Fe catalyst with other conventional CVD processes for CNT forest growth reveals growth behavior (in terms of catalyst lifetime and growth rate) superior to existing CVD approaches. The objective of this study is to develop the fundamental understanding required to couple catalytic CVD for CNT growth to the waste-gas stream of FTS (FTS-GP) for scalable, continuous, and controlled growth of CNT arrays. Due to the breadth of parameters that affect CVD growth of CNTs, rapid experimentation is necessary for effective growth-condition optimization and detailed understanding of the role of FTS-GP. Here we employ an autonomous research system (ARES), equipped with in situ Raman spectroscopy, to probe the growth kinetics of CNTs and elucidate the role of FTS-GP in enhancing catalyst lifetime and growth efficiency. ARES allows for both rapid, automated experimentation as well as autonomous growth, in which the system self-generates experiments to maximize growth rate based on previous results. Our study reveals the dependence of catalyst activity and lifetime on growth temperature, catalyst properties, feedstock composition, and FTS-GP partial pressure. As examples, the optimal growth temperature and FTS-GP partial pressure for CNT growth on an Fe catalyst were determined to be approximately 850 °C and 20 Torr, respectively. We have also studied the effects of continual water generation on the surface of a catalyst and its capability of enhancing growth by prolonging catalyst lifetimes. This study is expected to illuminate the complex interdependence of catalysts and carbon feedstock and facilitate rational design of catalysts and growth recipes for efficient and controlled CNT growth.
10:45 AM - GI01.10.04
The Odds of Synthesis—Predictions from Network Analysis and Phase Diagrams
Muratahan Aykol1
Toyota Research Institute1
Show AbstractSynthesis of new materials is a complex, multi-faceted process driven not only by thermodynamics or kinetics but also by the availability of precursors and techniques, expertise, intuition and many other circumstantial factors. With this complexity, predictive synthesis is emerging as the new grand challenge in materials discovery. In this talk, we will present a tractable informatics approach to predicting the likelihood of successful experimental synthesis of hypothetical materials, such as those identified via high-throughput (HT) density functional theory (DFT), prototype searches or various other modeling techniques. The method combines network interpretation of the free energy-composition space obtained from HT-DFT, i.e. the convex-hull, and the discovery timeline of materials extracted from publications, to build accurate machine-learning models that forecast probability of successful synthesis in the laboratory. *This work was done in collaboration with Santosh Suram, Patrick Herring, Linda Hung, Vinay Hegde, Jens Hummelshoej and Chris Wolverton.
11:00 AM - GI01.10.06
A Bayesian Framework for Selection, Calibration and Uncertainty Quantification of Thermodynamic Property Models
Noah Paulson1,Elise Jennings1,Marius Stan1
Argonne National Laboratory1
Show AbstractThermodynamic property models form the basis of numerous technologically important applications including the calculation of equilibrium phase diagrams and the simulation of microstructure evolution. Traditionally, the selection of models, weighting of data and removal of outliers are informed by the intuition of the researcher. Furthermore, model calibration is deterministic, rarely resulting in uncertainty intervals for the model predictions. In this work, we present a framework for the selection, calibration and uncertainty quantification (UQ) of thermodynamic property models. Enabled by recent advances in numerical sampling algorithms, we employ fully Bayesian methods to address each of these tasks in addition to common issues seen in thermodynamic data including outliers, inaccurately reported or missing error bars and systematic errors. In addition, the framework enforces consistency between the thermodynamic quantities while optimally leveraging data from all available sources. The framework is demonstrated through the construction of thermodynamic property models for the alpha, beta and liquid phases of Hafnium metal for specific heat, enthalpy, entropy and Gibbs free energy.
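A minimal Python sketch of the Bayesian calibration idea is given below: a two-parameter specific-heat model is fit to noisy synthetic data with a hand-rolled Metropolis sampler, yielding posterior samples and credible intervals; the actual framework uses more sophisticated samplers, coupled thermodynamic models, and explicit outlier handling.

# Sketch: Metropolis sampling of the posterior of Cp(T) = a + b*T on toy data.
import numpy as np

rng = np.random.default_rng(7)
T = np.linspace(300, 1500, 40)
cp_obs = 25.0 + 0.006 * T + rng.normal(0, 0.3, T.size)   # synthetic data

def log_posterior(theta):
    a, b = theta
    if not (0 < a < 100 and 0 < b < 0.1):     # flat priors with bounds
        return -np.inf
    resid = cp_obs - (a + b * T)
    return -0.5 * np.sum(resid**2) / 0.3**2   # Gaussian likelihood

theta = np.array([20.0, 0.005])
logp = log_posterior(theta)
samples = []
for _ in range(20000):
    prop = theta + rng.normal(0, [0.2, 1e-4])
    logp_prop = log_posterior(prop)
    if np.log(rng.random()) < logp_prop - logp:    # Metropolis acceptance
        theta, logp = prop, logp_prop
    samples.append(theta)

samples = np.array(samples[5000:])                 # discard burn-in
lo, hi = np.percentile(samples, [2.5, 97.5], axis=0)
print("95% credible intervals for (a, b):", list(zip(lo, hi)))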
11:15 AM - GI01.10
Panel Discussion: AI for Disruptive Manufacturing
GI01.11: Late News—Machine Learning and Data-Driven Materials Development and Design
Session Chairs
Soojung Baek
John J. Boeckl
Keith A. Brown
Friday PM, November 30, 2018
Hynes, Level 1, Room 102
1:45 PM - GI01.11.02
Learning UV-Vis Spectroscopy from Images on the World's Largest Experimental Materials Database
Helge Stein1,Dan Guevarra1,Paul Newhouse1,Edwin Soedarmadji1,John Gregoire1
California Institute of Technology1
Show AbstractUV-Vis spectroscopy is the first step in assessing light absorbers for solar fuels generation, but the community lacks sufficiently large experimental datasets and predictive models for experimental optical properties. Based on the largest and most diverse experimental materials science dataset of 180,902 distinct materials, spanning 45 elements and including more than 80,000 unique quinary oxide and 67,000 unique quaternary oxide compositions, we trained different deep neural nets that enable us to predict complete UV-Vis absorption spectra from an image of a material sample. The models learn how to spectrally hyperscale from a low-energy but high-spatial-resolution input. Extracting direct bandgaps from the predicted spectra yields a bandgap prediction accuracy of 0.2 eV RMSE, which is well within the uncertainty of traditional bandgap extraction. Building upon these models, we will present a one-million-sample experimental materials image dataset with complete data lineage and UV-Vis characterization. We will discuss methods and challenges in predicting optical properties from composition featurizers and chart pathways to autonomous experiment planning using state-of-the-art visualization tools.
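For context, the sketch below shows one conventional way to extract a direct bandgap from an absorption spectrum (a Tauc-style linear extrapolation), i.e., the step the predicted UV-Vis spectra would feed into; the spectrum is synthetic.

# Sketch: Tauc-style direct-bandgap estimate from a synthetic absorption spectrum.
import numpy as np

rng = np.random.default_rng(8)
hv = np.linspace(1.0, 3.5, 300)                      # photon energy (eV)
Eg_true = 2.1
alpha = np.where(hv > Eg_true, np.sqrt(hv - Eg_true), 0.0) + 0.01 * rng.random(300)

tauc = (alpha * hv)**2                               # direct-allowed transition
# Fit the steep quasi-linear region and extrapolate to the energy axis.
mask = (tauc > 0.1 * tauc.max()) & (tauc < 0.6 * tauc.max())
slope, intercept = np.polyfit(hv[mask], tauc[mask], 1)
print("estimated direct bandgap (eV):", -intercept / slope)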
2:00 PM - GI01.11.03
Material Image Segmentation with Machine Learning Method and Complex Network Method
Yuexing Han1,Chuanbin Lai1,Leilei Song1,Qian Li1,Hui Gu1
Shanghai University1
Show AbstractThe study of the relationships among the manufacturing process, the structure and the properties of materials can help in developing new materials. Material images contain the microstructures of materials; therefore, quantitative analysis of material images is an important means of studying the characteristics of material structures. Generally, quantitative analysis of material microstructures is based on exact segmentation of the material images. However, most material microstructures appear with various shapes and complex textures in images, which seriously hinders exact segmentation of the component elements. In this research, machine learning and complex network methods are applied to the challenge of automatic material image segmentation. Two segmentation tasks are completed: first, images of a titanium alloy are segmented by pixel-level classification through feature extraction and a machine learning algorithm; second, ceramic images are segmented with complex network theory. In the first task, texture and shape features near each pixel in the titanium alloy image are calculated, such as Gabor filters, Hu moments and the GLCM (gray-level co-occurrence matrix). A feature vector for each pixel is obtained by arraying these features, and classification is then performed with a random forest model. Once each pixel is classified, the image segmentation is complete. In the second task, a complex network structure is built for the ceramic image using the K-means algorithm and a gridding method. A complex network clustering algorithm is then used to obtain connected network regions. Finally, the clustered network structure is mapped back to the image, yielding the contours between the component elements. The experimental results demonstrate that these methods can accurately segment material images. The segmentation methods can provide the data foundation for further quantitative analysis of material microstructures.
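A compact Python sketch of the first task is given below: simple per-pixel features are computed for a synthetic micrograph and a random forest assigns each pixel to a phase from a sparse set of labeled pixels; the feature set is reduced relative to the Gabor/Hu/GLCM features described above.

# Sketch: pixel-level classification for segmentation with a random forest.
import numpy as np
from skimage.filters import gaussian, sobel
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(9)
img = rng.random((128, 128))
img[32:96, 32:96] += 0.8                 # a brighter "second phase" region

def pixel_features(image):
    feats = [image, gaussian(image, sigma=2), gaussian(image, sigma=5), sobel(image)]
    return np.stack([f.ravel() for f in feats], axis=1)

X = pixel_features(img)
truth = np.zeros_like(img, dtype=int)
truth[32:96, 32:96] = 1

# Train on a sparse random subset of "annotated" pixels, predict the rest.
idx = rng.choice(img.size, size=500, replace=False)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[idx], truth.ravel()[idx])
segmentation = clf.predict(X).reshape(img.shape)
print("pixel accuracy:", (segmentation == truth).mean())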
2:15 PM - GI01.11.04
On Converting Material Phase Dynamic Transformation Problem into Material Video Frame Variation Problem
Quan Qian3,Guangtai Ding1,2,Jianxun Fu3,2,Huiran Zhang1,2,Tong-Yi Zhang2
School of Computer Engineering and Science, Shanghai University1,Materials Genome Institute of Shanghai University2,School of Material Science & Engineering, Shanghai University3
Show AbstractIn this paper, based on dynamic image analysis techniques, material videos or image sequences captured by a high-temperature laser scanning confocal microscope are analyzed to reveal and interpret aspects of martensitic transformation and related phenomena. The main goal of the research is to establish a quantitative relationship between the microstructural features of the materials, especially their two-dimensional spatial and one-dimensional temporal characteristics, and their video image features. Dynamic image analysis theory and algorithms oriented toward the martensitic phase transformation problem are the main focus. The methods and algorithms in this paper can be applied and generalized to the analysis of austenite, ferrite, pearlite and other metal phase transformations.
3:00 PM - GI01.11.05
Machine Learning Method for Parameter Development in Additive Manufacturing
Voramon Dheeradhada1,Natarajan Chennimalai Kumar1,Laura Dial1,Vipul Gupta1,Tim Hanlon1,Joe Vinciquerra1,Jim Grande1
GE Global Research1
Show AbstractRecent development of nickel superalloys for additive manufacturing has proven challenging due to the susceptibility to microcracking in the as-built microstructure. Significant effort has gone into optimizing build parameters for these hard-to-process alloys. A new protocol was developed by leveraging machine learning algorithms to accelerate the development cycle. In this paper, examples of the use of machine learning methods to guide parameter development for hard-to-weld alloys will be presented.
3:15 PM - GI01.11.06
Development of a Machine Learning-Based Approach for Diffusion Studies in Crystals—Application to Diffusion in III-V Semiconductors
Mardochee Reveil1,Paulette Clancy2
Corning Incorporated1,Cornell University2
Show AbstractRecent developments in machine learning have created unprecedented opportunities for incorporating artificial intelligence techniques in scientific research in a variety of fields, including medicine, astronomy and robotics. Application of such advanced techniques to the molecular design of materials is still in its infancy and, indeed, lags behind the fields mentioned above. Here, we explore and develop a novel approach whereby recent advances in machine learning techniques are used to study diffusion in crystals, which is critically important for the semiconductor industry. We show that this method offers a viable alternative to traditional techniques used to qualitatively (i.e., gain mechanistic insights) and quantitatively uncover how different species diffuse in a crystal lattice. We explain how this method can be applied to the study of defect diffusion in III-V semiconductors. III-V materials, like GaAs or InGaAs, are a promising class of materials for use in next-generation computing devices for their combination of high performance and better heat management characteristics. But the combinatorial nature of this design space makes it challenging to efficiently explore potential candidates. By providing enhanced screening capabilities, this new method represents a significant step in the right direction for faster design of next-generation materials. Finally, we explain why machine learning could be a powerful tool to help tackle other traditional chemical engineering problems.
3:30 PM - GI01.11.07
Combined Data-Driven Identification and Physically-Based Understanding to Materials Development—Building a Solid Process-Structure-Property Link in Ga-Doped ZnO Films
Yuyun Chen1,Feng Huang1
Ningbo Institute of Materials Technology and Engineering1
Show AbstractBuilding a solid process-structure-property link to improve electrical properties effectively is difficult, especially in vapor-deposited functional films with hierarchical structures. Here, we have introduced a semi-empirical method combining top-down data-driven identification and bottom-up physically-based understanding to build such a link for magnetron-sputtered Ga-doped ZnO (GZO) films. An artificial neural network (ANN) was utilized to identify the most correlative inverse structure-property and process-structure relationships. Moreover, a physically-based understanding of the ANN results was used to examine the rationality of the identified inverse relationships and capture the dominant mechanism for further materials design. It has been demonstrated that this combined method can identify the feasible process space to tailor the structures at a "correct length scale" and thus significantly improve the conductivity of our GZO films. Our semi-empirical method is probably valid for effective enhancement of the physical properties of other vapor-deposited thin films.
3:45 PM - GI01.11.08
Exploring Large Scale ToF-SIMS Data Matrices Using Artificial Neural Networks—Polymers and Biointerfaces
Paul Pigram1,Robert Madiona1,Nicholas Welch2,David Winkler1,2,3,Benjamin Muir2
La Trobe University1,CSIRO Manufacturing2,Monash University3
Show AbstractTime-of-flight secondary ion mass spectrometry (ToF-SIMS) is continuously advancing. The data sets now being generated are growing dramatically in complexity and size. More sophisticated data analytical tools are required urgently for the efficient and effective analysis of these large, rich data sets. Standard approaches to multivariate analysis are being customised to decrease the human and computational resources required and provide a user-friendly identification of trends and features in large ToF-SIMS datasets.
We demonstrate the generation of very large ToF-SIMS data matrices using mass segmentation of spectral data in the range 0 – 500 m/z in intervals ranging from 0.01 m/z to 1 m/z. No peaks are selected and no peak overlaps are resolved. Sets of spectra are calibrated and normalized then segmented and assembled into data matrices. Manual processing is greatly reduced and the segmentation process is universal, avoiding the need to tailor or refine peak lists for difficult sample types or variants.
ToF-SIMS data for standard polymers (PET, PTFE, PMMA and LDPE) and for a group of polyamides are used to demonstrate the efficacy of this approach. The polymer types of differing composition are discriminated to a moderate extent using PCA. PCA fails for polymers of similar composition and for data sets incorporating significant random variance.
In contrast, artificial neural networks, in the form of self-organising maps (SOMs), deliver an excellent outcome in classifying and clustering different and similar polymer types, and in clustering spectra from a single polymer type generated using different primary ions. This method offers great promise for the investigation of more complex bio-oriented systems.
We compare the analysis of large scale mass segmented matrices with those formed using conventional selection of ToF-SIMS peak lists. SOMs are used to cluster and discriminate antibody fragments bound at surfaces and to demonstrate antibody orientation in optimised ELISA format assays.
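A hedged Python sketch of SOM-based clustering of mass-segmented spectra is shown below, using the MiniSom package (assumed available); the "spectra" are random placeholders for normalized, mass-binned ToF-SIMS intensity vectors and the two synthetic classes stand in for different polymer types.

# Sketch: train a self-organising map on mass-binned spectra and inspect which
# nodes each synthetic class occupies (MiniSom assumed installed).
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(10)
# Two synthetic "polymer classes" with different intensity patterns.
class_a = rng.random((50, 500)) * np.linspace(1, 0, 500)
class_b = rng.random((50, 500)) * np.linspace(0, 1, 500)
spectra = np.vstack([class_a, class_b])
spectra /= spectra.sum(axis=1, keepdims=True)        # total-ion normalization

som = MiniSom(8, 8, 500, sigma=1.5, learning_rate=0.5, random_seed=0)
som.train_random(spectra, 5000)

# Samples mapping to nearby nodes are spectrally similar; list occupied nodes.
for label, group in (("A", class_a), ("B", class_b)):
    nodes = {som.winner(s / s.sum()) for s in group}
    print(f"class {label} occupies SOM nodes:", sorted(nodes)[:5], "...")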
4:00 PM - GI01.11.09
Machine Learning with Force-Field-Inspired Descriptors for Materials—Fast Screening and Mapping Energy Landscape
Kamal Choudhary1
National Institute of Standards and Technology1
Show AbstractWe present a complete set of chemo-structural descriptors to significantly extend the applicability of machine learning (ML) in material screening and in mapping the energy landscape for multicomponent systems. These descriptors allow differentiating between structural prototypes, which is not possible using the commonly used chemical-only descriptors. Specifically, we demonstrate that the combination of pairwise radial, nearest-neighbor, bond-angle, dihedral-angle, and core-charge distributions plays an important role in predicting formation energies, band gaps, static refractive indices, magnetic properties, and modulus of elasticity for three-dimensional materials as well as exfoliation energies of two-dimensional (2D)-layered materials. The training data consist of 24,549 bulk and 616 monolayer materials taken from the JARVIS-DFT database. We obtained very accurate ML models using a gradient-boosting algorithm. We then use the trained models to discover exfoliable 2D-layered materials satisfying specific property requirements. Additionally, we integrate our formation-energy ML model with a genetic algorithm for structure search to verify whether the ML model reproduces the density-functional-theory convex hull. This verification establishes a more stringent evaluation metric for the ML model than what is commonly used in data science. Our trained models are publicly available on the JARVIS-ML website (https://www.ctcms.nist.gov/jarvisml), enabling property predictions for general materials.
4:15 PM - GI01.11.10
Crystal Site Feature Embedding Enables Deep Image Recognition Based Exploration of Chemical Spaces Exceeding One Billion Compounds
Mikhail Askerka1,Kevin Ryczko2,Oleksandr Voznyy1,Kyle Mills2,Isaac Tamblyn3,2,Edward Sargent1
University of Toronto1,University of Ottawa2,National Research Council3
Show AbstractRecent years have seen rapid advancements in artificial intelligence, with computer vision methods achieving >96% accuracy on image classification problems. Mapping materials science problems onto computational frameworks that can leverage advances in image recognition could accelerate the discovery of new materials for applications ranging from energy storage to efficient photon capture. Here we translate the problem of ordering atoms in crystals into an image recognition problem. In Crystal Site Feature Embedding (CSFE), we partition the crystalline lattice into sites according to their spatial arrangement and map it onto a 4D feature vector. This volumetric vector is analogous to a 3D color image, where the first three dimensions reflect the sites' arrangement and the fourth dimension corresponds to the sites' physical properties. We show that this compact, position-agnostic representation carries sufficient physical insight to machine-learn materials properties predicted by complex and time-consuming electronic structure methods such as Density Functional Theory (DFT). By using CSFE to leverage image recognition techniques such as convolutional and extensive deep neural networks, we achieve an impressively low mean absolute test error of 3.5 meV/atom on DFT total energies and 0.07 eV on DFT bandgaps of mixed halide perovskites. This enables us to capture nontrivial property trends, such as the U-shape of the bandgap of MAPbxSn(1-x)I3, even though the learned algorithm was not explicitly trained on any of the intermediate compositions that make up this nonmonotonic bandgap vs. composition behavior. The method provides an unprecedented >10^10 acceleration factor compared to DFT alone. Additionally, we use CSFE to explore chemical spaces beyond those used for training, taking advantage of Brillouin zone folding and the nature of deep image recognition methods.
4:30 PM - GI01.11.11
Data-Driven Equation Discovery—Peak Current of Cyclic Voltammetry Simulations
Tong-Yi Zhang1,Sheng Sun1
Shanghai University1
Show AbstractCyclic voltammetry is a popular technique in electrochemistry for studying redox reaction kinetics at the electrode/electrolyte interface. A cyclic voltammogram (CV) plots the current as a function of a periodic, linearly swept potential. The peak current (Ip) in a CV under different experimental setups and reaction conditions is a significant indicator of the underlying reaction kinetics. However, an explicit expression for Ip cannot generally be derived theoretically. Here, we demonstrate the success of data-driven model discovery for Ip by using symbolic regression (SyR) combined with sparse regression (SpR). First, SyR and SpR were shown to reproduce exactly the widely used expressions of Ip as a function of diffusion constant, potential scan rate, reaction constant and initial concentration of the oxidized species for reversible and irreversible reactions. Then, a very accurate expression for Ip across all reaction regimes, including reversible, quasi-reversible and irreversible reactions, was obtained by SyR when a small amount of expert knowledge was introduced. This preliminary work indicates that SyR should be a powerful tool for finding expressions for Ip for more complex reactions with other influencing factors, such as stress fields and electrode shape.