Panayotis Manganaris¹, Jiaqi Yang¹, Arun Kumar Mannodi Kanakkithodi¹
¹Purdue University
We report a materials design pipeline for halide perovskites whose primary objective is to recommend perovskite alloy compositions that match targeted properties. Here, targets are chosen to yield stable compounds with high photovoltaic (PV) performance. We therefore focus on models of the electronic band gap, the decomposition energy (and hence perovskite stability), and PV efficiency. We use Spectroscopic Limited Maximum Efficiency (SLME) for synthetic data and Power Conversion Efficiency (PCE) for physical data. We use nanoHUB, an NSF-funded, Purdue-based computational repository, to host reproducible notebooks documenting our model development workflow [1]. We also host an interoperable database and inverse design pipeline for public access, enabling the scientific community to use and improve these tools. In this way, perovskites with alternative optoelectronic properties, possibly targeting quantum computing or metrology, may also be discovered. The design pipeline makes recommendations using continuous surrogate models trained to connect a discretely sampled composition space to the targeted properties. A Genetic Algorithm (GA) selects optimal compositions, where fitness involves minimizing the Euclidean distance between the predicted and targeted properties while ensuring compositional feasibility.

Our GA optimizer prioritizes exploration, so locally optimal candidates can be found without fixating on a small subset of fit regions. GAs are also efficient in high-dimensional spaces. In our previously published work [2], the discrete alloy space defined by a finite supercell grows combinatorially with supercell size. Moving from discrete sampling to continuous surrogate models extends this scaling to the limit of infinite resolution.
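To make the fitness definition concrete, the following is a minimal sketch of a GA whose fitness combines the Euclidean distance to a property target with a feasibility penalty. It is not the pipeline's actual implementation: the site layout (`SITE_SLICES`), the linear `toy_surrogate`, the penalty weight, the target values, and the selection, crossover, and mutation choices are all illustrative assumptions.

```python
import numpy as np

# Assumed layout: 3 A-site, 2 B-site, and 3 X-site species (8 mixing fractions).
SITE_SLICES = [slice(0, 3), slice(3, 5), slice(5, 8)]

# Hypothetical stand-in for a trained surrogate: a fixed linear map from
# composition to two properties (e.g. band gap and decomposition energy).
WEIGHTS = np.random.default_rng(1).uniform(0.0, 2.0, size=(2, 8))


def toy_surrogate(x):
    return WEIGHTS @ x


def feasibility_penalty(x):
    """Total deviation of each site's mixing fractions from summing to 1."""
    return sum(abs(x[s].sum() - 1.0) for s in SITE_SLICES)


def fitness(x, surrogate, target, penalty_weight=10.0):
    """Lower is better: distance to the target plus an infeasibility penalty."""
    distance = np.linalg.norm(surrogate(x) - target)
    return distance + penalty_weight * feasibility_penalty(x)


def run_ga(surrogate, target, n_genes=8, pop_size=60, generations=100, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, n_genes))
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        scores = np.array([fitness(x, surrogate, target) for x in pop])
        history.append(scores.min())
        # Truncation selection: the fitter half survives unchanged (elitism).
        elite = pop[np.argsort(scores)[: pop_size // 2]]
        n_children = pop_size - len(elite)
        parents_a = elite[rng.integers(len(elite), size=n_children)]
        parents_b = elite[rng.integers(len(elite), size=n_children)]
        # Uniform crossover followed by Gaussian mutation, clipped to [0, 1].
        mask = rng.random(parents_a.shape) < 0.5
        children = np.where(mask, parents_a, parents_b)
        children += rng.normal(0.0, 0.05, size=children.shape)
        pop = np.vstack([elite, np.clip(children, 0.0, 1.0)])
    scores = np.array([fitness(x, surrogate, target) for x in pop])
    history.append(scores.min())
    return pop[np.argmin(scores)], history


if __name__ == "__main__":
    target = np.array([1.5, 0.5])  # hypothetical property targets
    best, history = run_ga(toy_surrogate, target)
    print("best composition:", np.round(best, 3))
    print("best fitness:", history[-1])
```

Because the surviving half of each generation is carried over unchanged, the best fitness in this sketch never gets worse from one generation to the next. A penalty term is only one way to handle feasibility; renormalizing each site's fractions after mutation is a common alternative.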
We provide machine learning (ML) models for the optimizer to work on: a rigorously optimized Random Forest Regressor (RFR), a Gaussian Process (GP) Regressor, and a Sure Independence Screening and Sparsifying Operator (SISSO) regressor. Naturally, the GA's solutions can only be as good as the accuracy of these ML models.

We subdivide approximately 1000 physical and synthetic records into tables by record accuracy. The largest table, of ~500 compounds, contains optoelectronic properties simulated using density functional theory (DFT). Here, geometry optimization of pseudo-cubic ABX₃ supercells with arbitrary mixing at each site is followed by static band structure and optical absorption calculations performed at the GGA-PBE level of theory. In addition, ~300 compounds were subjected to more expensive hybrid HSE06 computations, both with and without spin-orbit coupling (SOC), to obtain more accurate electronic properties. Finally, ~100 of the same compositions also have experimental measurements of band gap and efficiency.

We combine the tables using multi-fidelity modeling techniques so that each of our architectures can make predictions with physical accuracy. Currently, we use only composition information as descriptors. This simplifies featurization because a composition vector can be obtained procedurally, for experimental and synthetic data alike, simply by parsing a string encoding the ABX₃ perovskite formula of each record.

No more than this composition vector is needed to make a prediction. However, additional predictors derived from elemental properties, easily obtained from the trusted Mendeleev databases, are included in model design. Finally, we discuss using state-of-the-art graph-based surrogates to extend this optimization loop to non-cubic phases. The nanoHUB pipeline is continually improved with fresh DFT data, ML models, and design tools.

[1] P. Manganaris, et al. MRS ICMS tutorial, 2022. URL https://nanohub.org/resources/36041?rev=90.
[2] A. Mannodi-Kanakkithodi and M. K. Y. Chan. Data-driven design of novel halide perovskite alloys. Energy Environ. Sci., 15, 1930–1949, 2022. URL http://dx.doi.org/10.1039/D1EE02971A.
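As an illustration of the composition-only featurization described in the abstract, the sketch below parses a formula string into a fixed-length vector of per-site mixing fractions. The species vocabulary in `SITES`, the formula convention (a species token followed by an optional amount, such as `MA0.5Cs0.5PbI3`), and the function name `composition_vector` are assumptions made for this example, not the pipeline's actual parser.

```python
import re

import numpy as np

# Assumed species vocabulary, grouped by perovskite site (illustrative only).
SITES = {
    "A": ["K", "Rb", "Cs", "MA", "FA"],
    "B": ["Ca", "Sr", "Ba", "Ge", "Sn", "Pb"],
    "X": ["Cl", "Br", "I"],
}
SPECIES = [s for members in SITES.values() for s in members]

# Try longer tokens first so two-letter symbols are matched before "K" or "I".
_TOKEN = re.compile(
    "(" + "|".join(sorted(SPECIES, key=len, reverse=True)) + r")(\d+(?:\.\d+)?)?"
)


def composition_vector(formula):
    """Return per-site mixing fractions, ordered as in SPECIES."""
    tokens = _TOKEN.findall(formula)
    # Reject formulas containing anything outside the known vocabulary.
    if "".join(symbol + amount for symbol, amount in tokens) != formula:
        raise ValueError(f"cannot parse formula: {formula!r}")
    amounts = dict.fromkeys(SPECIES, 0.0)
    for symbol, amount in tokens:
        amounts[symbol] += float(amount) if amount else 1.0
    vector = []
    for site, members in SITES.items():
        total = sum(amounts[m] for m in members)
        if total == 0.0:
            raise ValueError(f"no {site}-site species found in {formula!r}")
        # Normalize so the fractions on each site sum to 1.
        vector.extend(amounts[m] / total for m in members)
    return np.array(vector)


if __name__ == "__main__":
    for name, fraction in zip(SPECIES, composition_vector("MA0.5Cs0.5PbI2Br")):
        if fraction > 0:
            print(f"{name}: {fraction:.3f}")
```

Normalizing per site means the same vector results whether the formula is written per formula unit or per supercell, which is one way to keep experimental and simulated records comparable.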