Rachel Kurchin1
Carnegie Mellon University1
Whether or not we take an explicitly probabilistic perspective, and whether we use black-box models or traditional physics-based ones, learning from and with distributions (of data, probability, and/or parameters) is an important tool in the data-driven modeler’s toolbox. In this talk, I’ll present two stories from my group’s work emphasizing why this is important.

First, I’ll discuss past and present work using Bayesian parameter estimation to characterize materials and interfaces in photovoltaic devices. In this work, we use a drift-diffusion solver to simulate current-voltage curves for a variety of materials properties of both the layers and the interfaces within the device. By comparing these simulated curves to measured data, we can effectively “invert” the device model to obtain posterior distributions reflecting our knowledge of the model’s input parameters given measurements of its output.

Next, I’ll shift focus to ongoing work utilizing diffusion models (the same type of machine learning model behind generative image tools such as DALL-E and Midjourney) to generate realistic microstructures of solid-oxide cell electrodes for use in device degradation simulations. A key aspect of validating these models is running their outputs through software that computes various microstructural properties of interest, including phase fractions, interfacial areas, and triple phase boundary densities. In particular, we compare the distributions of these properties in the generated data with those in the training data, and investigate whether, as has been reported in other work, diffusion models are less prone to issues such as mode collapse than other types of generative models such as generative adversarial networks.

In both of these stories, understanding how we can learn from and with both data and distributions is paramount.
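The “inversion” described in the first story can be sketched with a minimal grid-based Bayesian example. Everything here is an illustrative stand-in: the one-parameter diode-style forward model `simulate_iv` plays the role of a real drift-diffusion solver, and the noise level and parameter range are assumed values, not ones from the work described above.

```python
import numpy as np

def simulate_iv(voltage, barrier):
    """Toy forward model: current-voltage curve as a function of one
    material parameter. Stands in for a drift-diffusion simulation."""
    return 1e-9 * (np.exp(voltage / (0.0259 * barrier)) - 1.0)

# Synthetic "measurement": forward model at a known parameter plus noise
voltage = np.linspace(0.0, 0.6, 50)
true_param = 1.3
rng = np.random.default_rng(0)
measured = simulate_iv(voltage, true_param) + rng.normal(0.0, 1e-4, voltage.size)

# Uniform prior over a grid of candidate parameter values; Gaussian
# likelihood from the misfit between simulated and measured curves
param_grid = np.linspace(1.0, 2.0, 201)
sigma = 1e-4  # assumed measurement noise level
log_post = np.array([
    -0.5 * np.sum((simulate_iv(voltage, p) - measured) ** 2) / sigma**2
    for p in param_grid
])
log_post -= log_post.max()                    # stabilize before exponentiating
posterior = np.exp(log_post)
posterior /= np.trapz(posterior, param_grid)  # normalize to a density

map_estimate = param_grid[np.argmax(posterior)]
print(f"MAP estimate: {map_estimate:.3f} (true value: {true_param})")
```

The posterior array is the quantity of interest: rather than a single best-fit value, it encodes how strongly the measured curve constrains the parameter, which is exactly the kind of information a point estimate discards.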
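The validation step in the second story, comparing property distributions between generated and training microstructures, can likewise be sketched in a few lines. The random three-phase voxel arrays below are placeholders for real and generated electrode microstructures, and phase fraction stands in for the fuller set of properties (interfacial areas, triple phase boundary densities) mentioned above.

```python
import numpy as np

def phase_fraction(voxels, phase):
    """Fraction of voxels carrying a given phase label."""
    return np.mean(voxels == phase)

rng = np.random.default_rng(42)
# Placeholder 3-phase voxel grids (0 = pore, 1 = electrode, 2 = electrolyte)
training  = [rng.integers(0, 3, size=(16, 16, 16)) for _ in range(100)]
generated = [rng.integers(0, 3, size=(16, 16, 16)) for _ in range(100)]

# Compute the property for every sample in each set
train_fracs = np.sort([phase_fraction(v, phase=1) for v in training])
gen_fracs   = np.sort([phase_fraction(v, phase=1) for v in generated])

# Simple distributional comparison: for equal-size samples, the 1-D
# Wasserstein distance is the mean absolute difference of sorted values
w1 = np.mean(np.abs(train_fracs - gen_fracs))
print(f"train mean: {train_fracs.mean():.3f}, generated mean: {gen_fracs.mean():.3f}")
print(f"1-D Wasserstein distance: {w1:.4f}")
```

A generator suffering from mode collapse would show up here as a generated distribution that is far narrower than the training one, even if its mean looks reasonable, which is why comparing whole distributions rather than summary averages matters for this validation.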
Insights into how the physical processes in our systems of interest should shape the values and distributions in our datasets are critically important to incorporate into the way we analyze and learn from those data.