The rapid growth of data science and machine learning across the sciences is starting to change how battery degradation research is performed. In this tutorial, we introduce both the standard data science tools in common use and the tools being created to solve specific challenges in electrochemical energy storage research. Topics will include how to store and retrieve characterization data from web-enabled databases, the emerging landscape of tools for analyzing and visualizing battery data, an introduction to several classes of machine learning techniques, and an overview of tools for generating synthetic data. Each session will briefly cover the principles of its subject (e.g., how neural networks are trained) as well as practical tools for applying these approaches in your own research.
Introduction to Battery Data Science
Logan Ward, Argonne National Laboratory
We will start the morning session with an overview of the four pillars of battery data science: databases, analysis tools, machine learning, and simulation. This overview will show how these subjects link together and present motivating examples of how they have been used to speed the development of new batteries.
Schemas, Formats, and Databases for Testing Data
Amalie Trewartha, Toyota Research Institute
The future of battery data science will be supported by an ecosystem of battery testing databases. Our first session will teach the approaches and tools needed to access data in these databases and to store new data in them. We will explain the efforts to standardize the formats used to describe battery data, introduce tools that convert data from battery testers or battery systems into these formats, and show how to access large sets of battery data from several of the services that store it.
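As a minimal sketch of what converting tester output into a common format involves, the Python example below renames one tester's CSV columns to a shared schema and casts each row into a typed record. The column names in `COLUMN_MAP` and the fields of `Record` are hypothetical placeholders, not the field names of any published battery data standard; columns with no mapping (here, an auxiliary temperature channel) are simply dropped.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical mapping from one tester's column headers to a common schema.
# Real standards define their own field names, units, and sign conventions.
COLUMN_MAP = {
    "Cycle_Index": "cycle_number",
    "Test_Time(s)": "test_time_s",
    "Current(A)": "current_a",
    "Voltage(V)": "voltage_v",
}

@dataclass
class Record:
    """One time-series sample in the (hypothetical) common schema."""
    cycle_number: int
    test_time_s: float
    current_a: float
    voltage_v: float

def convert(raw_csv: str) -> list[Record]:
    """Rename tester columns to schema fields and cast each value to its type."""
    records = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Keep only columns the schema knows about, under their new names
        fields = {COLUMN_MAP[k]: v for k, v in row.items() if k in COLUMN_MAP}
        records.append(Record(
            cycle_number=int(fields["cycle_number"]),
            test_time_s=float(fields["test_time_s"]),
            current_a=float(fields["current_a"]),
            voltage_v=float(fields["voltage_v"]),
        ))
    return records

raw = """Cycle_Index,Test_Time(s),Current(A),Voltage(V),Aux_Temp(C)
1,0.0,0.0,3.20,25.1
1,10.0,1.1,3.45,25.3
"""
records = convert(raw)
```

Putting units directly into the field names is one simple way to keep them unambiguous once data from different testers are merged.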
Analyzing and Manipulating Data with the PyData Stack
Valerio De Angelis, Sandia National Laboratories
There is a growing set of tools for performing standard analysis and visualization tasks with little effort. This session will introduce several open-source tools for visualizing data with web-based tools and preparing it for analysis and machine learning. By the end of the session, learners will have the knowledge and tools to build battery data repositories and analysis applications for their groups.
Practical exercises for storing, accessing, visualizing, and analyzing battery testing data. Participants will complete a problem set covering common data analysis tasks using public data and open-source tools.
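A typical first analysis task is turning raw time-series samples into per-cycle metrics. The NumPy sketch below coulomb-counts the discharge capacity of each cycle and reports capacity retention relative to the first cycle. It assumes that discharge current is negative and that each cycle's samples are contiguous and time-ordered; the function names and the two toy cycles (integrated capacities of 1.0 and 0.9 Ah) are illustrative. In a pandas workflow, the per-cycle loop would usually become a `groupby` on the cycle column.

```python
import numpy as np

def discharge_capacity_per_cycle(cycle, time_s, current_a):
    """Coulomb-count discharge capacity (Ah) for each cycle number.

    Assumes discharge current is negative; charge current is ignored.
    Assumes each cycle's samples are contiguous and sorted by time.
    """
    capacities = {}
    for c in np.unique(cycle):
        mask = cycle == c
        t = time_s[mask]
        i_dis = np.abs(np.minimum(current_a[mask], 0.0))  # discharge current magnitude
        # Trapezoidal integration of |I| dt gives A*s; divide by 3600 for Ah
        charge_as = np.sum(0.5 * (i_dis[1:] + i_dis[:-1]) * np.diff(t))
        capacities[int(c)] = charge_as / 3600.0
    return capacities

def capacity_retention(capacities):
    """Each cycle's capacity as a fraction of the first cycle's capacity."""
    first = capacities[min(capacities)]
    return {c: q / first for c, q in capacities.items()}

# Toy data: two one-hour constant-current discharges
cycle = np.array([1, 1, 2, 2])
time_s = np.array([0.0, 3600.0, 3600.0, 7200.0])
current_a = np.array([-1.0, -1.0, -0.9, -0.9])

caps = discharge_capacity_per_cycle(cycle, time_s, current_a)
retention = capacity_retention(caps)
```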
Machine Learning with Uncertainties
Paul Gasper, National Renewable Energy Laboratory
Machine learning methods are powerful tools for automatically constructing empirical models of battery behavior, but it is difficult to know when such models are actually useful. Our first machine learning session will explain how to diagnose state of health with machine learning, how to validate model performance, and how to estimate how much to trust individual predictions. An illustrative example will use Gaussian process regression to predict state of health, with uncertainty, from raw impedance data; regularize models via feature selection to improve performance on unseen test data; and evaluate a model pipeline across several data sets (electrochemical impedance spectroscopy (EIS) recorded under different conditions) to demonstrate the limits of model extrapolation.
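To make "prediction with uncertainty" concrete, the following NumPy sketch implements exact Gaussian process regression with a squared-exponential (RBF) kernel. It is not the session's pipeline: the single input feature, the fixed kernel hyperparameters, and the toy linear state-of-health trend are assumptions chosen for illustration, and no feature selection or hyperparameter fitting is performed. It does reproduce the behavior the session emphasizes: the predictive standard deviation is small near the training data and returns to the prior level when the model is asked to extrapolate.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a (n, d) and b (m, d)."""
    sq_dist = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_predict(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=1e-4):
    """Posterior mean and standard deviation of a GP with Gaussian observation noise.

    Targets are centered so the zero-mean prior applies to deviations
    from the training mean.
    """
    y_mean = y_train.mean()
    k_train = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    k_cross = rbf_kernel(x_test, x_train, length_scale, variance)
    # A Cholesky factorization is more stable than forming an explicit inverse
    chol = np.linalg.cholesky(k_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = k_cross @ alpha + y_mean
    v = np.linalg.solve(chol, k_cross.T)
    var = variance - np.sum(v**2, axis=0)  # k(x, x) = variance for the RBF kernel
    return mean, np.sqrt(np.maximum(var, 0.0))

# Toy data: one hypothetical impedance-derived feature scaled to [0, 1],
# with state of health falling linearly from 1.0 to 0.8.
x_train = np.linspace(0.0, 1.0, 10)[:, None]
y_train = 1.0 - 0.2 * x_train[:, 0]
x_test = np.array([[0.5], [5.0]])  # one interpolation point, one far extrapolation
mean, std = gp_predict(x_train, y_train, x_test, length_scale=0.5)
```

With real EIS features, the kernel hyperparameters would normally be fitted (for example, by scikit-learn's `GaussianProcessRegressor`, which maximizes the log-marginal likelihood) rather than fixed by hand as they are here.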
Time Series Prediction with Neural Networks
Logan Ward, Argonne National Laboratory
The flexibility of neural networks makes them the field-standard method for building models from large data sets. Our second machine learning session will explain the variety of models that can be expressed with neural networks (e.g., convolutional networks for images, recurrent networks for time series) and demonstrate their application to battery life estimation. As part of this tutorial, we will apply neural networks to benchmark challenges established within the Battery Data Commons effort.
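The recurrent structure named in this session can be sketched in a few lines of NumPy. The example below is an illustration under stated assumptions, not the session's model: it slices a toy cycling history into overlapping windows of per-cycle features and runs each window through an untrained Elman-style recurrent cell whose final hidden state is mapped to a single number, such as a life estimate. The features, window length, hidden size, and random weights are all hypothetical, and no training (backpropagation through time) is shown; in practice a framework such as PyTorch or TensorFlow would supply the recurrent layers and the training loop.

```python
import numpy as np

def make_windows(series, window):
    """Split a (n_cycles, n_features) array into overlapping windows of `window` cycles."""
    return np.stack([series[i:i + window] for i in range(len(series) - window + 1)])

class TinyRNN:
    """Elman cell: h_t = tanh(W_x x_t + W_h h_{t-1} + b); output = w_out . h_T + b_out."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(n_hidden)
        self.w_x = rng.normal(0.0, scale, (n_hidden, n_features))
        self.w_h = rng.normal(0.0, scale, (n_hidden, n_hidden))
        self.b = np.zeros(n_hidden)
        self.w_out = rng.normal(0.0, scale, n_hidden)
        self.b_out = 0.0

    def forward(self, sequence):
        """Run one (n_steps, n_features) sequence; return (scalar output, final hidden state)."""
        h = np.zeros_like(self.b)
        for x_t in sequence:  # the same weights are reused at every cycle
            h = np.tanh(self.w_x @ x_t + self.w_h @ h + self.b)
        return float(self.w_out @ h + self.b_out), h

# Toy history: 50 cycles x 3 hypothetical features (e.g., capacity, resistance, temperature)
history = np.random.default_rng(1).normal(size=(50, 3))
windows = make_windows(history, window=20)
model = TinyRNN(n_features=3, n_hidden=8)
outputs = [model.forward(w)[0] for w in windows]
```

Sharing one set of weights across all time steps is what lets a recurrent network accept histories of different lengths.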
Generating Synthetic Data for Machine Learning
Bor-Rong Hypo Chen, Sangwook Kim, and Zonggen Yi, Idaho National Laboratory
Understanding the aging modes and mechanisms that underlie battery degradation is crucial. Insight into aging modes and mechanisms early in battery life will help battery developers and end users iterate quickly on design parameters to avoid a particular failure mode, develop corrective actions to mitigate a particular aging mode, and predict battery life more reliably. Synthetic battery data could play a vital role in achieving these goals. Synthetic data are generated from battery models that simulate electrochemical behavior for different combinations of aging modes arising from different stress factors. Synthetic data aid high-throughput analysis by reducing the need to generate extensive experimental data, which can take months or even years. With reliable synthetic data, one can investigate the cause-and-effect relationships between aging phenomena and electrochemical behavior. Combined with advanced data analytics, machine learning could pave the way toward a framework for rapid battery aging prediction. In this tutorial, we will present how synthetic incremental capacity data are being used to construct a deep-learning-based framework that classifies and quantifies different aging modes.
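Incremental capacity (IC) analysis converts a capacity-voltage curve into dQ/dV, where voltage plateaus become peaks whose positions and heights change as a cell ages. The NumPy sketch below is a deliberately crude stand-in for the battery models described above: it builds a toy Q(V) curve from logistic steps (an assumption, not an electrochemical model), computes dQ/dV numerically, and shows that removing 20% of the capacity under one step shrinks the matching IC peak by about the same fraction. Curves like these, generated for many combinations of aging modes, would be the kind of input a data-driven aging-mode classifier is trained on.

```python
import numpy as np

def toy_capacity_curve(voltage, steps):
    """Toy Q(V) in Ah: a sum of logistic steps, one per (center_v, capacity_ah, width_v)."""
    q = np.zeros_like(voltage)
    for center, capacity, width in steps:
        q += capacity / (1.0 + np.exp(-(voltage - center) / width))
    return q

def incremental_capacity(voltage, capacity):
    """Numerical dQ/dV (Ah/V) using central differences on the voltage grid."""
    return np.gradient(capacity, voltage)

voltage = np.linspace(3.0, 4.2, 1201)  # 1 mV spacing
fresh_steps = [(3.45, 1.0, 0.02), (3.85, 1.5, 0.05)]
# Hypothetical aged cell: 20% less capacity in the first step only, a crude
# stand-in for capacity lost in one voltage region.
aged_steps = [(3.45, 0.8, 0.02), (3.85, 1.5, 0.05)]

ic_fresh = incremental_capacity(voltage, toy_capacity_curve(voltage, fresh_steps))
ic_aged = incremental_capacity(voltage, toy_capacity_curve(voltage, aged_steps))

peak = np.argmax(ic_fresh)  # tallest fresh peak, located near 3.45 V
print(f"peak at {voltage[peak]:.3f} V, aged/fresh height ratio {ic_aged[peak] / ic_fresh[peak]:.3f}")
```

Because measured dQ/dV amplifies noise in Q and V, real IC workflows usually smooth or resample the data before differentiating; the noise-free toy curve skips that step.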