2022 MRS Spring Meeting & Exhibit

Tutorial DS04—MLOps for Materials Science—What Comes After Building a Machine Learning (ML) Model

Sunday, May 8, 2022
8:30 AM - 12:00 PM
Hawaiʻi Convention Center, Level 3, Room 313B

The ability to leverage machine learning (ML) is rapidly becoming a part of the materials scientist’s toolkit, whether for aiding ab initio computational work, screening datasets of candidate materials or interfacing with in-lab experimenters and equipment. This tutorial aims to address the problems that frequently arise after a materials scientist builds their first few successful ML models: How can multiple versions of models, and their predictions, be effectively tracked? Are there ways to automatically test models before running large batch predictions?

This tutorial will use case studies from recent work in combinatorial science and materials screening to guide participants through interactive Python exercises. Familiarity with basic machine learning principles and with Git version control is recommended for participation.

Participants will learn:

  • How to use correlation IDs to track models and predictions, a best practice from large-scale ML in the (web-based) technology industry
  • How to use automated workflow tools, such as the free GitHub Actions, to run tests and sanity checks on ML models before time and resources are committed to large predictions
  • Which MLOps resources to explore further after the tutorial, such as tools for:
    • Logging ML parameters with experiment-tracking libraries (e.g., MLflow)
    • Packaging Python dependencies to improve reproducibility
    • Orchestrating ML pipelines to allow partial restarts (e.g., re-run predictions without retraining) and easier debugging
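
As a rough illustration of the partial-restart idea in the last item, the sketch below caches each pipeline stage on disk so that predictions can be re-run without retraining. It is a minimal sketch under stated assumptions, not the tutorial's own material: the names run_stage, train, run_pipeline and calls are hypothetical, the "model" is a toy dictionary, and dedicated orchestration tools offer far more (dependency tracking, scheduling, logging).

```python
import pickle
from pathlib import Path

# Hypothetical counter used only to show how often training actually runs.
calls = {"train": 0}


def run_stage(name, func, cache_dir, force=False):
    """Run one pipeline stage, reusing its cached result unless force is set."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{name}.pkl"
    if path.exists() and not force:
        with path.open("rb") as fh:
            return pickle.load(fh)  # stage skipped: load the earlier result
    result = func()
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


def train():
    """Stand-in for an expensive training step (toy model: a single slope)."""
    calls["train"] += 1
    return {"slope": 2.0}


def run_pipeline(cache_dir, inputs):
    """Train once (cached), then always recompute predictions for the inputs."""
    model = run_stage("train", train, cache_dir)
    return run_stage(
        "predict",
        lambda: [model["slope"] * x for x in inputs],
        cache_dir,
        force=True,  # predictions depend on the inputs, so never reuse them
    )
```

Calling run_pipeline twice with the same cache directory and different inputs trains only on the first call; the second call loads the stored model and recomputes just the predictions.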

Prerequisites: Familiarity with Python and version control (e.g., GitHub). Prior experience building machine learning models is required. A laptop with Python and GitHub access is strongly recommended for following along in real time.

Tracking your ML Models and Their Predictions
Edward Kim, Xero; Jason Hattrick-Simpers, University of Toronto; Arun Mannodi Kanakkithodi, Purdue University

This session will include a case study from research in autonomous materials science and an interactive code walkthrough on preserving the lineage of, and linkages between, ML models and their predictions. The session will also include Q&A on some recommended tools, best practices and applications across materials domains.
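
One way to picture the model-to-prediction linkage described above is to stamp every prediction with two identifiers: one for the model that produced it and one correlation ID shared by the whole batch run. The following is only a sketch under assumptions and not the presenters' code: model_fingerprint, PredictionRecord and predict_batch are hypothetical names, and hashing the hyperparameters is just one possible way to identify a model version.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


def model_fingerprint(params: dict) -> str:
    """Derive a short, deterministic ID from a model's hyperparameters."""
    canonical = json.dumps(params, sort_keys=True)  # key order must not matter
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class PredictionRecord:
    correlation_id: str  # shared by every prediction made in one batch run
    model_id: str        # identifies the model configuration that was used
    material: str
    predicted_value: float
    created_at: str


def predict_batch(model_params, predict_fn, materials):
    """Run predict_fn over materials and tag each result with its lineage."""
    correlation_id = uuid.uuid4().hex  # new random ID for each batch run
    model_id = model_fingerprint(model_params)
    now = datetime.now(timezone.utc).isoformat()
    return [
        PredictionRecord(correlation_id, model_id, m, float(predict_fn(m)), now)
        for m in materials
    ]
```

With records like these, a suspicious prediction found months later can be traced back to the exact batch run and model configuration, and all sibling predictions from that run can be retrieved by filtering on correlation_id.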

Machine Learning with Automated Pipelines to Reduce Risk
Edward Kim, Xero; Jason Hattrick-Simpers, University of Toronto; Arun Mannodi Kanakkithodi, Purdue University

This session will include a case study from research in materials screening and a guided code walkthrough on automated testing of ML models and leveraging pipeline tools to manage ML workflows. The session will also include Q&A on some recommended tools, best practices and applications across materials domains.
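
To make the idea of automated model testing concrete, the sketch below shows the kind of cheap sanity check that a test runner could execute on every commit, for example from a continuous-integration workflow, before a large prediction job is launched. Everything in it is an illustrative assumption: the one-feature least-squares "model", the function names, the toy data and the chosen checks (finite outputs, a plausible value range, beating a mean baseline) are not taken from the session itself.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, xs):
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def sanity_check(model, xs_holdout, ys_holdout, baseline_value, value_range):
    """Return a list of failure messages; an empty list means all checks pass."""
    failures = []
    preds = predict(model, xs_holdout)
    if any(not math.isfinite(p) for p in preds):
        failures.append("non-finite prediction")
    low, high = value_range
    if any(p < low or p > high for p in preds):
        failures.append("prediction outside plausible range")
    model_mae = mean_absolute_error(ys_holdout, preds)
    baseline_preds = [baseline_value] * len(ys_holdout)
    if model_mae >= mean_absolute_error(ys_holdout, baseline_preds):
        failures.append("model does not beat mean baseline")
    return failures


def test_model_passes_sanity_checks():
    """A test in the style collected by runners such as pytest (toy data)."""
    train_x = [0.0, 1.0, 2.0, 3.0]
    train_y = [0.1, 1.1, 1.9, 3.2]
    model = fit_linear(train_x, train_y)
    baseline = sum(train_y) / len(train_y)
    failures = sanity_check(model, [0.5, 2.5], [0.6, 2.6], baseline, (0.0, 10.0))
    assert failures == []
```

Returning a list of failure messages rather than raising on the first problem means a single test run reports every check that failed, which tends to shorten the debugging loop.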
