MRS Meetings and Events

 

DS02.09.06 2022 MRS Fall Meeting

Enzeptional—Enzyme Optimization via a Generative Language Modeling-Based Evolutionary Algorithm

When and Where

Dec 1, 2022
9:30am - 9:45am

Hynes, Level 2, Room 210

Co-Author(s)

Yves Gaetan Nana Teukam [1,2], Matteo Manica [1], Francesca Grisoni [2], Teodoro Laino [1]

[1] IBM Research Europe - Zurich, [2] Technische Universiteit Eindhoven

Abstract

Enzymes are molecular machines optimized by nature to enable chemical processes that would otherwise be impossible. Besides increasing reaction rates, they have remarkable characteristics that make reactions more sustainable: mild conditions, less toxic solvents, and reduced waste. Billions of years of evolution have made enzymes extremely efficient. However, wide adoption in industrial processes requires faster design using in-silico methodologies, a daunting task that is far from solved. Most methods introduce mutations into an existing amino acid (AA) sequence, relying on a variety of assumptions and strategies to generate variants of the original sequence. More recently, machine learning and deep generative networks have gained popularity in protein engineering by leveraging prior knowledge of protein binders, their physicochemical properties, or their 3D structure. Here, we cast enzyme optimization as an evolutionary algorithm in which mutations are modeled by a generalized autoregressive language model trained on fragments of AA sequences from UniProtKB. Relying on a pre-trained language model, we apply transfer learning and train a Random Forest as the scoring model on a dataset of biocatalysed chemical reactions to drive the optimization process. With our approach, which makes as few assumptions as possible, we can adapt active sites to perform new reactions. Our methodology allows designing enzymes with higher predicted biocatalytic activity, emulating the evolutionary process occurring in nature by modeling the underlying proteomic language and sampling optimal sequences from it.
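
The loop described in the abstract (propose mutations, score the variants, keep the best, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation. `propose_mutation` is a hypothetical stand-in for sampling from the language model, `toy_score` is a hypothetical stand-in for the trained Random Forest scoring model, and the target motif, population size, and number of generations are made-up values chosen only so the example runs with the Python standard library.

```python
import random
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def propose_mutation(seq: str, rng: random.Random) -> str:
    """Hypothetical stand-in for the language model: substitute one
    randomly chosen residue with a different amino acid."""
    pos = rng.randrange(len(seq))
    choices = [aa for aa in AMINO_ACIDS if aa != seq[pos]]
    return seq[:pos] + rng.choice(choices) + seq[pos + 1:]


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for the trained scoring model: fraction of
    positions that match a made-up target motif."""
    target = "HDSGK"  # assumption for illustration only
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def evolve(
    seed: str,
    score: Callable[[str], float],
    propose: Callable[[str, random.Random], str],
    generations: int = 50,
    population_size: int = 20,
    top_k: int = 5,
    rng_seed: int = 0,
) -> Tuple[str, float]:
    """Simple elitist evolutionary loop over amino acid sequences."""
    rng = random.Random(rng_seed)
    population: List[str] = [seed]
    for _ in range(generations):
        # Selection: keep the top_k highest-scoring sequences as parents.
        parents = sorted(population, key=score, reverse=True)[:top_k]
        # Variation: each child is a mutated copy of a random parent.
        children = [propose(rng.choice(parents), rng)
                    for _ in range(population_size)]
        # Elitism: parents survive, so the best score never decreases.
        # dict.fromkeys removes duplicates while preserving order.
        population = list(dict.fromkeys(parents + children))
    best = max(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    best, best_score = evolve("AAAAA", toy_score, propose_mutation)
    print(best, best_score)
```

In a setting closer to the one described, `propose` would presumably sample substitutions from the pre-trained language model and `score` would call the trained regressor on an enzyme and reaction representation. Both are passed in as callables so they can be swapped without changing the loop.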

Keywords

protein

Symposium Organizers

N M Anoop Krishnan, Indian Institute of Technology Delhi
Mathieu Bauchy, University of California, Los Angeles
Ekin Dogus Cubuk, Google
Grace Gu, University of California, Berkeley

Symposium Support

Bronze
Patterns, Cell Press
