Abstract

Protein engineering is essential for a variety of applications, such as designing biologic drugs, optimizing enzymes, and developing novel functional molecules. Accurate modeling of the protein fitness landscape, such as predicting protein properties across sequence space, is critical for efficient protein engineering. Yet, due to the complexity of the landscape and the high dimensionality of sequence space, it remains an unsolved problem. In this work, we present μFormer, a deep learning framework that combines a pre-trained protein language model with three scoring modules targeting protein features at multiple levels, to tackle this grand challenge. μFormer achieves state-of-the-art performance across diverse tasks, including predicting high-order mutants, modeling epistatic effects, handling insertion/deletion mutations, and generalizing to out-of-distribution scenarios. Building on this predictive power, integrating μFormer with a reinforcement learning framework enables efficient exploration of the vast mutant space. We show that this integrated approach can design protein variants with up to five point mutations and potentially significant enhancement in activity for engineering tasks. The results highlight μFormer as a powerful and versatile tool for protein design, accelerating the development of innovative proteins tailored for specific applications.

Results

Fitness score of TEM-1/AAV

Epistatic effects

Applications

Beta-lactamase

We employed the TEM-1-cefotaxime system to investigate whether μFormer can effectively guide protein optimization. To this end, we developed a reinforcement learning (RL) method to search for TEM-1 variants with 1-5 point mutations that possess enhanced activity against cefotaxime. The RL method uses μFormer as the reward function to guide the search, enabling efficient exploration of the vast mutant space comprising 6 × 10^18 sequences. To ensure candidate diversity, we incorporated Dirichlet noise into Proximal Policy Optimization (PPO), an algorithm that has recently been used to align large language models with human preferences.
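One common way to inject Dirichlet noise into a policy is to blend the policy's action probabilities with a Dirichlet sample before drawing an action. The sketch below illustrates that idea only; the mixing rule, the values of `alpha` and `epsilon`, and the four-site example are assumptions for illustration and are not taken from the method described above.

```python
import random

def dirichlet_sample(alpha, size, rng):
    """Draw one symmetric Dirichlet(alpha) sample via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow
        return [1.0 / size] * size
    return [d / total for d in draws]

def add_dirichlet_noise(probs, alpha=0.3, epsilon=0.25, rng=None):
    """Blend action probabilities with noise: (1 - epsilon) * p + epsilon * noise.

    alpha and epsilon are assumed hyperparameters, not values from the paper.
    """
    rng = rng or random.Random()
    noise = dirichlet_sample(alpha, len(probs), rng)
    return [(1.0 - epsilon) * p + epsilon * n for p, n in zip(probs, noise)]

# Hypothetical policy output over four candidate mutation sites.
policy_probs = [0.85, 0.05, 0.05, 0.05]
noisy_probs = add_dirichlet_noise(policy_probs, rng=random.Random(0))

# Sample a site from the perturbed distribution.
site = random.Random(1).choices(range(len(noisy_probs)), weights=noisy_probs, k=1)[0]
```

Because the blend is a convex combination of two probability vectors, the result is still a valid distribution, while low-probability sites receive extra mass and are explored more often.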

SARS-CoV-2 antibody

Method & Concept

Protein LLM

μFormer is a deep learning solution for mutation effect prediction, i.e., predicting the fitness score of a mutated protein sequence. Predictions are obtained in two steps: first, we pre-train a masked protein language model (LM) on a large database of unlabeled protein sequences; second, we add three scoring modules (each with a small set of new parameters) to the pre-trained protein LM for the final fitness score prediction, and train all parameters on a set of mutant protein sequences with measured fitness scores.
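To make the first step concrete, the following is a minimal sketch of how a training example for a masked language model can be prepared: some residues are hidden and the model is asked to recover them. The 15% masking rate, the `<mask>` token name, and the toy sequence are assumptions borrowed from common masked-LM practice, not details stated here.

```python
import random

MASK_TOKEN = "<mask>"  # hypothetical token name

def mask_sequence(sequence, mask_rate=0.15, rng=None):
    """Hide a random subset of residues; return masked tokens and the targets.

    mask_rate is an assumed value, not one reported for the model above.
    """
    rng = rng or random.Random()
    tokens = list(sequence)
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    # The model would be trained to predict these residues from context.
    targets = {pos: tokens[pos] for pos in positions}
    for pos in positions:
        tokens[pos] = MASK_TOKEN
    return tokens, targets

# Arbitrary toy amino-acid string, used only for illustration.
example = "MKTAYIAKQRQISFVKSHFS"
masked_tokens, targets = mask_sequence(example, rng=random.Random(0))
```

Since this objective needs no fitness labels, it can be applied to any unlabeled protein sequence, which is what makes large-scale pre-training possible before the supervised second step.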

Reinforcement Learning

Our RL method uses the μFormer model as a reward function and searches for high-functioning mutants. In each episode, the RL agent sequentially mutates one residue at a time until reaching a fixed horizon. The RL algorithm alternates between two phases: exploration and learning. During the exploration phase, we use a mutation site policy network and a mutation type policy network to generate potentially high-functioning mutants, aided by Dirichlet noise. During the learning phase, we use the μFormer model to score the generated mutants and update the policy networks so that they propose mutants with higher fitness scores.
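The episode structure can be sketched as follows. This is an illustrative rollout only: the two policies are uniform random stand-ins for the policy networks, `toy_reward` is a placeholder for the fitness predictor, and scoring the mutant once at the end of the episode is an assumption.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def site_policy(sequence, rng):
    """Stand-in for the mutation site policy network: uniform over positions."""
    return rng.randrange(len(sequence))

def type_policy(sequence, site, rng):
    """Stand-in for the mutation type policy network: any different residue."""
    return rng.choice([aa for aa in AMINO_ACIDS if aa != sequence[site]])

def toy_reward(sequence):
    """Placeholder reward (fraction of alanines); a fitness model would go here."""
    return sequence.count("A") / len(sequence)

def run_episode(wild_type, horizon, rng):
    """Apply one single-site mutation per step until the fixed horizon."""
    sequence = wild_type
    trajectory = []
    for _ in range(horizon):
        site = site_policy(sequence, rng)
        residue = type_policy(sequence, site, rng)
        sequence = sequence[:site] + residue + sequence[site + 1:]
        trajectory.append((site, residue))
    return sequence, trajectory, toy_reward(sequence)

# Arbitrary toy sequence, used only for illustration.
wild_type = "MKTAYIAKQR"
mutant, trajectory, reward = run_episode(wild_type, horizon=5, rng=random.Random(0))
hamming = sum(a != b for a, b in zip(wild_type, mutant))
```

In this sketch a site can be chosen more than once, so a horizon of 5 yields a mutant that differs from the wild type at no more than five positions. In the learning phase, the recorded trajectory and reward would be used to update the two policies.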