Can ML Algorithms Simplify The Process Of Protein Engineering?

Machine learning is increasingly being employed in the field of protein engineering.

Machine learning algorithms aid protein engineering by reducing the experimental burden of techniques such as directed evolution, which entails several rounds of mutagenesis and high-throughput screening. Although numerous machine learning techniques exist, only a few make use of the target protein’s evolutionary history. This is where the Evolutionary Context-integrated Neural Network (ECNet), a deep learning algorithm, comes into play.

Huimin Zhao, the Steven L. Miller Chair of Chemical and Biomolecular Engineering and Director of the National Science Foundation-funded Molecule Maker Lab Institute, explained: “Using ECNet, we can examine the target protein and all of its homologs to determine which residues are connected together and hence critical for that protein. We then integrate this data and employ a deep learning framework to determine which mutations are critical for the target protein’s function.”


What is Protein Engineering?

Protein engineering is the process of developing useful or valuable proteins. It is a relatively young field, with much of the research aimed at better understanding protein folding and at recognising the principles of protein design.

ML Models And Protein Engineering

In ML-assisted directed evolution, an ML model is trained on sequencing and screening data to learn the sequence-function relationship. In each round, the model predicts in silico the fitness of all candidate sequences, and a short list of the best-performing predicted variants becomes the starting point for the next round. In contrast to classical directed evolution, the ML-assisted approach can learn the broader functional landscape from data and escape local optima. It makes full use of all available sequencing and screening data, including data for unimproved variants, allowing a more efficient traversal of the fitness landscape.
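One round of the loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not ECNet or any published method: the four-letter alphabet, the parent sequence, the fitness values, and the simple additive model (`fit_additive_model`, `propose_next_round`) are all hypothetical and chosen only to keep the example small.

```python
from collections import defaultdict
from itertools import product

AMINO_ACIDS = "ACDE"  # reduced alphabet, an assumption to keep the search space tiny
WILD_TYPE = "AAAA"    # hypothetical parent sequence


def fit_additive_model(screened):
    """Estimate the mean fitness effect of each (position, residue) pair.

    `screened` maps variant sequence -> measured fitness. Every measurement
    is used, including variants that did not improve on the parent.
    """
    baseline = sum(screened.values()) / len(screened)
    totals, counts = defaultdict(float), defaultdict(int)
    for seq, fitness in screened.items():
        for pos, aa in enumerate(seq):
            totals[(pos, aa)] += fitness - baseline
            counts[(pos, aa)] += 1
    effects = {key: totals[key] / counts[key] for key in totals}
    return baseline, effects


def predict(seq, baseline, effects):
    """Predicted fitness: baseline plus the learned effect of each residue.

    Residues never observed at a position contribute 0 (a simplification).
    """
    return baseline + sum(effects.get((pos, aa), 0.0) for pos, aa in enumerate(seq))


def propose_next_round(screened, top_k=3):
    """Score every possible sequence in silico and return the best unscreened ones."""
    baseline, effects = fit_additive_model(screened)
    candidates = ("".join(p) for p in product(AMINO_ACIDS, repeat=len(WILD_TYPE)))
    unscreened = [s for s in candidates if s not in screened]
    ranked = sorted(unscreened, key=lambda s: predict(s, baseline, effects), reverse=True)
    return ranked[:top_k]


# Hypothetical screening results from one round (made-up numbers).
screened = {
    "AAAA": 1.0,
    "CAAA": 1.4,
    "ADAA": 0.6,
    "AAEA": 1.3,
    "AAAD": 0.9,
}
next_round = propose_next_round(screened)
```

Here the top proposal combines the two beneficial single mutations seen in screening, and the harmful variants still contribute by lowering the estimated effect of their residues. A real workflow would replace the additive model with a learned sequence-to-function model and would not enumerate the full sequence space.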

Numerous ML algorithms have been developed to predict mutational effects from the evolutionary information contained in homologous sequences. However, because these approaches are unsupervised, they cannot exploit the fitness data from tested variants that become available during directed evolution, and so may have limited accuracy when guiding the protein engineering process.
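To make the unsupervised idea concrete, the sketch below scores a substitution purely from how often each residue appears at that position among homologs. It is a deliberately simple, site-independent model, far simpler than the published approaches, and the alignment `HOMOLOGS`, the pseudocount, and the function names are hypothetical. Note that no fitness measurement enters the calculation, which is exactly the limitation described above.

```python
import math
from collections import Counter

# Hypothetical alignment of homologous sequences (aligned, equal length).
HOMOLOGS = [
    "MKVLA",
    "MKVLG",
    "MRVLA",
    "MKILA",
    "MKVLA",
]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_log_probs(alignment, pseudocount=1.0):
    """Per-column log-probability of each amino acid, with additive smoothing."""
    n_seqs = len(alignment)
    denom = n_seqs + pseudocount * len(ALPHABET)
    tables = []
    for column in zip(*alignment):
        counts = Counter(column)
        tables.append(
            {aa: math.log((counts.get(aa, 0) + pseudocount) / denom) for aa in ALPHABET}
        )
    return tables


def mutation_score(wild_type, pos, new_aa, tables):
    """Log-odds of the mutant residue versus the wild-type residue at `pos`.

    Negative values mean the substitution is rarer among homologs than the
    wild-type residue.
    """
    return tables[pos][new_aa] - tables[pos][wild_type[pos]]


tables = column_log_probs(HOMOLOGS)
conserved_hit = mutation_score("MKVLA", 0, "W", tables)  # position 0 is fully conserved
tolerated_hit = mutation_score("MKVLA", 1, "R", tables)  # R already occurs in one homolog
```

The substitution at the fully conserved position receives the lower score, so it is predicted to be more damaging than the one already observed in a homolog.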

Understanding ECNet

ECNet is a deep learning model that guides protein engineering by predicting the fitness of proteins from their sequence. It learns the mapping between protein sequences and their associated functional measurements from data.

The researchers used a long short-term memory (LSTM) neural network architecture and large-scale deep mutational scanning datasets to train protein-specific models.

Model Development

Building a machine learning model that reliably maps protein sequences to functions for unseen variants is a significant challenge in machine learning-guided protein engineering. Models that qualitatively classify protein sequences into function classes already exist, such as those used in the Critical Assessment of Functional Annotation (CAFA) challenge, but protein engineering requires more fine-grained, quantitative predictions. Supervised ML algorithms for predicting protein sequence-function relationships have recently been investigated for this purpose. ECNet, which predicts the function of proteins from their sequence, is intended to facilitate this process.

This ML model is distinctive in that it learns the sequence-function relationship using a biologically motivated sequence modelling technique, resulting in higher performance in predicting the fitness of protein variants. ECNet outperformed various existing ML models for protein engineering when evaluated on a large set of deep mutational scanning studies. Additionally, ECNet captures the epistatic effects of mutations within protein sequences and can be extended to predict the functions of higher-order mutants using data from lower-order mutants.
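Epistasis is commonly quantified as the gap between a measured double mutant and what its two single mutants would predict if they acted independently. The sketch below shows that calculation, assuming fitness is on an additive (for example, log) scale. The mutation labels and numbers are made up, and the additive expectation is a standard baseline, not ECNet's own predictor.

```python
def additive_prediction(f_wt, f_a, f_b):
    """Expected double-mutant fitness if the two mutations act independently."""
    return f_wt + (f_a - f_wt) + (f_b - f_wt)


def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of the measured double mutant from the additive expectation."""
    return f_ab - additive_prediction(f_wt, f_a, f_b)


# Hypothetical log-fitness measurements (made-up numbers).
measurements = {"WT": 0.0, "A": 0.5, "B": 0.25, "AB": 1.5}

expected_ab = additive_prediction(
    measurements["WT"], measurements["A"], measurements["B"]
)
eps = pairwise_epistasis(
    measurements["WT"], measurements["A"], measurements["B"], measurements["AB"]
)
```

A positive value, as in this toy case, means the double mutant performs better than the single mutants suggest. A model that only sums single-mutant effects would miss this, which is why capturing epistasis matters when predicting higher-order mutants from lower-order data.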

The researchers anticipate that ECNet will be a useful tool for machine learning-guided protein engineering. The sequence-to-function model can be used in conjunction with other sequence design techniques to pick the next batch of variants to screen during a round of directed evolution.


  1. Envision dataset
  2. DeepSequence dataset

To learn more about ECNet, read the article.

Dr. Nivash Jeevanandam
Nivash holds a doctorate in information technology and has been a research associate at a university and a development engineer in the IT industry. Data science and machine learning excite him.
