Protein Wars: It’s ESMFold vs AlphaFold

ESMFold's accuracy is comparable to that of AlphaFold 2 and RoseTTAFold on sequences its language model understands well, and its faster inference makes it practical to explore the structural space of metagenomic proteins

Last month, Meta AI's researchers launched ESMFold, a protein structure prediction model built on the company's Evolutionary Scale Modeling (ESM) family of protein language models. The new model is touted as one of the closest alternatives to DeepMind's AlphaFold 2, which essentially solved the 50-year-old grand challenge of protein folding. Meta AI has released several ESM models over the years, and this latest work is also publicly available.

The code is available on GitHub.

Besides ESMFold and AlphaFold, there are plenty of protein structure prediction models, including RoseTTAFold, IntFOLD, RaptorX and others. Here's a quick look at how the leading models compare.

ESMFold vs AlphaFold 

Meta AI claimed that ESMFold achieves accuracy similar to AlphaFold 2 and RoseTTAFold on low-perplexity sequences, while its faster inference enables the exploration of the structural space of metagenomic proteins. Metagenomics is the sequencing of DNA extracted directly from environmental samples.

While AlphaFold 2's neural network is built to process evolutionary information drawn from related sequences, ESMFold leverages a large-scale protein language model for structure prediction. The Meta AI team said that improvements in language modelling perplexity and in learned structure continue as the model scales up to 15 billion parameters. By comparison, the team said their latest model, ESM-2, already outperforms their older ESM-1b model (650 million parameters) at just 150 million parameters.

In addition, AlphaFold 2 and other alternatives use multiple sequence alignments (MSAs) and templates of similar proteins to achieve their breakthrough accuracy in atomic-resolution structure prediction. ESMFold, however, generates structure predictions from a single input sequence by leveraging the internal representations of its language model.

When every model is given only a single sequence as input, ESMFold produces more accurate atomic-level predictions than AlphaFold 2, and it remains competitive with RoseTTAFold even when RoseTTAFold is given full MSAs.

ESMFold produces comparable predictions for low-perplexity sequences, and, more generally, its structure prediction accuracy correlates with language model perplexity. In other words, the better the language model understands a sequence (the lower its perplexity), the more accurate the predicted structure tends to be.
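Perplexity measures how "surprised" a language model is by a sequence: it is the exponential of the average negative log-probability the model assigns to each residue actually present. The Python sketch below illustrates only this standard textbook definition. The `perplexity` helper and all probabilities in it are hypothetical illustrations, not ESM-2 outputs or Meta AI's evaluation code; in particular, it does not reproduce how the ESM-2 work computes perplexity over masked positions.

```python
import math


def perplexity(true_residue_probs: list[float]) -> float:
    """Return exp(mean negative log-probability) of the true residues.

    Hypothetical helper: each entry is the probability a language model
    assigned to the amino acid actually present at that position.
    """
    if not true_residue_probs:
        raise ValueError("need at least one residue probability")
    if any(p <= 0.0 or p > 1.0 for p in true_residue_probs):
        raise ValueError("probabilities must be in (0, 1]")
    mean_nll = -sum(math.log(p) for p in true_residue_probs) / len(true_residue_probs)
    return math.exp(mean_nll)


# Made-up probabilities for illustration only (not real ESM-2 outputs).
well_understood = [0.9, 0.8, 0.95, 0.85]   # model is confident about each residue
poorly_understood = [0.1, 0.05, 0.2, 0.1]  # model is mostly guessing

# Guessing uniformly over the 20 standard amino acids gives perplexity 20.
uniform = [1 / 20] * 10

for name, probs in [("well understood", well_understood),
                    ("poorly understood", poorly_understood),
                    ("uniform", uniform)]:
    print(f"{name}: perplexity = {perplexity(probs):.2f}")
```

Lower perplexity means the model assigns higher probability to the residues that are actually there, which is the regime in which, according to Meta AI, ESMFold's structure predictions are most accurate.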

One of ESMFold's advantages is that it predicts structures faster than existing atomic-resolution structure predictors. This helps bridge the gap between protein sequence databases, which have grown rapidly to billions of sequences, and the much slower-growing protein structure and function databases. The team used the model to rapidly compute one million predicted structures covering a diverse subset of metagenomic sequence space that lacks labelled structure or function.

Last month, DeepMind, in collaboration with the European Bioinformatics Institute (EMBL-EBI), released predicted structures for nearly all catalogued proteins, expanding the AlphaFold database more than 200-fold, from nearly 1 million structures to over 200 million, with the potential to significantly increase our understanding of biology.

DeepMind first launched AlphaFold in 2018, unveiled its second version in 2020 and open-sourced the AlphaFold 2 deep-learning network last year. The team has since released AlphaFold-Multimer, which it said significantly increases the accuracy of predicted multimeric interfaces over input-adapted single-chain AlphaFold while maintaining high intra-chain accuracy.

One of the biggest performance drivers for ESMFold has been the language model. When ESM-2 understands a protein sequence well, that is, when language modelling perplexity is low, ESMFold's predictions are comparable to those made by other models. In other words, ESMFold can deliver accurate atomic-resolution structure predictions up to two orders of magnitude faster than AlphaFold 2.

Meta AI said that billions of protein sequences, many of them from metagenomic sequencing, have unknown structures and functions. ESMFold makes it possible to map this structural space on practical timescales: the team folded a random sample of 1 million metagenomic sequences in a few hours. The researchers also believe that ESMFold can help shed light on regions of protein space that are distant from existing knowledge.

A new ‘super fast’ protein-predicting model emerges 

ESMFold and AlphaFold are not alone. OmegaFold, developed by Chinese biotech firm Helixon, also predicts high-resolution protein structures from a single primary sequence. The model recently outperformed rival RoseTTAFold while achieving prediction accuracy similar to AlphaFold 2's.

The company recently made its code publicly available, joining AlphaFold and ESMFold, which are also open source.

Why is this a big deal? 

Understanding how proteins fold helps researchers and scientists uncover the underlying causes of many diseases. Insights into protein folding and protein design, in turn, help in finding cures and designing new medicines, drugs and other pharmaceutical solutions.

Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
