Protein Wars Part 2: It’s OmegaFold vs AlphaFold

OmegaFold achieved much higher statistical prediction accuracy in comparison to AlphaFold 2

On July 20, 2022, Chinese biotech firm Helixon launched OmegaFold, the first computational method to successfully predict high-resolution protein structure from a single primary sequence. The study by Chinese researchers fills a frequently encountered gap in structure prediction and inches closer to understanding protein folding in nature.

Recently, the company open-sourced its project, joining the likes of DeepMind’s AlphaFold, RoseTTAFold, and Meta AI’s ESMFold, among others, which are also open source. The initial version of the code and model is available on GitHub.

Understanding protein folding helps researchers and scientists identify the underlying cause of many diseases and abnormalities. It also helps them find cures and design new medicines, pharmaceutical solutions, and alternative treatments.


This new model developed by Helixon claims to outperform RoseTTAFold and achieve similar prediction accuracy to AlphaFold 2 on recently released structures. In a study, the researchers said they had combined a protein language model, which allows them to make predictions from single sequences, with a geometry-inspired transformer model trained on protein structures.

In addition, OmegaFold enables accurate predictions on orphan proteins, which do not belong to any functionally characterised protein family, and on antibodies, which tend to have noisy MSAs (multiple sequence alignments) due to fast evolution.





OmegaFold vs AlphaFold vs ESMFold 

A month ago, Meta AI launched a breakthrough model called Evolutionary Scale Modelling, or ESM, for faster protein structure prediction. This model, too, claimed accuracy similar to AlphaFold 2 and RoseTTAFold, but ESMFold inference is faster, which enables exploration of the structural space of metagenomic proteins.

There seem to be glaring similarities between ESMFold, AlphaFold, and OmegaFold. The team said that the overall model of OmegaFold is conceptually inspired by advances in language models for NLP coupled with the deep neural networks used in AlphaFold 2.

OmegaFold leverages a deep transformer-based protein language model, trained on a large collection of unaligned and unlabeled protein sequences, to learn single- and pairwise-residue representations as powerful features that model the distribution of sequences.

The Omega protein language model (PLM) can capture structural and functional information encoded in the amino-acid sequences through the embeddings. These are later fed into Geoformer, a new geometry-inspired transformer neural network, to distill the structural and physical pairwise relationships between amino acids. Finally, a structural module predicts the 3D coordinates of all heavy atoms. 

ESMFold, on the other hand, leverages a large-scale language model for protein structure prediction. Its improvements in language modelling perplexity and structure learning continue as the model scales up to 15 billion parameters. Meanwhile, AlphaFold uses a neural network-based architecture whose training incorporates the evolutionary, physical and geometric constraints of protein structures.

The researchers noted that their model (OmegaFold) performs well on the CASP and CAMEO benchmark datasets, which span a wide range of prediction difficulty levels. With only a single sequence as input, OmegaFold was as accurate as the advanced MSA-based methods, including AlphaFold 2 and RoseTTAFold.

OmegaFold structures had a mean local-distance difference test (LDDT) score of 0.82 on the CAMEO dataset, higher than RoseTTAFold structures (0.75 mean LDDT score) and close to AlphaFold 2 structures (0.86 mean LDDT score) predicted from MSAs. The LDDT is a commonly used metric for structure evaluation.
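To make the LDDT numbers above more concrete, here is a minimal Python sketch of a simplified, C-alpha-only version of the score. It is not the scoring code used by CAMEO or by any of the teams: the real LDDT works on all atoms and adds stereochemistry checks. The function name `lddt_ca`, the 15 Å inclusion radius and the four tolerance thresholds are assumptions taken from the standard description of the metric, not from this article.

```python
import math

def lddt_ca(pred, ref, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified C-alpha lDDT (illustrative sketch, not the official scorer).

    pred, ref: lists of (x, y, z) coordinates in angstroms, one per residue,
    in the same residue order.
    """
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same number of residues")
    considered = 0  # residue pairs that are close together in the reference
    preserved = 0   # (pair, threshold) combinations the prediction keeps
    for i in range(len(ref)):
        for j in range(i + 1, len(ref)):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= radius:
                continue  # only local contacts are scored
            diff = abs(math.dist(pred[i], pred[j]) - d_ref)
            considered += 1
            preserved += sum(diff < t for t in thresholds)
    if considered == 0:
        return 0.0
    return preserved / (considered * len(thresholds))

# Toy example: a straight 10-residue chain and a rigidly shifted copy.
ref = [(3.8 * i, 0.0, 0.0) for i in range(10)]
moved = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in ref]
print(lddt_ca(moved, ref))  # prints 1.0
```

Because only distances within each structure are compared, the score does not need the two structures to be superimposed first, which is why the shifted copy still scores 1.0.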

On the CASP dataset, OmegaFold structures were also quite accurate, with an average TM-score of 0.79, slightly lower than that of RoseTTAFold structures (0.81 mean TM-score) and equal to that of AlphaFold 2 structures (0.79 mean TM-score). Meanwhile, ESMFold achieved a TM-score of 0.71 on the CAMEO test set and 0.53 on the CASP dataset. TM-score is a common metric for assessing the topological similarity of protein structures.

A score above 0.90 is considered roughly equivalent to the experimentally determined structure. 
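For readers who want to see what goes into a TM-score, the sketch below computes the standard TM-score sum for two C-alpha traces that are assumed to be already aligned residue by residue and superimposed. Real tools also search for the best superposition, which this sketch skips. The function name `tm_score_superimposed` is hypothetical, and the 0.5 Å floor on the length-dependent scale `d0` for very short chains is an assumption of this sketch.

```python
import math

def tm_score_superimposed(pred, ref):
    """TM-score sum for two pre-superimposed C-alpha traces (sketch only).

    pred, ref: lists of (x, y, z) coordinates in angstroms, same residue order.
    The score is normalised by the length of the reference structure.
    """
    if len(pred) != len(ref) or not ref:
        raise ValueError("expected two non-empty traces of equal length")
    n = len(ref)
    # Length-dependent distance scale; floored at 0.5 (assumption) so that
    # short chains do not produce a zero or negative scale.
    if n > 15:
        d0 = max(0.5, 1.24 * (n - 15) ** (1.0 / 3.0) - 1.8)
    else:
        d0 = 0.5
    total = sum(
        1.0 / (1.0 + (math.dist(p, r) / d0) ** 2) for p, r in zip(pred, ref)
    )
    return total / n

# Toy example: a 100-residue straight chain compared with itself.
ref = [(3.8 * i, 0.0, 0.0) for i in range(100)]
print(tm_score_superimposed(ref, ref))  # prints 1.0
```

Each residue contributes a value between 0 and 1 that falls off with its distance from the reference position, so the average lies in the range (0, 1] and is less sensitive to a few badly placed residues than a plain RMSD.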

On single-sequence input, OmegaFold wins 

Over the years, several companies have used deep learning to exploit evolutionary information in MSAs (multiple sequence alignments) to accurately predict protein structures. However, MSAs of homologous proteins are not always available, as is the case for orphan proteins and antibodies, and in a natural setting a protein typically folds from its primary amino acid sequence into its 3D structure. The OmegaFold team suggested that evolutionary information and MSAs should not be necessary to predict a protein’s folded form.

This is where the new ‘super fast’ protein prediction model OmegaFold comes into the picture. It outperformed AlphaFold 2 and RoseTTAFold on single-sequence inputs. Further, on both antibody loops and orphan proteins, OmegaFold achieved much higher statistical prediction accuracy than AlphaFold 2, likely due to the advantages of its single-sequence-based prediction method.


Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
