Protein Wars Part 2: It’s OmegaFold vs AlphaFold

On single-sequence inputs, OmegaFold achieved much higher prediction accuracy than AlphaFold 2

On July 20, 2022, Chinese biotech firm Helixon launched OmegaFold, the first computational method to successfully predict high-resolution protein structure from a single primary sequence. The study fills a frequently encountered gap in structure prediction and moves a step closer to understanding how proteins fold in nature.

Recently, the company open-sourced its project, joining other open-source tools such as DeepMind’s AlphaFold, RoseTTAFold, and Meta AI’s ESMFold. The initial version of the code and model is available on GitHub.

Understanding protein folding helps researchers identify the underlying causes of many diseases and abnormalities. It also helps them find cures and design new medicines, pharmaceutical solutions, and alternative treatments.


Helixon claims that the new model outperforms RoseTTAFold and achieves prediction accuracy similar to that of AlphaFold 2 on recently released structures. In their study, the researchers said they combined a protein language model, which allows them to make predictions from single sequences, with a geometry-inspired transformer model trained on protein structures.

In addition, OmegaFold enables accurate predictions on orphan proteins, which do not belong to any functionally characterised protein family, and on antibodies, which tend to have noisy MSAs (multiple sequence alignments) due to fast evolution.

OmegaFold vs AlphaFold vs ESMFold 

A month ago, Meta AI launched a breakthrough model called Evolutionary Scale Modelling, or ESM, for faster protein structure prediction. Its structure predictor, ESMFold, is also claimed to have accuracy similar to that of AlphaFold 2 and RoseTTAFold, with faster inference that enables the exploration of the structural space of metagenomic proteins.

There seem to be striking similarities between ESMFold, AlphaFold, and OmegaFold. The team said that the overall model of OmegaFold is conceptually inspired by advances in language models for NLP, coupled with the deep neural networks used in AlphaFold 2.

OmegaFold leverages a deep transformer-based protein language model, trained on a large collection of unaligned and unlabeled protein sequences, to learn single- and pairwise-residue representations as powerful features that model the distribution of sequences.

The Omega protein language model (PLM) can capture structural and functional information encoded in the amino-acid sequences through the embeddings. These are later fed into Geoformer, a new geometry-inspired transformer neural network, to distill the structural and physical pairwise relationships between amino acids. Finally, a structural module predicts the 3D coordinates of all heavy atoms. 

ESMFold, on the other hand, leverages a large-scale language model for protein structure prediction. According to Meta AI, improvements in language modelling perplexity and structure learning continue as the model is scaled up to 15 billion parameters. Meanwhile, AlphaFold uses a neural network architecture and training procedures based on the evolutionary, physical and geometric constraints of protein structures.

The researchers noted that their model (OmegaFold) performs well on the CASP and CAMEO benchmark datasets, which span a wide range of prediction difficulty levels. With only a single sequence as input, OmegaFold was as accurate as advanced MSA-based methods, including AlphaFold 2 and RoseTTAFold.

OmegaFold structures had a mean local-distance difference test (LDDT) score of 0.82 on the CAMEO dataset, compared with 0.75 for RoseTTAFold structures and 0.86 for AlphaFold 2 structures, both predicted from MSAs. LDDT is a commonly used metric for structure evaluation.
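To make the LDDT metric more concrete, here is a minimal sketch in Python. It is a simplified variant that uses one point per residue (for example, the C-alpha atom), whereas the published test considers all atoms. The function name `ca_lddt` and the toy coordinates are illustrative assumptions, not code from OmegaFold or CAMEO. The 15 Å inclusion radius and the 0.5, 1, 2 and 4 Å tolerances follow the commonly cited definition of the test.

```python
import math
from itertools import combinations


def ca_lddt(ref, model, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified, one-point-per-residue LDDT between two structures.

    ref and model are equal-length sequences of (x, y, z) coordinates in
    angstroms. No superposition is needed: only distances inside each
    structure are compared.
    """
    if len(ref) != len(model):
        raise ValueError("structures must have the same number of residues")

    preserved = [0] * len(thresholds)
    n_pairs = 0
    for i, j in combinations(range(len(ref)), 2):
        d_ref = math.dist(ref[i], ref[j])
        if d_ref >= cutoff:
            continue  # only pairs that are close in the reference count
        n_pairs += 1
        diff = abs(d_ref - math.dist(model[i], model[j]))
        for k, tol in enumerate(thresholds):
            if diff < tol:
                preserved[k] += 1

    if n_pairs == 0:
        return 0.0
    # Fraction of preserved distances per tolerance, averaged over tolerances
    return sum(p / n_pairs for p in preserved) / len(thresholds)


# Toy example: a straight four-residue chain with 3.8 angstrom spacing
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
shifted = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in reference]
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]

print(ca_lddt(reference, shifted))  # rigid shift: all distances preserved
print(ca_lddt(reference, bent))     # bending the last residue lowers the score
```

Because the score compares distances within each structure, moving the whole model rigidly does not change it; only local distortions do.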

On the CASP dataset, OmegaFold structures were also quite accurate, with an average TM-score of 0.79, slightly lower than that of RoseTTAFold structures (0.81 mean TM-score) and equivalent to AlphaFold 2 structures (0.79 mean TM-score). Meanwhile, ESMFold achieved a TM-score of 0.71 on the CAMEO test set and 0.53 on the CASP dataset. TM-score is a common metric for assessing the topological similarity of protein structures.

A score above 0.90 is considered roughly equivalent to the experimentally determined structure. 
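For reference, the TM-score discussed above is usually defined as the average of 1 / (1 + (d_i / d0)^2) over the residues of the target, where d_i is the distance between matched residues after superposition and d0 = 1.24 * (L - 15)^(1/3) - 1.8 depends on the target length L. The sketch below, in Python, evaluates that sum for two structures that are assumed to be already superimposed. The real metric searches for the superposition that maximises the sum, so this gives a lower bound. The function names and the 0.5 Å floor on d0 for very short chains are assumptions made for this illustration.

```python
import math


def tm_d0(target_length):
    """Length-dependent distance scale d0, in angstroms."""
    if target_length <= 15:
        return 0.5  # assumed floor; the formula is undefined for short chains
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_superposed(ref, model):
    """TM-score sum for two structures that are already superimposed.

    ref and model are equal-length sequences of (x, y, z) coordinates,
    one per residue, with residue i of ref matched to residue i of model.
    """
    if len(ref) != len(model):
        raise ValueError("structures must have the same number of residues")
    d0 = tm_d0(len(ref))
    total = sum(1.0 / (1.0 + (math.dist(a, b) / d0) ** 2)
                for a, b in zip(ref, model))
    return total / len(ref)


# Toy example: a 100-residue straight chain, and a copy displaced by d0
chain = [(3.8 * i, 0.0, 0.0) for i in range(100)]
offset = [(x, tm_d0(100), 0.0) for x, _, _ in chain]

print(tm_score_superposed(chain, chain))   # identical structures
print(tm_score_superposed(chain, offset))  # every residue displaced by d0
```

The length-dependent d0 is what makes scores comparable across proteins of different sizes: a residue displaced by exactly d0 contributes one half to the sum, whatever the chain length.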

On single-sequence input, OmegaFold wins 

Over the years, several companies have used deep learning to exploit evolutionary information in MSAs (multiple sequence alignments) to accurately predict protein structures. However, MSAs of homologous proteins are not always available, as is the case for orphan proteins and antibodies, and in a natural setting a protein typically folds from its primary amino acid sequence into its 3D structure. The OmegaFold team suggested that evolutionary information and MSAs should therefore not be necessary to predict a protein’s folded form.

This is where the new ‘super fast’ protein structure prediction model OmegaFold comes into the picture. It outperformed AlphaFold 2 and RoseTTAFold on single-sequence inputs. Further, OmegaFold achieved much higher prediction accuracy than AlphaFold 2 on both antibody loops and orphan proteins, likely due to the advantages of its single-sequence-based prediction method.

Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
