Protein Wars Part 2: It’s OmegaFold vs AlphaFold

On single-sequence inputs, OmegaFold achieved much higher prediction accuracy than AlphaFold 2

On July 20, 2022, Chinese biotech firm Helixon launched OmegaFold, the first computational method to successfully predict high-resolution protein structure from a single primary sequence. The study by Chinese researchers fills a frequently encountered gap in structure prediction and inches closer to understanding how proteins fold in nature.

Recently, the company open-sourced the project, joining other open-source models such as DeepMind’s AlphaFold, RoseTTAFold, and Meta AI’s ESMFold. The initial version of the code and model is available on GitHub.

Understanding protein folding helps researchers and scientists identify the underlying causes of many diseases and abnormalities. It also helps in finding cures and designing new medicines, pharmaceutical solutions, and alternative treatments.

Helixon claims that the new model outperforms RoseTTAFold and achieves prediction accuracy similar to AlphaFold 2 on recently released structures. In their study, the researchers said they used a new combination of a protein language model, which allows predictions from single sequences, and a geometry-inspired transformer model trained on protein structures.

In addition, OmegaFold enables accurate predictions on orphan proteins, which do not belong to any functionally characterised protein family, and on antibodies, which tend to have noisy MSAs (multiple sequence alignments) due to fast evolution.

OmegaFold vs AlphaFold vs ESMFold 

A month ago, Meta AI launched a breakthrough model called Evolutionary Scale Modelling, or ESM, for faster protein structure prediction. This model, too, claimed accuracy similar to AlphaFold 2 and RoseTTAFold, but ESMFold inference is faster, which enables exploration of the structural space of metagenomic proteins.

There seem to be clear similarities between ESMFold, AlphaFold, and OmegaFold. The team said that the overall OmegaFold model is conceptually inspired by advances in language models for NLP, coupled with the deep neural networks used in AlphaFold 2.

OmegaFold leverages a deep transformer-based protein language model, trained on a large collection of unaligned and unlabeled protein sequences, to learn single- and pairwise-residue representations as powerful features that model the distribution of sequences.

The Omega protein language model (PLM) can capture structural and functional information encoded in the amino-acid sequences through the embeddings. These are later fed into Geoformer, a new geometry-inspired transformer neural network, to distill the structural and physical pairwise relationships between amino acids. Finally, a structural module predicts the 3D coordinates of all heavy atoms. 

ESMFold, on the other hand, leverages a large-scale language model for protein structure prediction. Improvements in language modelling perplexity and structure learning continue as the model scales up to 15 billion parameters. Meanwhile, AlphaFold uses a neural network architecture whose training incorporates the evolutionary, physical and geometric constraints of protein structures.

The researchers noted that OmegaFold performs well on the CASP and CAMEO benchmark datasets, which span a wide range of prediction difficulty levels. With a single sequence as input, OmegaFold was as accurate as advanced MSA-based methods, including AlphaFold 2 and RoseTTAFold.

OmegaFold structures had a mean local-distance difference test (LDDT) score of 0.82 on the CAMEO dataset, with accuracy comparable to RoseTTAFold structures (0.75 mean LDDT) and similar to AlphaFold 2 structures (0.86 mean LDDT) predicted from MSAs. LDDT is a commonly used metric for structure evaluation.
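To give a feel for what an LDDT number measures, here is a minimal sketch in Python. It is a simplified, Cα-only version written for illustration and is not the scoring code used in the study: the function name `ca_lddt` is hypothetical, the 15 Å inclusion radius and the 0.5, 1, 2 and 4 Å tolerance thresholds follow the commonly published definition of the metric, and the full metric works on all atoms rather than one point per residue.

```python
import numpy as np

def ca_lddt(ref, model, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified, superposition-free LDDT on one point per residue.

    ref, model: (N, 3) coordinate arrays for the same residues, same order.
    Returns a score in [0, 1]; 1 means all local distances are preserved.
    """
    ref = np.asarray(ref, dtype=float)
    model = np.asarray(model, dtype=float)
    if ref.shape != model.shape:
        raise ValueError("ref and model must have the same shape")

    # All pairwise distances within each structure.
    d_ref = np.linalg.norm(ref[:, None, :] - ref[None, :, :], axis=-1)
    d_mod = np.linalg.norm(model[:, None, :] - model[None, :, :], axis=-1)

    # Only pairs that are close in the reference count as "local".
    n = len(ref)
    local = (d_ref < cutoff) & ~np.eye(n, dtype=bool)
    if not local.any():
        return 0.0

    # Fraction of local distances preserved within each tolerance, averaged.
    diff = np.abs(d_ref - d_mod)[local]
    return float(np.mean([(diff < t).mean() for t in thresholds]))
```

Because the score compares distances inside each structure, it does not depend on how the two structures are rotated or shifted relative to each other, which is one reason the metric is popular for benchmarks such as CAMEO.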

On the CASP dataset, OmegaFold structures were also quite accurate, with an average TM-score of 0.79, slightly lower than that of RoseTTAFold structures (0.81 mean TM-score) and equivalent to AlphaFold 2 structures (0.79 mean TM-score). Meanwhile, ESMFold achieved a TM-score of 0.71 on the CAMEO test set and 0.53 on the CASP dataset. TM-score is a common metric for assessing the topological similarity of protein structures.

A score above 0.90 is considered roughly equivalent to the experimentally determined structure. 
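The TM-score mentioned above can be sketched in a few lines of Python as well. This is an illustrative approximation, not the evaluation code behind the reported numbers: the function names are hypothetical, residues are assumed to be already matched one-to-one, and the structures are superposed with a single Kabsch (least-squares) fit, whereas the full metric searches for the superposition that maximises the score. The length-dependent scale `d0` uses the standard formula, with a floor of 0.5 Å for very short chains as a common convention.

```python
import numpy as np

def tm_d0(length):
    """Length-dependent distance scale used by TM-score (in angstroms)."""
    if length <= 15:
        return 0.5
    return max(1.24 * (length - 15) ** (1.0 / 3.0) - 1.8, 0.5)

def kabsch_superpose(mobile, target):
    """Rigidly move `mobile` onto `target` by a least-squares fit."""
    mob_c = mobile - mobile.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(mob_c.T @ tgt_c)
    # Guard against an improper rotation (reflection).
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    return mob_c @ rot.T + target.mean(axis=0)

def approx_tm_score(ref, model):
    """Approximate TM-score for matched (N, 3) coordinate arrays."""
    ref = np.asarray(ref, dtype=float)
    model = kabsch_superpose(np.asarray(model, dtype=float), ref)
    dist = np.linalg.norm(ref - model, axis=1)
    d0 = tm_d0(len(ref))
    # Each residue contributes between 0 and 1 depending on its deviation.
    return float(np.mean(1.0 / (1.0 + (dist / d0) ** 2)))
```

Each residue's contribution falls off smoothly with its deviation, so a few badly placed loops lower the score less than they would lower an RMSD, and normalising by `d0` makes scores comparable across proteins of different lengths.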

On single-sequence input, OmegaFold wins 

Over the years, several companies have used deep learning to exploit evolutionary information in MSAs (multiple sequence alignments) to accurately predict protein structures. However, MSAs of homologous proteins are not always available, for example for orphan proteins and antibodies, and in a natural setting a protein typically folds from its primary amino acid sequence into its 3D structure. The OmegaFold team suggested that evolutionary information and MSAs should therefore not be necessary to predict a protein’s folded form.

This is where the new ‘super fast’ protein structure prediction model OmegaFold comes into the picture. It outperformed AlphaFold 2 and RoseTTAFold on single-sequence inputs. Further, OmegaFold achieved much higher prediction accuracy than AlphaFold 2 on both antibody loops and orphan proteins, likely due to the advantages of its single-sequence-based prediction method.

Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.