2022 Was the Year of Protein Folding Models. Wait, What?

Even though the enterprise world was harnessing the power of generative AI in 2022, in research it was definitely the year of protein fold prediction

Lean in: the most-cited paper of 2022 was not about generative AI, and was not even the work of a big-tech company alone. EMBL’s European Bioinformatics Institute (EMBL-EBI) and DeepMind published AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models, which was cited 1331 times.

It gets even more interesting. The second most-cited paper tells the same story: it is about protein folding and does not come from big tech. ColabFold: making protein folding accessible to all, by the Max Planck Institute for Multidisciplinary Sciences, was cited 1138 times.

So, even though 2022 was described as “the year of generative AI”, the most-cited papers of the year do not fully bear that out.

Zeta Alpha published its report on the 100 most-cited AI papers of 2022, which also includes a comparison with 2021. The list is broken down by organisation, type of organisation (academia or industry), and country. It also counts how many times each paper was tweeted, which is a rather arbitrary measure.

Harnessing Protein-Folding Power

After the success of AlphaFold 1 in 2018, DeepMind pushed the model’s speed and accuracy even further. AlphaFold 2 came out on top at CASP14 in 2020 and is widely regarded as the best protein-folding model. In July 2022, the collaboration with EMBL-EBI expanded the database of predicted protein structures roughly 200-fold.

AlphaFold was born with the noble purpose of figuring out the shape of solo proteins. However, it didn’t take long for scientists to realise that, with a little tweaking, they could use the software to reveal the intricate dance between multiple proteins. Since then, researchers have been on a roll, coming up with all sorts of nifty tricks to make AlphaFold better at handling complex protein puzzles. DeepMind even released an upgrade called AlphaFold-Multimer, which basically gives the software a superhero cape to take on even more challenging tasks.

AlphaFold’s progress opened the gates for other organisations stepping into the field. Chinese biotech firm Helixon joined the race with OmegaFold, which beat its competitors in many areas. Meta brought in ESMFold, the Baker Lab brought in RoseTTAFold, and the list goes on.

Even though the enterprise world was harnessing the power of generative AI in 2022, in research it was definitely the year of protein fold prediction, and generative AI helped there as well. Many credit the rise of generative AI for the growth of protein prediction as a field.

Last month, NVIDIA and Evozyne created a generative AI model for proteins, aimed at speeding up drug discovery. The model is trained on known protein structures and uses what it has learned to generate predictions for proteins whose structure is unknown.

Boston-based Generate Biomedicines came up with Chroma, a diffusion-based protein model that it calls the DALL-E 2 of biology. The RoseTTAFold researchers also adopted diffusion models and released RoseTTAFold Diffusion (RFdiffusion).

Of course, generative AI is only one part of the puzzle. The sheer complexity of protein folding means that there are many challenges that must be overcome in order to accurately predict a protein’s structure. But with new advances in AI and computational techniques, researchers are making progress in this field faster than ever before.

But What About Generative AI?

Even though protein prediction papers topped the list, we must credit generative AI for part of the progress in protein fold prediction. Generative AI papers, including those on diffusion models, LLMs, and computer vision models, appear throughout the list of top-cited papers.

Meta comes in third with A ConvNet for the 2020s, a paper published together with UC Berkeley, which was cited 835 times. The paper notes that hybrid approaches such as Swin Transformers made Transformers practical as a generic vision backbone and seemed to establish them as the superior choice for vision tasks. It then introduces a pure ConvNet, ConvNeXt, to test how far convolutional networks can still go.

The list goes on with Hierarchical Text-Conditional Image Generation with CLIP Latents (the DALL-E 2 paper) from OpenAI. Then come two Google papers: PaLM: Scaling Language Modeling with Pathways, and Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Imagen). Training language models to follow instructions with human feedback, the OpenAI research that powers ChatGPT, is in 10th place with just 254 citations.

Google has consistently been the strongest player, with the most papers published since 2020, followed by Meta, Microsoft, UC Berkeley, and Stanford. Neither OpenAI nor DeepMind is even among the top 20 organisations by volume of publications. But they have the highest impact, measured as the share of an organisation’s published papers that make it into the top 100.
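
The impact measure described above is a simple ratio, and a small sketch may make it concrete. The function name, the organisation labels, and the counts below are all hypothetical and chosen only for illustration; they are not figures from the Zeta Alpha report.

```python
def top100_share(papers_in_top_100: int, papers_published: int) -> float:
    """Share of an organisation's published papers that reached the top-100 list."""
    if papers_published <= 0:
        raise ValueError("papers_published must be positive")
    return papers_in_top_100 / papers_published


# Hypothetical counts for illustration only (top-100 papers, total papers published).
orgs = {
    "High-volume lab": (20, 2000),
    "Low-volume lab": (5, 100),
}

# Rank organisations by share rather than by raw volume.
ranked = sorted(orgs, key=lambda name: top100_share(*orgs[name]), reverse=True)

for name in ranked:
    print(f"{name}: {top100_share(*orgs[name]):.1%}")
```

With these made-up numbers, the low-volume lab ranks first on impact even though it published far fewer papers, which is the pattern the article attributes to OpenAI and DeepMind.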

Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.