Lean in: the most-cited paper of 2022 was not about generative AI, and it wasn’t even from big tech. The European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) and DeepMind published ‘AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models’, which was cited 1,331 times.
It gets even more interesting. The second most-cited paper tells the same story – protein folding, and not from big tech. ‘ColabFold: making protein folding accessible to all’, from the Max Planck Institute for Multidisciplinary Sciences, was cited 1,138 times.
So, even though 2022 was described as “the year of generative AI”, the most-cited papers of the year do not entirely bear that out.
Zeta Alpha published its report on the 100 most-cited AI papers of 2022, which also includes a comparison with 2021. The list is broken down by organisation, type of organisation (academia or industry), country, and the number of times each paper was tweeted – admittedly a rather arbitrary measure.
Harnessing Protein-Folding Power
After the success of AlphaFold 1 in 2018, DeepMind pushed its speed and accuracy even further. AlphaFold 2 won CASP14 in 2020 and is widely regarded as the best protein-folding model. Its July 2022 collaboration with EMBL-EBI expanded the database of predicted protein structures roughly 200-fold.
AlphaFold was born with the noble purpose of figuring out the shapes of individual proteins. However, it didn’t take long for scientists to realise that, with a little tweaking, they could use the software to reveal the intricate dance between multiple proteins. Since then, researchers have been on a roll, coming up with all sorts of nifty tricks to make AlphaFold better at handling complex protein puzzles. DeepMind even dropped an upgrade called AlphaFold-Multimer, which is basically giving the software a superhero cape to take on even more challenging tasks.
AlphaFold’s progress opened the gates for other organisations to step into the field. Chinese biotech firm Helixon joined the race with OmegaFold, which beat its competitors in several areas. Meta brought in ESMFold, Baker Lab brought in RoseTTAFold, and the list goes on.
Even though the enterprise world was harnessing the powers of generative AI in 2022, when it comes to research, 2022 was definitely the year of protein-fold prediction. Generative AI played its part too: many credit its rise for the rapid growth of the protein-prediction field.
Last month, NVIDIA and Evozyne created a generative AI model for proteins, speeding up drug discovery. The model is trained on data consisting of known protein structures, which it uses to generate predictions for proteins whose structures are unknown.
Boston-based Generate Biomedicines came up with Chroma, a diffusion-based protein generation model, calling it the “DALL-E 2 of biology”. The RoseTTAFold researchers also implemented diffusion models and released RoseTTAFold Diffusion.
Of course, generative AI is only one part of the puzzle. The sheer complexity of protein folding means that there are many challenges that must be overcome in order to accurately predict a protein’s structure. But with new advances in AI and computational techniques, researchers are making progress in this field faster than ever before.
But What About Generative AI?
Even though protein-prediction papers topped the list, generative AI deserves credit for the progress in protein-fold prediction. Papers on generative AI – covering diffusion models, LLMs, and computer-vision models – appear throughout the list of top-cited papers.
Meta comes in third with ‘A ConvNet for the 2020s’, a paper published alongside UC Berkeley and cited 835 times. The paper discusses how hybrid approaches like Swin Transformers established Transformers as a generic vision backbone, seemingly marking them as superior for vision tasks, and introduces a pure ConvNet, ConvNeXt, to test whether that superiority holds.
The list goes on with ‘Hierarchical Text-Conditional Image Generation with CLIP Latents’ from OpenAI. Then come Google’s ‘PaLM: Scaling Language Modeling with Pathways’ and ‘Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding’. ‘Training language models to follow instructions with human feedback’, the OpenAI research that powers ChatGPT, sits in 10th place with just 254 citations.
Google has consistently been the strongest player, publishing the most papers since 2020, followed by Meta, Microsoft, UC Berkeley, and Stanford. OpenAI and DeepMind are not even in the top 20 organisations by publication volume, but they have the highest impact: the ratio of papers published to how many made the top 100.