While accepting the Nobel Prize in Chemistry in 1972, Christian Anfinsen famously postulated that, in theory, a protein’s amino acid sequence should fully determine its structure. This hypothesis set off a 50-year quest to computationally predict a protein’s 3D structure from its 1D amino acid sequence alone. The major challenge was that the number of ways a protein could theoretically fold before settling into its final 3D structure is astronomically large. In fact, Cyrus Levinthal, a renowned American molecular biologist, famously noted that it would take longer than the age of the known universe to enumerate all the possible configurations of a typical protein by brute-force calculation.
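Levinthal’s point can be illustrated with some back-of-the-envelope arithmetic. The sketch below is only an illustration: the chain length (100 residues), the number of conformations per residue (3) and the sampling rate (10^13 conformations per second) are assumed round numbers chosen for the example, not figures from this article.

```python
# Rough Levinthal-style estimate. All inputs are illustrative assumptions.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate age of the universe


def brute_force_years(residues, states_per_residue, samples_per_second):
    """Years needed to visit every conformation once at a fixed sampling rate."""
    conformations = states_per_residue ** residues
    return conformations / samples_per_second / SECONDS_PER_YEAR


# Assumed: 100 residues, 3 states each, 1e13 conformations sampled per second.
years = brute_force_years(100, 3, 1e13)
ratio = years / AGE_OF_UNIVERSE_YEARS

print(f"Exhaustive search: about {years:.1e} years")
print(f"That is about {ratio:.1e} times the age of the universe")
```

Even with these modest assumptions, the search time comes out on the order of 10^27 years, which is why exhaustive enumeration was never a realistic route to structure prediction.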
DeepMind’s breakthrough innovation, AlphaFold, sidestepped this brute-force barrier. Tackling a 50-year-old challenge, AlphaFold could predict a protein’s 3D structure from its amino acid sequence. A few experts dubbed it the most important innovation at the intersection of AI and biology. Just when one thought things could not get better, DeepMind released AlphaFold 2, software that predicted the structure of almost every protein made by humans as well as the entire proteomes of 20 other widely studied organisms, over 365,000 structures in all.
Now, after a year, where do we stand with AlphaFold and structural biology on the whole?
AlphaFold accelerating scientific research
DeepMind and EMBL’s European Bioinformatics Institute partnered to create the AlphaFold database, which makes the predictions publicly available for the benefit of the scientific community. The database covers the complete human proteome, with long proteins split into fragments, and the proteomes of 47 other key organisms. Under this partnership, the two institutions announced that in 2022 the database would be expanded further to include a large portion of all catalogued proteins.
Number of research works citing AlphaFold2. Credit: Nature
Making this database openly available has enabled some major research. Structural biologist Thomas Boesen and microbial ecologist Tina Šantl-Temkiv (both at Aarhus University in Denmark) used AlphaFold results to model the structure of bacterial proteins that catalyse ice formation. To date, biologists haven’t been able to determine these structures experimentally; Boesen and Šantl-Temkiv’s work could help researchers better understand the cooling effects of ice in clouds.
A team led by computational biologist Martin Steinegger developed Foldseek, a tool for finding structurally similar proteins in the AlphaFold database, and used it to look for relatives of the RNA-copying enzyme of SARS-CoV-2 (the virus that causes COVID-19). The search turned up previously unidentified proteins across eukaryotes, including slime moulds. In their 3D structure, these proteins resemble enzymes called reverse transcriptases, which viruses like HIV use to copy RNA into DNA. This is an unexpected result, especially considering that the two share very little similarity at the genetic sequence level.
In a few other cases, AlphaFold was not an immediate solution but served as a good approximation that could be refined further by experiments. For example, X-ray crystallography data appear as patterns of diffracted X-rays, and scientists need a starting guess at a protein’s structure to interpret these patterns. Previously, scientists would piece together information from related proteins or use experimental approaches. With the introduction of AlphaFold, these workarounds have become largely unnecessary.
AlphaFold models have also accurately predicted the unique features of G-protein-coupled receptors, which are important drug targets.
That said, AlphaFold models often need help from additional software to capture fine details. With that caveat, AlphaFold structures appear good enough to help guide drug discovery. Further, AlphaFold is designed to predict a single structure, but many proteins adopt multiple conformations that are important for their function. Its predictions are also for structures in isolation, even though many proteins function alongside ligands such as DNA and RNA.
According to scientist Luciano Abriata, interest in AlphaFold peaked twice between late 2020 and mid-2021. After the second peak, interest did not drop to zero even months later. The graph has now hit a plateau, and a stable baseline is likely to persist as many biologists continue to use AlphaFold’s results in their experiments.
Credit: Luciano Abriata
Further, it is now time for CASP15, the next round of the Critical Assessment of protein Structure Prediction. In the last round, CASP14, close to 100 groups from around the world submitted more than 67,000 models for 84 modelling targets, and DeepMind’s AlphaFold emerged the winner.