“It would take longer than the age of the known universe to enumerate all possible configurations of a typical protein by brute force calculation.” Cyrus Levinthal
Inside every cell of your body, billions of tiny machines are hard at work carrying oxygen, letting your eyes detect light, and making your muscles move. These microscopic machines are proteins, and they underpin every biological process ever known to us. Each protein has an intricate 3D shape that defines what it does and how it works.
In 1972, Nobel laureate Christian Anfinsen postulated that, in theory, a protein’s amino acid sequence should fully determine its structure. This hypothesis triggered a quest that has lasted nearly 50 years: to computationally predict a protein’s 3D structure based solely on its one-dimensional amino acid sequence, as an alternative to expensive and time-consuming experimental methods. Proteins are like a string of beads made up of 20 different types of amino acids. Interactions between these amino acids make the protein fold as it finds its shape out of almost limitless possibilities.
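To give a feel for "almost limitless possibilities", here is a rough back-of-the-envelope sketch in Python. Every number in it is an illustrative assumption rather than a figure from the article: a chain of 100 residues, three possible conformations per residue, and a search that tries ten trillion conformations per second. The function name `brute_force_years` is hypothetical.

```python
# Illustrative assumptions only; none of these values come from the article.
RESIDUES = 100                  # assumed length of a small protein chain
STATES_PER_RESIDUE = 3          # assumed conformations available to each residue
SAMPLES_PER_SECOND = 1e13       # assumed rate at which conformations are tried
SECONDS_PER_YEAR = 3.156e7      # approximate number of seconds in a year
AGE_OF_UNIVERSE_YEARS = 1.38e10 # approximate age of the universe in years


def brute_force_years(residues, states, rate):
    """Years needed to visit every conformation once at the given rate."""
    conformations = states ** residues  # exact Python integer
    return conformations / rate / SECONDS_PER_YEAR


years = brute_force_years(RESIDUES, STATES_PER_RESIDUE, SAMPLES_PER_SECOND)
ratio = years / AGE_OF_UNIVERSE_YEARS
print(f"{years:.2e} years, about {ratio:.1e} times the age of the universe")
```

Even with these generous assumptions, the exhaustive search comes out many orders of magnitude longer than the age of the universe, which is why brute-force enumeration is not a practical route to a protein's structure.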
For decades, scientists have been working on ways to determine a protein’s final structure just from its string of amino acids. Current experimental methods, such as cryo-electron microscopy, nuclear magnetic resonance, and X-ray crystallography, can cost millions of dollars and rely on a laborious trial-and-error approach. This is where DeepMind’s AlphaFold comes into the picture.
Four years ago, researchers at Alphabet Inc.’s DeepMind started working on an AI system, AlphaFold, to solve this protein folding problem. AlphaFold has been trained on the sequences and structures of about a hundred thousand proteins painstakingly mapped out by scientists around the world. Today, DeepMind announced that the latest version of AlphaFold is even more accurate and has been recognised by CASP as a breakthrough.
Brief Overview Of AlphaFold
“Predicting how those chains [of amino acids] fold into the intricate 3D protein structure is known as the ‘protein folding problem’.”
Predicting the 3D shape of a protein purely from its genetic sequence is a complex task. The challenge here is that the DNA only contains information about the sequence of amino acid residues, which form long chains. Predicting how those chains fold into the intricate 3D structure of a protein is what’s known as the “protein folding problem”.
A folded protein, explained the team at DeepMind, can be thought of as a “spatial graph”, where the amino acid residues form the nodes, and edges connect residues that are in close proximity. For the latest version of AlphaFold, DeepMind developed an attention-based neural network system that interprets the structure of this graph, while reasoning over the implicit graph that it’s building.
By iterating this process, the system developed strong predictions of the underlying physical structure of the protein and was able to determine structures in a matter of days. AlphaFold can also predict which parts of each predicted protein structure are reliable using an internal confidence measure.
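The “spatial graph” idea can be illustrated with a minimal Python sketch. This is not DeepMind's implementation, only a toy showing what "residues as nodes, edges between residues in close proximity" means. The function name `contact_graph`, the distance cutoff, and the coordinates are all hypothetical and made up for illustration.

```python
import math


def contact_graph(coords, cutoff):
    """Return {residue index: set of neighbour indices} for residues
    whose positions lie within `cutoff` of each other."""
    graph = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                graph[i].add(j)
                graph[j].add(i)
    return graph


# Made-up 3D positions (in angstroms) for a five-residue chain
# that bends back on itself.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]

# The cutoff is an assumed threshold chosen to suit the toy data.
graph = contact_graph(toy_coords, cutoff=4.0)
for residue, neighbours in graph.items():
    print(residue, sorted(neighbours))
```

In this toy chain, residues 1 and 4 are far apart in the sequence but close in space, so the graph gains an edge between them. Contacts of that kind are what distinguish a folded structure from a simple string of beads.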
AlphaFold by the numbers
- Training data: close to 170,000 protein structures from the Protein Data Bank, together with large databases containing protein sequences of unknown structure.
- Processing: 128 TPUv3 cores (roughly equivalent to ~100-200 GPUs) run over a few weeks.
DeepMind And The Cost Of A Breakthrough
“Looking at the pace of progress, I think we will have AI in a form in which it benefits a lot of users in the coming years, but I still think there’s a long-term investment for us,” said Sundar Pichai back in 2016 when he was asked about DeepMind.
Google acquired DeepMind for $500 million in 2014. Since then, DeepMind has grown to more than 700 employees and has openly published over 200 peer-reviewed papers. Its solutions have been incorporated heavily across Google, from reducing the environmental impact of Google’s data centres to improving its video call apps.
However, reports on DeepMind’s finances showed that its losses grew to $570 million (£470 million) in 2019, up from $341 million (£281 million) in 2017 and $154 million (£127 million) in 2016. From establishing labs to paying top researchers hefty packages, many expenses are unavoidable for any AI organisation. This is even more true of DeepMind, whose research interests are nothing short of esoteric.
DeepMind’s team is knee-deep in solving humanity’s most pressing challenges in fields such as neuroscience. Unlike OpenAI’s GPT-3, which has appealing commercial aspects, DeepMind’s accomplishments cannot easily be translated into billable APIs. As questions about the financial burden DeepMind places on Google started to mount, AlphaFold’s recognition came as a relief. The implications of deciphering protein structures with AlphaFold are immense for the drug discovery industry. The returns could be unprecedented and could wipe out all the losses. Regardless of the financial aspect, the successes of AI labs like DeepMind fortify the role of AI in assisting the most challenging scientific endeavours in the coming decades.
Celebrating the breakthrough, Nobel Laureate Venki Ramakrishnan said that this computational work represents a stunning advance on the protein-folding problem. “It will be exciting to see the many ways in which it will fundamentally change biological research,” he said.
What Does The Future Hold
From diagnosing fatal diseases to engineering bacteria that eat plastic, knowledge of protein structures will enable us to tackle problems that were thought to be impossible and irreversible.
AlphaFold’s predictions could enable progress in all sorts of areas: imagine a future where we can understand diseases more quickly and develop drugs to fight them, or one where we could use enzymes to break down plastic waste or even capture carbon from the atmosphere. Although there’s a lot more work to be done, unlocking the shapes of these building blocks could help scientists better understand the natural world and perhaps expand our knowledge of life itself.
Predicting protein structure could be useful in future pandemic response efforts as well. In collaboration with a small number of specialist groups, protein structure predictions could also contribute to the understanding of specific diseases. Identifying proteins that have malfunctioned could enable more precise work on drug development and help find promising treatments faster.
According to Arthur Levinson, founder of Calico, AlphaFold is a once-in-a-generation advance that demonstrates how computational methods are poised to transform research in biology and hold much promise for accelerating the drug discovery process.