DeepMind Solves One Of The Oldest Challenges Of Biochemistry With AlphaFold

An Antarctic eelpout swims gracefully in cold, dark depths without its body fluids freezing. It does this with the help of antifreeze proteins (AFPs), the by-products of millions of mutations over millions of years of evolution. The 3D structure of AFPs allows them to bind to ice crystals and prevent the organism from freezing by forming a hydrophobic layer that keeps the surrounding liquid from crystallising. The information needed to build proteins is written in genes and encoded in DNA.

The Protein Folding Problem

The function of a protein depends on its unique 3D structure. But figuring out the 3D shape of a protein purely from its genetic sequence is a complex task that scientists have been working on for decades. The challenge is that DNA only contains information about the sequence of a protein’s building blocks, called amino acid residues, which form long chains. Predicting how those chains will fold into the intricate 3D structure of a protein is what’s known as the “protein folding problem”.

“One possible sequential process which might lead a protein to land in a particular state, is the growth of the peptide chain on the ribosome, starting with the amino-terminal end and proceeding to the carboxy terminus. Computer programs have been written in such a way that any configuration can be altered to minimize the Van der Waals energy and to ensure close packing of the structure. However, this energy minimization can only be expected to alter the structure to the bottom of the local minimum; it is not intended to search through all possible configurations for a true minimum energy,” observed Cyrus Levinthal in his paper.


Where AI Comes Into The Picture

Experimental techniques for determining a protein's structure can cost millions of dollars and rely on trial and error. With longer protein chains, the process becomes even more exhausting. Some common techniques are:

  • Cryo-electron microscopy
  • Nuclear magnetic resonance
  • X-Ray crystallography

This is where AI makes room for itself. To avoid the laborious conventional techniques, researchers at DeepMind have used deep learning to build a structure predictor. With large amounts of genomic data available, applying machine learning to structure prediction makes tractable a search that, if every possible configuration were tried one by one, would take longer than the age of the universe.
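
The "age of the universe" comparison can be checked with a back-of-the-envelope calculation in the spirit of Levinthal's argument. The sketch below is only illustrative: the chain length, the number of conformations per residue and the sampling rate are all assumed values, not figures from DeepMind or from this article.

```python
# Back-of-the-envelope estimate; every constant here is an illustrative assumption.
RESIDUES = 100              # assumed chain length
STATES_PER_RESIDUE = 3      # assumed number of conformations per residue
SAMPLES_PER_SECOND = 1e13   # assumed rate at which conformations are tried
AGE_OF_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600  # roughly 13.8 billion years, in seconds


def brute_force_seconds(residues, states, rate):
    """Time needed to try every conformation once at the given rate."""
    return states ** residues / rate


seconds = brute_force_seconds(RESIDUES, STATES_PER_RESIDUE, SAMPLES_PER_SECOND)
ratio = seconds / AGE_OF_UNIVERSE_S
print(f"{seconds:.2e} seconds, about {ratio:.1e} times the age of the universe")
```

Even with these generous assumptions, exhaustive enumeration is hopeless, which is why a learned predictor that narrows the search is attractive.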

“Our team focused specifically on the hard problem of modelling target shapes from scratch, without using previously solved proteins as templates. We achieved a high degree of accuracy when predicting the physical properties of a protein structure, and then used two distinct methods to construct predictions of full protein structures,” noted DeepMind’s jubilant team after their successful demonstration at CASP.

The neural networks are trained to predict two properties of a protein from its sequence: the distances between pairs of amino acids and the angles between the chemical bonds that connect those amino acids. The distance prediction builds on common techniques that estimate whether pairs of amino acids are close to each other. Here, a neural network is trained to predict how the distance between every pair of residues in the protein is distributed.
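
To make the idea of pairwise distances concrete, here is a minimal sketch that computes the distance between every pair of residues from 3D coordinates and assigns each distance to a histogram bin, which is the kind of target a distance-distribution predictor could be trained on. The coordinates, the bin width and the number of bins are made-up values for illustration; this is not DeepMind's code or data format.

```python
import math


def pairwise_distances(coords):
    """Return the symmetric matrix of Euclidean distances between residues."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def distance_bin(distance, bin_width=2.0, num_bins=10):
    """Map a distance to a histogram bin; the last bin catches anything beyond range."""
    return min(int(distance // bin_width), num_bins - 1)


# Toy coordinates (in angstroms) for a four-residue chain; purely illustrative.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
dist = pairwise_distances(coords)
bins = [[distance_bin(d) for d in row] for row in dist]
```

A network that predicts a distribution over distances would output one probability per bin for each residue pair, rather than a single number.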

These probabilities are then combined into a score that estimates how accurate a proposed protein structure is.
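
One simple way to turn per-pair distance probabilities into a single score is to sum the log-probability of the distance bin that each residue pair actually falls into. The sketch below assumes this log-likelihood form, along with hypothetical bin settings and hand-written probabilities. The article does not specify DeepMind's exact scoring function.

```python
import math


def bin_index(distance, bin_width=4.0, num_bins=3):
    """Assumed binning: fixed-width bins, with the last bin open-ended."""
    return min(int(distance // bin_width), num_bins - 1)


def log_likelihood(coords, predicted):
    """Sum the log-probability of the observed distance bin for each residue pair.

    predicted[(i, j)] holds one probability per bin for the pair (i, j).
    Probabilities are assumed to be strictly positive.
    """
    total = 0.0
    for (i, j), probs in predicted.items():
        d = math.dist(coords[i], coords[j])
        total += math.log(probs[bin_index(d)])
    return total


# Hypothetical predicted distributions for a three-residue chain.
predicted = {
    (0, 1): [0.8, 0.15, 0.05],
    (1, 2): [0.8, 0.15, 0.05],
    (0, 2): [0.1, 0.7, 0.2],
}
compact = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
stretched = [(0.0, 0.0, 0.0), (9.0, 0.0, 0.0), (18.0, 0.0, 0.0)]

compact_score = log_likelihood(compact, predicted)
stretched_score = log_likelihood(stretched, predicted)
```

Under these toy predictions the compact structure scores higher than the stretched one, because its distances fall into the bins the predictor considers likely.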

The researchers have also trained another network that uses all the distances in aggregate to estimate how close the proposed structure is to the correct one.

The scoring functions are then used to search the space of possible protein structures for ones that match the predictions. In the first method, a generative neural network is trained to invent new fragments for the protein, which are used to improve the overall score of the proposed structure.

Apart from this, gradient descent has also been experimented with as a second method for optimising the scores. With this technique, the complexity of the fragment-based approach is avoided, as entire protein chains are optimised at once rather than piece by piece.
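
The gradient descent idea can be sketched on a toy problem: start from arbitrary coordinates for the whole chain and repeatedly nudge every residue so that the pairwise distances move towards a set of target distances. The squared-error loss, the step size and the target values below are assumptions chosen for illustration, and stand in for the learned scoring functions described above.

```python
import math


def loss_and_grad(coords, target):
    """Squared error between current and target pair distances, with its gradient."""
    n = len(coords)
    loss = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            err = d - target[i][j]
            loss += err * err
            if d == 0.0:
                continue  # direction is undefined for coincident points
            for k in range(3):
                g = 2.0 * err * (coords[i][k] - coords[j][k]) / d
                grad[i][k] += g
                grad[j][k] -= g
    return loss, grad


def descend(coords, target, step=0.05, iterations=500):
    """Plain gradient descent that updates all residues of the chain at every step."""
    coords = [list(p) for p in coords]
    for _ in range(iterations):
        _, grad = loss_and_grad(coords, target)
        for i, g in enumerate(grad):
            for k in range(3):
                coords[i][k] -= step * g[k]
    return coords


# Toy target: three residues that should all sit 3.8 angstroms apart.
target = [[0.0, 3.8, 3.8], [3.8, 0.0, 3.8], [3.8, 3.8, 0.0]]
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

initial_loss, _ = loss_and_grad(start, target)
final = descend(start, target)
final_loss, _ = loss_and_grad(final, target)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.2e}")
```

Because every coordinate is updated at each step, no separate stage is needed to assemble independently folded pieces.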

These methods rely on standard machine learning techniques, but apply them to problems that have either been ignored or attempted without success.

Importance Of Predicting Protein Folds

Antibody proteins are ‘Y-shaped’, a hook-like structure that latches onto microorganisms that pose a threat to the body. Collagen proteins, on the other hand, are shaped like cords to transmit tension between cartilage, ligaments, bones, and skin. Proteins such as Cas9, used in CRISPR, act like scissors that cut and paste DNA, and ribosomes act like a programmed assembly line that helps build proteins themselves.

Be it diagnosing fatal diseases or engineering bacteria to eat plastic, knowledge of a protein's structure enables us to tackle problems that were once thought to be impossible or irreversible.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
