An Antarctic eelpout swims gracefully through cold, dark depths without its body fluids freezing. It does this with the help of antifreeze proteins (AFPs), the product of countless mutations over millions of years of evolution. The 3D structure of AFPs lets them bind to tiny ice crystals and stop them from growing, which keeps the surrounding fluid from crystallising and the organism from freezing. The instructions for building these proteins are written in genes and encoded in DNA.
The Protein Folding Problem
The function of a protein depends on its unique 3D structure. But working out that shape purely from a protein’s genetic sequence is a complex task that scientists have been tackling for decades. The challenge is that DNA contains information only about the sequence of a protein’s building blocks, the amino acid residues, which form long chains. Predicting how those chains fold into a protein’s intricate 3D structure is what’s known as the “protein folding problem”.
“One possible sequential process which might lead a protein to land in a particular state, is the growth of the peptide chain on the ribosome, starting with the amino-terminal end and proceeding to the carboxy terminus. Computer programs have been written in such a way that any configuration can be altered to minimize the Van der Waals energy and to ensure close packing of the structure. However, this energy minimization can only be expected to alter the structure to the bottom of the local minimum; it is not intended to search through all possible configurations for a true minimum energy,” observed Cyrus Levinthal in his paper.
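Levinthal’s point that no folding process can search through every possible configuration is usually illustrated with a back-of-the-envelope count, now commonly called Levinthal’s paradox. The sketch below assumes a 100-residue chain, three possible conformations per residue and 10^13 conformations sampled per second. These numbers are illustrative assumptions, not values taken from Levinthal’s paper.

```python
# Back-of-the-envelope version of Levinthal's paradox.
# All three inputs are illustrative assumptions, not figures from the article.
RESIDUES = 100              # length of a modest protein chain
STATES_PER_RESIDUE = 3      # backbone conformations allowed per residue
SAMPLES_PER_SECOND = 1e13   # conformations a chain could try per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10


def exhaustive_search_years(residues: int, states: int, rate: float) -> float:
    """Years needed to visit every conformation once at the given sampling rate."""
    conformations = states ** residues  # exact integer; grows exponentially with length
    return conformations / rate / SECONDS_PER_YEAR


years = exhaustive_search_years(RESIDUES, STATES_PER_RESIDUE, SAMPLES_PER_SECOND)
print(f"{STATES_PER_RESIDUE ** RESIDUES:.2e} conformations to check")
print(f"{years:.1e} years, {years / AGE_OF_UNIVERSE_YEARS:.1e}x the age of the universe")
```

Under these assumptions the count is roughly 5 × 10^47 conformations, and an exhaustive search would take on the order of 10^27 years. The exact figures matter less than the exponential growth with chain length.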
Where AI Comes Into The Picture
Experimental techniques for determining a protein’s structure can cost millions of dollars and rely heavily on trial and error, and the work becomes more laborious as protein chains get longer. Common examples include:
- Cryo-electron microscopy
- Nuclear magnetic resonance
- X-ray crystallography
This is where AI comes in. To sidestep these laborious techniques, researchers at DeepMind used deep learning to build a model that predicts protein structure. With large amounts of genomic data now available, machine learning offers a shortcut for a task that, done by brute-force search through every possible fold, would take longer than the age of the universe.
“Our team focused specifically on the hard problem of modelling target shapes from scratch, without using previously solved proteins as templates. We achieved a high degree of accuracy when predicting the physical properties of a protein structure, and then used two distinct methods to construct predictions of full protein structures,” noted the DeepMind team after its successful showing at CASP, the biennial Critical Assessment of protein Structure Prediction competition.
Both methods rely on deep neural networks trained to predict two properties of a protein from its genetic sequence: the distances between pairs of amino acids, and the angles between the chemical bonds that connect those amino acids. Earlier techniques commonly estimated only whether two amino acids are close to each other. DeepMind’s team went further and trained a neural network to predict a full distribution of distances between every pair of residues in the protein.
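To make “a distribution of distances between every pair of residues” concrete, the sketch below turns a known structure into per-pair distance-bin labels. This is the kind of target that a network predicting such distributions could be trained against. It is an illustration, not DeepMind’s code. The one-point-per-residue representation, the function names and the 4/8/12 Å bin edges are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_bin(d: float, edges: Sequence[float]) -> int:
    """Return the index of the histogram bin that distance d falls into.

    `edges` are increasing upper bounds (in angstroms); distances beyond the
    last edge land in one extra overflow bin.
    """
    for i, upper in enumerate(edges):
        if d < upper:
            return i
    return len(edges)


def distogram_labels(coords: Sequence[Coord], edges: Sequence[float]) -> List[List[int]]:
    """Bin index for the distance between every pair of residues.

    Each residue is represented by a single 3D point (hypothetically, one
    representative atom). The result is symmetric, and with positive edges
    the diagonal (distance 0) is always bin 0.
    """
    return [[distance_bin(math.dist(a, b), edges) for b in coords] for a in coords]


# Toy structure: three residues, hypothetical bin edges at 4, 8 and 12 angstroms.
labels = distogram_labels([(0, 0, 0), (3, 4, 0), (6, 8, 0)], edges=[4.0, 8.0, 12.0])
for row in labels:
    print(row)
```

A network in this setting would output a probability for each bin of each pair, instead of the single hard label computed here from known coordinates.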
These probabilities are then combined into a score that estimates how accurate a proposed protein structure is.
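The article does not spell out how the probabilities are combined. One standard approach, assumed here, is to sum the negative log-probabilities that the predicted distributions assign to each pair’s distance in a proposed structure. Structures that agree with the predictions then score lower. The `predicted` dictionary stands in for a network’s output, and its values and the bin edges are made up.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]
# Hypothetical input: predicted probability of each distance bin for residue pair (i, j).
PairDistributions = Dict[Tuple[int, int], Sequence[float]]


def distance_bin(d: float, edges: Sequence[float]) -> int:
    """Index of the bin (upper bounds in `edges`, plus an overflow bin) containing d."""
    for i, upper in enumerate(edges):
        if d < upper:
            return i
    return len(edges)


def structure_score(coords: Sequence[Coord], predicted: PairDistributions,
                    edges: Sequence[float], eps: float = 1e-9) -> float:
    """Negative log-likelihood of a proposed structure's pair distances.

    Lower is better: the proposed distances fall in bins the predictor
    considered likely. `eps` avoids log(0) for bins given zero probability.
    """
    total = 0.0
    for (i, j), probs in predicted.items():
        k = distance_bin(math.dist(coords[i], coords[j]), edges)
        total -= math.log(probs[k] + eps)
    return total


predicted = {(0, 1): [0.1, 0.8, 0.1]}  # toy predictor favours a 4-8 angstrom separation
edges = [4.0, 8.0]
print(structure_score([(0, 0, 0), (5, 0, 0)], predicted, edges))  # distance 5: likely bin
print(structure_score([(0, 0, 0), (2, 0, 0)], predicted, edges))  # distance 2: unlikely bin
```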
The researchers also trained a separate network that uses all the pairwise distances in aggregate to estimate how close a proposed structure is to the correct answer.
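The separate network learns its own estimate of closeness, and the article does not say which measure it targets. As a stand-in, distance RMSD (dRMSD) is one simple way to compare a proposed structure with a reference using all pairwise distances at once. This is an assumed illustration, not the quantity DeepMind’s network predicts.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def drmsd(proposed: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Distance RMSD between two structures with the same residues.

    Compares every pairwise distance in `proposed` with the matching distance
    in `reference`, so no superposition step is needed: the result is unchanged
    by rotating or translating either structure.
    """
    if len(proposed) != len(reference) or len(proposed) < 2:
        raise ValueError("structures must have the same length (at least 2 residues)")
    squared = [
        (math.dist(proposed[i], proposed[j]) - math.dist(reference[i], reference[j])) ** 2
        for i, j in combinations(range(len(proposed)), 2)
    ]
    return math.sqrt(sum(squared) / len(squared))


reference = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (6.0, 8.0, 1.0)]
shifted = [(x + 10.0, y, z) for x, y, z in reference]
print(drmsd(shifted, reference))  # 0.0: a rigid translation changes no distances
```

Because it works only with internal distances, a measure like this needs no alignment step, which fits the idea of judging a structure from its distances in aggregate.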
These scoring functions are then used to search the space of possible structures for ones that match the predictions. The first method repeatedly replaces pieces of a candidate structure with new protein fragments. A generative neural network was trained to invent these fragments, steadily improving the score of the proposed structure.
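The fragment-replacement loop can be sketched as follows. This version is a simplification under stated assumptions. The chain is reduced to a list of per-residue angles. The hypothetical `sample_fragment` callable stands in for the generative network. The toy score is a made-up distance to a target angle profile rather than the learned scores. Only improvements are accepted.

```python
import random
from typing import Callable, List


def fragment_search(
    angles: List[float],
    score: Callable[[List[float]], float],
    sample_fragment: Callable[[int], List[float]],
    fragment_len: int = 3,
    steps: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Greedy fragment replacement over a chain described by per-residue angles.

    Each step splices a freshly sampled fragment into a random window and keeps
    the change only if the score (lower is better) improves.
    """
    if not 0 < fragment_len <= len(angles):
        raise ValueError("fragment_len must be between 1 and the chain length")
    rng = random.Random(seed)
    best, best_score = list(angles), score(angles)
    for _ in range(steps):
        start = rng.randrange(len(best) - fragment_len + 1)
        candidate = best[:start] + sample_fragment(fragment_len) + best[start + fragment_len:]
        candidate_score = score(candidate)
        if candidate_score < best_score:
            best, best_score = candidate, candidate_score
    return best


# Toy usage: random fragments stand in for the generative network.
target = [10.0 * i for i in range(12)]  # hypothetical target angle profile
fragment_rng = random.Random(1)


def toy_score(a: List[float]) -> float:
    """Made-up score: squared distance from the target profile."""
    return sum((x - t) ** 2 for x, t in zip(a, target))


def random_fragment(n: int) -> List[float]:
    """Hypothetical fragment generator: n uniformly random angles in degrees."""
    return [fragment_rng.uniform(-180.0, 180.0) for _ in range(n)]


initial = [0.0] * 12
result = fragment_search(initial, toy_score, random_fragment, steps=2000)
print(toy_score(initial), "->", round(toy_score(result), 1))
```

Accepting only improvements keeps the sketch short. Practical fragment-assembly methods often use Monte Carlo acceptance instead, so the search can escape local minima.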
The second method optimises the scores directly with gradient descent. It works on entire protein chains rather than on fragments that must be folded separately and then assembled, so it avoids much of the complexity of the fragment-based approach.
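The sketch below illustrates the idea of optimising a whole chain at once with gradient descent. It is a deliberate simplification, not DeepMind’s method. It swaps the learned scores for a plain quadratic penalty between current and target pairwise distances, and it updates the raw 3D coordinates of every residue in each step. The function names, learning rate and target distances are assumptions.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def energy(coords: Sequence[Sequence[float]], target: Matrix) -> float:
    """Sum over residue pairs of (current distance - target distance)^2."""
    n = len(coords)
    return sum(
        (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n) for j in range(i + 1, n)
    )


def fit_by_gradient_descent(coords: Sequence[Sequence[float]], target: Matrix,
                            lr: float = 0.01, steps: int = 2000) -> List[List[float]]:
    """Move all residues at once, following the analytic gradient of `energy`."""
    x = [list(p) for p in coords]
    n = len(x)
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                diff = [x[i][k] - x[j][k] for k in range(3)]
                d = math.sqrt(sum(c * c for c in diff)) or 1e-9  # avoid divide-by-zero
                # dE/dx_i = 2 (d - D_ij) (x_i - x_j) / d, and the negative for x_j
                coef = 2.0 * (d - target[i][j]) / d
                for k in range(3):
                    grad[i][k] += coef * diff[k]
                    grad[j][k] -= coef * diff[k]
        for i in range(n):
            for k in range(3):
                x[i][k] -= lr * grad[i][k]
    return x


# Hypothetical target: three residues whose distances form a 3-4-5 triangle.
target = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
initial = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (0.2, 1.0, 0.3)]
fitted = fit_by_gradient_descent(initial, target)
print(round(energy(initial, target), 3), "->", energy(fitted, target))
```

Every step updates the whole chain together, which mirrors the article’s contrast with folding fragments separately and assembling them afterwards.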
These methods build on standard machine learning techniques, but apply them to problems that had either been neglected or attempted without success.
Importance Of Predicting Protein Folds
Antibody proteins are Y-shaped, and their arms act like hooks that latch onto microorganisms that threaten the body. Collagen proteins, by contrast, are shaped like cords that transmit tension between cartilage, ligaments, bones, and skin. Cas9, the protein used in CRISPR gene editing, acts like a pair of scissors that cuts DNA at targeted sites, and ribosomes act like a programmed assembly line that builds proteins themselves.
Whether the goal is diagnosing fatal diseases or engineering bacteria that break down plastic, knowing a protein’s structure helps us tackle problems once thought impossible to solve.