Biotech Labs Bank on Generative AI to Design New Protein Structures

Generate Biomedicines and David Baker’s group have introduced Chroma and RoseTTAFold Diffusion, two new protein-design methods built on diffusion models, the technique behind text-to-image generators.

OpenAI’s DALL-E 2 has popularised text-to-image models that generate pictures from textual descriptions. Earlier this week, two biotech labs, Generate Biomedicines and David Baker’s group, applied the same kind of generative AI, diffusion models, to design new protein structures, with the eventual goal of better drugs.

Boston-based therapeutics company Generate Biomedicines announced a programme called Chroma which, according to the company, is the “DALL-E 2 of biology”. Separately, biologist David Baker’s team at the University of Washington has released RoseTTAFold Diffusion. The model can produce accurate designs for new proteins that can then be made in the lab.

Why does it matter? These generators can produce designs for proteins with particular characteristics, such as structure, size, or function, which opens the way to novel proteins that perform specific tasks on demand. Such proteins can then be used to create or identify drugs. Proteins regulate the basic processes that keep living beings healthy; when we fall sick, for example, proteins help us recover. The aim of AI-driven protein design is to help biologists extend the list of ingredients beyond natural proteins and make new medications on demand.

Although computational protein design is not new, earlier approaches have been slow and have struggled with large, complicated proteins, which are important for treating difficult diseases. Chroma and RoseTTAFold Diffusion are described as the first full-fledged programmes that can produce precise designs for a wide variety of proteins.

How did they do it? Diffusion models are trained by adding noise to examples and learning to remove it again. In Chroma, the noise is introduced by unravelling the chains of amino acids that make up proteins; given a random clump of these chains, Chroma then tries to assemble them into a protein. RoseTTAFold Diffusion relies on a second neural network, trained to predict protein structure, for information about how the parts of a protein fit together, and uses that information to guide the whole generative process.
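The noising half of a diffusion model can be illustrated with a toy example. The sketch below is not Chroma's or RoseTTAFold Diffusion's actual method: it applies a generic Gaussian forward process to a made-up chain of 3D points standing in for a protein backbone. The helper names, the helix used as a backbone, and the noise schedule values are all assumptions chosen for illustration. A real generator would additionally train a neural network to reverse these steps, which is not shown here.

```python
import math
import random


def make_toy_backbone(n_residues=20):
    """Hypothetical stand-in for a protein backbone: points along a helix."""
    return [
        (math.cos(0.6 * i), math.sin(0.6 * i), 0.15 * i)
        for i in range(n_residues)
    ]


def noise_schedule(n_steps=100, beta_start=1e-4, beta_end=0.05):
    """Linear beta schedule (assumed values); returns cumulative alpha_bar[t]."""
    alpha_bars, running = [], 1.0
    for t in range(n_steps):
        beta = beta_start + (beta_end - beta_start) * t / (n_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(coords, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    return [
        tuple(keep * c + spread * rng.gauss(0.0, 1.0) for c in point)
        for point in coords
    ]


def rms_distance(a, b):
    """Root-mean-square distance between matching points of two chains."""
    sq = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(sq / len(a))


rng = random.Random(0)
backbone = make_toy_backbone()
alpha_bars = noise_schedule()

# The later the step, the less of the original shape survives.
for t in (0, 49, 99):
    noisy = forward_noise(backbone, alpha_bars[t], rng)
    print(f"step {t + 1:3d}: alpha_bar={alpha_bars[t]:.3f}, "
          f"RMS displacement={rms_distance(backbone, noisy):.2f}")
```

Generation runs this process in reverse: starting from pure noise, the trained model removes a little noise at each step until a plausible structure remains.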

Baker’s group and Generate Biomedicines have both created proteins with different degrees of symmetry, such as circular, triangular, or hexagonal proteins. Generate Biomedicines went a step further and designed proteins in the shapes of the 26 letters of the Latin alphabet and the numerals 0 to 10. Both groups can also generate new proteins and match them to pre-existing structures.

To test whether Chroma’s designs could actually be made, Generate Biomedicines took the sequences for some of its designs, that is, the amino acid strings that make up each protein, and ran them through another AI programme. Of these, 55% were predicted to fold into the structure generated by Chroma, which suggests that the designs describe viable proteins. Baker’s team, for its part, produced some of RoseTTAFold Diffusion’s designs in the lab. Among them was a novel protein that binds to parathyroid hormone, which regulates blood calcium levels.
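The arithmetic behind a figure like this can be sketched in a few lines. The article does not say how a match between the designed and the predicted structure was scored, so the example below assumes a similarity score between 0 and 1 per design and a hypothetical pass threshold of 0.5; the scores themselves are made up for illustration and are not Generate Biomedicines' data.

```python
def refold_success_rate(scores, threshold=0.5):
    """Fraction of designs whose predicted structure matches the design.

    `scores` holds one assumed similarity value per design
    (0 = unrelated, 1 = identical); `threshold` is a hypothetical cutoff.
    """
    if not scores:
        raise ValueError("need at least one design")
    passed = sum(1 for s in scores if s >= threshold)
    return passed / len(scores)


# Made-up scores for ten imaginary designs.
example_scores = [0.91, 0.34, 0.72, 0.48, 0.66, 0.12, 0.58, 0.80, 0.41, 0.29]
rate = refold_success_rate(example_scores)
print(f"{rate:.0%} of designs are predicted to fold as intended")
```

A check of this kind only shows agreement between two models; whether a protein really folds as designed still has to be confirmed in the lab.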

In 2022, Chinese biotech company Helixon released OmegaFold, which joined DeepMind’s AlphaFold, RoseTTAFold, and Meta AI’s ESMFold. The question remains why Baker’s group chose to build on RoseTTAFold rather than one of the other open-source structure-prediction models that report better accuracy.

Earlier this year, Bengaluru-based algorithmic biologist Manoj Gopalakrishnan built Tapestry, a single-round quantitative method for large-scale molecular testing that offers significant time and cost savings over conventional RT-PCR tests.

It will be interesting to see how this evolves as life sciences and biotech companies combine protein-prediction models with the techniques behind image-generation tools such as DALL-E 2 to design new protein structures and, in turn, better drugs and medical solutions.

In an interaction with AIM, Junaid Bajwa, chief medical scientist at Microsoft Research, described the journey as running from the initial discovery, to translating it into real molecules, to taking those molecules into real-world use cases.

While big tech companies such as Meta and Google-backed DeepMind are focusing on developing protein-prediction models, Microsoft seems more focused on implementation: it has partnered with Novartis, Novo Nordisk, and others to apply scientific research advances to real-world problems, with an emphasis on impact.

Shritama Saha
Shritama is a technology journalist who is keen to learn about AI and analytics. A graduate in mass communication, she is passionate about exploring the influence of data science on fashion, drug development, films, and art.
