In July 2022, the Meta AI protein team released ESMFold, a protein structure prediction model built on its Evolutionary Scale Modelling (ESM) family of protein language models. The underlying language model scales up to 15 billion parameters, making it one of the largest protein language models evaluated to date.
Meta AI has made the code for ESMFold available online.
The team claims that ESMFold has been trained to predict full atomic protein structure directly from language model representations of a single sequence.
We have trained ESMFold to predict full atomic protein structure directly from language model representations of a single sequence. Accuracy is competitive with AlphaFold on most proteins with order of magnitude faster inference. By @MetaAI Protein Team.
— Alex Rives (@alexrives) July 21, 2022
https://t.co/APVoaawyOb pic.twitter.com/f6DvSfjuOX
The team has released several models over the years, and the latest has been made available to the public. According to Meta AI, ESMFold's accuracy is similar to that of AlphaFold 2 and RoseTTAFold, but its inference is an order of magnitude faster, which makes it practical to explore the structural space of metagenomic proteins.
The team said that improvements in language modelling perplexity and in the learning of structure continue as the models scale up to 15 billion parameters. They added that the latest model, ESM-2, at 150 million parameters outperforms the older ESM-1b at 650 million parameters.
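For readers unfamiliar with the term, perplexity measures how surprised a language model is by the true tokens: lower is better, and a model that guesses uniformly over the 20 standard amino acids scores exactly 20. The snippet below is a minimal sketch of the generic definition, not the team's evaluation code; the function name and the example probabilities are made up for illustration.

```python
import math


def perplexity(token_probs):
    """Exponentiated average negative log-likelihood of the true tokens.

    `token_probs` holds the probability the model assigned to each correct
    residue (illustrative input, not real model output).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# A model guessing uniformly over 20 amino acids has perplexity 20.
uniform = perplexity([1 / 20] * 8)

# A more confident (hypothetical) model scores much lower.
confident = perplexity([0.5, 0.25, 0.5, 0.25])
```

In this toy case the confident model's perplexity is about 2.83, meaning it is roughly as uncertain as a fair choice among three residues rather than twenty.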
ESMFold—How it works
The Meta AI team said that as ESMFold processes a protein sequence, a picture of the protein's structure materialises in its internal states, which enables atomic-resolution prediction of the three-dimensional structure, even though the language model was trained only on sequences.
Amazing. We did see this also come up in ProGen – Large language models captured 3d structure through its attention. https://t.co/0oK7coEMV6
— Richard Socher (@RichardSocher) July 24, 2022
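In practice, the single-sequence workflow described above comes down to passing one amino acid string to the model and getting a structure back. The following is a rough sketch, assuming the open-source `fair-esm` Python package with its ESMFold extras is installed and that its `esm.pretrained.esmfold_v1()` loader and `infer_pdb()` method behave as described in that project's README; details may differ between versions. The `clean_sequence` and `fold_to_pdb` helpers are hypothetical names introduced here for illustration.

```python
# Sketch only: assumes the `fair-esm` package (with ESMFold extras) and
# PyTorch are installed. The helper names below are illustrative.

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def clean_sequence(raw: str) -> str:
    """Strip whitespace, upper-case, and reject non-standard residue letters."""
    seq = "".join(raw.split()).upper()
    bad = sorted(set(seq) - VALID_RESIDUES)
    if not seq or bad:
        raise ValueError(f"empty sequence or unsupported residues: {bad}")
    return seq


def fold_to_pdb(raw_sequence: str) -> str:
    """Predict a structure for one sequence and return it as PDB-format text."""
    # Imported lazily so clean_sequence works without the heavy dependencies.
    import torch
    import esm

    model = esm.pretrained.esmfold_v1().eval()
    if torch.cuda.is_available():
        model = model.cuda()  # a GPU is optional but much faster
    with torch.no_grad():
        return model.infer_pdb(clean_sequence(raw_sequence))
```

Calling `fold_to_pdb("MKTVRQ...")` with a real sequence would download the model weights on first use, so it needs a machine with enough memory. Note that no multiple sequence alignment is built at any point, which is where the speed advantage over alignment-based pipelines is said to come from.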
Further, the team said that there are billions of protein sequences with unknown structures and functions—many from metagenomic sequencing. The latest model makes it possible to map this structural space in practical timescales. “We were able to fold a random sample of 1 million metagenomic sequences in a few hours,” claimed Meta AI researchers.
The team believes that ESMFold can help researchers understand regions of protein space that are distant from existing knowledge.