In July 2022, the Meta AI protein team released Evolutionary Scale Modelling, or ESM, for protein structure prediction. The model has 15 billion parameters, making it one of the largest language models of proteins to date.
The code for ESMFold is available on GitHub.
The team claims that ESMFold has been trained to predict full atomic protein structure directly from language model representations of a single sequence.
Over the years, the team has launched several models, with its latest now released to the public. According to Meta AI, ESMFold matches the accuracy of AlphaFold2 and RoseTTAFold, but its inference is considerably faster, enabling exploration of the structural space of metagenomic proteins.
The team said that improvements in language modelling perplexity and in the learning of structure continue through 15 billion parameters. They also noted that ESM2 at 150 million parameters outperforms the older ESM1b model at 650 million parameters.
ESMFold—How it works
The Meta AI team said that as ESMFold processes a protein sequence, an image of the protein’s structure materialises in its internal states, which then enables atomic-resolution prediction of the three-dimensional structure—even though the language model was trained only on sequences.
Further, the team said that there are billions of protein sequences with unknown structure and function—many from metagenomic sequencing—and that the latest model makes it possible to map this structural space on practical timescales. “We were able to fold a random sample of 1 million metagenomic sequences in a few hours,” claimed Meta AI researchers.
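As a rough sketch of what folding a single sequence looks like in practice: Meta AI’s published fair-esm package exposes ESMFold via `esm.pretrained.esmfold_v1()`, whose `infer_pdb` method returns the predicted structure as a PDB-format string. The validation helper below is our own illustrative addition, and the heavy import is deferred because the model weights are a multi-gigabyte download.

```python
# Sketch of folding one protein sequence with ESMFold via the fair-esm
# package (pip install "fair-esm[esmfold]"); treat this as an illustrative
# sketch of the published API, not a tested pipeline.

# The 20 canonical amino-acid one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid_protein_sequence(seq: str) -> bool:
    """Return True if the sequence uses only canonical amino-acid codes."""
    return len(seq) > 0 and set(seq.upper()) <= AMINO_ACIDS


def fold_to_pdb(seq: str) -> str:
    """Predict a 3D structure for one sequence, returned as PDB text.

    Requires the fair-esm package and, for practical speed, a GPU.
    """
    if not is_valid_protein_sequence(seq):
        raise ValueError("sequence contains non-canonical residues")
    import esm  # deferred: triggers a multi-gigabyte weight download
    model = esm.pretrained.esmfold_v1().eval()
    return model.infer_pdb(seq)


# Usage (downloads the model on first run):
#   pdb_text = fold_to_pdb("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
#   with open("prediction.pdb", "w") as fh:
#       fh.write(pdb_text)
```

Deferring the `esm` import keeps the lightweight sequence check usable on its own, which matters when screening large batches of metagenomic sequences before committing GPU time.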
The team believes that ESMFold can help researchers understand regions of protein space that are distant from existing knowledge.