
The Enformer vs the Basenji – The AI algorithms for gene expression predictions

Enformer, a genetic research tool based on Transformers, advances genetic research by predicting how DNA sequences influence gene expression.

DeepMind, working with Alphabet's Calico, introduced a neural network architecture called Enformer that substantially improves the accuracy of predicting gene expression from DNA sequence.

In the paper “Effective gene expression prediction from sequence by integrating long-range interactions”, published in Nature Methods, DeepMind reported that Enformer is more accurate than its predecessor, Basenji2.

Basenji2 and its limitations

Gene expression prediction models have typically used convolutional neural networks as their basic building blocks. However, the limited receptive field of convolutions has constrained their ability to model the effects of distal enhancers on gene expression.

DeepMind previously relied on Basenji2, built on TensorFlow, which offers benefits such as distributed computing and a large, active developer community, and which is designed to predict quantitative signals using regression loss functions rather than binary signals using classification loss functions.
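The regression-versus-classification distinction can be made concrete with the two loss functions involved. Below is a minimal NumPy sketch, assuming a Poisson-style negative log-likelihood for quantitative read-count signals (in line with Basenji's regression setup) versus binary cross-entropy for binarised peak calls; all the numbers are illustrative, not real training data:

```python
import numpy as np

def poisson_nll(pred_rate, target_counts):
    # Poisson negative log-likelihood (dropping the constant log(k!) term):
    # a regression loss suited to quantitative signals such as read counts.
    return float(np.mean(pred_rate - target_counts * np.log(pred_rate)))

def binary_cross_entropy(pred_prob, target):
    # Classification loss for binarised peak / no-peak labels.
    return float(-np.mean(target * np.log(pred_prob)
                          + (1 - target) * np.log(1 - pred_prob)))

counts = np.array([0.0, 3.0, 12.0, 1.0])  # quantitative coverage signal
rates = np.array([0.5, 2.5, 10.0, 1.5])   # model's predicted rates
regression_loss = poisson_nll(rates, counts)

labels = np.array([0.0, 1.0, 1.0, 0.0])   # the same signal, binarised
probs = np.array([0.1, 0.8, 0.9, 0.2])
classification_loss = binary_cross_entropy(probs, labels)
print(regression_loss, classification_loss)
```

A regression loss of this kind keeps the magnitude of the signal (3 reads versus 12 reads), whereas the classification loss only sees whether a peak was called.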

Notably, Basenji2 can predict the regulatory activity of DNA sequences roughly 40,000 base pairs long at a time.

Enformer’s advances

Enformer, on the other hand, relies on Transformers, an architecture from Google that is common in natural language processing, whose self-attention mechanisms allow the model to integrate much more DNA context. Because Transformers can read long text passages, DeepMind adapted them to read DNA sequences of vastly extended length.
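To see why self-attention helps with distal interactions: every position attends to every other position in a single layer, regardless of distance. Here is a minimal sketch of scaled dot-product self-attention over a one-hot-encoded DNA sequence, assuming randomly initialised projection matrices; it illustrates the mechanism only, not Enformer's actual multi-head implementation:

```python
import numpy as np

def one_hot_dna(seq):
    # Encode a DNA string as a (length, 4) matrix over A, C, G, T.
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, idx[base]] = 1.0
    return x

def self_attention(x, w_q, w_k, w_v):
    # Scaled dot-product self-attention: every position can attend to every
    # other position, so distal bases interact within a single layer.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v

rng = np.random.default_rng(0)
x = one_hot_dna("ACGTACGTGG")
d = 8  # arbitrary embedding width for this sketch
out = self_attention(x,
                     rng.normal(size=(4, d)),
                     rng.normal(size=(4, d)),
                     rng.normal(size=(4, d)))
print(out.shape)  # one d-dimensional vector per input position
```

In a convolution, two bases 40,000 positions apart only interact after many stacked layers; in the attention step above they interact through a single `weights` matrix, which is the property that lets Enformer widen its effective context.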

Enformer outperformed the best team in the Critical Assessment of Genome Interpretation (CAGI5) challenge for noncoding variant interpretation, despite receiving no additional training. Furthermore, Enformer learned to predict promoter-enhancer interactions directly from DNA sequences, competing with methods that take direct experimental data as input.

For training, DeepMind used Sonnet, its library for constructing neural networks for many different purposes; the model architecture is defined in enformer.py.

DeepMind pre-computed variant effect scores for all frequent variants (MAF > 0.5% in any population of the 1000 Genomes Project) and stored them in per-chromosome HDF5 files for the HG19 reference genome. Additionally, they provide the top 20 principal components of the variant-effect scores per chromosome in a tabix-indexed TSV file (also HG19). These files have the following columns:

  • #CHROM – chromosome (chr1)
  • POS – variant position (1-based)
  • ID – dbSNP identifier
  • REF – reference allele (e.g. A)
  • ALT – alternate allele (e.g. T)
  • PC{i} – i-th principal component of the variant effect prediction.
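A short sketch of reading that column layout with Python's standard csv module; the row shown is a made-up example in the described format (hypothetical ID, position, and PC values), not a record from the real score files, which carry 20 PC columns per variant:

```python
import csv
import io

# Hypothetical tab-separated sample in the column layout described above;
# values are illustrative only.
sample = (
    "#CHROM\tPOS\tID\tREF\tALT\tPC1\tPC2\tPC3\n"
    "chr1\t12345\trs000001\tA\tT\t0.12\t-0.03\t0.55\n"
)

reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
for row in reader:
    # Collect every principal-component column into a float vector.
    pcs = [float(row[k]) for k in row if k.startswith("PC")]
    print(row["#CHROM"], int(row["POS"]), row["ID"],
          row["REF"], "->", row["ALT"], pcs)
```

For the real per-chromosome files, the same loop would apply to each line of the decompressed TSV; the tabix index additionally allows fetching only the rows overlapping a genomic region of interest.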

Hopefully, these advances will enable better mapping of the growing number of human disease associations to cell-type-specific gene regulatory mechanisms and provide a framework for understanding how cis-regulatory evolution works.

Sohini Das

Sohini graduated from the University of Kalyani with a master's degree in nanosciences and nanotechnology. She hopes to become a tech journalist one day. Her work focuses on digital transformation, geopolitics, and emerging technologies.
