The Enformer vs the Basenji – The AI algorithms for gene expression predictions

Enformer, a Transformer-based neural network, advances genetic research by predicting how DNA sequence influences gene expression.

DeepMind, together with the Alphabet company Calico, introduced a neural network architecture called Enformer that substantially improves the accuracy of predicting gene expression from DNA sequence.

In the paper “Effective gene expression prediction from sequence by integrating long-range interactions”, published in Nature Methods, the authors report that Enformer predicts gene expression more accurately than the earlier Basenji2 model.

Basenji2 and limitations

Models that predict gene expression have typically been built from convolutional neural networks. These networks, however, are limited in how well they can model the effects of distal enhancers on gene expression.

DeepMind's early work therefore relied on Basenji2. Basenji2 is built on TensorFlow, which offers benefits such as distributed computing and a large, active developer community. It is designed to predict quantitative signals using regression loss functions, rather than binary signals using classification loss functions.

A key strength of Basenji2 is that it could predict regulatory activity from DNA sequences of 40,000 base pairs at a time.
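
The difference between a regression loss on quantitative signals and a classification loss on binary signals can be illustrated with a small sketch. It assumes a Poisson negative log-likelihood as the regression loss, which is a common choice for count data; the article does not say which loss is used, and the bin counts, predictions and the peak threshold below are hypothetical values chosen for illustration.

```python
import math

EPS = 1e-8  # guards against log(0)


def poisson_nll(predicted, observed):
    """Mean Poisson negative log-likelihood, omitting the constant log(observed!) term."""
    terms = [p - o * math.log(p + EPS) for p, o in zip(predicted, observed)]
    return sum(terms) / len(terms)


def binary_cross_entropy(probabilities, labels):
    """Mean binary cross-entropy for 0/1 labels."""
    terms = [
        -(y * math.log(q + EPS) + (1 - y) * math.log(1 - q + EPS))
        for q, y in zip(probabilities, labels)
    ]
    return sum(terms) / len(terms)


# Hypothetical read counts in four genomic bins.
observed_counts = [0.0, 3.0, 12.0, 1.0]

# Regression view: the loss rewards predictions that track the actual magnitudes.
good_prediction = [0.2, 2.8, 11.5, 1.1]
flat_prediction = [3.0, 3.0, 3.0, 3.0]
loss_good = poisson_nll(good_prediction, observed_counts)
loss_flat = poisson_nll(flat_prediction, observed_counts)

# Classification view: counts are first collapsed to "peak" / "no peak"
# (hypothetical threshold of 2), so bins with 3 and 12 reads look identical.
peak_labels = [1 if count >= 2 else 0 for count in observed_counts]
loss_binary = binary_cross_entropy([0.1, 0.9, 0.9, 0.1], peak_labels)

print(f"Poisson NLL, good prediction: {loss_good:.3f}")
print(f"Poisson NLL, flat prediction: {loss_flat:.3f}")
print(f"Peak labels after binarising: {peak_labels}")
```

In this toy setting the regression loss separates the prediction that follows the signal from the flat one, while the binarised labels no longer distinguish a bin with 3 reads from one with 12.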

Enformer’s advances

Enformer, on the other hand, relies on Transformers, an architecture from Google that is common in natural language processing. Its self-attention mechanism can integrate much more DNA context. Just as Transformers can read long text passages, DeepMind adapted them to read vastly longer DNA sequences.
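
To make the idea of self-attention over a DNA sequence concrete, here is a minimal sketch of single-head scaled dot-product attention on a one-hot encoded sequence. It is not the Enformer implementation: the queries, keys and values are simply the one-hot vectors themselves (identity projections, an assumption made for brevity), whereas a real model learns these projections and adds further components that are not shown here. The example sequence is arbitrary.

```python
import math

BASES = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as a list of 4-dimensional one-hot vectors."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in sequence]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention with identity projections."""
    dim = len(x[0])
    weights = []
    for query in x:
        # Every position is scored against every other position,
        # no matter how far apart they are in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in x
        ]
        weights.append(softmax(scores))
    # Each output is the attention-weighted average of all value vectors.
    outputs = [
        [sum(w * value[j] for w, value in zip(row, x)) for j in range(dim)]
        for row in weights
    ]
    return outputs, weights


embedded = one_hot("ACGTAC")
outputs, weights = self_attention(embedded)

# Position 0 ("A") attends to the distant "A" at position 4 as strongly as
# to itself, and more strongly than to its immediate neighbour "C".
print([round(w, 3) for w in weights[0]])
```

The point of the sketch is that attention weights depend on content rather than distance, which is why this mechanism can relate positions that are far apart in the input.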

Enformer outperformed the best team in the Critical Assessment of Genome Interpretation (CAGI5) challenge for noncoding variant interpretation without any additional training. Furthermore, Enformer learned to predict promoter-enhancer interactions directly from DNA sequence, competing with methods that take direct experimental data as input.

For the implementation, DeepMind used Sonnet, its library for constructing neural networks. The model is defined in enformer.py.

DeepMind pre-computed variant effect scores for all frequent variants in the 1000 Genomes Project (minor allele frequency, MAF > 0.5%, in any population) and stored them in one HDF5 file per chromosome for the HG19 reference genome. Additionally, they provide the top 20 principal components of the variant effect scores in a tabix-indexed TSV file per chromosome (HG19 reference genome). These files have the following columns:

  • #CHROM – chromosome (e.g. chr1)
  • POS – variant position (1-based)
  • ID – dbSNP identifier
  • REF – reference allele (e.g. A)
  • ALT – alternate allele (e.g. T)
  • PC{i} – i-th principal component of the variant effect prediction.
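
As a rough illustration of how a file with these columns could be read, the sketch below parses a small in-memory TSV using only the Python standard library. The two data rows, the variant identifiers and the column names PC0 and PC1 are hypothetical placeholders; the real files carry 20 principal components, and the exact naming of the PC columns should be checked against the files themselves. A tabix-indexed file is also bgzip-compressed, so it would need a gzip-aware reader rather than `io.StringIO`.

```python
import csv
import io

# Hypothetical sample in the column layout described above (only two PCs shown).
SAMPLE_TSV = (
    "#CHROM\tPOS\tID\tREF\tALT\tPC0\tPC1\n"
    "chr1\t10177\trs_example_1\tA\tT\t0.12\t-0.03\n"
    "chr1\t10352\trs_example_2\tG\tC\t-0.40\t0.25\n"
)


def read_variant_pcs(handle):
    """Parse variant rows into dictionaries with typed fields."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Collect every principal-component column, however many the file has.
    pc_columns = [name for name in reader.fieldnames if name.startswith("PC")]
    variants = []
    for row in reader:
        variants.append(
            {
                "chrom": row["#CHROM"],
                "pos": int(row["POS"]),  # 1-based position
                "id": row["ID"],
                "ref": row["REF"],
                "alt": row["ALT"],
                "pcs": [float(row[name]) for name in pc_columns],
            }
        )
    return variants


variants = read_variant_pcs(io.StringIO(SAMPLE_TSV))

# Example query: the variant with the largest absolute score on the first PC.
largest = max(variants, key=lambda v: abs(v["pcs"][0]))
print(largest["id"], largest["pcs"])
```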

Hopefully, these advances will enable better mapping of the growing number of human disease associations to cell-type-specific gene regulatory mechanisms and provide a framework for understanding how cis-regulatory evolution works.

Sohini Das

Sohini graduated from the University of Kalyani with a master's degree in nanosciences and nanotechnology. She hopes to become a tech journalist one day. Her work focuses on digital transformation, geopolitics, and emerging technologies.