
How To Build A BERT Classifier Model With TensorFlow 2.0

BERT (Bidirectional Encoder Representations from Transformers) is one of the most popular models in NLP, known for producing state-of-the-art results on a variety of language tasks. Built on the transformer architecture, which itself grew out of sequence-to-sequence models, it has outperformed many earlier NLP models.

What Is The Big Deal About BERT?

The state-of-the-art results that it produces on a variety of language-specific tasks are enough to show that it is indeed a big deal. The results come from its underlying architecture, which builds on breakthrough techniques such as seq2seq (sequence-to-sequence) models and transformers. A seq2seq model is a network that converts a given sequence of words into a different sequence and can relate the words that seem most important.



An encoder-decoder network built from LSTMs is a good example of a seq2seq model. The transformer architecture also transforms one sequence into another, but without depending on recurrent networks such as LSTMs or GRUs. Being bidirectional, BERT reads the context on both sides of a word and predicts accordingly.
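To make the "no recurrence" point concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation of a transformer. It is a simplification under stated assumptions: the queries, keys and values are all the raw input vectors (a real transformer applies learned projections and multiple heads), and the function names are illustrative only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    Assumption: no learned weight matrices, to keep the sketch short.
    """
    d = len(x[0])
    output = []
    for query in x:
        # Score the query against every position, left and right alike.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x]
        weights = softmax(scores)
        # Each output vector is a weighted average of all input vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return output

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(sequence)
```

Every position is processed with the same loop body and sees the whole sequence at once, which is why no recurrent state is needed and why context from both directions is available.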

Inspired By BERT

The success of BERT has not only made it part of Google Search but has also inspired and paved the way for many new and better models. Given below are some of the popular NLP models and algorithms that were inspired by BERT:

ALBERT: A Lite BERT (ALBERT) incorporates techniques such as factorised embedding parameterisation and cross-layer parameter sharing to reduce the number of parameters, which helps in scaling the pre-trained models.

RoBERTa: Robustly optimised BERT is an optimised method for pretraining NLP systems that builds on BERT’s language-masking strategy. Its authors report that it surpasses both the BERT-large and XLNet-large models in performance.

ViLBERT: Vision-and-Language BERT is built to learn task-agnostic joint representations of image content and natural language. The model includes two parallel BERT-style models that operate over image regions and text segments respectively.

MT-DNN: Multi-Task Deep Neural Network uses Google’s BERT to achieve new state-of-the-art results. The model combines multi-task learning with language model pre-training.

SenseBERT: The method uses self-supervision, a form of unsupervised learning in which, as the name implies, the model derives its own training signal from unlabelled text.
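As a rough illustration of the factorised embedding parameterisation mentioned for ALBERT above, the sketch below compares the embedding parameter count of a single vocabulary-by-hidden matrix with that of two smaller matrices. The sizes used (a 30,000-token vocabulary, hidden size 768, embedding size 128) are assumptions chosen for the arithmetic, and the function name is hypothetical.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Count embedding parameters, with or without factorisation."""
    if embedding_size is None:
        # Single matrix: vocab_size x hidden_size.
        return vocab_size * hidden_size
    # Factorised: vocab_size x embedding_size, then embedding_size x hidden_size.
    return vocab_size * embedding_size + embedding_size * hidden_size

# Illustrative sizes only (assumptions, not taken from the article).
full = embedding_params(30000, 768)
factorised = embedding_params(30000, 768, embedding_size=128)
print(full, factorised)
```

With these assumed sizes the factorised form needs roughly a sixth of the embedding parameters, which is the kind of reduction the technique is aiming for.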

In one of our previous articles, we learned how to solve a multi-class classification problem using BERT and achieve great results. We did this using TensorFlow 1.15.0. Today we will upgrade to TensorFlow 2.0 and build a BERT model using the Keras API for a simple classification problem.

We will use the bert-for-tf2 library which you can find here. The following example was inspired by Simple BERT using TensorFlow2.0.

Let’s Code!

Importing TensorFlow2.0

Let’s start by importing TensorFlow2.0. The following example is done using Google Colab.


try:
  # Colab-only magic command: selects TensorFlow 2.x
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow as tf

Installing Necessary Modules

To install the bert-for-tf2 module, type and execute the following command.

!pip install bert-for-tf2

We will also install a dependency module called sentencepiece by executing the following command:

!pip install sentencepiece

Importing Necessary Modules

import tensorflow_hub as hub

from tensorflow.keras.models import Model

from bert.tokenization.bert_tokenization import FullTokenizer

Fetching The BERT Model From TensorFlowHub

We will now fetch the actual BERT model from TensorFlow Hub.

bert_layer = hub.KerasLayer("",trainable=True)

Data Preparation

Before it reaches the model, a piece of text must be preprocessed into a form that BERT accepts: token IDs, an input mask and segment IDs, all padded to a fixed length. For detailed preprocessing, check out the Step By Step Guide To Implement Multi-Class Classification With BERT & Tensorflow.
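The sketch below shows, in plain Python, the shape of that preprocessing step. It is not the article's original code: the toy vocabulary, the whitespace tokenizer and the function names are hypothetical stand-ins, whereas a real run would use the FullTokenizer imported earlier together with the vocabulary shipped with the BERT model.

```python
# Toy vocabulary (assumption); real BERT uses a WordPiece vocabulary file.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "bert": 4, "is": 5, "powerful": 6}

def toy_tokenize(text):
    """Hypothetical whitespace tokenizer standing in for WordPiece."""
    return text.lower().split()

def encode_single(text, max_seq_length):
    """Return (input_ids, input_mask, segment_ids), each of length max_seq_length."""
    # Reserve two positions for the [CLS] and [SEP] markers.
    tokens = toy_tokenize(text)[: max_seq_length - 2]
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    ids = [TOY_VOCAB.get(token, TOY_VOCAB["[UNK]"]) for token in tokens]
    padding = max_seq_length - len(ids)
    input_ids = ids + [TOY_VOCAB["[PAD]"]] * padding
    # 1 marks a real token, 0 marks padding.
    input_mask = [1] * len(ids) + [0] * padding
    # A single-sentence input belongs entirely to segment 0.
    segment_ids = [0] * max_seq_length
    return input_ids, input_mask, segment_ids

ids, mask, segments = encode_single("BERT is very powerful", max_seq_length=8)
print(ids)
print(mask)
print(segments)
```

The word "very" is not in the toy vocabulary, so it maps to the [UNK] ID, while the last two positions are padding and are masked out.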

Let’s test whether the preprocessor is working properly.

We will use an example from MachineHack’s Predict The News Category Hackathon.


We can now build a Keras model for binary classification and train it using a training set. For more details on preparing the dataset for training and validation, check out the Step By Step Guide To Implement Multi-Class Classification With BERT & Tensorflow.
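For intuition about what such a binary classifier computes on top of BERT, here is a framework-free sketch of the classification head: a single sigmoid unit applied to BERT's pooled sentence vector, scored with binary cross-entropy. The vector, weights and function names are made up for illustration. In Keras, the same head would typically be a one-unit Dense layer with sigmoid activation trained with a binary cross-entropy loss.

```python
import math

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(pooled, weights, bias):
    """Probability of the positive class from a pooled sentence vector."""
    z = sum(p * w for p, w in zip(pooled, weights)) + bias
    return sigmoid(z)

def binary_cross_entropy(y_true, prob, eps=1e-7):
    """Loss for one example; prob is clipped to avoid log(0)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(y_true * math.log(prob) + (1 - y_true) * math.log(1.0 - prob))

# Hypothetical 4-dimensional pooled output and head parameters.
pooled_output = [0.2, -0.1, 0.4, 0.0]
head_weights = [0.5, 0.5, 0.5, 0.5]
prob = predict_proba(pooled_output, head_weights, bias=0.0)
loss = binary_cross_entropy(1, prob)
```

Training adjusts the head weights (and, with trainable=True, the BERT weights as well) to lower this loss over the training set.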


Amal Nair
A Computer Science Engineer turned Data Scientist who is passionate about AI and all related technologies. Contact:
