
How To Build A BERT Classifier Model With TensorFlow 2.0


BERT is one of the most popular models in the NLP spectrum, known for producing state-of-the-art results on a variety of language tasks. Built on top of transformers and sequence-to-sequence models, Bidirectional Encoder Representations from Transformers (BERT) is a very powerful NLP model that has outperformed many of its predecessors.

What Is The Big Deal About BERT?

The state-of-the-art results it produces across a variety of language tasks are enough to show that it is indeed a big deal. These results come from its underlying architecture, which builds on breakthrough techniques such as seq2seq (sequence-to-sequence) models and transformers. A seq2seq model is a network that converts a given sequence of words into a different sequence and learns to give more weight to the words that matter most.

An LSTM-based encoder-decoder is a classic example of a seq2seq model. The transformer architecture also transforms one sequence into another, but without depending on any recurrent networks such as LSTMs or GRUs. Being bidirectional, the model can use the context on both sides of a word in a written text and predict accordingly.

Inspired By BERT

The success of BERT has not only made it the power behind the top search engine known to mankind but has also inspired and paved the way for many new and better models. Given below are some of the popular NLP models and algorithms that were inspired by BERT:

ALBERT: A Lite BERT (ALBERT) incorporates parameter-reduction techniques such as factorised embedding parameterisation and cross-layer parameter sharing, which help in scaling up pre-trained models.

RoBERTa: Robustly Optimised BERT is an optimised method for pretraining NLP systems that builds on BERT’s language-masking strategy. The model is claimed to have surpassed both BERT-large and XLNet-large in performance.

ViLBERT: Vision-and-Language BERT is built to learn task-agnostic joint representations of image content and natural language. The model consists of two parallel BERT-style streams that operate over image regions and text segments respectively.

MT-DNN: The Multi-Task Deep Neural Network uses Google’s BERT to achieve new state-of-the-art results. The model combines multi-task learning with language-model pre-training.

SenseBERT: The method uses self-supervision, a learning technique in which, as the name implies, the model supervises itself; SenseBERT is pre-trained to predict word senses in addition to masked words.

In one of our previous articles, we learned how to solve a multi-class classification problem using BERT and achieved great results. We did this using TensorFlow 1.15.0. Today we will upgrade to TensorFlow 2.0 and build a BERT model using the Keras API for a simple classification problem.

We will use the bert-for-tf2 library, which you can find here. The following example was inspired by Simple BERT using TensorFlow 2.0.

Let’s Code!

Importing TensorFlow 2.0

Let’s start by importing TensorFlow 2.0. The following example is done using Google Colab.

# The %tensorflow_version magic only works inside a Colab notebook
try:
  %tensorflow_version 2.x  # gpu
except Exception:
  pass

import tensorflow as tf

Installing Necessary Modules

To install the bert-for-tf2 module, type and execute the following command.

!pip install bert-for-tf2

We will also install a dependency module called sentencepiece by executing the following command:

!pip install sentencepiece

Importing Necessary Modules

import tensorflow_hub as hub
from tensorflow.keras.models import Model
from bert.tokenization.bert_tokenization import FullTokenizer

Fetching The BERT Model From TensorFlow Hub

We will now fetch the actual BERT model from TensorFlow Hub:

bert_layer = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1", trainable=True)
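To tokenise text the same way the model was pre-trained, we can build a FullTokenizer from the vocabulary file that ships with the hub module. The snippet below is a minimal sketch following the common pattern for this hub layer; the variable names are our own.

# Build the tokenizer from the vocabulary bundled with the hub module
vocab_file = bert_layer.resolved_object.vocab_file.asset_path.numpy()  # path to the module's vocab.txt
do_lower_case = bert_layer.resolved_object.do_lower_case.numpy()       # True for the uncased model
tokenizer = FullTokenizer(vocab_file, do_lower_case)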

Data Preparation

The following is an example of data preprocessing for BERT. The code block transforms a piece of text into a form BERT can accept. For detailed preprocessing, check out the Step By Step Guide To Implement Multi-Class Classification With BERT & TensorFlow.
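A minimal sketch of such a preprocessing helper is given below. The function name, the 128-token maximum length and the single-segment assumption are illustrative choices; the general recipe is to tokenise the text, add the special [CLS] and [SEP] tokens, convert the tokens to ids, and pad the ids, mask and segment ids to a fixed length.

MAX_SEQ_LEN = 128  # illustrative maximum sequence length

def preprocess_text(text, tokenizer, max_seq_len=MAX_SEQ_LEN):
  # Tokenise and add the special tokens BERT expects
  tokens = ["[CLS]"] + tokenizer.tokenize(text)[:max_seq_len - 2] + ["[SEP]"]
  input_ids = tokenizer.convert_tokens_to_ids(tokens)
  # 1 for real tokens, 0 for padding
  input_mask = [1] * len(input_ids)
  # Pad everything up to the fixed length
  padding = [0] * (max_seq_len - len(input_ids))
  input_ids = input_ids + padding
  input_mask = input_mask + padding
  # A single sentence uses segment id 0 throughout
  segment_ids = [0] * max_seq_len
  return input_ids, input_mask, segment_ids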

Let’s test whether the preprocessor is working properly.

We will use an example from MachineHack’s Predict The News Category Hackathon.
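As a quick illustration (the headline below is made up and not an actual record from the dataset), the helper can be called on a single piece of text:

sample_text = "Sensex gains 300 points as IT stocks rally"  # illustrative headline, not from the dataset
input_ids, input_mask, segment_ids = preprocess_text(sample_text, tokenizer)
print(input_ids[:20])  # the first few token ids, starting with the id for [CLS]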

Output:

We can now build a Keras model for binary classification and train it using a training set; a sketch of such a model is given below. For more details on preparing the dataset for training and validation, check out the Step By Step Guide To Implement Multi-Class Classification With BERT & TensorFlow.
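The following is a minimal sketch of a binary classifier built on top of bert_layer; the layer names, sequence length, optimiser and loss are illustrative choices rather than a fixed configuration.

from tensorflow.keras.layers import Input, Dense

max_seq_len = 128

input_word_ids = Input(shape=(max_seq_len,), dtype=tf.int32, name="input_word_ids")
input_mask = Input(shape=(max_seq_len,), dtype=tf.int32, name="input_mask")
segment_ids = Input(shape=(max_seq_len,), dtype=tf.int32, name="segment_ids")

# The hub layer returns the pooled [CLS] representation and the per-token outputs
pooled_output, sequence_output = bert_layer([input_word_ids, input_mask, segment_ids])

# Binary classification head on the pooled output
output = Dense(1, activation="sigmoid")(pooled_output)

model = Model(inputs=[input_word_ids, input_mask, segment_ids], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()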

PS: The story was written using a keyboard.