
Transformers Simplified: A Hands-On Intro To Text Classification Using Simple Transformers 


In the past few years, we have seen tremendous improvements in the ability of machines to deal with natural language. Algorithms have broken state-of-the-art results one after another on a variety of language tasks, all thanks to transformers. In this article, we will discuss and implement transformers in the simplest way possible using a library called Simple Transformers.

The Seq2Seq Model

Before stepping into the transformers’ territory, let’s take a brief look at the Sequence-to-Sequence models.

The Sequence-to-Sequence (Seq2Seq) model converts a given sequence of text into another sequence, which need not be of the same length; machine translation is the most familiar example. But Seq2Seq is not limited to translation; in fact, it is quite effective at tasks that require text generation.

The model uses an encoder-decoder architecture and has been very successful in machine translation and question-answering tasks. It uses a stack of Long Short-Term Memory (LSTM) networks or Gated Recurrent Units (GRUs) in both the encoder and the decoder.

Here is a simple demonstration of the Seq2Seq model:

Image Source: A ten-minute introduction to sequence-to-sequence learning in Keras
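
To make the encoder-decoder idea concrete, here is a minimal Keras sketch in the spirit of the tutorial cited above (the vocabulary sizes and hidden dimension are illustrative, and the inference loop is omitted):

from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

num_encoder_tokens, num_decoder_tokens, latent_dim = 71, 93, 256  # example sizes

# Encoder: read the source sequence and keep only its final hidden and cell states
encoder_inputs = Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: generate the target sequence, conditioned on the encoder states
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_outputs = LSTM(latent_dim, return_sequences=True)(
    decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)

seq2seq_model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
seq2seq_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')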

One major drawback of the Seq2Seq model comes from the limitations of its underlying RNNs. Although LSTMs are meant to deal with long-term dependencies between word vectors, performance still drops as the distance between related words increases. The sequential nature of RNNs also restricts parallelization.

Transformer Architecture 

The transformer model introduces an architecture that is based solely on attention mechanisms and does not use any recurrent networks, yet it produces results superior in quality to Seq2Seq models. It addresses the long-term dependency problem of the Seq2Seq model, and because the architecture is parallelizable, training is considerably faster.

Image Source: Attention Is All You Need

Let’s take a look at some of the important features:

Encoder: The encoder is a stack of 6 identical layers, each consisting of a multi-head self-attention mechanism and a fully connected feed-forward network. Both the multi-head attention mechanism and the feed-forward network have a residual connection around them, followed by layer normalization.

Decoder: The decoder also consists of 6 identical layers, with an additional sub-layer in each that performs multi-head attention over the output of the encoder stack.

Attention Mechanism

Attention maps a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The attention mechanism lets the model weigh every other position in the sequence when encoding a given word, which is how it captures the context of a text.

  • Scaled Dot-Product Attention: The attention weights are computed from the dot products of the queries with the keys, scaled by the square root of the key dimension; a softmax over these scores then produces a weighted sum of the values (see the sketch after the figure below).
  • Multi-Head Attention: Rather than applying a single attention function, the queries, keys, and values are linearly projected several times, attention is computed in parallel over each projection (head), and the heads are concatenated and projected once more.

Image Source: Attention Is All You Need
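
As a rough illustration of the scaled dot-product step, here is a bare NumPy sketch; it leaves out the masking, dropout and learned projections used in the real architecture:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]                                         # dimensionality of the keys
    scores = Q @ K.T / np.sqrt(d_k)                           # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                                        # weighted sum of the values

# Example: 3 query vectors attending over 4 key/value vectors of dimension 8
Q, K, V = np.random.rand(3, 8), np.random.rand(4, 8), np.random.rand(4, 8)
print(scaled_dot_product_attention(Q, K, V).shape)            # (3, 8)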

The transformer architecture is a breakthrough in NLP, giving rise to many state-of-the-art models such as Google’s BERT, RoBERTa, OpenAI’s GPT and many others.

Text Classification With Transformers

In this hands-on session, you will be introduced to the Simple Transformers library. The library is built on top of the popular Hugging Face Transformers library and provides implementations of various transformer-based models and algorithms.

The library makes it effortless to implement various NLP tasks such as sequence classification, token classification (NER), and question answering.

So without further ado let’s get our hands dirty!

Introduction To Simple Transformers

The Simple Transformers library was made with the objective of keeping the implementation as simple as possible, and it largely achieves that goal: transformers can now be used effortlessly with just a few lines of code. All credit goes to Simple Transformers — Multi-Class Text Classification with BERT, RoBERTa, XLNet, XLM, and DistilBERT and to Hugging Face Transformers.

Installing Simple Transformers

Type and execute the following command to install the simple transformers library.

!pip install simpletransformers

Creating A Classifier Model 

from simpletransformers.classification import ClassificationModel

# Create a ClassificationModel

model = ClassificationModel(model_type, model_name, num_labels=num_labels, use_cuda=use_cuda)

  • model_type: This parameter can be one of ‘bert’, ‘xlnet’, ‘xlm’, ‘roberta’ or ‘distilbert’.
  • model_name: The name of the specific pre-trained model to load; the available names correspond to the pre-trained checkpoints on the Hugging Face model hub.
  • num_labels: The number of unique labels (classes) in the problem.
  • use_cuda: When set to True, training and inference run on a CUDA-enabled GPU.
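
For example, a BERT-based classifier for a four-class problem could be created as follows (the checkpoint name and label count are placeholders chosen for illustration):

from simpletransformers.classification import ClassificationModel

# 'bert-base-cased' is one of many checkpoints available on the Hugging Face model hub
model = ClassificationModel('bert', 'bert-base-cased', num_labels=4, use_cuda=True)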

The ClassificationModel also accepts an args dictionary whose entries control the hyperparameters, such as the learning rate, batch size and number of training epochs.
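
The exact defaults differ between library versions, but a typical args dictionary looks something like the sketch below (the key names follow the Simple Transformers documentation; the values are purely illustrative):

# Illustrative hyperparameter overrides, passed to the model via the args parameter
train_args = {
    'output_dir': 'outputs/',        # where checkpoints and results are written
    'overwrite_output_dir': True,    # allow re-running in the same directory
    'num_train_epochs': 1,
    'train_batch_size': 8,
    'learning_rate': 4e-5,
    'max_seq_length': 128,           # longer inputs are truncated to this length
}

model = ClassificationModel('bert', 'bert-base-cased', num_labels=4,
                            use_cuda=True, args=train_args)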

Training The Model

The train_model method is used to train the model. It accepts a Pandas dataframe containing the training texts and their labels.

model.train_model(training_dataframe)

The method also saves checkpoints of the model to the output directory specified in the args dictionary.
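
As a minimal sketch, assuming the headerless two-column format that Simple Transformers accepts (text in the first column, an integer label in the second), training on a toy dataframe looks like this:

import pandas as pd

# Toy training data: first column is the text, second column is the integer class label
train_df = pd.DataFrame([
    ['the match ended with a dramatic last-minute goal', 0],   # e.g. sports
    ['the central bank raised interest rates again', 1],       # e.g. business
    ['the new phone ships with a faster processor', 2],        # e.g. technology
])

model.train_model(train_df)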

Evaluating The Classifier

The eval_model method evaluates the model on a validation set and returns the metrics, the outputs of the model as well as the wrong predictions.

result, model_outputs, wrong_predictions = model.eval_model(validation_dataframe)

Predicting 

The predict method returns the predicted labels along with the raw model outputs, which contain a score for each class.

predictions, raw_outputs = model.predict(['input sentence'])
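
For instance, the raw outputs can be mapped back to a class index with an argmax (the input sentence here is just an illustration):

import numpy as np

predictions, raw_outputs = model.predict(['The team won the championship on penalties'])
print(predictions)                      # predicted label for each input sentence, e.g. [0]
print(np.argmax(raw_outputs, axis=1))   # the same labels recovered from the per-class scores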

Multi-Class Classification With Simple Transformers

Now we will use transformers to solve MachineHack’s Predict The News Category Hackathon. For this, head over to MachineHack, sign up and start the course to download the datasets.
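
The overall workflow for the hackathon data follows the same pattern shown above. The sketch below assumes hypothetical file and column names ('train.csv', 'text_column', 'label_column'); replace them with the actual names from the downloaded files:

import pandas as pd
from simpletransformers.classification import ClassificationModel

# Hypothetical file and column names; substitute the real ones from the MachineHack download
raw_df = pd.read_csv('train.csv')
train_df = pd.DataFrame({
    'text': raw_df['text_column'],
    'labels': raw_df['label_column'].astype('category').cat.codes,  # encode class names as integers
})

model = ClassificationModel('bert', 'bert-base-cased',
                            num_labels=train_df['labels'].nunique(), use_cuda=True)
model.train_model(train_df)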
