Transformers Simplified: A Hands-On Intro To Text Classification Using Simple Transformers 

Amal Nair

In the past few years, we have seen tremendous improvements in the ability of machines to deal with Natural Language. We saw algorithms breaking the state-of-the-art one after the other on a variety of language-specific tasks, all thanks to transformers. In this article, we will discuss and implement transformers in the simplest way possible using a library called Simple Transformers.

The Seq2Seq Model

Before stepping into the transformers’ territory, let’s take a brief look at the Sequence-to-Sequence models.

The Sequence-to-Sequence model (seq2seq) converts a given sequence of text into another sequence, which may differ in length from the input. Machine translation is the most familiar example. But Seq2seq is not limited to translation. In fact, it is quite effective in tasks that require text generation.

The model uses an encoder-decoder architecture and has been very successful in machine translation and question answering tasks. It uses a stack of Long Short-Term Memory (LSTM) networks or Gated Recurrent Units (GRU) in the encoder and the decoder.

Here is a simple demonstration of a Seq2Seq model:

Image Source: A ten-minute introduction to sequence-to-sequence learning in Keras

One major drawback of the Seq2Seq model comes from the limitation of its underlying RNNs. Though LSTMs are meant to deal with long-term dependencies between the word vectors, the performance drops as the distance increases. Because an RNN processes a sequence one token at a time, the model also restricts parallelization.

Transformer Architecture 

The transformer model introduces an architecture that is based solely on the attention mechanism. It does not use any recurrent networks, yet it produces results superior in quality to Seq2Seq models. It addresses the long-term dependency problem of the Seq2Seq model. The transformer architecture is also parallelizable, so the training process is considerably faster.

Image Source: Attention Is All You Need

Let’s take a look at some of the important features:

Encoder: The encoder has 6 identical layers in which each layer consists of a multi-head self-attention mechanism and a fully connected feed-forward network. The multi-head attention system and feed-forward network both have a residual connection and a normalization layer. 

Decoder: The decoder also consists of 6 identical layers with an additional sublayer in each of the 6 layers. The additional sublayer performs multi-head attention over the output of the encoder stack.
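
The residual connection and normalization mentioned for each sublayer can be sketched as LayerNorm(x + Sublayer(x)), which is how the paper describes it. The following is a minimal plain-Python illustration under simplifying assumptions: the learned gain and bias of layer normalization are left out, the vector is tiny, and the function names and the stand-in sublayer are made up for this example.

```python
import math


def layer_norm(vector, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learned gain or bias)."""
    mean = sum(vector) / len(vector)
    variance = sum((x - mean) ** 2 for x in vector) / len(vector)
    return [(x - mean) / math.sqrt(variance + eps) for x in vector]


def residual_sublayer(vector, sublayer):
    """Apply a sublayer, add its input back (residual), then normalize."""
    transformed = sublayer(vector)
    summed = [x + t for x, t in zip(vector, transformed)]
    return layer_norm(summed)


# A stand-in for attention or the feed-forward network: it just halves its input.
def toy_sublayer(vector):
    return [0.5 * x for x in vector]


normalized = residual_sublayer([1.0, 2.0, 3.0], toy_sublayer)
print(normalized)
```

In the real model the sublayer would be the multi-head attention block or the feed-forward network, and the same wrapper is applied around each of them.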

Attention Mechanism

Attention is the mapping of a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The attention mechanism allows the model to understand the context of a text. 

  • Scaled Dot-Product Attention:
  • Multi-Head Attention:
Image Source: Attention Is All You Need
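
For reference, the paper defines scaled dot-product attention as softmax(Q K^T / sqrt(d_k)) V. The following is a minimal sketch of that formula in plain Python, written for readability rather than speed. The function names and the toy vectors are illustrative and do not come from any library. Multi-head attention runs several such attentions in parallel over learned projections of the inputs and concatenates the results, which this sketch leaves out.

```python
import math


def softmax(scores):
    """Turn a list of scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors."""
    d_k = len(keys[0])  # dimension of each key vector
    outputs = []
    for query in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
            for key in keys
        ]
        weights = softmax(scores)
        # Each output is the weighted average of the value vectors.
        output = [
            sum(w * value[j] for w, value in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
    return outputs


# Toy example: the two keys are identical, so both values get equal weight.
queries = [[1.0, 0.0]]
keys = [[1.0, 1.0], [1.0, 1.0]]
values = [[0.0, 0.0], [2.0, 4.0]]
attended = scaled_dot_product_attention(queries, keys, values)
print(attended)
```

Since the query matches both keys equally, the output is simply the average of the two value vectors. With different keys, the value whose key is closest to the query would dominate.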

The transformer architecture is a breakthrough in NLP, giving rise to many state-of-the-art models such as Google’s BERT, Facebook’s RoBERTa, OpenAI’s GPT and many others.

Text Classification With Transformers

In this hands-on session, you will be introduced to the Simple Transformers library. The library is built on top of the popular Hugging Face Transformers library and provides implementations of various transformer-based models and algorithms.

The library makes it effortless to implement various NLP tasks such as Sequence Classification, Token Classification (NER), and Question Answering.

So without further ado let’s get our hands dirty!

Introduction To Simple Transformers

The Simple Transformers library was built to make working with transformer models as simple as possible, and it largely succeeds. Transformers can now be used with just a few lines of code. All credit goes to the Simple Transformers article on multi-class text classification with BERT, RoBERTa, XLNet, XLM, and DistilBERT, and to Hugging Face Transformers.

Installing Simple Transformers

Type and execute the following command to install the simple transformers library.

!pip install simpletransformers

Creating A Classifier Model 

from simpletransformers.classification import ClassificationModel

#Create a ClassificationModel

model = ClassificationModel(model_type, model_name, num_labels=number_of_labels, use_cuda=boolean)

  • model_type: This parameter can be one of ‘bert’, ‘xlnet’, ‘xlm’, ‘roberta’, ‘distilbert’.
  • model_name: All available model names can be found here.
  • number_of_labels: The number of unique labels or classes in the problem.
  • use_cuda: When set to True, the model runs on a CUDA-capable GPU.

The ClassificationModel also accepts a dict, args, whose entries control the values of the hyperparameters. Any hyperparameter that is not set in the dict keeps its default value.
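
As an illustration of passing hyperparameters, the sketch below overrides a handful of values. The keys are ones the library documents, but the values are arbitrary and not recommendations. build_model is a hypothetical wrapper written for this example, and ‘bert-base-cased’ is just one possible model name. The library is imported inside the function, so the dict can be inspected without installing anything.

```python
# Illustrative overrides only; every key left out keeps the library default.
train_args = {
    "output_dir": "outputs/",        # where checkpoints and the final model go
    "overwrite_output_dir": True,    # allow re-running into the same folder
    "max_seq_length": 128,           # longer inputs are truncated
    "train_batch_size": 8,
    "num_train_epochs": 3,
    "learning_rate": 4e-5,
}


def build_model(num_labels, args, use_cuda=False):
    """Hypothetical helper: create a BERT classifier with the given args."""
    # Imported lazily so this file loads even without simpletransformers.
    from simpletransformers.classification import ClassificationModel

    return ClassificationModel(
        "bert",
        "bert-base-cased",
        num_labels=num_labels,
        args=args,
        use_cuda=use_cuda,
    )
```

Calling build_model(4, train_args) would then download the pretrained weights and return a model ready for training on a four-class problem.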

Training The Model

The train_model method is used to train the model. It accepts a pandas DataFrame holding the training texts and their labels.


The method also saves model checkpoints to the output path, if one is specified in the args dict.
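
Here is a hedged sketch of the training step, assuming the input format described in the library documentation: the text goes in a column named text and the integer class ids, starting from 0, go in a column named labels. The helper names make_training_rows and train and the two sample headlines are made up for illustration. pandas is imported inside train so that the first helper runs without it.

```python
def make_training_rows(texts, labels):
    """Pair each text with its integer label, in the order text, labels."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    return [[text, int(label)] for text, label in zip(texts, labels)]


def train(model, rows):
    """Wrap the rows in a DataFrame and pass it to train_model."""
    import pandas as pd  # imported lazily; only needed when training

    train_df = pd.DataFrame(rows, columns=["text", "labels"])
    model.train_model(train_df)
    return train_df


# Made-up sample data: 0 and 1 stand for two hypothetical news categories.
rows = make_training_rows(
    ["Stocks rallied after the earnings call", "The striker scored twice"],
    [0, 1],
)
print(rows)
```

With a model created as shown earlier, train(model, rows) would run the fine-tuning. A real dataset would of course need far more than two rows.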

Evaluating The Classifier

The eval_model method evaluates the model on a validation set and returns the metrics, the outputs of the model as well as the wrong predictions.

result, model_outputs, wrong_predictions = model.eval_model(validation_dataframe)
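
eval_model also accepts extra keyword arguments, each a metric function that is called with the true labels and the predictions and whose value is stored in the result under the keyword's name. The sketch below assumes that behaviour. accuracy and evaluate are hypothetical helpers written for this example, and a function such as accuracy_score from scikit-learn could be passed in the same way.

```python
def accuracy(labels, preds):
    """Fraction of predictions that match the true labels."""
    labels, preds = list(labels), list(preds)
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for truth, pred in zip(labels, preds) if truth == pred)
    return correct / len(labels)


def evaluate(model, validation_dataframe):
    """Hypothetical helper: evaluate and return the custom accuracy metric."""
    # The keyword name "acc" becomes the key under which the metric is stored.
    result, model_outputs, wrong_predictions = model.eval_model(
        validation_dataframe, acc=accuracy
    )
    return result["acc"]


# Quick check of the metric on its own, with made-up labels.
print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))
```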


The predict method returns the predictions along with the raw outputs, which contain a value for each class for every input.

predictions, raw_outputs = model.predict(['input sentence'])
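
The raw outputs are easier to read once they are turned into probabilities. The sketch below assumes they are unnormalized scores (logits), one per class, and applies a softmax to each row. The numbers are made up for illustration and stand in for what model.predict would return. The variable names are chosen so as not to clash with the ones above.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(scores):
    """Index of the highest score, i.e. the predicted label."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Made-up raw outputs for two inputs in a three-class problem.
sample_raw_outputs = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]

probabilities = [softmax(row) for row in sample_raw_outputs]
predicted_labels = [top_class(row) for row in sample_raw_outputs]

for label, probs in zip(predicted_labels, probabilities):
    print(label, [round(p, 3) for p in probs])
```

The predicted label is simply the class with the largest raw score, so it agrees with the predictions returned by the predict method under the assumption above.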

Multi-Class Classification With Simple Transformers

Now we will use transformers to solve MachineHack’s Predict The News Category Hackathon. For this, head over to MachineHack, sign up and start the course to download the datasets.

Copyright Analytics India Magazine Pvt Ltd
