
Why Transformers Play A Crucial Role In NLP Development


Recent advances in modern Natural Language Processing (NLP) research have been dominated by the combination of Transfer Learning methods with large-scale Transformer language models.

Creating these general-purpose models remains an expensive and time-consuming process, restricting the use of these methods to a small subset of the wider NLP community. With Transformers came a paradigm shift in NLP: the starting point for training a model on a downstream task moved from a blank, task-specific model to a general-purpose pre-trained architecture.

How Transformers Took Over From Other Architectures

In NLU, one recurring challenge is ambiguity: words that are spelled or sound alike but carry different meanings. The word ‘wound’, for example, can refer to an injury or to the act of wrapping something up. Resolving homonyms like these requires weighing the surrounding context, so that the weighted average used to build a word’s representation yields a different vector for the same word in different sentences.
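
To make this concrete, the short sketch below (assuming the Hugging Face transformers library and the bert-base-uncased checkpoint, both illustrative choices) checks how differently a contextual Transformer represents ‘wound’ in its two senses; a static embedding table, by contrast, would return the same vector for both sentences.

import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative checkpoint; any BERT-style contextual model would do.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = [
    "The nurse cleaned the wound carefully.",  # 'wound' as an injury
    "She wound the rope around the post.",     # 'wound' as wrapping
]

vectors = []
for text in sentences:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]            # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    vectors.append(hidden[tokens.index("wound")])                # vector at 'wound'

similarity = torch.cosine_similarity(vectors[0], vectors[1], dim=0)
print(f"Cosine similarity of the two 'wound' vectors: {similarity.item():.3f}")

If the representations really are contextual, the printed similarity should sit well below 1.0, since each vector mixes in the surrounding words rather than being a fixed per-word entry.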


A Transformer network applies a self-attention mechanism that scans every word in a sequence and assigns attention scores (weights) to the words around it. The Transformer was introduced as a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
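
As a rough illustration of the mechanism (a single attention step, not the multi-head, multi-layer architecture of the paper), scaled dot-product self-attention can be written in a few lines of NumPy; the shapes and toy input below are illustrative.

import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model) token embeddings; w_q/w_k/w_v: (d_model, d_k) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                 # every token scored against every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax -> attention weights
    return weights @ v, weights                     # context-mixed outputs, attention matrix

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
x = rng.normal(size=(seq_len, d_model))             # five toy token embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
output, attn = self_attention(x, w_q, w_k, w_v)
print(output.shape, attn.shape)                     # (5, 8) (5, 5)

Because the score matrix is computed for all token pairs at once, the whole step is a handful of matrix multiplications, which is exactly what makes it easy to parallelise.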

Neural networks usually process language by generating fixed- or variable-length vector-space representations. Starting from representations of individual words or even pieces of words, they aggregate information from surrounding words to determine the meaning of a given bit of language in context.

Though RNNs have in recent years become the typical network architecture for translation, they process language sequentially, and that sequential nature makes it difficult to fully harness parallel processing units like TPUs. Convolutional neural networks (CNNs), on the other hand, though less sequential, take a relatively large number of steps to combine information from distant parts of the input.

In a Transformer model like BERT, by contrast, the final hidden state corresponding to the first input token serves as the aggregate representation of the sequence, and label probabilities for classification are computed from it with a standard softmax. For question answering, the probability of each token being the start of the answer span is computed with a softmax over per-token scores; the same formula is used for the end of the answer span, and the maximum scoring span is used as the prediction.
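
A hedged sketch of that span-scoring step: a learned start vector S and end vector E are dotted with each token's final hidden state, softmaxed, and the highest-scoring valid span (start before end) becomes the prediction. The hidden states and vectors below are random stand-ins, not trained values.

import numpy as np

def best_answer_span(token_states, start_vec, end_vec, max_len=30):
    """token_states: (seq_len, hidden); start_vec, end_vec: (hidden,) learned vectors."""
    start_logits = token_states @ start_vec
    end_logits = token_states @ end_vec
    # log-softmax so a span's score is the sum of its start and end scores
    start_scores = start_logits - np.log(np.exp(start_logits).sum())
    end_scores = end_logits - np.log(np.exp(end_logits).sum())
    best, best_score = (0, 0), -np.inf
    for i in range(len(token_states)):
        for j in range(i, min(i + max_len, len(token_states))):
            score = start_scores[i] + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best  # (start_index, end_index) of the predicted answer span

rng = np.random.default_rng(1)
states = rng.normal(size=(12, 768))                  # toy final hidden states for 12 tokens
S, E = rng.normal(size=768), rng.normal(size=768)    # stand-ins for the learned vectors
print(best_answer_span(states, S, E))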

Top NLP Models Using Transformers

  • BERT, or Bidirectional Encoder Representations from Transformers, set new benchmarks for NLP when it was introduced by Google in late 2018. It is a method of pre-training language representations that obtained state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks.
  • XLNet is another new unsupervised language representation learning method based on a novel generalised permutation language modelling objective. XLNet employed Transformer-XL as the backbone model, exhibiting excellent performance for language tasks involving long context. Overall, XLNet achieved state-of-the-art (SOTA) results on various downstream language tasks, including question answering, natural language inference, sentiment analysis, and document ranking.
  • Distil* is a class of compressed models that started with DistilBERT, a distilled version of BERT. DistilBERT is a small, fast, cheap and light Transformer model based on the BERT architecture.

BERT itself has paved the way for newer models. Since most state-of-the-art models are based on BERT, and BERT is built on the Transformer architecture, we can safely say that the Transformer model has taken the throne for natural language understanding.

This was made possible because the Transformer allowed for significantly more parallelisation and reached a new state of the art in translation quality.

Beyond computational performance and higher accuracy, the Transformer also made it possible to visualise what other parts of a sentence the network attends to when processing or translating a given word, giving insight into how information travels through the network.

It Has Its Own Library Now

In what is exciting news for the machine learning community, and for developers in the NLP domain especially, the team at Hugging Face has released a library called Transformers.

This library provides state-of-the-art general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBERT, XLNet, CTRL…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG), with over 32 pre-trained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.
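
A minimal usage sketch (the sentiment-analysis task and the DistilBERT checkpoint below are illustrative choices, not the library's only entry point):

from transformers import pipeline

# Downloads a fine-tuned DistilBERT checkpoint on first use and wraps it in a ready-made pipeline.
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")
print(classifier("Transformers made state-of-the-art NLP far more accessible."))
# Expected output shape: [{'label': 'POSITIVE', 'score': 0.99...}]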

As NLP becomes a key aspect of AI, the democratisation of Transformers in the form of a library will open more doors for up-and-coming researchers. Since state-of-the-art pre-trained models like BERT and GPT-2 can be accessed without having to build them from scratch, entry-level practitioners can now focus on their target idea rather than reinventing the wheel.

PS: The story was written using a keyboard.

Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.