
Guide to Google’s Tensor2Tensor for Neural Machine Translation

The Tensor2Tensor-based Transformer, built with self-attention layers, became the state-of-the-art model in Neural Machine Translation

Tensor2Tensor, known as T2T for short, is a library of pre-configured deep learning models and datasets. The Google Brain team developed it to make deep learning research faster and more accessible. It is built on TensorFlow throughout and aims at strong improvements in performance and usability. Models can be trained on CPU, a single GPU, multiple GPUs or TPUs, either locally or in the cloud, and need minimal or zero configuration and no device-specific code. Tensor2Tensor supports well-acclaimed models and datasets across different media such as images, video, text and audio. Above all, it demonstrates outstanding performance in Neural Machine Translation (NMT), backed by a large collection of pre-trained, pre-configured models and NMT datasets.

Neural Machine Translation has a long history and is still evolving through a variety of emerging approaches. It first found great success with recurrent neural networks built from LSTM cells. However, because the input sequence to a recurrent neural network must be encoded into a fixed-length vector, these models produced poor translations of long sentences. The issue was partially overcome by ensembles or stacks of gated convolutional networks and recurrent neural networks. The Tensor2Tensor-based Transformer architecture, built from stacked self-attention layers, then became the new state-of-the-art model in Neural Machine Translation, with drastically reduced training cost and a remarkably improved BLEU score. The architecture was introduced by Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Niki Parmar, Ryan Sepassi, Noam Shazeer, and Jakob Uszkoreit of Google Brain and Nal Kalchbrenner of DeepMind.



Unlike RNN models, the Tensor2Tensor-based Transformer has no fixed-size bottleneck problem. Thanks to the self-attention mechanism, each time step has direct access to the entire history of the input sequence. Self-attention is known to be a powerful tool for modelling sequential data: it enables fast training while preserving distance and temporal relationships, even when translating long sequences. The Transformer Neural Machine Translation model is composed of two parts, an encoder and a decoder, each built from stacks of multi-head self-attention layers and fully connected feed-forward network layers.
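The self-attention computation described above can be sketched in a few lines of NumPy. This is an illustrative single-head version, not code from the Tensor2Tensor library: each output position is a softmax-weighted sum over values computed from every input position, which is why no fixed-length bottleneck arises.

```python
# Minimal scaled dot-product self-attention (single head). Illustrative
# NumPy sketch only; Tensor2Tensor's real implementation is multi-head
# and written in TensorFlow.
import numpy as np

def self_attention(x, wq, wk, wv):
    """x: (seq_len, d_model); wq/wk/wv: (d_model, d_k) projection matrices."""
    q, k, v = x @ wq, x @ wk, x @ wv            # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])     # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                          # each position sees the full sequence

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
out = self_attention(x, *(rng.normal(size=(d_model, d_k)) for _ in range(3)))
print(out.shape)  # (4, 8)
```

Because the attention weights form a full `seq_len × seq_len` matrix, every output position attends directly to every input position, regardless of distance.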

Tensor2Tensor Transformer Architecture

Methodology of Tensor2Tensor

Tensor2Tensor comprises five key components for the training run. They are:

  1. Datasets
  2. Device Configuration
  3. Hyperparameters
  4. Model
  5. Estimator and Experiment

Datasets are encapsulated into an input pipeline through the ‘Problem’ class; these classes are responsible for supplying preprocessed data for training and evaluation. Device configuration specifies the type of processor (CPU, GPU, TPU), the number of devices, the synchronization mode, and the devices’ locations. Hyperparameters instantiate the model and training procedure and are specified in code so that runs can be reproduced or shared. The Model ties together the architecture, datasets, device configuration and hyperparameters to produce the desired targets, controlling the losses, evaluation metrics and optimisation. Estimator and Experiment are the classes that handle the training loop, checkpointing, logging and evaluation. With this predefined, established approach, Tensor2Tensor achieves strong performance across multiple media platforms.
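To make the division of labour concrete, here is a conceptual Python sketch of how these components fit together. All class and function names below are hypothetical stand-ins, not the actual Tensor2Tensor API; the point is only the shape of the pipeline.

```python
# Conceptual sketch of T2T's component split; names are hypothetical,
# not real Tensor2Tensor classes.

class TranslationProblem:                      # "Problem": wraps the dataset
    def examples(self):
        # yields preprocessed (source, target) training pairs
        yield ("Hello world", "Hallo Welt")

hparams = {"hidden_size": 512, "num_layers": 6}   # hyperparameters

def model_fn(source, target, hparams):         # "Model": architecture + loss
    # stand-in for a real translation loss
    return float(len(source) != len(target))

def run_experiment(problem, model_fn, hparams, steps=3):
    """'Experiment': drives the training loop; a real one would also
    checkpoint, log and run periodic evaluation."""
    losses = []
    for _ in range(steps):
        for src, tgt in problem.examples():
            losses.append(model_fn(src, tgt, hparams))
    return losses

losses = run_experiment(TranslationProblem(), model_fn, hparams)
print(len(losses))  # 3
```

In the real library, the Problem also owns vocabulary generation, and the Estimator handles device placement according to the configuration described above.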

Python Implementation

Tensor2Tensor is installed using the following command:

!pip install tensor2tensor

The Tensor2Tensor-based Transformer can simply be called and run to perform Neural Machine Translation with a predefined setup using the following commands. Note that the code auto-configures itself based on the available settings, such as the device type and the number of devices. The commands fetch the data, train and evaluate the Transformer model, and test it by translating a few lines of text from a predefined file. Training may take hours to days depending on the user’s configuration.

 # See what problems, models, and hyperparameter sets are available.
 # You can easily swap between them (and add new ones).
 t2t-trainer --registry_help
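The commands that follow rely on a handful of shell variables naming the problem, model, hyperparameter set and working directories. The values below follow the English-to-German example from the Tensor2Tensor README; adjust the paths to your own environment.

```shell
# Shell variables assumed by the commands that follow; values mirror the
# English-to-German walkthrough in the Tensor2Tensor README.
PROBLEM=translate_ende_wmt32k          # English-to-German WMT data, 32k vocab
MODEL=transformer
HPARAMS=transformer_base_single_gpu    # use transformer_base on multi-GPU setups

DATA_DIR=$HOME/t2t_data                # generated training data and vocab files
TMP_DIR=/tmp/t2t_datagen               # raw downloads before preprocessing
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS   # checkpoints and logs
mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

BEAM_SIZE=4                            # beam search width at decode time
ALPHA=0.6                              # length penalty for beam search
DECODE_FILE=$DATA_DIR/decode_this.txt  # file holding sentences to translate
```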

The following commands fetch the data for the English-to-German translation task and feed it into the input data pipeline.

 # Generate data
 t2t-datagen \
   --data_dir=$DATA_DIR \
   --tmp_dir=$TMP_DIR \
   --problem=$PROBLEM

The following commands train the model on the generated dataset, with internal evaluation, and then decode: a few test sentences are written to the decode file, along with a German reference file used later for scoring.

 # Train
 # If you run out of memory, add --hparams='batch_size=1024'.
 t2t-trainer \
   --data_dir=$DATA_DIR \
   --problem=$PROBLEM \
   --model=$MODEL \
   --hparams_set=$HPARAMS \
   --output_dir=$TRAIN_DIR

 # Decode
 echo "Hello world" >> $DECODE_FILE
 echo "Goodbye world" >> $DECODE_FILE
 echo -e 'Hallo Welt\nAuf Wiedersehen Welt' > ref-translation.de

 t2t-decoder \
   --data_dir=$DATA_DIR \
   --problem=$PROBLEM \
   --model=$MODEL \
   --hparams_set=$HPARAMS \
   --output_dir=$TRAIN_DIR \
   --decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA" \
   --decode_from_file=$DECODE_FILE \
   --decode_to_file=translation.en

The following command lets the user spot-check the translation performance on unseen text.

 # See the translations
 cat translation.en 

Finally, the BLEU score can be calculated to evaluate the model against the global standard.

 # Evaluate the BLEU score against the reference translations
 t2t-bleu --translation=translation.en --reference=ref-translation.de
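For intuition about what `t2t-bleu` measures, here is a simplified single-sentence BLEU sketch in pure Python: modified n-gram precisions combined by a geometric mean, with a brevity penalty for short candidates. The real tool computes the corpus-level metric; this sketch only illustrates the idea.

```python
# Simplified sentence-level BLEU: n-gram precision geometric mean with a
# brevity penalty. Illustrative only; t2t-bleu implements the full metric.
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    max_n = min(max_n, len(cand), len(ref))   # avoid empty n-gram sets
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped matches
        precisions.append(max(overlap, 1e-9) / sum(cand_ngrams.values()))
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))       # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("Hallo Welt", "Hallo Welt"))           # identical sentences score 1.0
print(bleu("Hallo Welt", "Auf Wiedersehen Welt") < 0.1)  # poor overlap scores low
```

A perfect match scores 1.0 (often reported as 100); published Transformer results such as 28.4 BLEU are corpus-level scores on this 0–100 scale.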

As an alternative to Colab, Tensor2Tensor models can easily be run on cloud-based FloydHub workspaces, which come with Tensor2Tensor preinstalled and support pre-configured, pre-trained models out of the box.

Performance evaluation of Tensor2Tensor Transformer

The Tensor2Tensor-based Transformer exhibits great performance with respect to both syntactic and semantic considerations in Neural Machine Translation. It is much more computationally efficient than recurrent neural networks, with reduced training time and memory use. Tensor2Tensor also makes language models with self-attention interpretable by visualizing the attention distributions. The architecture was evaluated on the WMT 2014 translation tasks.

On the WMT 2014 English-to-French translation task, the Tensor2Tensor based Transformer model achieves a state-of-the-art BLEU score of 41.8, outperforming all of the previously published single models, at less than 1/4 the training cost of the previous state-of-the-art model.

On the WMT 2014 English-to-German translation task, the Tensor2Tensor based Transformer model achieves a state-of-the-art BLEU score of 28.4, outperforming all of the previously published single models and ensembles, at a fraction of the training cost of the previous state-of-the-art model. 


Rajkumar Lakshmanamoorthy
A geek in Machine Learning with a Master's degree in Engineering and a passion for writing and exploring new things. Loves reading novels, cooking, practicing martial arts, and occasionally writing novels and poems.
