How To Perfect Neural Machine Translation With Generative Networks

Humans can resolve cross-linguistic references from context. A word learnt in one language, such as ‘mother’, is immediately associated with its counterpart in another, such as ‘maa’. Attempts have been made to induce this trait in machines, and current machine learning models employ transfer learning techniques to produce smart translations.

In an attempt to explore new avenues in machine translation and make the existing ones better, a team of Google researchers released a study on how to improve the robustness of neural machine translation (NMT) models.


NMT models have become popular over the past couple of years. For example, OpenNMT, an open-source deep learning framework built on the principles of neural machine translation, supports applications such as speech recognition, sequence tagging and other language modelling tasks.

These NMT models, however, can be sensitive to minor disturbances in the input, which can lead to errors such as under-translation, over-translation or mistranslation.

For example, the authors show that, given the following German sentence, the state-of-the-art NMT model, Transformer, yields a correct translation:

“Der Sprecher des Untersuchungsausschusses hat angekündigt, vor Gericht zu ziehen, falls sich die geladenen Zeugen weiterhin weigern sollten, eine Aussage zu machen.” 

(Machine translation to English: “The spokesman of the Committee of Inquiry has announced that if the witnesses summoned continue to refuse to testify, he will be brought to court.”)

But when a subtle change is applied to the input sentence, say replacing geladenen with the synonym vorgeladenen, the translation becomes very different (and in this case, incorrect):

“Der Sprecher des Untersuchungsausschusses hat angekündigt, vor Gericht zu ziehen, falls sich die vorgeladenen Zeugen weiterhin weigern sollten, eine Aussage zu machen.” 

(Machine translation to English: “The investigative committee has announced that he will be brought to justice if the witnesses who have been invited continue to refuse to testify.”).

In their paper, the researchers write that an ideal NMT model would generate similar translations for separate inputs that exhibit small differences. The idea behind their approach is to perturb a translation model with adversarial inputs so that the model, in resisting these disturbances, ends up improving overall.
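As a rough illustration of what "similar translations" could mean in practice, the sketch below scores the token overlap between the two machine translations quoted earlier. The helper name `translation_similarity` and the choice of Python's `difflib.SequenceMatcher` as the metric are assumptions made for this example; they are not the measure used in the paper.

```python
from difflib import SequenceMatcher


def translation_similarity(first: str, second: str) -> float:
    """Return a token-level similarity in [0, 1]; 1.0 means identical token sequences."""
    first_tokens = first.lower().split()
    second_tokens = second.lower().split()
    return SequenceMatcher(None, first_tokens, second_tokens).ratio()


# The two English outputs quoted in the article.
original_output = (
    "The spokesman of the Committee of Inquiry has announced that if the "
    "witnesses summoned continue to refuse to testify, he will be brought to court."
)
perturbed_output = (
    "The investigative committee has announced that he will be brought to "
    "justice if the witnesses who have been invited continue to refuse to testify."
)

score = translation_similarity(original_output, perturbed_output)
print(f"similarity after a one-word change in the source: {score:.2f}")
```

A robust model would keep this kind of score close to 1.0 when the source changes by a single synonym. A surface-overlap score like this one says nothing about whether either translation is correct.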

To do this, the authors introduce Adversarial Generation (AdvGen), an algorithm that generates plausible adversarial examples to disturb the model and then feeds them back into the model for defensive training.

Model Training With AdvGen

The idea behind the AdvGen method is inspired by generative adversarial networks (GANs); however, it does not rely on a discriminator network. Instead, AdvGen applies the adversarial examples during training to diversify the training set.

First, the Transformer model is applied to an input sentence and, in conjunction with the target output sentence and target input sentence, the translation loss is calculated.
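The article does not say which loss is used. A minimal sketch, assuming the standard negative log-likelihood of the reference tokens, is shown below. The function name `sentence_nll` and the probability values are made up for illustration.

```python
import math


def sentence_nll(reference_token_probs):
    """Negative log-likelihood of a target sentence.

    reference_token_probs holds, for each target position, the probability
    the model assigned to the correct (reference) token.
    """
    return -sum(math.log(p) for p in reference_token_probs)


# Hypothetical per-token probabilities for a three-token target sentence.
confident_loss = sentence_nll([0.9, 0.8, 0.95])
unsure_loss = sentence_nll([0.9, 0.2, 0.95])

print(f"confident model loss: {confident_loss:.3f}")
print(f"unsure model loss:    {unsure_loss:.3f}")
```

The loss grows as the model assigns less probability to the reference tokens, which is the signal the attack stage can then exploit.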

This loss is given as input to the AdvGen function, along with the source sentence, to construct an adversarial source example.

AdvGen selects the replacement words in such a way that they are more likely to introduce errors in the Transformer output. The generated adversarial sentence is then fed back into the Transformer, initiating the defence stage.
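The article does not spell out the selection rule. One plausible reading, sketched below under that assumption, is a gradient-guided choice: among a few candidate replacements, pick the word whose embedding shift points most in the direction that increases the translation loss. The function `pick_adversarial_word`, the two-dimensional embeddings and the gradient vector are all toy values invented for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_adversarial_word(word, loss_gradient, embeddings, candidates):
    """Pick the candidate whose embedding shift best aligns with the loss gradient."""
    base = embeddings[word]
    best_word, best_score = word, float("-inf")
    for candidate in candidates:
        if candidate == word:
            continue
        shift = [c - b for c, b in zip(embeddings[candidate], base)]
        score = cosine(shift, loss_gradient)
        if score > best_score:
            best_word, best_score = candidate, score
    return best_word


# Toy embeddings and a made-up gradient of the loss w.r.t. the word's embedding.
toy_embeddings = {
    "geladenen": [1.0, 0.0],
    "vorgeladenen": [1.0, 1.0],
    "eingeladenen": [0.0, 0.0],
}
toy_gradient = [0.0, 1.0]

chosen = pick_adversarial_word(
    "geladenen", toy_gradient, toy_embeddings, ["vorgeladenen", "eingeladenen"]
)
print(chosen)
```

In a real system the gradient would come from backpropagating the translation loss, and the candidate list would be restricted to plausible substitutes so that the perturbed sentence stays fluent.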

The researchers report notable improvements of 2.8 and 1.6 BLEU points on Chinese-English and English-German translation tasks, respectively, over the competitive Transformer model, achieving a new state-of-the-art performance.

According to the authors, the experimental results on both tasks demonstrate that the approach improves both translation performance and robustness.

Future Direction

There has been a growth in the usage of voice-controlled devices over the past couple of years. Be it Amazon’s Alexa-enabled devices or Google Home, they rely heavily on ML models in the background. These devices train themselves on new words spoken in new contexts and try to understand the user’s demands no matter how vague the command is.

When these devices are deployed in new locations where a different language is spoken, portability becomes a challenge. Novel approaches such as AdvGen can serve as a robust building block for improving downstream tasks in the future, especially those that are sensitive or intolerant to imperfect translation input.

Know more about this work here.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
