How To Perfect Neural Machine Translation With Generative Networks

Humans can resolve cross-linguistic references from context. A word learnt in one language, such as ‘mother’, is immediately associated with its counterpart in another, such as ‘maa’. Attempts have been made to induce this trait in machines, and current machine learning models employ transfer learning techniques to make smart translations.

In an attempt to explore new avenues in machine translation and improve the existing ones, a team of Google researchers released a study on how to improve the robustness of neural machine translation (NMT) models.

NMT models have become popular over the past couple of years. For example, OpenNMT, an open-source deep learning framework built on the principles of neural machine translation, supports applications such as speech recognition, sequence tagging and other language modelling tasks.

These NMT models, however, can be sensitive to minor disturbances in the input, which can lead to errors, such as under-translation, over-translation or mistranslation. 

For example, the authors show that, given the following German sentence, the state-of-the-art NMT model, Transformer, yields a correct translation:

“Der Sprecher des Untersuchungsausschusses hat angekündigt, vor Gericht zu ziehen, falls sich die geladenen Zeugen weiterhin weigern sollten, eine Aussage zu machen.” 

(Machine translation to English: “The spokesman of the Committee of Inquiry has announced that if the witnesses summoned continue to refuse to testify, he will be brought to court.”)

But when a subtle change is applied to the input sentence, say replacing geladenen with the synonym vorgeladenen, the translation becomes very different (and in this case, incorrect):

“Der Sprecher des Untersuchungsausschusses hat angekündigt, vor Gericht zu ziehen, falls sich die vorgeladenen Zeugen weiterhin weigern sollten, eine Aussage zu machen.” 

(Machine translation to English: “The investigative committee has announced that he will be brought to justice if the witnesses who have been invited continue to refuse to testify.”).
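
The gap between the two outputs can be put into rough numbers. The snippet below is only a sketch: it uses Python's standard `difflib.SequenceMatcher` on whitespace-separated tokens as a crude similarity proxy, which is an assumption made here for illustration and not the evaluation used by the researchers. The helper name `token_similarity` is hypothetical.

```python
from difflib import SequenceMatcher


def token_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two sentences, compared token by token."""
    tokens_a = a.lower().split()
    tokens_b = b.lower().split()
    return SequenceMatcher(None, tokens_a, tokens_b).ratio()


# The two German inputs quoted above; they differ by a single word.
src_original = (
    "Der Sprecher des Untersuchungsausschusses hat angekündigt, vor Gericht zu "
    "ziehen, falls sich die geladenen Zeugen weiterhin weigern sollten, eine "
    "Aussage zu machen."
)
src_perturbed = src_original.replace("geladenen", "vorgeladenen")

# The two machine translations quoted above.
out_original = (
    "The spokesman of the Committee of Inquiry has announced that if the "
    "witnesses summoned continue to refuse to testify, he will be brought to court."
)
out_perturbed = (
    "The investigative committee has announced that he will be brought to "
    "justice if the witnesses who have been invited continue to refuse to testify."
)

input_similarity = token_similarity(src_original, src_perturbed)
output_similarity = token_similarity(out_original, out_perturbed)

print(f"input similarity:  {input_similarity:.2f}")
print(f"output similarity: {output_similarity:.2f}")
```

A robust model would keep the output similarity close to the input similarity. Here the inputs are nearly identical while the outputs diverge noticeably.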

In their paper, the researchers write that an ideal NMT model would generate similar translations for separate inputs that exhibit small differences. The idea is to expose the translation model to enough noise, in the form of adversarial inputs, that it learns to resist such perturbations and improves overall.

To do this, the authors introduce Adversarial Generation (AdvGen), an algorithm that generates plausible adversarial examples to perturb the model and then feeds them back into the model for defensive training.

Model Training With AdvGen

The AdvGen method is inspired by generative adversarial networks (GANs); however, it does not rely on a discriminator network. Instead, AdvGen applies the adversarial examples during training to diversify the training set.

First, the Transformer model is applied to an input sentence and, in conjunction with the target output sentence and target input sentence, the translation loss is calculated.

This loss is given as an input to the AdvGen function along with the source sentence to construct an adversarial source example.

The replacement words are selected so that they are more likely to introduce errors in the Transformer output. The generated adversarial sentence is then fed back into the Transformer, initiating the defence stage.
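
The word-selection step described above can be sketched with a toy example. The article does not give the exact scoring rule, so the version below assumes a gradient-guided substitution: among a few candidate replacements, pick the one whose embedding shift points most nearly in the direction of the loss gradient. The embeddings, the gradient values, the candidate list and the function names (`cosine`, `pick_adversarial_word`) are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_adversarial_word(word, candidates, embeddings, loss_gradient):
    """Pick the candidate whose embedding shift best aligns with the loss gradient.

    Assumption: moving the word embedding along the gradient of the
    translation loss is taken as a proxy for "more likely to cause errors".
    """
    base = embeddings[word]

    def score(candidate):
        shift = [c - b for c, b in zip(embeddings[candidate], base)]
        return cosine(shift, loss_gradient)

    return max(candidates, key=score)


# Hypothetical 2-D embeddings; a real model would use learned vectors.
toy_embeddings = {
    "geladenen": [1.0, 0.0],
    "vorgeladenen": [1.0, 1.0],
    "eingeladenen": [2.0, 0.0],
    "genannten": [0.0, -1.0],
}

# Hypothetical gradient of the translation loss w.r.t. the embedding of "geladenen".
toy_gradient = [0.1, 0.9]

chosen = pick_adversarial_word(
    "geladenen",
    ["vorgeladenen", "eingeladenen", "genannten"],
    toy_embeddings,
    toy_gradient,
)
print(chosen)  # -> vorgeladenen
```

In a full training loop, the sentence with the substituted word would then be passed through the model again and its loss added to the loss on the clean sentence, which corresponds to the defence stage.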

The researchers report notable improvements of 2.8 and 1.6 BLEU points on Chinese-English and English-German translation tasks, respectively, compared to the competitive Transformer model, achieving a new state-of-the-art performance.

According to the authors, these experimental results demonstrate that the approach improves both translation performance and robustness.

Future Direction

There has been a growth in the usage of voice-controlled devices over the past couple of years. Be it Amazon’s Alexa-enabled devices or Google Home, they rely heavily on ML models running in the background. These devices train themselves on new words spoken in new contexts and try to understand the demands of the user no matter how vague the command is.

When these devices are deployed in new locations where a different language is spoken, portability becomes a challenge. Novel experiments such as AdvGen can serve as a robust building block for improving downstream tasks in the future, especially those that are sensitive or intolerant to imperfect translation input.

Know more about this work here.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
