Netflix Is Using AI For Its Subtitles

As Netflix expands its global reach, its audience has become increasingly international. Viewers are comfortable watching shows shot in a foreign language. Netflix has largely broken the language barrier, but the challenge now lies in making sure that the translations are accurate.

Netflix has a rigorous process in place to discard inaccurate subtitles, but that alone does not solve the problem. So a group of ML researchers at Netflix has introduced a new approach, which they call Automatic Pre-Processing, or APP. The researchers claim that this process yields translations that read closer to native-language text.

Overview Of The Model

Achieving good quality in low-resource translation (i.e., from English into a low-resource language) is challenging in the black-box MT (BBMT) setting, where the translation system can only be queried and not modified. So, the researchers at Netflix introduced a method to improve such systems via automatic pre-processing (APP) using sentence simplification.



For example, if a source sentence says “The vice president should feel free to jump in,” and is to be translated into Hindi, using Google Translate one will get “Vice President should feel free to jump inside.” The system, state the Netflix team, was unable to correctly translate the idiomatic and non-compositional phrase “jump in.”
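The simplify-then-translate idea can be sketched in a few lines. This is a minimal illustration, not the researchers' implementation: the phrase table stands in for the learned simplification model, `fake_black_box_mt` is a hypothetical stub rather than a real translation API, and the replacement "take part" is the simplification the researchers report for "jump in."

```python
from typing import Callable, Dict

# Hypothetical phrase table standing in for the learned simplifier (assumption).
SIMPLIFICATIONS: Dict[str, str] = {"jump in": "take part"}


def simplify(sentence: str, table: Dict[str, str] = SIMPLIFICATIONS) -> str:
    """Replace idiomatic phrases with plainer wording before translation."""
    for phrase, plain in table.items():
        sentence = sentence.replace(phrase, plain)
    return sentence


def simplify_then_translate(sentence: str, translate: Callable[[str], str]) -> str:
    """Pre-process the source, then hand it to an unmodified black-box MT system."""
    return translate(simplify(sentence))


def fake_black_box_mt(text: str) -> str:
    """Hypothetical stub: a real system would return the target-language text."""
    return f"<hi>{text}</hi>"


result = simplify_then_translate(
    "The vice president should feel free to jump in.", fake_black_box_mt
)
```

The translator is passed in as a plain callable because, in the black-box setting, the only thing that can be changed is the text sent to it.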

Machine translation (MT) systems trained on smaller training sets often produce output that deviates from the intended meaning. Phrases, idioms, and complex multiword expressions are especially hard for them to handle. As a result, translating the output back into the source language yields a sentence that differs in meaning from the original source sentence.


To address this problem, the researchers adopt the notion that translating back-translations is easier than translating naturally occurring source sentences.

The Automatic Pre-Processing (APP) model builds on the observation that human reference translations, when back-translated to the original language, are a rich source of simplifications (e.g., “jump in” is simplified to “take part”).

This observation, stated the Netflix team, leads to two immediate corollaries:

  • back-translating the ground truth human translations to the source language results in a simplified version of the original source, and
  • a function to map the source sentences to their simplified versions can be learned by training a sequence-to-sequence (S2S) model.
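The two corollaries suggest how training data for the simplifier could be assembled: pair each source sentence with the back-translation of its human reference. The sketch below assumes a generic `back_translate` callable; the function names, the placeholder reference ID, and the lookup-table stub are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def build_simplification_pairs(
    parallel_corpus: Iterable[Tuple[str, str]],
    back_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (original source, simplified source) pairs for S2S training."""
    pairs = []
    for source, human_reference in parallel_corpus:
        # Back-translating the human reference gives a simpler source sentence.
        simplified = back_translate(human_reference)
        pairs.append((source, simplified))
    return pairs


# Toy data: "HI_REF_1" is a placeholder for a human Hindi reference translation.
corpus = [("The vice president should feel free to jump in.", "HI_REF_1")]

# Hypothetical stub: a real setup would call an MT system in the reverse direction.
stub_back_translations: Dict[str, str] = {
    "HI_REF_1": "The vice president should feel free to take part."
}

pairs = build_simplification_pairs(corpus, stub_back_translations.__getitem__)
```

The resulting pairs would then serve as input and target sequences for the S2S simplification model.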

The model was evaluated on the FIGS, WikiLarge and OpenSubtitles datasets. The FIGS dataset comes from subtitles appearing on 12,301 TV shows and movies from a subscription video-on-demand provider.

For training the APP simplification model, the researchers used the Transformer architecture through the tensor2tensor library. All experiments used the Transformer base architecture, with 6 blocks each in the encoder and decoder, and were run on 4 NVIDIA V100 GPUs.

Even though this work mainly focuses on simplifying English-based subtitles, the model is universal and can be used for other languages. 

“Our work merges two important sub-fields machine translation and sentence simplification, and paves the path for future research in both of these fields,” the researchers wrote. 

According to Netflix, subtitle errors are often subtle rather than critical, but they add up and can affect user engagement. Netflix sometimes rejects subtitles that are grammatically correct but fail to get simple phrases and colloquialisms right.

Translating multiword expressions and non-compositional phrases is a tricky task, and simplifying these expressions before translating helps. This work merges machine translation and sentence simplification, two important sub-fields of NLP, and the researchers believe that this will lead to further research.

Know more about this work here.


Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
