Google Translate Has Gender Bias. And It Needs Fixing

Machine prejudice is increasingly becoming a cause for concern. Although translations have become more natural and fluid with advances in neural machine translation (NMT), they still reflect societal biases and stereotypes.

Gender bias surfaces when translating between languages that handle gender differently. For example, Google Translate has historically rendered the Turkish equivalent of “He/she is a doctor” in the masculine form, and the Turkish equivalent of “He/she is a nurse” in the feminine form.



Prevailing gender bias in translation 

Now, Google is aiming to reduce gender bias in machine translation. In December 2018, it released gender-specific translations in Google Translate, which render gender-neutral queries in both feminine and masculine forms.

One of Google’s key research areas is using adjacent sentences and passages as context to improve gender accuracy. This is challenging because gender information is often not stated explicitly in each individual sentence. For instance, in the following Spanish passage, the first sentence explicitly names Marie Curie as the subject, but the second does not. Taken on its own, the second sentence could refer to a person of any gender. When translating into English, however, a pronoun must be chosen, and the information needed to choose it correctly is in the first sentence.

Spanish text: Marie Curie nació en Varsovia. Fue la primera persona en recibir dos premios Nobel en distintas especialidades.

Translation to English: Marie Curie was born in Warsaw. She was the first person to receive two Nobel Prizes in different specialties.

Source: AI blog

Furthermore, to address common problems in contextual translation (e.g., pronoun drop, gender agreement and appropriate possessives), Google is releasing the Translated Wikipedia Biographies dataset to evaluate the gender bias of translation models. The objective is to support long-term improvement of machine learning systems that handle pronouns and gender in translation by providing a benchmark against which translation accuracy can be measured before and after model changes.

Case study with Google Translate

In 2019, the paper “Assessing gender bias in machine translation: a case study with Google Translate” by Prates, M.O.R., Avelar, P.H. & Lamb, L.C., published in Neural Computing and Applications, studied gender bias in machine translation.

The researchers argued that translating from gender-neutral languages can reveal gender biases in automatic translation systems. The team began with a comprehensive list of job positions from the US Bureau of Labor Statistics (BLS) and used it to construct sentences in gender-neutral languages such as Hungarian, Chinese and Yoruba. They then used the Google Translate API to translate the sentences into English and collected statistics on the prevalence of female, male, and gender-neutral pronouns in the translated output. The results showed a strong tendency of Google Translate towards male defaults, particularly in fields associated with an unequal distribution of genders or with stereotypes, such as science, technology, engineering and maths jobs. Comparing these figures with BLS data on the frequency of female participation in each occupation showed that Google Translate fails to reproduce the real-world distribution of female workers.
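The pronoun-counting step of such a study can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the `translations` dictionary holds made-up English outputs standing in for what a translation system might return, and the pronoun table and the `classify` and `tally` helpers are hypothetical names introduced here.

```python
import re
from collections import Counter

# Hypothetical English outputs (illustrative only, not real study data).
translations = {
    "engineer": "He is an engineer.",
    "nurse": "She is a nurse.",
    "teacher": "They are a teacher.",
    "doctor": "He is a doctor.",
}

# Assumed mapping from English third-person pronouns to a gender class.
PRONOUN_GENDER = {
    "he": "male", "him": "male", "his": "male",
    "she": "female", "her": "female", "hers": "female",
    "they": "neutral", "them": "neutral", "their": "neutral", "it": "neutral",
}


def classify(sentence):
    """Return the gender class of the first listed pronoun, or 'none'."""
    for token in re.findall(r"[a-z']+", sentence.lower()):
        if token in PRONOUN_GENDER:
            return PRONOUN_GENDER[token]
    return "none"


def tally(outputs):
    """Count how many translated sentences fall into each gender class."""
    return Counter(classify(sentence) for sentence in outputs.values())


counts = tally(translations)
total = sum(counts.values())
shares = {gender: counts[gender] / total for gender in ("male", "female", "neutral")}
print(counts)
print(shares)
```

In a real analysis, the per-occupation shares would then be set against the BLS participation figures for the same occupations.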

Translated Wikipedia Biographies dataset

The dataset was developed to examine common gender errors in machine translation. Each instance of the dataset represents an individual, a rock band, or a sports team (the latter two considered genderless). The articles were written in English and professionally translated into Spanish and German. The same set can be used to examine pronoun drop when translating from Spanish into English and gender agreement when translating from English into Spanish. Bands and sports teams are included so that the analysis also covers subjects referred to with non-gendered third-person pronouns.

The dataset was built to represent geographies and genders equally. To ensure an unbiased selection of occupations, researchers chose nine that exemplified a range of stereotypical gender associations (feminine, masculine, or neither). Then, to mitigate geographical bias, they divided these cases by geographic region. There were two biographies (one masculine and one feminine) for each of the seven geographic regions. Finally, 12 instances without a gender were included. Rock bands and sports teams were chosen because they are frequently referred to by non-gendered third-person pronouns such as “it” or singular “they”.


The dataset enables a new way of evaluating gender bias in machine translation (introduced in a previous post). Because each instance is tied to a subject of known gender, one can calculate how accurately the gender-specific parts of a translation refer to that subject. This computation is easier when translating into English, because it can rely mainly on English gender-specific pronouns. In these cases, the gender datasets have led to a 67% reduction in errors on context-aware models compared to earlier models. With this additional information, new lines of research can explore how different models perform across occupations or regions.
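As a rough illustration of that accuracy calculation, the sketch below scores an English translation against the known gender of its subject. It is a simplification under stated assumptions, not Google's evaluation code: `gender_accuracy` is a hypothetical helper, and it assumes every gendered pronoun in the passage refers to the central subject, which real biographies do not guarantee.

```python
import re

# English gender-specific third-person pronouns (assumed word lists).
MASCULINE = {"he", "him", "his", "himself"}
FEMININE = {"she", "her", "hers", "herself"}


def gender_accuracy(translation, known_gender):
    """Share of gendered pronouns that match the subject's known gender.

    Returns None when the text contains no gendered pronouns.
    Simplifying assumption: all gendered pronouns refer to the subject.
    """
    if known_gender not in ("feminine", "masculine"):
        raise ValueError("known_gender must be 'feminine' or 'masculine'")
    tokens = re.findall(r"[a-z]+", translation.lower())
    masculine = sum(token in MASCULINE for token in tokens)
    feminine = sum(token in FEMININE for token in tokens)
    total = masculine + feminine
    if total == 0:
        return None
    correct = feminine if known_gender == "feminine" else masculine
    return correct / total


good = "Marie Curie was born in Warsaw. She was the first person to receive two Nobel Prizes."
bad = "Marie Curie was born in Warsaw. He was the first person to receive two Nobel Prizes."
print(gender_accuracy(good, "feminine"))
print(gender_accuracy(bad, "feminine"))
```

Averaging this score over all instances, before and after a model change, gives one simple way to track whether gender errors are going down.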


Ritika Sagar
Ritika Sagar is currently pursuing a PGD in Journalism from St. Xavier's, Mumbai. She is a journalist in the making who spends her time playing video games and analyzing developments in the tech world.
