Baidu Conquers The Next-Gen AI Race By Beating Tech Giants In A Language Test


Although conversing with artificial intelligence (AI) has been a common plot point in science fiction movies, real-life applications are still a long way off. However, according to recent media reports, Chinese technology giant Baidu has made bold strides in AI by beating Google and Microsoft in a competition designed to test how well machines understand human language.

Baidu, often described as China’s Google, has surpassed the traditional players in AI and language learning. It has achieved the highest score ever recorded on the General Language Understanding Evaluation (GLUE), which is widely considered the benchmark for AI language comprehension. Humans typically score about 87 out of 100, whereas Baidu’s model, called ERNIE (Enhanced Representation through knowledge Integration), scored 90, a first for any AI model. The model was initially developed to understand Chinese, but researchers soon realised that it could understand English as well.

Hao Tian, chief architect at Baidu Research, said, “When we first started this work, we were thinking specifically about certain characteristics of the Chinese language. But we quickly discovered that it was applicable beyond that.”


Behind The Scenes

ERNIE was inspired by Google’s BERT (Bidirectional Encoder Representations from Transformers), a “masked” language model created in 2018 and used to train AI to understand human language. Both models are named after characters from the children’s show Sesame Street. They interpret the meaning of a word by examining the words that appear both before and after it in a sentence, in order to establish its full context.

Google’s model hides 15 per cent of the words in each sequence and then tries to predict them from the context. Baidu’s approach differs because many Chinese characters have no inherent meaning until they are strung together with other characters, a key linguistic difference from English. Baidu’s team therefore took the training a step further: its model hides whole strings of meaningful characters and predicts the masked strings. The company has already started using the model to improve results for its search engine and to make its AI assistant, Xiao Du, more accurate.
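The difference between hiding individual tokens and hiding whole units can be sketched in a few lines of Python. This is only an illustration, not Baidu's or Google's actual training code: the function names, the `[MASK]` symbol, the example sentence and the hand-written unit boundaries are all assumptions made for the sketch. A real system would find the units with a word segmenter or entity recogniser.

```python
import random

MASK = "[MASK]"  # placeholder symbol, assumed for this sketch


def mask_random_tokens(tokens, mask_rate=0.15, seed=0):
    """Token-level masking: hide individually chosen positions."""
    rng = random.Random(seed)
    count = max(1, round(len(tokens) * mask_rate))
    chosen = set(rng.sample(range(len(tokens)), count))
    return [MASK if i in chosen else tok for i, tok in enumerate(tokens)]


def mask_random_span(tokens, spans, seed=0):
    """Span-level masking: hide every token of one randomly chosen unit.

    `spans` is a list of (start, end) index pairs, end exclusive, marking
    multi-character units. Here they are supplied by hand.
    """
    rng = random.Random(seed)
    start, end = rng.choice(spans)
    return [MASK if start <= i < end else tok for i, tok in enumerate(tokens)]


# "Harbin is the capital of Heilongjiang", one token per Chinese character.
chars = list("哈尔滨是黑龙江的省会")
# Hypothetical unit boundaries: 哈尔滨 (Harbin), 黑龙江 (Heilongjiang), 省会 (capital).
units = [(0, 3), (4, 7), (8, 10)]

print(mask_random_tokens(chars))       # isolated characters are hidden
print(mask_random_span(chars, units))  # one complete unit is hidden
```

With token-level masking, a hidden character can often be guessed from the neighbouring characters of the same word. With span-level masking, the model has to recover the whole unit from the rest of the sentence.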


To illustrate the technique, the company used the sentence ‘Harry Potter is a series of fantasy novels written by J. K. Rowling’ as an example on its GitHub page. Google’s BERT was able to identify the ‘K’ through the locally co-occurring words ‘J’, ‘K’ and ‘Rowling’, but it did not learn anything about the entity ‘J. K. Rowling’ as a whole. ERNIE, on the other hand, was able to capture the relationship between ‘Harry Potter’ and ‘J. K. Rowling’ by analysing the underlying knowledge carried by words and phrases, and to conclude that ‘Harry Potter’ was written by ‘J. K. Rowling’.
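The Harry Potter example can be reproduced with a small Python sketch. It only shows what each masking style hides from the model, not how either model is trained. The helper names `find_span` and `mask_entity`, the whitespace tokenisation and the `[MASK]` symbol are assumptions made for illustration.

```python
MASK = "[MASK]"  # placeholder symbol, assumed for this sketch


def find_span(tokens, phrase):
    """Return (start, end) of the first occurrence of `phrase` in `tokens`, or None."""
    n = len(phrase)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == phrase:
            return start, start + n
    return None


def mask_entity(tokens, phrase):
    """Replace every token of `phrase` with the mask symbol."""
    span = find_span(tokens, phrase)
    if span is None:
        return list(tokens)  # phrase not present: nothing to hide
    start, end = span
    return [MASK if start <= i < end else tok for i, tok in enumerate(tokens)]


sentence = "Harry Potter is a series of fantasy novels written by J. K. Rowling"
tokens = sentence.split()

# Token-level: only "K." is hidden, so its neighbours give the answer away.
token_level = [MASK if tok == "K." else tok for tok in tokens]

# Entity-level: the whole name is hidden, so the clue has to come from "Harry Potter".
entity_level = mask_entity(tokens, ["J.", "K.", "Rowling"])

print(" ".join(token_level))
print(" ".join(entity_level))
```

In the first output the model can fill the gap from ‘J.’ and ‘Rowling’ alone. In the second, the only useful evidence left in the sentence is the book title, which is the relationship the article says ERNIE learns.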

With this approach, ERNIE learns to comprehend meaningful words and phrases rather than individual characters, and it therefore performs better in both English and Chinese. ERNIE is now used in real-world applications: Baidu has deployed it to answer questions on its search engine and deliver better results.

According to the Baidu Research team, “Although language understanding will always remain a difficult challenge, our results on GLUE indicate that pre-training language models with continual training and multi-task learning are a promising direction for NLP research. And therefore, we will keep improving the performance of the ERNIE model via the continual pre-training framework.”


Business Prospect

Baidu, with a total of 5,712 AI-related patents, is currently expanding into sectors such as virtual assistants, smart speakers and autonomous cars. It was followed in patent applications by Tencent (4,115), Microsoft (3,978), Inspur (3,755), and Huawei (3,656), according to a report issued by the China Industrial Control Systems Cyber Emergency Response Team, a research unit under the MIIT. The report also noted that Baidu leads in patent applications in several key areas of AI, including deep learning (1,429), NLP (938), and speech recognition (933). The company also leads in the highly competitive area of intelligent driving, with 1,237 patent applications.

In fact, earlier this month, the Chinese giant partnered with Samsung to develop power-efficient AI chips that could be used for large-scale AI workloads such as search ranking, speech recognition, image processing, natural language processing, autonomous driving, and deep learning platforms. The partnership is expected to help Baidu’s NLP framework, ERNIE, process language much faster than it can with the company’s current GPUs.

In addition, once Baidu moves away from NVIDIA’s AI accelerators, it will reduce its dependency on American companies. The move should also help cut the cost of its data centres and give it an upper hand over its AI rival Alibaba, which recently launched its own AI accelerator chip.

After several years of research, the Chinese giant has developed a comprehensive AI ecosystem that has brought the company to the forefront of the global AI industry. According to media reports, Baidu will continue to push real-world applications of AI into more verticals in the near future. The company is also expected to continue its research in the core areas of AI, aiming to contribute to technological innovation in China.


Sejuti Das
Sejuti currently works as Associate Editor at Analytics India Magazine (AIM).
