Microsoft Is Going Big On Reinforcement Learning. Here’s How

When it comes to research in emerging technologies, Microsoft has been striving to stay ahead of its competitors. From recommendations to gaming, the tech giant has been using popular techniques such as reinforcement learning to build efficient products that match customers' interests.

Foundational work in reinforcement learning (RL) dates back to 1992, when researchers worked on simple statistical gradient-following algorithms. This year, the tech giant has made significant contributions at NeurIPS 2020, the ongoing AI conference. The three key research areas in focus this year are batch reinforcement learning, strategic exploration given rich observations, and representation learning.


John Langford, partner research manager at Microsoft Research, stated that all of the reinforcement learning research at the tech giant falls into two categories. The first is solving the challenges that customers bring in, and the second is understanding the foundations that the tech giant can use to build replicable, reliable solutions.

Below are some of the new AI solutions based on reinforcement learning that the tech giant unveiled this year. 


Personalizer

Personalizer is an AI service that delivers a personalised, relevant experience for every user. The service uses reinforcement learning to boost usability and user satisfaction, prioritising relevant content, layouts, and conversations through an easy-to-use API. Unlike recommendation engines that suggest a few specific options from a large catalogue, Personalizer presents a suitable outcome for a user every time they interact with the app.

Personalizer is part of Azure Cognitive Services within the Azure AI platform. The technology was, until recently, used primarily in research labs. Now it is making its way into more Microsoft products and services, from Cognitive Services that developers can plug into apps and websites to autonomous systems that refine manufacturing processes. Microsoft has been using the system internally to select the right offers, products and content across Windows, the Edge browser and Xbox.
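
To make the idea of choosing one outcome per interaction and learning from the response more concrete, here is a minimal sketch of an epsilon-greedy contextual bandit in Python. It is a toy illustration only, not Personalizer's actual algorithm or API: the class `EpsilonGreedyRanker`, its `rank` and `reward` methods, the user segments and the item names are all hypothetical and chosen for this example.

```python
import random
from collections import defaultdict


class EpsilonGreedyRanker:
    """Toy contextual bandit: picks one action per context and learns from rewards.

    Hypothetical illustration; not how Personalizer is implemented.
    """

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Running mean reward and observation count per (context, action) pair.
        self.value = defaultdict(float)
        self.count = defaultdict(int)

    def rank(self, context):
        """Pick the action to show: explore with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.value[(context, a)])

    def reward(self, context, action, score):
        """Update the running mean reward for the action that was shown."""
        key = (context, action)
        self.count[key] += 1
        self.value[key] += (score - self.value[key]) / self.count[key]


def simulate(rounds=2000):
    """Assumed setup: each user segment responds only to its preferred item."""
    preferred = {"gamer": "xbox_offer", "reader": "news_article"}
    ranker = EpsilonGreedyRanker(["xbox_offer", "news_article", "edge_tip"])
    rng = random.Random(1)
    for _ in range(rounds):
        context = rng.choice(list(preferred))
        action = ranker.rank(context)
        score = 1.0 if action == preferred[context] else 0.0
        ranker.reward(context, action, score)
    return ranker


if __name__ == "__main__":
    trained = simulate()
    trained.epsilon = 0.0  # turn off exploration to inspect what was learned
    for segment in ("gamer", "reader"):
        print(segment, "->", trained.rank(segment))
```

The loop mirrors the pattern described above: the system proposes an outcome for the current context, observes how the user responds, and feeds that response back as a reward so later choices improve.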

Know more here.

Metrics Advisor

Recently, the tech giant announced the preview of Metrics Advisor, a new Azure Cognitive Service that addresses the need for metrics intelligence. The service ingests data from various sources, uses machine learning to automatically find anomalies in sensors, products, and business metrics, and provides diagnostic insights.

It uses reinforcement learning to incorporate feedback and make its models more adaptive to a customer's dataset, which helps it detect anomalies in sensors, production processes or business metrics more reliably.
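
As a rough illustration of how user feedback can make an anomaly detector adapt to a particular dataset, the Python sketch below adjusts a z-score threshold whenever a user marks a flag as a false alarm or reports a missed anomaly. This is a deliberately simple stand-in and not reinforcement learning proper, nor Metrics Advisor's method. The class `FeedbackAnomalyDetector`, its parameters and the sample readings are assumptions made for this example.

```python
from statistics import mean, pstdev


class FeedbackAnomalyDetector:
    """Toy z-score detector whose threshold moves with user feedback.

    Hypothetical illustration; not how Metrics Advisor works internally.
    """

    def __init__(self, threshold=3.0, step=0.25):
        self.threshold = threshold  # z-score above which a point is flagged
        self.step = step            # how far one piece of feedback moves the threshold

    def score(self, history, value):
        """Distance of value from the historical mean, in standard deviations."""
        sigma = max(pstdev(history), 1e-9)  # guard against a constant history
        return abs(value - mean(history)) / sigma

    def is_anomaly(self, history, value):
        return self.score(history, value) > self.threshold

    def feedback(self, was_flagged, is_real_anomaly):
        """Loosen the detector after a false alarm, tighten it after a miss."""
        if was_flagged and not is_real_anomaly:
            self.threshold += self.step
        elif not was_flagged and is_real_anomaly:
            self.threshold = max(self.step, self.threshold - self.step)


if __name__ == "__main__":
    readings = [10, 11, 9, 10, 10, 11, 9, 10]  # made-up sensor values
    detector = FeedbackAnomalyDetector()
    print(detector.is_anomaly(readings, 12.5))  # flagged at the default threshold
    for _ in range(3):
        # The user says spikes of this size are normal for this dataset.
        detector.feedback(was_flagged=True, is_real_anomaly=False)
    print(detector.is_anomaly(readings, 12.5))  # no longer flagged
```

A production service would learn far richer behaviour than a single threshold, but the feedback loop has the same shape: detect, collect a judgement from the user, then adjust.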

Know more here.

Project Paidia

Project Paidia is a research collaboration between the Game Intelligence group at Microsoft Research Cambridge and game developer Ninja Theory. The project aims to advance state-of-the-art research in RL to enable novel applications in modern video games. According to a blog post, it focuses on a particularly challenging type of behaviour: agents that learn to collaborate with human players.

Know more here.

AI Robot Control System

In October last year, Microsoft Research joined hands with Sber Robotics Laboratory to develop a unique AI control system that teaches robots to manipulate physical objects of unstable shape in almost the same way that humans do. Work on the project was carried out at Sber's Robotics Laboratory in Moscow, starting in 2019, and lasted over a year in total.

The researchers finished the project this year. The Sber and Microsoft team first used deep reinforcement learning and machine teaching techniques to train the AI agent in a simulated environment, where it could explore different strategies and learn what worked best. Once deployed in real-world working conditions, the robotic system successfully unloaded coin bags on the first try 95% of the time.

Know more here.

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
