Microsoft Releases Gaze-Tracking System That Works On Any Device


Researchers at Microsoft have developed an AI-based gaze-tracking system that works on any device. The system is dubbed ‘hardware-agnostic’ because it can function on any type of device. The researchers believe this lays the groundwork for using the predictive capabilities of deep neural networks to control computers, tablets, or phones with just the eyes.

The system uses a deep neural network architecture as an appearance-based method that relies on facial imagery for constrained gaze-tracking. This imagery is captured by an ordinary RGB camera of the kind present in most modern computing devices. Microsoft’s new gaze-tracking architecture could help people with motor-neuron disabilities such as ALS and cerebral palsy control their devices, and could let doctors interact with patient information without touching the screen or the keyboard. Other possible applications include interactive gaming, behavioural studies, and user experience research.

Hardware-Agnostic GazeTracker

Eye-gaze based computers depend on complex computational tasks that require measuring the user’s head pose, head position, eye rotation, and the distance between the user and the object. All these variables are calculated with respect to the observer’s frame of reference, which is an assembly of an infrared light projector and a high-resolution infrared camera. The accuracy of the system is affected by several factors such as illumination, background noise, the optical properties of the sensors, and imaging quality, among others. However, the issue with such devices is that they are tailored either to the device they are used on or to user-specific calibration, and hence require the development of specialised hardware. Acquiring such hardware poses several challenges: availability, affordability, reliability, and ease of use.


The architecture newly developed by the Microsoft researchers aims to overcome this problem. For their experiment, the researchers relied on RGB cameras, which are present in almost all modern computing devices, together with recent advances in deep learning.

For the experiment, the researchers reproduced the iTracker network architecture for RGB-based constrained gaze-tracking as the baseline. The iTracker reproduced for this project did not employ any calibration or device-specific fine-tuning, as in the original version. The iTracker architecture takes as input crops of the eyes and the face region from the original image, plus a 25×25 binary face grid that indicates the positions of all the face pixels in the original image. These inputs are passed through the Eye and Face sub-networks, after which the corresponding outputs are processed by multiple fully connected layers to estimate the gaze point coordinates.
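To make the face grid input more concrete, here is a minimal Python sketch of how a 25×25 binary grid could be derived from a face bounding box. The article does not describe how the researchers compute the grid, so the function name face_grid, the (x, y, w, h) box format, and the rounding rule are all assumptions made for illustration.

```python
import math


def face_grid(frame_w, frame_h, face_box, grid_size=25):
    """Return a grid_size x grid_size list of 0/1 values.

    A cell is 1 if the face bounding box overlaps it. face_box is a
    hypothetical (x, y, w, h) tuple in pixels, origin at the top-left.
    """
    x, y, w, h = face_box

    # Map pixel coordinates onto grid cells (assumed rounding rule:
    # floor the start, ceil the end, so partially covered cells count).
    col_start = max(0, int(x * grid_size / frame_w))
    row_start = max(0, int(y * grid_size / frame_h))
    col_end = min(grid_size, math.ceil((x + w) * grid_size / frame_w))
    row_end = min(grid_size, math.ceil((y + h) * grid_size / frame_h))

    grid = [[0] * grid_size for _ in range(grid_size)]
    for row in range(row_start, row_end):
        for col in range(col_start, col_end):
            grid[row][col] = 1
    return grid


# Example: a 100x100 pixel face inside a 500x500 pixel frame.
grid = face_grid(500, 500, (100, 200, 100, 100))
covered_cells = sum(sum(row) for row in grid)
```

The grid gives the network a coarse signal about where the face sits in the frame and how large it is, which the cropped eye and face images alone do not carry.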

Notably, the researchers trained their model on the GazeCapture dataset, which was acquired on phones and tablets. GazeCapture, an MIT corpus, is the largest publicly available dataset of its kind and contains data from 1,450 people. The researchers also performed data augmentation to help the model handle real-world variations, introducing random changes in brightness, contrast, and saturation.
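As a rough illustration of this kind of colour augmentation, the Python sketch below applies random brightness, contrast, and saturation changes to a tiny image stored as a list of RGB tuples in the range 0 to 1. It is not the researchers' code: the function names, the luma weights, and the default jitter strength of 0.2 are assumptions chosen for the example.

```python
import random


def _clip(value):
    """Keep a channel value inside the valid [0, 1] range."""
    return max(0.0, min(1.0, value))


def _luma(pixel):
    """Approximate perceived brightness of an RGB pixel (assumed weights)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def adjust_brightness(image, factor):
    """Scale every channel by factor."""
    return [tuple(_clip(c * factor) for c in px) for px in image]


def adjust_contrast(image, factor):
    """Push channels away from (or towards) the image's mean brightness."""
    mean = sum(_luma(px) for px in image) / len(image)
    return [tuple(_clip(mean + factor * (c - mean)) for c in px) for px in image]


def adjust_saturation(image, factor):
    """Blend each pixel with its own grey value; factor 0 gives greyscale."""
    out = []
    for px in image:
        gray = _luma(px)
        out.append(tuple(_clip(gray + factor * (c - gray)) for c in px))
    return out


def colour_jitter(image, rng, strength=0.2):
    """Apply the three adjustments, each with its own random factor."""
    for adjust in (adjust_brightness, adjust_contrast, adjust_saturation):
        factor = rng.uniform(1.0 - strength, 1.0 + strength)
        image = adjust(image, factor)
    return image


# Example: jitter a two-pixel "image" with a seeded generator.
sample = [(0.2, 0.4, 0.6), (0.5, 0.5, 0.5)]
augmented = colour_jitter(sample, random.Random(0))
```

Passing in a seeded random.Random keeps the augmentation reproducible, which is handy when comparing training runs.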

A single model was trained for both smartphones and tablets on the entire dataset. Throughout the experiment, methods such as regularisation, data augmentation, colour transformation, and data normalisation were used. The detailed experiment can be found here, and the corresponding code can be found here.

Additionally, to weed out potential biases, the Grad-CAM++ technique was used to generate heat maps of the model’s internal gradient activity.

The end result was a system that achieved an RMSError of 1.8073 cm on GazeCapture.
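For readers unfamiliar with the metric, the Python sketch below shows one common way to compute a root-mean-square error over predicted gaze points. The article does not spell out the exact formula the researchers used, so this assumes the error of each sample is the Euclidean distance between the predicted and true on-screen points, and the function name rms_error is hypothetical.

```python
import math


def rms_error(predicted, actual):
    """Root-mean-square of the Euclidean distances between point pairs.

    predicted and actual are equal-length lists of (x, y) gaze points
    in the same unit; the result is in that unit too.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")

    squared_distances = [
        (px - ax) ** 2 + (py - ay) ** 2
        for (px, py), (ax, ay) in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared_distances) / len(squared_distances))


# Example: one perfect prediction and one that is off by 5 units.
error = rms_error([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
```

Because the distances are squared before averaging, a few large misses raise this score more than many small ones would.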

Wrapping Up

This is not the first time Microsoft has experimented with gaze-tracking. In a recent study, its researchers experimented with infrared lights placed around the display for eye-tracking. Additionally, Microsoft’s Windows 10 was the first operating system to provide a technology called Eye Control, which allows users to control the on-screen mouse and keyboard with eye movement alone.

Shraddha Goled
