Last updated December 22, 2020

ML Behind Your Favorite Google Meet Backgrounds

Published on November 6, 2020
by Ram Sagar

The pandemic forced much of the corporate world to abandon offices and work from home, and in-person meetings gave way to virtual ones. Video conferencing companies like Zoom benefited tremendously from this shift. But the sudden rush to go virtual brought its own challenges: users were annoyed by interruptions, noise, monotonous walls and more. In response, video call providers let users change their backgrounds and added audio noise reduction.

These features typically rely on machine learning models running in the background. Loading these models and running inference can be slow, so both the models and the images they process need to be small, especially when the service runs in the browser. Google Workspace (formerly G Suite) has done well in this regard, making calls clearer and backgrounds more appealing for its users. In a recent blog post, Google discussed how it achieved this quality in its video services.

The engineers at Google diligently crafted a pipeline that leverages many ML innovations that Google has developed over the years. One of them is MediaPipe, an open-source, cross-platform framework for building pipelines to process perceptual data of different modalities. More about it in the next section.

About MediaPipe

Most object detection work addresses two-dimensional (2D) predictions: the bounding boxes are rectangles, never cuboids. By extending prediction to 3D, one can capture an object’s size, position and orientation in the world, leading to a variety of applications in robotics, self-driving vehicles, image retrieval, and augmented reality.

Google AI released MediaPipe Objectron, a mobile real-time 3D object detection pipeline for everyday objects. This pipeline detects objects in 2D images, and estimates their poses and sizes through a machine learning (ML) model, trained on a newly created 3D dataset. Objectron computes oriented 3D bounding boxes of objects in real-time on mobile devices.
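To make "oriented 3D bounding box" concrete, here is a minimal sketch that builds the eight corners of a box from a center, a size and a single rotation angle. The function name `box_corners` and the restriction to rotation about the vertical axis are assumptions made for illustration; this is not Objectron's output format or API.

```python
import math

def box_corners(center, size, yaw):
    """Return the 8 corners of a 3D box rotated by `yaw` radians about the y axis.

    Hypothetical helper for illustration only: a full orientation would
    need a 3x3 rotation matrix rather than a single angle.
    """
    cx, cy, cz = center
    w, h, d = size
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                # Corner position in the box's own (unrotated) frame.
                lx, ly, lz = sx * w, sy * h, sz * d
                # Rotate about the y axis, then translate to the box center.
                x = cos_y * lx + sin_y * lz + cx
                y = ly + cy
                z = -sin_y * lx + cos_y * lz + cz
                corners.append((x, y, z))
    return corners

# A 1 x 2 x 1 box, two units in front of the camera, turned by 45 degrees.
corners = box_corners(center=(0.0, 0.0, 2.0), size=(1.0, 2.0, 1.0), yaw=math.pi / 4)
```

A 2D detector would report only the rectangle that encloses the projection of these corners; the 3D box keeps the size, position and orientation that the rectangle throws away.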

Google used MediaPipe to introduce a new in-browser ML solution for background blurring and replacement in Google Meet. MediaPipe lets ML models and OpenGL shaders run efficiently in the browser, and Google claims real-time performance with low power consumption, even on low-power devices.

How Did Google ‘Meet’ The Challenge

“…other solutions require installing additional software, Meet’s features are powered by cutting-edge web ML technologies built with MediaPipe that work directly in your browser — no extra steps necessary.”

To provide real-time, in-browser performance, Google combined efficient on-device ML models, WebGL-based rendering, and web-based ML inference via XNNPACK and TensorFlow Lite.

MediaPipe leverages WebAssembly, a low-level binary code format designed specifically for web browsers. This helps improve speed for compute-heavy tasks. During a video call, the browser converts WebAssembly instructions into native machine code that executes much faster than traditional JavaScript code.

The procedure can be summarised as follows:

Each video frame is processed by segmenting a user from their background.
ML inference is used to compute a low-resolution mask.
The mask is further refined to align it with the image boundaries.
This mask is then used to render the video output via WebGL2, with the background blurred or replaced.
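The four steps above can be sketched on a tiny grayscale frame. Everything here is a stand-in chosen for illustration: `fake_segmenter` is a hypothetical brightness threshold rather than a trained model, the refinement step is plain nearest-neighbour upsampling, and the frame dimensions are assumed to be divisible by `factor`. Meet does this work with a neural network and WebGL shaders, not Python.

```python
def downsample(frame, factor):
    """Average-pool a 2D grayscale frame by an integer factor."""
    h, w = len(frame), len(frame[0])
    return [
        [
            sum(frame[y + dy][x + dx] for dy in range(factor) for dx in range(factor))
            / factor ** 2
            for x in range(0, w, factor)
        ]
        for y in range(0, h, factor)
    ]

def fake_segmenter(small):
    """Hypothetical stand-in for the ML model: bright pixels count as 'person'."""
    return [[1.0 if value > 0.5 else 0.0 for value in row] for row in small]

def upsample(mask, factor):
    """Bring the low-resolution mask back to frame size (nearest neighbour)."""
    return [
        [mask[y // factor][x // factor] for x in range(len(mask[0]) * factor)]
        for y in range(len(mask) * factor)
    ]

def composite(frame, background, mask):
    """Keep the frame where mask is 1 and the new background where it is 0."""
    return [
        [m * f + (1.0 - m) * b for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(frame, background, mask)
    ]

def process_frame(frame, background, factor=2):
    small = downsample(frame, factor)       # shrink the frame for cheap inference
    low_res_mask = fake_segmenter(small)    # compute a low-resolution mask
    mask = upsample(low_res_mask, factor)   # align the mask with the full frame
    return composite(frame, background, mask)  # render with the new background

# Left half of the frame is a bright "person", right half is a dark wall.
frame = [[0.9, 0.9, 0.1, 0.1] for _ in range(4)]
background = [[0.3] * 4 for _ in range(4)]
output = process_frame(frame, background)
```

In the output, the bright left half of each row is kept and the dark right half is replaced by the new background value.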

The Google Meet team built a segmentation model designed to run on-device. Such models need to be ultra-lightweight for fast inference, low power consumption, and small download size. When a model runs in the browser, the input resolution largely determines the number of floating-point operations (FLOPs) needed to process each frame, so smaller images are essential for a smooth experience.
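A rough calculation shows why resolution matters so much. The sketch below counts the operations of a single 3x3 convolution layer at two input sizes; the layer shape and both resolutions are illustrative assumptions, not figures from Google's post.

```python
def conv_flops(height, width, c_in, c_out, kernel):
    """Approximate FLOPs of one stride-1 convolution with 'same' padding.

    Each output value needs c_in * kernel * kernel multiply-adds,
    and one multiply-add is counted as 2 floating-point operations.
    """
    return 2 * height * width * c_in * c_out * kernel * kernel

# Hypothetical first layer: 3 input channels, 16 output channels, 3x3 kernel.
full_res = conv_flops(720, 1280, c_in=3, c_out=16, kernel=3)
low_res = conv_flops(144, 256, c_in=3, c_out=16, kernel=3)

ratio = full_res / low_res
print(f"{full_res:,} vs {low_res:,} FLOPs ({ratio:.0f}x fewer at low resolution)")
```

Because the cost of a convolution grows with the number of pixels, shrinking each side of the image by a factor of five cuts the work for this layer by a factor of 25.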

The images are therefore downsampled to a smaller size before being fed to the model. The team used MobileNetV3-small as the encoder of an encoder-decoder network and exported the model to TensorFlow Lite using float16 quantization. This, said Google, resulted in a slight loss in weight precision with no noticeable drop in quality. The resulting model has 193K parameters and is only 400KB in size.
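The reported size is consistent with simple arithmetic: 193K weights stored as 16-bit floats take about 386KB, which is close to the quoted 400KB (the remainder is presumably file overhead, though the article does not say). The sketch below also shows the "slight loss in weight precision" on a single example weight, using Python's `struct` module to round to half precision. The value 0.1 is an arbitrary example, not a weight from the model.

```python
import struct

PARAMS = 193_000  # parameter count reported in the article

def weights_size_bytes(num_params, bytes_per_weight):
    """Raw storage needed for the weights alone, ignoring file overhead."""
    return num_params * bytes_per_weight

def to_float16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

fp32_bytes = weights_size_bytes(PARAMS, 4)  # 32-bit floats: 772,000 bytes
fp16_bytes = weights_size_bytes(PARAMS, 2)  # 16-bit floats: 386,000 bytes

weight = 0.1  # arbitrary example weight
error = abs(weight - to_float16(weight))

print(f"float32: {fp32_bytes / 1000:.0f} KB, float16: {fp16_bytes / 1000:.0f} KB")
print(f"rounding error for {weight}: {error:.2e}")
```

Halving the bytes per weight halves the download, while the rounding error on each weight stays small relative to its value.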

For background replacement, the team adopted a compositing technique known as light wrapping. Light wrapping softens the segmentation edges by allowing background light to spill over onto foreground elements, which helps users blend in with their new backgrounds. It also reduces halo effects.
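Light wrapping can be illustrated on a single row of pixels. The sketch below follows one common formulation of the technique (blur the mask and the background, then mix blurred background light into the foreground just inside the edge); the `radius` and `strength` parameters are assumptions, and this is not Meet's actual shader.

```python
def box_blur(values, radius):
    """Blur a 1D row by averaging each value with its neighbours."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def light_wrap(foreground, background, mask, radius=1, strength=0.5):
    """Composite one row of pixels with a simple light-wrap effect."""
    soft_mask = box_blur(mask, radius)     # softened segmentation edge
    soft_bg = box_blur(background, radius)  # diffuse background light
    out = []
    for fg, bg, sbg, m, sm in zip(foreground, background, soft_bg, mask, soft_mask):
        # Non-zero only for foreground pixels that sit next to the edge.
        wrap = m * (1.0 - sm) * strength
        # Let some background light spill onto the foreground pixel.
        lit_fg = (1.0 - wrap) * fg + wrap * sbg
        # Blend with the soft mask instead of a hard 0/1 cut.
        out.append(sm * lit_fg + (1.0 - sm) * bg)
    return out

foreground = [0.2] * 6            # dark subject
background = [1.0] * 6            # bright replacement background
mask = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

row = light_wrap(foreground, background, mask)
```

With a hard mask the row would jump straight from 0.2 to 1.0 at the edge. Here the values rise gradually across the boundary, which is the softening effect described above.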

Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.
