How Meta uses TMO to save almost a third of memory per server

TMO holistically identifies offloading opportunities from application and sidecar containers that provide infrastructure-level functions.

Transparent Memory Offloading (TMO) is Meta’s solution for heterogeneous data centre environments. It introduces a new Linux kernel mechanism that measures the lost work due to resource shortage across CPU, memory, and I/O in real time. Guided by this information and without prior application knowledge, TMO automatically adjusts the amount of memory to offload to a heterogeneous device, such as compressed memory or an SSD. It does so according to the device’s performance characteristics and the application’s sensitivity to slower memory accesses.
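The adjustment loop described above can be sketched as a simple feedback controller: tighten a workload's memory limit while PSI reports negligible stall time, and back off as soon as pressure appears. This is a minimal illustrative sketch of that idea, not Meta's actual Senpai implementation; the thresholds and step sizes are assumptions.

```python
# Sketch of a Senpai-style control step (illustrative assumptions, not
# Meta's real parameters): shrink the memory limit while PSI-measured
# stall time stays below a small target, and relax it once the workload
# starts losing work to memory pressure.

def next_limit(current_limit: int, pressure_pct: float,
               target_pct: float = 0.1, step: float = 0.01,
               backoff: float = 0.05) -> int:
    """Return the memory limit (in bytes) for the next control interval.

    pressure_pct: share of wall time (%) the workload stalled waiting on
                  memory over the last interval, as reported by PSI.
    """
    if pressure_pct < target_pct:
        # Workload barely notices the pressure: probe for more savings.
        return int(current_limit * (1 - step))
    # Pressure exceeded the target: relax the limit to protect latency.
    return int(current_limit * (1 + backoff))
```

In a real deployment, the pressure reading would come from the kernel's PSI files and the limit would be written to a cgroup control such as `memory.high`; here both are left abstract.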

TMO has been running in production since 2021 and has saved 20 per cent to 32 per cent of total memory across millions of servers in Meta’s expansive data centre fleet.

It is now part of the Linux kernel and, in a nutshell, automatically offloads data to other storage tiers (e.g. Samsung’s CXL Memory Expander) that are less costly and more power-efficient than DRAM.

TMO has been running on millions of Facebook servers for more than a year, saving almost a third of memory per server. While such savings would be modest across dozens or hundreds of servers, at Meta’s immense scale they add up to a substantial fleet-wide reduction in memory cost.

TMO consists of the following components:

  1. Pressure Stall Information (PSI), a Linux kernel component that measures the lost work due to resource shortage across CPU, memory, and I/O in real time. For the first time, it is possible to directly measure an application’s sensitivity to memory-access slowdown without resorting to fragile low-level metrics such as the page promotion rate. 
  2. Senpai, a userspace agent that applies mild, proactive memory pressure to effectively offload memory across diverse workloads and heterogeneous hardware with minimal impact on application performance. 
  3. A kernel mechanism that performs memory offloading to swap at subliminal memory-pressure levels, with turnover proportional to the file cache. This contrasts with the historical behaviour of swapping as an emergency overflow under severe memory pressure.
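The PSI metrics in point 1 are exposed by the kernel under `/proc/pressure/` (one file each for `cpu`, `memory`, and `io`). The sketch below parses that documented two-line format; the sample string mirrors real file contents rather than reading a live system.

```python
# Parse the kernel's PSI file format, e.g. /proc/pressure/memory:
#   some avg10=0.12 avg60=0.05 avg300=0.01 total=1420963
#   full avg10=0.00 avg60=0.00 avg300=0.00 total=273542
# "some" means at least one task stalled on the resource; "full" means
# all non-idle tasks stalled simultaneously. The avgN fields are the
# percentage of the last N seconds spent stalled; total is in microseconds.

def parse_psi(text: str) -> dict:
    """Parse a /proc/pressure/* file into {'some': {...}, 'full': {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return result

# Sample contents in the same shape as a real /proc/pressure/memory file:
sample = (
    "some avg10=0.12 avg60=0.05 avg300=0.01 total=1420963\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=273542\n"
)
pressure = parse_psi(sample)
```

On a live system the same function could be applied to `open("/proc/pressure/memory").read()`, giving exactly the lost-work signal that a Senpai-style agent would act on.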


Tasmia Ansari
Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.

