How Meta uses TMO to save almost a third of memory per server

TMO holistically identifies offloading opportunities from application and sidecar containers that provide infrastructure-level functions.

Transparent Memory Offloading (TMO) is Meta’s solution for heterogeneous data centre environments. It introduces a new Linux kernel mechanism that measures the lost work due to resource shortage across CPU, memory, and I/O in real time. Guided by this information and without prior application knowledge, TMO automatically adjusts the amount of memory to offload to a heterogeneous device, such as compressed memory or an SSD. It does so according to the device’s performance characteristics and the application’s sensitivity to slower memory accesses. TMO holistically identifies offloading opportunities from application and sidecar containers that provide infrastructure-level functions.
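On Linux, this kind of lost-work measurement is exposed through pressure stall information (PSI) files such as `/proc/pressure/memory`. The following is a minimal sketch of how a monitoring script might parse one of them. The helper name `parse_psi`, the `PressureLine` type, and the sample values are hypothetical and used only for illustration; they are not part of TMO.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict


@dataclass
class PressureLine:
    """One line of a PSI file: stall-time averages (percent) and a running total."""
    avg10: float   # share of the last 10 s during which tasks were stalled
    avg60: float   # same, over the last 60 s
    avg300: float  # same, over the last 300 s
    total_us: int  # cumulative stall time in microseconds


def parse_psi(text: str) -> Dict[str, PressureLine]:
    """Parse PSI file contents into a dict keyed by 'some' or 'full'."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = dict(field.split("=", 1) for field in fields)
        result[kind] = PressureLine(
            avg10=float(values["avg10"]),
            avg60=float(values["avg60"]),
            avg300=float(values["avg300"]),
            total_us=int(values["total"]),
        )
    return result


# Made-up sample in the PSI file format, so the sketch runs on any machine.
SAMPLE = """\
some avg10=0.12 avg60=0.05 avg300=0.01 total=123456
full avg10=0.00 avg60=0.00 avg300=0.00 total=2345
"""

if __name__ == "__main__":
    path = Path("/proc/pressure/memory")
    try:
        text = path.read_text()
    except OSError:
        # PSI may be missing or disabled on this kernel; fall back to the sample.
        text = SAMPLE
    for kind, line in parse_psi(text).items():
        print(kind, line)
```

The `some` line reports time in which at least one task was stalled on the resource, while `full` reports time in which all non-idle tasks were stalled at once.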

TMO has been running in production since 2021 and has saved 20 per cent to 32 per cent of total memory across millions of servers in Meta’s expansive data centre fleet.

TMO is now part of the Linux kernel. In a nutshell, it automatically offloads data to other tiers (e.g. Samsung’s CXL memory expander) that are less costly and more power-efficient than DRAM.

TMO has been running on millions of Facebook servers for more than a year, saving almost a third of memory per server. Savings of that size would likely be insignificant across dozens or hundreds of servers, but at Facebook’s immense scale they become substantial.

TMO consists of the following components:

  1. Pressure Stall Information (PSI), a Linux kernel component that measures the lost work due to resource shortage across CPU, memory, and I/O in real time. For the first time, we can directly measure an application’s sensitivity to memory access slowdown without resorting to fragile low-level metrics such as the page promotion rate. 
  2. Senpai, a userspace agent that applies mild, proactive memory pressure to effectively offload memory across diverse workloads and heterogeneous hardware with minimal impact on application performance. 
  3. TMO performs memory offloading to swap at subliminal memory pressure levels, with turnover proportional to file cache. This contrasts with the historical behaviour of swapping as an emergency overflow under severe memory pressure.
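The interplay between PSI and a userspace agent such as Senpai can be pictured as a small feedback loop: reclaim a little memory while measured pressure stays below a target, and back off as pressure approaches it. The sketch below is a simplified illustration of that idea, not Senpai's actual code. The function name `reclaim_step`, the proportional formula, and the default constants are assumptions chosen for the example.

```python
def reclaim_step(
    current_bytes: int,
    psi_some: float,
    psi_threshold: float = 0.1,     # assumed target pressure, in percent
    reclaim_ratio: float = 0.0005,  # assumed max fraction reclaimed per step
) -> int:
    """Return how many bytes to ask the kernel to reclaim in this interval.

    The request shrinks linearly as observed pressure approaches the
    threshold and drops to zero once pressure meets or exceeds it.
    """
    if psi_threshold <= 0:
        raise ValueError("psi_threshold must be positive")
    headroom = max(0.0, 1.0 - psi_some / psi_threshold)
    return round(current_bytes * reclaim_ratio * headroom)


if __name__ == "__main__":
    one_gib = 1 << 30
    # Hypothetical pressure readings (percent) for a 1 GiB container.
    for pressure in (0.0, 0.05, 0.1, 0.5):
        amount = reclaim_step(one_gib, pressure)
        print(f"psi_some={pressure:.2f}% -> reclaim {amount} bytes")
    # A real agent would then apply the result through the cgroup v2
    # interface (for example by lowering a memory limit) and repeat the
    # measurement on the next interval; that step is omitted here.
```

Keeping each step small is what makes the pressure "mild": the agent probes for cold memory gradually and stops as soon as the workload starts to stall.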

Tasmia Ansari

Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.
