How Meta uses TMO to save almost a third of memory per server

Transparent Memory Offloading (TMO) is Meta’s solution for heterogeneous data centre environments. It introduces a new Linux kernel mechanism that measures, in real time, the work lost to resource shortages across CPU, memory, and I/O. Guided by this information and without prior application knowledge, TMO automatically adjusts the amount of memory to offload to a heterogeneous device, such as compressed memory or an SSD. It does so according to the device’s performance characteristics and the application’s sensitivity to slower memory accesses. TMO holistically identifies offloading opportunities both in application containers and in the sidecar containers that provide infrastructure-level functions.
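To make the idea of "measuring lost work" concrete: in mainline Linux this measurement is exposed as Pressure Stall Information (PSI) text files such as /proc/pressure/memory. The sketch below parses that format in Python. The sample values and the function names `parse_psi` and `read_memory_pressure` are illustrative assumptions, not part of TMO.

```python
from typing import Dict

# Sample text in the format the kernel uses for /proc/pressure/memory.
# The numbers are made up for illustration.
SAMPLE_PSI = """\
some avg10=0.12 avg60=0.05 avg300=0.01 total=123456
full avg10=0.03 avg60=0.01 avg300=0.00 total=45678
"""


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse PSI text into {"some": {...}, "full": {...}}.

    "some" is the share of time at least one task was stalled on the
    resource; "full" is the share of time all non-idle tasks were stalled.
    avg10/avg60/avg300 are percentages over 10, 60 and 300 second windows;
    total is the cumulative stall time in microseconds.
    """
    result: Dict[str, Dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def read_memory_pressure(path: str = "/proc/pressure/memory"):
    """Read live memory pressure (needs a Linux kernel built with PSI)."""
    with open(path) as f:
        return parse_psi(f.read())


if __name__ == "__main__":
    psi = parse_psi(SAMPLE_PSI)
    print("some avg10:", psi["some"]["avg10"])
```

On a machine without PSI support, `read_memory_pressure` will raise an error because the file does not exist, which is why the example runs on sample text.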

TMO has been running in production since 2021 and has saved 20 per cent to 32 per cent of total memory across millions of servers in Meta’s expansive data centre fleet.

TMO is now part of the Linux kernel and, in a nutshell, automatically offloads data to other storage tiers (e.g. Samsung’s CXL memory expander) that are less costly and more power-efficient than DRAM.

TMO has been running on millions of Facebook servers for more than a year, saving almost a third of memory per server. While that saving may be insignificant across dozens or hundreds of servers, it becomes significant at Facebook’s immense scale.

TMO consists of the following components:


  1. Pressure Stall Information (PSI), a Linux kernel component that measures the lost work due to resource shortage across CPU, memory, and I/O in real time. For the first time, it is possible to directly measure an application’s sensitivity to memory access slowdown without resorting to fragile low-level metrics such as the page promotion rate.
  2. Senpai, a userspace agent that applies mild, proactive memory pressure to effectively offload memory across diverse workloads and heterogeneous hardware with minimal impact on application performance.
  3. Memory offloading to swap at subliminal memory pressure levels, with turnover proportional to file cache. This contrasts with the historical behaviour of swapping as an emergency overflow under severe memory pressure.
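The way an agent like Senpai can turn a pressure reading into "mild, proactive" reclaim can be sketched as a simple proportional rule: reclaim a small fraction of the workload's memory while pressure is low, and back off to zero as pressure approaches a threshold. The Python below is a minimal sketch of that idea, not Meta's implementation. The function name `reclaim_step` and the default values for `psi_threshold` and `reclaim_ratio` are illustrative assumptions.

```python
def reclaim_step(
    current_bytes: int,
    psi_some: float,
    psi_threshold: float = 0.1,    # assumed pressure ceiling, in percent
    reclaim_ratio: float = 0.0005, # assumed max fraction reclaimed per step
) -> int:
    """Return how many bytes to ask the kernel to reclaim in this step.

    psi_some is the recent "some" memory pressure of the workload, in
    percent. The result shrinks linearly as pressure rises and is zero
    once pressure reaches or exceeds psi_threshold.
    """
    if psi_threshold <= 0:
        raise ValueError("psi_threshold must be positive")
    # Fraction of the pressure budget still unused, clamped at zero.
    headroom = max(0.0, 1.0 - psi_some / psi_threshold)
    return round(current_bytes * reclaim_ratio * headroom)


if __name__ == "__main__":
    one_gb = 1_000_000_000
    for pressure in (0.0, 0.05, 0.2):
        print(pressure, reclaim_step(one_gb, pressure))
```

A real agent would run a step like this periodically for each container and pass the result to the kernel, which then decides which pages to compress or swap out. Keeping each step small is what lets the loop react before the application slows down noticeably.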

Tasmia Ansari
Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.
