PyTorch Becomes Facebook’s Default AI Framework

Last week, Facebook said it would migrate all its AI systems to PyTorch. Facebook’s AI models currently perform trillions of inference operations every day for the billions of people who use its technology. Its AI tools and frameworks help fast-track research work at Facebook, educational institutions and businesses globally.

Big tech companies, including Google (TensorFlow) and Microsoft (ML.NET), have been betting big on open-source machine learning (ML) and artificial intelligence (AI) frameworks and libraries.

Why migrate to PyTorch? 

Predominantly, Facebook has been using two distinct but synergistic frameworks for deep learning: PyTorch and Caffe2. PyTorch is optimised for research, while Caffe2 is optimised for production. Caffe2 is Facebook’s in-house production framework for training and deploying large-scale machine learning models. 


Facebook said adopting PyTorch as Facebook’s default AI framework ensures that all the experiences across its technologies will run optimally at scale.

“Over a year into the migration to PyTorch, there are more than 1.7K inference models in full production, and 93 percent of our new training models are on PyTorch,” said Lin Qiao, engineering director at Facebook AI. 


Migration also means that Facebook will be closely working alongside the PyTorch developer community. “PyTorch not only makes our engineering and research work more efficient, collaborative and effective, but also allows us to share our work and learn from the advances made by thousands of PyTorch developers around the world,” she added. 

The evolution of PyTorch 

Traditionally, AI’s research-to-production pipeline has been slow. Numerous steps and tools, fragmented processes, and a lack of clear standardisation across the industry made it difficult to manage the end-to-end workflow. Researchers and engineers were forced to choose between AI frameworks optimised for either research or production. 

In 2016, a group of ML/AI researchers at Facebook collaborated with the research community to better understand existing frameworks. The team experimented with machine learning (ML) frameworks such as Theano and Torch and advanced concepts from Lua Torch, Chainer, and HIPS Autograd. “After months of development, PyTorch was born,” said Qiao. It became the go-to deep learning library for AI researchers, thanks to its simple interface, dynamic computational graphs, first-class Python integration and back-end support for CPUs and GPUs. 
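The dynamic computational graphs mentioned above are PyTorch’s define-by-run style: the graph is built as ordinary Python executes, so data-dependent control flow works naturally. A minimal illustration (the values here are arbitrary, chosen only for the example):

```python
import torch

# The graph is recorded as this Python runs, so the if/else below
# is handled naturally -- no static graph declared up front.
x = torch.tensor(3.0, requires_grad=True)
if x > 0:
    y = x ** 2          # this branch is chosen at run time
else:
    y = -x
y.backward()            # autograd differentiates the graph just recorded
print(x.grad)           # dy/dx = 2x = 6
```

This is the first-class Python integration the article refers to: standard language constructs, not a separate graph-definition API.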

In 2018, Facebook released PyTorch 1.0 and started the work to unify PyTorch’s research and production capabilities into a single framework. The new iteration merged Python-based PyTorch with production-ready Caffe2, providing both flexibility for research and performance optimisation for production. 
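The research-to-production bridge that PyTorch 1.0 introduced is TorchScript: eager research code is compiled into a serialisable intermediate representation that can run without Python. A small sketch of the idea (the `TinyNet` model is hypothetical, for illustration only):

```python
import torch

class TinyNet(torch.nn.Module):
    # Hypothetical model used only to illustrate scripting.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) + 1.0

net = TinyNet()
scripted = torch.jit.script(net)   # compile eager code to TorchScript

# The scripted module behaves like the eager one, but its IR can be
# saved (scripted.save("net.pt")) and loaded from C++ for serving.
x = torch.tensor([-1.0, 2.0])
assert torch.equal(net(x), scripted(x))
```

The same artifact thus serves researchers iterating in Python and production systems serving from C++, which is the unification described above.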

With time, PyTorch engineers at Facebook introduced various tools, pretrained models, libraries, and data sets for each stage of the development cycle, enabling the developer and research community to quickly create and deploy new ML/AI innovations at scale. To this day, the platform continues to evolve, with the most recent release boasting more than 3K commits since the prior version. 

The process

Facebook is looking to create a smoother end-to-end developer experience for its engineers and developers and accelerate its research-to-production pipeline by using a single platform.

“By moving away from Caffe2 and standardising on PyTorch, we are decreasing the engineering and infrastructure burden associated with maintaining two systems, as well as unifying under one common umbrella, both internally and within the open-source community.

“This is an ongoing journey and spans product teams across Facebook. As we migrate our ML/AI workloads, we also need to maintain steady model performance and limit the disruption to any downstream product traffic or research progress,” said Qiao. On average, there are over 4K models running on PyTorch daily at Facebook. 

Further, Qiao said Facebook’s developers go through multiple steps, including critical online and offline testing, training, inference, and then publishing. Additionally, multiple tests are conducted to check for performance and correctness variance between Caffe2 and PyTorch, which can take engineers and researchers up to a few weeks to perform.

To address these migration scenarios, Facebook said its engineers have developed an internal workflow and custom tools to help teams decide whether to migrate an existing model or replace it outright. 

While the migration is feasible, the latency of machine learning models poses a challenge. Facebook has created internal benchmarking tools to compare the performance of original models with their PyTorch counterparts ahead of time, making these evaluations easier. 
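Facebook’s internal benchmarking tools are not public, but the kind of ahead-of-time latency comparison described here can be sketched with PyTorch’s public `torch.utils.benchmark` utilities (the `Linear` model below is a stand-in, not a real production model):

```python
import torch
import torch.utils.benchmark as benchmark

# Hypothetical stand-in for a migrated model and a representative input.
model = torch.nn.Linear(128, 128)
x = torch.randn(32, 128)

# Timer handles warm-up and synchronisation details that naive
# time.time() loops get wrong.
timer = benchmark.Timer(stmt="model(x)", globals={"model": model, "x": x})
result = timer.timeit(50)     # run the statement 50 times
print(result.median)          # median seconds per call
```

Running the same measurement against the original and the ported model on identical inputs gives the latency variance the article says engineers check before switching traffic.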

Advantages of migrating to PyTorch 

  • ML/AI models are now easier to build, program, test and debug 
  • Research and production environments are brought closer than ever 
  • Deployment on-device (PyTorch Mobile) is accelerating. PyTorch Mobile currently runs on devices like the Oculus Quest and Portal, as well as on desktops, and the Android and iOS mobile apps for Facebook, Instagram, and Messenger 
  • On-device AI will play a crucial role with emerging hardware technologies such as wearable AR
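The on-device path in the last two bullets runs through PyTorch Mobile: a scripted model is passed through mobile-specific optimisation before being bundled into an app. A hedged sketch, again with a toy `Linear` model rather than anything Facebook ships:

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# Toy model for illustration; a real app would script its trained model.
model = torch.nn.Linear(4, 2).eval()
scripted = torch.jit.script(model)
mobile = optimize_for_mobile(scripted)   # apply mobile-oriented passes

# mobile._save_for_lite_interpreter("model.ptl") would emit the
# artifact an Android/iOS app loads at run time.
x = torch.randn(1, 4)
print(mobile(x).shape)                   # torch.Size([1, 2])
```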

Wrapping up

With PyTorch as the underlying framework powering all of Facebook’s AI workloads and innovations, its engineers can deploy new ML/AI models in minutes rather than in weeks or months. Real-world use cases include Instagram personalisation technologies, person segmentation models (especially in the AR/VR space), enlisting PyTorch in the battle against harmful content like hate speech and misinformation, text-to-speech, optical character recognition and more. 

“PyTorch gives us the flexibility and scalability to move fast and innovate at Facebook,” said Aparna Lakshmi Ratan, director of product management at Facebook AI.


Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.

