Everything You Need To Know About Facebook’s Deep Learning Library PyTorchVideo

Deep video understanding is one of the most challenging tasks in computer vision. As computing power and the volume of video data on the internet grow, so does the demand for new machine learning models and tools. According to Stanford University, the technologies used for object detection in videos are maturing rapidly.

Facebook AI recently unveiled a new deep learning library for video understanding called PyTorchVideo. The source code is available on GitHub.


With PyTorchVideo, Facebook aims to help researchers develop cutting-edge machine learning models and tools to enhance video understanding capabilities, alongside providing a unified repository of reproducible and efficient video understanding components for research and production applications. 

In addition, Facebook wants to standardise video-focused libraries by serving various video use cases in one place. Referring to today's fragmented tooling, Facebook AI said, “This has created a barrier for developers looking to work with videos for the first time,” adding that the lack of standardisation makes it difficult to collaborate and spur innovation.

In the coming months, Facebook will improve the PyTorchVideo library to enable and support more groundbreaking research in video understanding. “We welcome contributions from the entire community. All our efforts will be directed at supporting the rich open-source community committed to pushing the boundaries of video research,” said Facebook. 

PyTorchVideo: In a nutshell 

Today, the PyTorchVideo library supports components that can be used for various video understanding applications, including video classification, self-supervised learning, detection, and optical flow, among others.

The library also supports other modalities, including audio and text. It is not limited to desktop devices: its Accelerator package provides mobile hardware-specific optimization and a model deployment flow.

Some of the core features of PyTorchVideo include: 

  • Enables researchers to build new video architectures using its video models and pretrained weights with customizable components 
  • Covers a set of downstream tasks such as action classification, action detection, acoustic event detection and self-supervised learning (SSL)
  • Supports a wide range of datasets and tasks for benchmarking various video models under different evaluation protocols 
  • Promotes hardware-aware model design and full-speed on-device model execution using efficient building blocks and deployment flow optimized for inference on hardware like mobile devices, Intel NNPI, etc. 
  • Offers access to a growing toolkit of standard scripts for video processing, including tracking, decoding and optical flow extraction
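
To make the first feature above more concrete, here is a minimal sketch of how a pretrained video model might be used for action classification. It is an illustration under stated assumptions, not Facebook's reference code: the helper names `uniform_frame_indices` and `classify_clip` are hypothetical, the `slow_r50` entry point is assumed to be available from the `facebookresearch/pytorchvideo` repository through PyTorch Hub, and the input video is assumed to be already decoded, resized and normalized.

```python
from typing import List


def uniform_frame_indices(total_frames: int, num_samples: int) -> List[int]:
    """Pick `num_samples` evenly spaced frame indices from a clip.

    Hypothetical helper: a plain-Python take on uniform temporal subsampling.
    """
    if total_frames < 1 or num_samples < 1:
        raise ValueError("total_frames and num_samples must be positive")
    if num_samples == 1:
        return [0]
    last = total_frames - 1
    # Integer arithmetic keeps the first index at 0 and the last at `last`.
    return [i * last // (num_samples - 1) for i in range(num_samples)]


def classify_clip(video, num_frames: int = 8, model_name: str = "slow_r50"):
    """Return the top predicted class index for one video.

    Assumptions: `video` is a float tensor shaped (channels, frames, height,
    width) that has already been resized and normalized, and `model_name` is
    a model published on PyTorch Hub by the PyTorchVideo repository.
    """
    import torch  # imported lazily so the helper above works without PyTorch

    # Downloads the pretrained weights on first use (needs network access).
    model = torch.hub.load(
        "facebookresearch/pytorchvideo", model_name, pretrained=True
    )
    model.eval()

    # Subsample frames along the time axis, then add a batch dimension.
    indices = torch.tensor(uniform_frame_indices(video.shape[1], num_frames))
    clip = video.index_select(1, indices).unsqueeze(0)

    with torch.no_grad():
        logits = model(clip)
    # Index into whatever label set the checkpoint was trained on.
    return int(logits.argmax(dim=1))
```

With PyTorch installed, calling `classify_clip(torch.randn(3, 64, 256, 256))` would exercise the whole path on random noise. A real application would instead decode a video file and apply the preprocessing the chosen checkpoint expects.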

At present, Facebook AI uses PyTorchVideo in various research works.

It has also been used to fuel recent advances in video transformers and self-supervised learning.

There is a shortage of open-source code and libraries for developing video understanding tools and models, which makes it difficult for researchers to exchange ideas, compare notes and accelerate innovation in the space.

Facebook AI’s PyTorchVideo can bolster innovation in the video understanding space, going beyond the realm of deepfakes and synthetic media propaganda.

Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
