Deep video understanding is one of the most challenging tasks in computer vision. With growing computing power and the sheer volume of video data on the internet, the demand for new machine learning models and tools continues to rise. According to Stanford University, technologies for object detection in videos are maturing rapidly.
Facebook AI recently unveiled a new deep learning library for video understanding called PyTorchVideo. The source code is available on GitHub.
With PyTorchVideo, Facebook aims to help researchers develop cutting-edge machine learning models and tools to enhance video understanding capabilities, alongside providing a unified repository of reproducible and efficient video understanding components for research and production applications.
In addition to this, Facebook is looking to standardise video-focused libraries, serving various video use cases in one place. The current lack of standardisation, Facebook AI said, makes it difficult to collaborate and spur innovation: “This has created a barrier for developers looking to work with videos for the first time.”
In the coming months, Facebook will improve the PyTorchVideo library to enable and support more groundbreaking research in video understanding. “We welcome contributions from the entire community. All our efforts will be directed at supporting the rich open-source community committed to pushing the boundaries of video research,” said Facebook.
PyTorchVideo: In a nutshell
Today, the PyTorchVideo library supports components that can be used for various video understanding applications, including video classification, self-supervised learning, detection, and optical flow, among others.
The video understanding library also supports other modalities, including audio and text. Nor is it limited to desktop devices: its Accelerator package provides mobile hardware-specific optimization and a model deployment flow.
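A common video-specific preprocessing step that libraries like PyTorchVideo provide is uniform temporal subsampling, which picks a fixed number of evenly spaced frames from a clip before it is fed to a model. As a rough illustration of the idea (a plain-PyTorch sketch, not the PyTorchVideo API itself), it can be written as:

```python
import torch

# Illustrative sketch: uniformly subsample a fixed number of frames
# from a video clip. The tensor layout assumed here is
# (channels, frames, height, width).
def uniform_temporal_subsample(clip: torch.Tensor, num_samples: int) -> torch.Tensor:
    total_frames = clip.shape[1]
    # Pick num_samples frame indices evenly spaced across the clip,
    # always keeping the first and last frame.
    indices = torch.linspace(0, total_frames - 1, num_samples).long()
    return clip.index_select(1, indices)

clip = torch.randn(3, 30, 64, 64)               # a 30-frame RGB clip
subsampled = uniform_temporal_subsample(clip, 8)
print(subsampled.shape)                          # torch.Size([3, 8, 64, 64])
```

Reducing every clip to the same number of frames this way gives models a fixed-size input regardless of the source video's length or frame rate.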
Some of the core features of PyTorchVideo include:
- Enables researchers to build new video architectures using its video models and pretrained weights with customizable components
- Provides a set of downstream tasks like action classification, action detection, acoustic event detection and self-supervised learning (SSL)
- Supports a wide range of datasets and tasks for benchmarking various video models under different evaluation protocols
- Promotes hardware-aware model design and full-speed on-device model execution using efficient building blocks and deployment flow optimized for inference on hardware like mobile devices, Intel NNPI, etc.
- Offers access to a growing toolkit of standard scripts for video processing, including tracking, decoding and optical flow extraction
At present, PyTorchVideo is being used by Facebook AI for various research projects, including:
- SlowFast networks for video recognition
- Audiovisual SlowFast networks for video recognition
- X3D: Expanding architectures for efficient video recognition
- Video classification with channel-separated convolutional networks
Also, it has been used to fuel recent advances in video transformers and self-supervised learning, including:
- Multiscale vision transformers
- A large-scale study on unsupervised spatiotemporal representation learning
- Multiview pseudo-labelling for semi-supervised learning from video
- Unidentified video objects: A benchmark for dense, open-world segmentation
There is a shortage of open-source code and libraries for developing video understanding tools and models, which makes it difficult for researchers to exchange ideas, compare notes and accelerate innovation in the space.
Facebook AI’s PyTorchVideo can bolster innovation in the video understanding space, going beyond the realm of deepfakes and synthetic media propaganda.