PyTorch Releases v1.10: All New Features & Updates

The new update focuses on improving training and performance, alongside developer usability.

Facebook’s open-source machine learning framework PyTorch recently announced the launch of v1.10. The new version of the framework comprises over 3,400 commits since 1.9, made by 426 contributors. The update focuses on improving training and performance, alongside developer usability.

In June 2021, PyTorch released v1.9, with improvements in torch.linalg, torch.special, and Complex Autograd, along with Mobile Interpreter, TorchElastic, the PyTorch RPC framework, APIs for model inference deployment, and PyTorch Profiler.

Here are the key highlights of v1.10:

  • CUDA Graphs APIs are integrated to reduce CPU overheads for CUDA workloads 
  • Several frontend APIs such as FX, torch.special, and nn.Module Parametrisation have been moved from beta to stable 
  • Support for automatic fusion in JIT Compiler expands to CPUs in addition to GPUs 
  • Android NNAPI support is now available in beta

CUDA Graphs APIs Integration (Beta) 

PyTorch now integrates CUDA Graphs APIs to reduce CPU overheads for CPU-bound CUDA workloads, improving performance by increasing GPU utilisation. It also reduces jitter for distributed workloads; since parallel workloads have to wait for the slowest worker, reducing jitter improves overall parallel efficiency.

The API integration allows easy interoperation between the parts of a network captured by CUDA graphs and the parts that cannot be captured due to graph limitations.
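A minimal sketch of the capture-and-replay workflow, assuming a CUDA-capable GPU; the toy linear layer, warm-up loop, and tensor sizes below are purely illustrative:

```python
import torch

device = torch.device("cuda")
model = torch.nn.Linear(64, 64).to(device)
static_input = torch.randn(8, 64, device=device)

# Warm up on a side stream before capture, as the CUDA Graphs docs recommend.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture the forward pass once, then replay the recorded kernels cheaply.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_output = model(static_input)

# New data is copied into the static input buffer; replay() re-launches the
# captured kernels with minimal CPU overhead.
static_input.copy_(torch.randn(8, 64, device=device))
g.replay()
print(static_output.shape)
```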

Conjugate View (Beta) 

For complex tensors, torch.conj() is now a constant-time operation that returns a view of the input tensor with a conjugate bit set, as can be seen by calling torch.is_conj(). This has already been leveraged in various PyTorch operations such as matrix multiplication and dot products to fuse conjugation with the operation, leading to significant performance gains and memory savings on both CUDA and CPU.
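A small sketch of the behaviour, using an illustrative complex tensor:

```python
import torch

x = torch.randn(3, dtype=torch.cfloat)
y = torch.conj(x)

# y is a lazy view with the conjugate bit set; no new memory is allocated.
print(torch.is_conj(y))              # True
print(y.data_ptr() == x.data_ptr())  # shares storage with the input

# Downstream ops such as dot products fuse the conjugation into the kernel.
z = torch.vdot(x, y)
print(z)
```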

Python Code Transformation with FX 

FX offers a Pythonic platform for transforming and lowering PyTorch programs. For pass writers, this toolkit facilitates Python-to-Python transformation of functions and nn.Module instances. It aims to support a subset of Python language semantics to make transforms easier to implement. With the latest update, FX is moving to stable.

Check out FX examples on GitHub.
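As a rough illustration, the hypothetical pass below traces a toy module and rewrites torch.add nodes to torch.mul, the kind of Python-to-Python transform FX is designed for:

```python
import torch
import torch.fx as fx

class AddModule(torch.nn.Module):
    def forward(self, x):
        return torch.add(x, x)

# symbolic_trace captures the module as a Graph that passes can rewrite.
traced = fx.symbolic_trace(AddModule())

# Rewrite every call to torch.add into torch.mul, then regenerate the code.
for node in traced.graph.nodes:
    if node.op == "call_function" and node.target is torch.add:
        node.target = torch.mul
traced.recompile()

print(traced(torch.ones(2)))  # tensor([1., 1.]) since 1 * 1 == 1
```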

torch.special  

torch.special, analogous to SciPy’s special module, is now available in stable. It contains about 30 operations, including gamma, Bessel, and (Gauss) error functions.
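A quick sketch of a few of those operations (the particular functions shown are just illustrative picks from the module):

```python
import torch

x = torch.linspace(0.1, 2.0, 5)
print(torch.special.gammaln(x))  # log-gamma function
print(torch.special.erf(x))      # Gauss error function
print(torch.special.i0(x))       # modified Bessel function of the first kind
```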

nn.Module Parameterisation 

This feature, which allows users to parametrise any parameter or buffer of an nn.Module without modifying the nn.Module itself, is now available in stable. This release adds weight normalisation (weight_norm), orthogonal parametrisation (matrix constraints and part of pruning), and more flexibility when creating your own parametrisation. See the tutorials for more details.
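A minimal sketch, pairing the built-in orthogonal parametrisation with a hypothetical custom Symmetric parametrisation registered through torch.nn.utils.parametrize:

```python
import torch
import torch.nn as nn
from torch.nn.utils import parametrize
from torch.nn.utils.parametrizations import orthogonal

# Built-in: constrain a layer's weight to stay orthogonal.
layer = orthogonal(nn.Linear(4, 4))
w = layer.weight
print(torch.allclose(w @ w.T, torch.eye(4), atol=1e-5))

# Custom: keep a weight symmetric without modifying the module itself.
class Symmetric(nn.Module):
    def forward(self, X):
        return X.triu() + X.triu(1).transpose(-1, -2)

m = nn.Linear(3, 3)
parametrize.register_parametrization(m, "weight", Symmetric())
print(torch.allclose(m.weight, m.weight.T))
```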

Distributed Training 

In the latest PyTorch v1.10, several features are moving from beta to stable in the distributed package. Here are some of the features that are now stable:

  • Remote Module allows users to operate on a module on a remote worker as if it were a local module, with the RPCs transparent to the user. 
  • DDP Communication Hook: It allows users to override how DDP synchronises gradients across processes. 
  • ZeroRedundancyOptimiser: It can be used in conjunction with DistributedDataParallel to minimise the size of per-process optimiser states. With this new release, it can now handle uneven inputs to different data-parallel workers.

Besides this, PyTorch has also improved the parameter partition algorithm to better balance memory and computation overhead across processes. Check out the tutorials here. 
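As a rough sketch of how these pieces fit together (assuming the process group has already been initialised, for example under torchrun, and that each rank owns one GPU):

```python
import torch
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP

def build_ddp_with_zero(rank: int):
    # Illustrative model only; DDP replicates it and synchronises gradients.
    model = torch.nn.Linear(128, 10).to(rank)
    ddp_model = DDP(model, device_ids=[rank])
    # ZeroRedundancyOptimizer shards optimiser state across ranks instead of
    # replicating it in every process.
    optimizer = ZeroRedundancyOptimizer(
        ddp_model.parameters(),
        optimizer_class=torch.optim.Adam,
        lr=1e-3,
    )
    return ddp_model, optimizer
```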

Performance Optimisation and Tooling 

Profile-directed typing in TorchScript (Beta)

TorchScript has a hard requirement that source code have type annotations for compilation to succeed. In the past, trial and error was the only way to add missing or fix incorrect type annotations, which was inefficient and time-consuming. With the latest update, PyTorch enables profile-directed typing for torch.jit.script by leveraging existing tools like MonkeyType, making the process much easier, faster, and more efficient.
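A minimal sketch, assuming MonkeyType is installed; the untyped function and the example inputs are illustrative:

```python
import torch

def scale_and_shift(x, factor, offset):
    # No type annotations: the example inputs below let MonkeyType profile the
    # call and infer the types before scripting.
    return x * factor + offset

scripted = torch.jit.script(
    scale_and_shift,
    example_inputs=[(torch.rand(3), 2.0, 1.0)],
)
print(scripted(torch.ones(3), 3.0, 0.5))
```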

CPU Fusion (Beta) 

In PyTorch 1.10, the team has added an LLVM-based JIT compiler for CPUs that can fuse a sequence of torch library calls to improve performance. While this capability has been available on GPUs for some time, this is the first time compilation has been brought to CPUs. Check out the performance results here (Colab notebook).
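A rough sketch of the kind of pointwise chain the CPU fuser targets; the approximate-GELU function below is only an illustration, and graph_for is used here just to peek at the optimised graph:

```python
import torch

@torch.jit.script
def tanh_gelu_approx(x):
    # A chain of elementwise torch ops that the fuser can compile into one kernel.
    return 0.5 * x * (1.0 + torch.tanh(0.79788456 * (x + 0.044715 * x * x * x)))

x = torch.randn(1 << 16)
for _ in range(5):  # warm-up runs let the profiling JIT observe shapes and fuse
    tanh_gelu_approx(x)

# The optimised graph typically shows the ops grouped into a fused subgraph.
print(tanh_gelu_approx.graph_for(x))
```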

PyTorch Profiler (Beta)

The main objective of PyTorch Profiler is to target the execution steps that are the most costly in time and memory and to visualise the workload distribution between CPUs and GPUs. Here are some of the key profiler features in PyTorch 1.10: 

  • Enhanced memory view 
  • Enhanced automated recommendations 
  • Enhanced kernel view 
  • Distributed training 
  • Correlate operators in the forward and backward pass 
  • TensorCore 
  • NVTX
  • Support for profiling on mobile devices 

To get started with new features, check out the tutorials here.
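A minimal sketch of driving the profiler from Python; the toy model, step schedule, and log directory are illustrative:

```python
import torch
from torch.profiler import (
    ProfilerActivity,
    profile,
    schedule,
    tensorboard_trace_handler,
)

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU())
inputs = torch.randn(32, 512)

with profile(
    activities=[ProfilerActivity.CPU],  # add ProfilerActivity.CUDA on GPU
    schedule=schedule(wait=1, warmup=1, active=2),
    on_trace_ready=tensorboard_trace_handler("./log"),
    profile_memory=True,
    record_shapes=True,
) as prof:
    for _ in range(4):
        model(inputs)
        prof.step()  # signals the profiler to advance through the schedule
```

The resulting trace can then be opened in the TensorBoard plugin to explore the memory, kernel, and distributed views.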

PyTorch Mobile: Android NNAPI Support (Beta) 

Last year, PyTorch released prototype support for Android’s Neural Networks API (NNAPI), which allows Android apps to run computationally intensive neural networks on the chips that power mobile phones, including GPUs and NPUs (specialised neural processing units).

Since then, the team has added more op coverage, support for flexible load-time shapes, and the ability to run the model on the host for testing. Check out the tutorial for using this feature. 

In addition to this, transfer learning steps have been added to object detection examples. 

Other Updates 

  • TorchX: A new SDK for quickly building and deploying ML applications from research and development to production. 
  • TorchAudio: The team has added a text-to-speech pipeline, self-supervised model support, multi-channel support and an MVDR beamforming module, an RNN transducer (RNNT) loss function, and batch and filterbank support to the filter function. 
  • TorchVision: New RegNet and EfficientNet models, FX-based feature extraction added to utilities, two new automatic augmentation techniques (RandAugment and TrivialAugment), and updated training recipes. 

Amit Raja Naik

Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
