Recently, PyTorch, the popular open-source Python machine learning library, announced its new performance debugging tool, PyTorch Profiler, alongside its 1.8.1 release. The new tool, developed as a collaboration between tech giants Facebook and Microsoft, enables accurate and efficient performance analysis of large-scale deep learning models.
Behind PyTorch Profiler
Released under the new module namespace torch.profiler, PyTorch Profiler is the successor to the PyTorch autograd profiler. The new tool uses a new GPU profiling engine, built using the NVIDIA CUPTI APIs, and can capture GPU kernel events with high fidelity. PyTorch also includes a simple profiler API that is useful for determining the most expensive operators in a model.
The Profiler collects both GPU and framework-related information, correlates them, automatically detects bottlenecks in the model, generates recommendations on how to resolve them, and visualises the results. The new Profiler API is natively supported in PyTorch: users can profile their models without installing any additional packages and see results immediately in TensorBoard via the new PyTorch Profiler plugin.
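As a minimal sketch of this workflow, the snippet below profiles a small toy model (the model and sizes are illustrative, not from the article) and prints a summary of the most expensive operators:

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# A small toy model standing in for any real workload.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
inputs = torch.randn(32, 64)

# Profile CPU activity; on a GPU machine, add ProfilerActivity.CUDA.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(inputs)

# Summarise the most expensive operators by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

The same `profile` context can also export a trace for the TensorBoard plugin, as shown later in the article.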
The PyTorch Profiler accepts several parameters. Some of the most useful for analysing execution time are:
- record_shapes – whether to record the shapes of operator inputs;
- profile_memory – whether to report the amount of memory consumed by the model’s tensors;
- use_cuda – whether to measure the execution time of CUDA kernels.
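A brief sketch of these parameters in use. Note one nuance not spelled out above: record_shapes and profile_memory are arguments of the new torch.profiler.profile, while use_cuda belongs to the older torch.autograd.profiler; the new API's equivalent is including ProfilerActivity.CUDA in activities.

```python
import torch
from torch.profiler import profile, ProfilerActivity

x = torch.randn(128, 128)

# record_shapes / profile_memory as accepted by torch.profiler.profile;
# to time CUDA kernels with the new API, add ProfilerActivity.CUDA to
# activities (use_cuda is the equivalent flag of the autograd profiler).
with profile(
    activities=[ProfilerActivity.CPU],
    record_shapes=True,
    profile_memory=True,
) as prof:
    y = torch.matmul(x, x)

# Grouping by input shape shows which tensor sizes dominate runtime.
print(prof.key_averages(group_by_input_shape=True).table(row_limit=5))
```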
To analyse memory consumption, the PyTorch Profiler can report the amount of memory allocated for the model’s tensors during the execution of the model’s operators.
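A minimal sketch of memory analysis: with profile_memory enabled, the summary table can be sorted by how much memory each operator allocated itself (the tensor size here is arbitrary).

```python
import torch
from torch.profiler import profile, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    t = torch.randn(1024, 1024)  # allocates roughly 4 MB of float32 data

# Sort operators by the memory they allocated themselves.
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=5))
```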
Importance of Profiler In ML
Analysing and improving the performance of deep learning models is a persistent challenge researchers and developers face as model sizes increase. For a long time, PyTorch users have had a hard time solving these challenges due to the lack of suitable tools. The standard performance debugging tools that provide GPU hardware-level information miss the PyTorch-specific context of operations.
To recover the missing information, users needed to combine multiple tools or manually add minimal correlation information to make sense of the data. The autograd profiler (torch.autograd.profiler) could capture information about PyTorch operations, but it does not capture detailed GPU hardware-level information and offers no visualisation support.
The new PyTorch Profiler (torch.profiler) is a tool that captures both: information about PyTorch operations and detailed GPU hardware-level information.
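To feed those captured traces into the TensorBoard plugin mentioned earlier, torch.profiler provides a schedule and a trace handler. The sketch below records two active steps after a short warm-up; the model, step count, and the "./profiler_logs" directory are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.profiler import (
    profile,
    schedule,
    tensorboard_trace_handler,
    ProfilerActivity,
)

model = nn.Linear(32, 32)
inputs = torch.randn(8, 32)

# Skip 1 step, warm up for 1, then record 2 active steps;
# on_trace_ready writes a trace the TensorBoard plugin can load.
with profile(
    activities=[ProfilerActivity.CPU],
    schedule=schedule(wait=1, warmup=1, active=2),
    on_trace_ready=tensorboard_trace_handler("./profiler_logs"),
) as prof:
    for _ in range(4):
        model(inputs)
        prof.step()  # signal the profiler that one step has finished
```

Launching TensorBoard pointed at the log directory then shows the Profiler's views and recommendations.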
Advantages of Profiler
According to a few Reddit users, the new tool will be more useful than its previous version and NVVP (NVIDIA Visual Profiler), as it can profile data loading and processing, match GPU operations to the PyTorch modules that issued them, and more.
The advantages of PyTorch Profiler are as follows:
- Gain insight into the operations run inside a model.
- Diagnose performance issues and optimise deep learning models.
- Analyse the performance profile.
- Identify bottlenecks and eliminate them by following the Profiler’s recommendations.
- Inspect the cost of different operators inside the model, on both the CPU and GPU.
- Measure the time and memory consumption of the model’s operators.