The newly released PyTorch 1.12 introduces BetterTransformer, a backwards-compatible fast path of torch.nn.TransformerEncoder for Transformer Encoder inference. Its improvements can exceed 2x in speedup and throughput for many common execution scenarios.
Image: Transformer Encoder architecture
BetterTransformer launches with accelerated native implementations of MultiheadAttention and TransformerEncoderLayer for CPUs and GPUs. These fast paths are integrated into the standard PyTorch Transformer APIs and accelerate the TransformerEncoder, TransformerEncoderLayer and MultiheadAttention nn.Modules.
Source: pytorch.org
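Because the fast path sits behind the standard modules, using it is ordinary nn.TransformerEncoder code. The sketch below assumes PyTorch 1.12 or later and arbitrary toy sizes (d_model=64, 4 heads, 2 layers); it runs the encoder under the inference conditions the PyTorch docs list for the fast path (eval mode, autograd disabled, batch_first=True). Whether the fused path is actually taken depends on the remaining documented conditions, and the call is written the same way either way.

```python
import torch
import torch.nn as nn

# Toy sizes chosen for illustration only.
d_model, nhead, num_layers = 64, 4, 2
batch, seq_len = 3, 10

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
encoder.eval()  # the fast path is inference-only, so training must be disabled

src = torch.randn(batch, seq_len, d_model)

# True marks padding positions; the second and third sequences are shorter.
lengths = torch.tensor([10, 6, 3])
padding_mask = torch.arange(seq_len).unsqueeze(0) >= lengths.unsqueeze(1)

with torch.inference_mode():  # autograd disabled
    out = encoder(src, src_key_padding_mask=padding_mask)

print(out.shape)
```

One sequence is kept at full length so the output keeps the padded (batch, seq_len, d_model) shape regardless of which execution path the installed version chooses.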
These new modules implement two types of optimizations:
- Fused kernels combine multiple individual operators normally used to implement Transformers into a single, more efficient implementation.
- Sparsity exploitation takes advantage of sparsity in the inputs to skip unnecessary operations on padding tokens. Padding tokens frequently account for a large fraction of input batches in many Transformer models used for Natural Language Processing.
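Both ideas can be sketched without PyTorch. The toy below uses hypothetical function names and plain Python lists in place of tensors: it contrasts an unfused bias-add followed by ReLU with a fused single pass, and counts how many token positions a padded batch touches compared with one that skips padding. It models the bookkeeping only, not BetterTransformer's actual kernels.

```python
def bias_relu_unfused(xs, bias):
    # Two separate "kernels": each makes a full pass, and the first
    # materializes an intermediate list that the second reads back.
    tmp = [x + bias for x in xs]
    return [max(t, 0.0) for t in tmp]


def bias_relu_fused(xs, bias):
    # One "kernel": add and ReLU in a single pass, no intermediate buffer.
    return [max(x + bias, 0.0) for x in xs]


def padded_token_work(lengths):
    # A padded batch processes every sequence at the longest length.
    return len(lengths) * max(lengths)


def packed_token_work(lengths):
    # Skipping padding processes only the real tokens.
    return sum(lengths)


xs = [-1.0, 0.5, 2.0]
print(bias_relu_unfused(xs, 0.25), bias_relu_fused(xs, 0.25))

lengths = [10, 6, 3]
padded, packed = padded_token_work(lengths), packed_token_work(lengths)
print(padded, packed, round(1 - packed / padded, 3))
```

With sequence lengths 10, 6 and 3, the padded batch touches 30 positions but only 19 are real tokens, so roughly 37% of the work would be spent on padding.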
Backwards Compatibility
Importantly, BetterTransformer does not require any model changes. To benefit from fast-path execution, inputs and operating conditions must satisfy certain conditions; for example, the module must run in inference mode (eval() with autograd disabled) and use batch_first=True. When a condition is not met, execution falls back to the standard path. While the internal implementation of the Transformer APIs has changed, PyTorch 1.12 maintains strict compatibility with Transformer modules shipped in previous versions, so users can run models created and trained with earlier PyTorch releases while benefiting from BetterTransformer improvements.
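To make the operating conditions concrete, here is a hypothetical helper (its name is not a PyTorch API) that checks a few of the fast-path preconditions described in the PyTorch documentation for nn.TransformerEncoderLayer. It is a sketch, not PyTorch's internal check: the real module evaluates more conditions and silently falls back to the standard path when any of them fail, which is exactly why existing models keep working unchanged.

```python
import torch
import torch.nn as nn


def meets_basic_fastpath_conditions(layer: nn.TransformerEncoderLayer,
                                    src: torch.Tensor) -> bool:
    """Hypothetical, non-exhaustive check of documented fast-path conditions."""
    # Gradients are needed if autograd is on and the input or any
    # parameter requires grad (parameters do by default).
    needs_grad = torch.is_grad_enabled() and (
        src.requires_grad or any(p.requires_grad for p in layer.parameters())
    )
    return bool(
        not layer.training               # module is in eval() mode
        and not needs_grad               # autograd disabled or not needed
        and layer.self_attn.batch_first  # constructed with batch_first=True
        and src.dim() == 3               # batched input
    )


layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
src = torch.randn(2, 5, 32)

print(meets_basic_fastpath_conditions(layer, src))  # modules start in training mode

layer.eval()
with torch.no_grad():
    print(meets_basic_fastpath_conditions(layer, src))
```

The first call reports False because freshly constructed modules are in training mode; after eval() and inside torch.no_grad() the checked conditions hold.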