Fidelity Investments recently released Stoke, a no-code, configuration-based accelerator API for PyTorch. Developed by the Data Science Team at Fidelity's AI Center of Excellence, Stoke is a lightweight wrapper that provides a simple declarative API for context switching between devices (e.g., CPU, GPU), distributed modes, mixed precision, and other PyTorch 'accelerator' extensions. It places no restrictions on code structure or style for model architectures, training/inference loops, loss functions, optimizer algorithms, etc.
It simply 'wraps' existing PyTorch code and automatically handles the underlying wiring for all supported 'accelerators', so switching from full-precision CPU training to mixed-precision, distributed multi-GPU training with optimizer state sharding requires only changing a few declarative flags.
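To make the point concrete, here is a minimal sketch, in plain PyTorch rather than Stoke's own API, of the kind of conditional wiring such declarative flags replace: a training step whose mixed-precision and device behavior toggles on plain arguments. The function name and parameters are illustrative, not part of Stoke.

```python
import torch

def train_step(model, loss_fn, opt, x, y, use_amp=False, device="cpu"):
    """One training step with hand-rolled device placement and optional
    autocast -- the boilerplate a declarative wrapper like Stoke hides."""
    # Manual device placement of the data (Stoke does this automatically).
    x, y = x.to(device), y.to(device)
    opt.zero_grad()
    # Mixed precision is enabled or disabled by a single flag here; in a
    # real codebase this conditional logic tends to spread everywhere.
    with torch.autocast(device_type=device, enabled=use_amp):
        loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    return loss.item()
```

Multiplied across distributed launch, optimizer sharding, and checkpointing, this per-feature branching is exactly what a declarative configuration layer consolidates.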
Stoke supports the following ‘accelerators’:
- Devices: CPU, GPU
- Distributed: DDP, Horovod, DeepSpeed (via DDP)
- Mixed precision: AMP, NVIDIA Apex, DeepSpeed (custom Apex-like backend)
- Extensions: Fairscale (optimizer state sharding and Sharded DDP), DeepSpeed (ZeRO Stage 0–3, etc.)
However, certain combinations of backends/functionality are not compatible with each other.
The main benefits that Stoke provides over other traditional APIs are:
- Declarative style API: declare the desired accelerator state(s) and let Stoke handle the rest
- Wrapped API mirrors base PyTorch style model, loss, backward, and step calls
- Automatic device placement of model(s) and data
- Universal interface for saving and loading regardless of the backend(s) or device(s)
- Automatic handling of gradient accumulation and clipping
- Common attrs interface for all backend configuration parameters (with helpful docstrings!)
- A few extras: a custom torch.utils.data.distributed.Sampler, BucketedDistributedSampler, which buckets data by a sorted index and then randomly samples from specific bucket(s) to prevent grossly mismatched sequence lengths from wasting computation on excess padding; plus helper methods for printing synced losses, device-specific printing, counting model parameters, etc.
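The gradient accumulation and clipping that Stoke automates look roughly like the following when written by hand in plain PyTorch (a sketch of the underlying technique, not Stoke's internals):

```python
import torch

def accumulate_and_clip(model, loss_fn, opt, batches, accum_steps=4, max_norm=1.0):
    """Manual gradient accumulation with norm clipping: scale each loss by
    1/accum_steps so gradients sum to a full-batch average, then clip and
    step only every accum_steps mini-batches."""
    opt.zero_grad()
    for i, (x, y) in enumerate(batches, start=1):
        loss = loss_fn(model(x), y) / accum_steps
        loss.backward()  # gradients accumulate in .grad across iterations
        if i % accum_steps == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
            opt.step()
            opt.zero_grad()
```

Getting the loss scaling, clipping order, and step cadence right, and keeping them right when mixed precision or a distributed backend is added, is precisely the bookkeeping a wrapper can own.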
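The idea behind bucketed sampling can be sketched in a few lines of plain Python (the function below is illustrative and is not Stoke's BucketedDistributedSampler, which additionally handles sharding across distributed replicas):

```python
import random

def bucketed_batches(lengths, batch_size, bucket_size, seed=0):
    """Sort indices by sequence length, slice into buckets of similar-length
    items, then shuffle within buckets so every batch holds comparably sized
    sequences and padding waste stays low."""
    rng = random.Random(seed)
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    buckets = [order[i:i + bucket_size] for i in range(0, len(order), bucket_size)]
    batches = []
    for bucket in buckets:
        rng.shuffle(bucket)  # randomness within a bucket, not across lengths
        batches.extend(bucket[i:i + batch_size]
                       for i in range(0, len(bucket), batch_size))
    rng.shuffle(batches)  # randomize batch order across the epoch
    return batches
```

Without bucketing, a batch pairing a 10-token sequence with a 500-token one pads the former to 500 tokens; bucketing keeps that overhead bounded by the length spread within a bucket.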
Stoke has now been released as an open-source tool for the ML community, and the team says development will continue in the open, including support for cutting-edge PyTorch 'accelerator' functionality (e.g., the recent Full Model Sharding support incorporated into Fairscale).
Victor is an aspiring Data Scientist and holds a Master of Science in Data Science & Big Data Analytics. He is a researcher, a data science influencer, and a former university football player. A keen learner of new developments in Data Science and Artificial Intelligence, he is committed to growing the Data Science community.