PyTorch today announced a new series of 10 video tutorials on Fully Sharded Data Parallel (FSDP). The tutorials are led by Less Wright, an AI/PyTorch Partner Engineer who also presented at Nvidia Fall GTC 2022.
Describing what users will learn, Wright says, “Whether you are training a 100 million or 1 trillion model parameter model, the series will enable users to train the models more efficiently, along with short deep dives of various aspects of FSDP.”
According to Wright, the main goal of the 10-part series is to help users build expertise in leveraging FSDP for distributed AI training. He also says that new videos will be added to the series as new features are added to FSDP.
Source: YouTube
For instance, the first video in the series, titled ‘Accelerate your training speed with the FSDP Transformer wrapper’, is a tutorial on how to utilise the new FSDP transformer wrapper. Unlike the default wrapper, which makes sharding choices based on parameter count, the transformer wrapper understands how the model operates and locates appropriate points at which to break it into shards.
In simple terms, the video shows users how to implement the transformer wrapper, which can increase a model’s training speed by up to 2x.
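To make the difference between the two wrapping approaches concrete, here is a minimal, library-free sketch in Python. It is not the FSDP implementation: the `Module` class, the `plan_units` helper, the toy model and the 5-million-parameter threshold are all hypothetical and exist only for illustration. In PyTorch itself, the corresponding policies are `size_based_auto_wrap_policy` and `transformer_auto_wrap_policy` in `torch.distributed.fsdp.wrap`.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Module:
    """Hypothetical stand-in for a model layer (not a torch.nn.Module)."""
    name: str
    kind: str                      # layer type, e.g. "TransformerBlock"
    own_params: int = 0            # parameters held directly by this layer
    children: List["Module"] = field(default_factory=list)


def plan_units(root: Module, should_wrap: Callable[[Module, int], bool]) -> List[str]:
    """Walk the tree bottom-up and list the submodules chosen as shard units.

    `should_wrap` receives a module and the number of its parameters that
    are not already covered by a nested unit.
    """
    units: List[str] = []

    def visit(m: Module) -> int:
        remaining = m.own_params + sum(visit(c) for c in m.children)
        if m is not root and should_wrap(m, remaining):
            units.append(m.name)
            return 0               # these parameters now belong to a unit
        return remaining

    visit(root)
    return units


def block(i: int) -> Module:
    return Module(f"block{i}", "TransformerBlock", 0, [
        Module(f"block{i}.attn", "Attention", 4_000_000),
        Module(f"block{i}.mlp", "MLP", 8_000_000),
    ])


model = Module("model", "Model", 0, [
    Module("embed", "Embedding", 50_000_000),
    block(0),
    block(1),
    Module("head", "Linear", 1_000_000),
])

# Size-based rule: wrap anything with at least 5M not-yet-wrapped parameters.
by_size = plan_units(model, lambda m, remaining: remaining >= 5_000_000)

# Transformer-aware rule: wrap exactly the transformer blocks.
by_layer = plan_units(model, lambda m, remaining: m.kind == "TransformerBlock")

print(by_size)   # ['embed', 'block0.mlp', 'block1.mlp']
print(by_layer)  # ['block0', 'block1']
```

In this toy model the size-based rule cuts through each transformer block, sharding the MLP separately from its attention layer, while the layer-aware rule keeps each block together as one unit. This illustrates only where the breaks fall and makes no claim about the resulting training speed.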
Other parts of the series include FSDP Mixed Precision Training, Sharding Strategies, Backwards Prefetching, and Fine Tuning Models.
Meta recently announced that the PyTorch project will become part of the non-profit Linux Foundation, launching as the PyTorch Foundation. The main goal is to drive adoption of AI and deep learning tooling by fostering and sustaining an ecosystem of open-source, vendor-neutral projects with PyTorch.