New Algorithm Improves ML Model Training Over The Internet

“Training BERT costs about $7,000, and for the largest models like GPT-3, this number can be as high as $12 million.”

Typically, training a deep learning model starts with a forward pass, where the loss function is evaluated, followed by a backward pass, where the gradients of the loss are computed. In distributed training, these gradients are pushed to parameter servers, which aggregate the updates from all workers and apply them to the global model. This procedure repeats until the model reaches a target accuracy. State-of-the-art models are large and compute-heavy, and as models grow bigger, training remains an expensive affair. Distributed, collaborative training is one way to avoid restricting such research to well-funded labs. Volunteer computing (VC) is already popular in other domains such as bioinformatics and physics, where people donate the idle time of their desktops, smartphones, and other personal devices to solve a computationally hard problem. Imagine borrowing your friend's PC to train your deep learning model remotely while they are away.
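
In code, this classic parameter-server loop looks roughly like the sketch below. It simulates everything in a single process: each "worker" computes gradients on its own mini-batch, and a central copy of the model averages them and takes the optimiser step. The toy model, random data, and worker count are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

# Illustrative setup (assumption): a tiny regression model and random data.
torch.manual_seed(0)
global_model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(global_model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
num_workers = 4

for step in range(100):
    # Each worker copies the current global weights, then runs a
    # forward and backward pass on its own local mini-batch.
    worker_grads = []
    for _ in range(num_workers):
        local_model = nn.Linear(16, 1)
        local_model.load_state_dict(global_model.state_dict())
        x, y = torch.randn(32, 16), torch.randn(32, 1)
        loss = loss_fn(local_model(x), y)
        loss.backward()
        worker_grads.append([p.grad.clone() for p in local_model.parameters()])

    # The "server" aggregates (averages) the gradients and updates the global model.
    optimizer.zero_grad()
    for p, grads in zip(global_model.parameters(), zip(*worker_grads)):
        p.grad = torch.stack(grads).mean(dim=0)
    optimizer.step()
```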

Landscape of collaborative computation

Folding@home (FAH) is a distributed computing project for simulating protein dynamics, including protein folding and the movements of proteins implicated in a variety of diseases. FAH brings together volunteers (citizen scientists) to run these simulations on their personal computers. Insights from the data help scientists better understand biology and open new opportunities for developing therapeutics. In April 2020, for example, over 700,000 Folding@home volunteers collectively contributed 2.43 exaFLOPs of compute to COVID-19 research.

The Berkeley Open Infrastructure for Network Computing (BOINC) app lets users download scientific computing jobs to their personal computers and run the workloads in the background. For instance, Rosetta@home is a distributed computing project for protein structure prediction built on the BOINC platform. Rosetta taps into the computational power of idle computers to help design new proteins and predict their three-dimensional shapes.

MLC@Home is another BOINC project, dedicated to understanding and interpreting complex machine learning models, with an emphasis on neural networks. It provides an open, collaborative platform for ML researchers, allowing them to train thousands of networks in parallel with tightly controlled inputs, hyperparameters, and network structures. However, distributed training still has a few problems:

  • Distributed training of a single model requires significantly more communication and does not offer a natural way to “restart” failed jobs.
  • Distributed training of neural networks is bounded by the throughput of the parameter servers and by the memory available on the weakest GPU.

“Is there really no alternative to using pre-trained models for the broader ML community?”

According to Hugging Face (HF), whose NLP libraries are used by companies such as Apple, data transfer remains a bottleneck in distributed deep learning. Gradients have to be aggregated from multiple workers, and since most participants do not have high-speed connections, they risk getting dropped from the network. “So how on Earth can you train anything with a household data plan?” asks the team at HF.
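
A rough back-of-the-envelope calculation shows why naive gradient exchange does not work over home connections. The model size and link speed below are illustrative assumptions, not figures from the article:

```python
# Rough estimate of how long one naive gradient exchange would take
# over a household connection. Numbers are illustrative assumptions.
params = 110_000_000          # roughly a BERT-base sized model
bytes_per_grad = 4            # fp32 gradients
upload_mbps = 10              # a modest household upload speed

grad_megabytes = params * bytes_per_grad / 1e6          # ~440 MB per step
seconds_per_upload = grad_megabytes * 8 / upload_mbps   # megabits / Mbps

print(f"{grad_megabytes:.0f} MB of gradients, "
      f"~{seconds_per_upload / 60:.1f} minutes to upload per step")
# -> roughly 440 MB and about 6 minutes per step, before any download
```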

Now, a team of researchers from Yandex, HF and others has come up with a new method that lets machine learning models be trained collaboratively over the internet. The new training algorithm is called Distributed Deep Learning in Open Collaborations (DeDLOC).

About DeDLOC

Image credits: Paper by Diskin et al.

Data parallelism across GPUs is a popular technique, and DeDLOC tweaks popular distributed training strategies to combine their best attributes. It uses synchronous data-parallel training with hyperparameters that stay fixed regardless of the number of volunteers, and it trains with extremely large batches to compensate for slow communication. According to the researchers, each device accumulates gradients at its own pace until the collaboration reaches the target batch size; once ready, the collaborators exchange their gradients and perform one optimiser step.
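
The accumulation pattern can be sketched as below. This minimal sketch simulates peers of different speeds in one process; the toy model, target batch size, and per-peer batch sizes are illustrative assumptions, and the real system exchanges gradients over the network rather than in memory.

```python
import torch
import torch.nn as nn

# Illustrative assumptions: a toy model, three peers of different speeds,
# and a large target batch size the collaboration must reach together.
torch.manual_seed(0)
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

peer_batch_sizes = [8, 16, 64]          # slow, medium, and fast peer
target_batch_size = 1024                # optimiser steps only at this size

accumulated = [[torch.zeros_like(p) for p in model.parameters()]
               for _ in peer_batch_sizes]
samples_seen = 0

while samples_seen < target_batch_size:
    # Each peer accumulates gradients at its own pace on its own data.
    for i, bs in enumerate(peer_batch_sizes):
        x, y = torch.randn(bs, 16), torch.randn(bs, 1)
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for acc, p in zip(accumulated[i], model.parameters()):
            acc += p.grad * bs          # weight gradient by samples used
        samples_seen += bs

# Target reached: peers exchange (here: sum) gradients and take one step.
for p, grads in zip(model.parameters(), zip(*accumulated)):
    p.grad = sum(grads) / samples_seen
optimizer.step()
print(f"performed one optimiser step over {samples_seen} samples")
```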

DeDLOC operates similarly to BitTorrent and I2P, where individual peers coordinate by forming a distributed hash table (DHT). To test DeDLOC’s performance, the researchers pretrained the sahajBERT language model. The experiment had 40 volunteers, 30 of whom were Bengali speakers. Volunteers were asked to open the provided notebook (Colab/Kaggle), run a single code cell, and watch the training loss decrease on the shared dashboards. The cumulative runtime for the experiment was 234 days.
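
DeDLOC’s reference implementation builds on the authors’ open-source hivemind library, and a volunteer’s “single code cell” essentially boils down to joining the DHT and wrapping a normal optimiser. The sketch below follows hivemind’s public quickstart; the run ID, batch sizes, and toy model are illustrative assumptions, not the actual sahajBERT configuration.

```python
import torch
import torch.nn as nn
import hivemind

# Illustrative stand-ins (assumptions): the real sahajBERT run trained an
# ALBERT-style transformer on a Bengali corpus, not this toy model.
model = nn.Linear(16, 1)
base_optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Start or join the collaboration's distributed hash table. The first peer
# starts it; volunteers would instead pass initial_peers=[...] with the
# addresses published by the run's organisers.
dht = hivemind.DHT(start=True)
print("Share these addresses with other peers:", dht.get_visible_maddrs())

# Wrap the local optimiser: each peer accumulates gradients at its own pace
# and averages them with the others once the collaboration has jointly
# processed `target_batch_size` samples.
optimizer = hivemind.Optimizer(
    dht=dht,
    run_id="demo_collaborative_run",   # every volunteer must use the same ID
    optimizer=base_optimizer,
    batch_size_per_step=32,            # samples this peer contributes per step
    target_batch_size=4096,            # global batch size that triggers a step
    verbose=True,
)

loss_fn = nn.MSELoss()
for _ in range(1000):
    x, y = torch.randn(32, 16), torch.randn(32, 1)  # placeholder data
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()                   # averages with peers when ready
```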

At the end of training, sahajBERT was compared with three other pretrained language models: XLM-R Large, IndicBERT, and bnRoBERTa. The results showed that DeDLOC, when applied to pretraining sahajBERT, achieves nearly state-of-the-art quality, comparable to much larger models trained on hundreds of high-tier accelerators. This is the first distributed deep learning training run at this scale, and the results are encouraging for individual researchers looking to take on expensive ML training tasks. “The community for any language can train their own models without the need for significant computational resources concentrated in one place,” wrote the Hugging Face team.
