BitTorrent For ML: A Novel Decentralised Way Of Using Supercomputers From Your Home

Training the popular GPT-3 from scratch can cost millions of dollars. OpenAI can afford it. But what if an individual researcher wants to experiment on such a large scale? It is almost impossible to raise that kind of funding just to experiment with networks.

To address this large gap between innovation and computation in ML, a team of researchers from Russia has introduced Learning@home, a neural network training paradigm designed to handle large numbers of poorly connected participants.


“Hypothetically, a researcher could crowdsource the training of large neural networks with thousands of regular PCs provided by volunteers. The raw computing power of a hundred thousand $2500 desktops dwarfs that of a $250M server pod,” wrote the researchers.

Recently, they released a library called hivemind that implements a Decentralised Mixture-of-Experts (DMoE) layer.

Overview Of DMoE

DMoE inference illustration. (Source: paper by Maksim et al.)

Crowdsourced computation is not a new idea. But to pull it off, a foolproof system is essential. Learning@home, with its new library hivemind, tries to provide one. The framework lets researchers crowdsource computation from volunteers with regular PCs, since the combined floating-point performance of such volunteer projects is on par with that of large supercomputers.

The challenge in these projects is to figure out a way to utilise crowd power efficiently. Consumer-grade PCs are slower. They are prone to failures. So, instead of adopting the existing distributed training strategies, the authors in their work on DMoE identified the advantages of volunteer computing and designed a new strategy that capitalises on them.

Decentralised Mixture-of-Experts (DMoE) is a layer that contains multiple independent “expert” sub-networks distributed over a pool of workers. It is designed to process any input type by using the appropriate experts (convolutional or attentive).
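The routing idea behind a mixture-of-experts layer can be illustrated with a minimal, single-machine sketch. This is not the hivemind API: the function names (`moe_forward`, `top_k_indices`), the fixed gate scores and the toy scaling "experts" are all assumptions made for illustration. A real DMoE layer would learn its gating function and call experts hosted on remote workers.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_indices(scores, k):
    """Indices of the k highest-scoring experts (ties keep original order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def moe_forward(x, experts, gate_scores, k=2):
    """Send x to the top-k experts and return the weighted mix of their outputs.

    gate_scores stands in for the output of a learned gating function
    (hypothetical here: the scores are simply passed in).
    """
    chosen = top_k_indices(gate_scores, k)
    weights = softmax([gate_scores[i] for i in chosen])
    outputs = [experts[i](x) for i in chosen]
    dim = len(outputs[0])
    return [sum(w * out[j] for w, out in zip(weights, outputs)) for j in range(dim)]

# Toy experts: each one scales its input by a different factor.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
gate_scores = [0.1, 2.0, 2.0, -1.0]

y = moe_forward([1.0, 1.0], experts, gate_scores, k=2)
print(y)
```

With these scores, the experts that scale by 2 and by 3 are selected with equal weight, so each output component is 2.5. Only the chosen experts do any work, which is what makes it possible to spread many experts over many machines.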

To measure the effectiveness, the researchers simulated a distributed training environment, using a large number of identical blocks distributed evenly across 4 NVIDIA GTX 1080 GPUs. Network latency is simulated by adding an artificial delay after computation of each block. 
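The latency simulation described above can be sketched roughly as follows. This is a simplified illustration, not the authors' benchmark code: the block function, the delay value and the `run_with_latency` helper are assumptions, and no GPUs are involved.

```python
import time

def make_block(scale):
    """Return a toy 'block' that scales every element of its input."""
    def block(x):
        return [scale * v for v in x]
    return block

def run_with_latency(blocks, x, delay_s, sleep=time.sleep):
    """Run blocks in sequence, pausing after each one to mimic network latency.

    Returns the final output and the total artificial delay that was added.
    """
    total_delay = 0.0
    for block in blocks:
        x = block(x)        # the computation of this block
        sleep(delay_s)      # artificial delay standing in for the network
        total_delay += delay_s
    return x, total_delay

# Eight identical blocks, each followed by a 1 ms simulated delay.
blocks = [make_block(1.0) for _ in range(8)]
out, waited = run_with_latency(blocks, [1.0, 2.0], delay_s=0.001)
print(out, waited)
```

Passing `sleep` as a parameter keeps the sketch easy to test: a no-op function can be substituted when only the bookkeeping matters.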

According to the researchers, the main positive outcome is to let researchers harness volunteer computing and train models on the scale currently available only to large corporations. 

About Hivemind Library

Hivemind is a library for decentralised training of large neural networks. In a nutshell, you want to train a neural network, but all you have is a bunch of enthusiasts with unreliable computers that communicate over the internet. Any peer may fail or leave at any time, but the training must continue. To meet this objective, hivemind models use a specialised layer type: the Decentralised Mixture of Experts (DMoE). 
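One way to picture "any peer may fail, but the training must continue" is a forward pass that simply skips experts that do not respond and renormalises the weights of those that do. The sketch below is an assumption-laden illustration rather than hivemind's implementation: `ExpertUnavailable`, `call_experts` and the toy experts are hypothetical names, and real peers would be contacted over the network with timeouts.

```python
class ExpertUnavailable(Exception):
    """Raised by a (hypothetical) remote expert that failed or timed out."""

def call_experts(x, experts, weights):
    """Average the outputs of the experts that respond, ignoring failed ones.

    Weights of the surviving experts are renormalised so they sum to 1.
    """
    results = []
    for expert, w in zip(experts, weights):
        try:
            results.append((w, expert(x)))
        except ExpertUnavailable:
            continue  # skip this peer and carry on with the rest
    if not results:
        raise RuntimeError("no expert responded")
    total = sum(w for w, _ in results)
    dim = len(results[0][1])
    return [sum(w * out[j] for w, out in results) / total for j in range(dim)]

def healthy(scale):
    """Toy expert that scales its input."""
    return lambda x: [scale * v for v in x]

def offline(x):
    """Toy expert that simulates a peer that has left the network."""
    raise ExpertUnavailable("peer left the network")

y = call_experts([1.0, 2.0], [healthy(2.0), offline, healthy(4.0)], [0.5, 0.25, 0.25])
print(y)
```

Here the second peer drops out, so the result is computed from the remaining two experts with weights rescaled from 0.5 and 0.25 to 2/3 and 1/3.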

Hivemind is designed for those who want to:

  • run crowdsourced deep learning using compute from volunteers or decentralised participants;
  • train neural networks on multiple servers with varying compute, bandwidth and reliability;
  • [to be announced] join a worldwide open deep learning experiment.

That said, Learning@home discourages using the hivemind library for splitting models between two or three servers, for distributed training on a reliable, uniform and highly connected cluster, and for training small models dynamically allocated to in-house workers.

Volunteer computing is driven by societal impact more often than not. It is easier to convince people to share their PCs for solving pandemic problems than for building a deep learning application that adds animal filters to images. 

“Volunteer computing is biased towards exciting or socially relevant research in the same way as traditional HPC is biased towards the interests of those who fund it,” wrote the researchers. They also warn that, due to their decentralised nature, even legitimate Learning@home projects can be hijacked by hackers.

Hivemind v0.8 is in the early alpha stage: the core functionality to train decentralised models is there, but the interface is still in active development.

Check the quickstart tutorial.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
