BitTorrent For ML: A Novel Decentralised Way Of Using Supercomputers From Your Home

Training the popular GPT-3 from scratch can cost millions of dollars. OpenAI can afford it. But what if an individual researcher wants to experiment at such a large scale? For them, it is almost impossible to raise that kind of money just to toy with networks.

To close this large gap between innovation and computation in ML, a team of researchers from Russia has introduced Learning@home, a neural network training paradigm designed to handle large numbers of poorly connected participants.

“Hypothetically, a researcher could crowdsource the training of large neural networks with thousands of regular PCs provided by volunteers. The raw computing power of a hundred thousand $2500 desktops dwarfs that of a $250M server pod,” wrote the researchers.
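The two sides of that comparison are priced to match:

```latex
100\,000 \times \$2\,500 = \$250\,000\,000 = \$250\text{M}
```

So the claim compares the raw computing power of the desktops and of the server pod at the same total hardware cost.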

Recently, they released a library called hivemind that implements a decentralised mixture-of-experts (DMoE) layer.

Overview Of DMoE

DMoE inference illustration (source: the paper by Maksim et al.)

Crowdsourced computation is not a new idea, but pulling it off requires a robust, fault-tolerant system. Learning@home tries to provide one with its new library, hivemind. The framework and library allow researchers to crowdsource computation from volunteers with regular PCs, since the combined floating-point performance of such volunteer projects is on par with that of large supercomputers.

The challenge in these projects is to use this crowd power efficiently. Consumer-grade PCs are slower than dedicated servers, and they are prone to failures. So, instead of adopting existing distributed training strategies, the authors of the DMoE work identified the advantages of volunteer computing and designed a new strategy that capitalises on them.

Decentralised Mixture-of-Experts (DMoE) is a layer that contains multiple independent “expert” sub-networks distributed over a pool of workers. It can process any type of input by routing it to the appropriate experts (for example, convolutional or attention-based ones).
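To make the routing idea concrete, here is a toy, single-process sketch of a mixture-of-experts forward pass in plain Python. It assumes a linear gating function that scores every expert, keeps the top k, and averages their outputs with softmax weights. An expert whose worker has disappeared (simulated here by raising `ConnectionError`) is dropped, and the weights are renormalised over the experts that did answer. The names `gate`, `dmoe_forward`, `gate_weights` and `k` are hypothetical and not hivemind's API; real DMoE experts are neural sub-networks on remote workers, and the actual layer's gating, expert lookup and networking are more involved than this.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate(x: Vector, gate_weights: Sequence[Vector]) -> List[float]:
    # Hypothetical linear gating: one dot-product score per expert.
    return [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]


def dmoe_forward(x: Vector, experts: Sequence[Expert],
                 gate_weights: Sequence[Vector], k: int = 2) -> Vector:
    scores = gate(x, gate_weights)
    # Pick the k highest-scoring experts.
    top_k = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]

    outputs: List[Vector] = []
    kept_scores: List[float] = []
    for i in top_k:
        try:
            outputs.append(experts[i](x))
            kept_scores.append(scores[i])
        except ConnectionError:
            # The worker hosting this expert is gone: skip it instead of failing.
            continue
    if not outputs:
        raise RuntimeError("none of the selected experts responded")

    # Softmax over the surviving experts only, so the weights still sum to 1.
    weights = softmax(kept_scores)
    return [sum(w * out[j] for w, out in zip(weights, outputs))
            for j in range(len(outputs[0]))]


# Usage: three toy "experts"; the highest-scoring one has gone offline.
def doubler(x: Vector) -> Vector:
    return [2 * v for v in x]


def negator(x: Vector) -> Vector:
    return [-v for v in x]


def offline(x: Vector) -> Vector:
    raise ConnectionError("peer left the network")


experts = [doubler, negator, offline]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
print(dmoe_forward([1.0, 2.0], experts, gate_weights, k=2))  # [-1.0, -2.0]
```

In this example the offline expert wins the gating, but because its call fails the layer still returns an answer built from the remaining selected expert instead of aborting the forward pass.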

To measure its effectiveness, the researchers simulated a distributed training environment in which a large number of identical blocks were distributed evenly across four NVIDIA GTX 1080 GPUs. Network latency was simulated by adding an artificial delay after the computation of each block.
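The latency-injection setup can be sketched in plain Python: each block is wrapped so that it sleeps for a fixed delay after computing, and the wrapped blocks are chained one after another. `with_latency`, `run_sequential`, the `add_one` block and the 50 ms delay are hypothetical choices for illustration; the article does not state the delay used in the experiments, and the real blocks were neural-network layers running on GPUs rather than list operations.

```python
import time
from typing import Callable, List

Vector = List[float]
Block = Callable[[Vector], Vector]


def with_latency(block: Block, delay_s: float) -> Block:
    """Wrap a block so that every call sleeps for delay_s after computing."""
    def delayed(x: Vector) -> Vector:
        out = block(x)          # the block's actual computation
        time.sleep(delay_s)     # artificial "network" delay before passing the result on
        return out
    return delayed


def run_sequential(blocks: List[Block], x: Vector) -> Vector:
    # Feed the output of each block into the next, like layers of one model.
    for block in blocks:
        x = block(x)
    return x


# Usage: four identical blocks, each followed by a hypothetical 50 ms delay.
def add_one(x: Vector) -> Vector:
    return [v + 1 for v in x]


blocks = [with_latency(add_one, 0.05) for _ in range(4)]
start = time.perf_counter()
result = run_sequential(blocks, [0.0, 1.0])
elapsed = time.perf_counter() - start
print(result)                   # [4.0, 5.0]
print(f"{elapsed:.2f} s")       # roughly 0.2 s or a bit more: the four delays add up
```

Because the delay is paid after every block, end-to-end time in this sketch grows with the number of blocks a sample passes through.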

According to the researchers, the main positive outcome of this work is that it could let researchers harness volunteer computing to train models at a scale currently available only to large corporations.

About Hivemind Library

Hivemind is a library for decentralised training of large neural networks. In a nutshell, you want to train a neural network, but all you have is a bunch of enthusiasts with unreliable computers that communicate over the internet. Any peer may fail or leave at any time, but the training must continue. To meet this objective, hivemind models use a specialised layer type: the Decentralised Mixture of Experts (DMoE). 
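One simple way to keep training going when any peer can vanish is to treat a peer as alive only while it keeps re-announcing itself, and to route work only to recently announced experts. The class below is a hypothetical in-memory sketch of that idea (`ExpertRegistry`, `announce`, `alive_experts` and the 30 s TTL are invented for illustration). It is not hivemind's API; hivemind shares this kind of information between peers through a distributed hash table rather than a local dictionary.

```python
import time
from typing import Callable, Dict, List


class ExpertRegistry:
    """Toy liveness table: peers re-announce their experts; silent ones expire."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock  # injectable so the example can fake the passage of time
        self._expires_at: Dict[str, float] = {}

    def announce(self, expert_id: str) -> None:
        # Heartbeat from the peer hosting expert_id: push its expiry into the future.
        self._expires_at[expert_id] = self.clock() + self.ttl_s

    def alive_experts(self) -> List[str]:
        # Forget experts whose last announcement is older than the TTL.
        now = self.clock()
        self._expires_at = {e: t for e, t in self._expires_at.items() if t > now}
        return sorted(self._expires_at)


class FakeClock:
    """Manually advanced clock for the usage example."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
registry = ExpertRegistry(ttl_s=30.0, clock=clock)
registry.announce("expert.0")
registry.announce("expert.1")
clock.now = 20.0
registry.announce("expert.1")      # expert.1's peer is still online and re-announces
clock.now = 40.0                   # expert.0's peer has been silent for 40 s
print(registry.alive_experts())    # ['expert.1']
```

In this sketch a peer that leaves simply stops being offered work once its entry expires, and a peer that returns becomes visible again as soon as it announces.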

Hivemind is designed for those who want to:

  • run crowdsourced deep learning using compute from volunteers or decentralised participants;
  • train neural networks on multiple servers with varying compute, bandwidth and reliability;
  • [to be announced] join a worldwide open deep learning experiment.

That said, the Learning@home team advises against using hivemind to split models between two or three servers, to run distributed training on a reliable, uniform and highly connected cluster, or to train small models dynamically allocated to in-house workers.

More often than not, volunteer computing is driven by societal impact: it is easier to convince people to share their PCs to tackle a pandemic than to build a deep learning application that adds animal filters to images.

“Volunteer computing is biased towards exciting or socially relevant research in the same way as traditional HPC is biased towards the interests of those who fund it,” wrote the researchers. They also warn that, because of their decentralised nature, even legitimate Learning@home projects can be hijacked by hackers.

Hivemind v0.8 is in the early alpha stage: the core functionality to train decentralised models is there, but the interface is still in active development.

Check out the hivemind quickstart tutorial.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.