Training the popular GPT-3 from scratch can cost millions of dollars. OpenAI can afford it. But what if an individual researcher wants to experiment at that scale? Raising that kind of money just to experiment with networks is almost impossible.
To address this gap between innovation and computation in ML, a team of researchers from Russia has introduced Learning@home, a neural network training paradigm that handles large numbers of poorly connected participants.
“Hypothetically, a researcher could crowdsource the training of large neural networks with thousands of regular PCs provided by volunteers. The raw computing power of a hundred thousand $2500 desktops dwarfs that of a $250M server pod,” wrote the researchers.
They recently released a library called hivemind, which implements the Decentralised Mixture-of-Experts (DMoE) layer.
Overview Of DMoE
Crowdsourced computation is not a new idea, but pulling it off requires a robust system. Learning@home, with its new library hivemind, tries to provide one. The framework and library let researchers crowdsource computation from volunteers with regular PCs; the combined floating-point performance of such volunteer projects is on par with that of large supercomputers.
The challenge in these projects is to utilise crowd power efficiently. Consumer-grade PCs are slower and prone to failures. So, instead of adopting existing distributed training strategies, the authors of DMoE identified the advantages of volunteer computing and designed a new strategy that capitalises on them.
Decentralised Mixture-of-Experts (DMoE) is a layer that contains multiple independent “expert” sub-networks distributed over a pool of workers. It is designed to process any input type by using the appropriate experts (for example, convolutional or attention-based).
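The idea of routing an input to a few experts and blending their outputs can be illustrated with a minimal, single-machine sketch. This is not hivemind's API or the authors' implementation: the names `make_expert` and `moe_forward`, the toy "experts" that merely scale their input, and the top-k softmax gating are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    # Hypothetical stand-in for an expert sub-network: it only scales its input.
    return lambda x: [scale * v for v in x]


def moe_forward(x, experts, gate_scores, k=2):
    """Send x to the k highest-scoring experts and average their outputs."""
    # Indices of the k experts with the largest gating scores.
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    # Softmax over the selected scores gives the mixing weights.
    weights = softmax([gate_scores[i] for i in top])
    outputs = [experts[i](x) for i in top]
    # Weighted average of the selected experts' outputs, element by element.
    return [sum(w * out[j] for w, out in zip(weights, outputs)) for j in range(len(x))]


experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_scores = [0.0, 5.0, 5.0, -1.0]  # assumed gating scores for one input
mixed = moe_forward([1.0, 2.0], experts, gate_scores, k=2)
```

In a decentralised setting each expert would live on a different volunteer machine, so the calls to `experts[i](x)` would become network requests rather than local function calls.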
To measure the effectiveness, the researchers simulated a distributed training environment, using a large number of identical blocks distributed evenly across 4 NVIDIA GTX 1080 GPUs. Network latency is simulated by adding an artificial delay after computation of each block.
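The latency simulation described here can be pictured with a short sketch. It is an assumption-laden toy rather than the researchers' benchmark code: `run_blocks` is a hypothetical helper, the blocks are trivial functions instead of GPU layers, and the delay value is arbitrary.

```python
import time


def run_blocks(x, blocks, delay_s=0.0):
    """Apply blocks in order, sleeping after each one to mimic network latency."""
    for block in blocks:
        x = block(x)
        # Artificial delay standing in for a network round trip.
        time.sleep(delay_s)
    return x


# Five identical toy blocks; each one just adds 1 to its input.
blocks = [lambda v: v + 1 for _ in range(5)]

start = time.perf_counter()
result = run_blocks(0, blocks, delay_s=0.01)
elapsed = time.perf_counter() - start
```

With five blocks and a 10 ms delay, the run takes at least roughly 50 ms regardless of how fast the blocks themselves are, which is the effect such a simulation is meant to expose.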
According to the researchers, the main benefit is that researchers can harness volunteer computing to train models at a scale currently available only to large corporations.
About Hivemind Library
Hivemind is a library for decentralised training of large neural networks. In a nutshell, you want to train a neural network, but all you have is a bunch of enthusiasts with unreliable computers that communicate over the internet. Any peer may fail or leave at any time, but the training must continue. To meet this objective, hivemind models use a specialised layer type: the Decentralised Mixture of Experts (DMoE).
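One way to keep a forward pass going when peers drop out is to skip the experts that do not respond and renormalise the weights of those that do. The sketch below illustrates that idea under stated assumptions: `PeerUnavailable` and `call_experts_tolerantly` are hypothetical names, the experts are local functions returning a single number, and none of this is hivemind's actual interface.

```python
import math


class PeerUnavailable(Exception):
    """Hypothetical error raised when a volunteer peer does not respond."""


def call_experts_tolerantly(x, experts, scores):
    """Average the outputs of responding experts, ignoring failed ones."""
    responses = []
    for expert, score in zip(experts, scores):
        try:
            responses.append((score, expert(x)))
        except PeerUnavailable:
            continue  # drop the failed peer and carry on
    if not responses:
        raise RuntimeError("no expert responded")
    # Softmax over the surviving experts only, so their weights sum to 1.
    m = max(score for score, _ in responses)
    raw = [math.exp(score - m) for score, _ in responses]
    total = sum(raw)
    return sum((w / total) * out for w, (_, out) in zip(raw, responses))


def offline_expert(x):
    # Simulates a peer that has left the network.
    raise PeerUnavailable("peer left the network")


experts = [lambda x: 2.0 * x, offline_expert, lambda x: 4.0 * x]
scores = [1.0, 3.0, 1.0]  # assumed gating scores
output = call_experts_tolerantly(1.0, experts, scores)
```

Even though the highest-scoring expert is offline, the call still returns a value built from the two experts that answered.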
Hivemind is designed for those who want to:
- run crowdsourced deep learning using compute from volunteers or decentralised participants;
- train neural networks on multiple servers with varying compute, bandwidth and reliability;
- [to be announced] join a worldwide open deep learning experiment.
That said, the Learning@home team discourages using the hivemind library for splitting models between 2-3 servers, for distributed training on a reliable, uniform and highly connected cluster, or for training small models dynamically allocated to in-house workers.
Volunteer computing is driven by societal impact more often than not. It is easier to convince people to share their PCs for solving pandemic problems than for building a deep learning application that adds animal filters to images.
“Volunteer computing is biased towards exciting or socially relevant research in the same way as traditional HPC is biased towards the interests of those who fund it,” wrote the researchers. They also warn that, because of their decentralised nature, even legitimate Learning@home projects can be hijacked by hackers.
Hivemind v0.8 is in the early alpha stage: the core functionality to train decentralised models is there, but the interface is still in active development.
Check the quickstart tutorial.