Data sharing is one of the major challenges in building machine learning models. The advent of techniques like federated learning, differential privacy and split learning has gone a long way towards addressing data silos, privacy and regulatory issues.
In this article, we will look at split learning, a technique developed at the MIT Media Lab that allows machine learning models to be trained without sharing any raw data, addressing challenges such as data silos and restrictions on data sharing.
Most importantly, Split Neural Networks (SplitNN) do not share raw data or model details with collaborating institutions. As the paper, Split learning for health: Distributed deep learning without sharing raw patient data, showed, the configurations cater to practical settings such as entities holding different modalities of patient data; centralised and local health entities collaborating on multiple tasks; and learning without sharing labels.
The researchers compared the performance and resource-efficiency trade-offs of split learning against methods like federated learning and large-batch synchronous stochastic gradient descent. The results showed that split learning achieved higher accuracies with drastically lower computational requirements on the client side when training over a larger number of clients.
How split learning works
SplitNN is a distributed and private deep learning technique to train deep neural networks over multiple data sources without the need to share raw labelled data directly. SplitNN addresses the problem of training a model over multiple data entities.
In split learning, the model is split into multiple sections, each trained on a different client. The data used for training might reside with a single entity, such as a supercomputing resource, or be spread across multiple clients participating in the collaborative training. However, none of the clients training the model can ‘see’ each other’s data.
The raw data is first transformed by the client’s portion of the network, encoding it into a different representation before anything is transmitted. Because the model is split into sections trained on different clients, training is carried forward by transferring the outputs of the last layer of each section (also known as the cut layer) to the adjacent, or next, section. Thus, only these cut-layer activations are sent to the next client, and no raw data is shared among the clients.
As shown in the figure above, the training layer of SplitNN is marked by the green line, representing the cut layer. The top part of the model is trained on the server, and the bottom part of the model is trained on multiple clients.
These steps are repeated until the distributed split learning network is trained, without any party seeing another’s raw data.
For instance, a split learning configuration allows local hospitals with smaller individual datasets to collaborate and build machine learning models that offer superior healthcare diagnostics without sharing any raw data.
Simple vanilla split learning
This is the simplest SplitNN configuration, as shown in figure (a). For instance, in this setting, each client (say, a radiology centre) trains a partial model up to a specific layer called the ‘cut layer.’ Then, the outputs at the cut layers are sent to a server which completes the rest of the training without seeing the raw data (example: radiology images) from clients.
This completes a round of forward propagation without sharing raw data. The gradients are then backpropagated at the server from its last layer down to the cut layer. Finally, the gradients at the cut layer are sent back to the radiology client centres.
The rest of the backpropagation is completed at the radiology client centres. This process is repeated until the SplitNN is trained, without any party seeing another’s raw data.
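The exchange described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper’s implementation: the layer sizes, learning rate and synthetic data are assumptions chosen only to make the split visible. The client forwards its data up to the cut layer, sends only those activations, and the server finishes the forward pass, computes the loss and returns only the cut-layer gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Client side: holds the raw data and the layers up to the cut layer ---
X = rng.normal(size=(32, 8))              # private raw data (never transmitted)
y = rng.normal(size=(32, 1))              # labels (shared with server in vanilla setup)
W_client = rng.normal(size=(8, 4)) * 0.1  # client's partial model

# --- Server side: holds the layers after the cut layer ---
W_server = rng.normal(size=(4, 1)) * 0.1

lr, losses = 0.1, []
for step in range(200):
    # Client forward pass up to the cut layer; only activations are sent.
    cut = np.maximum(X @ W_client, 0.0)       # ReLU activations at the cut layer

    # Server completes the forward pass and computes the loss.
    pred = cut @ W_server
    losses.append(np.mean((pred - y) ** 2))

    # Server backpropagates to the cut layer; only that gradient is returned.
    d_pred = 2.0 * (pred - y) / len(y)
    d_W_server = cut.T @ d_pred
    d_cut = d_pred @ W_server.T               # gradient sent back to the client

    # Client finishes backpropagation locally on its private data.
    d_W_client = X.T @ (d_cut * (cut > 0))

    W_server -= lr * d_W_server
    W_client -= lr * d_W_client
```

Note that the server only ever receives the 4-dimensional cut activations and the labels, never the 8-dimensional raw inputs; the client only ever receives the 4-dimensional cut gradient, never the server’s weights.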
Split learning without label sharing
As shown in the image above (figure (b)), the network is wrapped around at the end layers of the server’s network, and the server’s outputs are sent back to the client entities. While the server still retains most of the layers, the clients compute the loss and generate gradients from the end layers, which are then used for backpropagation, all without sharing the corresponding labels.
This U-shaped setup is ideal for distributed deep learning when the labels contain highly sensitive information, such as the health status of patients.
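The U-shaped variant can be sketched the same way; again, the layer sizes, learning rate and synthetic data are illustrative assumptions rather than details from the paper. Here the client holds both the raw data and the labels: it runs the first segment, the server runs the middle segment, and the final layers and loss computation return to the client, so neither raw data nor labels ever leave it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Client holds both raw data and labels; neither is ever transmitted.
X = rng.normal(size=(32, 8))
y = rng.normal(size=(32, 1))
W_in = rng.normal(size=(8, 6)) * 0.1    # client: layers before the first cut
W_out = rng.normal(size=(4, 1)) * 0.1   # client: end layers after the second cut

# Server holds only the middle segment of the network.
W_mid = rng.normal(size=(6, 4)) * 0.1

lr, losses = 0.1, []
for step in range(200):
    # Client -> server: activations at the first cut layer.
    a1 = np.maximum(X @ W_in, 0.0)
    # Server -> client: activations at the second cut layer.
    a2 = np.maximum(a1 @ W_mid, 0.0)
    # Client computes predictions and the loss locally, using its private labels.
    pred = a2 @ W_out
    losses.append(np.mean((pred - y) ** 2))

    d_pred = 2.0 * (pred - y) / len(y)
    d_W_out = a2.T @ d_pred
    d_a2 = d_pred @ W_out.T                  # client -> server: gradient only
    d_W_mid = a1.T @ (d_a2 * (a2 > 0))
    d_a1 = (d_a2 * (a2 > 0)) @ W_mid.T       # server -> client: gradient only
    d_W_in = X.T @ (d_a1 * (a1 > 0))

    W_out -= lr * d_W_out
    W_mid -= lr * d_W_mid
    W_in -= lr * d_W_in
```

The server sees only intermediate activations and gradients; the loss itself, and therefore the labels, stay on the client.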
Split learning for vertically partitioned data
This type of configuration allows for multiple institutions holding different modalities of patient data to learn distributed models without revealing or sharing the data. As shown in the image above (figure c), the configuration of SplitNN is suitable for multimodal multi-institutional collaboration.
For example, suppose radiology centres want to collaborate with pathology test centres and a server for disease diagnosis. The radiology centres, holding the imaging data modality, train a partial model up to the cut layer. Similarly, the pathology test centres, holding patient test results, each train a partial model up to their own cut layer.
Once this is done, the outputs at the cut layer from both these centres are then concatenated and sent to the disease diagnosis server that trains the rest of the model. These steps are repeated to train the distributed deep learning model without sharing each other’s raw data.
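The vertically partitioned configuration can be sketched as below; the feature counts, layer sizes and synthetic data are hypothetical stand-ins for the radiology and pathology modalities. Each centre forwards its own modality to its cut layer, the server concatenates the two cut outputs, and the cut-layer gradient is split back to the respective centres.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two institutions hold different feature modalities for the same patients.
X_radiology = rng.normal(size=(32, 8))   # e.g. imaging-derived features
X_pathology = rng.normal(size=(32, 5))   # e.g. lab-test features
y = rng.normal(size=(32, 1))             # labels held by the diagnosis server

W_rad = rng.normal(size=(8, 4)) * 0.1    # radiology centre's partial model
W_path = rng.normal(size=(5, 4)) * 0.1   # pathology centre's partial model
W_server = rng.normal(size=(8, 1)) * 0.1 # server model over concatenated cuts

lr, losses = 0.1, []
for step in range(200):
    # Each centre forwards its own modality up to its cut layer.
    cut_rad = np.maximum(X_radiology @ W_rad, 0.0)
    cut_path = np.maximum(X_pathology @ W_path, 0.0)

    # Server concatenates the cut-layer outputs and finishes the forward pass.
    cut = np.concatenate([cut_rad, cut_path], axis=1)   # shape (32, 8)
    pred = cut @ W_server
    losses.append(np.mean((pred - y) ** 2))

    # Server backpropagates, then splits the cut gradient back to each centre.
    d_pred = 2.0 * (pred - y) / len(y)
    d_W_server = cut.T @ d_pred
    d_cut = d_pred @ W_server.T
    d_rad, d_path = d_cut[:, :4], d_cut[:, 4:]

    W_server -= lr * d_W_server
    W_rad -= lr * X_radiology.T @ (d_rad * (cut_rad > 0))
    W_path -= lr * X_pathology.T @ (d_path * (cut_path > 0))
```

Neither centre ever sees the other’s features or partial model; the server sees only the concatenated cut-layer activations.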
Simple configurations of distributed deep learning can’t handle various practical setups of collaboration across health entities. This is where SplitNN comes into play. Moreover, SplitNN is versatile, allowing for many plug-and-play configurations depending on the application. SplitNN is also scalable to large-scale settings. Also, the boundaries of resource efficiency can be pushed further in distributed deep learning by combining SplitNN with neural network compression methods for seamless distributed learning on edge.
This article is written by a member of the AIM Leaders Council, an invitation-only forum of senior executives in the Data Science and Analytics industry.