The traditional approach of running integrated data-mining and analysis tools on a single machine is no longer practical because datasets have grown too large to manage. Distributed and federated ML have therefore become favoured approaches, as both allow analysis at larger scale. While the two concepts appear similar, there are considerable differences between them. In this article, we explore how the two approaches differ.
Distributed machine learning
Distributed machine learning is a multi-node ML system that improves performance, increases accuracy, and scales to larger input data sizes. It reduces model error and helps practitioners draw informed decisions and analyses from large amounts of data. Distributed machine learning algorithms have evolved to handle enormous data sets.
Handling large-scale data is challenging because of the limitations of machine learning algorithms in terms of scalability and efficiency. For example, when an algorithm's working set outgrows main memory, the algorithm will not scale well due to memory restrictions. Distributed machine learning algorithms address this: they can handle large data sets while remaining efficient and scalable.
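To make the memory restriction concrete, here is a minimal sketch (toy model and data, all names illustrative) of the idea that underlies both out-of-core and distributed training: the model is updated one chunk at a time, so the full dataset never has to fit in main memory at once.

```python
# Minimal sketch: training by streaming data in chunks, so the full
# dataset never has to reside in main memory. In a distributed setting
# each chunk would instead live on a different worker machine.

def stream_chunks(n_samples, chunk_size):
    """Yield (start, end) index ranges; stands in for reading from disk."""
    for start in range(0, n_samples, chunk_size):
        yield start, min(start + chunk_size, n_samples)

def sgd_step(w, xs, ys, lr):
    """One least-squares gradient step on a single chunk (1-parameter model)."""
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    return w - lr * grad

# Toy data drawn from y = 3x; in practice each chunk is loaded on demand.
data_x = [float(i) for i in range(1, 21)]
data_y = [3.0 * x for x in data_x]

w = 0.0
for _ in range(200):  # epochs
    for start, end in stream_chunks(len(data_x), chunk_size=5):
        w = sgd_step(w, data_x[start:end], data_y[start:end], lr=0.001)

print(round(w, 2))  # w converges toward the true slope 3.0
```

The same structure is what distributed ML parallelises: instead of one process visiting the chunks in sequence, many workers process their chunks concurrently and merge their updates.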
Distributed ML algorithms are integral to large-scale learning because of their ability to allocate learning processes onto several workstations to enable faster learning algorithms.
Healthcare and advertising are among the most common sectors for deploying distributed ML algorithms, since even a simple application in these domains generates a lot of data. Because the data is enormous, programmers frequently re-train models in a way that does not interrupt the workflow, and load data in parallel. The MapReduce programming model, for example, was built to allow automatic parallelisation and distribution of large-scale computations.
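The MapReduce idea can be illustrated with a classic word-count sketch in plain Python (the phases here run sequentially in one process; a real MapReduce runtime such as Hadoop distributes them across machines):

```python
# Sketch of the MapReduce model: a map phase emits (key, value) pairs,
# a shuffle groups them by key, and a reduce phase aggregates each group.
from collections import defaultdict

def map_phase(document):
    """Map: emit (word, 1) for every word in one document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts emitted for one word."""
    return key, sum(values)

documents = ["large data needs distribution", "data distribution scales data"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["data"])  # "data" appears three times across the documents
```

Because the map calls are independent and each reduce touches only one key's group, both phases parallelise automatically, which is exactly what makes the model attractive for large-scale computation.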
Federated machine learning
Traditional AI algorithms require centralising data on a single machine or server. The limitation of this approach is that all the collected data must be sent to the central server for processing before results are sent back to the devices. This round trip limits a model's ability to learn in real time.
Federated Learning is a distributed ML approach, coordinated by a central server, in which multiple users collaboratively train a model. The concept was first introduced in a 2017 Google AI blog post. Here, the raw data stays distributed and is never moved to a single server or data centre. The server selects a few nodes and sends each of them an initialised version of the ML model containing its parameters. Each node then runs the model, trains it on its own local data, and keeps a local version of the model.
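One round of this protocol can be sketched as follows, in the spirit of federated averaging (FedAvg). The single-parameter model, the toy per-node data, and all names are illustrative assumptions, not a real FL framework:

```python
# Illustrative sketch of federated averaging: the server broadcasts the
# current weight, each node trains on its own local data, and the server
# averages the returned weights. Raw data never leaves a node.

def local_train(w, xs, ys, lr=0.01, steps=20):
    """Client-side: least-squares gradient steps on local data only."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

# Each node holds its own private data, all drawn from y = 2x.
node_data = [
    ([1.0, 2.0], [2.0, 4.0]),
    ([3.0, 4.0], [6.0, 8.0]),
    ([5.0, 6.0], [10.0, 12.0]),
]

w_global = 0.0
for _ in range(10):  # federated rounds
    # Server broadcasts w_global; each node returns its locally trained weight.
    local_ws = [local_train(w_global, xs, ys) for xs, ys in node_data]
    # Server aggregates by (unweighted) averaging.
    w_global = sum(local_ws) / len(local_ws)

print(round(w_global, 2))  # converges toward the shared slope 2.0
```

In practice the aggregation is usually weighted by each node's data volume, and the exchanged updates may be encrypted or noised for privacy, but the broadcast-train-average loop above is the core structure.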
Federated Learning leverages techniques from multiple research areas such as distributed systems, machine learning, and privacy. FL is best applied in situations where the on-device data is more relevant than the data that exists on servers.
Federated learning brings state-of-the-art ML to edge devices without centralising the data, with privacy by default. It also copes with the unbalanced and non-Independent and Identically Distributed (non-IID) data typical of mobile devices. A lot of data generated on smartphones can be used locally at the edge with on-device inference. Since the server does not need to be in the loop for every interaction with locally generated data, this enables faster responses, lower battery consumption, and better data privacy.
For instance, Google’s Gboard aims to be the most privacy-forward keyboard by keeping an on-device cache of local interactions. This data is then used for federated learning and computation.
Federated ML vs distributed ML
Federated Learning and Distributed Learning differ in three significant ways:
- FL does not allow direct communication of raw data; DL has no such restriction.
- FL employs the distributed computing resources in multiple regions or organisations. DL utilises a single server or a cluster in a single region, which belongs to a single organisation.
- FL generally relies on encryption or other defence techniques to ensure data privacy and security, promising to safeguard the confidentiality of the raw data. DL places far less focus on security.
In short, federated learning combines techniques from distributed systems, machine learning, and privacy research, and can be seen as a privacy-aware evolution of distributed learning.