Modern techniques such as machine learning and deep learning need enormous amounts of data to produce accurate results. Processing that much data on a single local computer is often impractical, so practitioners turn to distributed computing for the computational power needed to deliver fast, accurate results.
Managing distributed computation effectively is not straightforward, however, and the difficulty slows down the training and evaluation of AI models. To address these challenges, Uber has open-sourced Fiber, a framework that helps researchers and developers streamline large-scale parallel scientific computation.
Fiber is a Python-based distributed computing framework for modern computer clusters. With Fiber, users can program not just a single desktop or laptop but an entire computer cluster. Uber originally built Fiber to support complex projects such as POET and similar work that required distributed computing, and has now open-sourced it for the wider community.
Key features of Fiber:
- Easy to use: With the framework, one can write programs that run on a computer cluster without having to dig into the details of the cluster.
- Easy to learn: Anyone familiar with Python's standard multiprocessing API needs no additional expertise to work with Fiber.
- Fast performance: Fiber's communication backbone is built on Nanomsg, a high-performance asynchronous messaging library, to provide fast and reliable communication.
- No need for deployment: You run a Fiber application on a computer cluster the same way you run any ordinary application, and Fiber handles the rest.
- Reliable computation: Fiber has built-in error handling for pools of processes, so users can focus on writing the actual application code instead of dealing with crashed workers.
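To make the "easy to learn" point concrete, here is ordinary Python pool code that estimates pi by Monte Carlo sampling. It uses `multiprocessing.dummy`, the thread-backed twin of the standard `multiprocessing` API, so the snippet runs on any machine without process start-up issues. Fiber presents `fiber.Pool` as the counterpart of `multiprocessing.Pool`; treating the move to a cluster as a one-line import change is an assumption to check against the Fiber documentation for your version.

```python
import random
from multiprocessing.dummy import Pool  # thread-backed, same API as multiprocessing.Pool
# Assumption: a Fiber version of this program would import Pool from `fiber` instead.


def inside(seed: int) -> bool:
    """Draw one random point in the unit square; True if it lands in the quarter circle."""
    rng = random.Random(seed)  # per-task RNG keeps the result reproducible
    x, y = rng.random(), rng.random()
    return x * x + y * y < 1.0


def estimate_pi(num_samples: int, workers: int = 4) -> float:
    """Fan the samples out over a pool of workers and combine the hit counts."""
    with Pool(processes=workers) as pool:
        hits = sum(pool.map(inside, range(num_samples)))
    return 4.0 * hits / num_samples


if __name__ == "__main__":
    print(f"Pi is roughly {estimate_pi(50_000):.3f}")
```

Nothing in `inside` or `estimate_pi` refers to threads, processes or machines; that separation is what lets the same program be pointed at a different pool implementation.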
Fiber can also work in tandem with specialized frameworks in areas where performance is critical. This is done through Fiber's Ring feature, which helps set up a distributed training job on a computer cluster.
Fiber is designed to stay flexible enough to support different backends on various cluster management systems. To achieve this, it is divided into layers: the API layer, the backend layer and the cluster layer. The API layer provides Fiber's fundamental building blocks, such as processes, queues, pools and managers. The backend layer handles tasks like creating or terminating jobs on different cluster managers. The cluster layer consists of the cluster managers themselves, which manage resources and keep track of jobs.
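The layering can be pictured as an API-layer object that only talks to an abstract backend, with one backend implementation per cluster manager. The sketch below is hypothetical: `Backend`, `LocalThreadBackend`, `JobProcess`, `create_job` and `wait_job` are illustrative names, not Fiber's classes, and a local thread stands in for a job on a real cluster manager. A real backend would also terminate jobs; the thread backend leaves that out because Python threads cannot be killed.

```python
import abc
import threading
from typing import Callable, Dict, Optional


class Backend(abc.ABC):
    """Backend layer (hypothetical): hides how a given cluster manager runs jobs."""

    @abc.abstractmethod
    def create_job(self, target: Callable[[], None]) -> str:
        """Start `target` as a new job and return its job id."""

    @abc.abstractmethod
    def wait_job(self, job_id: str) -> None:
        """Block until the job has finished."""


class LocalThreadBackend(Backend):
    """Stand-in for the cluster layer: each 'job' is just a local thread."""

    def __init__(self) -> None:
        self._jobs: Dict[str, threading.Thread] = {}
        self._lock = threading.Lock()

    def create_job(self, target: Callable[[], None]) -> str:
        with self._lock:  # keep job ids unique if several callers submit at once
            job_id = f"job-{len(self._jobs)}"
            thread = threading.Thread(target=target, name=job_id)
            self._jobs[job_id] = thread
        thread.start()
        return job_id

    def wait_job(self, job_id: str) -> None:
        self._jobs[job_id].join()


class JobProcess:
    """API layer (hypothetical): a Process-like object that works with any Backend."""

    def __init__(self, backend: Backend, target: Callable[[], None]) -> None:
        self.backend = backend
        self.target = target
        self.job_id: Optional[str] = None

    def start(self) -> None:
        self.job_id = self.backend.create_job(self.target)

    def join(self) -> None:
        if self.job_id is None:
            raise RuntimeError("join() called before start()")
        self.backend.wait_job(self.job_id)


if __name__ == "__main__":
    results = []
    backend = LocalThreadBackend()
    procs = [JobProcess(backend, lambda i=i: results.append(i * i)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(sorted(results))  # [0, 1, 4]
```

Because `JobProcess` only calls the two backend methods, supporting another cluster manager in this sketch means writing another `Backend` subclass while the API layer stays unchanged.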
How Is It Different?
Unlike other distributed machine learning tools, Fiber introduces a new concept called 'job-backed processes' or 'Fiber processes'. These are similar to the processes in Python's multiprocessing library but more flexible: besides running locally, they can also run remotely on different machines. This works because every job-backed process is containerized and has its own allocation of CPU, GPU and other resources. The code is also self-contained, since all child processes are started from the same container image as the parent process, which guarantees a consistent running environment for every process.
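One way to picture a job-backed process is as a job specification: the child reuses the parent's container image, so it sees the same code and dependencies, while getting its own resource request. The dataclass below is a hypothetical illustration of that rule only; `JobSpec`, `child_of` and the image name are invented for the example and are not part of Fiber's API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical description of one job-backed process."""
    image: str    # container image holding the code and its dependencies
    command: str  # entry point to run inside the container
    cpu: int = 1
    gpu: int = 0


def child_of(parent: JobSpec, command: str, cpu: int = 1, gpu: int = 0) -> JobSpec:
    """A child always inherits the parent's image; only the work and resources change."""
    return replace(parent, command=command, cpu=cpu, gpu=gpu)


if __name__ == "__main__":
    parent = JobSpec(image="registry.example.com/my-app:v1", command="python main.py", cpu=2)
    worker = child_of(parent, command="python -m worker", gpu=1)
    print(worker)
```

Making the spec frozen and deriving children with `replace` means no caller can give a child a different image, which is the consistency guarantee the paragraph describes.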
Furthermore, unlike Spark and IPyParallel, Fiber needs to be installed on only a single machine, as a standard Python pip package, which simplifies workflows while giving users more control.
Queues and pipes: These behave like their counterparts in Python's multiprocessing API, except that they are now shared by multiple processes running on different machines. Two different processes can therefore read from and write to the same pipe, and many processes can send to or receive from the same queue at the same time.
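The calls involved are the standard multiprocessing ones. In the sketch below several workers write into one shared queue, and a pipe carries a message to a responder and its reply back. It runs on threads via `multiprocessing.dummy` so it works anywhere; the claim that the same calls span machines under Fiber comes from the article, not from this snippet, and the exact Fiber class names should be checked in its API reference.

```python
from multiprocessing.dummy import Pipe, Process, Queue  # thread-backed stand-ins
# Assumption: a Fiber program would import the equivalent classes from `fiber`.


def producer(worker_id: int, q) -> None:
    """Every worker pushes its results into the same shared queue."""
    for item in range(3):
        q.put((worker_id, item * item))


def echo_upper(conn) -> None:
    """Read one message from this end of the pipe and send a reply back."""
    msg = conn.recv()
    conn.send(msg.upper())


def run_demo(num_workers: int = 3):
    q = Queue()
    workers = [Process(target=producer, args=(i, q)) for i in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    results = sorted(q.get() for _ in range(num_workers * 3))

    parent_end, child_end = Pipe()  # two connected ends, one per side
    responder = Process(target=echo_upper, args=(child_end,))
    responder.start()
    parent_end.send("hello from the parent")
    reply = parent_end.recv()
    responder.join()
    return results, reply


if __name__ == "__main__":
    items, answer = run_demo()
    print(len(items), answer)  # 9 HELLO FROM THE PARENT
```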
Pools, Managers and Proxy Objects: Fiber extends pools to work with job-backed processes, so a single pool can manage thousands of remote workers. Through managers and proxy objects, Fiber also provides built-in in-memory storage for applications, a role previously filled by external storage such as Cassandra or Redis.
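Python's own `multiprocessing.managers` works on the same principle: a manager process owns the shared object, and proxies forward method calls to it. The toy below reproduces that pattern inside one process, with a thread playing the manager and queues playing the network, to show how workers can share state without an external store. `DictServer` and `DictProxy` are hypothetical names; this is not how Fiber implements its managers.

```python
import queue
import threading
from typing import Any


class DictServer:
    """Toy 'manager': one thread owns the real dict and serves requests from a queue."""

    def __init__(self) -> None:
        self._data: dict = {}
        self.requests: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self) -> None:
        while True:
            method, args, reply = self.requests.get()
            if method is None:  # sentinel: stop serving
                return
            try:
                reply.put((True, getattr(self._data, method)(*args)))
            except Exception as exc:  # send errors back instead of killing the server
                reply.put((False, exc))

    def close(self) -> None:
        self.requests.put((None, (), None))
        self._thread.join()


class DictProxy:
    """Toy proxy: each call is forwarded to the server and the result is returned."""

    def __init__(self, server: DictServer) -> None:
        self._requests = server.requests

    def _call(self, method: str, *args: Any) -> Any:
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._requests.put((method, args, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def __setitem__(self, key: Any, value: Any) -> None:
        self._call("__setitem__", key, value)

    def __getitem__(self, key: Any) -> Any:
        return self._call("__getitem__", key)

    def snapshot(self) -> dict:
        return self._call("copy")  # a copy, so callers never touch the server's dict


def run_demo(num_workers: int = 4) -> dict:
    server = DictServer()

    def work(i: int) -> None:
        proxy = DictProxy(server)      # each worker holds its own proxy
        proxy[f"worker-{i}"] = i * 10  # the write lands in the server's dict

    threads = [threading.Thread(target=work, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    result = DictProxy(server).snapshot()
    server.close()
    return result


if __name__ == "__main__":
    print(sorted(run_demo(3).items()))  # [('worker-0', 0), ('worker-1', 10), ('worker-2', 20)]
```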
Fiber Rings: A Ring is a group of processes that work together as relative equals. Unlike a Pool, a Ring has no notion of master and worker processes. Rings make it easy to set up the kind of topology commonly used in machine learning for distributed SGD. This is usually a challenging task, but Ring does the heavy lifting.
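A ring all-reduce, a communication pattern often used to combine gradients in distributed SGD, shows what a masterless ring topology buys. The sketch below simulates it in a single Python process: each of `n` peers splits its vector into `n` chunks, partial sums travel once around the ring (reduce-scatter), then the finished chunks travel around again (all-gather), so every peer ends with the same element-wise sum and no peer acts as a master. It illustrates the topology only; `ring_allreduce` is a hypothetical helper, not Fiber's Ring API.

```python
from typing import List


def ring_allreduce(vectors: List[List[float]]) -> List[List[float]]:
    """Simulate ring all-reduce: every peer ends up with the element-wise sum.

    Peer i only ever sends to peer (i + 1) % n, its neighbour on the ring.
    Vector length must be a multiple of n so it splits into n equal chunks.
    """
    n = len(vectors)
    size = len(vectors[0])
    if size % n:
        raise ValueError("vector length must be a multiple of the number of peers")
    step = size // n
    # chunks[i][c] is peer i's current copy of chunk c.
    chunks = [[vec[c * step:(c + 1) * step] for c in range(n)] for vec in vectors]

    # Reduce-scatter: after n - 1 rounds, peer i holds the full sum of chunk (i + 1) % n.
    for r in range(n - 1):
        sends = [(i, (i - r) % n, list(chunks[i][(i - r) % n])) for i in range(n)]
        for i, c, payload in sends:
            dst = (i + 1) % n
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c], payload)]

    # All-gather: pass each completed chunk around so every peer gets a copy.
    for r in range(n - 1):
        sends = [(i, (i + 1 - r) % n, list(chunks[i][(i + 1 - r) % n])) for i in range(n)]
        for i, c, payload in sends:
            chunks[(i + 1) % n][c] = payload

    return [[x for chunk in peer for x in chunk] for peer in chunks]


if __name__ == "__main__":
    print(ring_allreduce([[1, 2], [3, 4]]))  # [[4, 6], [4, 6]]
```

Each peer sends and receives the same amount of data per round, which is why this layout avoids the bottleneck a central master would create.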
Since large-scale solutions rely heavily on clusters, Fiber can help users make effective use of heterogeneous computing hardware. Fiber works much like other frameworks, but it has distinctive advantages that could be a game-changer in simplifying developers' workloads when building AI-based solutions.
It promises to bridge the gap between making code work locally and running it on a production cluster. The ability to add new dependencies to the code without re-deployment might make Fiber stand out against popular tools like Spark.