LinkedIn has announced that it is open-sourcing Dagli, a machine learning library for Java and other JVM languages. The library is intended to make it easier for developers to create model pipelines that are bug-resistant, readable, modifiable, maintainable, and deployable, without incurring technical debt.
According to an industry report, even as machine learning matures and finds new applications, about half of companies spend between 8 and 90 days deploying a single machine learning model, and 18% take longer than 90 days. Much of this delay can be attributed to difficulties with scaling, challenges with model reproducibility, a lack of executive buy-in, and poor tooling.
In Dagli, a model pipeline is defined as a directed acyclic graph (DAG): a set of vertices connected by edges that are directed from one vertex to another, and the same graph is used for both training and inference. Dagli provides developers with these pipeline definitions along with near-ubiquitous immutability and static typing.
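To make the DAG idea more concrete, here is a minimal, hypothetical Java sketch. It is not Dagli's API: the names `Prepared`, `Preparable`, `MeanCenterer`, and `merge` are invented for illustration, and the code assumes Java 9 or later (for `List.of`). It shows a node that is fitted once on training data, after which an immutable, statically typed graph, with two branches merged into one output, is applied to new inputs for inference.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch, NOT Dagli's API: illustrates a statically typed,
// immutable pipeline DAG whose single definition serves training and inference.
public final class PipelineSketch {

    // A trained (or training-free) node: an immutable, typed transformation.
    interface Prepared<I, O> extends Function<I, O> {
        // Adds a directed edge from this node to `next`; mismatched types fail to compile.
        default <R> Prepared<I, R> then(Prepared<O, R> next) {
            return input -> next.apply(this.apply(input));
        }
    }

    // A node that must see training data before it can transform inputs.
    interface Preparable<I, O> {
        Prepared<I, O> prepare(List<I> trainingData);
    }

    // Example preparable node: subtracts the mean observed during training.
    static final class MeanCenterer implements Preparable<Double, Double> {
        @Override
        public Prepared<Double, Double> prepare(List<Double> trainingData) {
            final double mean = trainingData.stream()
                    .mapToDouble(Double::doubleValue)
                    .average()
                    .orElse(0.0);
            return x -> x - mean; // the fitted state (mean) is captured immutably
        }
    }

    // Fan-out/fan-in: two branches read the same input and are merged by one node,
    // giving a diamond-shaped DAG rather than a simple chain.
    static <I, A, B, O> Prepared<I, O> merge(
            Prepared<I, A> left, Prepared<I, B> right, BiFunction<A, B, O> combiner) {
        return input -> combiner.apply(left.apply(input), right.apply(input));
    }

    public static void main(String[] args) {
        List<Double> training = List.of(1.0, 2.0, 3.0, 6.0); // mean = 3.0

        // Training: fit the preparable node once on the training data.
        Prepared<Double, Double> centered = new MeanCenterer().prepare(training);

        // Build the rest of the graph from prepared nodes.
        Prepared<Double, Double> halfCentered = centered.then(x -> x / 2.0);
        Prepared<Double, Double> squared = x -> x * x;
        Prepared<Double, String> pipeline = merge(
                halfCentered, squared, (h, s) -> "half-centered=" + h + ", squared=" + s);

        // Inference: the same immutable graph is applied to new inputs.
        System.out.println(pipeline.apply(5.0));
    }
}
```

Running `main` prints `half-centered=1.0, squared=25.0`. Because every edge is a typed method call, connecting a node whose input type does not match its parent's output is a compile-time error rather than a runtime failure, which is the kind of benefit static typing brings to pipeline definitions.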
Jeff Pasternack, a LinkedIn NLP research scientist, wrote in a blog post that models are traditionally part of an integrated pipeline, so constructing, training, and deploying these pipelines to production remains a challenging task. “Duplicated or extraneous work is often required to accommodate both training and inference, engendering brittle ‘glue’ code that complicates future evolution and maintenance of the model,” Pasternack wrote.
Dagli works on servers, Hadoop, command-line interfaces, IDEs, and other typical JVM contexts. It also comes with many built-in, ready-to-use pipeline components, including neural networks, gradient boosted decision trees, logistic regression, FastText, cross-validation, feature selection, cross-training, data readers, evaluation, and feature transformations.
For experienced data scientists, Dagli offers a path to production-ready AI models that are maintainable and extensible over the long term and that can leverage an existing JVM technology stack. For less experienced software engineers, it provides an API that, when used with a JVM language and its tooling, helps avoid typical logic bugs.
According to Pasternack, Dagli is designed to make efficient, production-ready models easier to write, revise, and deploy, while avoiding technical debt and long-term maintenance challenges. Dagli also leverages modern, highly multicore processors and powerful graphics cards for effective single-machine training of real-world models.
The launch of Dagli follows LinkedIn's release of the LinkedIn Fairness Toolkit (LiFT), an open-source software library designed to enable the measurement of fairness in AI and machine learning workflows. LinkedIn also debuted DeText, an open-source framework for NLP tasks such as ranking, classification, and language generation. DeText uses deep neural networks for semantic matching to understand member intents in search and recommender systems.