LinkedIn Open-Sources GDMix, A Framework That Trains Efficient Personalisation Models

Recently, developers at LinkedIn open-sourced a deep learning framework known as GDMix. GDMix, or Generalised Deep Mixed model, is a deep ranking framework for training non-linear fixed-effect and random-effect models. According to the developers, this type of model is widely used in the personalisation of search and recommender systems.

With more than 700 million members, billions of feed updates, and thousands of courses to choose from, the professional networking platform is heavily dependent on AI and machine learning techniques. Personalised ranking for search and recommender systems is one of the key technologies for delivering the best possible experience to members on LinkedIn.

A fully personalised ranking model draws on request features, document features, context features, and interaction features, among them a large number of categorical ID features. However, models of this size are often difficult to train efficiently.

According to the developers, training such models may require resources like specialised processors, very large system memory, and ultra-fast network connections. To address this, the developers introduced the GDMix framework, which trains these kinds of models efficiently and in less time.

Behind GDMix

The GDMix framework works by breaking a large model down into a global model, also known as a fixed effect, and a large number of small models, called random effects, and then solving each of these problems individually. This divide-and-conquer approach allows efficient training of large personalisation models on commodity hardware.
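
To make the idea concrete, here is a minimal, illustrative NumPy sketch of the decomposition. The names (fixed_effect_weights, random_effect_weights, score) are hypothetical and this is not GDMix's actual API, which runs on TensorFlow and Spark at a much larger scale; the sketch only shows how a global score and a per-member score combine.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical illustration: one global (fixed-effect) weight vector shared by
# every member, plus a small per-member (random-effect) weight vector keyed by ID.
fixed_effect_weights = np.zeros(50)     # global model over 50 features
random_effect_weights = {}              # member_id -> small per-member weight vector

def score(member_id, global_features, member_features):
    """Final score = fixed-effect score + this member's random-effect score."""
    s = fixed_effect_weights @ global_features
    # Members with no personal model simply fall back to the global model.
    per_member = random_effect_weights.get(member_id)
    if per_member is not None:
        s += per_member @ member_features
    return sigmoid(s)
```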

According to the professional networking platform, the GDMix project is an extension of an earlier effort on generalised linear models known as Photon ML, a machine learning library based on Apache Spark. GDMix improves on Photon ML by expanding support to deep learning models, and the framework can be readily applied to a variety of search and recommendation tasks.

Currently, GDMix supports three operation modes:

  1. Fixed effect model: Logistic Regression; Random effect model: Logistic Regression.
  2. Fixed effect model: Deep NLP models supported by DeText; Random effect model: Logistic Regression.
  3. Fixed effect model: Arbitrary model provided by a user; Random effect model: Logistic Regression.

GDMix offers an efficient solution for training these models by taking a parallel blockwise coordinate descent approach. It supports training both per-entity and per-cohort random effects.
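
The blockwise coordinate descent idea can be sketched as follows, on toy NumPy data: the fixed effect and the per-member random effects are fitted alternately, with the other block's scores entering each subproblem as a constant offset. The function names here are hypothetical, and the real framework distributes the per-entity fits across TensorFlow/Spark workers rather than looping in a single process.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_with_offset(X, y, offset, lr=0.1, steps=200, l2=1e-3):
    """Fit a logistic regression on top of a fixed per-example offset (log-odds)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w + offset)
        grad = X.T @ (p - y) / len(y) + l2 * w
        w -= lr * grad
    return w

def train_blockwise(X_global, X_member, y, member_ids, n_rounds=3):
    """Alternate between the fixed effect and the per-member random effects."""
    w_fixed = np.zeros(X_global.shape[1])
    w_random = {m: np.zeros(X_member.shape[1]) for m in set(member_ids)}
    for _ in range(n_rounds):
        # 1) Fixed effect: the random-effect scores enter as a constant offset.
        offset = np.array([X_member[i] @ w_random[m]
                           for i, m in enumerate(member_ids)])
        w_fixed = fit_logistic_with_offset(X_global, y, offset)
        # 2) Random effects: each member's small problem is independent of the
        #    others, which is what lets GDMix fit them in parallel.
        fixed_scores = X_global @ w_fixed
        for m in w_random:
            rows = [i for i, mid in enumerate(member_ids) if mid == m]
            w_random[m] = fit_logistic_with_offset(
                X_member[rows], y[rows], fixed_scores[rows])
    return w_fixed, w_random
```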

GDMix expands the modelling capacity to include deep learning models. In particular, it leverages DeText, a deep learning ranking framework for text understanding, as its native deep learning model trainer. The framework is implemented in TensorFlow, SciPy, and Spark.

GDMix has a mixed implementation of Python and Scala: Python is used for training the models, while Scala is used for processing intermediate data on Spark. The framework requires Python version 3.3+ and Apache Spark version 2.0+.

Key Features

The GDMix framework has three key features, described below.

Model Scalability

GDMix works by splitting the model into a fixed effect and many random effects. This split allows developers to train models with hundreds of millions of entities and tens of billions of parameters.

Model Flexibility

Both the fixed effect and the random effects in GDMix are designed to support various model types. The fixed effect supports linear models as well as deep learning models, while the random effects natively support linear models. GDMix also makes it easy to add custom models, such as support vector machines (SVM), decision trees, and gradient boosting algorithms.
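
As an illustration of that flexibility (not GDMix's actual interface), the sketch below uses scikit-learn's gradient boosting, which is not part of GDMix's stated stack, as a stand-in for an arbitrary user-provided fixed-effect model; its predicted probabilities are converted to log-odds so they can serve as the offset for random-effect training, as in the training sketch above. The data here is synthetic and purely for demonstration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy data standing in for the global feature matrix and binary labels.
rng = np.random.default_rng(0)
X_global = rng.normal(size=(1000, 20))
y = (rng.random(1000) < 0.5).astype(int)

# Any model that yields a per-example score can act as the fixed effect;
# here a gradient boosting classifier plays that role.
booster = GradientBoostingClassifier(n_estimators=100)
booster.fit(X_global, y)

# Convert predicted probabilities to log-odds so they can be added to the
# random effects' linear scores as a constant offset.
p = np.clip(booster.predict_proba(X_global)[:, 1], 1e-6, 1 - 1e-6)
fixed_effect_offset = np.log(p / (1 - p))
```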

Training Efficiency

GDMix is designed to train large models quickly and efficiently. With large-scale parallelism, the framework takes less than an hour to train models with millions of entities and billions of parameters.

Wrapping Up

The current version of GDMix supports logistic regression and DeText models for the fixed effect, and logistic regression for the random effects. The developers also mentioned that, in the coming years, GDMix might support deep models for random effects if the increased complexity can be justified by improvements in relevance metrics.


Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.