LinkedIn Open-Sources GDMix, A Framework That Trains Efficient Personalisation Models

Recently, developers at LinkedIn open-sourced a deep learning framework known as GDMix. GDMix, or Generalised Deep Mixed model, is a deep ranking framework for training non-linear fixed-effect and random-effect models. According to the developers, these types of models are widely used in personalised search as well as recommender systems.

With more than 700 million members, billions of feed updates, and thousands of courses to choose from, the professional networking platform is heavily dependent on AI and machine learning techniques. Personalised ranking for search and recommender systems is one of the key technologies LinkedIn uses to deliver the best possible experience to its members.

A fully personalised ranking model uses request features, document features, context features and interaction features, including a large number of categorical ID features. Models of this size, however, are often difficult to train efficiently.


According to the developers, training such models may require specialised processors, very large system memory and ultra-fast network connections, among other resources. To solve this issue, the developers introduced the GDMix framework, which trains these kinds of models both efficiently and quickly.

Behind GDMix

The GDMix framework works by breaking a large model down into a global model, known as the fixed effect, and a large number of small models called random effects, and then solving each sub-problem individually. In other words, the framework follows a divide-and-conquer approach that allows large personalisation models to be trained efficiently on commodity hardware.
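As a hypothetical illustration of this decomposition (the function names, weights and features below are invented for the sketch and are not GDMix's actual API), the final score can be pictured as a shared global logit plus a small per-entity correction:

```python
import math

# Illustrative sketch of a GDMix-style mixed-model score: one global
# (fixed-effect) model shared by everyone, plus a small per-member
# (random-effect) model whose logit is added on top. All names and
# numbers here are made up for illustration.

def dot(weights, features):
    """Dot product over a sparse feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def score(global_weights, per_member_weights, member_id, features):
    """Combine the fixed-effect and random-effect logits, then apply a sigmoid."""
    logit = dot(global_weights, features)                          # fixed effect
    logit += dot(per_member_weights.get(member_id, {}), features)  # random effect
    return 1.0 / (1.0 + math.exp(-logit))

# Toy usage: a global model plus a personal correction for member 42,
# who (in this toy data) cares extra about title match.
global_w = {"title_match": 1.2, "recency": 0.5}
member_w = {42: {"title_match": 0.8}}
features = {"title_match": 1.0, "recency": 0.3}
p = score(global_w, member_w, 42, features)
```

Because each random-effect model only needs its own entity's data, the small models can be trained independently, which is what makes the divide-and-conquer approach amenable to commodity hardware.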

According to the professional networking platform, the GDMix project is an extension of an earlier effort on generalised linear models known as Photon ML, a machine learning library based on Apache Spark. GDMix improves on the Photon ML library by expanding support to deep learning models, and it can be easily applied to a variety of search and recommendation tasks.

Currently, GDMix supports three different operation modes:

  1. Fixed effect model: Logistic Regression; Random effect model: Logistic Regression.
  2. Fixed effect model: Deep NLP models supported by DeText; Random effect model: Logistic Regression.
  3. Fixed effect model: Arbitrary model provided by a user; Random effect model: Logistic Regression.
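To picture the third mode, here is a minimal, hypothetical sketch (not GDMix's actual API): the user-provided fixed-effect model supplies scores that are treated as a constant offset, and only a small per-entity logistic regression is trained on top of them:

```python
import math

# Hypothetical sketch of operation mode 3: the fixed-effect logits come
# from an arbitrary user-provided model (here, a stub), and training only
# fits a per-entity logistic model on top, treating the user's scores as
# a constant offset. All names and data are illustrative.

def user_model_logit(features):
    # Stand-in for whatever model the user brings (deep net, GBDT, ...).
    return 2.0 * features["relevance"]

def fit_random_effect(rows, steps=300, lr=0.5):
    """Fit one entity's single-feature logistic model against a fixed offset."""
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for feats, label in rows:
            logit = user_model_logit(feats) + w * feats["recency"]
            p = 1.0 / (1.0 + math.exp(-logit))
            grad += (p - label) * feats["recency"]
        w -= lr * grad / len(rows)
    return w

# Toy data for one entity that prefers recent items beyond what the
# user-provided fixed-effect model captures.
rows = [
    ({"relevance": 0.1, "recency": 1.0}, 1.0),
    ({"relevance": 0.1, "recency": -1.0}, 0.0),
    ({"relevance": 0.2, "recency": 1.0}, 1.0),
    ({"relevance": 0.2, "recency": -1.0}, 0.0),
]
w_recency = fit_random_effect(rows)
```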

GDMix trains models efficiently by taking a parallel blockwise coordinate descent approach. It supports training both per-entity and per-cohort random effects.
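A toy sketch of this alternation (an illustrative re-implementation under simplifying assumptions, not LinkedIn's code) might look like the following: the global model is fit with each entity's random-effect logit held fixed as an offset, then every entity's small model is fit with the global logit as an offset. Since the per-entity fits are independent, they could run in parallel:

```python
import numpy as np

# Illustrative blockwise coordinate descent for a mixed logistic model.
# Alternates between (1) fitting the global fixed-effect model with the
# random-effect logits held fixed as an offset, and (2) fitting each
# entity's small model with the global logit held fixed. The per-entity
# fits in step (2) are independent of one another.

def fit_logistic(X, y, offset, steps=200, lr=0.5):
    """Gradient descent for logistic regression with a fixed logit offset."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + offset)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def gdmix_style_fit(X, y, entity_ids, sweeps=5):
    """Alternate fixed-effect and per-entity random-effect updates."""
    n, d = X.shape
    w_global = np.zeros(d)
    w_entity = {e: np.zeros(d) for e in set(entity_ids)}
    for _ in range(sweeps):
        # Random-effect logits become the offset for the global fit.
        offset = np.array([X[i] @ w_entity[entity_ids[i]] for i in range(n)])
        w_global = fit_logistic(X, y, offset)
        # Global logits become the offset for each entity's own fit.
        global_logit = X @ w_global
        for e in w_entity:
            rows = [i for i in range(n) if entity_ids[i] == e]
            w_entity[e] = fit_logistic(X[rows], y[rows], global_logit[rows])
    return w_global, w_entity

# Toy data: two entities agree on the first feature but have opposite
# preferences on the second, which only the random effects can capture.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
entity_ids = [0] * 100 + [1] * 100
signs = np.array([1.0] * 100 + [-1.0] * 100)
y = (X[:, 0] + signs * X[:, 1] > 0).astype(float)
w_global, w_entity = gdmix_style_fit(X, y, entity_ids)
```

In this toy run the global model picks up the shared positive weight on the first feature, while the two entities' random effects learn opposite-signed corrections on the second.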

GDMix expands the modelling capacity to include deep learning models. In particular, GDMix leverages DeText, a deep learning ranking framework for text understanding, as its native deep learning model trainer. The framework is implemented in TensorFlow, SciPy and Spark.

GDMix has a mixed implementation of Python and Scala, with Spark used for training models and processing intermediate data. The framework requires Python version 3.3+ and Apache Spark version 2.0+.

Key Features

GDMix has three key features, described below.

Model Scalability

GDMix works by splitting the model into a fixed effect and many random effects. This split makes it possible to train models with hundreds of millions of entities and tens of billions of parameters.

Model Flexibility

Both the fixed effect and random effects of GDMix are designed to support various model types. The fixed effect supports linear models as well as deep learning models, while the random effects natively support linear models. GDMix also makes it easy to add custom models, such as support vector machines (SVMs), decision trees and gradient boosting algorithms.

Training Efficiency

GDMix is designed to train large models quickly and efficiently. With large-scale parallelism, the framework takes less than an hour to train models that include millions of entities and billions of parameters.

Wrapping Up

The current version of GDMix supports logistic regression and DeText models for the fixed effect, and logistic regression for the random effects. The developers also mentioned that in the coming years, GDMix might support deep models for random effects if the increased complexity can be justified by improvements in relevance metrics.

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
