
How LinkedIn Leverages DL To Detect Abusive Accounts

The model looks for signals of bot-like activity and classifies sequences of user behaviour as automated.



Organisations are adopting emerging machine learning technologies at breakneck speed to reduce manual effort and handle data efficiently. LinkedIn, the largest professional and employment-oriented service platform, has for years leveraged AI/ML to optimise processes across the platform. As of March 2021, LinkedIn boasts over 740 million users in more than 200 countries and territories across the globe.

LinkedIn’s Anti-Abuse AI Team works to create, deploy and maintain AI models to detect and prevent abuse on the platform. Platforms like LinkedIn are prone to abuses like the creation of fake accounts, member profile scraping, automated spam, and account takeovers.

Challenges

The team had to overcome three challenges:

  1. Attackers adapt and evolve quickly against anti-abuse defences, so LinkedIn's adversarial-behaviour tools must be updated constantly.
  2. The defences must also cover several heterogeneous parts of the website that need protection from attackers.
  3. Available signal must be maximised, since standard hand-engineered features do not fully leverage the signal present in member activity patterns.

The team has created a DL model operating directly on raw sequences of member activity to overcome these challenges. In addition, the model leverages the available signal hidden in the data to prevent adversarial attacks. 

Logged-in Accounts

The model was used to detect logged-in accounts scraping member profile data. Scraping is not always destructive: search engines are authorised to scrape in order to collect and index information across the internet. When done without permission, however, it is a nefarious practice.

Unauthorised scrapers automate logged-in LinkedIn accounts to scrape information that is viewable only from within a member account. The model looks for signals of bot-like activity and classifies sequences of user behaviour as automated. The team also leverages outlier detection to flag non-human activity.

Activity Sequence Modelling Technique

Activity sequence modelling operates on a standardised dataset encapsulating the sequence of member requests on LinkedIn. These are, broadly, member activity patterns – “As a member visits LinkedIn, the member’s web browser makes many requests to LinkedIn’s servers; every request includes a path identifying the part of the site the member’s browser intends to access,” as explained by LinkedIn’s blog post. The sequence can be thought of as a “sentence” describing the member’s LinkedIn activity.

Source: LinkedIn

An illustration of LinkedIn’s arrangement of member requests in a sequence, including information about the type of request, the order of requests, and the timing between requests.

Standardising request paths translates each specific request path into a standardised token indicating the type of request. For instance, the path linkedin.com/in/jamesverbus/ would be translated into a token representing a profile view.

An automated process then maps the standardised request paths to integers based on the frequency of each request path for a given user, capturing how common that specific type of request is. When the requests in the activity sequence are colour-coded by these integers, abusive activity becomes easier for the human eye to identify.
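As a rough illustration of these two preprocessing steps (the path patterns, token names, and frequency-ranking scheme below are assumptions for illustration, not LinkedIn's actual implementation), the standardisation and integer mapping might look like:

```python
import re
from collections import Counter

# Hypothetical standardisation rules; LinkedIn's real rules are not public.
PATH_PATTERNS = [
    (re.compile(r"^/in/[^/]+/?$"), "profile_view"),
    (re.compile(r"^/feed/?$"), "feed_view"),
    (re.compile(r"^/search/results/"), "search"),
]

def standardise(path):
    """Translate a raw request path into a standardised token."""
    for pattern, token in PATH_PATTERNS:
        if pattern.match(path):
            return token
    return "other"

def to_integer_sequence(paths):
    """Map each standardised token to an integer rank based on how often
    that token occurs in the user's own activity (1 = most common), so
    the representation reflects how typical each request is for that user."""
    tokens = [standardise(p) for p in paths]
    ranks = {tok: i + 1
             for i, (tok, _) in enumerate(Counter(tokens).most_common())}
    return [ranks[t] for t in tokens]

paths = ["/in/jamesverbus/", "/in/someoneelse/", "/feed/", "/in/athird/"]
print(to_integer_sequence(paths))  # [1, 1, 2, 1]
```

Three profile views and one feed view yield ranks 1 and 2 respectively, so the sequence itself already hints at how repetitive (and potentially bot-like) the activity is.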

Source: LinkedIn

Comparison of 200 requests made by a non-abusive member and an abusive member. The colours represent how frequently a given request type recurs.

NLP techniques help classify the sequences: member requests and user actions are treated as tokens forming a “sentence”, which can then be classified as abusive or not abusive. After processing the request path sequence data, the team leverages a supervised long short-term memory (LSTM) model to produce abuse scores.

The model also uses the sequence of time differences between consecutive requests. LinkedIn’s policies state – “If we receive an abnormally high number of page requests or detect patterns that indicate the use of an automated tool, we may suspend or restrict that account.”
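A minimal sketch of how the two sequences described above – request-type integers and time gaps between consecutive requests – might be padded or truncated to a fixed length before being fed to an LSTM. The sequence length, padding value, and function names are illustrative assumptions, not LinkedIn's actual preprocessing:

```python
def prepare_inputs(token_ids, timestamps, max_len=200, pad=0):
    """Return (tokens, time_deltas), each padded or truncated to max_len."""
    # Time difference between consecutive requests; the first delta is 0.
    deltas = [0] + [t2 - t1 for t1, t2 in zip(timestamps, timestamps[1:])]

    def fit(seq, fill):
        seq = list(seq[:max_len])
        return seq + [fill] * (max_len - len(seq))

    return fit(token_ids, pad), fit(deltas, pad)

# Four requests: three in quick succession, then a long pause.
tokens, deltas = prepare_inputs([3, 1, 1, 2], [0, 5, 6, 50], max_len=6)
print(tokens)  # [3, 1, 1, 2, 0, 0]
print(deltas)  # [0, 5, 1, 44, 0, 0]
```

Fixed-length integer arrays like these are the standard input shape for an embedding layer followed by an LSTM; the delta sequence lets the model pick up on unnaturally regular request timing.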

The last step before training is preparing labels suited to the type of abuse to be detected. An unsupervised outlier-detection model based on LinkedIn’s isolation forest library generates the labels used to train the model.

Isolation Forest Library

The library is an unsupervised outlier detection tool. Because outliers are “few and different”, they are easier to isolate in leaf nodes, requiring fewer random splits. The algorithm randomly generates binary tree structures that non-parametrically capture the multi-dimensional feature distribution of the training dataset; outlier data points end up with a shorter expected path length from the root node to the leaf node. This property makes isolation forests a top-performing unsupervised outlier detection algorithm.

Source: LinkedIn

Example of an isolation tree
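A toy, pure-Python sketch of the isolation idea (LinkedIn's actual isolation forest library is far more sophisticated and multi-dimensional): a point is repeatedly separated from the rest of the data by random splits, and an outlier, being "few and different", tends to be isolated after fewer splits:

```python
import random

def isolation_path_length(point, data, rng, depth=0, max_depth=20):
    """Number of random splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the points on the same side of the split as `point`.
    side = [x for x in data if (x < split) == (point < split)]
    return isolation_path_length(point, side, rng, depth + 1, max_depth)

rng = random.Random(0)
data = [10.0, 10.5, 9.8, 10.2, 9.9, 10.1, 95.0]  # 95.0 is the outlier

def avg_path(point, trees=200):
    return sum(isolation_path_length(point, data, rng) for _ in range(trees)) / trees

# The outlier's average path is shorter than the inlier's.
print(avg_path(95.0) < avg_path(10.0))
```

Averaging the path length over many random trees is exactly what turns this single-tree idea into an isolation forest: a short average path means a high outlier score.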

The activity sequence modelling technique helps tackle anti-abuse issues by detecting abusive behaviour, deterring adversarial attackers, and providing a modelling approach that generalises and scales to various attack surfaces.


Avi Gopani

Avi Gopani is a technology journalist who analyses industry trends and developments from an interdisciplinary perspective at Analytics India Magazine. Her articles chronicle cultural, political and social stories, curated with a focus on the evolving technologies of artificial intelligence and data analytics.