How Predictive Analytics Is Used to Win Elections

The use of predictive analytics in election campaigns is increasing by the day. With huge volumes of Social Big Data (SBD) available to political parties, data is now easier to access than ever before.

The 2019 Lok Sabha election was a game-changer for the BJP. The saffron party received 37.36% of the total vote share, the highest for any political party since 1989. In 2014 too, the BJP had secured 31% of the vote share. What lies behind this winning streak? A mix of the ‘Modi wave’, political strategising, big money and Big Data.

Former US President Barack Obama was the first to use data analytics on a massive scale in his election campaign. His campaign used a data platform called Project Narwhal, which was built around rapid iteration, minimal barriers between developers and operations staff, heavy use of cloud technology, and constant testing. With the help of such tools, he could interact with his voter base on Reddit, place ad campaigns on unconventional media and gauge the attitudes and movements of his voter base.


In India, the BJP relied heavily on Big Data and political analysts to maximise its numbers.

Seven months before the 2014 general election, Modi made a controversial statement: “build toilets before temples”. The BJP IT team noticed that 45% of the internet population agreed with the statement. This was the same group that fell into the party's prospective voter base. With the help of this data, the BJP's communication team was able to turn that statement into one of the most used slogans India would see in the next decade: “Swachh Bharat”, which was received wholeheartedly by 68% of respondents. This was made possible by predictive analytics.

Predictive analytics is a tool that political parties employ to evaluate their potential voter base. In essence, it combines mathematical and statistical tools to create a model that predicts future outcomes based on past trends.
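To make the idea of predicting from past trends concrete, here is a minimal sketch that fits a straight line to a few past observations and extrapolates one step ahead. The weekly support figures and the function names are made up for illustration. Real campaign models are likely far richer than a single trend line.

```python
# Minimal sketch: fit a straight line to past observations and extrapolate.
# All numbers below are invented illustration values, not real polling data.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


weeks = [1, 2, 3, 4]                  # hypothetical survey weeks
support = [40.0, 42.0, 44.0, 46.0]    # hypothetical support percentages

slope, intercept = fit_line(weeks, support)
forecast = predict(slope, intercept, 5)  # extrapolate to week 5
print(forecast)
```

The same pattern of learning parameters from history and then applying them to an unseen point underlies more elaborate models as well.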

Predictive analytics in political campaigns finds its base in SBD (Social Big Data). It can be used for voter modelling and personalisation, spam and social influence prediction, content segmentation and classification, and voter engagement. Cambridge Analytica, for instance, used data from as many as 87 million Facebook user profiles to provide analytics assistance to political campaigns in the 2016 US election.

Twitter, too, is used to get insight into public opinion. With 500 million daily tweets, or roughly 200 billion tweets a year, Twitter is one of the most sought-after data sources for political parties around the globe. Unlike on most other social media websites, almost all user activity on this platform is public. Twitter also aids data collection because its API is easily accessible.

The “user API” approach is one of the ways to gather data. It is carried out in two stages: the first involves mining the voter’s historical data, while the second gathers the voter’s current data. The historical data is used to forecast users' political inclinations as well as their future ideologies.
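As a rough illustration of the two stages, the sketch below separates a user's already-collected tweets into a historical set and a current set around a cutoff date. The record layout (`text`, `created_at`) and the cutoff are assumptions made for this example. How the tweets are fetched from the API is deliberately left out.

```python
from datetime import date

# Minimal sketch: split tweets into "historical" and "current" sets.
# The record layout and the cutoff date are hypothetical.

def split_by_stage(tweets, cutoff):
    """Tweets before the cutoff are historical; the rest are current."""
    historical = [t for t in tweets if t["created_at"] < cutoff]
    current = [t for t in tweets if t["created_at"] >= cutoff]
    return historical, current


sample_tweets = [
    {"text": "Old post about the budget", "created_at": date(2018, 5, 1)},
    {"text": "Another old post", "created_at": date(2018, 11, 20)},
    {"text": "Fresh post about the rally", "created_at": date(2019, 3, 15)},
]

historical, current = split_by_stage(sample_tweets, cutoff=date(2019, 1, 1))
print(len(historical), len(current))
```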

Pre-processing procedures such as data cleaning and quality improvement are carried out before the dataset is fed into the prediction module. Only after that are a person’s political preferences assessed. The two criteria used to quantify these preferences are continuity and knowledgeability.
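The article does not spell out the exact cleaning steps, so the following is only a sketch of what such pre-processing might look like: stripping URLs and @-mentions, normalising whitespace and case, and dropping empty or duplicate tweets. The function names and the choice of steps are assumptions.

```python
import re

# Minimal sketch of tweet cleaning. The specific steps are assumptions,
# not the procedure used by any particular campaign.

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Remove URLs and @-mentions, collapse whitespace, lowercase."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip().lower()


def clean_dataset(texts):
    """Clean every tweet, then drop empty results and exact duplicates."""
    seen = set()
    cleaned = []
    for raw in texts:
        text = clean_tweet(raw)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


raw_tweets = ["Hello  World", "hello world", "@a https://t.co/x"]
print(clean_dataset(raw_tweets))
```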

Continuity is measured as the number of political entities that can be identified from a user’s tweets during a specific time period. Knowledgeability describes how familiar a user is with politics and whether they have done any work or research in the area. Political entities annotated from the user’s profile and tweets are the main factors in determining a user's knowledgeability. This is done by running certain commands, as shown in the following table:

[Table not reproduced] Source: ResearchGate
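Since the table itself is not available here, the two measures can be approximated with a small sketch. It assumes a hypothetical list of political entities and counts distinct matches. Continuity looks only at tweets inside a time window, while knowledgeability looks at the profile text plus all tweets. The exact formulas used in the original research may differ.

```python
from datetime import date

# Hypothetical entity list; a real system would use a proper annotator.
POLITICAL_ENTITIES = {"parliament", "election", "manifesto", "ballot"}


def entities_in(text):
    """Return the political entities that appear as words in the text."""
    words = {w.strip(".,!?#:;\"'").lower() for w in text.split()}
    return words & POLITICAL_ENTITIES


def continuity(tweets, start, end):
    """Distinct political entities in tweets posted between start and end."""
    found = set()
    for tweet in tweets:
        if start <= tweet["created_at"] <= end:
            found |= entities_in(tweet["text"])
    return len(found)


def knowledgeability(profile_text, tweets):
    """Distinct political entities across the profile and all tweets."""
    found = entities_in(profile_text)
    for tweet in tweets:
        found |= entities_in(tweet["text"])
    return len(found)


user_tweets = [
    {"text": "The election manifesto is out!", "created_at": date(2019, 4, 1)},
    {"text": "Lovely weather today", "created_at": date(2019, 4, 2)},
    {"text": "Counting every ballot.", "created_at": date(2019, 6, 1)},
]

april_score = continuity(user_tweets, date(2019, 4, 1), date(2019, 4, 30))
overall_score = knowledgeability("Parliament watcher", user_tweets)
print(april_score, overall_score)
```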

The list of political entities is also recounted regularly, mainly because interests change. A person might be influenced by another party and change their ideology, or their interest in politics may last only for the election season.
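One simple way to picture this regular recount is to keep an entity count per period and flag users whose political activity has dropped off in the latest period. The function name, the sample counts and the `min_count` threshold below are illustrative assumptions.

```python
# Minimal sketch: detect users whose political interest has faded.
# period_counts holds one political-entity count per period, oldest first.

def interest_faded(period_counts, min_count=1):
    """True if the user was active in an earlier period but not the latest."""
    if len(period_counts) < 2:
        return False  # not enough history to compare
    *earlier, latest = period_counts
    was_active = any(count >= min_count for count in earlier)
    return was_active and latest < min_count


seasonal_user = [0, 4, 6, 0]  # active only around the election (hypothetical)
steady_user = [2, 3, 2, 3]    # consistently active (hypothetical)

print(interest_faded(seasonal_user))
print(interest_faded(steady_user))
```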

The collected and cleansed data is divided into two groups: 1) on-topic users, who are interested in politics, and 2) off-topic users, who show minimal or no interest in it. Shortlisted users from both groups are then used to develop a predictive model, which predicts a particular voter’s inclination towards a political ideology.
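The article does not name the model that is used, so the sketch below substitutes a deliberately simple nearest-centroid classifier to show the overall flow: split users by a political-interest score, train on labelled examples, then predict a leaning for a new user. The field names, the threshold, the two-number feature vectors and the labels "A" and "B" are all hypothetical.

```python
# Minimal sketch: on-topic/off-topic split plus a nearest-centroid classifier.
# This is a stand-in technique, not the model described in the source research.

def split_users(users, min_score=1):
    """On-topic users meet the score threshold; the rest are off-topic."""
    on_topic = [u for u in users if u["political_score"] >= min_score]
    off_topic = [u for u in users if u["political_score"] < min_score]
    return on_topic, off_topic


def train_centroids(examples):
    """examples: list of (feature_vector, label). Returns label -> mean vector."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_label(centroids, features):
    """Pick the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))


users = [
    {"name": "u1", "political_score": 3},
    {"name": "u2", "political_score": 0},
]
on_topic, off_topic = split_users(users)

# Features: [mentions of party A topics, mentions of party B topics]
training = [([5, 0], "A"), ([4, 1], "A"), ([0, 6], "B"), ([1, 5], "B")]
centroids = train_centroids(training)
leaning = predict_label(centroids, [3, 1])
print([u["name"] for u in on_topic], leaning)
```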

Lokesh Choudhary
Lokesh enjoys reading a lot and views himself as an armchair technology journalist. He enjoys sharing tales involving technology. His background in linguistics did not prevent him from investigating the subjects of AI and Data Science.
