The 2019 Lok Sabha election was a game-changer for the BJP. The saffron party received 37.36% of the total vote share, the highest for any political party since 1989. In 2014, too, the BJP had won 31% of the vote share. What lies behind this winning streak? A mix of the ‘Modi wave’, political strategising, big money and Big Data.
Former US President Barack Obama was the first to use data analytics on a massive scale in an election campaign. His campaign used a data platform called Project Narwhal, which was built around rapid iteration, minimal barriers between developers and operations staff, heavy use of cloud technology, and constant testing. With the help of such tools, he could interact with his voter base on Reddit, place ad campaigns on unconventional media, and gauge the attitudes and movements of his voter base.
In India, the BJP relied heavily on Big Data and political analysts, using both to maximise its electoral numbers.
Seven months before the 2014 general election, Modi made a controversial statement: “build toilets before temples”. The BJP IT team noticed that 45% of the internet population agreed with the statement. This was the same group that fell into the party's prospective voter base. With the help of that data, the BJP's communication team was able to turn the statement into one of the most used slogans India would see in the next decade: “Swachh Bharat”, which was received wholeheartedly by 68% of respondents. All of this was possible because of predictive analytics.
Predictive analytics is a tool that political parties occasionally employ to evaluate their potential voter base. In essence, it combines mathematical and statistical tools to create a model that predicts the future, based on past trends.
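As a rough illustration of the idea of predicting from past trends, the sketch below fits a straight line to a handful of past observations and extrapolates one step ahead. The weekly support figures are hypothetical numbers invented for the example, and a least-squares line is only one of many possible models; nothing here is drawn from an actual campaign.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate(model, x):
    """Evaluate the fitted line at a new point x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical weekly support levels (percent); illustrative values only.
weeks = [1, 2, 3, 4, 5]
support = [40.0, 42.0, 44.0, 46.0, 48.0]

model = fit_line(weeks, support)
forecast = extrapolate(model, 6)  # projected support for week 6
print(forecast)
```

Real campaign models would use far richer inputs than a single time series, but the principle of learning from past data to estimate a future value is the same.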
Predictive analytics in political campaigns is built on Social Big Data (SBD). It can be used for voter modelling and personalisation, spam and social influence prediction, content segmentation and classification, and voter engagement. Cambridge Analytica is widely reported to have used data from as many as 87 million Facebook user profiles to provide analytics support to political campaigns in the 2016 US election.
Twitter, too, is used to gain insight into public opinion. With 500 million tweets a day, or roughly 200 billion a year, Twitter is one of the most sought-after data-collection tools for political parties around the globe. Unlike on other social media websites, almost all user activity on this platform is public. Twitter also aids data collection because its API is easily accessible.
The “user API” approach is one way to gather data. It is carried out in two stages: the first involves mining the voter’s historical data, while the second gathers the voter’s current data. The historical data is used to forecast users' political inclinations as well as their future ideologies.
Pre-processing procedures such as data cleaning and quality improvement are carried out before the dataset is fed into the prediction module. Only after that are a person’s political preferences assessed. The two criteria used to quantify those preferences are continuity and knowledgeability.
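A minimal sketch of what such a cleaning step might look like is shown below. The specific rules (stripping URLs and @-mentions, dropping the hash symbol, lowercasing, removing empty and duplicate tweets) are assumptions chosen for illustration; the article does not say which cleaning rules were actually applied.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Normalise one tweet: drop URLs and mentions, keep hashtag words."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the symbol
    return SPACE_RE.sub(" ", text).strip().lower()


def clean_dataset(tweets):
    """Clean every tweet, then drop empty results and exact duplicates."""
    seen = set()
    cleaned = []
    for raw in tweets:
        text = clean_tweet(raw)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


sample = ["Vote  now! https://example.com @someone #Election", "Hello", "hello ", ""]
print(clean_dataset(sample))
```

Only the cleaned, de-duplicated text would then be passed on to the prediction stage.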
Continuity measures the number of political entities that can be identified in a user’s tweets during a specific time period. Knowledgeability describes how familiar a user is with politics and whether they have done any work or research in the area. Political entities annotated from the user’s profile and tweets are the main factors in determining a user's knowledgeability. This is done by running certain commands, like in the following table:
The political entities are also recounted regularly, mainly because interests change. A person might be influenced by another party and change their ideology, or their interest in politics may last only for the election season.
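The two measures described above can be sketched in a few lines. This is an illustrative approximation, not the method used by any party: the entity lexicon, the word-boundary matching, and the choice to score knowledgeability as the count of distinct entities across profile and tweets are all assumptions. Recomputing continuity over different time windows shows how a user's interest can fade after election season.

```python
import re
from datetime import date

# Hypothetical lexicon; a real system would rely on a curated list or an
# entity annotator rather than these few hand-picked terms.
POLITICAL_ENTITIES = {"bjp", "congress", "lok sabha", "modi", "election"}


def entities_in(text):
    """Return the lexicon entries that appear as whole words in the text."""
    lowered = text.lower()
    return {
        entity
        for entity in POLITICAL_ENTITIES
        if re.search(r"\b" + re.escape(entity) + r"\b", lowered)
    }


def continuity(tweets, start, end):
    """Count distinct political entities in tweets dated within [start, end]."""
    found = set()
    for day, text in tweets:
        if start <= day <= end:
            found |= entities_in(text)
    return len(found)


def knowledgeability(profile, tweets):
    """Assumed proxy: distinct entities across the profile and all tweets."""
    found = entities_in(profile)
    for _, text in tweets:
        found |= entities_in(text)
    return len(found)


tweets = [
    (date(2019, 3, 1), "BJP rally today"),
    (date(2019, 3, 15), "Lok Sabha election news"),
    (date(2019, 6, 1), "Watching cricket"),
]
march = continuity(tweets, date(2019, 3, 1), date(2019, 3, 31))
june = continuity(tweets, date(2019, 6, 1), date(2019, 6, 30))
score = knowledgeability("Follows Congress politics", tweets)
print(march, june, score)
```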
The collected and cleansed data is divided into two groups: 1) On-topic users, who are interested in politics and 2) Off-topic users, who show minimal or no interest in it. Shortlisted voters from both groups are then used to develop a predictive model, which predicts a particular voter’s inclination for a political ideology.
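The grouping and modelling step can be sketched as follows. The threshold of one political entity for the on-topic split, the two-number feature vectors (mentions of two hypothetical parties), and the nearest-centroid classifier standing in for the predictive model are all assumptions made for illustration; the article does not specify which model is actually used.

```python
from collections import defaultdict
from math import dist


def split_by_topic(entity_counts, min_entities=1):
    """Split users into on-topic and off-topic by political entity count."""
    on_topic = [u for u, n in entity_counts.items() if n >= min_entities]
    off_topic = [u for u, n in entity_counts.items() if n < min_entities]
    return on_topic, off_topic


def train_centroids(samples):
    """samples: list of (feature_tuple, label). Returns label -> mean vector."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict_inclination(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical users: id -> number of political entities found.
on_topic, off_topic = split_by_topic({"u1": 5, "u2": 0, "u3": 2})

# Hypothetical training data: (mentions of party A, mentions of party B).
samples = [
    ((8, 1), "party_a"),
    ((6, 3), "party_a"),
    ((1, 9), "party_b"),
    ((2, 7), "party_b"),
]
centroids = train_centroids(samples)
print(on_topic, off_topic, predict_inclination(centroids, (5, 2)))
```

A production system would likely use a stronger classifier and many more features, but the flow of splitting users, training on labelled examples and predicting an inclination for a new user is the same.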