Twitter has been plagued by a spam bot problem for the last four years, and that problem has added significantly to the fake news phenomenon that many news organisations are trying to combat. Bots have become the go-to tool for cranking out trending topics on Twitter and even drumming up artificial likes. Although Twitter strictly prohibits fake engagement on its platform, its spam bot problem has reportedly spiked in recent months.
Over the last four years, the microblogging site has drawn extensive and sustained criticism for not doing enough to counter the fake followers that also undermined the American elections. According to recent research by the University of Southern California and Indiana University, as many as 15 percent of all Twitter accounts are fake. The last U.S. presidential election saw a massive surge in bots and fake accounts that dragged the election debate down to the level of trash talk. Separate research from the University of Southern California found that 19 percent of US election-related tweets were generated by bot accounts, and that trolling Trump followers accounted for a sizable share of activity on Twitter.
Meanwhile, a news investigation suggests that a Romanian spammer, Laurentiu Ciocoiu, is carpet-bombing the microblogging site with spam bots. His modus operandi is to set up a network of thousands of fake Twitter accounts that follow a few official, high-profile accounts, usually celebrities and leading news organizations. Their click-bait links lead users to spammy websites.
Can AI be used to spot fake bots with spam filtering?
In an attempt to clean up the platform, Twitter has turned to Watson, IBM’s cognitive computing technology, to counter abuse. According to a study by Karna Analytics, artificial intelligence and machine learning algorithms can be used effectively to keep spammy Twitter bots in check. The firm used its text analytics algorithm, Semantic Similarity, to cluster contextually similar tweets. As a real-world example, the Gurgaon-headquartered company analysed more than 50,000 tweets for the hashtags #Presidentialle and #Jio and applied the Semantic Similarity technique to identify clusters of users who repeatedly post similar tweets. The company states that its proprietary text analytics algorithm can cluster contextually similar tweets together and identify the users behind these handles. One thing the company concluded about spam bots: their tweets focus on a narrow topic, while normal tweets are text-heavy and diverse.
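The clustering idea described above can be sketched in a few lines of Python. This is only an illustration: Karna Analytics' Semantic Similarity algorithm is proprietary, so plain Jaccard overlap between word sets stands in for it here, and the function names, the 0.6 threshold, the three-user cut-off and the sample tweets are all assumptions made for the example.

```python
import re

def tokens(text):
    """Lowercased words, hashtags and mentions, with URLs stripped out."""
    text = re.sub(r"https?://\S+", "", text.lower())
    return frozenset(re.findall(r"[#@]?\w+", text))

def jaccard(a, b):
    """Share of tokens two tweets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster_tweets(tweets, threshold=0.6):
    """Greedy clustering: a tweet joins the first cluster whose
    representative tweet is at least `threshold` similar to it."""
    clusters = []  # each entry: (representative_tokens, [(user, text), ...])
    for user, text in tweets:
        toks = tokens(text)
        for rep, members in clusters:
            if jaccard(rep, toks) >= threshold:
                members.append((user, text))
                break
        else:
            clusters.append((toks, [(user, text)]))
    return clusters

def suspicious_users(clusters, min_users=3):
    """Users who appear in a cluster shared by at least `min_users` handles."""
    flagged = set()
    for _, members in clusters:
        users = {user for user, _ in members}
        if len(users) >= min_users:
            flagged |= users
    return flagged

# Made-up sample tweets, for illustration only.
SAMPLE_TWEETS = [
    ("bot_a", "Get free data now with #Jio click http://spam.example/1"),
    ("bot_b", "Get free data now with #Jio click http://spam.example/2"),
    ("bot_c", "get free data now with #jio click here"),
    ("alice", "Long queue at the store today, but the #Jio SIM finally works"),
]

flagged = suspicious_users(cluster_tweets(SAMPLE_TWEETS))
```

On real data, word overlap would miss paraphrased spam that a semantic model could catch, so the threshold would need tuning against labelled examples.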
Here are some of the characteristics of spam bots
- Bot accounts tweet in stages, usually four tweets in a minute
- Tweets include trending hashtags, and fake accounts generally lack profile images
- Fake accounts have few or no followers, and the accounts they follow also post the same message
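The characteristics listed above can be turned into simple rule-based checks. The sketch below is a hedged illustration rather than any vendor's method: the `Account` fields, the follower cut-off of 10 and the signal names are hypothetical, while the four-tweets-per-minute figure comes from the list.

```python
from dataclasses import dataclass, field

@dataclass
class Account:
    # Hypothetical fields; a real pipeline would fill these from account data.
    handle: str
    followers: int
    has_profile_image: bool
    tweet_times: list = field(default_factory=list)  # timestamps in seconds

def max_tweets_per_minute(times):
    """Largest number of tweets inside any 60-second sliding window."""
    times = sorted(times)
    best = 0
    start = 0
    for end in range(len(times)):
        # Shrink the window until it spans less than 60 seconds.
        while times[end] - times[start] >= 60:
            start += 1
        best = max(best, end - start + 1)
    return best

def bot_signals(account, few_followers=10, burst=4):
    """Return the names of the warning signs this account shows."""
    signals = []
    if max_tweets_per_minute(account.tweet_times) >= burst:
        signals.append("burst_tweeting")
    if not account.has_profile_image:
        signals.append("no_profile_image")
    if account.followers <= few_followers:
        signals.append("few_followers")
    return signals

# Made-up accounts, for illustration only.
bot = Account("bot_a", followers=2, has_profile_image=False,
              tweet_times=[0, 10, 20, 30, 600])
human = Account("alice", followers=350, has_profile_image=True,
                tweet_times=[0, 3600, 7200])

bot_result = bot_signals(bot)
human_result = bot_signals(human)
```

No single signal is conclusive, since plenty of genuine new users have no photo and few followers, so checks like these are better treated as inputs to a score than as a verdict.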
Machine Learning classifiers bust spam bots
Baltimore-headquartered ZeroFox, which focuses on safeguarding companies across social, mobile and web channels, says social media accounts are open to several digital risks: customer spam, phishing, piracy, chat attacks, impersonation and even physical threats. That is where machine learning techniques come in. The company uses a machine learning classifier, which scans a broad set of variables to determine which category an unseen piece of data falls into. A classifier trained on known spammy customer chat links, for example, can analyze characteristics such as IP location, URL, redirects and path to decide whether a link should be blacklisted. With continuous retraining, the classifier can stay one step ahead of spammy bots.
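To make the classifier idea concrete, here is a minimal sketch of a link classifier built with the Python standard library only. It is not ZeroFox's system: the choice of a Bernoulli naive Bayes model, the three features (host is a raw IP address, two or more redirects, a path at least three segments deep), the `spam`/`ok` labels and the tiny training set are all assumptions for illustration, and IP geolocation is left out because it would need an external database.

```python
import ipaddress
import math
from urllib.parse import urlparse

def features(link):
    """Turn a link record into three yes/no features (assumed, not ZeroFox's)."""
    parsed = urlparse(link["url"])
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    depth = len([part for part in parsed.path.split("/") if part])
    return (host_is_ip, link["redirects"] >= 2, depth >= 3)

class BernoulliNaiveBayes:
    """Naive Bayes over binary features with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {}
        self.prob = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            self.prob[c] = [
                (sum(row[j] for row in rows) + 1) / (len(rows) + 2)
                for j in range(len(X[0]))
            ]
        return self

    def predict(self, x):
        def score(c):
            total = self.log_prior[c]
            for value, p in zip(x, self.prob[c]):
                total += math.log(p if value else 1 - p)
            return total
        return max(self.classes, key=score)

# Toy labelled links, invented for this example.
TRAINING = [
    ({"url": "http://203.0.113.7/a/b/c/win", "redirects": 3}, "spam"),
    ({"url": "http://198.51.100.9/promo/x/y", "redirects": 2}, "spam"),
    ({"url": "http://cheap-deals.example/go/now/free/prize", "redirects": 4}, "spam"),
    ({"url": "https://news.example/politics", "redirects": 0}, "ok"),
    ({"url": "https://shop.example/cart", "redirects": 1}, "ok"),
    ({"url": "https://blog.example/2017/08/post", "redirects": 0}, "ok"),
]

model = BernoulliNaiveBayes().fit(
    [features(link) for link, _ in TRAINING],
    [label for _, label in TRAINING],
)

verdict_bad = model.predict(features({"url": "http://192.0.2.44/a/b/c", "redirects": 3}))
verdict_good = model.predict(features({"url": "https://docs.example/help", "redirects": 0}))
```

Retraining in this sketch simply means calling `fit` again with newly labelled links added to the training list, which is one simple way to keep the model current as spammers change tactics.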
How to get started with machine learning classifiers
- To train classifiers, one needs huge pre-labeled data sets, and labeling data sets can be a laborious task
- If one doesn’t have access to a pre-labeled data set, the company suggests finding one that fits the need
Cyber security in the age of hacking
The recent hacking incident at HBO raised several eyebrows when an episode of the popular TV show Game of Thrones was leaked online. Not just that, the American network’s official Facebook and Twitter accounts were also hacked. The accounts were hacked by a Saudi Arabian group that goes by the name of OurMine and has also reportedly hacked the accounts of the CEOs of Google, Facebook and Twitter. Over the years, AI has increasingly been used to beef up defenses, and tech giants and cyber security companies are quickly putting network protection platforms together to safeguard their accounts from malware and hacking. Social media platforms are a goldmine of business and consumer data, and, as a ZeroFox executive puts it, data science can be effectively used to weaponize social media.