- This is one of the top voted thesis papers from upGrad's online working professional programs in partnership with one of the UK's leading universities.
Fake news is an infodemic, a disease worse than anything we’ve seen before. And it has been around for far longer than Covid-19.
Sometimes, fake news has little impact. But when times are uncertain, and a global crisis is in effect, people look for information that alleviates their fear. Unfortunately, fear leaves people more susceptible to accepting misleading information as the real deal. A limited amount of reliable data in some instances also encourages people to take misinformation willingly.
Studies conducted during the pandemic showed that fake news may have threatened public health globally – by touting the pandemic as a hoax and dismissing Covid-19 as a regular flu. Another set of studies, conducted in the UK in association with the WHO, revealed that nearly 6,000 people worldwide were hospitalized due to Covid-19-related fake news, and that it led to the deaths of at least 800 people. All of this within the first three months of the pandemic.
The Internet has accelerated the spread of fake news from a single place to all over the world. Social media is free to use, so anyone can post a story that goes viral. But social media platforms and online news outlets are undertaking countermeasures: they’re constantly updating their algorithms to track and block fake news.
It isn’t easy, though. While most fake articles are still written manually, innovative fake news creators now use natural language generation techniques to produce fluent, realistic counterfeits. These models have created an urgent need to distinguish fake from real news, and to detect both human-generated and machine-generated fake news.
The Need for Better Fake News Detection
During the course of the project, several issues were observed that needed solutions. These gaps helped identify and create the scope of the study.
- With the introduction of AI-generated fake news, the spread of misinformation has increased manifold. Adding to the challenge, AI models can mimic human language, making them near-impossible to detect. They’re used to spread hoaxes, propaganda or any other form of deception through online mediums, mainly social media.
- Social media rapidly disseminates information across the globe. However, its algorithms are not well-equipped to detect fake news accurately. Since social media is one of the top spreaders of fake news, developing a social media-compatible detection model would be critical.
- Complex language models are currently used to find and block the spread of fake information. But these models are largely trained on text generated by themselves and aren’t tested adequately on human-generated fake news. If a piece of fake news is generated by a machine but lightly edited by a human, the models find it difficult to classify as fake.
- While various datasets exist for human-generated fake news, AI-generated datasets were difficult to find. This is because powerful AI language models are still considered dangerous for public use, so such data isn’t readily available.
Building the Fake News Detection Model
Keeping the above gaps in mind, this research project sought to find answers to two chief questions:
● How can machine-generated fake news be identified effectively using different classification techniques?
● Is there scope to build a model which can detect both human and machine-generated fake news?
The project looked into several datasets, including a fake news repository in use for ongoing research at Arizona State University (ASU).
FakeNewsNet was one of the datasets used. It is a comprehensive dataset that contains fake news content, social context, and dynamic data that can facilitate detection.
Finding an AI-generated fake news dataset was challenging, so the GROVER model for neural fake news generation was used instead. With GROVER, a sample dataset of machine-generated articles was created, covering varied domains such as health, politics, society, climate and more.
GROVER is an AI-based writing and language system developed by the University of Washington in association with the Allen Institute for AI. It can write fake news on a range of topics and in several styles. However, because it’s a one-of-a-kind model, there’s no other model yet that can identify the fake news it generates.
The project put FakeNewsNet and GROVER together to experiment and distinguish between human-generated and neural fake news.
To better understand the data gathered, the project experimented with three news writing styles – human-generated ‘Real’ news, human-generated ‘Fake’ news, and AI-generated ‘Fake’ news. Multiple iterations were performed using linear classifiers with a TF-IDF feature set over unigrams and bigrams.
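The TF-IDF weighting over unigrams and bigrams that the classifiers relied on can be sketched in plain Python. This is a minimal illustration under an assumed whitespace tokenization, not the project’s actual code (which most likely used a library such as scikit-learn):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf_features(docs):
    """TF-IDF vectors (as dicts) over unigrams + bigrams of each document."""
    tokenized = [d.lower().split() for d in docs]
    term_lists = [ngrams(t, 1) + ngrams(t, 2) for t in tokenized]

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))

    n_docs = len(docs)
    vectors = []
    for terms in term_lists:
        tf = Counter(terms)
        # term frequency (normalized) x inverse document frequency
        vectors.append({t: (c / len(terms)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return vectors
```

Note that a term occurring in every document gets an IDF of zero, so boilerplate shared by all articles contributes nothing; only distinguishing words and phrases carry weight into the linear classifier.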
Starting with length, the project observed that the lengths of news articles written by humans and by AI were similar. Factual news articles written by humans included a few longer pieces, but these weren’t significant in number.
The articles were also tested for vocabulary. While AI models can replicate human writing styles, vocabulary reflects each writer’s creative capacity. The research found that machine-generated articles contained relatively few distinct words per article, indicating that AI language models draw on a more limited vocabulary than humans. Checking for syntax further showed that machine-generated articles use more common nouns than human-written articles.
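The distinct-words-per-article finding corresponds to a standard lexical-diversity measure, the type-token ratio. A minimal sketch, again assuming whitespace tokenization (the thesis may have measured diversity differently):

```python
def type_token_ratio(text):
    """Fraction of distinct words (types) among all words (tokens)."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# A repetitive, machine-like passage scores lower than a varied one:
# every repeated word adds a token without adding a type.
```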
After the language generation models were analyzed and compared, the project found that they can imitate human writing styles. However, they used fewer distinct words than humans, again indicating a limited vocabulary. They also used proper nouns sparingly, suggesting a lack of the specific facts and evidence that lend genuine articles their credibility.
Once the project was completed, it concluded that simple linear algorithms could pick out machine-generated fake news quite well, and could also separate these fake articles from real, human-written ones.
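To show how little machinery a linear separator needs, here is a minimal perceptron over sparse feature dictionaries of the TF-IDF kind. The cue words below are hypothetical, and the thesis does not specify which linear classifier it used; this is only a sketch of the general idea:

```python
def train_perceptron(examples, epochs=10):
    """examples: list of (feature_dict, label) pairs, label in {+1, -1}."""
    w = {}
    for _ in range(epochs):
        for feats, y in examples:
            score = sum(w.get(f, 0.0) * v for f, v in feats.items())
            if y * score <= 0:  # misclassified: nudge weights toward y
                for f, v in feats.items():
                    w[f] = w.get(f, 0.0) + y * v
    return w

def predict(w, feats):
    """+1 (fake) if the weighted feature sum is positive, else -1 (real)."""
    return 1 if sum(w.get(f, 0.0) * v for f, v in feats.items()) > 0 else -1

# Toy, hand-labeled training set: +1 = fake, -1 = real.
examples = [({"hoax": 1.0, "the": 1.0}, 1),
            ({"study": 1.0, "the": 1.0}, -1)]
w = train_perceptron(examples)
```

If the classes are linearly separable in the feature space, as the project’s results suggest they largely were, a classifier this simple converges to a separating weight vector.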
Future Scope of this Research for Fake News Detection
Findings from this research can be used for various activities:
- Researchers can further analyze and compare several language generation models against human writing styles and against one another. This will broaden the scope of the work and enable better fake news detection.
- Even though TF-IDF classifiers worked well, other features could be explored to improve the model and make it more generally applicable.
- While the project focused on text-based news articles and language models, AI algorithms can also analyze other features such as images, videos, date and time, sources, website, and domain for valuable information.
- Teaching the detection model to trace a machine-generated fake news article back to the language model from which it originated would be a huge step forward, helping to mitigate and block the spread of misinformation.
Poorva Sawant is an upGrad learner, and as a part of her program, she has developed the thesis report titled, Detection & Classification Of AI-Generated Fake News.
Poorva Sawant is a BI Solution Architect at Tata Consultancy Services with over 14 years of experience in providing strategic solutions in Data and Analytics. She holds a Master’s degree in Data Science and is enthusiastic about exploring new methodologies to solve complex business problems.