How Microsoft Set A New Benchmark To Track Fake News

Researchers from Microsoft, along with a team from Arizona State University, have published work that outperforms current state-of-the-art models for detecting fake news. Misinformation has been produced and promoted since time immemorial, but the ease of access provided by the internet has made fake news rampant, and it has undermined healthy conversation.

Because news events evolve rapidly and annotated data is scarce, state-of-the-art fake news detectors struggle with early detection: the large numbers of annotated training instances they need are hard to come by that early.


In this work, the authors exploited multiple weak signals from different user engagements. They call this approach multi-source weak social supervision or MWSS. They have leveraged limited amounts of clean data along with weak signals from social engagements to train deep neural networks in a meta-learning framework to estimate the quality of different weak instances. 

Experiments on real-world datasets demonstrate that the proposed framework outperforms state-of-the-art baselines for early detection of fake news without using any user engagements at prediction time.

How Meta Learning Came To The Rescue

Fake news is diverse in topic, content, publishing method and media platform, and it often uses sophisticated linguistic styles designed to emulate accurate news. Consequently, training machine learning models on such content requires large-scale annotated fake news data, which is extremely difficult to obtain.

It is also important to detect fake news early. Most research on fake news detection relies on signals that take a long time to aggregate, making them unsuitable for early detection.

Prior work on detecting fake news relies on large numbers of labelled instances to train supervised models. Such large labelled training sets are difficult to obtain in the early phase of fake news detection.

To overcome these issues, the authors devised a model that leverages a small amount of data that is annotated manually, and a large amount of weakly annotated data for joint training in a meta-learning framework. 

The model learns to estimate the respective contributions of the clean and weakly labelled data to the end task. To model the weights of the weak labels, the authors developed a Label Weighting Network (LWN), which monitors the learning process of the fake news classifier.
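To make the idea of weighting weak labels concrete, here is a minimal sketch of a training loss in which every weakly labelled instance carries a weight between 0 and 1. It is an illustration under stated assumptions, not the authors' implementation: the function names, the use of binary cross-entropy, and the way the clean and weak terms are summed are all hypothetical choices, and the weights are passed in by hand rather than produced by a Label Weighting Network.

```python
import math

def binary_cross_entropy(prob, label, eps=1e-12):
    """Cross-entropy of one predicted probability against a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def weighted_training_loss(clean, weak):
    """Combine clean and weighted weak supervision (hypothetical formulation).

    clean: list of (predicted_prob, label) pairs with trusted labels
    weak:  list of (predicted_prob, weak_label, weight) triples, where the
           weight in [0, 1] stands in for what a weighting network would emit
    """
    clean_loss = sum(binary_cross_entropy(p, y) for p, y in clean) / len(clean)
    weak_loss = sum(w * binary_cross_entropy(p, y) for p, y, w in weak) / len(weak)
    return clean_loss + weak_loss
```

A weight of 0 removes a weak instance from the loss entirely, while a weight of 1 treats it like a clean label, so the weights let training trust reliable weak signals more than noisy ones.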

The authors illustrate multi-source weak social supervision (MWSS) as a two-phase procedure:

(a) compute the validation loss based on the validation dataset and retain the computation graph for LWN backward propagation; 

(b) update the classifier's parameters through backward propagation on clean and weakly labelled data.
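The two phases can be sketched on a toy problem. The code below is a heavily simplified illustration, not the MWSS implementation: it assumes a logistic-regression classifier instead of a deep network, one learnable scalar weight per weak source instead of a Label Weighting Network, and a hand-derived gradient through a single look-ahead step instead of a retained computation graph. All function and parameter names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_bce(theta, data):
    """Mean gradient of binary cross-entropy for logistic regression."""
    grad = [0.0] * len(theta)
    for x, y in data:
        err = sigmoid(sum(t * xi for t, xi in zip(theta, x))) - y
        for j, xi in enumerate(x):
            grad[j] += err * xi / len(data)
    return grad

def classifier_step(theta, weights, g_clean, g_weak, lr):
    """One gradient step on the clean loss plus weighted weak-source losses."""
    return [
        t - lr * (g_clean[j] + sum(w * g[j] for w, g in zip(weights, g_weak)))
        for j, t in enumerate(theta)
    ]

def two_phase_step(theta, alphas, clean, weak_sources, val, lr=0.5, meta_lr=1.0):
    """One round of weight update (phase a) and classifier update (phase b).

    alphas holds one raw score per weak source; sigmoid(alpha) is its weight.
    """
    g_clean = grad_bce(theta, clean)
    g_weak = [grad_bce(theta, src) for src in weak_sources]
    weights = [sigmoid(a) for a in alphas]

    # Phase (a): take a look-ahead classifier step, measure the validation
    # gradient there, and push it back to the source weights.
    lookahead = classifier_step(theta, weights, g_clean, g_weak, lr)
    g_val = grad_bce(lookahead, val)
    new_alphas = []
    for a, w, g in zip(alphas, weights, g_weak):
        # d(val loss)/d(weight) = -lr * <g_val, g_source>; the factor
        # w * (1 - w) is the derivative of the sigmoid.
        d_weight = -lr * sum(gv * gk for gv, gk in zip(g_val, g))
        new_alphas.append(a - meta_lr * d_weight * w * (1.0 - w))

    # Phase (b): update the classifier on clean and re-weighted weak data.
    new_weights = [sigmoid(a) for a in new_alphas]
    new_theta = classifier_step(theta, new_weights, g_clean, g_weak, lr)
    return new_theta, new_alphas
```

In this sketch a weak source whose gradient points the same way as the validation gradient has its weight raised, and a source that pulls the classifier away from the validation data has its weight lowered, which is the intuition behind letting the weighting component monitor the classifier's learning.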

For their experiments, the authors used the FakeNewsNet benchmark, which contains news content from GossipCop and PolitiFact. The data is annotated by professional journalists and experts and comes with social context information.

News content includes meta attributes of the news (e.g., body text), whereas social context includes related users' social engagements with the news items (e.g., user comments on Twitter).

Key Findings

The authors have observed the following from their experiments:

  • Training only on clean data consistently outperforms training only on weakly labelled data across all datasets
  • Adding weakly labelled data to the annotated clean data improves classification performance over using clean labels alone
  • Merely merging the clean and weak sources of supervision without accounting for their reliability may not improve prediction performance
  • The MWSS model learns not only the importance of different instances but also the importance of the corresponding source
  • Powered by meta-learning with a Label Weighting Network, MWSS outperforms state-of-the-art baselines without using any user engagements at prediction time

Read the original paper here.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
