Researchers from Microsoft and Arizona State University have published a method that outperforms the current state-of-the-art models for detecting fake news. Misinformation has existed since time immemorial, but the easy access the internet provides has made fake news rampant, and it now undermines healthy conversation.
News events evolve rapidly and annotated data is scarce, so state-of-the-art fake news detection systems struggle with early detection, when large numbers of annotated training instances are hardest to come by.
In this work, the authors exploit multiple weak signals from different user engagements, an approach they call multi-source weak social supervision (MWSS). They combine a limited amount of clean data with weak signals from social engagements to train deep neural networks in a meta-learning framework that estimates the quality of the different weak instances.
Experiments on real-world datasets demonstrate that the proposed framework outperforms state-of-the-art baselines for early detection of fake news without using any user engagements at prediction time.
How Meta Learning Came To The Rescue
Detecting fake news is hard for two reasons. First, fake news is diverse in terms of topics, content, publishing methods and media platforms, and it often adopts sophisticated linguistic styles geared to emulate accurate news. Training machine learning models on such content therefore requires large-scale annotated fake news data, which is extremely difficult to obtain.
Second, it is important to detect fake news early. Most research on fake news detection relies on signals that take a long time to aggregate, making them unsuitable for early detection.
Prior work on detecting fake news relies on large numbers of labelled instances to train supervised models. Labelled training data on that scale is difficult to obtain in the early phase of fake news detection.
To overcome these issues, the authors devised a model that leverages a small amount of data that is annotated manually, and a large amount of weakly annotated data for joint training in a meta-learning framework.
The framework learns to estimate the respective contributions of the clean and weakly labelled data so as to optimize for the end task. To model the weights of the weak labels, the authors develop a Label Weighting Network (LWN), which monitors the learning process of the fake news classifier.
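The idea of weighting weak labels can be illustrated with a small sketch. This is not the authors' code: the function names, the toy probabilities and the hand-picked weights are all assumptions made for illustration, and in MWSS the weights would come from the Label Weighting Network rather than being set manually. The sketch only shows how a per-instance weight changes the contribution of a weak label to a joint loss.

```python
import math

def bce(p, y):
    """Binary cross-entropy for one predicted probability p and label y (1 = fake)."""
    eps = 1e-12  # guards against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def joint_loss(clean, weak, weak_weights):
    """Mean loss on clean items plus the weighted mean loss on weakly labelled items.

    clean, weak: lists of (predicted_probability, label) pairs.
    weak_weights: one weight in [0, 1] per weak item (hypothetical stand-in
    for what a label weighting network would output).
    """
    clean_term = sum(bce(p, y) for p, y in clean) / len(clean)
    weak_term = sum(w * bce(p, y) for w, (p, y) in zip(weak_weights, weak)) / len(weak)
    return clean_term + weak_term

# Toy predictions (assumed values): the second weak label disagrees with the classifier.
clean = [(0.9, 1), (0.2, 0)]
weak = [(0.8, 1), (0.7, 0)]

trusting = joint_loss(clean, weak, [1.0, 1.0])   # every weak label fully trusted
sceptical = joint_loss(clean, weak, [1.0, 0.1])  # suspicious weak label down-weighted
clean_only = joint_loss(clean, weak, [0.0, 0.0]) # weak data ignored entirely
print(trusting, sceptical, clean_only)
```

Down-weighting the doubtful weak label shrinks its pull on the objective without discarding the weak data altogether, which is the middle ground between ignoring weak supervision and merging it blindly.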
The above picture illustrates multi-source weak social supervision (MWSS) in two phases:
(a) compute the validation loss based on the validation dataset and retain the computation graph for LWN backward propagation;
(b) update the classifier and its parameters through backward propagation on clean and weakly labelled data.
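The two phases can be sketched in a heavily simplified form. The code below is an illustration under stated assumptions, not the paper's implementation: the classifier is a one-feature logistic regression, each weak instance gets its own free weight instead of a weight predicted by a Label Weighting Network, and the gradient that a retained computation graph would supply is derived by hand through a single look-ahead step. All names and the toy data are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(theta, x, y):
    """Gradient of binary cross-entropy w.r.t. theta for one (x, y) example."""
    return [(sigmoid(dot(theta, x)) - y) * xi for xi in x]

def mean_grad(theta, data, weights=None):
    """Average gradient over a dataset, optionally weighted per instance."""
    if weights is None:
        weights = [1.0] * len(data)
    grads = [grad(theta, x, y) for x, y in data]
    return [sum(w * g[j] for w, g in zip(weights, grads)) / len(data)
            for j in range(len(theta))]

def meta_step(theta, weights, clean, weak, val, lr=0.5, meta_lr=5.0):
    g_clean = mean_grad(theta, clean)
    g_weak = mean_grad(theta, weak, weights)
    # Phase (a): take a look-ahead classifier step, measure the validation
    # gradient there, and push each weak weight in the direction that lowers
    # the validation loss.
    look = [t - lr * (gc + gw) for t, gc, gw in zip(theta, g_clean, g_weak)]
    g_val = mean_grad(look, val)
    new_weights = []
    for w, (x, y) in zip(weights, weak):
        # Chain rule through the look-ahead step:
        # d(val loss)/d(w_i) = -(lr / n) * <g_val, g_i>
        dw = -(lr / len(weak)) * dot(g_val, grad(theta, x, y))
        new_weights.append(min(1.0, max(0.0, w - meta_lr * dw)))
    # Phase (b): update the classifier on clean plus re-weighted weak data.
    g_weak = mean_grad(theta, weak, new_weights)
    new_theta = [t - lr * (gc + gw) for t, gc, gw in zip(theta, g_clean, g_weak)]
    return new_theta, new_weights

# Toy data (assumed): positive feature means fake (label 1).
clean = [([2.0], 1), ([-2.0], 0)]
val = [([1.5], 1), ([-1.5], 0)]
weak = [([1.0], 1), ([-1.0], 0), ([1.0], 0), ([-1.0], 1)]  # last two labels are wrong

theta, weights = [0.0], [0.5] * len(weak)
for _ in range(20):
    theta, weights = meta_step(theta, weights, clean, weak, val)
print(theta, weights)
```

In this toy run the weights of the two mislabelled weak instances fall while those of the correctly labelled ones rise, because only the latter pull the classifier in a direction that helps on the clean validation set. A real implementation would typically let an automatic differentiation library compute the phase (a) gradient instead of deriving it by hand.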
For their experiments, the authors used the fake news detection benchmark FakeNewsNet, which contains news content from GossipCop and PolitiFact. The data is annotated by professional journalists and experts, and comes with social context information.
News content includes meta attributes of the news (e.g., body text), whereas social context includes related users' social engagements on the news items (e.g., user comments on Twitter).
The authors have observed the following from their experiments:
- Training only on clean data achieves better performance than training only on the weakly labelled data consistently across all datasets
- Adding weakly labelled data to the annotated clean data improves classification performance compared with using clean labels alone
- Merely merging the clean and weak sources of supervision without accounting for their reliability may not improve the prediction performance
- The MWSS model not only learns the importance of different instances, but also learns the importance of the corresponding source
- Powered by meta-learning with a Label Weighting Network, MWSS outperforms state-of-the-art baselines without using any user engagements at prediction time
Read the original paper here.