With millions of conversations taking place on the internet at any moment, there are innumerable ways to create and define our own communities on the web. These platforms benefit individuals and communities, but they can also be breeding grounds for conflict and anti-social behaviour.
There is not enough research on how communities and users interact online, especially when it comes to conflict. A team from the computer science and linguistics departments of Stanford University set out to change that. What they did: they studied conflict events by searching for cases where one community posted a hyperlink to another community. What they found: conflicts tend to be initiated by a handful of communities, with less than 1% of communities starting 74% of conflicts. In the long term, conflicts have adverse effects and reduce the overall activity of users in the targeted communities. The researchers also came up with a way to predict conflicts between web communities.
Data and Communities
The researchers used data from Reddit, which is organised into communities called subreddits, such as ‘r/Documentaries’ or ‘r/StarWars’. They used 40 months of Reddit post and comment data, from January 2014 to April 2017. Reddit has no explicit labels for inter-community interaction, so the researchers looked at cross-links instead, since users who follow a cross-link can be encouraged to participate in the linked post. After removing overlapping cross-links, the researchers obtained 137,113 cross-links between 36,000 communities, which they then analysed.
Cross-linking is the act of sharing a post from one discussion thread in another discussion thread. Because Reddit is a platform built on discussion threads, the researchers needed a way to match posts on different threads in order to study the effect of a cross-link. For each post p that they analysed, they selected a matched post from the same community that was created closest in time to p. They also defined the members of the source community, S, and the target community, T.
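The matching step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Post record, its field names and the find_matched_post helper are hypothetical and are not taken from the researchers' code. The only rule it implements is the one described above, namely picking the post from the same community that was created closest in time to p.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Post:
    """Hypothetical minimal post record (field names are assumptions)."""
    post_id: str
    community: str
    created_at: float  # creation time in seconds since the epoch


def find_matched_post(p: Post, posts: Iterable[Post]) -> Optional[Post]:
    """Return the other post from p's community created closest in time to p.

    Returns None when p's community has no other post to match against.
    """
    candidates = [
        q for q in posts
        if q.community == p.community and q.post_id != p.post_id
    ]
    if not candidates:
        return None
    # The smallest absolute time difference wins.
    return min(candidates, key=lambda q: abs(q.created_at - p.created_at))
```

A matched post of this kind serves as a baseline: it shows how much commenting a comparable post in the same community received around the same time.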
How mobilisation works
The researchers believed that some cross-links lead to mobilisation. A mobilisation is defined as a case where a cross-link leads to an increase in the number of comments made by current source members on the discussion thread of the target post. To identify such cases, the researchers set out to build a model that measured the normal rate of commenting and the increase in comments due to the cross-link.
The researchers compared the number of comments made by source members on the two threads within a 12-hour window before and after the cross-link was created. To further categorise interactions, they classified cross-links by the sentiment of the source post, since negative source sentiment turns out to be a very important component of intergroup conflict. Crowdsourced workers on Mechanical Turk were asked to label the sentiment of the source post as negative, neutral or positive. The researchers also defined two terms: “attackers” are the members of the source community, and “defenders” are the members of the target community.
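The window comparison can be illustrated with a short Python sketch. The 12-hour window comes from the article; everything else is an assumption made for illustration, including the function names, the (author, timestamp) comment format and, in particular, the decision rule in looks_like_mobilisation, which is a simple stand-in and not the researchers' actual statistical model.

```python
from typing import Iterable, Set, Tuple

WINDOW_SECONDS = 12 * 60 * 60  # 12-hour window, as described in the article


def count_source_comments(
    comments: Iterable[Tuple[str, float]],
    source_members: Set[str],
    link_time: float,
    window: float = WINDOW_SECONDS,
) -> Tuple[int, int]:
    """Count comments by source members before and after the cross-link.

    `comments` is an iterable of (author, timestamp) pairs for one thread.
    Returns (count_before, count_after) within `window` seconds of link_time.
    """
    before = after = 0
    for author, timestamp in comments:
        if author not in source_members:
            continue  # only comments by source-community members count
        if link_time - window <= timestamp < link_time:
            before += 1
        elif link_time <= timestamp <= link_time + window:
            after += 1
    return before, after


def looks_like_mobilisation(
    target_counts: Tuple[int, int],
    matched_counts: Tuple[int, int],
) -> bool:
    """Hypothetical rule: the increase on the target thread must exceed
    the increase (if any) on the matched baseline thread."""
    target_increase = target_counts[1] - target_counts[0]
    matched_increase = matched_counts[1] - matched_counts[0]
    return target_increase > max(0, matched_increase)
```

Comparing against a second thread is what separates an increase caused by the cross-link from the community's ordinary commenting rate.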
Initiations of negative mobilisations
The researchers wanted to know which communities and users tend to initiate negative mobilisations, and which communities they target. Looking at the properties of conflict-initiating communities, they found that less than 0.1% of source communities are responsible for 38% of negative mobilisations, and less than 1% are responsible for 74%. In other words, a small number of communities account for most of this anti-social behaviour, which suggests that banning particular members can be effective. The researchers also found that negative mobilisations tend to occur between highly similar communities (based on tf-idf similarity) and are focussed on a few topics.
In communities made up of thousands or even millions of users, participation varies widely, and a small number of participants are responsible for most of the activity. Examining the individual users who created the cross-links, the researchers found that these creators are 10% more active in the source community. Further, users who succeed in mobilising others are significantly more active than those who do not. The sentiment analysis also shows that attackers expressed more ‘anger’ in their past comments.
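For readers unfamiliar with tf-idf similarity, here is a minimal Python sketch of how two communities could be compared. The article does not say how the researchers tokenised text or weighted terms, so the smoothed idf formula, the function names and the token-list input format below are all assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Build one sparse tf-idf vector per community.

    `docs` maps a community name to the list of tokens seen in it.
    Uses a smoothed idf, log((1 + N) / (1 + df)) + 1 (an assumed choice).
    """
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per community
    vectors = {}
    for name, tokens in docs.items():
        term_freq = Counter(tokens)
        vectors[name] = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        }
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Two communities that share many distinctive terms score close to 1, while communities with no vocabulary in common score 0.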
Prediction Of Mobilisations
After analysing the impact of mobilisations, the researchers showed that negative mobilisations can have a long-term adverse impact. The study also shows how a negative mobilisation could be predicted as soon as the cross-linking post is created. The deep learning model built by the researchers leverages both the text of the post and the user-to-community interaction network. According to the researchers, the model significantly outperforms the feature-based baseline at predicting whether a cross-linking post will mobilise users.
The researchers convert text, users and communities into 300-dimensional embeddings, which are used as input to the deep learning model, a “socially primed” LSTM that combines social and textual data. Specifically, the input sequence is the concatenation of:
(i) the user embedding of the post author,
(ii) the community embeddings of the source and target communities,
(iii) the word embeddings of the post text.
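The input described in the three items above can be sketched as follows. This is an illustration only: the function name and the plain-list representation are assumptions, and placing the social embeddings ahead of the word embeddings simply follows the order in which the article lists them. The 300-dimensional size is the figure given in the article.

```python
from typing import List, Sequence

EMBEDDING_DIM = 300  # embedding size stated in the article


def build_input_sequence(
    user_vec: Sequence[float],
    source_vec: Sequence[float],
    target_vec: Sequence[float],
    word_vecs: Sequence[Sequence[float]],
    dim: int = EMBEDDING_DIM,
) -> List[List[float]]:
    """Concatenate social and textual embeddings into one input sequence.

    Order (assumed from the list in the article): post author, source
    community, target community, then one vector per word of the post.
    """
    sequence = [list(user_vec), list(source_vec), list(target_vec)]
    sequence.extend(list(word) for word in word_vecs)
    for position, vector in enumerate(sequence):
        if len(vector) != dim:
            raise ValueError(
                f"vector at position {position} has {len(vector)} "
                f"dimensions, expected {dim}"
            )
    return sequence
```

Because every element has the same dimensionality, a sequence built this way can be fed step by step to a recurrent model, with the three social vectors seen before any word of the post.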
The researchers built several machine learning prediction models, and the resulting socially primed LSTM achieves a high AUC of 0.76 for predicting whether a post will mobilise users. This approach outperforms a number of strong baselines and could be used to create a 'raid' early-warning system that alerts moderators to a potential impending influx of toxic users.
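To make the AUC figure concrete, the sketch below computes AUC directly from its pairwise definition: the probability that a randomly chosen mobilising post receives a higher score than a randomly chosen non-mobilising one. The function name and the toy inputs are assumptions for illustration, and this is not the researchers' evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed over all positive/negative pairs.

    `labels` holds 1 (post mobilised users) or 0 (it did not); `scores`
    holds the model's predicted score for each post. Ties count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

On this scale 0.5 corresponds to random guessing and 1.0 to a perfect ranking, so an AUC of 0.76 means the model ranks a mobilising post above a non-mobilising one about three times out of four.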
As a thorough data geek, most of Abhijeet's day is spent in building and writing about intelligent systems. He also has deep interests in philosophy, economics and literature.