Collating Reddit posts of more than 800,000 users from 2018 to 2020, the researchers used various NLP techniques like trend analysis, supervised learning and unsupervised learning to characterise changes in the language used by mental health support groups.
The study performed classification among mental health subreddits (forums dedicated to a specific topic on Reddit) as well as non-mental health subreddits and identified important features that help understand how each mental health problem may manifest in language.
The trend analysis monitored COVID-19 related tokens (words, characters, or subwords) across subreddits and language features from January to April, to observe patterns. The researchers measured “how much the posts were about COVID-19” compared to the total number of words.
The result showed an early spike in the ‘healthanxiety’ subreddit as pandemic-related posts showed up in January, even before any other subreddits were posting about a possible pandemic. This evidence supports concerns regarding the prolonged stress that people with health anxiety may be experiencing.
As the pandemic progressed, a significant decrease in the number of posts was observed, which could mean that users avoid social media due to a perceived increase in news and discussions around the pandemic. This can be concerning as individuals are not using online means to seek support during difficult times, said the study.
Throughout subreddits, the study found a significant increase in tokens related to isolation like ‘lonely’, ‘can’t see anyone’, ‘quarantine’ and economic stress like ‘debt’, ‘rent’. At the same time, it observed a decrease in lexicons related to motion like ‘walk’, ‘visit’, ‘travel’.
Another sign of concern was the increasing use of negative semantics across subreddits. This was predominantly higher in the attention-deficit/hyperactivity disorder (ADHD), eating disorder, anxiety and depression subreddits and other non-mental health-related subreddits like personal finance, relationship and finance groups.
For unsupervised learning, the study used clustering. It grouped posts into clusters such as ‘loneliness’ or ‘substance use’ and then tracked how those groups changed as the pandemic progressed.
Natural clusters emerged related to ‘suicidality’ and ‘loneliness’ over the period. The number of posts in these clusters more than doubled during the pandemic as compared to the same months of the preceding year. The researchers also found the introduction of new topics specifically seeking ‘mental health help’ or ‘social interaction’.
The analysis also identified various cluster-subreddit pairings to identify which clusters enriched which subreddits. While the language mostly remained the same within clusters, pre and mid-pandemic, the variation was seen in terms of proximity of discussions to the respective subreddit. For instance, the ‘addiction’ subreddit was enriched with “Substance Use Alcohol” and “Substance Use Marijuana” clusters.
This analysis used Latent Dirichlet Allocation (LDA) to determine tokens that frequently appeared together in posts for a topic across mental health subreddits before the pandemic. Multiple models were created to assess the topic stability across and within subreddits, based on the token words of these topics.
While the topics that emerged from the pre-pandemic LDA largely matched the topics discussed before the pandemic, some of the tokens used for the topics discussed in subreddits saw a shift post-pandemic. For instance, autism and ADHD were always combined together into the same topic with tokens mostly related to school and work. The mid-pandemic LDA model split the tokens related to ADHD and autism.
The analysis also measured the similarity between two mental-health subreddits using LDA. Overall, it found that the more a subreddit discussed the COVID-19 pandemic, more it became similar to the ‘healthanxiety’ subreddit, which can suggest that symptoms of health anxiety might have increased in the psychiatric populations.
As COVID-19 pandemic impacts mental health, this type of analysis can help in identifying how people with different types of mental health problems have been impacted differently. Based on the study, one can infer the kind of changes that are seen in languages among different mental health patients when situations like pandemic arise.
Going forward, if applied to Reddit, this study can help in identifying the right paths of treatment and make resources available accordingly.