Tech Behind Storywrangler, The Analytics Tool Crawling Billions Of Social Media Posts

University of Vermont (UVM) researchers recently unveiled a new tool called Storywrangler to visualise the use of billions of words, hashtags and emoji posted on Twitter. The code is available on GitHub.

In a research paper, “Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter,” researchers from the University of Vermont, in collaboration with Charles River Analytics and MassMutual Data Science, detailed the workings of the tool, which draws on a curated collection of over 150 billion tweets, containing 1 trillion 1-grams, posted between 2008 and 2021.

The researchers highlighted the tool’s potential with use cases covering social amplification, the sociotechnical dynamics of famous individuals, box office success, and social unrest.

How does it work? 

The team has broken down tweets into 1-, 2-, and 3-grams across 100+ languages, generating daily frequencies for words, hashtags, numerals, handles, symbols, and emojis. A 1-gram, or unigram, is a single word. Similarly, a 2-gram, or bigram, is a two-word sequence, a 3-gram, or trigram, is a three-word sequence, and so on.
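To make the idea concrete, here is a minimal Python sketch of counting n-grams per day. It is an illustration only, not the researchers’ pipeline: the function names and sample posts are hypothetical, and the plain whitespace split is a simplifying assumption, since a real tokenizer would need to handle emojis, hashtags, handles and many languages.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def daily_ngram_counts(posts, n):
    """Map each day to a Counter of n-gram frequencies.

    `posts` is an iterable of (day, text) pairs. Tokenization is a plain
    lowercase whitespace split, which is an assumption made for brevity.
    """
    counts = {}
    for day, text in posts:
        tokens = text.lower().split()
        counts.setdefault(day, Counter()).update(ngrams(tokens, n))
    return counts


# Hypothetical sample posts, made up for illustration.
posts = [
    ("2020-05-26", "black lives matter"),
    ("2020-05-26", "Black Lives Matter protests"),
    ("2020-05-27", "#blm protests continue"),
]

bigrams = daily_ngram_counts(posts, 2)
print(bigrams["2020-05-26"].most_common(2))
```

Calling the same function with n set to 1 or 3 gives daily unigram or trigram counts for the same posts.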

For example, a visual from the tool’s online viewer highlights three global events from 2020: the death of Iranian general “Qasem Soleimani”; the beginning of the “Covid-19” pandemic; and the “Black Lives Matter” protests following the murder of “George Floyd”.

(Source: arXiv)

“We make the dataset available through an interactive time series viewer and as downloadable time series and daily distributions,” said the UVM researchers. 

Though Storywrangler leverages Twitter data, its method of tracking dynamic changes in n-grams can be extended to any evolving corpus.

Thayer Alshaabi, a researcher from UVM, said, “It is like a telescope to look, in real time, at all this data that people share on social media. We hope people will use it themselves, in the same way you might look up at the stars and ask your own questions.”

Why Twitter? 

Powered by UVM’s supercomputer at the Vermont Advanced Computing Core, Storywrangler provides a powerful lens for viewing and analysing the rise and fall of words, ideas, and tweets each day. “It is important because it shows major discourses as they are happening,” said Jane L. Adams. “It is quantifying collective attention.”

“Though Twitter does not represent the whole of humanity, it is used by a very large and diverse group of people, which means that it encodes popularity and spreading,” noted the researchers. 

Interestingly, the researchers showed the tool could be used to predict political and financial turmoil. The team examined the percent change in the use of the words ‘rebellion’ and ‘crackdown’ in various regions of the world and found that the rise and fall of these terms were significantly associated with changes in a well-established index of geopolitical risk for those locations.
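As a rough illustration of what a percent change in word use means, the Python sketch below turns daily counts into relative frequencies and then computes the day-over-day change. The counts, totals and function names are hypothetical, and the article does not say exactly how the researchers computed their measure, so this should be read as one plausible reading rather than their method.

```python
def relative_frequency(count, total):
    """Share of all n-grams on a day taken up by one term."""
    return count / total if total else 0.0


def percent_change(series):
    """Percent change between consecutive values.

    Returns None where the previous value is zero, since the change
    is undefined there.
    """
    changes = []
    for prev, curr in zip(series, series[1:]):
        changes.append(None if prev == 0 else 100.0 * (curr - prev) / prev)
    return changes


# Hypothetical daily counts of one term and of all 1-grams.
term_counts = [20, 30, 15]
total_counts = [1000, 1000, 1000]

freqs = [relative_frequency(c, t) for c, t in zip(term_counts, total_counts)]
print(percent_change(freqs))
```

Working with relative frequency rather than raw counts keeps the measure comparable across days with different overall posting volumes.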

Christopher M. Danforth, a professor in UVM’s computer science department, said Storywrangler offers a data-driven way to index what regular people are talking about in everyday conversations, not just what authors or reporters have chosen.

Storywrangler aims to enable research in computational social science, data journalism, natural language processing, and the digital humanities.

UVM’s Danforth said a hashtag is being invented every second. “We did not know to look for that yesterday, but it will show up in the data and become part of the story.” 

Wrapping up 

With support from the National Science Foundation, the UVM team is currently using Twitter to demonstrate how chatter on distributed social media can act as a kind of global sensor system for what happened, how people reacted, and what may come next.

In theory, other social media streams, including Reddit, 4chan and Weibo, can also be used to feed Storywrangler.

Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
