Tech Behind Storywrangler, The Analytics Tool Crawling Billions Of Social Media Posts


University of Vermont (UVM) researchers recently unveiled a new tool called Storywrangler that visualises the use of billions of words, hashtags and emoji posted on Twitter. The code is available on GitHub.

In the research paper “Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter,” researchers from the University of Vermont, in collaboration with Charles River Analytics and MassMutual Data Science, describe how the tool works. It curates over 150 billion tweets, containing 1 trillion 1-grams, posted between 2008 and 2021.



The researchers highlight the tool’s potential through use cases covering social amplification, the sociotechnical dynamics of famous individuals, box office success, and social unrest.

How does it work? 

The team has broken tweets down into 1-, 2-, and 3-grams across 100+ languages, generating daily frequencies for words, hashtags, numerals, handles, symbols, and emojis. A 1-gram, or unigram, is a single word. Similarly, a 2-gram, or bigram, is a sequence of two words, a 3-gram, or trigram, is a sequence of three words, and so on.
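To make the idea of daily n-gram frequencies concrete, here is a minimal Python sketch. It is not the researchers' pipeline: the whitespace tokenizer, the function names `ngrams` and `daily_ngram_counts`, and the three sample posts are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def daily_ngram_counts(posts, n):
    """Count n-grams per day; `posts` is an iterable of (day, text) pairs."""
    counts = defaultdict(Counter)
    for day, text in posts:
        # Naive lowercase whitespace split (an assumption); a real system
        # would need a tokenizer that handles emoji, hashtags and many languages.
        tokens = text.lower().split()
        counts[day].update(ngrams(tokens, n))
    return counts


# Made-up sample posts, for illustration only.
posts = [
    ("2020-01-03", "breaking news about the general"),
    ("2020-01-03", "more breaking news today"),
    ("2020-01-04", "news about the pandemic"),
]

unigrams = daily_ngram_counts(posts, 1)
bigrams = daily_ngram_counts(posts, 2)
print(bigrams["2020-01-03"].most_common(1))
```

Keeping one counter per day means each n-gram's count can be read off day by day, which is the kind of time series the viewer plots.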

For example, visuals from the tool’s online viewer highlight three global events from 2020: the death of Iranian general “Qasem Soleimani”; the beginning of the “Covid-19” pandemic; and the “Black Lives Matter” protests following the murder of “George Floyd”.

(Source: arXiv)

“We make the dataset available through an interactive time series viewer and as downloadable time series and daily distributions,” said the UVM researchers. 

Though Storywrangler leverages Twitter data, the researchers’ method of tracking dynamic changes in n-grams can be extended to any evolving corpus.

Thayer Alshaabi, a researcher from UVM, said, “It is like a telescope to look, in real time, at all this data that people share on social media. We hope people will use it themselves, in the same way you might look up at the stars and ask your own questions.”

Why Twitter? 

Powered by UVM’s supercomputer at the Vermont Advanced Computing Core, Storywrangler provides a powerful lens for viewing and analysing the rise and fall of words, ideas, and tweets each day. “It is important because it shows major discourses as they are happening,” said Jane L. Adams. “It is quantifying collective attention.”

“Though Twitter does not represent the whole of humanity, it is used by a very large and diverse group of people, which means that it encodes popularity and spreading,” noted the researchers. 

Interestingly, the researchers showed the tool could be used to predict political and financial turmoil. The team examined the percent change in the use of the words ‘rebellion’ and ‘crackdown’ in various regions of the world and found that the rise and fall of these terms was significantly associated with changes in a well-established index of geopolitical risk for those locations.
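The kind of comparison described above can be sketched in a few lines of Python. This is a simplified stand-in, not the statistical method used in the paper: it computes period-over-period percent change and a plain Pearson correlation. The two series are invented toy numbers, not real word frequencies or real index values.

```python
from math import sqrt


def percent_change(series):
    """Percent change between consecutive values: 100 * (b - a) / a."""
    return [100.0 * (b - a) / a for a, b in zip(series, series[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy numbers for illustration only (hypothetical, not from the dataset).
word_frequency = [1.0, 2.0, 3.0, 1.5]      # e.g. monthly frequency of one word
risk_index = [100.0, 110.0, 130.0, 120.0]  # e.g. monthly value of a risk index

word_change = percent_change(word_frequency)
risk_change = percent_change(risk_index)
r = pearson(word_change, risk_change)
print(word_change, round(r, 2))
```

A correlation on its own says nothing about statistical significance; a real analysis would need far more data points and a proper test or regression model.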

Christopher M. Danforth, a professor in UVM’s computer science department, said Storywrangler offers a data-driven way to index what regular people are talking about in everyday conversations, not just what authors or reporters have chosen to write about.

Storywrangler aims to enable research in computational social science, data journalism, natural language processing, and the digital humanities.

UVM’s Danforth said a hashtag is being invented every second. “We did not know to look for that yesterday, but it will show up in the data and become part of the story.” 

Wrapping up 

With support from the National Science Foundation, the UVM team is currently using Twitter to demonstrate how chatter on distributed social media can act as a kind of global sensor system that registers what happened, how people reacted, and what’s next.

In theory, other social media streams, including Reddit, 4chan and Weibo, can also be used to feed Storywrangler.


Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
