Google AI Releases Data Model To Predict Text Readability From Scrolling Interactions

This novel approach provides insights into subjective readability, that is, whether an individual reader has found a text accessible, and demonstrates that existing readability models can be improved by including feedback from scroll-based reading interactions.

Google AI recently announced the release of its data model for predicting text readability from scrolling interactions. The model shows that data from on-device reading interactions can be used to predict how readable a text is. The approach provides insights into subjective readability (whether an individual reader has found a text accessible) and demonstrates that existing readability models can be improved by including feedback from scroll-based reading interactions. To encourage research in this area and to help enable more personalized tools for language learning and text simplification, Google AI also released the dataset of reading interactions generated from its scrolling-behaviour-based readability assessment of English-language texts.

Traditional machine learning approaches to measuring readability have relied exclusively on linguistic features of the text. However, these features alone do not work well for online content, which often contains abbreviations, emojis, broken text, and short passages, all of which degrade the performance of readability models.


To address this, Google investigated whether aggregate data about the reading interactions of a group can be used to predict how difficult a text is, as well as how reading interactions may differ based on a reader's understanding. When reading on a device, readers typically interact with the text by scrolling vertically, which the researchers hypothesized could serve as a coarse proxy for reading comprehension. They recorded the reading interactions by measuring different features of the participants' scrolling behaviour, such as the speed, the acceleration, and the number of times areas of text were revisited. This information was then used to produce a set of features for a readability classifier.
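The kind of feature extraction described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' code: the trace format (timestamp in seconds, vertical offset in pixels), the function name `scroll_features`, and the rule that a revisit is a switch from scrolling down to scrolling up are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical trace format: (timestamp in seconds, vertical scroll offset in pixels).
ScrollTrace = List[Tuple[float, float]]


def scroll_features(trace: ScrollTrace) -> Dict[str, float]:
    """Compute coarse scrolling features from one reader's scroll trace."""
    if len(trace) < 3:
        raise ValueError("need at least three samples to estimate acceleration")

    # Signed speed between consecutive samples (pixels per second).
    # Positive means scrolling down the page, negative means scrolling back up.
    speeds = []
    mid_times = []
    for (t0, y0), (t1, y1) in zip(trace, trace[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        speeds.append((y1 - y0) / (t1 - t0))
        mid_times.append((t0 + t1) / 2)

    # Acceleration as the change in speed between consecutive intervals.
    accels = [
        (s1 - s0) / (m1 - m0)
        for (s0, m0), (s1, m1) in zip(
            zip(speeds, mid_times), zip(speeds[1:], mid_times[1:])
        )
    ]

    # Assumption: a "revisit" is counted whenever the reader switches
    # from scrolling down (or pausing) to scrolling back up.
    revisits = sum(1 for a, b in zip(speeds, speeds[1:]) if a >= 0 and b < 0)

    return {
        "total_read_time": trace[-1][0] - trace[0][0],
        "max_speed": max(abs(s) for s in speeds),
        "mean_accel": sum(accels) / len(accels),
        "max_accel": max(accels),
        "min_accel": min(accels),
        "revisits": float(revisits),
    }
```

A real implementation would likely need to smooth noisy scroll events and define revisits in terms of regions of text rather than changes of direction.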

The research team tested the significance of these differences using linear mixed-effect models. Using linear mixed-effect models gives higher confidence that the observed differences in interactions are due to text difficulty and not to other random effects.
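As a rough illustration of the idea, a linear mixed-effect model for a single scrolling measure could take the generic form below. The article does not say which random effects the team used, so the random intercepts for participant and article, and all symbol names, are assumptions.

```latex
y_{ij} = \beta_0 + \beta_1 x_j + u_i + v_j + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad v_j \sim \mathcal{N}(0, \sigma_v^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here y_ij is the scrolling measure (for example, total read time) for participant i on article j, and x_j is 1 for an advanced text and 0 for an elementary one. The fixed effect β_1 is the effect of text level, whose significance is tested. The random terms u_i and v_j absorb variation between participants and between articles, so that this variation is not mistaken for an effect of difficulty.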

The results showed that multiple reading behaviours differed significantly based on the text level, for example, the average, maximum and minimum acceleration of scrolling. The most significant features were the total read time and the maximum reading speed.

These features were then used as inputs to a machine learning algorithm. The team designed and trained a support vector machine (SVM), used here as a binary classifier, to predict whether a text is advanced or elementary based only on the scrolling behaviour of individuals as they interacted with it. The dataset on which the model was trained contains 60 articles, each of which was read by an average of 17 participants. From these interactions, the team produced aggregate features by taking the mean of the significant measures across participants.
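The aggregation step, taking the mean of each measure across the participants who read an article, might look something like the sketch below. The row format and the name `aggregate_by_article` are hypothetical, and the sketch stops short of the classifier: the resulting per-article vectors are what would be passed to an SVM.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# Hypothetical row format: (article_id, features for one participant's reading).
Row = Tuple[str, Dict[str, float]]


def aggregate_by_article(rows: Iterable[Row]) -> Dict[str, Dict[str, float]]:
    """Average each feature over all participants who read the same article."""
    grouped = defaultdict(list)
    for article_id, features in rows:
        grouped[article_id].append(features)

    aggregated = {}
    for article_id, readings in grouped.items():
        # Assumes every reading of an article carries the same feature names.
        feature_names = readings[0].keys()
        aggregated[article_id] = {
            name: mean(reading[name] for reading in readings)
            for name in feature_names
        }
    return aggregated
```

Averaging over readers yields one feature vector per article, which matches the article-level labels (advanced or elementary) the classifier is trained to predict.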

The accuracy of the approach was measured using a metric called f-score, which measures how accurate the model is at classifying a text as either “easy” or “difficult” (where 1.0 reflects perfect classification accuracy). The team achieved an f-score of 0.77 on this task using interaction features alone. This is the first work to show that it is possible to predict the readability of a text using only interaction features. The team also found that adding interaction features to an existing readability model improves its f-score from 0.84 to 0.88. In addition, the team was able to significantly outperform that system by combining interaction information with simple vocabulary features, such as the number of words in the text, achieving an f-score of 0.96.
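For readers unfamiliar with the metric, the sketch below computes an f-score for one class. It assumes the standard F1 definition (the harmonic mean of precision and recall); the article does not say which variant or averaging scheme the team used, and the function name and labels are illustrative only.

```python
from typing import Sequence


def f_score(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "difficult") -> float:
    """F1 score for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly flagged
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # wrongly flagged
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed

    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A classifier that labels every text correctly scores 1.0, while one that finds only half of the difficult texts, with no false alarms, scores about 0.67.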

Understanding how readers interact with texts of different difficulty is crucial when designing educational applications for low-proficiency readers and language learners, because it can be used to match learners with appropriately levelled texts as well as to support readers in understanding texts beyond their reading level.

With this work, the research team confirms that there are statistically significant differences in the way readers interact with advanced and elementary texts, and that the comprehension scores of individuals correlate with specific measures of scrolling interaction.

Victor Dey
Victor is an aspiring Data Scientist and holds a Master of Science in Data Science & Big Data Analytics. He is a Researcher, a Data Science Influencer and also an Ex-University Football Player. A keen learner of new developments in Data Science and Artificial Intelligence, he is committed to growing the Data Science community.
