Google AI Releases Data Model To Predict Text Readability From Scrolling Interactions

This approach offers insight into subjective readability, that is, whether an individual reader found a text accessible, and shows that existing readability models can be improved by adding feedback from scroll-based reading interactions.

Google AI recently announced the release of its data model for predicting text readability from scrolling interactions. The model shows that data from on-device reading interactions can be used to predict how readable a text is. The approach offers insight into subjective readability (whether an individual reader has found a text accessible) and demonstrates that existing readability models can be improved by including feedback from scroll-based reading interactions. To encourage research in this area and to help enable more personalized tools for language learning and text simplification, Google AI also released the dataset of reading interactions, generated from a scrolling-behaviour-based readability assessment of English-language texts.

Traditional machine learning approaches to measuring readability have relied exclusively on linguistic features of the text. However, these features alone do not work well for online content, which often contains abbreviations, emojis, broken text, and short passages that degrade the performance of readability models.

To address this, Google investigated whether aggregate data about the reading interactions of a group can be used to predict how difficult a text is, and how reading interactions differ with a reader's understanding. When reading on a device, readers typically interact with the text by scrolling vertically, which the researchers hypothesized could serve as a coarse proxy for reading comprehension. They recorded reading interactions by measuring features of participants' scrolling behaviour, such as speed, acceleration, and the number of times areas of text were revisited. This information was then used to produce a set of features for a readability classifier.
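To make the idea of scroll-derived features concrete, here is a minimal sketch of how speed, acceleration, and revisit counts might be computed from a single reading session. The trace format (a list of timestamp and scroll-offset pairs), the `ScrollFeatures` class, and the `extract_features` function are all assumptions made for illustration; the article does not describe how Google logged or processed its data.

```python
from dataclasses import dataclass


@dataclass
class ScrollFeatures:
    total_read_time: float    # seconds between first and last sample
    max_speed: float          # fastest scroll in px/s, direction ignored
    mean_speed: float
    max_acceleration: float   # px/s^2
    min_acceleration: float
    mean_acceleration: float
    revisits: int             # times the reader started scrolling back up


def extract_features(trace):
    """trace: list of (time_s, offset_px) samples with strictly increasing times.

    Hypothetical input format, chosen only for this illustration.
    """
    if len(trace) < 3:
        raise ValueError("need at least three samples to estimate acceleration")

    pairs = list(zip(trace, trace[1:]))
    # Signed velocity over each interval; negative means scrolling back up.
    velocities = [(y1 - y0) / (t1 - t0) for (t0, y0), (t1, y1) in pairs]
    # Midpoint of each interval, used as the time stamp of its velocity.
    mids = [(t0 + t1) / 2 for (t0, _), (t1, _) in pairs]
    # Change in velocity between neighbouring intervals.
    accels = [
        (velocities[i + 1] - velocities[i]) / (mids[i + 1] - mids[i])
        for i in range(len(velocities) - 1)
    ]
    speeds = [abs(v) for v in velocities]
    # A revisit is counted each time downward (or paused) scrolling turns upward.
    revisits = sum(
        1 for v0, v1 in zip(velocities, velocities[1:]) if v0 >= 0 and v1 < 0
    )

    return ScrollFeatures(
        total_read_time=trace[-1][0] - trace[0][0],
        max_speed=max(speeds),
        mean_speed=sum(speeds) / len(speeds),
        max_acceleration=max(accels),
        min_acceleration=min(accels),
        mean_acceleration=sum(accels) / len(accels),
        revisits=revisits,
    )


# Example session: the reader scrolls down, jumps back up once, then continues.
session = [(0, 0), (1, 100), (2, 300), (3, 200), (4, 400)]
print(extract_features(session))
```

A real system would likely need to smooth noisy scroll events and handle pauses, which this sketch ignores.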


The research team tested whether the differences in scrolling behaviour between text levels were significant using linear mixed-effects models. Such models give higher confidence that the observed differences in interactions are due to text difficulty and not to other random effects.
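For readers unfamiliar with the technique, a linear mixed-effects model of this kind can be written roughly as follows. The exact specification is not given in the article, so the single fixed effect for text level and the per-participant random intercept below are assumptions made for illustration.

```latex
% y_{ij}: a scroll measure (e.g. maximum speed) for participant i on text j
% level_j: 0 for an elementary text, 1 for an advanced text
% u_i: random intercept capturing participant i's habitual scrolling style
y_{ij} = \beta_0 + \beta_1 \,\mathrm{level}_j + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Under this sketch, a measure differs significantly by text level when the estimate of \beta_1 is reliably different from zero after the participant-level variation u_i has been accounted for.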

The results showed that multiple reading behaviours differed significantly with text level, for example the average, maximum, and minimum scrolling acceleration. The most significant features were the total read time and the maximum reading speed.

These features were then used as inputs to a machine learning algorithm. The team designed and trained a support vector machine (here, a binary classifier) to predict whether a text is advanced or elementary based only on the scrolling behaviour of individuals as they interacted with it. The dataset on which the model was trained contains 60 articles, each of which was read by an average of 17 participants. From these interactions, the team produced aggregate features by taking the mean of the significant measures across participants.
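The aggregation step described above can be sketched in a few lines. The record layout, the field names (`article_id`, `read_time`, `max_speed`), and the `aggregate_by_article` helper are hypothetical; they only illustrate taking the mean of each measure across the participants who read the same article.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_article(sessions, feature_names):
    """Average each feature over all sessions that belong to the same article.

    sessions: list of dicts, one per (participant, article) reading session.
    Returns {article_id: {feature_name: mean_value}}.
    """
    grouped = defaultdict(list)
    for session in sessions:
        grouped[session["article_id"]].append(session)

    return {
        article_id: {
            name: mean(row[name] for row in rows) for name in feature_names
        }
        for article_id, rows in grouped.items()
    }


# Illustrative data only: three sessions over two articles.
sessions = [
    {"article_id": "a1", "read_time": 60, "max_speed": 300},
    {"article_id": "a1", "read_time": 80, "max_speed": 500},
    {"article_id": "a2", "read_time": 120, "max_speed": 200},
]
aggregated = aggregate_by_article(sessions, ["read_time", "max_speed"])
print(aggregated)
```

Each article's averaged feature vector could then be passed, together with its elementary or advanced label, to a binary classifier such as a support vector machine; that training step is not shown here.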

The accuracy of the approach was measured using a metric called f-score, which reflects how well the model classifies a text as either "easy" or "difficult" (where 1.0 reflects perfect classification). The team achieved an f-score of 0.77 on this task using interaction features alone. According to the team, this is the first work to show that it is possible to predict the readability of a text using only interaction features. Adding interaction features to an existing readability model also improved that model's f-score from 0.84 to 0.88. In addition, the team significantly outperformed that system by combining interaction information with simple vocabulary features, such as the number of words in the text, achieving an f-score of 0.96.
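For reference, the sketch below shows how an f-score is typically computed for a two-class task like this one. It assumes the balanced F1 variant (the harmonic mean of precision and recall), which the article does not specify, and the labels and predictions are made up for illustration.

```python
def f1_score(y_true, y_pred, positive="difficult"):
    """Balanced F-score (F1) for one class, treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)  # share of predicted "difficult" that were right
    recall = tp / (tp + fn)     # share of truly "difficult" texts that were found
    return 2 * precision * recall / (precision + recall)


# Made-up labels for five texts.
truth = ["difficult", "difficult", "difficult", "easy", "easy"]
predicted = ["difficult", "difficult", "easy", "difficult", "easy"]
print(round(f1_score(truth, predicted), 2))
```

A score of 1.0 is reached only when every text is classified correctly, which matches the article's description of the scale.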

Understanding how readers interact with texts of different difficulty is crucial when designing educational applications for low-proficiency readers and language learners, because it can be used to match learners with appropriately levelled texts and to support readers in understanding texts beyond their reading level.

Through this work, the research team confirms that there are statistically significant differences in the way readers interact with advanced and elementary texts, and that the comprehension scores of individuals correlate with specific measures of scrolling interaction.

Victor Dey
Victor is an aspiring Data Scientist & is a Master of Science in Data Science & Big Data Analytics. He is a Researcher, a Data Science Influencer and also an Ex-University Football Player. A keen learner of new developments in Data Science and Artificial Intelligence, he is committed to growing the Data Science community.
