Google AI recently announced the release of a model for predicting text readability from scrolling interactions. The work shows that data from on-device reading interactions can be used to predict how readable a text is. This approach provides insight into subjective readability, that is, whether an individual reader has found a text accessible, and demonstrates that existing readability models can be improved by including feedback from scroll-based reading interactions. To encourage research in this area and to help enable more personalized tools for language learning and text simplification, Google AI also released the dataset of reading interactions collected during the scrolling-based readability assessment of English-language texts.
Traditional machine learning approaches to measuring readability have relied exclusively on linguistic features of the text itself, such as vocabulary and syntactic structure. However, these features alone do not work well for online content, which often contains abbreviations, emojis, broken text, and short passages that degrade the performance of readability models.
To address this, Google investigated whether aggregate data about the reading interactions of a group can be used to predict how difficult a text is, as well as how reading interactions may differ based on a reader's understanding. When reading on a device, readers typically interact with the text by scrolling vertically, and the team hypothesized that this scrolling could serve as a coarse proxy for reading comprehension. They recorded the reading interactions by measuring different features of the participants' scrolling behaviour, such as the speed, the acceleration, and the number of times areas of text were revisited. This information was then used to produce a set of features for a readability classifier.
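To make the kind of measurement described here concrete, the following is a minimal sketch of how speed, acceleration, and revisit counts might be derived from a single reader's scroll trace. The `ScrollSample` type, its fields, the function name, and the definition of a "revisit" are all assumptions made for illustration; the article does not describe how Google's pipeline actually computes these features.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ScrollSample:
    t: float  # seconds since the text was opened (hypothetical field)
    y: float  # vertical scroll offset in pixels (hypothetical field)


def scroll_features(trace):
    """Summarise one reader's scroll trace into coarse interaction features.

    Assumes at least three samples with strictly increasing timestamps.
    """
    # Signed scroll speed over each interval between samples (px/s);
    # positive values mean the reader is moving down the page.
    speeds = [(b.y - a.y) / (b.t - a.t) for a, b in zip(trace, trace[1:])]

    # Acceleration: change in speed from one interval to the next (px/s^2),
    # divided by the duration of the later interval.
    accels = [
        (speeds[i] - speeds[i - 1]) / (trace[i + 1].t - trace[i].t)
        for i in range(1, len(speeds))
    ]

    # Assumption: a "revisit" is counted whenever the reader switches from
    # scrolling down (or pausing) to scrolling back up.
    revisits = sum(1 for s0, s1 in zip(speeds, speeds[1:]) if s0 >= 0 and s1 < 0)

    return {
        "total_read_time": trace[-1].t - trace[0].t,
        "max_speed": max(speeds),
        "mean_accel": mean(accels),
        "max_accel": max(accels),
        "min_accel": min(accels),
        "revisits": revisits,
    }


# A made-up trace: the reader scrolls down, jumps back up once, then continues.
trace = [ScrollSample(0, 0), ScrollSample(1, 100), ScrollSample(2, 300),
         ScrollSample(3, 250), ScrollSample(4, 400)]
features = scroll_features(trace)
```

In this made-up trace the single backward jump between the third and fourth samples is counted as one revisit, and it also produces the most negative acceleration value.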
The research team used linear mixed-effects models to test whether these scrolling measures differed significantly between text levels. Such models give higher confidence that the observed differences in interaction are caused by text difficulty and not by other random effects, such as variation between individual readers.
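As a rough illustration of what such a model looks like, one common form relates a single interaction measure to the text level while allowing each participant their own baseline. The random-effects structure shown here (a random intercept per participant only) is an assumption; the article does not state which structure the team used.

$$
y_{ij} = \beta_0 + \beta_1 \,\mathrm{level}_j + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
$$

Here $y_{ij}$ is an interaction measure (for example, total read time) for participant $i$ on text $j$, $\mathrm{level}_j$ is 0 for an elementary text and 1 for an advanced one, $u_i$ absorbs differences between individual readers, and $\varepsilon_{ij}$ is residual noise. Under this sketch, a measure differs significantly by text level when the estimate of $\beta_1$ is significantly different from zero.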
The results showed that multiple reading behaviours differed significantly based on the text level, including the average, maximum, and minimum scrolling acceleration. The most significant features were the total read time and the maximum reading speed.
These features were then used as inputs to a machine learning algorithm. The team designed and trained a support vector machine (SVM) as a binary classifier to predict whether a text is advanced or elementary based only on the scrolling behaviour of the individuals who interacted with it. The dataset on which the model was trained contains 60 articles, each of which was read by an average of 17 participants. From these interactions, the team produced aggregate features by taking the mean of the significant measures across participants.
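The aggregation step, taking the mean of each measure across the participants who read an article, can be sketched as follows. The row layout, the `article_id` key, the feature names, and the values are hypothetical and are not taken from the released dataset.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_article(rows, feature_names):
    """Average each feature over all participants who read the same article."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["article_id"]].append(row)

    # One feature vector per article, in the order given by feature_names.
    return {
        article_id: [mean(r[name] for r in readers) for name in feature_names]
        for article_id, readers in grouped.items()
    }


# Made-up per-participant measurements (one row per participant and article).
rows = [
    {"article_id": "a1", "total_read_time": 120.0, "max_speed": 300.0},
    {"article_id": "a1", "total_read_time": 100.0, "max_speed": 500.0},
    {"article_id": "a2", "total_read_time": 200.0, "max_speed": 150.0},
]
vectors = aggregate_by_article(rows, ["total_read_time", "max_speed"])
```

Each resulting vector, paired with its article's level label, could then be passed to any binary classifier; scikit-learn's `SVC` would be one way to fit a support vector machine on such inputs, though the article does not say which implementation the team used.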
The accuracy of the approach was measured using a metric called f-score, which combines precision and recall into a single measure of how well the model classifies a text as either “easy” or “difficult” (where 1.0 reflects perfect classification). The team achieved an f-score of 0.77 on this task using interaction features alone. This is the first work to show that it is possible to predict the readability of a text using only interaction features. The team also found that adding interaction features improves the f-score of an existing readability model from 0.84 to 0.88. In addition, the team was able to significantly outperform that baseline by combining interaction information with simple vocabulary features, such as the number of words in the text, achieving an f-score of 0.96.
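For readers unfamiliar with the metric, the f-score (here the F1 variant) is the harmonic mean of precision and recall. The sketch below computes it for one class on made-up labels; the label names and the choice of "advanced" as the positive class are assumptions, and the article does not say how the reported scores were averaged across classes.

```python
def f_score(y_true, y_pred, positive="advanced"):
    """F1 score for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: the score is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return 2 * precision * recall / (precision + recall)


# Made-up labels for five texts.
y_true = ["advanced", "advanced", "advanced", "elementary", "elementary"]
y_pred = ["advanced", "advanced", "elementary", "advanced", "elementary"]
score = f_score(y_true, y_pred)
```

In this example two of the three advanced texts are found and two of the three "advanced" predictions are correct, so precision and recall are both 2/3 and the f-score is also 2/3. A classifier that labels every text correctly scores 1.0.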
Understanding how readable a text is for a particular reader is crucial when designing educational applications for low-proficiency readers and language learners, because it can be used to match learners with appropriately levelled texts as well as to support readers in understanding texts beyond their reading level.
With this work, the research team confirmed that there are statistically significant differences in the way readers interact with advanced and elementary texts, and that the comprehension scores of individuals correlate with specific measures of scrolling interaction.