Google is introducing Smart Scrolling, a new ML-based feature in its Recorder app that automatically marks important sections of the transcript and indexes the scroll bar with chapter-heading-like keywords. Users can then scroll through the keywords, or tap one to skim to a section of interest. The models used, Google claims, are lightweight and preserve user privacy.
The Google AI team combined a tweaked version of its own BERT model with the classic TF-IDF model to make Smart Scrolling a reality. In the next section, we briefly discuss the machine learning behind this feature.
Overview Of The Recorder
Released last year, Recorder is an audio recording app for Google Pixel phones that transcribes conversations and detects audio events such as applause, laughter and whistling. All of this works offline.
The model behind the Recorder app indexes the conversation by mapping words to timestamps, so the user can tap a word in the transcription and start playback from that point in the recording. This ability led to the new Smart Scrolling feature.
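A minimal sketch of how such a word-to-timestamp index could work. The Recorder implementation is not public, so the data structure and function names here are hypothetical:

```python
# Hypothetical transcript index: each word paired with its start time
# (in seconds) in the recording.
transcript = [
    ("meeting", 0.0), ("starts", 0.4), ("with", 0.8),
    ("budget", 1.2), ("review", 1.7),
]

def playback_offset(word_index: int) -> float:
    """Return the recording offset to seek to when a word is tapped."""
    return transcript[word_index][1]

# Tapping the fourth word ("budget") seeks to 1.2 s into the recording.
assert playback_offset(3) == 1.2
```

Keeping the index as a simple word-to-offset mapping is what makes tap-to-play cheap: seeking is a single lookup, with no re-processing of the audio.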
Read more about the Recorder app here.
ML Behind Smart Scrolling
Making the phone understand when to scroll and when to stop is a huge challenge. It is even harder to identify whether a section or keyword is important: what is of great importance to one person can be of little importance to another.
As illustrated above in the Smart Scrolling architecture, the Google AI team designed the solution so that the model selects the top-scoring sections along with their highest-rated keywords. A keyword is rated highly if it better represents the unique information of its section of the transcript.
In a blog post detailing how it approached this real-time knowledge capture, Google stated that the scroll indexing must be complete by the time users close their recording session. The computation pipeline shown above can be time-consuming, so Google avoided running it over the whole recording at once by processing each section as soon as it is captured and storing the intermediate results in memory.
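The incremental strategy described above can be sketched as follows. This is an illustrative structure, not Google's actual pipeline; the callback names and the toy frequency scorer are assumptions:

```python
section_cache = []  # intermediate keyword scores, one entry per section

def score_keywords(text: str) -> dict:
    """Placeholder per-section scorer: raw word frequency within the section.
    (Stands in for the expensive TF-IDF/BERT step.)"""
    counts = {}
    for w in text.lower().split():
        counts[w] = counts.get(w, 0) + 1
    return counts

def on_section_captured(section_text: str) -> None:
    """Run the expensive step as each section arrives, caching the result."""
    section_cache.append(score_keywords(section_text))

def on_recording_closed() -> list:
    """Cheap final pass: merge cached per-section scores into one index."""
    merged = {}
    for scores in section_cache:
        for word, s in scores.items():
            merged[word] = max(merged.get(word, 0), s)
    return sorted(merged, key=merged.get, reverse=True)
```

Because the heavy per-section work is done while the recording is still in progress, only the lightweight merge remains when the user closes the session.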
The Smart Scrolling feature is a combination of two tasks. While one captures representative keywords from each section, the other picks which sections in the text are the most informative and unique.
For each task, two different natural language processing (NLP) approaches were used — a distilled BERT model pre-trained on data sourced from a Wikipedia dataset, alongside a modified extractive term frequency–inverse document frequency (TF-IDF) model.
Explaining how the advantages and disadvantages of TF-IDF and distilled BERT mutually benefited the final output, Google said that the TF-IDF approach is prone to finding uncommon keywords in the text (high bias), while the drawback of the bidirectional transformer model is the high variance of the keywords it can extract. Used together, the two models complement each other, forming a balanced bias-variance tradeoff.
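One simple way to combine two models' keyword scores, as a hedged sketch: normalise each model's scores to a common scale, then take a weighted average. The weight and function names below are illustrative assumptions, not Google's actual values:

```python
def normalise(scores: dict) -> dict:
    """Scale scores to [0, 1] so the two models are comparable."""
    hi = max(scores.values())
    return {w: s / hi for w, s in scores.items()} if hi else scores

def combine(tfidf_scores: dict, bert_scores: dict,
            w_tfidf: float = 0.5) -> dict:
    """Weighted average of the two models' keyword scores.
    Words seen by only one model keep that model's (down-weighted) score."""
    a, b = normalise(tfidf_scores), normalise(bert_scores)
    return {w: w_tfidf * a.get(w, 0.0) + (1 - w_tfidf) * b.get(w, 0.0)
            for w in set(a) | set(b)}
```

Keywords that both models rate highly end up with the strongest combined score, which is how the low-variance TF-IDF signal tempers the transformer's high-variance candidates.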
“A keyword was more highly rated if it better represented the unique information of the section.”
Here’s a step-by-step guide to how Google pulled this off:
- The extractive TF-IDF approach rates terms based on their frequency in the text compared to their inverse frequency in the trained dataset, and enables the finding of unique representative terms in the text.
- The TF-IDF-based model detects informative keywords by giving each word a score, which corresponds to how representative this keyword is within the text.
- The model then aggregates these features into a score using a pre-trained function curve.
- Once the keyword scores are retrieved from both models, they are normalised and combined by utilising NLP heuristics (e.g., the weighted average), removing duplicates across sections, and eliminating stop words and verbs.
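The TF-IDF step in the list above can be illustrated with a small pure-Python sketch. The toy corpus stands in for the "trained dataset", and the smoothed IDF formula is one common variant, not necessarily the one Google uses:

```python
import math

# Toy corpus standing in for the trained dataset: a term scores high when
# it is frequent in the section but rare across the corpus.
corpus = [
    "the meeting covered the quarterly budget and the hiring plan",
    "the team discussed the launch timeline and the demo",
]

def tfidf_scores(section: str) -> dict:
    """Score each word in a section by term frequency times inverse
    document frequency against the corpus."""
    words = section.lower().split()
    docs = [set(d.lower().split()) for d in corpus]
    scores = {}
    for w in set(words):
        tf = words.count(w) / len(words)
        df = sum(w in d for d in docs)  # number of corpus docs containing w
        idf = math.log((1 + len(docs)) / (1 + df)) + 1  # smoothed IDF
        scores[w] = tf * idf
    return scores

scores = tfidf_scores("budget review budget approval")
# "budget" ranks highest: it is repeated in the section and appears in
# only one corpus document.
```

In the real feature these per-word scores are what get normalised, merged with the BERT model's scores, and filtered for stop words before indexing the scroll bar.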
To train the model, the team prepared a dataset labelled specifically for the task of finding highlight keywords. A small batch of examples was initially labelled with the help of skilled raters to establish an initial dataset. Once the labelling process was complete, it was reviewed and corrections were made.
Using this limited labelled dataset, Google then ran automated model evaluations to establish initial metrics on model quality and to quickly assess performance. With this reliable quality evaluation in place, the team tuned the model's heuristic parameters until the desired level of performance was reached. The applications of Google's BERT algorithm are very well documented. With Smart Scrolling, Google engineers have added another key application to the growing list of BERT use cases.