Machine learning technologies, such as Natural Language Processing (NLP), are being increasingly deployed in healthcare settings, especially during last year's pandemic. Analysing text to identify diseases is among the primary tasks medical NLP systems are required to perform. Traditionally, training such classifiers for Named Entity Recognition (NER) has relied on hand-labelled training data. Hand-labelled datasets, however, are expensive and time-consuming to construct and to update, and demand a great deal of domain expertise. This limits the applications of medical NLP, an imperative technology for narrowing informational gaps in healthcare.
According to Nigam Shah, professor of medicine (biomedical informatics) and biomedical data science at Stanford University and faculty member of the Stanford Institute for Human-Centred Artificial Intelligence (HAI), there is a lot of helpful information held within doctors' notes and unstructured medical records that needs to be disseminated quickly.
To solve this problem, a team at Stanford University presented a new NLP system in a paper published in April 2021. The classifier, called Trove, is an open-sourced framework for weakly supervised entity classification using medical ontologies (i.e. databases of biomedical information) and rules generated by experts. Weakly supervised learning does not require hand-labelled training data. Instead, it combines task-specific heuristics and imperfect labelling strategies to generate training data programmatically, and far faster. Thus, Trove automatically identifies entities in clinical text using publicly available ontologies, and its labelling logic can be shared, inspected, and modified with ease.
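To make the ontology idea concrete, here is a minimal sketch of weak labelling by dictionary lookup: scanning clinical text for terms drawn from an ontology and emitting candidate entity spans. The function name and the toy symptom list are hypothetical illustrations, not Trove's actual API or data.

```python
# Illustrative sketch (hypothetical names, not Trove's actual API):
# weakly labelling disease/symptom mentions in clinical text by
# case-insensitive lookup against a toy "ontology" of terms.

def ontology_label(text, ontology_terms):
    """Return (start, end, term) spans for every ontology term found in text."""
    spans = []
    lowered = text.lower()
    for term in ontology_terms:
        start = lowered.find(term.lower())
        while start != -1:
            spans.append((start, start + len(term), term))
            start = lowered.find(term.lower(), start + 1)
    return sorted(spans)

# Toy ontology of COVID-19 symptom terms (hypothetical data).
ontology = {"fever", "dry cough", "fatigue"}
note = "Patient reports fever and a dry cough since Monday."
spans = ontology_label(note, ontology)
```

Labels produced this way are noisy, which is exactly the point: rather than paying annotators, the imperfect votes of many such sources are aggregated downstream into training labels.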
Turning pandemics into possibilities
The costs of relabelling and regenerating training sets are exorbitant in traditional supervised ML, which is troublesome in demanding situations like the current pandemic, where knowledge of the disease and its underlying symptoms continually changes. Hence, the team behind Trove saw the COVID-19 pandemic as a 'nice testbed' to address the arduousness of building medical NLP systems and to test Trove's edge in the real world.
The researchers began by labelling COVID-19 symptoms using a public dataset called MIMIC. The team then tested their weak supervision model on Stanford Hospital's emergency department, where they found it extracted COVID-19 symptoms from patient notes at least as well as a model trained on hand-labelled MIMIC data. Users also found Trove easy to work with: labelling functions could be shared and updated to capture new information on COVID-19, all very quickly.
How Trove works
Trove utilises an open-sourced weak supervision framework known as Snorkel. Jason A. Fries, one of Trove's developers, worked on Snorkel himself and has been building weakly supervised NLP models for around six years. Even so, Fries claims that Trove is more straightforward than off-the-shelf uses of Snorkel. This simplification comes from Trove reportedly skipping a step that requires users to code many custom rules. This, coupled with Trove's reliance on publicly available ontologies, makes it very user-friendly for hospital data science teams. The rules the tool relies on can be amended as new scientific information emerges, without manually relabelling entire training datasets. Trove removes various bottlenecks in clinical natural language processing by making it easier to share labelling functions instead of training data. These functions are rules that tell users how to create their own training data. Thus, generating imperfect training labels from indirect sources like patient notes allows the training approach to be shared whilst preserving patient privacy.
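The labelling-function idea described above can be sketched in a few lines: several small, shareable rules each vote on a candidate mention (or abstain), and the votes are aggregated into a training label. Everything here is a hedged illustration; the function names, rules, and the simple majority-vote aggregator are hypothetical, not Trove's or Snorkel's real API (Snorkel learns a probabilistic label model rather than taking a plain majority).

```python
# Hypothetical labelling functions in the style of weak supervision.
# Each votes SYMPTOM, NEGATIVE, or ABSTAIN on a candidate mention.
ABSTAIN, NEGATIVE, SYMPTOM = -1, 0, 1

def lf_ontology(token):
    # Vote SYMPTOM if the token appears in a toy symptom ontology.
    return SYMPTOM if token.lower() in {"fever", "cough", "anosmia"} else ABSTAIN

def lf_negation(token, context):
    # Vote NEGATIVE if the mention looks negated in the surrounding text.
    return NEGATIVE if "denies " + token.lower() in context.lower() else ABSTAIN

def majority_vote(votes):
    # Aggregate non-abstaining votes; a tie yields ABSTAIN.
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    top = max(set(votes), key=votes.count)
    return ABSTAIN if votes.count(top) * 2 == len(votes) else top

context = "Patient denies fever but reports cough."
labels = {t: majority_vote([lf_ontology(t), lf_negation(t, context)])
          for t in ["fever", "cough", "headache"]}
```

Because the rules, not the patient notes, are what gets shared, a hospital can reuse or amend another team's labelling functions on its own private data, which is the privacy-preserving sharing the article describes.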
As for the labelling tool itself, a recent study demonstrated Trove's effectiveness in labelling chemicals, diseases, disorders, drugs, and COVID-19-related symptoms and risk factors in clinical text. A test conducted in this paper showed Trove outperforming previous state-of-the-art methods, which used manual annotation, by 1.5 per cent. This test used the Unified Medical Language System (UMLS), which contains more than 100 ontologies. Trove's most significant gains, of 10.9 per cent, came when the test added further, less accurate ontologies. Its performance increased further when a few task-specific rules were applied to these ontologies to correct observed errors.
Fries says that his vision for the future involves researching a 'labelling function zoo', similar to model zoos in machine learning, where researchers can share code for weakly supervised NLP. Shah hopes Trove's future involves enabling it to identify information such as socioeconomic determinants of health (such as homelessness) in unstructured clinical text. For the present, Trove manages to remove massive bottlenecks in clinical NLP. As stated by Fries, Trove allows users to take ontologies and 'supercharge' them, giving them access to the performance they would see upon paying 'a ton of doctors' to label clinical notes manually.