AI has changed the face of many sectors, but in healthcare its adoption is moving at a snail's pace. The sector faces several challenges, such as data privacy, security, lack of interoperability, and the absence of regulation, all of which have restricted adoption.
AI models are prone to mistakes, and, as we know, to err is human. Google Research asked: what would the error rates be if you combined the expertise of predictive AI models with that of clinicians?
In July this year, Google DeepMind joined hands with Google Research to introduce Complementarity-driven Deferral-to-Clinical Workflow (CoDoC), a system that maximises accuracy by combining human expertise with predictive AI. For each case, the system essentially decides whether the AI model or the clinician's diagnostic workflow is likely to be more accurate, using the predictive model's confidence score as one of its inputs.
Comprehensive tests of CoDoC on multiple real-world datasets have shown that combining human expertise with predictive AI through CoDoC yields greater accuracy than either alone. On mammography datasets, the researchers saw a 25% reduction in false positives and, more importantly, no missed true positives.
The published paper marks a significant advance in collaboration between AI and clinicians, promising improved accuracy in determining diseases with binary outcomes. The datasets focused on breast cancer screening using X-ray mammography and triage for tuberculosis (TB) using chest X-rays.
Sidestepping AI hurdles in medicine
Knowing when to say ‘I don’t know’ is essential when working with artificial intelligence tools in a medical setting. The paper addresses the crucial challenge of when to acknowledge uncertainty and pass responsibility on to the clinician. “If you use CoDoC together with the AI tool, and the outputs of a real radiologist, and then CoDoC helps decide which opinion to use, the resulting accuracy is better than either the person or the AI tool alone,” says Alan Karthikesalingam at Google Health UK, who worked on the research.
The CoDoC model also does not require patients’ medical images to make its decision, which protects patient privacy. It needs only three inputs for each case in the training dataset: first, the confidence score output by the hospital’s own existing predictive AI (0 for certainty that no disease is present, 1 for certainty that it is); second, the opinion from a non-AI expert clinical workflow; and finally, historical ‘ground truth’ data.
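The three per-case training inputs described above can be sketched as a simple record. This is an illustrative data structure only, not DeepMind's actual format; the field names are assumptions:

```python
# Minimal sketch of one CoDoC training case: no medical images are needed,
# only the AI's confidence score, the clinician's opinion, and ground truth.
from dataclasses import dataclass


@dataclass
class TrainingCase:
    ai_confidence: float    # predictive AI's confidence score in [0, 1]
    clinician_opinion: int  # 0/1 diagnosis from the non-AI clinical workflow
    ground_truth: int       # 0/1 historically confirmed outcome

    def __post_init__(self) -> None:
        # The score convention from the paper: 0 = certain no disease,
        # 1 = certain disease present.
        if not 0.0 <= self.ai_confidence <= 1.0:
            raise ValueError("confidence score must lie in [0, 1]")


# Example case: AI is fairly confident disease is present, and the
# clinician and ground truth agree.
case = TrainingCase(ai_confidence=0.87, clinician_opinion=1, ground_truth=1)
```

Because no image ever enters this record, a CoDoC-style model can be trained on a hospital's historical outputs without transferring sensitive imaging data.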
The system is compatible with any proprietary AI model and does not need access to the model’s inner workings or the data it was trained on. To apply the CoDoC paradigm to an existing predictive AI system, researchers follow the methodology described in the paper, training a CoDoC-style model on the outputs of their own existing system.
How the predictive model works
CoDoC learns by comparing the predictive AI model’s accuracy with the doctor’s interpretations, and then establishes how the relative accuracy of each varies with the confidence score generated by the predictive AI model.
After being trained, CoDoC is placed in a hypothetical future clinical workflow, working alongside both the predictive AI and the human doctor. When a new patient image is evaluated by the predictive AI model, its associated confidence score is fed into CoDoC.
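The deferral step described above can be illustrated as a decision rule over the confidence score. The interval boundaries below are hypothetical placeholders; the real CoDoC system learns from training data where the AI is more reliable than the clinician, rather than using fixed thresholds:

```python
# Illustrative deferral rule: accept the AI's prediction when its confidence
# falls in a region where it was historically more accurate than the
# clinician, and defer to the clinician otherwise. The thresholds here are
# made-up examples, not values from the paper.
def route_case(ai_confidence: float,
               defer_low: float = 0.2,
               defer_high: float = 0.8) -> str:
    """Return which opinion the workflow should use for a new case."""
    if defer_low <= ai_confidence <= defer_high:
        # The AI is uncertain here: hand the case to the human expert.
        return "clinician"
    # The AI is confidently negative (near 0) or positive (near 1).
    return "ai"


print(route_case(0.95))  # prints "ai"
print(route_case(0.50))  # prints "clinician"
```

At inference time, only the confidence score of the new case is fed into the deferral model, so the routing decision itself never touches the patient's image.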
With AI as an effective tool to confirm a diagnosis, doctors can be confident of their decisions even in edge cases, something that was not possible before.
“The advantage of CoDoC is that it’s interoperable with a variety of proprietary AI systems,” says Krishnamurthy Dvijotham at Google DeepMind.
Inviting developers to test, validate, build
To help researchers build on their work, be transparent, and ensure safer AI models for the real world, they have open-sourced CoDoC’s code on GitHub.
This work is theoretical so far, but according to the researchers it shows the AI system’s potential to adapt and improve performance in interpreting medical imaging across varied demographic populations, clinical settings, medical imaging equipment, and disease types.
Helen Salisbury from the University of Oxford said, “It is a welcome development, but mammograms and tuberculosis checks involve fewer variables than most diagnostic decisions, so expanding the use of AI to other applications will be challenging.” She added, “For systems where you have no chance to influence, post-hoc, what comes out the black box, it seems like a good idea to add on machine learning.”