“A disproportionate amount of power lies with research teams who, after determining the research questions.”
Improved methods of collecting high-quality data, coupled with advances in machine learning models, have fueled a new wave of healthcare practices. From retinopathy to computer vision-based surgeries, algorithms have found their way into critical life-saving domains. The potential is tremendous, yet the world remains wary of a total embrace.
This wariness stems from the many ways in which bias creeps into data and eventually into diagnosis. Biased data can have disproportionately negative impacts on already marginalised groups. Researchers from the likes of MIT, Microsoft and other top institutes have collaborated to investigate the lingering challenges of algorithmic bias in healthcare.
Building An Ethical Pipeline
The authors acknowledge that disparities in results can also stem from which problems are selected for study. Understudied use cases barely make it into the data that is fed to the machine learning model, so the model remains biased. “It [choice of the problem] can also be a matter of justice if the research questions that are proposed, and ultimately funded, focus on the health needs of advantaged groups,” wrote the authors.
“Just as data are not neutral, algorithms are not neutral.”
Even determining whether a patient has a disease can be skewed by how prevalent the disease is, or by how it manifests in some patient populations. The authors state that patient disease occurrences are often selected as the prediction label for models.
The challenges don’t end there. Here are a few others:
- Imbalanced datasets
- Confounding bias
- Model generalisability
- Group fairness
- Choice of an ethical framework
Read in detail about these and other challenges here.
So, how can we encourage model developers to build ethical considerations into the pipeline from the very beginning? In this review, the researchers gave a few recommendations to tackle the aforementioned roadblocks in building an ethical framework:
Recommendations
Identify the understudied
Practitioners should target historically understudied problems so as to deliver high-impact work. Furthermore, problems should be tackled by diverse teams and using frameworks that increase the probability that equity will be achieved.
This applies chiefly to data collection. Researchers should work with domain experts to ensure that data reflecting the needs of underserved and understudied populations are collected.
Because data collection is a key concern in building an ethical ML pipeline, it should be treated as a front-of-mind issue, with clear disclosures about imbalanced datasets.
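One way to picture such a disclosure is a short summary of how many records each patient group contributes and how often the positive label occurs within it. The sketch below is only an illustration, not something taken from the review: the function name `representation_report`, the field names `group` and `label`, and the toy records are all hypothetical.

```python
from collections import Counter


def representation_report(records, group_key, label_key):
    """Summarise each group's record count, share of the dataset,
    and positive-label rate (label == 1)."""
    counts = Counter(r[group_key] for r in records)
    positives = Counter(r[group_key] for r in records if r[label_key] == 1)
    total = len(records)
    report = {}
    for group, n in counts.items():
        report[group] = {
            "count": n,
            "share": n / total,
            "positive_rate": positives[group] / n,
        }
    return report


# Made-up toy records: group B is both smaller and has fewer positive cases.
records = (
    [{"group": "A", "label": 1}] * 30
    + [{"group": "A", "label": 0}] * 50
    + [{"group": "B", "label": 1}] * 2
    + [{"group": "B", "label": 0}] * 18
)

report = representation_report(records, "group", "label")
for group, stats in sorted(report.items()):
    print(group, stats)
```

In this toy data, group B supplies a fifth of the records and only two positive cases, which is the kind of imbalance a disclosure would need to state up front.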
Leverage the literature
Model outputs should be unbiased while still reflecting the task at hand. Where an ethical bias exists, the source of inequity should be accounted for in the ML model design. This can be done by leveraging literature on removing such biases during pre-processing, or by using a reasonable proxy.
Model objectives such as those mentioned above should be clearly articulated in a pre-analysis plan. In addition to making ML model choices such as loss functions, researchers must explain why the model needs to be developed and what caveats apply if it is.
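As one example of the pre-processing literature referred to above, the reweighing idea assigns each training record a weight so that group membership and label look statistically independent in the weighted data. The review does not prescribe this technique; it is swapped in here purely as an illustration, and the function name `reweighing_weights` and the toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each record by (expected count under independence) /
    (observed count) for its (group, label) combination."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    weights = []
    for g, y in zip(groups, labels):
        expected = group_counts[g] * label_counts[y] / n
        weights.append(expected / joint_counts[(g, y)])
    return weights


# Made-up toy data: positives are common in group A and rare in group B.
groups = ["A"] * 6 + ["B"] * 4
labels = [1, 1, 1, 1, 0, 0] + [1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
for g, y, w in zip(groups, labels, weights):
    print(g, y, round(w, 3))
```

Records from under-represented combinations, such as a positive case in group B, receive weights above 1, while over-represented combinations are weighted down. The weights can then be passed to any training routine that accepts per-sample weights.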
Audits & Checklists
The authors believe that ML ethical design “checklists” can be used as a tool to systematically enumerate and consider ethical concerns before declaring a project a success. This can be done through audits that are designed to identify specific loopholes and are paired with methods and procedures. The assessment should be done group by group, rather than at the population level.
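To see why a group-by-group assessment matters, consider a minimal sketch that computes accuracy, true positive rate and false positive rate separately for each group and compares them with the population-level accuracy. This is an assumed illustration rather than the audit procedure from the review; `audit_by_group` and the toy predictions are hypothetical.

```python
from collections import defaultdict


def audit_by_group(y_true, y_pred, groups):
    """Compute accuracy, true positive rate and false positive rate per group."""
    tallies = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            key = "tp" if p == 1 else "fn"
        else:
            key = "fp" if p == 1 else "tn"
        tallies[g][key] += 1

    report = {}
    for g, c in tallies.items():
        pos = c["tp"] + c["fn"]
        neg = c["fp"] + c["tn"]
        report[g] = {
            "n": pos + neg,
            "accuracy": (c["tp"] + c["tn"]) / (pos + neg),
            # Rates are undefined when a group has no positives or no negatives.
            "tpr": c["tp"] / pos if pos else None,
            "fpr": c["fp"] / neg if neg else None,
        }
    return report


# Made-up toy predictions: the model never flags disease in group B.
groups = ["A"] * 8 + ["B"] * 4
y_true = [1, 1, 1, 1, 0, 0, 0, 0] + [1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0] + [0, 0, 0, 0]

overall_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
report = audit_by_group(y_true, y_pred, groups)

print("overall accuracy:", round(overall_accuracy, 3))
for g, stats in sorted(report.items()):
    print(g, stats)
```

In this toy case the population-level accuracy looks respectable, yet every true case in group B is missed, which only the per-group breakdown reveals.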
Having said that, the researchers behind this review admit that the responsibility for building ethical models and behaviour ultimately rests on technical researchers, who have an obligation to engage with patients, clinical researchers, staff, and advocates.
Check the full report here.