Predicting cancer with swarm-learning without exposing personal data

Artificial intelligence can predict the presence of molecular alterations directly from routine histopathology slides.


Published on May 6, 2022

by Pritam Bordoloi

Around 10 million cancer deaths were reported in 2020. In the same year, an estimated 19.3 million new cancer cases were registered, according to the Global Cancer Statistics 2020 report. In India, around 50 percent of cancer deaths are attributed to late diagnosis.

Early detection can prevent many cancer deaths. AI models trained on large datasets comprising X-ray scans, test results and reams of medical literature have made early detection of cancer possible in many cases. However, the data used to train such models contains personally identifiable information, and a data breach could have serious ramifications.

Artificial intelligence can predict the presence of molecular alterations directly from routine histopathology slides. However, training robust AI systems requires large datasets for which data collection faces practical, ethical and legal obstacles. These obstacles could be overcome with swarm learning (SL), in which partners jointly train AI models while avoiding data transfer and monopolistic data governance.

Right on cue, a team of scientists has developed a novel AI technique for cancer detection that does not put patients’ personal information at risk. The research paper, ‘Swarm learning for decentralised artificial intelligence in cancer histopathology’, claims that AI can help extract a wealth of clinically relevant data from digitised histopathology images without the underlying patient data ever leaving the hospitals.

The methodology

The research involves using the swarm-learning (SL) system to predict cancer by studying medical images of patient tissues. The researchers used the technique to teach the AI algorithms to spot patterns in data, such as genetic changes in human tissue.

The swarm learning system then sends the newly trained algorithm, but none of the patient’s personal data, to a central computer. There it is merged with similar algorithms created in other hospitals to produce an optimised algorithm. The optimised algorithm is then relayed back to the local hospitals, where it is reapplied to the original data to detect genetic mutations. The process is repeated multiple times to improve the algorithm, so it can be refined without releasing any sensitive data to third parties.
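The train, merge and repeat cycle described above can be illustrated with a small sketch. This is not the researchers' system: it assumes a toy linear model, synthetic data and a plain size-weighted average of model parameters, and every name in it (`local_update`, `merge`, `swarm_train`, `make_site`) is hypothetical. The point it shows is that only model weights leave each site, never the records themselves.

```python
import random


def local_update(weights, data, lr=0.5, epochs=5):
    """One site's training pass: gradient descent on a toy linear model.

    `data` stays inside this function; only the updated weights are returned.
    """
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def merge(weight_sets, sizes):
    """Combine the sites' weights with an average weighted by dataset size.

    Assumption: simple averaging stands in for whatever merge rule a real
    deployment uses.
    """
    total = sum(sizes)
    dim = len(weight_sets[0])
    return [
        sum(w[j] * n for w, n in zip(weight_sets, sizes)) / total
        for j in range(dim)
    ]


def swarm_train(sites, dim, rounds=20):
    """Repeat the cycle: train locally, merge, send merged weights back."""
    weights = [0.0] * dim
    for _ in range(rounds):
        local = [local_update(weights, data) for data in sites]
        weights = merge(local, [len(data) for data in sites])
    return weights


def make_site(n, seed, true_w=(2.0, -1.0)):
    """Synthetic stand-in for one hospital's private dataset."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = sum(t * xi for t, xi in zip(true_w, x))
        data.append((x, y))
    return data


# Three hypothetical sites of different sizes.
sites = [make_site(40, 1), make_site(60, 2), make_site(30, 3)]
final_weights = swarm_train(sites, dim=2)
print(final_weights)
```

In this toy setting all three sites draw from the same underlying relationship, so the merged weights approach the generating values of 2.0 and -1.0 after a few rounds. Real histopathology models are deep networks and the cohorts differ between hospitals, so the sketch conveys only the shape of the procedure.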

Outcome

AI models trained using SL can predict BRAF mutational status and microsatellite instability from hematoxylin and eosin (H&E)-stained pathology slides of colorectal cancer. The researchers trained AI models on three patient cohorts from Northern Ireland, Germany and the US, and validated the prediction performance in two independent datasets from the UK.

Swarm learning-based AI models outperformed most locally trained models and performed on par with models trained on the merged datasets. The researchers also demonstrated that SL-based AI models are data efficient.

According to a Critical Insight report, data breaches increased by 84 percent between 2018 and 2021. Over the same period, the number of individuals affected more than tripled, from 14 million in 2018 to 45 million in 2021. Given the rapid advancement of AI and its application in healthcare, many hospitals and companies will have access to swathes of critical healthcare data.

(Source: Critical Insight)

IBM’s Cost of a Data Breach Report 2021 put the global average total cost of a data breach at USD 4.24 million, with healthcare the costliest industry at an average of USD 9.23 million per breach.

(Source: IBM)

Breaches on this scale underline the need for privacy-preserving approaches such as SL. In the near future, SL could be used to train distributed AI models for any histopathology image analysis task, eliminating the need for data transfer in research, where protecting confidential data has become pivotal.
