Researchers from IIT Patna have introduced MedSumm, a multimodal approach that combines Hindi-English code-mixed medical queries with visual aids, providing a more comprehensive view of a patient's medical condition.
The paper is also to be published at ECIR 2024.
The researchers announce their intention to make the dataset, code, and pre-trained models publicly accessible.
The primary contributions of this research are the introduction of the MMCQS task, the creation of the MMCQS dataset, and the proposal of the MedSumm framework. The MMCQS dataset comprises 3,015 multimodal medical queries in Hindi-English code-mixed language, accompanied by golden summaries in English that seamlessly merge visual and textual data.
The proposed framework harnesses the capabilities of LLMs and Vision-Language Models (VLMs), namely CLIP, to facilitate multimodal medical question summarisation. The researchers showcase the tangible value of integrating visual information from images, demonstrating its potential not only to improve healthcare decision-making but also to deepen the understanding of patient queries.
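As a rough illustration of how a CLIP-style image encoder can feed an LLM, the sketch below shows the common bridging pattern: the image encoder produces a fixed-size embedding, a learned linear adapter projects it into the LLM's token-embedding space, and the projected "visual token" is prepended to the text embeddings before summary generation. The dimensions, function names, and the random stand-in encoder are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

CLIP_DIM = 512   # typical CLIP image-embedding size (assumption)
LLM_DIM = 4096   # typical 7B-parameter LLM hidden size (assumption)

rng = np.random.default_rng(0)

def encode_image(image: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen CLIP image encoder (random projection here)."""
    w = rng.standard_normal((image.size, CLIP_DIM)) / np.sqrt(image.size)
    return image.reshape(-1) @ w

def project_to_llm(img_emb: np.ndarray, adapter: np.ndarray) -> np.ndarray:
    """Linear adapter mapping the CLIP space into the LLM embedding space."""
    return img_emb @ adapter

# Toy inputs: one "image" and three text-token embeddings.
image = rng.standard_normal((8, 8))
text_token_embs = rng.standard_normal((3, LLM_DIM))
adapter = rng.standard_normal((CLIP_DIM, LLM_DIM)) / np.sqrt(CLIP_DIM)

visual_token = project_to_llm(encode_image(image), adapter)

# Prepend the visual token to the text tokens: this combined sequence is
# what the LLM would attend over when generating the summary.
llm_input = np.vstack([visual_token[None, :], text_token_embs])
print(llm_input.shape)  # (4, 4096): one visual token + three text tokens
```

In practice the adapter is trained while the CLIP encoder stays frozen, which is why this pattern is cheap to apply on top of off-the-shelf LLMs.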
The paper's authors are Akash Ghosh, Arkadeep Acharya, Prince Jha, Aniket Gaudgaul, Rajdeep Majumdar, Sriparna Saha, and Raghav Jain from IIT Patna, along with Setu Sinha and Shivani Agarwal from the Indira Gandhi Institute of Medical Sciences, and Aman Chadha from Amazon Generative AI and Stanford University.
The researchers used Llama 2, Mistral 7B, Vicuna, FLAN-T5, and Zephyr-7B for the final summary generation.
The researchers built on the HealthCareMagic dataset, derived from MedDialog data and comprising 226,395 samples, from which 523 duplicates were removed. Guided by medical doctors who are co-authors of the paper, they identified 18 medical symptoms that are challenging to convey through text and categorised them into four groups: ENT, EYE, LIMB, and SKIN.
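The deduplication step described above might look like the following minimal sketch: keep the first occurrence of each query, comparing on normalised text. The record structure, the `question` field name, and the normalisation rule are assumptions for illustration, not the paper's stated procedure.

```python
def deduplicate(records):
    """Keep the first occurrence of each query, comparing on
    whitespace-normalised, lower-cased text (an assumed rule)."""
    seen = set()
    unique = []
    for rec in records:
        key = " ".join(rec["question"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

samples = [
    {"question": "I have a red rash on my arm."},
    {"question": "i have a red   rash on my arm."},  # duplicate after normalisation
    {"question": "My left eye has been watering for days."},
]
print(len(deduplicate(samples)))  # 2
```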
The framework aims to enhance doctor-patient interactions and medical decision-making by summarising the medical questions patients pose. Despite the increasing complexity and quantity of medical data, existing research has predominantly centred on text-based methods, sidelining the integration of visual cues.
Furthermore, prior work in medical question summarisation has been confined to English; this dataset expands the scope to Hindi-English code-mixed queries. The strategic integration of visual information from images aims to enable the creation of medically detailed summaries.