Evidence-grounded clinical pharmacogenomics question answering system using large language models and hybrid retrieval augmentation
Authors
Arafin, Protiva
Abstract
Pharmacogenomics (PGx) is central to personalized medicine because it helps
clinicians choose the right drugs and doses based on a patient’s genetic makeup.
However, the growing volume and complexity of PGx data, together with the need to
interpret clinical recommendations, make sound decision-making harder. This study
proposes a data-driven clinical decision support framework that combines large language
models (LLMs) with hybrid retrieval-augmented generation (RAG) to improve
answers to pharmacogenomic questions.
The framework evaluates two contemporary LLMs, Meta-LLaMA-3.1-8B-Instruct and
Qwen3-8B, in several configurations: base models, Low-Rank Adaptation
(LoRA) fine-tuning, and hybrid RAG-based methods. Structured pharmacogenomics
data from CPIC and clinical guideline information from ClinPGx are
combined into a single large dataset. To make the data easier to search and to use
with the models, it is merged, cleaned, normalized, and converted
to JSONL format. A hybrid retrieval approach is designed to enhance factual grounding
by integrating lexical filtering with semantic similarity computed from sentence embeddings.
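
As a rough illustration of the hybrid retrieval idea (a lexical filter followed by semantic ranking), the sketch below may help. It is not the system's actual implementation. The function names, the stopword list, and the example passages are hypothetical, and `toy_embed` is a simple term-count stand-in for a real sentence-embedding model so that the example runs with the standard library alone.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]

# Assumption: a tiny illustrative stopword list, not one taken from the study.
STOPWORDS = {"a", "an", "and", "for", "is", "of", "the", "to", "what"}


def terms(text: str) -> List[str]:
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9_]+", text.lower()) if t not in STOPWORDS]


def toy_embed(text: str) -> Vector:
    """Stand-in for a sentence-embedding model: a sparse term-count vector."""
    return {term: float(count) for term, count in Counter(terms(text)).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_retrieve(
    query: str,
    passages: List[str],
    embed: Callable[[str], Vector] = toy_embed,
    top_k: int = 2,
) -> List[Tuple[float, str]]:
    """Return up to top_k (score, passage) pairs, best match first."""
    query_terms = set(terms(query))
    # Stage 1 (lexical filter): keep passages sharing at least one query term.
    candidates = [p for p in passages if query_terms & set(terms(p))]
    # Stage 2 (semantic ranking): order the survivors by embedding similarity.
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(p)), p) for p in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Placeholder passages with made-up gene and drug names (not clinical data).
    passages = [
        "gene_a poor metabolizer: consider an alternative to drug_x",
        "gene_b normal metabolizer: standard dosing of drug_y",
        "general storage instructions unrelated to genetics",
    ]
    question = "What is recommended for a gene_a poor metabolizer taking drug_x?"
    for score, passage in hybrid_retrieve(question, passages):
        print(f"{score:.3f}  {passage}")
```

In a real pipeline, `embed` would presumably wrap a sentence-embedding model and the passages would be read from the JSONL records. The two-stage shape stays the same: the lexical stage cheaply discards passages with no overlapping terms, and the semantic stage orders what remains.
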
This research uses both automatic metrics and manual review to rate the models on
correctness, relevance, completeness, and clarity. The results show that Qwen performs
well as a base model, and that LLaMA improves substantially when combined with RAG
and LoRA, giving answers that are more context-aware and therapeutically useful.
Fine-tuning alone does not consistently help, which highlights the limits of relying only
on parametric knowledge. The results indicate that accuracy in clinical settings must be
supported by consistency, relevance, and evidence.
This study demonstrates that employing retrieval methods alongside parameter-efficient
fine-tuning enhances the reliability and utility of LLM-based systems in clinical environments.
The proposed methodology establishes a scalable framework for the development
of trustworthy AI-driven solutions in pharmacogenomics and healthcare decision support.
