Evidence-grounded clinical pharmacogenomics question answering system using large language models and hybrid retrieval augmentation

Arafin, Protiva

Files

ArafinP2026m-2b.pdf (1.81 MB)

Date

2026

Authors

Arafin, Protiva

Abstract

Pharmacogenomics (PGx) is central to personalized medicine because it helps clinicians choose drugs and doses based on a patient’s genetic makeup. However, the growing volume and complexity of PGx data, together with the need to interpret clinical recommendations, make sound decision-making harder. This study proposes a data-driven clinical decision support framework that combines large language models (LLMs) with hybrid retrieval-augmented generation (RAG) to improve answers to pharmacogenomic questions. The framework evaluates two contemporary LLMs, Meta-LLaMA-3.1-8B-Instruct and Qwen3-8B, in several configurations: base models, Low-Rank Adaptation (LoRA) fine-tuning, and hybrid RAG-based methods. Structured pharmacogenomics data from CPIC and clinical guideline information from ClinPGx are combined into a large dataset. To make the data easier to retrieve and to use in models, it is merged, cleaned, normalized, and converted to JSONL format. A hybrid retrieval approach is designed to enhance factual grounding by integrating lexical filtering with semantic similarity computed from sentence embeddings. The models are rated on correctness, relevance, completeness, and clarity using both automatic metrics and manual checks. The results show that Qwen performs well as a base model, and that LLaMA improves substantially when combined with RAG and LoRA, giving answers that are more context-aware and therapeutically useful. Fine-tuning alone does not always help, which shows the limits of relying only on parametric knowledge. The results also indicate that accuracy in clinical settings must be supported by consistency, relevance, and evidence. This study demonstrates that combining retrieval methods with parameter-efficient fine-tuning enhances the reliability and utility of LLM-based systems in clinical environments. The proposed methodology establishes a scalable framework for developing trustworthy AI-driven solutions in pharmacogenomics and healthcare decision support.
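
The hybrid retrieval step described in the abstract (a lexical filter followed by ranking on embedding similarity) might look roughly like the sketch below. It is illustrative only and is not taken from the thesis: the function names, the `key_terms` parameter, and the sample records are all hypothetical, and the bag-of-words `embed` function is a toy stand-in for a real sentence-embedding model so that the example runs with the standard library alone.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (e.g. 'cyp2c19')."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """ASSUMPTION: toy bag-of-words vector standing in for a sentence embedding."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_retrieve(query, records, key_terms, top_k=2):
    """Stage 1: lexical filter on key terms. Stage 2: rank by embedding similarity."""
    terms = {t.lower() for t in key_terms}
    candidates = [r for r in records if terms & set(tokenize(r["text"]))]
    if not candidates:  # fall back to all records if the filter matches nothing
        candidates = list(records)
    q_vec = embed(query)
    ranked = sorted(candidates, key=lambda r: cosine(q_vec, embed(r["text"])), reverse=True)
    return ranked[:top_k]


# Hypothetical placeholder records; these are not real guideline text.
records = [
    {"id": "r1", "text": "Placeholder guideline text about CYP2C19 and clopidogrel dosing."},
    {"id": "r2", "text": "Placeholder guideline text about CYP2D6 and codeine dosing."},
    {"id": "r3", "text": "Placeholder annotation about CYP2C19 and voriconazole."},
]

query = "How does CYP2C19 affect clopidogrel dosing?"
hits = hybrid_retrieve(query, records, key_terms=["CYP2C19"], top_k=2)
print([h["id"] for h in hits])  # ['r1', 'r3']
```

In this sketch the lexical stage removes the record that never mentions the gene, and the similarity stage then orders the remaining records by overlap with the question. A real system would presumably replace `embed` with a trained sentence-embedding model and pass the top-ranked records to the LLM as context.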

URI

https://knowledgecommons.lakeheadu.ca/handle/2453/5615

Collections

Electronic Theses and Dissertations from 2009
