Evidence-grounded clinical pharmacogenomics question answering system using large language models and hybrid retrieval augmentation

Arafin, Protiva

dc.contributor.advisor	Alkhateeb, Abedalrhman
dc.contributor.advisor	Moniruzzaman, Md
dc.contributor.author	Arafin, Protiva
dc.contributor.committeemember	Alsmadi, Malek
dc.contributor.committeemember	Ahmed, Saad B.
dc.date.accessioned	2026-05-14T18:20:58Z
dc.date.created	2026
dc.date.issued	2026
dc.description.abstract	Pharmacogenomics (PGx) is central to personalized medicine because it helps clinicians choose drugs and doses based on a person’s genetic makeup. However, the growing volume and complexity of PGx data, together with the need to interpret clinical recommendations, make sound decision-making harder. This study proposes a data-driven clinical decision support framework that combines large language models (LLMs) with hybrid retrieval-augmented generation (RAG) to improve answers to pharmacogenomic questions. The framework evaluates two contemporary LLMs, Meta-LLaMA-3.1-8B-Instruct and Qwen3-8B, in several configurations: base models, Low-Rank Adaptation (LoRA) fine-tuning, and hybrid RAG-based methods. Structured pharmacogenomics data from CPIC and clinical guideline information from ClinPGx are combined into a large dataset. To make the data easier to retrieve and use in models, it is merged, cleaned, normalized, and converted to JSONL format. A hybrid retrieval approach is designed to improve factual grounding by combining lexical filtering with semantic similarity computed from sentence embeddings. The models are rated on correctness, relevance, completeness, and clarity using both automatic metrics and manual checks. The results show that Qwen performs well as a base model, and that LLaMA improves substantially when combined with RAG and LoRA, giving answers that are more context-aware and therapeutically useful. Fine-tuning alone does not always help, which shows the limits of relying only on parametric knowledge. The results also indicate that accuracy in clinical settings must be supported by consistency, relevance, and evidence. This study demonstrates that combining retrieval methods with parameter-efficient fine-tuning improves the reliability and utility of LLM-based systems in clinical environments. The proposed methodology establishes a scalable framework for the development of trustworthy AI-driven solutions in pharmacogenomics and healthcare decision support.
dc.identifier.uri	https://knowledgecommons.lakeheadu.ca/handle/2453/5615
dc.language.iso	en
dc.title	Evidence-grounded clinical pharmacogenomics question answering system using large language models and hybrid retrieval augmentation
dc.type	Thesis
etd.degree.discipline	Computer Science
etd.degree.grantor	Lakehead University
etd.degree.level	Master
etd.degree.name	Master of Science
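
The abstract describes hybrid retrieval as lexical filtering combined with semantic similarity over sentence embeddings. The sketch below is only an illustration of that two-stage idea, not the thesis implementation: the function names, the placeholder records (GENE_A, DRUG_X, and so on), and the bag-of-words vectors standing in for real sentence embeddings are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector.
    A real system would call a sentence-embedding model here (assumption)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_retrieve(query, records, required_terms, top_k=2):
    """Stage 1: keep records that contain every required term (lexical filter).
    Stage 2: rank the survivors by similarity to the query."""
    required = {t.lower() for t in required_terms}
    candidates = [r for r in records if required <= set(tokenize(r))]
    query_vec = embed(query)
    ranked = sorted(candidates, key=lambda r: cosine(query_vec, embed(r)), reverse=True)
    return ranked[:top_k]


# Hypothetical placeholder records; these are not real guideline text.
RECORDS = [
    "GENE_A poor metabolizer: consider an alternative to DRUG_X",
    "GENE_A normal metabolizer: standard DRUG_X dosing",
    "GENE_B poor metabolizer: avoid DRUG_Y",
]

results = hybrid_retrieve(
    "What should a GENE_A poor metabolizer taking DRUG_X receive?",
    RECORDS,
    required_terms=["GENE_A", "DRUG_X"],
)
for rank, text in enumerate(results, start=1):
    print(rank, text)
```

The lexical filter removes records about other genes or drugs before any similarity is computed, so the ranking stage only has to order passages that are already on topic.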

Files

Original bundle

Name: ArafinP2026m-2b.pdf
Size: 1.81 MB
Format: Adobe Portable Document Format

Collections

Electronic Theses and Dissertations from 2009