Evidence-grounded clinical pharmacogenomics question answering system using large language models and hybrid retrieval augmentation

dc.contributor.advisor: Alkhateeb, Abedalrhman
dc.contributor.advisor: Moniruzzaman, Md
dc.contributor.author: Arafin, Protiva
dc.contributor.committeemember: Alsmadi, Malek
dc.contributor.committeemember: Ahmed, Saad B.
dc.date.accessioned: 2026-05-14T18:20:58Z
dc.date.created: 2026
dc.date.issued: 2026
dc.description.abstract: Pharmacogenomics (PGx) is central to personalized medicine because it helps clinicians select drugs and doses based on a patient’s genetic makeup. However, the growing volume and complexity of PGx data, together with the need to interpret clinical recommendations, make sound decision-making more difficult. This study proposes a data-driven clinical decision support framework that combines large language models (LLMs) with hybrid retrieval-augmented generation (RAG) to improve answers to pharmacogenomic questions. The framework evaluates two contemporary LLMs, Meta-LLaMA-3.1-8B-Instruct and Qwen3-8B, in several configurations: base models, Low-Rank Adaptation (LoRA) fine-tuning, and hybrid RAG-based methods. Structured pharmacogenomic data from CPIC and clinical guideline information from ClinPGx are combined into a large dataset. To facilitate retrieval and model use, the data undergo merging, cleaning, normalization, and conversion to JSONL format. A hybrid retrieval approach is designed to strengthen factual grounding by combining lexical filtering with semantic similarity computed from sentence embeddings. The models are rated on correctness, relevance, completeness, and clarity using both automatic metrics and manual evaluation. The results show that Qwen performs well as a base model, whereas LLaMA improves substantially when combined with RAG and LoRA, producing answers that are more context-aware and therapeutically useful. Fine-tuning alone does not consistently improve performance, highlighting the limitations of relying solely on parametric knowledge. The findings indicate that, in clinical settings, accuracy must be supported by consistency, relevance, and evidence. This study demonstrates that combining retrieval methods with parameter-efficient fine-tuning improves the reliability and utility of LLM-based systems in clinical environments.
The proposed methodology provides a scalable framework for developing trustworthy AI-driven solutions for pharmacogenomics and healthcare decision support.
dc.identifier.uri: https://knowledgecommons.lakeheadu.ca/handle/2453/5615
dc.language.iso: en
dc.title: Evidence-grounded clinical pharmacogenomics question answering system using large language models and hybrid retrieval augmentation
dc.type: Thesis
etd.degree.discipline: Computer Science
etd.degree.grantor: Lakehead University
etd.degree.level: Master
etd.degree.name: Master of Science
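
The abstract describes the hybrid retrieval step only at a high level: lexical filtering combined with semantic similarity over sentence embeddings. The Python sketch below illustrates one plausible two-stage retriever of that kind; it is not the thesis implementation. Assumptions: the function names (`tokenize`, `embed`, `cosine`, `hybrid_retrieve`), the gene/drug vocabulary `KEY_TERMS`, and the guideline snippets in `RECORDS` are hypothetical illustrations (paraphrased, not quoted from CPIC or ClinPGx), and a bag-of-words count vector stands in for a real sentence-embedding model so the example runs with the standard library alone.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a sentence-embedding model: a bag-of-words
    count vector. A real system would use dense sentence embeddings."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())  # missing keys count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query: str, records: list[str], key_terms: set[str], k: int = 3) -> list[str]:
    """Two-stage retrieval: lexical filter, then similarity-based re-ranking."""
    # Stage 1 (lexical): keep records that share a gene or drug term with the query.
    query_terms = set(tokenize(query)) & key_terms
    candidates = [r for r in records if query_terms & set(tokenize(r))]
    # If the query names no known term, or no record matches, fall back to all records.
    if not candidates:
        candidates = list(records)
    # Stage 2 (semantic stand-in): rank surviving candidates by cosine similarity.
    query_vec = embed(query)
    ranked = sorted(candidates, key=lambda r: cosine(query_vec, embed(r)), reverse=True)
    return ranked[:k]


# Hypothetical, paraphrased guideline snippets and vocabulary for illustration only.
RECORDS = [
    "CYP2C19 poor metabolizers: consider an alternative to clopidogrel.",
    "TPMT poor metabolizers: reduce thiopurine starting dose.",
    "CYP2D6 ultrarapid metabolizers: avoid codeine.",
]
KEY_TERMS = {"cyp2c19", "tpmt", "cyp2d6", "clopidogrel", "thiopurine", "codeine"}

if __name__ == "__main__":
    question = "What should a CYP2C19 poor metabolizer take instead of clopidogrel?"
    print(hybrid_retrieve(question, RECORDS, KEY_TERMS))
```

In this toy run, the lexical stage keeps only the CYP2C19/clopidogrel snippet, so passages that are similar in wording but concern a different gene (such as the CYP2D6 codeine entry) never reach the ranking stage. Replacing `embed` with a dense sentence-embedding model and `KEY_TERMS` with a vocabulary drawn from the CPIC data would bring the sketch closer to what the abstract describes.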

Files

Original bundle

Name: ArafinP2026m-2b.pdf
Size: 1.81 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.23 KB
Format: Item-specific license agreed upon to submission