The UTC Graduate School is pleased to announce that Major Schwartz will present Master’s research titled, A Question to Query LLM as a Pipeline Replacement in Knowledge Graph Question Answering Systems on 10/03/2025 at 5:00 PM EDT in ECS 313-G. Everyone is invited to attend.
Computer Science
Chair: Mengjun Xie
Co-Chair:
Abstract:
Knowledge Graph Question Answering (KGQA) pipelines commonly depend on separate entity and relation predictors before issuing a query to the graph, which introduces engineering complexity and costly inference passes over large vocabularies. This thesis presents a drop-in replacement for those modules: a fine-tuned large language model (LLM) that translates a natural-language question directly into an executable SPARQL query. We fine-tune instruction-tuned backbones, Llama-3.1-8B-Instruct and Mistral-7B-Instruct, using low-rank adaptation (LoRA). This approach allows us to train specific regions of the models on paired (question, gold SPARQL) examples, which are formatted through chat templates. As a result, the models can perform single-step semantic parsing without requiring explicit entity or relation linking. The training and inference pipeline includes a lightweight post-processor that corrects tokenizer-induced spacing artifacts in generated SPARQL, improving exact-match robustness without altering query structure. On a held-out test set, the fine-tuned models achieve 97.9% (Llama) and 94.0% (Mistral) exact-match accuracy for natural-language-to-SPARQL generation, demonstrating that a compact, end-to-end translator can meet or exceed the accuracy typically attributed to multi-module KGQA stacks while substantially simplifying the architecture. Beyond accuracy, the approach removes dependence on graph-specific entity/relation scorers and integrates cleanly into existing KGQA systems as a “swap-in” generator that emits executable queries.