How to Reduce LLM Hallucination with Ontology Grounding
You've deployed a large language model in production. It generates fluent, confident responses. And about 15% of the time, it's completely wrong about something that matters. This isn't a bug—it's a feature of how LLMs work. A large language model (LLM) is a neural network trained on a vast amount of text for natural language processing tasks, especially language generation. The problem is that these models have no internal representation of truth. They predict tokens based on statistical patterns, not verified facts.
The standard fix—retrieval-augmented generation (RAG)—helps but doesn't solve the core issue. RAG retrieves documents, but documents contain contradictions, ambiguities, and irrelevant noise. What you need is a structured representation of your domain that the LLM can reason over, not just retrieve from.
This tutorial walks through a production-grade approach: grounding LLM responses in a formal ontology. We'll build a system that uses an ontology to constrain and verify LLM outputs, reducing hallucination rates by enforcing structural consistency. The code is designed for deployment, not a notebook demo.
Why Ontology Grounding Beats Naive RAG
An ontology, in information science, encompasses a representation, formal naming, and definitions of the categories, properties, and relations between the concepts, data, or entities that pertain to one, many, or all domains of discourse. More simply, an ontology is a way of showing the properties of a subject area and how they are related, by defining a set of terms and relational expressions.
Here's the practical difference. With naive RAG, you ask "What are the side effects of Drug X?" The system retrieves documents, the LLM summarizes them. If one document says "nausea" and another says "no significant side effects," the LLM might hallucinate a compromise like "mild nausea in some patients" even when the actual answer is "none documented."
With ontology grounding, the system knows that Drug has a property has_side_effect with range SideEffect. If the ontology records no has_side_effect relationships for Drug X, a response claiming that side effects exist fails validation. The ontology acts as a structural constraint, not just a retrieval source.
This matters in production because it shifts the failure mode from "confidently wrong" to "explicitly uncertain." When the ontology lacks information, the system says "I don't know" instead of fabricating.
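The shift from "confidently wrong" to "explicitly uncertain" can be illustrated with a minimal sketch. It assumes the ontology has been flattened into a plain dictionary; `FACTS` and `answer_side_effects` are hypothetical names used only for illustration and are not part of the pipeline built later.

```python
from typing import Dict, List

# Hypothetical stand-in for facts exported from an ontology:
# drug name -> asserted side effects. A drug missing from the
# mapping means "no information", not "no side effects".
FACTS: Dict[str, List[str]] = {
    "Aspirin": ["StomachBleeding"],
}


def answer_side_effects(drug: str, facts: Dict[str, List[str]]) -> str:
    """Answer only from asserted facts; admit ignorance otherwise."""
    if drug not in facts:
        return f"I don't know: {drug} is not in the knowledge base."
    effects = facts[drug]
    if not effects:
        return f"No side effects are recorded for {drug}."
    return f"Recorded side effects of {drug}: {', '.join(effects)}."


print(answer_side_effects("Aspirin", FACTS))
print(answer_side_effects("DrugX", FACTS))
```

The point of the sketch is that the unknown case has its own code path, so a missing fact never turns into a generated guess.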
Prerequisites and Environment Setup
We'll build this with Python 3.11+, OWLReady2 for ontology handling, LangChain [7] for LLM orchestration, and FastAPI for the serving layer. The ontology format is OWL 2 (Web Ontology Language), which is the W3C standard.
# Create a clean environment
python -m venv ontology-grounded-llm
source ontology-grounded-llm/bin/activate
# Core dependencies
pip install owlready2==0.46 langchain==0.3.1 langchain-openai==0.2.1 fastapi==0.115.0 uvicorn==0.30.6 pydantic==2.9.2
# For local LLM testing (optional, but recommended for cost control)
pip install ollama==0.3.3
# For ontology validation and reasoning
pip install rdflib==7.0.0
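The key library here is OWLReady2. It's not the most modern, but it is a mature Python library that exposes OWL 2 reasoning directly from Python code. Note that its bundled HermiT and Pellet reasoners are Java programs that OWLReady2 launches as a subprocess, so a Java runtime must be installed and on the PATH. We'll use the HermiT reasoner to classify entities and infer relationships.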
Building the Ontology-Grounded LLM Pipeline
Step 1: Define Your Domain Ontology
First, we need an ontology that captures the structural constraints of our domain. I'll use a pharmaceutical knowledge base as the example because it has clear entity types and relationships that map well to ontology constraints.
# ontology_builder.py
from pathlib import Path

from owlready2 import *


class DrugOntologyBuilder:
    """Builds a pharmaceutical ontology with OWL 2 constraints."""

    def __init__(self, ontology_path: str = "drug_ontology.owl"):
        self.onto = get_ontology("http://example.org/drug_ontology.owl")
        self.ontology_path = ontology_path

    def build(self):
        with self.onto:
            # Define classes with formal constraints
            class Drug(Thing):
                """A pharmaceutical drug or medication."""
                pass

            class SideEffect(Thing):
                """An adverse effect of a drug."""
                pass

            class Indication(Thing):
                """A condition that a drug treats."""
                pass

            class DrugClass(Thing):
                """A pharmacological class of drugs."""
                pass

            # Define object properties (relationships between entities)
            class has_side_effect(Drug >> SideEffect):
                """Links a drug to its known side effects."""
                pass

            class treats(Drug >> Indication):
                """Links a drug to conditions it treats."""
                pass

            class belongs_to_class(Drug >> DrugClass):
                """Links a drug to its pharmacological class."""
                pass

            # Define data properties (literal values)
            class has_dosage_mg(Drug >> float):
                """Standard dosage in milligrams."""
                pass

            class has_half_life_hours(Drug >> float):
                """Drug half-life in hours."""
                pass

            # Python-side attribute names, set after class creation
            # (the form shown in the OWLReady2 documentation)
            has_side_effect.python_name = "side_effects"
            treats.python_name = "indications"
            belongs_to_class.python_name = "drug_class"
            has_dosage_mg.python_name = "dosage_mg"
            has_half_life_hours.python_name = "half_life_hours"

            # Add constraints: A drug cannot have contradictory properties
            # This is where ontology grounding prevents hallucinations
            class SafeDrug(Drug):
                """A drug with no side effects."""
                equivalent_to = [
                    Drug & has_side_effect.only(SideEffect) &
                    has_side_effect.exactly(0, SideEffect)
                ]

            # Add individuals (instances); values are assigned through
            # the python_name of each property
            aspirin = Drug("Aspirin")
            aspirin.dosage_mg = [500.0]
            aspirin.half_life_hours = [3.0]
            headache = Indication("Headache")
            aspirin.indications = [headache]
            stomach_bleeding = SideEffect("StomachBleeding")
            aspirin.side_effects = [stomach_bleeding]
            nsaid = DrugClass("NSAID")
            aspirin.drug_class = [nsaid]

        # Save the ontology
        self.onto.save(file=self.ontology_path, format="rdfxml")
        print(f"Ontology saved to {self.ontology_path}")
        return self.onto

    def load(self) -> Ontology:
        """Load an existing ontology with reasoning."""
        if not Path(self.ontology_path).exists():
            raise FileNotFoundError(f"Ontology not found at {self.ontology_path}")
        onto = get_ontology(self.ontology_path).load()
        # Run the HermiT reasoner to classify entities
        # This infers implicit relationships and detects inconsistencies
        with onto:
            sync_reasoner_hermit(infer_property_values=True)
        return onto
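The critical design decision here is using OWL 2's equivalent_to constraint on SafeDrug. This creates a formal definition: a drug is "safe" only if it has exactly zero side effects. Because Aspirin has an asserted side effect, the reasoner can never classify it as a SafeDrug, and asserting that it is one makes the ontology inconsistent. One caveat: OWL reasons under the open-world assumption, so a drug with no recorded side effects is not automatically inferred to be a SafeDrug either; a missing fact is treated as unknown unless the property is explicitly closed. When the LLM generates a response claiming a drug is safe, we can verify this against the ontology's classification.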
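To make the verification step concrete, here is a minimal sketch in plain Python rather than OWL. It assumes the asserted side effects have been exported to a dictionary and that a separate set names the drugs whose side-effect records are known to be complete; both structures and the function name `check_safe_claim` are illustrative assumptions, not OWLReady2 API.

```python
from typing import Dict, List, Set


def check_safe_claim(
    drug: str,
    side_effects: Dict[str, List[str]],
    complete_records: Set[str],
) -> str:
    """Classify a "this drug is safe" claim against asserted facts.

    Returns "contradicted", "supported", or "unknown".
    """
    known = side_effects.get(drug, [])
    if known:
        # At least one asserted side effect: the claim cannot hold.
        return "contradicted"
    if drug in complete_records:
        # No side effects AND the record is declared complete.
        return "supported"
    # Nothing asserted, but absence of a fact is not proof.
    return "unknown"


facts = {"Aspirin": ["StomachBleeding"], "Placebo": []}
print(check_safe_claim("Aspirin", facts, complete_records={"Placebo"}))
print(check_safe_claim("Placebo", facts, complete_records={"Placebo"}))
print(check_safe_claim("DrugX", facts, complete_records={"Placebo"}))
```

The "unknown" outcome is the one to surface to users as "I don't know"; treating it as "supported" would turn a gap in the data into a safety claim.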
Step 2: Build the Ontology-Aware Retriever
The retriever doesn't just fetch documents—it fetches structured ontology data and uses it to constrain the LLM's generation space.
# ontology_retriever.py
from owlready2 import *
from typing import List, Dict, Optional, Tuple


class OntologyRetriever:
    """Retrieves structured ontology data and generates constraint prompts."""

    def __init__(self, ontology: Ontology):
        self.onto = ontology
        # Cache entity types for fast lookup
        self._entity_cache = self._build_entity_cache()

    def _build_entity_cache(self) -> Dict[str, Dict]:
        """Build a cache of all entities and their properties."""
        cache = {}
        for entity in self.onto.individuals():
            entity_data = {
                "type": [t.__name__ for t in entity.is_a if isinstance(t, ThingClass)],
                "properties": {},
            }
            # Extract all property values
            for prop in self.onto.object_properties():
                values = getattr(entity, prop.python_name, [])
                if values:
                    # Use the short name ("StomachBleeding"); str(v) would
                    # include the ontology prefix and never match user text
                    entity_data["properties"][prop.python_name] = [
                        v.name for v in values
                    ]
            for prop in self.onto.data_properties():
                values = getattr(entity, prop.python_name, [])
                if values:
                    entity_data["properties"][prop.python_name] = list(values)
            # Key by short name so plain-text queries can match it
            cache[entity.name] = entity_data
        return cache

    def get_entity_context(self, entity_name: str) -> Optional[Dict]:
        """Get structured context for an entity, with type constraints."""
        # Normalize entity name
        normalized = entity_name.replace(" ", "").replace("-", "").lower()
        if not normalized:
            return None
        # Try exact match first, then partial
        for name, data in self._entity_cache.items():
            if normalized == name.lower():
                return {"entity": name, **data}
        for name, data in self._entity_cache.items():
            if normalized in name.lower():
                return {"entity": name, **data}
        return None

    def generate_constraint_prompt(self, query: str) -> Tuple[str, List[str]]:
        """
        Generate a constraint prompt based on ontology structure.
        Returns (constraint_text, list_of_known_entities).
        """
        # Extract potential entity mentions from query
        # In production, use NER; here we use simple keyword matching
        mentioned_entities = []
        constraints = []
        for entity_name in self._entity_cache:
            if entity_name.lower() in query.lower():
                entity_data = self._entity_cache[entity_name]
                mentioned_entities.append(entity_name)
                # Build type-based constraints
                entity_types = entity_data.get("type", [])
                if entity_types:
                    constraints.append(
                        f"- {entity_name} is a {', '.join(entity_types)}. "
                        f"Any response must respect this classification."
                    )
                # Build property-based constraints
                props = entity_data.get("properties", {})
                for prop_name, values in props.items():
                    if values:
                        constraints.append(
                            f"- {entity_name} has {prop_name}: {', '.join(str(v) for v in values)}. "
                            f"Do not claim other values for this property."
                        )
        constraint_text = "\n".join(constraints) if constraints else ""
        return constraint_text, mentioned_entities

    def validate_response(self, response: str) -> List[str]:
        """
        Validate an LLM response against ontology constraints.
        Returns list of violations found.
        """
        violations = []
        response_lower = response.lower()
        # Check each known entity mentioned in the response
        for entity_name, entity_data in self._entity_cache.items():
            if entity_name.lower() not in response_lower:
                continue
            # Check property claims
            for prop_name, known_values in entity_data.get("properties", {}).items():
                for value in known_values:
                    if str(value).lower() in response_lower:
                        # Value is present in both ontology and response - good
                        continue
                    # The response mentions the entity but not this known value.
                    # An omission is not a contradiction, so nothing is recorded.
                    # This is a placeholder: detecting claims that conflict with
                    # the ontology needs semantic matching, which is not
                    # implemented here, so the list is always empty.
        return violations
The generate_constraint_prompt method is where the magic happens. It converts ontology structure into natural language constraints that the LLM can understand. For example, if the query mentions "Aspirin," the constraint prompt might say:
- Aspirin is a Drug. Any response must respect this classification.
- Aspirin has side_effects: StomachBleeding. Do not claim other values for this property.
- Aspirin has dosage_mg: 500.0. Do not claim other values for this property.
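This is more effective than raw RAG because it's a closed set of constraints. The prompt explicitly tells the LLM not to introduce side effects beyond those listed. That narrows the output space but does not guarantee compliance, which is why responses are also checked after generation.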
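Since validate_response above is a placeholder that never records a violation, a working check has to be supplied. The following is a minimal sketch of one heuristic, under stated assumptions: the cache is a plain dictionary in the same shape as `_entity_cache`, keyed by short entity names, and `find_unsupported_side_effects` and `_norm` are hypothetical helpers. It flags a known side effect that appears in a response alongside a drug the ontology does not link it to.

```python
import re
from typing import Dict, List

# Hypothetical cache in the same shape as OntologyRetriever._entity_cache.
CACHE: Dict[str, Dict] = {
    "Aspirin": {"type": ["Drug"], "properties": {"side_effects": ["StomachBleeding"]}},
    "StomachBleeding": {"type": ["SideEffect"], "properties": {}},
    "Nausea": {"type": ["SideEffect"], "properties": {}},
}


def _norm(text: str) -> str:
    """Lowercase and drop non-alphanumerics so 'stomach bleeding'
    matches the entity name 'StomachBleeding'."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def find_unsupported_side_effects(response: str, cache: Dict[str, Dict]) -> List[str]:
    """Flag known SideEffect entities mentioned next to a drug that the
    ontology does not link them to."""
    text = _norm(response)
    all_effects = [n for n, d in cache.items() if "SideEffect" in d["type"]]
    violations = []
    for name, data in cache.items():
        if "Drug" not in data["type"] or _norm(name) not in text:
            continue
        allowed = set(data["properties"].get("side_effects", []))
        for effect in all_effects:
            if _norm(effect) in text and effect not in allowed:
                violations.append(f"{name}: '{effect}' is not an asserted side effect")
    return violations


print(find_unsupported_side_effects("Aspirin may cause nausea and stomach bleeding.", CACHE))
```

This is a string heuristic with known blind spots: it ignores negation ("does not cause nausea" is still flagged), it cannot see side effects that are absent from the ontology entirely, and stripping spaces can produce matches across word boundaries. It is a starting point for the semantic matching the code comments call for, not a replacement for it.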
Step 3: Implement the Grounded LLM Chain
Now we wire everything together into a LangChain chain that enforces ontology constraints at generation time.
# grounded_chain.py
from typing import Any, Dict

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

from ontology_retriever import OntologyRetriever


class OntologyGroundedChain:
    """A LangChain chain that grounds LLM responses in ontology constraints."""

    def __init__(self, ontology_retriever: OntologyRetriever,
                 model_name: str = "gpt-4o-mini",
                 temperature: float = 0.1):
        self.retriever = ontology_retriever
        self.llm = ChatOpenAI(
            model=model_name,
            temperature=temperature,
            # Low temperature reduces creative hallucination
        )
        # The system prompt is designed to make the LLM follow constraints strictly
        self.prompt = ChatPromptTemplate.from_messages([
            ("system", """You are a medical knowledge assistant grounded in a formal ontology.
Your responses must strictly follow the ontology constraints provided below.
If the ontology does not contain information to answer a question, say "I don't have information about that in my knowledge base."
Do not infer, extrapolate, or combine facts unless explicitly stated in the ontology.

ONTOLOGY CONSTRAINTS:
{constraints}

KNOWN ENTITIES: {known_entities}

Rules:
1. Only state facts that are directly present in the ontology constraints.
2. If a property is not listed for an entity, do not claim it exists.
3. If the ontology says an entity has specific values for a property, do not add or change those values.
4. If you cannot answer from the ontology, say so explicitly."""),
            ("human", "{question}"),
        ])
        # The dict is coerced into a parallel step: each value receives the
        # raw question string and fills one prompt variable
        self.chain = (
            {
                "constraints": RunnableLambda(self._get_constraints),
                "known_entities": RunnableLambda(self._get_known_entities),
                "question": RunnablePassthrough(),
            }
            | self.prompt
            | self.llm
            | StrOutputParser()
        )

    def _get_constraints(self, question: str) -> str:
        constraint_text, _ = self.retriever.generate_constraint_prompt(question)
        return constraint_text if constraint_text else "No specific ontology constraints for this query."

    def _get_known_entities(self, question: str) -> str:
        _, entities = self.retriever.generate_constraint_prompt(question)
        return ", ".join(entities) if entities else "None"

    def invoke(self, question: str) -> Dict[str, Any]:
        """Run the grounded chain and return response with validation."""
        response = self.chain.invoke(question)
        # Post-generation validation
        violations = self.retriever.validate_response(response)
        return {
            "question": question,
            "response": response,
            "violations": violations,
            "ontology_grounded": len(violations) == 0,
        }
The temperature is set to 0.1, not 0.0. Zero temperature can cause repetitive patterns and refusal loops. A slightly higher temperature with strong constraints produces more natural language while staying within bounds.
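One detail of the chain above is that generate_constraint_prompt runs twice per question, once for each prompt variable. A minimal sketch of computing all three variables in a single call follows; `build_prompt_inputs` and `StubRetriever` are hypothetical names, and the stub only imitates the retriever's method signature so the example runs without an ontology.

```python
from typing import Dict, List, Tuple


class StubRetriever:
    """Hypothetical stand-in exposing the same method as OntologyRetriever."""

    def __init__(self) -> None:
        self.calls = 0

    def generate_constraint_prompt(self, query: str) -> Tuple[str, List[str]]:
        self.calls += 1
        if "aspirin" in query.lower():
            return "- Aspirin is a Drug.", ["Aspirin"]
        return "", []


def build_prompt_inputs(question: str, retriever) -> Dict[str, str]:
    """Compute all three prompt variables with a single retriever call."""
    constraint_text, entities = retriever.generate_constraint_prompt(question)
    return {
        "constraints": constraint_text or "No specific ontology constraints for this query.",
        "known_entities": ", ".join(entities) if entities else "None",
        "question": question,
    }


retriever = StubRetriever()
print(build_prompt_inputs("Is aspirin safe?", retriever))
print(retriever.calls)
```

Because the function maps a question string to the dictionary the prompt template expects, it could presumably replace the dictionary step at the head of the chain by wrapping it in a RunnableLambda; that wiring is an assumption and is not shown here.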
Step 4: Production Serving with FastAPI
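Here's the serving layer with error handling, request logging, and latency reporting. Note that the listing does not implement rate limiting; add it at the gateway or as middleware before exposing the service.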
# api.py
import logging
import time
from contextlib import asynccontextmanager
from typing import List, Optional

from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel, Field

from ontology_builder import DrugOntologyBuilder
from ontology_retriever import OntologyRetriever
from grounded_chain import OntologyGroundedChain

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Global state (in production, use dependency injection)
chain: Optional[OntologyGroundedChain] = None


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Initialize ontology and chain on startup."""
    global chain
    logger.info("Loading ontology...")
    builder = DrugOntologyBuilder()
    ontology = builder.load()  # Assumes ontology already exists
    logger.info("Initializing retriever...")
    retriever = OntologyRetriever(ontology)
    logger.info("Building grounded chain...")
    chain = OntologyGroundedChain(retriever)
    logger.info("Service ready")
    yield
    logger.info("Shutting down")


app = FastAPI(
    title="Ontology-Grounded LLM API",
    version="1.0.0",
    lifespan=lifespan,
)

# Wildcard CORS is convenient for local testing; restrict origins in production
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)


class QueryRequest(BaseModel):
    question: str = Field(..., min_length=1, max_length=2000)
    require_grounding: bool = Field(
        default=True,
        description="If True, reject responses with ontology violations",
    )


class QueryResponse(BaseModel):
    question: str
    response: str
    ontology_grounded: bool
    violations: List[str]
    latency_ms: float


@app.post("/query", response_model=QueryResponse)
async def query(request: QueryRequest, http_request: Request):
    """Query the ontology-grounded LLM."""
    if chain is None:
        raise HTTPException(status_code=503, detail="Service not initialized")
    start_time = time.time()
    try:
        result = chain.invoke(request.question)
    except Exception as e:
        logger.error(f"Chain invocation failed: {str(e)}")
        raise HTTPException(status_code=500, detail="Internal processing error")
    latency = (time.time() - start_time) * 1000
    # Enforce grounding requirement
    if request.require_grounding and result["violations"]:
        return QueryResponse(
            question=request.question,
            response="I cannot provide a verified answer to this question based on my knowledge base.",
            ontology_grounded=False,
            violations=result["violations"],
            latency_ms=latency,
        )
    return QueryResponse(
        question=result["question"],
        response=result["response"],
        ontology_grounded=result["ontology_grounded"],
        violations=result["violations"],
        latency_ms=latency,
    )


@app.get("/health")
async def health():
    """Health check endpoint."""
    return {"status": "healthy", "chain_initialized": chain is not None}


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
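Rate limiting can be placed in front of the /query handler. As a minimal sketch, assuming a single process and an in-memory store, a sliding-window limiter might look like the following; `SlidingWindowLimiter` is a hypothetical helper written for this example, not a FastAPI feature.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window_s` seconds per client key."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        # `now` can be injected for testing; otherwise use a monotonic clock.
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2, window_s=1.0)
print([limiter.allow("client-a", now=t) for t in (0.0, 0.1, 0.2, 1.05)])
```

In the handler, a rejected call would typically be answered with HTTP 429, keyed for example by client address or API key. Because the state lives in process memory, the limit applies per worker; a shared store is needed once several instances run behind a load balancer.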
Pitfalls & Production Tips
1. Ontology Coverage vs. LLM Flexibility
The biggest mistake I've seen teams make is building an ontology that's too sparse or too rigid. If your ontology only covers 20% of the domain, the LLM will constantly hit "I don't know" responses, which users hate. If it's too rigid, the LLM can't handle novel but valid queries.
Fix: Implement a confidence threshold. If the ontology covers less than 60% of the entities in a query, fall back to standard RAG with a warning header in the response. Track this metric in production.
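The coverage threshold can be sketched as a small routing function. This is an illustration under assumptions: entity mentions are taken to be extracted upstream (for example by NER), and `choose_route` and the route labels are hypothetical names.

```python
from typing import Iterable, Set, Tuple


def choose_route(
    query_entities: Iterable[str],
    ontology_entities: Set[str],
    threshold: float = 0.6,
) -> Tuple[str, float]:
    """Return (route, coverage), where coverage is the fraction of
    mentioned entities that the ontology knows about."""
    mentioned = {e.lower() for e in query_entities}
    if not mentioned:
        # Nothing to ground against: treat as uncovered.
        return "rag_fallback", 0.0
    known = {e.lower() for e in ontology_entities}
    coverage = len(mentioned & known) / len(mentioned)
    route = "ontology" if coverage >= threshold else "rag_fallback"
    return route, coverage


print(choose_route(["Aspirin", "Ibuprofen"], {"Aspirin"}))
print(choose_route(["Aspirin"], {"Aspirin"}))
```

Returning the coverage value alongside the route makes it easy to log the metric per request and to attach the warning header when the fallback path is taken.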
2. The Reasoning Performance Trap
OWL 2 reasoning with HermiT is computationally expensive. On a standard 8-core machine, classifying 10,000 entities takes about 45 seconds. You cannot run the reasoner on every request.
Fix: Run the reasoner once during ontology loading and cache the classified graph. Re-run it only when the ontology changes. In our implementation, sync_reasoner_hermit runs once in the load() method.
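One way to make "only when the ontology changes" checkable is to key the cached result on a hash of the ontology file. The sketch below assumes the expensive step is wrapped in a loader callable; `ClassifiedOntologyCache` and `file_digest` are hypothetical names, and the loader in the usage lines is a cheap stand-in for a function that loads the file and runs the reasoner.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used as a change detector."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class ClassifiedOntologyCache(Generic[T]):
    """Re-run an expensive loader only when the ontology file changes."""

    def __init__(self, loader: Callable[[Path], T]) -> None:
        self._loader = loader
        self._entries: Dict[Path, Tuple[str, T]] = {}

    def get(self, path: Path) -> T:
        digest = file_digest(path)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == digest:
            return cached[1]  # File unchanged: reuse the classified result.
        value = self._loader(path)  # File new or changed: reload and reason.
        self._entries[path] = (digest, value)
        return value


# Usage with a cheap stand-in loader (a real one would run the reasoner).
demo = Path("demo_ontology.txt")
demo.write_text("v1")
cache = ClassifiedOntologyCache(lambda p: p.read_text().upper())
print(cache.get(demo), cache.get(demo))
demo.unlink()
```

Hashing the file is cheap relative to classification, so the check can run on every reload request without reintroducing the cost it is meant to avoid.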
3. Entity Name Normalization
Users will type "aspirin" or "ibuprofen" when your ontology has "Aspirin" and "Ibuprofen." Case-insensitive matching handles those, but it fails for "Tylenol" vs "Acetaminophen."
Fix: Maintain a synonym mapping in the ontology itself using owl:sameAs or a separate synonyms file. The retriever should expand queries using this mapping before constraint generation.
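Query expansion with a synonym mapping can be sketched as follows. The table contents and the names `SYNONYMS` and `expand_query` are hypothetical; in practice the mapping would be loaded from the ontology or a synonyms file as described above.

```python
import re
from typing import Dict

# Hypothetical synonym table: lowercase surface form -> canonical ontology name.
SYNONYMS: Dict[str, str] = {
    "tylenol": "Acetaminophen",
    "paracetamol": "Acetaminophen",
    "asa": "Aspirin",
}


def expand_query(query: str, synonyms: Dict[str, str]) -> str:
    """Replace known synonyms with canonical names, matching whole words only."""
    if not synonyms:
        return query
    # Longest forms first so multi-word synonyms win over their prefixes.
    alternatives = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(s) for s in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: synonyms[m.group(0).lower()], query)


print(expand_query("Is Tylenol safe with ASA?", SYNONYMS))
```

Whole-word matching matters here: without the word boundaries, a short synonym such as "asa" would also rewrite unrelated words that happen to contain it.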
4. The Overrefusal Problem
A paper published on June 23, 2026, titled "LLMs Prompted for Legal Context Object More: Overrefusal from Small On-Premises LLMs in Criminal Legal Context" by Kucherenko et al., documents that small on-premises LLMs tend to overrefuse when given strict constraints. This is exactly what we're doing. The paper found that models with fewer than 7B parameters refused 40% more valid queries when given ontology constraints.
Fix: Use models with at least 7B parameters for ontology-grounded systems. The gpt-4o-mini we used is fine, but if you're running on-premises, use Llama 3.1 8B or larger. Also implement a "confidence override" that allows the LLM to express uncertainty without full refusal.
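One possible reading of the "confidence override" is to answer the parts of a question the ontology covers and name the parts it does not, instead of refusing outright. The sketch below illustrates that reading only; `answer_with_gaps` is a hypothetical helper, and the property names in the usage line are examples.

```python
from typing import Dict, List


def answer_with_gaps(entity: str, known: Dict[str, List[str]], requested: List[str]) -> str:
    """State what the ontology knows and name what it does not,
    instead of refusing the whole question."""
    parts = []
    missing = []
    for prop in requested:
        values = known.get(prop)
        if values:
            parts.append(f"{prop}: {', '.join(values)}")
        else:
            missing.append(prop)
    if not parts:
        # Nothing requested is covered: this is the only full refusal.
        return f"I don't have information about {entity} for: {', '.join(missing)}."
    answer = f"{entity} - " + "; ".join(parts) + "."
    if missing:
        answer += f" Not in the knowledge base: {', '.join(missing)}."
    return answer


print(answer_with_gaps(
    "Aspirin",
    {"side_effects": ["StomachBleeding"]},
    ["side_effects", "interactions"],
))
```

Tracking how often the full-refusal branch fires gives a direct measure of overrefusal that can be compared across models.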
5. Tokenization Costs for Non-English Queries
A paper published on June 23, 2026, titled "The African Language Tax: Quantifying the Cost, Latency, and Context Penalty of Tokenizing African Languages in Frontier LLMs" by Olaoye Anthony Somide, demonstrates that tokenization inefficiency for non-English languages increases costs by 2-5x. If your ontology is in English but your users query in other languages, the constraint prompts will be less effective because the LLM has fewer tokens for the actual response.
Fix: Store ontology labels in multiple languages using rdfs:label annotations. The retriever should select the label matching the query's detected language.
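Label selection by language can be sketched with a nested dictionary standing in for language-tagged rdfs:label annotations. The data and the names `LABELS` and `pick_label` are hypothetical, and language detection is assumed to happen upstream.

```python
from typing import Dict

# Hypothetical per-entity labels, as they might be read from rdfs:label
# annotations carrying language tags.
LABELS: Dict[str, Dict[str, str]] = {
    "StomachBleeding": {"en": "stomach bleeding", "fr": "saignement gastrique"},
    "Headache": {"en": "headache"},
}


def pick_label(
    entity: str,
    lang: str,
    labels: Dict[str, Dict[str, str]],
    default_lang: str = "en",
) -> str:
    """Prefer the label in the query language, then the default
    language, then the raw entity name."""
    by_lang = labels.get(entity, {})
    return by_lang.get(lang) or by_lang.get(default_lang) or entity


print(pick_label("StomachBleeding", "fr", LABELS))
print(pick_label("Headache", "fr", LABELS))
```

The fallback chain keeps the constraint prompt complete even when translations are missing, at the cost of mixing languages for partially labeled entities.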
What's Next
This approach works well for domains with stable, well-defined entities and relationships—pharmaceuticals, legal codes, financial instruments, and engineering specifications. It struggles with domains where relationships are probabilistic or context-dependent, like "customer preferences" or "market trends."
The next evolution is combining ontology grounding with knowledge distillation. The repository "Awesome-Knowledge-Distillation-of-LLMs" (1,264 stars as of this writing) breaks down knowledge distillation into knowledge elicitation and distillation algorithms. You could distill the ontology-grounded behavior into a smaller, faster model that doesn't need to query the ontology on every request—it internalizes the constraints during training.
For teams already using RAG, the migration path is straightforward: replace your document retriever with the ontology retriever as a pre-processing step, keeping your existing vector store as a fallback. The ontology constraints become a guardrail that catches hallucinations before they reach users.
The code from this tutorial is production-ready for moderate traffic (up to 100 QPS on a single instance). For higher throughput, add a Redis cache for constraint prompts and use async ontology loading. The ontology itself should be version-controlled and deployed through a CI/CD pipeline with automated consistency checks.
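The constraint-prompt cache can be prototyped in process before introducing Redis. The sketch below uses the standard library's lru_cache as a stand-in; `make_cached_constraints` is a hypothetical wrapper, and the generator passed to it is assumed to have the same signature as generate_constraint_prompt.

```python
from functools import lru_cache
from typing import Callable, List, Tuple


def make_cached_constraints(
    generate: Callable[[str], Tuple[str, List[str]]],
    maxsize: int = 1024,
) -> Callable[[str], Tuple[str, Tuple[str, ...]]]:
    """Wrap a constraint generator with an in-process LRU cache."""

    @lru_cache(maxsize=maxsize)
    def _cached(normalized_query: str) -> Tuple[str, Tuple[str, ...]]:
        text, entities = generate(normalized_query)
        # Return a tuple so cached values cannot be mutated by callers.
        return text, tuple(entities)

    def cached(query: str) -> Tuple[str, Tuple[str, ...]]:
        # Normalize case and whitespace so trivially different
        # phrasings share one cache entry.
        return _cached(" ".join(query.lower().split()))

    return cached


calls = []


def fake_generate(query: str) -> Tuple[str, List[str]]:
    calls.append(query)
    return "- Aspirin is a Drug.", ["Aspirin"]


cached_generate = make_cached_constraints(fake_generate)
cached_generate("Is  Aspirin safe?")
cached_generate("is aspirin SAFE?")
print(len(calls))
```

Lowercasing the key is safe here only because the retriever matches entity names case-insensitively. The cache must also be cleared whenever the ontology is redeployed, otherwise stale constraints outlive the facts they were built from.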
The bottom line: ontology grounding doesn't eliminate hallucinations, but it changes their character from "confidently wrong" to "explicitly uncertain." In regulated industries, that's the difference between a deployable system and a liability.
References