llamaIndex - 💡(How to fix) Fix [Question]: Measuring hallucination rates in production systems [4 comments, 4 participants]

GitHub stats
run-llama/llama_index#20920 · Fetched 2026-04-08 00:30:11
Comments: 4 · Participants: 4 · Timeline: 5 · Reactions: 0
Timeline (top): commented ×4, labeled ×1

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

We've been experimenting with stress testing LLM systems for hallucinations and prompt injection.

Curious how people here measure hallucination rates in production systems?

Thanks! Terry

extent analysis

Fix Plan

Measuring Hallucination Rates in Production Systems

To measure hallucination rates in production systems, you can implement a simple logging mechanism to track and analyze model outputs.

Step 1: Log Model Outputs

Modify your LLM system to log the input prompts and corresponding model outputs. You can use a logging framework like Log4j or Python's built-in logging module.
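As a rough illustration of Step 1, the sketch below writes each prompt/output pair as one JSON line using Python's built-in logging module. The logger name `llm_audit`, the record fields, and the optional `context` argument are assumptions made for this example, not part of LlamaIndex.

```python
import json
import logging
import time
import uuid

# Hypothetical logger name; any name works.
logger = logging.getLogger("llm_audit")


def build_record(prompt, output, context=None):
    """Build one JSON-serializable record for a single LLM call."""
    return {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "prompt": prompt,
        "output": output,
        "context": context,  # e.g. retrieved passages in a RAG pipeline
    }


def log_interaction(prompt, output, context=None):
    """Emit the record as one JSON line and return it."""
    record = build_record(prompt, output, context)
    logger.info(json.dumps(record))
    return record
```

Calling `logging.basicConfig(level=logging.INFO)` once at startup is enough to see the records on stderr; in production you would more likely attach a file or network handler.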

Step 2: Implement Hallucination Detection

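Create a function that flags likely hallucinations from the input prompts and model outputs. For example, a simple threshold-based approach flags an output when its edit distance from the prompt exceeds a fraction of the prompt length. This is only a crude lexical proxy: most answers legitimately differ from their prompts, so expect to tune the threshold or to compare against a reference text, such as the retrieved context, instead: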

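def levenshtein_distance(a, b):
    # Edit distance via dynamic programming, keeping one row at a time
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            # Cheapest of deletion, insertion, substitution
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def detect_hallucination(prompt, output, threshold=0.1):
    # threshold: maximum tolerated edit distance as a fraction of the
    # prompt length (0.1 = 10%)
    if not prompt:
        return False  # Nothing to compare against
    # Calculate the difference between the input prompt and model output
    diff = levenshtein_distance(prompt, output)
    # Hallucination detected if the normalized difference exceeds the threshold
    return diff / len(prompt) > threshold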
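In a retrieval-augmented setup, a somewhat more meaningful baseline is to compare the output against the retrieved context rather than the prompt. The sketch below is a different heuristic from the edit-distance check above: it measures the share of output tokens that never occur in the context. The function names and the 0.5 threshold are illustrative assumptions, and token overlap cannot tell a correct paraphrase from a fabrication, so treat it as a cheap first filter rather than a verdict.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unsupported_fraction(output, context):
    """Return the share of output tokens that never appear in the context."""
    output_tokens = tokenize(output)
    if not output_tokens:
        return 0.0
    context_tokens = set(tokenize(context))
    missing = [t for t in output_tokens if t not in context_tokens]
    return len(missing) / len(output_tokens)


def is_possibly_ungrounded(output, context, threshold=0.5):
    """Flag outputs whose unsupported token share exceeds the threshold."""
    return unsupported_fraction(output, context) > threshold
```

An output that only reuses words from the context scores 0.0, and one that shares no words with it scores 1.0.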

Step 3: Log Hallucination Events

Modify your logging mechanism to log hallucination events, including the input prompts, model outputs, and detection results.
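One possible shape for Step 3 is a small wrapper that runs whatever detector you have chosen and logs the verdict next to the prompt and output. The logger name, the event fields, and the `detector` parameter (any callable taking a prompt and an output and returning a boolean) are assumptions for this sketch.

```python
import json
import logging

logger = logging.getLogger("hallucination_events")  # hypothetical logger name


def log_detection(prompt, output, detector):
    """Run `detector` on one prompt/output pair and log the result.

    `detector` is any callable (prompt, output) -> bool.
    """
    flagged = bool(detector(prompt, output))
    event = {"prompt": prompt, "output": output, "hallucination": flagged}
    # Flagged events are logged at WARNING so they are easy to filter.
    level = logging.WARNING if flagged else logging.INFO
    logger.log(level, json.dumps(event))
    return event
```

Passing the detector in as an argument keeps the logging code unchanged if you later swap the heuristic for a stronger check.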

Step 4: Analyze Hallucination Rates

Use the logged data to calculate hallucination rates over time. The rate is the number of flagged responses divided by the total number of logged requests:

def calculate_hallucination_rate(log_data):
    hallucinations = 0
    total_requests = 0

    for entry in log_data:
        if detect_hallucination(entry['prompt'], entry['output']):
            hallucinations += 1
        total_requests += 1

    # Avoid dividing by zero when nothing has been logged yet
    if total_requests == 0:
        return 0.0
    return hallucinations / total_requests
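To see how the rate moves over time rather than as a single number, the logged events can be grouped by day. This is a minimal sketch that assumes each event is a dict with a Unix `timestamp` in seconds and a boolean `hallucination` field; both field names are illustrative.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_hallucination_rates(events):
    """Group events by UTC day and return {"YYYY-MM-DD": rate}."""
    counts = defaultdict(lambda: [0, 0])  # day -> [flagged, total]
    for event in events:
        moment = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
        day = moment.date().isoformat()
        counts[day][0] += int(event["hallucination"])
        counts[day][1] += 1
    # Days appear in chronological order because ISO dates sort lexically.
    return {day: flagged / total for day, (flagged, total) in sorted(counts.items())}
```

Days with very few requests produce noisy rates, so it can help to report the request count alongside each rate.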

Step 5: Visualize Hallucination Rates

Use a visualization tool like Grafana or Matplotlib to display hallucination rates over time.
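A minimal Matplotlib sketch for Step 5 follows. It assumes you already have a mapping from ISO date strings to rates between 0 and 1; the function names and the default output path are hypothetical. Matplotlib is imported inside the plotting function so the data helper still works where the library is not installed.

```python
def to_series(daily_rates):
    """Turn {day: rate} into parallel, date-sorted lists for plotting."""
    days = sorted(daily_rates)
    return days, [daily_rates[day] for day in days]


def plot_rates(daily_rates, path="hallucination_rate.png"):
    """Save a line chart of the daily rate to `path` and return the path."""
    # Imported lazily so to_series() works without Matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    days, rates = to_series(daily_rates)
    fig, ax = plt.subplots()
    ax.plot(days, rates, marker="o")
    ax.set_xlabel("Day")
    ax.set_ylabel("Hallucination rate")
    ax.set_ylim(0, 1)
    fig.autofmt_xdate()  # rotate the date labels so they do not overlap
    fig.savefig(path)
    plt.close(fig)
    return path
```

For a live dashboard, the same per-day numbers could instead be exported to a time-series store that Grafana reads.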

Example
