llamaIndex - ✅(Solved) Fix [Feat]: Add a RAG failure mode checklist doc (symptoms to minimal fixes) [2 pull requests, 4 comments, 3 participants]

GitHub stats
run-llama/llama_index#20702 (fetched 2026-04-08 00:31:24)
Comments: 4
Participants: 3
Timeline events: 15
Reactions: 0
Timeline (top): commented ×4, referenced ×4, cross-referenced ×2, labeled ×2

Fix Action

Fix / Workaround

Problem Description: In production RAG apps, most failures are not a case of “the model is dumb”; they come from a small set of repeatable failure modes: retrieval hallucination, wrong chunk selection, index fragmentation, bootstrap ordering races, config drift, and so on. Users often cannot name the failure mode, so they patch at random and lose time.

PR fix notes

PR #20721: docs: add RAG Failure Mode Checklist

Description (problem / solution / changelog)

Summary

Adds a comprehensive RAG Failure Mode Checklist documentation page to help users diagnose and fix common RAG pipeline issues.

Failure Modes Covered

  1. Retrieval Hallucination — retriever returns superficially relevant but wrong chunks
  2. Wrong Chunk Selection (Poor Chunking) — critical context split across chunks
  3. Index Fragmentation — duplicate/outdated/conflicting documents in index
  4. Config Drift — embedding model mismatch between index and query time
  5. Embedding Model Mismatch — wrong model for the domain
  6. Context Window Overflow — too many chunks stuffed into LLM prompt
  7. Missing Metadata Filtering — retrieval not scoped to relevant subset
  8. Poor Query Understanding — ambiguous or short queries
  9. LLM Synthesis Failures — right chunks retrieved but bad answer generated

Each section includes symptoms and minimal fixes referencing LlamaIndex components. Also includes a quick diagnostic flowchart.
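
As an illustration of the "Config Drift" item above, a pipeline can store a fingerprint of its embedding settings next to the index and compare it at query time. The following is a minimal, library-agnostic sketch: `EmbeddingConfig`, `fingerprint`, and `check_drift` are hypothetical names rather than LlamaIndex APIs, and the model names are placeholders.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EmbeddingConfig:
    """Hypothetical record of settings that must match at index and query time."""
    model_name: str
    dimensions: int
    normalize: bool


def fingerprint(config: EmbeddingConfig) -> str:
    """Return a stable SHA-256 hash of the config (keys sorted for determinism)."""
    payload = json.dumps(asdict(config), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def check_drift(stored_fingerprint: str, query_config: EmbeddingConfig) -> bool:
    """Return True when the query-time config differs from the index-time config."""
    return fingerprint(query_config) != stored_fingerprint


# Index time: compute the fingerprint and persist it alongside the index.
index_config = EmbeddingConfig("embed-model-a", 768, True)  # placeholder values
stored = fingerprint(index_config)

# Query time: rebuild the config from the running environment and compare.
same = EmbeddingConfig("embed-model-a", 768, True)
changed = EmbeddingConfig("embed-model-b", 768, True)

print(check_drift(stored, same))     # no drift expected
print(check_drift(stored, changed))  # drift expected
```

Failing fast on a mismatch is usually cheaper than debugging silently degraded retrieval, though whether to block or only warn depends on the deployment.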

Closes #20702

Changed files

  • docs/src/content/docs/framework/optimizing/rag_failure_mode_checklist.md (added, +194/-0)

PR #20760: docs: extend RAG Failure Mode Checklist with advanced failures

Description (problem / solution / changelog)

Follow-up to #20702 and #20721.

This PR keeps the existing RAG Failure Mode Checklist and extends it with a small set of system-level failure families that often show up in production, without changing any of the current recommendations.

Summary of changes

  • Keep sections 1–9 as-is (single-query failures: retrieval, chunking, embeddings, query understanding, synthesis).
  • Add section 10 “Embedding Metric Mismatch (Cosine Score ≠ True Meaning)” to cover cases where the distance metric or normalization does not match how meaning is distributed in the data.
  • Add section 11 “Session and Cache Memory Breaks” for cross-session instability caused by stateless indices, cache keys, or environment changes.
  • Add section 12 “Observability Gaps ("Black-Box Debugging")” to highlight that many issues cannot be fixed before basic traces and logs are in place.
  • Add section 13 “Index Lifecycle and Deployment Ordering” to capture failures caused by empty or half-built indices, wrong snapshot routing, or deployment ordering bugs.
  • Slightly update the introduction and the Quick Diagnostic Flowchart so they point to the new sections when issues appear only in production or after deploys.
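
The metric mismatch described for section 10 can be reproduced with two toy vectors. This is a small sketch with made-up vectors and document names, not taken from the PR. It shows that dot product and cosine similarity can rank the same documents differently when embeddings are not normalized.

```python
import math


def dot(a, b):
    """Plain dot product; sensitive to vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; depends only on direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def rank(query, docs, score):
    """Return document names ordered from best to worst under the given score."""
    return sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)


query = [1.0, 0.0]
docs = {
    "aligned_short": [1.0, 0.0],  # same direction as the query, small norm
    "off_axis_long": [3.0, 4.0],  # different direction, large norm
}

by_dot = rank(query, docs, dot)
by_cosine = rank(query, docs, cosine)

print(by_dot)     # the long vector wins under dot product
print(by_cosine)  # the aligned vector wins under cosine
```

If the vector store scores with one metric while the embedding model was trained or evaluated with another, the top-k results can shift in exactly this way.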

All new content is written in a project-native way (no external dependencies or naming schemes) and is based on recurring failure patterns seen in real-world RAG deployments.

Happy to adjust wording, scope, or numbering if you would prefer a slimmer version or a separate “advanced” doc instead of extending this page.

Description

This is a documentation-only change that expands the existing RAG Failure Mode Checklist with several additional failure families that commonly appear in production systems (embedding metric issues, cross-session instability, observability gaps, and index lifecycle / deployment ordering problems).
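
For the index lifecycle and deployment ordering family mentioned above, one possible guard is a readiness check that runs before a service starts answering queries. The sketch below assumes the ingestion job records how many documents it expected and how many it indexed. `IndexSnapshot`, `IndexNotReadyError`, `assert_ready`, and the 0.99 threshold are all hypothetical and not part of LlamaIndex.

```python
from dataclasses import dataclass


@dataclass
class IndexSnapshot:
    """Hypothetical summary of an index build, written by the ingestion job."""
    snapshot_id: str
    expected_docs: int
    indexed_docs: int


class IndexNotReadyError(RuntimeError):
    """Raised when a snapshot looks empty or only partially built."""


def assert_ready(snapshot: IndexSnapshot, min_ratio: float = 0.99) -> None:
    """Raise if the snapshot is empty or below the assumed completeness ratio."""
    if snapshot.expected_docs <= 0 or snapshot.indexed_docs == 0:
        raise IndexNotReadyError(f"{snapshot.snapshot_id}: index is empty")
    ratio = snapshot.indexed_docs / snapshot.expected_docs
    if ratio < min_ratio:
        raise IndexNotReadyError(
            f"{snapshot.snapshot_id}: only {ratio:.0%} of documents indexed"
        )


# A complete snapshot passes silently; a half-built one is rejected.
assert_ready(IndexSnapshot("full", expected_docs=100, indexed_docs=100))
try:
    assert_ready(IndexSnapshot("half", expected_docs=100, indexed_docs=50))
except IndexNotReadyError as err:
    print(f"refusing to serve: {err}")
```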

Related issues: #20702, #20721 (docs follow-up; does not close new issues).

How Has This Been Tested?

This is a documentation-only change; no code paths were modified, so no additional tests were added.


Changed files

  • docs/src/content/docs/framework/optimizing/rag_failure_mode_checklist.md (modified, +93/-17)

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

Hi maintainers, thanks for the project.

Problem Description: In production RAG apps, most failures are not a case of “the model is dumb”; they come from a small set of repeatable failure modes: retrieval hallucination, wrong chunk selection, index fragmentation, bootstrap ordering races, config drift, and so on. Users often cannot name the failure mode, so they patch at random and lose time.

Desired Solution: Add a small doc page, “RAG failure mode checklist”, with the following structure:

  1. common symptoms users can observe
  2. what to inspect first (retrieval outputs, chunking, embeddings, store health, tracing)
  3. minimal structural fixes, not generic advice
  4. a simple taxonomy mapping so users can quickly classify before changing infra
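
The four-part structure proposed above maps naturally onto a small record type. The following is a rough sketch of how one checklist entry and a keyword-based first-pass classifier might look. `FailureMode`, `CHECKLIST`, `classify`, and the example symptom phrases are illustrative assumptions, not content from the eventual doc page.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One checklist entry: symptoms, what to inspect first, and minimal fixes."""
    name: str
    symptoms: list[str]        # lowercase phrases a user might report (illustrative)
    inspect_first: list[str]
    minimal_fixes: list[str]


CHECKLIST = [
    FailureMode(
        name="Config drift",
        symptoms=["after deploy", "after upgrade"],
        inspect_first=["embedding model at index time vs query time"],
        minimal_fixes=["pin one embedding model and re-embed"],
    ),
    FailureMode(
        name="Wrong chunk selection",
        symptoms=["cut off", "partial answer"],
        inspect_first=["retrieved chunks for the failing query"],
        minimal_fixes=["adjust chunk size or overlap"],
    ),
]


def classify(observed: str, checklist: list[FailureMode]) -> list[str]:
    """Return names of failure modes whose symptom phrases appear in the report."""
    text = observed.lower()
    return [m.name for m in checklist if any(s in text for s in m.symptoms)]


print(classify("Quality dropped after deploy", CHECKLIST))
print(classify("Answer is cut off mid-table", CHECKLIST))
```

Substring matching is deliberately crude. The point is only that classification happens before any infrastructure change, as the proposal asks.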

I can draft a first version as a docs PR if the direction is accepted.

Alternatives Considered: Rely on scattered troubleshooting notes and GitHub issues. This works, but it is not systematic, and new users repeat the same mistakes.

Additional Context: I have a compact 16-mode taxonomy (No.1 to No.16) that I can adapt into a neutral checklist format with minimal references, so it stays useful even without adopting any external tooling. If you prefer, we can keep it fully project-native and use the taxonomy only as an internal naming scheme.

https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md

extent analysis

RAG Failure Mode Checklist Solution

Fix Plan

  1. Create a new doc page: Add a new Markdown file in the project's documentation directory, e.g., docs/rag_failure_modes.md.
  2. Define the taxonomy: Use the provided 16-mode taxonomy as a starting point and adapt it into a neutral checklist format. You can use the existing ProblemMap/README.md file as a reference.
  3. List common symptoms: Document common symptoms users can observe for each failure mode, e.g.:

        ### Retrieval Hallucination
        - Unusual or nonsensical output
        - Lack of relevance to input query

  4. Specify inspection steps: Outline what to inspect first for each failure mode, e.g.:

        ### Wrong Chunk Selected
        - Check chunking algorithm configuration
        - Verify chunking output matches expected input

  5. Provide minimal structural fixes: Document minimal structural fixes for each failure mode, e.g.:

        ### Index Fragmentation
        - Run index rebalancing script
        - Monitor index fragmentation metrics

  6. Add a simple taxonomy mapping: Create a simple taxonomy mapping to help users quickly classify failure modes, e.g.:

        | Failure Mode | Taxonomy Number |
        | --- | --- |
        | Retrieval Hallucination | No. 1 |
        | Wrong Chunk Selected | No. 3 |
        | Index Fragmentation | No. 5 |

  7. Review and refine: Review the checklist with the team and refine it as needed.

Verification

  • Verify that the new doc page is accessible and easily navigable.
  • Test the checklist by simulating different failure modes and verifying that users can correctly identify and address the issue.
  • Monitor user feedback and update the checklist as needed.

Extra Tips

  • Keep the checklist concise and focused on minimal structural fixes.
  • Use clear and concise language to avoid
