LlamaIndex - ✅ (Solved) Fix Header-Aware Deterministic Chunking & Post-RAG Verification Pipeline [2 pull requests, 3 comments, 2 participants]

GitHub stats
run-llama/llama_index#21213 (fetched 2026-04-08 01:48:47)
Comments: 3
Participants: 2
Timeline: 13
Reactions: 0
Timeline (top): commented ×3, cross-referenced ×2, labeled ×2, mentioned ×2

PR fix notes

PR #21281: feat: add HeaderAwareMarkdownSplitter node parser

Description (problem / solution / changelog)

Summary

Partial fix for #21213 — implements the HeaderAwareMarkdownSplitter component. The VerificationQueryEngine component is being handled separately by @DYNOSuprovo.

Problem

LlamaIndex has two markdown-related parsers with complementary gaps:

  • MarkdownNodeParser — respects headers but has no token size limit (a 50k-token section becomes one node)
  • SentenceSplitter — enforces token limits but has no markdown awareness (severs headers from their content)

Solution

HeaderAwareMarkdownSplitter fills the gap: it keeps each header grouped with its body text when the section fits within chunk_size, and splits oversized sections at paragraph/sentence/word boundaries with the header context prepended to every sub-chunk.

Usage

from llama_index.core.node_parser import HeaderAwareMarkdownSplitter

splitter = HeaderAwareMarkdownSplitter(chunk_size=512)
nodes = splitter.get_nodes_from_documents(documents)

# Each node has header_path metadata: "/Introduction/Setup/"
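The header_path value in the comment above can be derived with a small stack of open headers. The following is a minimal sketch of that idea, not the code from the PR: the function name header_paths is hypothetical, and including the section's own title in the path is an assumption made to match the "/Introduction/Setup/" example.

```python
import re
from typing import List, Tuple

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*$")
FENCES = ("`" * 3, "~" * 3)  # backtick and tilde code fences


def header_paths(markdown_text: str) -> List[str]:
    """Return one header path per ATX header, in document order."""
    stack: List[Tuple[int, str]] = []  # (level, title) of the open headers
    paths: List[str] = []
    in_fence = False
    for line in markdown_text.splitlines():
        # Toggle on fence lines so '#' comments in code are not read as headers
        if line.lstrip().startswith(FENCES):
            in_fence = not in_fence
            continue
        match = None if in_fence else HEADER_RE.match(line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2)
        # A new header closes every open header at the same or a deeper level
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        paths.append("/" + "/".join(t for _, t in stack) + "/")
    return paths
```

Because the path depends only on the header lines seen so far, the result is the same on every run, which is what makes this kind of metadata deterministic.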

Key features

  • Header hierarchy tracked in header_path metadata (same convention as MarkdownNodeParser)
  • Deterministic — same input always produces same splits (no embedding model needed)
  • Code fence handling — both backtick (```) and tilde (~~~) fences
  • Multi-level fallback — paragraph → sentence → word boundaries for oversized content
  • Pluggable sub_splitter parameter for custom split strategies (e.g. semantic splitting)
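The multi-level fallback listed above can be illustrated without LlamaIndex. This is a rough sketch under stated assumptions, not the PR's implementation: count_tokens approximates tokens with whitespace-separated words, and the names split_oversized and chunk_section are hypothetical.

```python
import re
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for tokenizer tokens
    return len(text.split())


def split_oversized(body: str, budget: int) -> List[str]:
    """Break body into pieces of at most `budget` tokens."""
    pieces: List[str] = []
    for paragraph in re.split(r"\n\s*\n", body.strip()):
        if not paragraph.strip():
            continue
        if count_tokens(paragraph) <= budget:
            pieces.append(paragraph)  # paragraph boundary is enough
            continue
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            words = sentence.split()
            # Word-level fallback: slice sentences that are still too long
            for start in range(0, len(words), budget):
                pieces.append(" ".join(words[start:start + budget]))
    return pieces


def chunk_section(header: str, body: str, chunk_size: int) -> List[str]:
    """Keep a section whole if it fits, else split it and repeat the header."""
    section = f"{header}\n\n{body.strip()}"
    if count_tokens(section) <= chunk_size:
        return [section]
    budget = chunk_size - count_tokens(header)
    if budget <= 0:
        raise ValueError("chunk_size must be larger than the header itself")
    chunks: List[str] = []
    current: List[str] = []
    for piece in split_oversized(body, budget):
        # Flush the current chunk when adding this piece would exceed the budget
        if current and count_tokens(" ".join(current + [piece])) > budget:
            chunks.append(f"{header}\n\n" + "\n\n".join(current))
            current = []
        current.append(piece)
    if current:
        chunks.append(f"{header}\n\n" + "\n\n".join(current))
    return chunks
```

The header's token count is subtracted from the budget before splitting, so every sub-chunk stays within chunk_size even after the header is prepended.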

Changes

File | Change
node_parser/file/header_aware_markdown.py | New HeaderAwareMarkdownSplitter class (~250 lines)
node_parser/file/__init__.py | Export registration
node_parser/__init__.py | Top-level export
tests/node_parser/test_header_aware_markdown.py | 15 tests

Test plan

  • 15 new tests pass covering:
    • Basic single/multiple section splits
    • Nested header path metadata
    • Oversized section splitting (paragraph, sentence, word boundaries)
    • Code blocks (backtick and tilde fences)
    • Edge cases (empty doc, no headers, header with no body)
    • Determinism
    • Custom separator and custom sub_splitter
  • All 7 existing MarkdownNodeParser tests still pass

Changed files

  • llama-index-core/llama_index/core/node_parser/__init__.py (modified, +4/-0)
  • llama-index-core/llama_index/core/node_parser/file/__init__.py (modified, +4/-0)
  • llama-index-core/llama_index/core/node_parser/file/header_aware_markdown.py (added, +312/-0)
  • llama-index-core/tests/node_parser/test_header_aware_markdown.py (added, +189/-0)

PR #21302: feat: add VerificationQueryEngine component (#21213)

Description (problem / solution / changelog)

Description

Working in parallel with @shivam2407 on Issue #21213.

This PR introduces the VerificationQueryEngine class. It acts as a native Post-RAG guardrail that wraps any existing BaseQueryEngine. After the underlying engine generates a draft response, this component intercepts it and forces a secondary LLM to act as an adversarial judge—verifying the generated claims strictly against the retrieved source_nodes. Output metadata is augmented with verification results, and strict mode can actively block hallucinated responses.

Fixes #21213
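The wrap, judge, and block flow described above can be sketched with plain Python stand-ins. Everything named here (DraftResponse, VerifyingEngine, the judge callable, the strict flag, and the metadata keys) is hypothetical and only illustrates the control flow; it is not the class added by this PR.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DraftResponse:
    # Stand-in for a query engine response, not the LlamaIndex Response class
    text: str
    source_texts: List[str]
    metadata: Dict[str, object] = field(default_factory=dict)


class VerifyingEngine:
    """Wrap an engine and run a judge over each draft answer."""

    def __init__(
        self,
        engine: Callable[[str], DraftResponse],
        judge: Callable[[str, List[str]], bool],
        strict: bool = False,
    ) -> None:
        self.engine = engine
        self.judge = judge  # in practice a second LLM call; here any callable
        self.strict = strict

    def query(self, question: str) -> DraftResponse:
        draft = self.engine(question)
        verified = self.judge(draft.text, draft.source_texts)
        draft.metadata["verified"] = verified
        if self.strict and not verified:
            # Strict mode: keep the draft for auditing but do not return it
            draft.metadata["blocked_draft"] = draft.text
            draft.text = "Response withheld: not supported by the sources."
        return draft
```

In non-strict mode the draft passes through with the verdict attached, so callers can decide how to treat unverified answers.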

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests (Relies entirely on existing core abstractions / BaseQueryEngine).

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods

Changed files

  • llama-index-core/llama_index/core/query_engine/__init__.py (modified, +4/-0)
  • llama-index-core/llama_index/core/query_engine/verification_query_engine.py (added, +75/-0)

Feature Description

The Problem: Standard RAG pipelines often suffer from two major issues:

  1. Context Fragmentation: Standard character/token-based splitters often cut paragraphs or sections mid-thought, separating a heading from its subsequent content.
  2. Hallucination Risk: Standard query engines generate answers probabilistically based on retrieved context, without a deterministic step to ensure the final output is exclusively backed by cited quotes.

The Proposal: I have been building a custom extraction pipeline ("Pluto") and would love to contribute two concepts/modules upstream to LlamaIndex if the community finds them valuable:

  1. HeaderAwareMarkdownSplitter (or NodeParser enhancement): A node parser that refuses to split text strictly by token limits if it means severing a Markdown Header (#, ##) from its immediate paragraph. It ensures that chunks remain logically grouped by author intent rather than arbitrary token counts.

  2. VerificationQueryEngine (or Post-Process Node): A wrapper around existing query engines that adds a discrete, final "Audit" LLM call. After the initial response synthesizer generates a draft answer, this post-processor forces an LLM (ideally a larger model acting as a judge) to cross-reference every sentence of the draft against the raw source_nodes. It strips out or flags any sentence that lacks direct quote evidence, returning a "Confidence Score" alongside the final response.
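The audit step in item 2 can be approximated without a judge model. This is a purely lexical stand-in and an assumption of this sketch, since the proposal itself calls for an LLM: a sentence counts as supported only if it appears verbatim (ignoring case and whitespace) in a source. The name audit_draft is hypothetical.

```python
import re
from typing import List, Tuple


def _normalize(text: str) -> str:
    # Collapse whitespace and lowercase so matching ignores layout and case
    return re.sub(r"\s+", " ", text).strip().lower()


def audit_draft(draft: str, sources: List[str]) -> Tuple[str, float]:
    """Strip unsupported sentences and return (audited_text, confidence)."""
    haystacks = [_normalize(s) for s in sources]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", draft.strip()) if s]
    kept = [
        s for s in sentences
        if any(_normalize(s).rstrip(".!?") in h for h in haystacks)
    ]
    # Confidence is the fraction of draft sentences with direct quote evidence
    score = len(kept) / len(sentences) if sentences else 0.0
    return " ".join(kept), score
```

Exact matching is far stricter than an LLM judge and will reject faithful paraphrases, so it is best read as a lower bound on support rather than a replacement for the proposed audit call.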

Describe the proposed solution

  • For the Splitter: I would implement a custom NodeParser that uses Regex/AST parsing to identify Markdown structure, ensuring headers and their immediate children stay within the same TextNode whenever possible.
  • For the Verifier: A new component in the query_engine workflow that takes the Response object, runs a strict Boolean/Scoring prompt against it using the source_nodes as grounding truth, and appends the verification metrics to the response metadata.

Alternatives considered

Currently, users have to implement these verification loops manually in their application layer (like I did in my own project). Baking this into the framework as an optional QueryEngine or NodeParser would make building verifiable, enterprise-grade RAG pipelines much easier out-of-the-box.

Additional context

I have already implemented a working version of this logic in my own pipeline using native Python and FastAPI. I am very willing to write the code, tests, and documentation to port this into the LlamaIndex ecosystem as a PR if the maintainers think this aligns with the project's roadmap.

Let me know if you are open to a PR for either of these components!

Reason

No response

Value of Feature

No response

extent analysis

Fix Plan

To address the issues of Context Fragmentation and Hallucination Risk, we will implement two new components: HeaderAwareMarkdownSplitter and VerificationQueryEngine.

HeaderAwareMarkdownSplitter

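  1. Create a new class HeaderAwareMarkdownSplitter within the existing NodeParser hierarchy. The sketch below subclasses TextSplitter, the NodeParser subclass whose abstract method is split_text.
  2. Implement split_text with a regular expression that identifies Markdown headers, so that a header is never separated from its immediate paragraphs.
  3. Skip lines inside code fences so that a # comment in a code block is not mistaken for a header. A full Markdown parser could replace the regex if stricter parsing is needed.

Example code:

import re
from typing import List, Optional

from llama_index.core.node_parser.interface import TextSplitter

HEADER_RE = re.compile(r"^#{1,6}\s")
FENCE_RE = re.compile(r"^\s*(`{3,}|~{3,})")

class HeaderAwareMarkdownSplitter(TextSplitter):
    """Sketch: one chunk per Markdown section, header kept with its body.

    Does not enforce a chunk_size limit; oversized sections stay whole.
    """

    def split_text(self, text: str) -> List[str]:
        chunks: List[str] = []
        current: List[str] = []
        open_fence: Optional[str] = None  # fence character while inside a code block
        for line in text.splitlines():
            fence = FENCE_RE.match(line)
            if fence:
                char = fence.group(1)[0]
                if open_fence is None:
                    open_fence = char  # entering a code block
                elif open_fence == char:
                    open_fence = None  # leaving the code block
            # A header outside a code fence starts a new chunk
            elif open_fence is None and HEADER_RE.match(line):
                if current:
                    chunks.append("\n".join(current))
                current = []
            current.append(line)
        if current:
            chunks.append("\n".join(current))
        return chunks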

VerificationQueryEngine

  1. Create a new class VerificationQueryEngine that wraps an existing query engine.
  2. Add a new method verify that takes the response object and source nodes as input.
  3. Use a larger language model to cross-reference the response against the source nodes and calculate a confidence score.

Example code:

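import re

class VerificationQueryEngine:
    def __init__(self, query_engine, judge_llm):
        self.query_engine = query_engine
        self.judge_llm = judge_llm  # any LlamaIndex LLM, ideally a larger model

    def query(self, query_str):
        response = self.query_engine.query(query_str)
        return self.verify(response, response.source_nodes)

    def verify(self, response, source_nodes):
        context = "\n\n".join(node.get_content() for node in source_nodes)
        # Naive sentence split on terminal punctuation
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", str(response).strip()) if s]
        supported = 0
        for sentence in sentences:
            prompt = (
                f"Sources:\n{context}\n\n"
                f"Is the sentence '{sentence}' supported by the sources? Answer yes or no."
            )
            answer = self.judge_llm.complete(prompt).text.strip().lower()
            if answer.startswith("yes"):
                supported += 1
        # Guard against division by zero on an empty response
        score = supported / len(sentences) if sentences else 0.0
        response.metadata = {**(response.metadata or {}), "confidence_score": score}
        return response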

Verification

To verify the fix, test the HeaderAwareMarkdownSplitter and VerificationQueryEngine components separately and together.

  1. Test the HeaderAwareMarkdownSplitter with sample Markdown text to ensure it splits the text correctly.
  2. Test the VerificationQueryEngine with sample responses and source nodes to ensure it calculates the confidence score correctly.
  3. Integrate the two components into the RAG pipeline and test the end-to-end workflow.
