langchain - 💡(How to fix) Fix Post-RAG Deterministic Verification Runnable & Header-Aware Splitter [2 comments, 2 participants]

GitHub stats
langchain-ai/langchain#36354 (fetched 2026-04-08 01:48:44)
Comments: 2 · Participants: 2 · Timeline events: 13 · Reactions: 0
Timeline (top): labeled ×7, commented ×2, closed ×1, issue_type_added ×1

Root Cause

Existing evaluation tools (Ragas, LangSmith evaluators, custom parallel chains) don't work natively for runtime blocking: they are typically used for offline metrics and testing, not as active, blocking filters in a live production pipeline protecting end-users.

Code Example

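from langchain_core.runnables import RunnablePassthrough
from langchain_groq import ChatGroq  # ChatGroq ships in the langchain-groq package
from langchain_core.verification import RunnableFactChecker  # proposed module; does not exist yet

# Standard RAG components (vectorstore and prompt are assumed to be defined elsewhere)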
retriever = vectorstore.as_retriever()
draft_llm = ChatGroq(model="llama-3.1-8b-instant")
verifier_llm = ChatGroq(model="llama-3.3-70b-versatile") 

# The Proposed Component
fact_checker = RunnableFactChecker(
    llm=verifier_llm, 
    strict_mode=True # Strips unsupported claims from the final output
)

# Implementation
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | draft_llm
    | fact_checker # The new step
)

result = chain.invoke("What are the terms of the contract?")
print(result.content) 
print(result.response_metadata['confidence_score']) # Returns 0.0 to 1.0 based on evidence



Checked other resources

  • This is a feature request, not a bug report or usage question.
  • I added a clear and descriptive title that summarizes the feature request.
  • I used the GitHub search to find a similar feature request and didn't find it.
  • I checked the LangChain documentation and API reference to see if this feature already exists.
  • This is not related to the langchain-community package.

Package (Required)

  • langchain
  • langchain-openai
  • langchain-anthropic
  • langchain-classic
  • langchain-core
  • langchain-model-profiles
  • langchain-tests
  • langchain-text-splitters
  • langchain-chroma
  • langchain-deepseek
  • langchain-exa
  • langchain-fireworks
  • langchain-groq
  • langchain-huggingface
  • langchain-mistralai
  • langchain-nomic
  • langchain-ollama
  • langchain-openrouter
  • langchain-perplexity
  • langchain-qdrant
  • langchain-xai
  • Other / not sure / general

Feature Description

I would like LangChain to support a deterministic "Verification Loop" as a native Runnable component for LCEL chains, as well as a more robust HeaderAwareMarkdownSplitter.

Currently, RAG pipelines built with LangChain generally end at the generation step. The LLM produces an AIMessage based on retrieved context, but there is no built-in, native component to force a deterministic audit of that message.

This feature would allow users to append a RunnableVerifier to the end of an LCEL chain. The verifier would take the generated output and the original context documents, prompt a secondary (often larger) LLM to score every sentence as supported or unsupported by the source material, and then either append a "Confidence/Grounding Score" to the final output or strip unsupported claims entirely. Additionally, a HeaderAwareMarkdownSplitter would split documents by heading structure rather than by a fixed chunk size, so that related content is not separated before retrieval.
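To make the splitter idea concrete, here is a minimal stdlib-only sketch of header-aware splitting. It is an illustration under stated assumptions, not the proposed HeaderAwareMarkdownSplitter: the names `Section` and `split_by_headers` are hypothetical, only ATX headings (`#` through `######`) are recognized, and `#` lines inside fenced code blocks are not special-cased. Note that langchain-text-splitters already ships a `MarkdownHeaderTextSplitter`; the sketch does not use it.

```python
import re
from dataclasses import dataclass, field

# Matches ATX headings such as "# Title" or "### Clause 4.2"
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Section:
    headers: list[str]  # heading path, e.g. ["Contract", "Termination"]
    lines: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(self.lines).strip()


def split_by_headers(markdown: str) -> list[Section]:
    """Split on headings so each chunk keeps its full heading path."""
    sections: list[Section] = []
    path: list[tuple[int, str]] = []  # (level, title) of the enclosing headings
    current = Section(headers=[])
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            if current.text:
                sections.append(current)
            level, title = len(match.group(1)), match.group(2)
            # Drop headings at the same or a deeper level, then descend
            path = [(lvl, t) for lvl, t in path if lvl < level] + [(level, title)]
            current = Section(headers=[t for _, t in path])
        else:
            current.lines.append(line)
    if current.text:
        sections.append(current)
    return sections


doc = "# Contract\nIntro.\n## Termination\nEither party may terminate.\n## Fees\nNet 30."
for section in split_by_headers(doc):
    print(" > ".join(section.headers), "|", section.text)
```

Because each chunk carries its heading path, a retrieved "Fees" clause can still be traced back to the "Contract" document it came from, which is the context-separation problem the request describes.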

Use Case

I'm trying to build an enterprise-level, compliance-heavy application (Legal/Medical RAG) where hallucination is absolutely unacceptable.

Currently, I have to work around this by building custom, complex orchestration outside of standard LCEL. I have to parse the generated output, feed it back into an entirely new custom chain along with the original retrieved Documents, and manage the scoring logic manually.

This feature would let users enforce strict hallucination checks directly within LCEL (e.g., retriever | prompt | llm | verifier_runnable). It brings production-ready safety into the default LangChain workflow, moving RAG from probabilistic generation toward deterministic extraction.
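One detail this pipeline shape leaves open is how the verifier gets the retrieved documents: in a linear pipe each step receives only the previous step's output, so a verifier placed after the LLM would see the draft but not the context. The sketch below shows the data flow in plain Python, with hypothetical stand-in callables (`retrieve`, `draft`, `verify`) instead of real LangChain components.

```python
from typing import Callable

# Hypothetical stand-ins for the retriever, draft model, and verifier
Retriever = Callable[[str], list[str]]
Drafter = Callable[[str, list[str]], str]
Verifier = Callable[[str, list[str]], float]


def build_audited_chain(retrieve: Retriever, draft: Drafter, verify: Verifier):
    """Compose the stages so the verifier sees the same documents as the drafter."""

    def run(question: str) -> dict:
        context = retrieve(question)       # fetched once, used by both later stages
        answer = draft(question, context)
        score = verify(answer, context)    # audit against the original context
        return {"answer": answer, "context": context, "confidence_score": score}

    return run


# Toy stages so the sketch runs without a model or a vector store
chain = build_audited_chain(
    retrieve=lambda q: ["The contract term is 12 months."],
    draft=lambda q, ctx: "The contract term is 12 months.",
    verify=lambda ans, ctx: 1.0 if ans in ctx else 0.0,
)
print(chain("What are the terms of the contract?"))
```

Whatever form the native component takes, it would need an equivalent way to carry the context alongside the draft.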

Proposed Solution

I think this could be implemented by creating a verification Runnable in langchain-core (named RunnableFactChecker in the example below).

The API would compose like any other LCEL step:

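from langchain_core.runnables import RunnablePassthrough
from langchain_groq import ChatGroq  # ChatGroq ships in the langchain-groq package
from langchain_core.verification import RunnableFactChecker  # proposed module; does not exist yet

# Standard RAG components (vectorstore and prompt are assumed to be defined elsewhere)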
retriever = vectorstore.as_retriever()
draft_llm = ChatGroq(model="llama-3.1-8b-instant")
verifier_llm = ChatGroq(model="llama-3.3-70b-versatile") 

# The Proposed Component
fact_checker = RunnableFactChecker(
    llm=verifier_llm, 
    strict_mode=True # Strips unsupported claims from the final output
)

# Implementation
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | draft_llm
    | fact_checker # The new step
)

result = chain.invoke("What are the terms of the contract?")
print(result.content) 
print(result.response_metadata['confidence_score']) # Returns 0.0 to 1.0 based on evidence
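The issue does not specify how `confidence_score` or `strict_mode` would be computed. One plausible reading of the feature description is sketched below: judge each sentence as supported or not, report the supported fraction as the score, and drop rejected sentences in strict mode. The function names, the naive sentence splitter, and the substring-based `substring_judge` (a toy stand-in for the verifier LLM) are all assumptions made for illustration.

```python
import re
from typing import Callable

# Hypothetical judge: True if `sentence` is supported by `context`.
# In the proposal this role would be played by the verifier LLM.
Judge = Callable[[str, str], bool]


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def fact_check(answer: str, context: str, judge: Judge, strict_mode: bool = True) -> dict:
    sentences = split_sentences(answer)
    verdicts = [judge(sentence, context) for sentence in sentences]
    # Confidence is the fraction of sentences the judge accepted (0.0 to 1.0)
    score = sum(verdicts) / len(verdicts) if verdicts else 0.0
    if strict_mode:
        # Keep only the sentences the judge accepted
        sentences = [s for s, ok in zip(sentences, verdicts) if ok]
    return {"content": " ".join(sentences), "confidence_score": score}


def substring_judge(sentence: str, context: str) -> bool:
    # Toy stand-in for the LLM: case-insensitive exact containment
    return sentence.lower() in context.lower()


context = "The contract runs for 12 months. Either party may terminate with 30 days notice."
answer = "The contract runs for 12 months. Late fees are 5% per month."
print(fact_check(answer, context, substring_judge))
```

With the toy judge, the unsupported late-fee sentence is removed and the score is 0.5. A real judge backed by an LLM would handle paraphrase, but its verdicts would only be as repeatable as the model's outputs.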


### Alternatives Considered

I've tried using standard post-processing techniques and custom evaluation libraries (like Ragas or LangSmith evaluators). 

Alternative approaches I considered:
1. Using LangSmith evaluators (e.g., `labeled_criteria`).
2. Writing custom parallel chains to evaluate the output.

But these don't work natively for *runtime blocking* because evaluators like Ragas are typically used for offline metrics/testing, not as active, blocking filters in a live production pipeline protecting end-users.
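The difference between offline evaluation and runtime blocking can be sketched as a guard that sits in the request path and refuses to return a low-scoring answer. This is an illustrative sketch only: `blocking_guard`, `UngroundedAnswerError`, the 0.8 threshold, and `toy_score` are hypothetical, and the scorer could in principle wrap any evaluator that returns a number.

```python
from typing import Callable


class UngroundedAnswerError(Exception):
    """Raised when an answer falls below the grounding threshold."""


def blocking_guard(score_fn: Callable[[str, list[str]], float], threshold: float = 0.8):
    """Wrap a scorer so that low-scoring answers never reach the caller."""

    def guard(answer: str, context: list[str]) -> str:
        score = score_fn(answer, context)
        if score < threshold:
            raise UngroundedAnswerError(f"grounding score {score:.2f} < {threshold:.2f}")
        return answer

    return guard


def toy_score(answer: str, context: list[str]) -> float:
    # Toy scorer standing in for an evaluator or a verifier LLM
    return 1.0 if answer in context else 0.0


guard = blocking_guard(toy_score)
print(guard("Net 30.", ["Net 30."]))
try:
    guard("Net 90.", ["Net 30."])
except UngroundedAnswerError as exc:
    print("blocked:", exc)
```

An offline evaluator would log the second answer's score for later review; the guard stops it before the end-user sees it, at the cost of one extra scoring call per request.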


### Additional Context

I recently built a custom extraction pipeline in raw Python solving this exact problem, combining small routing models with large verification models. I am highly motivated to contribute to this! I am willing to write the PR, tests, and documentation for either the `RunnableVerifier` or the `HeaderAwareMarkdownSplitter` if the core maintainers believe this aligns with LangChain's roadmap for safer RAG generation.

extent analysis

Fix Plan

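To implement the proposed verification component, follow these steps:

  1. Create a new class: Define a RunnableFactChecker class that inherits from Runnable. The issue proposes a new langchain_core.verification module for it; that module does not exist today.
  2. Initialize the verifier: In the __init__ method, store the verifier LLM and a strict_mode flag that controls the behavior.
  3. Implement the verification logic: In the invoke method (the method a Runnable subclass must implement; there is no run method), prompt the verifier LLM to score the generated output against the original context documents.
  4. Attach the confidence score: Add the confidence score to the output metadata, and blank the output if strict_mode is True and the score is low.

Example code (a sketch; the prompt wording and the 0.5 threshold are assumptions):

from typing import Any, Optional

from langchain_core.language_models import BaseChatModel
from langchain_core.runnables import Runnable, RunnableConfig

class RunnableFactChecker(Runnable[dict, dict]):
    def __init__(self, llm: BaseChatModel, strict_mode: bool = True):
        self.llm = llm
        self.strict_mode = strict_mode

    def invoke(self, input: dict, config: Optional[RunnableConfig] = None, **kwargs: Any) -> dict:
        # Chat models have no score() method, so prompt the verifier for a number
        context = "\n\n".join(str(getattr(doc, "page_content", doc)) for doc in input["context"])
        reply = self.llm.invoke(
            "Rate from 0.0 to 1.0 how well the context supports the answer. "
            "Reply with the number only.\n\n"
            f"Context:\n{context}\n\nAnswer:\n{input['output']}",
            config=config,
        )
        try:
            score = min(1.0, max(0.0, float(str(reply.content).strip())))
        except ValueError:
            score = 0.0  # An unparseable reply counts as unsupported
        # Attach the confidence score without mutating the caller's dict
        metadata = {**input.get("response_metadata", {}), "confidence_score": score}
        result = {**input, "response_metadata": metadata}
        # Blank the output if strict_mode is True and the score is low
        if self.strict_mode and score < 0.5:
            result["output"] = ""
        return result

  5. Integrate with LCEL: Use the RunnableFactChecker component in your LCEL chain as shown in the proposed solution.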

Verification

To verify that the fix worked, test the RunnableFactChecker component with different inputs and scenarios, including:

  • Valid output with high confidence score
  • Invalid output with low confidence score
  • Edge cases, such as empty input or missing context

Example test code:

import unittest

from langchain_groq import ChatGroq
from langchain_core.verification import RunnableFactChecker  # proposed module; does not exist yet

class TestRunnableFactChecker(unittest.TestCase):
    def setUp(self):
        # Calls a live model (needs GROQ_API_KEY), so scores can vary between runs
        self.fact_checker = RunnableFactChecker(ChatGroq(model="llama-3.3-70b-versatile"))

    def test_valid_output(self):
        # Output supported by the context should get a high confidence score
        input = {"output": "The contract term is 12 months.", "context": ["The contract term is 12 months."]}
        output = self.fact_checker.invoke(input)
        self.assertGreater(output["response_metadata"]["confidence_score"], 0.5)

    def test_invalid_output(self):
        # Output contradicted by the context should get a low confidence score
        input = {"output": "The contract term is 5 years.", "context": ["The contract term is 12 months."]}
        output = self.fact_checker.invoke(input)
        self.assertLess(output["response_metadata"]["confidence_score"], 0.5)
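
The edge cases listed above (empty input, missing context) can be exercised without a live model by injecting a fake scorer. The sketch below is an assumption-laden stand-in, not the proposed component: `check` is a hypothetical function that mirrors the score, attach, and strip steps, and it treats empty output or missing context as a score of 0.0 without calling the scorer.

```python
import unittest


def check(output: str, context: list[str], score_fn, strict_mode: bool = True) -> dict:
    """Minimal stand-in for the checker: score, attach metadata, optionally strip."""
    # Empty output or missing context cannot be grounded, so skip the scorer
    score = score_fn(output, context) if output and context else 0.0
    if strict_mode and score < 0.5:
        output = ""
    return {"output": output, "response_metadata": {"confidence_score": score}}


class TestCheckEdgeCases(unittest.TestCase):
    def test_supported_output_is_kept(self):
        result = check("Net 30.", ["Net 30."], lambda o, c: 0.9)
        self.assertEqual(result["output"], "Net 30.")

    def test_unsupported_output_is_stripped(self):
        result = check("Net 90.", ["Net 30."], lambda o, c: 0.1)
        self.assertEqual(result["output"], "")

    def test_empty_output_scores_zero(self):
        result = check("", ["Net 30."], lambda o, c: 0.9)
        self.assertEqual(result["response_metadata"]["confidence_score"], 0.0)

    def test_missing_context_scores_zero(self):
        result = check("Net 30.", [], lambda o, c: 0.9)
        self.assertEqual(result["response_metadata"]["confidence_score"], 0.0)
        self.assertEqual(result["output"], "")


# Run the suite programmatically so the script does not exit the interpreter
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCheckEdgeCases)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Fixed scores from the fake scorer keep these tests repeatable, which the live-model tests cannot guarantee.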
