langchain - ✅ (Solved) PIIMiddleware with custom detector raises KeyError: 'value' for hash and mask strategies [3 pull requests, 1 participant]

GitHub stats
langchain-ai/langchain#35647 · Fetched 2026-04-08 00:25:17
Comments: 0 · Participants: 1 · Timeline events: 11 · Reactions: 0
Timeline (top): labeled ×4, cross-referenced ×3, referenced ×2, closed ×1

Problem: PIIMiddleware fails with KeyError: 'value' when using a custom detector function with the hash or mask strategies.

Expected Behavior: According to the PIIMiddleware documentation, custom detector functions should return a list of dictionaries with text, start, and end keys:

    def detector(content: str) -> list[dict[str, str | int]]:
        return [
            {"text": "matched_text", "start": 0, "end": 12},
        ]

This should work with all strategies: block, redact, hash, and mask.

Actual Behavior:

  • ✅ Strategies redact and block work fine
  • ❌ Strategies hash and mask raise KeyError: 'value'

What Works: Passing a regex string directly instead of a custom function:

    PIIMiddleware(
        pii_type="indian_phone",
        detector=r'\+91[\s.-]?[6-9]\d{9}',  # Works!
        strategy="hash",
        apply_to_input=True
    )

Root Cause: The error traceback shows the middleware code trying to access match['value'] at lines 301-302, but this key is not part of the documented return format and is not provided by the custom detector.

Suggested Fix: Either:

  • Update the middleware code to work with the documented format (text, start, end)
  • Update the documentation to specify that the value key is required
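
Until one of those fixes is available, a possible user-side workaround is for the custom detector to return the keys the strategies read alongside the documented ones. The sketch below assumes, based on the linked PR descriptions, that the internal match shape is type, value, start, end. It has not been verified against a specific LangChain release, and the "type" string is assumed to match the pii_type passed to PIIMiddleware.

```python
import re

PHONE_PATTERN = re.compile(r"\+91[\s.-]?[6-9]\d{9}")

def detect_indian_phone(content: str) -> list[dict]:
    """Custom detector that emits both the documented and the internal keys."""
    return [
        {
            "type": "indian_phone",  # assumption: same string as pii_type
            "value": m.group(0),     # key the hash/mask strategies read, per the traceback
            "text": m.group(0),      # documented key, kept alongside "value"
            "start": m.start(),
            "end": m.end(),
        }
        for m in PHONE_PATTERN.finditer(content)
    ]

matches = detect_indian_phone("Contact me at +91 9876543210")
print(matches[0]["value"], matches[0]["start"], matches[0]["end"])
# prints: +91 9876543210 14 28
```

The detector would then be passed to PIIMiddleware exactly as in the reproduction, with detector=detect_indian_phone.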


PR fix notes

PR #35651: fix(langchain): normalize custom detector output to prevent KeyError in hash/mask strategies

Description (problem / solution / changelog)

Summary

Fixes #35647 — PIIMiddleware raises KeyError: 'value' when using a custom callable detector with hash or mask strategies.

Root Cause

Custom callable detectors may return dicts with "text" instead of "value" (as shown in the documentation examples):

def detector(content: str) -> list[dict]:
    return [{"text": "matched", "start": 0, "end": 7}]

Built-in and regex detectors produce proper PIIMatch objects with type, value, start, end. But custom callables are returned as-is from resolve_detector(). When _apply_hash_strategy or _apply_mask_strategy access match["value"], they fail with KeyError.

The redact and block strategies only use match["type"], match["start"], and match["end"] — which is why they work.
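
The asymmetry can be reproduced without LangChain. The two functions below are simplified stand-ins written for illustration, not the library's implementations. Only the replacement format and the slicing line come from the traceback in the issue; the redact placeholder text and the SHA-256 digest truncated to 8 hex characters are assumptions. The sample match already includes "type" so that the sketch isolates the missing "value" key.

```python
import hashlib

def redact_sketch(content: str, matches: list[dict]) -> str:
    """Stand-in redact strategy: reads only type, start, and end."""
    result = content
    # Replace from the end of the string so earlier offsets stay valid.
    for match in sorted(matches, key=lambda m: m["start"], reverse=True):
        replacement = f"[REDACTED_{match['type'].upper()}]"
        result = result[: match["start"]] + replacement + result[match["end"] :]
    return result

def hash_sketch(content: str, matches: list[dict]) -> str:
    """Stand-in hash strategy: additionally reads match["value"]."""
    result = content
    for match in sorted(matches, key=lambda m: m["start"], reverse=True):
        digest = hashlib.sha256(match["value"].encode()).hexdigest()[:8]
        replacement = f"<{match['type']}_hash:{digest}>"
        result = result[: match["start"]] + replacement + result[match["end"] :]
    return result

text = "Contact me at +91 9876543210"
# Shaped like the custom detector's output: "text" is present, "value" is not.
match = {"type": "indian_phone", "text": "+91 9876543210", "start": 14, "end": 28}

print(redact_sketch(text, [match]))  # prints: Contact me at [REDACTED_INDIAN_PHONE]
try:
    hash_sketch(text, [match])
except KeyError as exc:
    print("KeyError:", exc)  # prints: KeyError: 'value'
```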

Fix

In resolve_detector(), wrap custom callable detectors in a normalizing function that:

  • Maps "text" → "value" if "value" is missing
  • Adds the pii_type as "type" if missing
  • Produces proper PIIMatch objects for all downstream strategies

This is consistent with how regex string detectors are already wrapped (lines 362-375).
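
A minimal sketch of such a normalizing wrapper follows. It is an illustration of the approach described above, not the code from the PR: the names wrap_custom_detector and normalizing_detector are hypothetical, and whether the real change keeps or drops the original "text" key is not stated in the description (this sketch keeps it).

```python
import re
from typing import Callable

Detector = Callable[[str], list[dict]]

def wrap_custom_detector(detector: Detector, pii_type: str) -> Detector:
    """Hypothetical wrapper: normalize every match a custom detector returns."""
    def normalizing_detector(content: str) -> list[dict]:
        normalized = []
        for match in detector(content):
            item = dict(match)  # copy so the detector's own dict is not mutated
            if "value" not in item and "text" in item:
                item["value"] = item["text"]  # map "text" to "value" when missing
            item.setdefault("type", pii_type)  # add the configured type when missing
            normalized.append(item)
        return normalized
    return normalizing_detector

def detect_phone(content: str) -> list[dict]:
    """Detector in the documented format: text, start, end."""
    return [
        {"text": m.group(0), "start": m.start(), "end": m.end()}
        for m in re.finditer(r"\+91[\s.-]?[6-9]\d{9}", content)
    ]

wrapped = wrap_custom_detector(detect_phone, "indian_phone")
print(wrapped("Contact me at +91 9876543210"))
```

Wrapping at resolution time means the strategies themselves stay unchanged, which is the same place the description says regex string detectors are already wrapped.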

Tests

All 53 tests pass (51 existing + 2 new):

  • test_custom_callable_detector_with_text_key_hash — verifies hash strategy works with text-only dicts
  • test_custom_callable_detector_with_text_key_mask — verifies mask strategy works with text-only dicts

Changes

| File | Change |
| --- | --- |
| libs/langchain_v1/langchain/agents/middleware/_redaction.py | Wrap custom callable in normalizing detector |
| libs/langchain_v1/tests/unit_tests/agents/middleware/implementations/test_pii.py | 2 new regression tests |

Changed files

  • libs/langchain_v1/langchain/agents/middleware/_redaction.py (modified, +19/-1)
  • libs/langchain_v1/tests/unit_tests/agents/middleware/implementations/test_pii.py (modified, +52/-0)

PR #35653: fix: handle custom PII detectors returning 'text' instead of 'value'

Description (problem / solution / changelog)

Summary

Fixes #35647

Custom PII detectors may return match dicts with text instead of value (following common regex match conventions) and may omit the type key. This causes KeyError: 'value' when hash and mask strategies try to access match['value'].

Changes

Added a _normalize_match() helper in _redaction.py that:

  • Accepts text as an alias for value
  • Auto-populates type from the configured pii_type if missing
  • Wraps custom callable detectors with normalization on output

Built-in detectors already return properly-typed PIIMatch dicts, so this only affects custom detectors.

Reproduction

def detect_phone(content: str) -> list[dict]:
    matches = []
    for match in re.finditer(r'\+91[\s.-]?[6-9]\d{9}', content):
        matches.append({
            "text": match.group(0),  # "text" not "value"
            "start": match.start(),
            "end": match.end(),
        })
    return matches

# Before fix: KeyError: 'value'
# After fix: works correctly
PIIMiddleware("phone", detector=detect_phone, strategy="hash")

Test plan

  • Verify custom detectors with text key work with all strategies
  • Verify built-in detectors still work unchanged
  • Verify existing tests pass

🤖 Generated with Claude Code

Changed files

  • libs/langchain_v1/langchain/agents/middleware/_redaction.py (modified, +26/-1)

PR #35662: langchain: fix KeyError on custom PII detector with hash/mask strategies

Description (problem / solution / changelog)

Summary

Fixes #35647

When a custom detector callable is provided to PIIMiddleware, it returns dicts with "text", "start", and "end" keys per the documented API. However, _apply_hash_strategy and _apply_mask_strategy access match["value"], which doesn't exist in the custom detector's output, causing KeyError: 'value'.

The "redact" and "block" strategies happen to work because they only access match["type"], match["start"], and match["end"] — never match["value"].

Root Cause

In resolve_detector(), when the detector is a callable (not None and not a str regex), it's returned directly without normalization:

# line 376
return detector

Built-in detectors and regex detectors always produce proper PIIMatch dicts with value key, but custom callables return {"text": ..., "start": ..., "end": ...} per the docs.

Fix

Wrap the custom callable to normalize its output into proper PIIMatch TypedDicts:

  • match.get("value", match.get("text", "")) — accepts both value and text keys for backwards compatibility
  • match.get("type", pii_type) — defaults the type to the configured pii_type if not provided

This is consistent with how regex_detector already normalizes regex match objects into PIIMatch dicts in the same function.
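
The two .get() expressions above can be exercised in isolation. The helper name to_pii_match is hypothetical and the function is only a sketch built around those expressions, not the PR's code. It shows one consequence of the chosen defaults: a match with neither key is normalized to an empty value rather than raising.

```python
def to_pii_match(match: dict, pii_type: str) -> dict:
    """Hypothetical helper built around the two .get() expressions quoted above."""
    return {
        "type": match.get("type", pii_type),
        "value": match.get("value", match.get("text", "")),
        "start": match["start"],
        "end": match["end"],
    }

# Documented format: "text" is used as the value, type comes from pii_type.
print(to_pii_match({"text": "abc", "start": 0, "end": 3}, "demo"))
# Internal format: "value" and "type" are passed through unchanged.
print(to_pii_match({"type": "custom", "value": "abc", "start": 0, "end": 3}, "demo"))
# Neither key present: the value silently becomes an empty string.
print(to_pii_match({"start": 0, "end": 3}, "demo"))
```

Whether a silent empty string or an explicit error is preferable for the third case is a design choice the description does not discuss.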

Test Plan

  • Custom detector with "text" key works with all four strategies (block, redact, mask, hash)
  • Custom detector with "value" key still works (backwards compatible)
  • Built-in detectors and regex string detectors are unaffected (different code paths)

This PR was authored by Claude, Anthropic's AI assistant, as part of an effort to contribute meaningfully to open-source projects. More context: https://x.com/MaxwellCalkin/status/1898441007704731844

Changed files

  • libs/langchain_v1/langchain/agents/middleware/_redaction.py (modified, +13/-1)

Code Example

import re
from langchain.agents.middleware import PIIMiddleware
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage

model = init_chat_model(model="groq:qwen/qwen3-32b")

def detect_indian_phone(content: str) -> list[dict]:
    """Custom detector for Indian phone numbers"""
    matches = []
    pattern = r'\+91[\s.-]?[6-9]\d{9}'
    
    for match in re.finditer(pattern, content):
        matches.append({
            "text": match.group(0),
            "start": match.start(),
            "end": match.end(),
        })
    return matches

# Create agent with custom detector
agent = create_agent(
    model=model,
    middleware=[
        PIIMiddleware(
            pii_type="indian_phone",
            detector=detect_indian_phone,
            strategy="hash",  # Also fails with "mask"
            apply_to_input=True
        )
    ]
)

# This raises KeyError: 'value'
response = agent.invoke({
    "messages": [HumanMessage(content="Contact me at +91 9876543210")]
})

Error Message and Stack Trace

KeyError                                  Traceback (most recent call last)
Cell In[45], line 40
     28 agent = create_agent(
     29     model = model,
     30     middleware= [
   (...)
     37     ]
     38 )
---> 40 response = agent.invoke({
     41     "messages": [
     42         HumanMessage(content="Contact me at +91 9876543210")
     43     ]
     44 })
     45 print(response)

File c:\Users\Dalbir\Downloads\LangChain-Bootcamp\.venv\Lib\site-packages\langgraph\pregel\main.py:3071, in Pregel.invoke(self, input, config, context, stream_mode, print_mode, output_keys, interrupt_before, interrupt_after, durability, **kwargs)
   3068 chunks: list[dict[str, Any] | Any] = []
   3069 interrupts: list[Interrupt] = []
-> 3071 for chunk in self.stream(
   ...
    301     replacement = f"<{match['type']}_hash:{digest}>"
    302     result = result[: match["start"]] + replacement + result[match["end"] :]

KeyError: 'value'
During task with name 'PIIMiddleware[indian_phone].before_model' and id 'a1a0b8f6-bbd5-1218-2eda-899805649ec4'

Checked other resources

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package (Required)

  • langchain
  • langchain-openai
  • langchain-anthropic
  • langchain-classic
  • langchain-core
  • langchain-model-profiles
  • langchain-tests
  • langchain-text-splitters
  • langchain-chroma
  • langchain-deepseek
  • langchain-exa
  • langchain-fireworks
  • langchain-groq
  • langchain-huggingface
  • langchain-mistralai
  • langchain-nomic
  • langchain-ollama
  • langchain-openrouter
  • langchain-perplexity
  • langchain-qdrant
  • langchain-xai
  • Other / not sure / general

Related Issues / PRs

No response

System Info

System Information

OS: Windows
OS Version: 10.0.26200
Python Version: 3.12.0 (tags/v3.12.0:0fb18b0, Oct 2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)]

Package Information

langchain_core: 1.2.11
langchain: 1.2.10
langchain_community: 0.4.1
langsmith: 0.7.1
langchain_classic: 1.0.1
langchain_google_genai: 4.2.0
langchain_groq: 1.1.2
langchain_openai: 1.1.9
langchain_text_splitters: 1.1.0
langgraph_sdk: 0.3.5

Optional packages not installed

langserve

Other Dependencies

aiohttp: 3.13.3
dataclasses-json: 0.6.7
filetype: 1.2.0
google-genai: 1.63.0
groq: 0.37.1
httpx: 0.28.1
httpx-sse: 0.4.3
jsonpatch: 1.33
langgraph: 1.0.8
numpy: 2.4.2
openai: 2.20.0
orjson: 3.11.7
packaging: 26.0
pydantic: 2.12.5
pydantic-settings: 2.12.0
pyyaml: 6.0.3
PyYAML: 6.0.3
requests: 2.32.5
requests-toolbelt: 1.0.0
sqlalchemy: 2.0.46
SQLAlchemy: 2.0.46
tenacity: 9.1.4
tiktoken: 0.12.0
typing-extensions: 4.15.0
uuid-utils: 0.14.0
xxhash: 3.6.0
zstandard: 0.25.0

extent analysis

Problem Summary

Custom detector function with hash or mask strategies fails with KeyError: 'value' in PIIMiddleware.

Root Cause Analysis

The error is caused by the middleware code trying to access 'value' key, which is not part of the documented return format of the custom detector function.

Fix Plan

Update the middleware code to work with the documented format (text, start, end).

Step-by-Step Solution

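  1. Update the middleware code:

    • Open libs/langchain_v1/langchain/agents/middleware/_redaction.py, the file modified by all three linked PRs.
    • Make the hash and mask strategies (_apply_hash_strategy and _apply_mask_strategy) tolerate the documented text key, either by normalizing custom detector output in resolve_detector() as the PRs do, or by reading the matched text with a fallback. The helper below is an illustrative sketch only; its name and the SHA-256 digest are assumptions.

    import hashlib

    def apply_replacement(result: str, match: dict) -> str:
        value = match.get("value", match.get("text", ""))  # accept either key
        digest = hashlib.sha256(value.encode()).hexdigest()[:8]  # hash the text, never embed it raw
        return result[: match["start"]] + f"<{match['type']}_hash:{digest}>" + result[match["end"] :]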


  2. Test the updated middleware:

    • Run the reproduction code again with the updated middleware.
    • Verify that the error is resolved and the custom detector function works correctly with hash and mask strategies.

Verification

To verify that the fix worked, run the reproduction code again and check that the error is resolved. You can also add additional tests to ensure that the custom detector function works correctly with different inputs and strategies.

Extra Tips

  • Make sure to update the documentation to reflect the changes in the middleware code.
  • Consider adding additional error handling to handle cases where the custom detector function returns an invalid format.
  • If you're using a version control system, commit the changes and create a pull request to update the LangChain repository.
