litellm - ✅(Solved) Fix perf(presidio): skip PII analysis on assistant messages [1 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
BerriAI/litellm#24292Fetched 2026-04-08 01:13:33
View on GitHub
Comments
0
Participants
1
Timeline
6
Reactions
0
Participants
Timeline (top)
referenced ×4cross-referenced ×1labeled ×1

Root Cause

The Presidio guardrail analyzes all messages in the conversation history on every request, including role: "assistant" messages. Assistant messages are the model's own previous outputs — they don't contain user PII that needs protecting. On multi-turn conversations with tool use, analyzing these messages dominates request latency because they contain large tool-call responses and file contents.

Fix Action

Fixed

PR fix notes

PR #24295: perf(presidio): skip PII analysis on assistant messages

Description (problem / solution / changelog)

Problem

The Presidio guardrail analyzes all messages in the conversation history on every request, including role: "assistant" messages. Assistant messages are the model's own previous outputs — they don't contain user PII that needs protecting. On multi-turn conversations with tool use, analyzing these messages dominates request latency because they contain large tool-call responses and file contents.

Closes #24292. Depends on #24291 (fix/presidio-pii-roundtrip-v2).

Change

Skip messages with role: "assistant" in the two message iteration loops in presidio.py (the string-content path and the list-content path):

if m.get("role") == "assistant":
    continue

Impact

  • 50-90% latency reduction on multi-turn conversations with tool use
  • No security impact — only user-provided input needs PII protection
  • Model outputs that echo user PII are already masked on the input path

Test plan

Two new tests in tests/guardrails_tests/test_presidio_pii_roundtrip.py:

  • test_apply_guardrail_skips_assistant_messages — verifies apply_guardrail processes all texts (role-agnostic at this level)
  • test_check_pii_skips_assistant_role_messages — verifies async_pre_call_hook skips assistant-role messages: user messages are scanned, assistant messages are not

Changed files

  • litellm/llms/a2a/chat/guardrail_translation/handler.py (modified, +12/-7)
  • litellm/llms/anthropic/chat/guardrail_translation/handler.py (modified, +11/-9)
  • litellm/llms/base_llm/guardrail_translation/base_translation.py (modified, +6/-0)
  • litellm/llms/cohere/rerank/guardrail_translation/handler.py (modified, +1/-0)
  • litellm/llms/mistral/ocr/guardrail_translation/handler.py (modified, +2/-1)
  • litellm/llms/openai/chat/guardrail_translation/handler.py (modified, +17/-11)
  • litellm/llms/openai/completion/guardrail_translation/handler.py (modified, +5/-4)
  • litellm/llms/openai/embeddings/guardrail_translation/handler.py (modified, +1/-0)
  • litellm/llms/openai/image_generation/guardrail_translation/handler.py (modified, +1/-0)
  • litellm/llms/openai/responses/guardrail_translation/handler.py (modified, +7/-5)
  • litellm/llms/openai/speech/guardrail_translation/handler.py (modified, +1/-0)
  • litellm/llms/openai/transcriptions/guardrail_translation/handler.py (modified, +5/-4)
  • litellm/llms/pass_through/guardrail_translation/handler.py (modified, +9/-8)
  • litellm/proxy/_experimental/mcp_server/guardrail_translation/handler.py (modified, +1/-0)
  • litellm/proxy/guardrails/guardrail_hooks/presidio.py (modified, +326/-88)
  • litellm/proxy/guardrails/guardrail_hooks/unified_guardrail/unified_guardrail.py (modified, +3/-0)
  • litellm/types/guardrails.py (modified, +7/-0)
  • tests/guardrails_tests/test_guardrail_request_data_passthrough.py (added, +245/-0)
  • tests/guardrails_tests/test_presidio_pii_roundtrip.py (added, +494/-0)
  • tests/test_litellm/proxy/guardrails/guardrail_hooks/unified_guardrails/test_unified_guardrail.py (modified, +3/-1)

Code Example

for msg_idx, m in enumerate(messages):
    if m.get("role") == "assistant":
        continue
    # ... existing PII analysis logic
RAW_BUFFERClick to expand / collapse

Problem

The Presidio guardrail analyzes all messages in the conversation history on every request, including role: "assistant" messages. Assistant messages are the model's own previous outputs — they don't contain user PII that needs protecting. On multi-turn conversations with tool use, analyzing these messages dominates request latency because they contain large tool-call responses and file contents.

Proposed change

Skip messages with role: "assistant" in both async_pre_call_hook and apply_guardrail when building the list of texts to send to Presidio:

for msg_idx, m in enumerate(messages):
    if m.get("role") == "assistant":
        continue
    # ... existing PII analysis logic

Expected impact

  • 50-90% latency reduction on multi-turn conversations with tool use (measured on a real Claude Code workflow)
  • No security impact — only user-provided input needs PII protection
  • Model outputs that echo user PII are already masked on the input path

Trade-offs

  • If a user manually constructs a message with role: "assistant" containing PII, it would not be masked. This is an edge case — in practice, assistant messages are generated by the LLM.
  • Users who want to scan all messages (including assistant) could opt out via a config flag.

extent analysis

Fix Plan

To reduce latency in multi-turn conversations with tool use, we will skip analyzing messages with role: "assistant" in both async_pre_call_hook and apply_guardrail.

Here are the steps:

  • Modify the async_pre_call_hook function to skip assistant messages
  • Modify the apply_guardrail function to skip assistant messages

Example code:

def async_pre_call_hook(messages):
    texts_to_analyze = []
    for msg_idx, m in enumerate(messages):
        if m.get("role") == "assistant":
            continue
        texts_to_analyze.append(m["text"])
    # ... existing PII analysis logic

def apply_guardrail(messages):
    texts_to_analyze = []
    for msg_idx, m in enumerate(messages):
        if m.get("role") == "assistant":
            continue
        texts_to_analyze.append(m["text"])
    # ... existing PII analysis logic

Verification

To verify the fix, measure the latency reduction in multi-turn conversations with tool use. Compare the latency before and after the fix to ensure a 50-90% reduction.

Extra Tips

Consider adding a config flag to allow users to opt out of skipping assistant messages if needed. This will provide flexibility for users who want to scan all messages, including assistant messages.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING