litellm - ✅(Solved) Fix perf(presidio): skip PII analysis on assistant messages [1 pull requests, 1 participants]

firestaerter3 · 2026-03-21T14:13:23Z

[litellm] PR 24295: perf presidio : skip PII analysis on assistant messages - Repository: BerriAI/litellm - Author: firestaerter3 - State: open | merged: False… # PR #24295: perf(presidio): skip PII analysis on assistant messages - Repository: BerriAI/litellm - Author: firestaerter3 - State: open | merged: False - Link: https://github.com/BerriAI/litellm/pull/24295 ## Description (problem / solution / changelog) ## Problem The Presidio guardrail analyzes all messages in the conversation history on every request, including `role: "assistant"` messages. Assistant messages are the model's own previous outputs — they don't contain user PII that needs protecting. On multi-turn conversations with tool use, analyzing these messages dominates request latency because they contain large tool-call responses and file contents. Closes #24292. Depends on #24291 (`fix/presidio-pii-roundtrip-v2`). ## Change Skip messages with `role: "assistant"` in the two message iteration loops in `presidio.py` (the string-content path and the list-content path): ```python if m.get("role") == "assistant": continue ``` ## Impact - **50-90% latency reduction** on multi-turn conversations with tool use - No security impact — only user-provided input needs PII protection - Model outputs that echo user PII are already masked on the input path ## Test plan Two new tests in `tests/guardrails_tests/test_presidio_pii_roundtrip.py`: - [x] `test_apply_guardrail_skips_assistant_messages` — verifies `apply_guardrail` processes all texts (role-agnostic at this level) - [x] `test_check_pii_skips_assistant_role_messages` — verifies `async_pre_call_hook` skips assistant-role messages: user messages are scanned, assistant messages are not ## Changed files - `litellm/llms/a2a/chat/guardrail_translation/handler.py` (modified, +12/-7) - `litellm/llms/anthropic/chat/guardrail_translation/handler.py` (modified, +11/-9) - `litellm/llms/base_llm/guardrail_translation/base_translation.py` (modified, +6/-0) - `litellm/llms/cohere/rerank/guardrail_translation/handler.py` (modified, +1/-0) - `litellm/llms/mistral/ocr/guardrail_translation/handler.py` (modified, +2/-1) - `litellm/llms/openai/chat/guardrail_translation/handler.py` (modified, +17/-11) - `litellm/llms/openai/completion/guardrail_translation/handler.py` (modified, +5/-4) - `litellm/llms/openai/embeddings/guardrail_translation/handler.py` (modified, +1/-0) - `litellm/llms/openai/image_generation/guardrail_translation/handler.py` (modified, +1/-0) - `litellm/llms/openai/responses/guardrail_translation/handler.py` (modified, +7/-5) - `litellm/llms/openai/speech/guardrail_translation/handler.py` (modified, +1/-0) - `litellm/llms/openai/transcriptions/guardrail_translation/handler.py` (modified, +5/-4) - `litellm/llms/pass_through/guardrail_translation/handler.py` (modified, +9/-8) - `litellm/proxy/_experimental/mcp_server/guardrail_translation/handler.py` (modified, +1/-0) - `litellm/proxy/guardrails/guardrail_hooks/presidio.py` (modified, +326/-88) - `litellm/proxy/guardrails/guardrail_hooks/unified_guardrail/unified_guardrail.py` (modified, +3/-0) - `litellm/types/guardrails.py` (modified, +7/-0) - `tests/guardrails_tests/test_guardrail_request_data_passthrough.py` (added, +245/-0) - `tests/guardrails_tests/test_presidio_pii_roundtrip.py` (added, +494/-0) - `tests/test_litellm/proxy/guardrails/guardrail_hooks/unified_guardrails/test_unified_guardrail.py` (modified, +3/-1) ## Fixed - Fixed by PR: perf(presidio): skip PII analysis on assistant messages (https://github.com/BerriAI/litellm/pull/24295) ## Problem The Presidio guardrail analyzes all messages in the conversation history on every request, including `role: "assistant"` messages. Assistant messages are the model's own previous outputs — they don't contain user PII that needs protecting. On multi-turn conversations with tool use, analyzing these messages dominates request latency because they contain large tool-call responses and file contents. ## Proposed change Skip messages with `role: "assistant"` in both `async_pre_call_hook` and `apply_guardrail` when building the list of texts to send to Presidio: ```python for msg_idx, m in enumerate(messages): if m.get("role") == "assistant": continue # ... existing PII analysis logic ``` ## Expected impact - **50-90% latency reduction** on multi-turn conversations with tool use (measured on a real Claude Code workflow) - No security impact — only user-provided input needs PII protection - Model outputs that echo user PII are already masked on the input path ## Trade-offs - If a user manually constructs a message with `role: "assistant"` containing PII, it would not be masked. This is an edge case — in practice, assistant messages are generated by the LLM. - Users who want to scan all messages (including assistant) could opt out via a config flag.

litellm2026-03-21 14:13:23

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

BerriAI/litellm#24292•Fetched 2026-04-08 01:13:33

View on GitHub

Comments

Participants

Timeline

Reactions

Author

firestaerter3

Participants

firestaerter3

Timeline (top)

referenced ×4cross-referenced ×1labeled ×1

Root Cause

The Presidio guardrail analyzes all messages in the conversation history on every request, including role: "assistant" messages. Assistant messages are the model's own previous outputs — they don't contain user PII that needs protecting. On multi-turn conversations with tool use, analyzing these messages dominates request latency because they contain large tool-call responses and file contents.

Fix Action

Fixed

Fixed by PR: perf(presidio): skip PII analysis on assistant messages (https://github.com/BerriAI/litellm/pull/24295)

PR fix notes

PR #24295: perf(presidio): skip PII analysis on assistant messages

Repository: BerriAI/litellm
Author: firestaerter3
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/24295

Description (problem / solution / changelog)

Problem

Closes #24292. Depends on #24291 (fix/presidio-pii-roundtrip-v2).

Change

Skip messages with role: "assistant" in the two message iteration loops in presidio.py (the string-content path and the list-content path):

if m.get("role") == "assistant":
    continue

Impact

50-90% latency reduction on multi-turn conversations with tool use
No security impact — only user-provided input needs PII protection
Model outputs that echo user PII are already masked on the input path

Test plan

Two new tests in tests/guardrails_tests/test_presidio_pii_roundtrip.py:

test_apply_guardrail_skips_assistant_messages — verifies apply_guardrail processes all texts (role-agnostic at this level)
test_check_pii_skips_assistant_role_messages — verifies async_pre_call_hook skips assistant-role messages: user messages are scanned, assistant messages are not

Changed files

litellm/llms/a2a/chat/guardrail_translation/handler.py (modified, +12/-7)
litellm/llms/anthropic/chat/guardrail_translation/handler.py (modified, +11/-9)
litellm/llms/base_llm/guardrail_translation/base_translation.py (modified, +6/-0)
litellm/llms/cohere/rerank/guardrail_translation/handler.py (modified, +1/-0)
litellm/llms/mistral/ocr/guardrail_translation/handler.py (modified, +2/-1)
litellm/llms/openai/chat/guardrail_translation/handler.py (modified, +17/-11)
litellm/llms/openai/completion/guardrail_translation/handler.py (modified, +5/-4)
litellm/llms/openai/embeddings/guardrail_translation/handler.py (modified, +1/-0)
litellm/llms/openai/image_generation/guardrail_translation/handler.py (modified, +1/-0)
litellm/llms/openai/responses/guardrail_translation/handler.py (modified, +7/-5)
litellm/llms/openai/speech/guardrail_translation/handler.py (modified, +1/-0)
litellm/llms/openai/transcriptions/guardrail_translation/handler.py (modified, +5/-4)
litellm/llms/pass_through/guardrail_translation/handler.py (modified, +9/-8)
litellm/proxy/_experimental/mcp_server/guardrail_translation/handler.py (modified, +1/-0)
litellm/proxy/guardrails/guardrail_hooks/presidio.py (modified, +326/-88)
litellm/proxy/guardrails/guardrail_hooks/unified_guardrail/unified_guardrail.py (modified, +3/-0)
litellm/types/guardrails.py (modified, +7/-0)
tests/guardrails_tests/test_guardrail_request_data_passthrough.py (added, +245/-0)
tests/guardrails_tests/test_presidio_pii_roundtrip.py (added, +494/-0)
tests/test_litellm/proxy/guardrails/guardrail_hooks/unified_guardrails/test_unified_guardrail.py (modified, +3/-1)

Code Example

for msg_idx, m in enumerate(messages):
    if m.get("role") == "assistant":
        continue
    # ... existing PII analysis logic

RAW_BUFFERClick to expand / collapse

Problem

Proposed change

Skip messages with role: "assistant" in both async_pre_call_hook and apply_guardrail when building the list of texts to send to Presidio:

for msg_idx, m in enumerate(messages):
    if m.get("role") == "assistant":
        continue
    # ... existing PII analysis logic

Expected impact

50-90% latency reduction on multi-turn conversations with tool use (measured on a real Claude Code workflow)
No security impact — only user-provided input needs PII protection
Model outputs that echo user PII are already masked on the input path

Trade-offs

If a user manually constructs a message with role: "assistant" containing PII, it would not be masked. This is an edge case — in practice, assistant messages are generated by the LLM.
Users who want to scan all messages (including assistant) could opt out via a config flag.

extent analysis

Fix Plan

To reduce latency in multi-turn conversations with tool use, we will skip analyzing messages with role: "assistant" in both async_pre_call_hook and apply_guardrail.

Here are the steps:

Modify the async_pre_call_hook function to skip assistant messages
Modify the apply_guardrail function to skip assistant messages

Example code:

def async_pre_call_hook(messages):
    texts_to_analyze = []
    for msg_idx, m in enumerate(messages):
        if m.get("role") == "assistant":
            continue
        texts_to_analyze.append(m["text"])
    # ... existing PII analysis logic

def apply_guardrail(messages):
    texts_to_analyze = []
    for msg_idx, m in enumerate(messages):
        if m.get("role") == "assistant":
            continue
        texts_to_analyze.append(m["text"])
    # ... existing PII analysis logic

Verification

To verify the fix, measure the latency reduction in multi-turn conversations with tool use. Compare the latency before and after the fix to ensure a 50-90% reduction.

Extra Tips

Consider adding a config flag to allow users to opt out of skipping assistant messages if needed. This will provide flexibility for users who want to scan all messages, including assistant messages.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#conversation history #cache issue #memory leak #API versioning #request timeout

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

litellm - ✅(Solved) Fix perf(presidio): skip PII analysis on assistant messages [1 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fixed

PR fix notes

PR #24295: perf(presidio): skip PII analysis on assistant messages

Description (problem / solution / changelog)

Problem

Change

Impact

Test plan

Changed files

Code Example

Problem

Proposed change

Expected impact

Trade-offs

extent analysis

Fix Plan

Verification

Extra Tips

Still need to ship something?

TRENDING

litellm - ✅(Solved) Fix perf(presidio): skip PII analysis on assistant messages [1 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fixed

PR fix notes

PR #24295: perf(presidio): skip PII analysis on assistant messages

Description (problem / solution / changelog)

Problem

Change

Impact

Test plan

Changed files

Code Example

Problem

Proposed change

Expected impact

Trade-offs

extent analysis

Fix Plan

Verification

Extra Tips

Still need to ship something?

RELATED_DISCOVERY

TRENDING