litellm - ✅(Solved) Fix [Bug]: post_call tool_permission guardrail silently drops all streaming plain-text responses [1 pull request, 1 participant]

GitHub stats: BerriAI/litellm#26547 (fetched 2026-04-27 05:29:40). Comments: 0 · Participants: 1 · Timeline events: 4 (referenced ×3, cross-referenced ×1) · Reactions: 0

Root Cause

In async_post_call_streaming_iterator_hook inside litellm/proxy/guardrails/guardrail_hooks/tool_permission.py (lines ~665–686):

if not tool_calls:
    verbose_proxy_logger.debug("Tool Permission Guardrail: No tool uses found")
    return    # ← BUG: generator exits with zero yields
              #   client receives empty stream
              #   UI still shows response because DB log happened before this

This is an async generator (it contains yield statements further down), so a bare return is legal Python but ends iteration immediately, and on this branch nothing has been yielded yet. Since LiteLLM routes the stream through this hook (not alongside it), whatever the hook yields is what the client receives. Yielding nothing means the client gets nothing.
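A minimal, self-contained illustration of this behaviour in plain Python. The names `hook` and `collect` are illustrative only, not LiteLLM code: the `yield` later in the body makes `hook` an async generator, so the early `return` simply ends iteration and the consumer collects nothing.

```python
import asyncio
from typing import AsyncIterator, List


async def hook(chunks: List[str], has_tool_calls: bool) -> AsyncIterator[str]:
    # Hypothetical stand-in for the guardrail hook: it buffers the stream first.
    buffered = list(chunks)
    if not has_tool_calls:
        return  # ends the async generator before any yield has happened
    for chunk in buffered:
        yield chunk


async def collect(agen: AsyncIterator[str]) -> List[str]:
    # Drain the generator the way the proxy would when streaming to a client.
    return [item async for item in agen]


plain_text = ["2 + 2", " = 4"]
print(asyncio.run(collect(hook(plain_text, has_tool_calls=False))))  # []
print(asyncio.run(collect(hook(plain_text, has_tool_calls=True))))   # ['2 + 2', ' = 4']
```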

Fix Action

Fix

Replace the bare return with a re-emit of the already-assembled response:

if not tool_calls:
    verbose_proxy_logger.debug("Tool Permission Guardrail: No tool uses found")
    mock_response = MockResponseIterator(model_response=assembled_model_response)
    async for chunk in mock_response:
        yield chunk    # re-emit before exiting
    return
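A self-contained sketch of the fixed control flow, under simplifying assumptions: chunks are plain dicts with optional `content` and `tool_calls` keys, "assembling" means concatenating content, tool calls are represented by bare tool names, and `reemit()` is a hypothetical replacement for `MockResponseIterator` that replays the assembled response as one chunk. The denial branch is simplified to a `PermissionError`; the real guardrail's handling of disallowed tools is not modelled.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List, Set

Chunk = Dict[str, Any]


async def upstream(chunks: List[Chunk]) -> AsyncIterator[Chunk]:
    # Simulates the LLM stream that the proxy feeds into the hook.
    for chunk in chunks:
        yield chunk


async def reemit(assembled: Chunk) -> AsyncIterator[Chunk]:
    # Hypothetical stand-in for MockResponseIterator: replay the assembled response.
    yield assembled


async def post_call_hook(
    response: AsyncIterator[Chunk], allowed: Set[str]
) -> AsyncIterator[Chunk]:
    # post_call mode: buffer the whole stream before deciding anything.
    content, tool_calls = "", []
    async for chunk in response:
        content += chunk.get("content") or ""
        tool_calls.extend(chunk.get("tool_calls") or [])
    assembled = {"content": content, "tool_calls": tool_calls}

    if not tool_calls:
        # The fix: re-emit before exiting instead of a bare return.
        async for out in reemit(assembled):
            yield out
        return

    if any(name not in allowed for name in tool_calls):
        # Simplified denial (hypothetical); not the real guardrail's behaviour.
        raise PermissionError("tool call not permitted")

    # Allowed-tool path ends with the same re-emit, so both branches match.
    async for out in reemit(assembled):
        yield out


async def run(chunks: List[Chunk]) -> List[Chunk]:
    return [c async for c in post_call_hook(upstream(chunks), allowed={"Bash"})]


print(asyncio.run(run([{"content": "2 + 2"}, {"content": " = 4"}])))
```

With the re-emit in place, a plain-text reply comes back as a single assembled chunk instead of disappearing.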

PR fix notes

PR #26551: fix(guardrails): re-emit chunks in tool_permission streaming hook when no tool_calls found

Description (problem / solution / changelog)

Relevant issues

Fixes #26547

Pre-Submission checklist

  • I have added testing in the tests/test_litellm/ directory
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix

Root Cause

async_post_call_streaming_iterator_hook in tool_permission.py is an async generator (it has yield statements further down the function). In the if not tool_calls: branch — the path taken when the LLM responds with plain text and no tool calls — it did a bare return.

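In an async generator, a bare return simply ends iteration: the consumer's pending __anext__() call raises StopAsyncIteration. (It is not literally the same as writing raise StopAsyncIteration inside the body, which Python converts into a RuntimeError.) Because this branch had not yielded anything yet, the client received only data: [DONE] with no content chunks, silently dropping the entire plain-text response.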
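The sketch below shows why the client sees only the terminator. It assumes, for illustration, that the proxy turns whatever the hook yields into SSE events and always appends `data: [DONE]`; `to_sse`, `empty_hook`, and `one_chunk_hook` are hypothetical names, not LiteLLM functions.

```python
import asyncio
import json
from typing import Any, AsyncIterator, Dict

Chunk = Dict[str, Any]


async def empty_hook() -> AsyncIterator[Chunk]:
    # Mirrors the buggy branch: the generator returns before its first yield.
    return
    yield  # unreachable, but it makes this function an async generator


async def one_chunk_hook() -> AsyncIterator[Chunk]:
    # Mirrors the fixed branch: at least one chunk is yielded.
    yield {"content": "2 + 2 = 4"}


async def to_sse(stream: AsyncIterator[Chunk]) -> AsyncIterator[str]:
    # Hypothetical proxy wrapper: one SSE event per chunk, then the terminator.
    async for chunk in stream:
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"


async def body(stream: AsyncIterator[Chunk]) -> str:
    # Concatenate every SSE event, i.e. the raw response body the client sees.
    return "".join([event async for event in to_sse(stream)])


print(repr(asyncio.run(body(empty_hook()))))  # 'data: [DONE]\n\n'
print(repr(asyncio.run(body(one_chunk_hook()))))
```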

This was consistently reproducible with any plain-text LLM reply while the tool_permission guardrail was enabled.

Fix

Before returning, pass the already-assembled ModelResponse through MockResponseIterator (already imported in the same scope for the allowed-tool path) and yield each chunk:

# BEFORE
if not tool_calls:
    verbose_proxy_logger.debug("Tool Permission Guardrail: No tool uses found")
    return

# AFTER
if not tool_calls:
    verbose_proxy_logger.debug("Tool Permission Guardrail: No tool uses found")
    mock_response = MockResponseIterator(model_response=assembled_model_response)
    async for chunk in mock_response:
        yield chunk
    return

This mirrors exactly what the guardrail does at the end of the allowed-tool path (line ~736), making both code paths symmetric.

Changes

  • litellm/proxy/guardrails/guardrail_hooks/tool_permission.py — 5-line fix in async_post_call_streaming_iterator_hook
  • tests/test_litellm/proxy/guardrails/guardrail_hooks/test_tool_permission.py — new regression test test_async_post_call_streaming_iterator_hook_plain_text_yields_chunks that asserts the hook yields ≥ 1 chunk when the LLM replies with plain text
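A framework-agnostic sketch of the regression test's idea: drain the hook's async generator on a plain-text reply and assert that at least one chunk comes out. `buggy_hook` and `fixed_hook` are tiny hypothetical stand-ins; the actual test in test_tool_permission.py exercises the real guardrail with its own fixtures, which are not reproduced here.

```python
import asyncio
from typing import AsyncIterator, Callable, List

Hook = Callable[[AsyncIterator[str]], AsyncIterator[str]]


async def llm_stream(parts: List[str]) -> AsyncIterator[str]:
    # Simulated plain-text reply with no tool calls.
    for part in parts:
        yield part


async def buggy_hook(stream: AsyncIterator[str]) -> AsyncIterator[str]:
    # Hypothetical stand-in for the pre-fix hook: buffer, then bare return.
    _ = [part async for part in stream]
    return
    yield  # unreachable; only marks this as an async generator


async def fixed_hook(stream: AsyncIterator[str]) -> AsyncIterator[str]:
    # Hypothetical stand-in for the post-fix hook: re-emit the assembled text.
    assembled = "".join([part async for part in stream])
    yield assembled


async def drain(hook: Hook, parts: List[str]) -> List[str]:
    return [chunk async for chunk in hook(llm_stream(parts))]


def hook_yields_chunks(hook: Hook) -> bool:
    # The regression assertion: a plain-text reply must produce >= 1 chunk.
    return len(asyncio.run(drain(hook, ["2 + 2", " = 4"]))) >= 1


print(hook_yields_chunks(buggy_hook))  # False
print(hook_yields_chunks(fixed_hook))  # True
```

Checking "≥ 1 chunk" rather than exact chunk boundaries keeps the test robust if the re-emit later splits the response differently.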

Screenshots / Proof of Fix

Locally validated against a running LiteLLM proxy with this fix applied (via Docker volume mount). Plain-text streaming responses now reach the client correctly. See issue #26547 for full reproduction details.

Note: CLA signature in progress at https://cla-assistant.io/BerriAI/litellm — will be completed before merge.

Changed files

  • docs/my-website/docs/proxy/guardrails/xecguard.md (added, +314/-0)
  • docs/my-website/sidebars.js (modified, +1/-0)
  • litellm/exceptions.py (modified, +30/-1)
  • litellm/integrations/custom_guardrail.py (modified, +1/-37)
  • litellm/litellm_core_utils/prompt_templates/factory.py (modified, +19/-4)
  • litellm/llms/bedrock/chat/converse_transformation.py (modified, +2/-2)
  • litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py (modified, +7/-1)
  • litellm/llms/ollama/chat/transformation.py (modified, +7/-2)
  • litellm/llms/predibase/chat/handler.py (modified, +43/-228)
  • litellm/llms/predibase/chat/transformation.py (modified, +212/-7)
  • litellm/proxy/guardrails/guardrail_hooks/tool_permission.py (modified, +5/-0)
  • litellm/proxy/guardrails/guardrail_hooks/unified_guardrail/unified_guardrail.py (modified, +10/-0)
  • litellm/proxy/guardrails/guardrail_hooks/xecguard/__init__.py (added, +45/-0)
  • litellm/proxy/guardrails/guardrail_hooks/xecguard/xecguard.py (added, +591/-0)
  • litellm/proxy/pass_through_endpoints/pass_through_endpoints.py (modified, +73/-5)
  • litellm/router.py (modified, +4/-2)
  • litellm/types/guardrails.py (modified, +5/-0)
  • litellm/types/llms/ollama.py (modified, +1/-0)
  • litellm/types/proxy/guardrails/guardrail_hooks/xecguard.py (added, +77/-0)
  • tests/test_litellm/litellm_core_utils/prompt_templates/test_litellm_core_utils_prompt_templates_factory.py (modified, +109/-0)
  • tests/test_litellm/llms/bedrock/messages/invoke_transformations/test_anthropic_claude3_transformation.py (modified, +80/-0)
  • tests/test_litellm/llms/ollama/test_ollama_chat_transformation.py (modified, +95/-0)
  • tests/test_litellm/llms/test_predibase_transformation.py (added, +612/-0)
  • tests/test_litellm/proxy/guardrails/guardrail_hooks/test_tool_permission.py (modified, +40/-0)
  • tests/test_litellm/proxy/guardrails/guardrail_hooks/test_xecguard.py (added, +1904/-0)
  • tests/test_litellm/proxy/pass_through_endpoints/test_passthrough_post_call_guardrails.py (added, +276/-0)
  • tests/test_litellm/test_router.py (modified, +63/-0)
  • ui/litellm-dashboard/public/assets/logos/xecguard.svg (added, +4/-0)
  • ui/litellm-dashboard/src/components/guardrails/guardrail_garden_configs.ts (modified, +6/-0)
  • ui/litellm-dashboard/src/components/guardrails/guardrail_garden_data.ts (modified, +10/-0)
  • ui/litellm-dashboard/src/components/guardrails/guardrail_info_helpers.tsx (modified, +2/-0)

Original issue report (#26547)

What happened?

When mode: "post_call" is active on the tool_permission guardrail and a client sends "stream": true, any LLM response that is plain text (no tool call) is silently dropped. The client receives data: [DONE] with no content before it. The LiteLLM UI shows the full response correctly because DB logging happens before the buggy code runs.

Reproduction

Two curl commands — the only variable between them is "stream": true:

Test A — stream: true → empty response (BUG)

curl -X POST http://localhost:4000/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "guardrails": ["tool-permission-guardrail"],
    "stream": true,
    "messages": [{"role": "user", "content": "What is 2 + 2?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "Bash",
        "description": "Execute a shell command",
        "parameters": {
          "type": "object",
          "properties": {"command": {"type": "string"}},
          "required": ["command"]
        }
      }
    }]
  }'

Actual output: data: [DONE] — no content chunks
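To check Test A's output programmatically rather than by eye, one option is to save the raw streamed body and extract the content deltas. The sketch below assumes OpenAI-style chat-completion chunks (`choices[0].delta.content`); `sse_content` is a hypothetical helper, and the two sample bodies are made up for illustration, not captured from the proxy.

```python
import json
from typing import List


def sse_content(body: str) -> List[str]:
    """Return the delta.content strings found in an SSE chat-completions body."""
    pieces: List[str] = []
    for line in body.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank event separators and other lines
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                pieces.append(content)
    return pieces


buggy = "data: [DONE]\n\n"
fixed = (
    'data: {"choices":[{"delta":{"content":"2 + 2 = 4"}}]}\n\n'
    "data: [DONE]\n\n"
)
print(sse_content(buggy))  # []
print(sse_content(fixed))  # ['2 + 2 = 4']
```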

Test B — no stream → full response (WORKS)

# same request without "stream": true

Actual output: {"choices":[{"message":{"content":"2 + 2 = 4"}}]}


Environment

  • LiteLLM version: main-stable (reproduced on latest main branch, April 2026)
  • Python: 3.11
  • Deployment: Docker

Related

  • PR #22702 fixes a sibling problem in proxy/utils.py (NoneType crash in the same dispatch loop) — same symptom family, different file and root cause
  • This specific return on line ~686 of tool_permission.py has no issue or PR yet

Extent analysis

TL;DR

Replace the bare return in async_post_call_streaming_iterator_hook with a re-emit of the assembled model response. Without it, plain-text LLM responses are silently dropped whenever the tool_permission guardrail runs with mode: "post_call" and the client sends "stream": true.

Guidance

  • Identify the async_post_call_streaming_iterator_hook function in litellm/proxy/guardrails/guardrail_hooks/tool_permission.py and locate the bare return statement.
  • Replace the bare return with a re-emit of the model response using a MockResponseIterator as shown in the provided fix.
  • Verify that the client receives the full response when mode: "post_call" is active and the client sends "stream": true.
  • Test the fix using the provided curl commands to ensure that the response is no longer empty.


Notes

This fix relies on MockResponseIterator. According to the PR, it is already imported in the same scope for the allowed-tool path, so no new import should be needed; if it were missing, it would have to be imported before the no-tool-call branch can use it.
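For readers unfamiliar with the pattern: an iterator like MockResponseIterator wraps one complete response and exposes it through the async-iterator protocol (`__aiter__`/`__anext__`), so a finished response can be replayed as a stream. The sketch below is a hypothetical stand-in named `ReplayIterator` that yields a single chunk; it is not LiteLLM's implementation, whose chunk format and sync/async support may differ.

```python
import asyncio
from typing import Any, Dict, List


class ReplayIterator:
    """Hypothetical stand-in: replays one assembled response as a one-chunk stream."""

    def __init__(self, model_response: Dict[str, Any]) -> None:
        self.model_response = model_response
        self._done = False

    def __aiter__(self) -> "ReplayIterator":
        return self

    async def __anext__(self) -> Dict[str, Any]:
        if self._done:
            # Correct here: this is a plain class, not an async generator.
            raise StopAsyncIteration
        self._done = True
        # Convert the final message into a streaming-style delta chunk.
        message = self.model_response["choices"][0]["message"]
        return {"choices": [{"index": 0, "delta": message}]}


async def replay(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    return [chunk async for chunk in ReplayIterator(model_response=response)]


assembled = {"choices": [{"message": {"role": "assistant", "content": "2 + 2 = 4"}}]}
print(asyncio.run(replay(assembled)))
```

Note the contrast with the bug: raising StopAsyncIteration is the right way for a hand-written `__anext__` to finish, whereas inside an async generator the equivalent is to return (after yielding what the client needs).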

Recommendation

Replace the bare return statement with a re-emit of the assembled model response. This is a fix rather than a workaround: it addresses the root cause directly and should resolve the problem.
