litellm - ✅(Solved) Fix [Bug]: post_call tool_permission guardrail silently drops all streaming plain-text responses [1 pull requests, 1 participants]

someswar177 · 2026-04-26T07:56:21Z


BerriAI/litellm#26547•Fetched 2026-04-27 05:29:40


Root Cause

In async_post_call_streaming_iterator_hook inside litellm/proxy/guardrails/guardrail_hooks/tool_permission.py (lines ~665–686):

if not tool_calls:
    verbose_proxy_logger.debug("Tool Permission Guardrail: No tool uses found")
    return    # ← BUG: generator exits with zero yields
              #   client receives empty stream
              #   UI still shows response because DB log happened before this

This is an async generator. A bare return in an async generator is legal Python but produces zero yields. Since LiteLLM routes the stream through this hook (not alongside it), whatever the hook yields is what the client receives. Yielding nothing means the client gets nothing.
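
The routing described above can be reproduced in miniature. The following is a minimal sketch, not LiteLLM code: `upstream`, `buggy_hook`, and the string chunks are hypothetical stand-ins, assuming only that the client consumes whatever the hook yields.

```python
import asyncio
from typing import AsyncIterator, List


async def upstream() -> AsyncIterator[str]:
    """Hypothetical stand-in for the LLM stream."""
    for piece in ["2 + 2", " = 4"]:
        yield piece


async def buggy_hook(stream: AsyncIterator[str]) -> AsyncIterator[str]:
    """Buffers the whole stream, then returns without yielding anything."""
    buffered = [chunk async for chunk in stream]
    tool_calls: List[str] = []  # plain-text reply: no tool calls
    if not tool_calls:
        return  # generator ends here; `buffered` is never re-emitted
    for chunk in buffered:
        yield chunk


async def client_view() -> List[str]:
    """What a client sees when the stream is routed through the hook."""
    return [chunk async for chunk in buggy_hook(upstream())]


received = asyncio.run(client_view())
print(received)  # → []
```

The upstream chunks are fully consumed inside the hook, so nothing downstream can recover them once the generator exits.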

Fix

Replace the bare return with a re-emit:

if not tool_calls:
    verbose_proxy_logger.debug("Tool Permission Guardrail: No tool uses found")
    mock_response = MockResponseIterator(model_response=assembled_model_response)
    async for chunk in mock_response:
        yield chunk    # re-emit before exiting
    return

PR fix notes

PR #26551: fix(guardrails): re-emit chunks in tool_permission streaming hook when no tool_calls found

Repository: BerriAI/litellm
Author: someswar177
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/26551

Description (problem / solution / changelog)

Relevant issues

Fixes #26547

Pre-Submission checklist

[x] I have added testing in the tests/test_litellm/ directory
[ ] My PR passes all unit tests on make test-unit
[x] My PR's scope is as isolated as possible, it only solves 1 specific problem
[ ] I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix

Root Cause

async_post_call_streaming_iterator_hook in tool_permission.py is an async generator (it has yield statements further down the function). In the if not tool_calls: branch — the path taken when the LLM responds with plain text and no tool calls — it did a bare return.

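In an async generator, a bare return ends iteration: the consumer's next __anext__() call raises StopAsyncIteration. (An explicit raise StopAsyncIteration inside the generator is not equivalent; Python converts it to a RuntimeError.) The generator therefore terminates without yielding anything. The client received only data: [DONE] with no content chunks, silently dropping the entire plain-text response.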

This was consistently reproducible with any plain-text LLM reply while the tool_permission guardrail was enabled.
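
How an async generator terminates can be checked in isolation. This is a small sketch with hypothetical generator names; it contrasts a bare return, which ends iteration cleanly, with an explicit raise StopAsyncIteration inside the generator body, which the interpreter turns into a RuntimeError.

```python
import asyncio


async def ends_with_return():
    yield "chunk"
    return  # normal termination; the consumer's loop simply ends


async def ends_with_raise():
    yield "chunk"
    raise StopAsyncIteration  # converted to RuntimeError by the interpreter


async def drain(agen):
    """Collect items and report how the generator ended."""
    items = []
    try:
        async for item in agen:
            items.append(item)
    except RuntimeError as exc:
        return items, f"RuntimeError: {exc}"
    return items, "clean"


return_result = asyncio.run(drain(ends_with_return()))
raise_result = asyncio.run(drain(ends_with_raise()))
print(return_result)  # → (['chunk'], 'clean')
print(raise_result[1].startswith("RuntimeError"))  # → True
```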

Fix

Before returning, pass the already-assembled ModelResponse through MockResponseIterator (already imported in the same scope for the allowed-tool path) and yield each chunk:

# BEFORE
if not tool_calls:
    verbose_proxy_logger.debug("Tool Permission Guardrail: No tool uses found")
    return

# AFTER
if not tool_calls:
    verbose_proxy_logger.debug("Tool Permission Guardrail: No tool uses found")
    mock_response = MockResponseIterator(model_response=assembled_model_response)
    async for chunk in mock_response:
        yield chunk
    return

This mirrors exactly what the guardrail does at the end of the allowed-tool path (line ~736), making both code paths symmetric.
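
The symmetry between the two paths can be sketched as follows. `ReplayIterator` is a hypothetical stand-in for MockResponseIterator that replays the assembled response as a single chunk, and the dict-shaped responses are assumptions for illustration; how the real class splits a response into chunks is not shown in the source.

```python
import asyncio
from typing import AsyncIterator, Dict, List


class ReplayIterator:
    """Hypothetical stand-in: replays one assembled response as one chunk."""

    def __init__(self, model_response: Dict) -> None:
        self._response = model_response
        self._done = False

    def __aiter__(self) -> "ReplayIterator":
        return self

    async def __anext__(self) -> Dict:
        if self._done:
            raise StopAsyncIteration
        self._done = True
        return self._response


async def fixed_hook(assembled: Dict) -> AsyncIterator[Dict]:
    tool_calls: List = assembled.get("tool_calls") or []
    if not tool_calls:
        # Plain-text path: re-emit before exiting (the fix).
        async for chunk in ReplayIterator(model_response=assembled):
            yield chunk
        return
    # Allowed-tool path (permission checks omitted in this sketch).
    async for chunk in ReplayIterator(model_response=assembled):
        yield chunk


async def collect(agen: AsyncIterator[Dict]) -> List[Dict]:
    return [chunk async for chunk in agen]


plain = asyncio.run(collect(fixed_hook({"content": "2 + 2 = 4"})))
with_tool = asyncio.run(
    collect(fixed_hook({"content": None, "tool_calls": [{"name": "Bash"}]}))
)
print(len(plain), len(with_tool))  # → 1 1
```

Both branches now end by replaying the buffered response, so the client receives output whether or not a tool call was present.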

Changes

litellm/proxy/guardrails/guardrail_hooks/tool_permission.py — 5-line fix in async_post_call_streaming_iterator_hook
tests/test_litellm/proxy/guardrails/guardrail_hooks/test_tool_permission.py — new regression test test_async_post_call_streaming_iterator_hook_plain_text_yields_chunks that asserts the hook yields ≥ 1 chunk when the LLM replies with plain text
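
A regression test of this shape might look like the sketch below. It is not the test from the PR: `plain_text_hook` is a hypothetical stand-in for the fixed branch, and the standard-library unittest runner is used here only so the example runs without plugins.

```python
import unittest
from typing import AsyncIterator, List


async def plain_text_hook(chunks: List[str]) -> AsyncIterator[str]:
    """Hypothetical stand-in for the fixed no-tool-call branch."""
    tool_calls: List[str] = []
    if not tool_calls:
        for chunk in chunks:
            yield chunk  # re-emit before exiting
        return


class PlainTextStreamingTest(unittest.IsolatedAsyncioTestCase):
    async def test_plain_text_yields_chunks(self) -> None:
        received = [c async for c in plain_text_hook(["Hello", " world"])]
        # The hook must yield at least one chunk for a plain-text reply.
        self.assertGreaterEqual(len(received), 1)
        self.assertEqual("".join(received), "Hello world")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PlainTextStreamingTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Against the pre-fix behaviour (a bare return in that branch), the first assertion would fail because `received` would be empty.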

Screenshots / Proof of Fix

Locally validated against a running LiteLLM proxy with this fix applied (via Docker volume mount). Plain-text streaming responses now reach the client correctly. See issue #26547 for full reproduction details.

Note: CLA signature in progress at https://cla-assistant.io/BerriAI/litellm — will be completed before merge.

Changed files

docs/my-website/docs/proxy/guardrails/xecguard.md (added, +314/-0)
docs/my-website/sidebars.js (modified, +1/-0)
litellm/exceptions.py (modified, +30/-1)
litellm/integrations/custom_guardrail.py (modified, +1/-37)
litellm/litellm_core_utils/prompt_templates/factory.py (modified, +19/-4)
litellm/llms/bedrock/chat/converse_transformation.py (modified, +2/-2)
litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py (modified, +7/-1)
litellm/llms/ollama/chat/transformation.py (modified, +7/-2)
litellm/llms/predibase/chat/handler.py (modified, +43/-228)
litellm/llms/predibase/chat/transformation.py (modified, +212/-7)
litellm/proxy/guardrails/guardrail_hooks/tool_permission.py (modified, +5/-0)
litellm/proxy/guardrails/guardrail_hooks/unified_guardrail/unified_guardrail.py (modified, +10/-0)
litellm/proxy/guardrails/guardrail_hooks/xecguard/__init__.py (added, +45/-0)
litellm/proxy/guardrails/guardrail_hooks/xecguard/xecguard.py (added, +591/-0)
litellm/proxy/pass_through_endpoints/pass_through_endpoints.py (modified, +73/-5)
litellm/router.py (modified, +4/-2)
litellm/types/guardrails.py (modified, +5/-0)
litellm/types/llms/ollama.py (modified, +1/-0)
litellm/types/proxy/guardrails/guardrail_hooks/xecguard.py (added, +77/-0)
tests/test_litellm/litellm_core_utils/prompt_templates/test_litellm_core_utils_prompt_templates_factory.py (modified, +109/-0)
tests/test_litellm/llms/bedrock/messages/invoke_transformations/test_anthropic_claude3_transformation.py (modified, +80/-0)
tests/test_litellm/llms/ollama/test_ollama_chat_transformation.py (modified, +95/-0)
tests/test_litellm/llms/test_predibase_transformation.py (added, +612/-0)
tests/test_litellm/proxy/guardrails/guardrail_hooks/test_tool_permission.py (modified, +40/-0)
tests/test_litellm/proxy/guardrails/guardrail_hooks/test_xecguard.py (added, +1904/-0)
tests/test_litellm/proxy/pass_through_endpoints/test_passthrough_post_call_guardrails.py (added, +276/-0)
tests/test_litellm/test_router.py (modified, +63/-0)
ui/litellm-dashboard/public/assets/logos/xecguard.svg (added, +4/-0)
ui/litellm-dashboard/src/components/guardrails/guardrail_garden_configs.ts (modified, +6/-0)
ui/litellm-dashboard/src/components/guardrails/guardrail_garden_data.ts (modified, +10/-0)
ui/litellm-dashboard/src/components/guardrails/guardrail_info_helpers.tsx (modified, +2/-0)

What happened?

When mode: "post_call" is active on the tool_permission guardrail and a client sends "stream": true, any LLM response that is plain text (no tool call) is silently dropped. The client receives data: [DONE] with no content before it. The LiteLLM UI shows the full response correctly because DB logging happens before the buggy code runs.

Reproduction

Two curl commands — the only variable between them is "stream": true:

Test A — stream: true → empty response (BUG)

curl -X POST http://localhost:4000/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "guardrails": ["tool-permission-guardrail"],
    "stream": true,
    "messages": [{"role": "user", "content": "What is 2 + 2?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "Bash",
        "description": "Execute a shell command",
        "parameters": {
          "type": "object",
          "properties": {"command": {"type": "string"}},
          "required": ["command"]
        }
      }
    }]
  }'

Actual output: data: [DONE] — no content chunks

Test B — no stream → full response (WORKS)

# same request without "stream": true

Actual output: {"choices":[{"message":{"content":"2 + 2 = 4"}}]}
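
The difference between the two outcomes can also be checked programmatically on a captured stream. The sketch below assumes OpenAI-style streaming chunks (choices[].delta.content) framed as server-sent events; `content_from_sse` and both transcripts are hypothetical examples, not captured proxy output.

```python
import json
from typing import List


def content_from_sse(transcript: str) -> List[str]:
    """Extract delta content strings from an SSE transcript."""
    pieces: List[str] = []
    for line in transcript.splitlines():
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                pieces.append(text)
    return pieces


# Shape of the buggy stream: only the sentinel arrives.
buggy_transcript = "data: [DONE]\n"

# Shape of a healthy stream (assumed chunk format).
healthy_transcript = (
    'data: {"choices":[{"delta":{"content":"2 + 2"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":" = 4"}}]}\n\n'
    "data: [DONE]\n"
)

print(content_from_sse(buggy_transcript))  # → []
print("".join(content_from_sse(healthy_transcript)))  # → 2 + 2 = 4
```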

Root Cause

In async_post_call_streaming_iterator_hook inside litellm/proxy/guardrails/guardrail_hooks/tool_permission.py (lines ~665–686):

if not tool_calls:
    verbose_proxy_logger.debug("Tool Permission Guardrail: No tool uses found")
    return    # ← BUG: generator exits with zero yields
              #   client receives empty stream
              #   UI still shows response because DB log happened before this

Fix

Replace the bare return with a re-emit:

if not tool_calls:
    verbose_proxy_logger.debug("Tool Permission Guardrail: No tool uses found")
    mock_response = MockResponseIterator(model_response=assembled_model_response)
    async for chunk in mock_response:
        yield chunk    # re-emit before exiting
    return

Environment

LiteLLM version: main-stable (reproduced on latest main branch, April 2026)
Python: 3.11
Deployment: Docker

Related

PR #22702 fixes a sibling problem in proxy/utils.py (NoneType crash in the same dispatch loop): same symptom family, different file and root cause
This specific return on line ~686 of tool_permission.py has no issue or PR yet

extent analysis

TL;DR

Replace the bare return statement in async_post_call_streaming_iterator_hook with a re-emit of the model response to fix the issue where plain text LLM responses are silently dropped when mode: "post_call" is active and the client sends "stream": true.

Guidance

Identify the async_post_call_streaming_iterator_hook function in litellm/proxy/guardrails/guardrail_hooks/tool_permission.py and locate the bare return statement.
Replace the bare return with a re-emit of the model response using a MockResponseIterator as shown in the provided fix.
Verify that the client receives the full response when mode: "post_call" is active and the client sends "stream": true.
Test the fix using the provided curl commands to ensure that the response is no longer empty.

Example

if not tool_calls:
    verbose_proxy_logger.debug("Tool Permission Guardrail: No tool uses found")
    mock_response = MockResponseIterator(model_response=assembled_model_response)
    async for chunk in mock_response:
        yield chunk    # re-emit before exiting
    return

Notes

This fix relies on MockResponseIterator. According to the PR description, it is already imported in the same scope for the allowed-tool path, so no new import should be needed. If your checkout differs, confirm the import before applying the change.

Recommendation

Apply the fix from PR #26551 by replacing the bare return statement with a re-emit of the assembled model response. This addresses the root cause directly. Note that the PR was listed as open and unmerged when this page was fetched.
