litellm - 💡(How to fix) Fix [Bug]: websearch_interception silently truncates streaming response on /v1/messages — follow-up call always uses stream=False

litellm · 2026-05-12 04:43:54


Error Message

When websearch_interception is used with a search provider (e.g. Tavily) from Claude Code, which calls the /v1/messages endpoint with stream=True, the litellm_web_search tool call executes successfully, but the final LLM output is silently truncated mid-response. There is no error, no non-200 status, and no log entry; the response simply stops.

Root Cause

The bug lives in two cooperating places inside handler.py (Steps 2 and 3 below), which act on a flag that the pre-hook sets in Step 1.

Step 1 — stream is silently killed by the pre-hook:

Lines 364–369:

if kwargs.get("stream"):
    kwargs["stream"] = False
    kwargs["_websearch_interception_converted_stream"] = True

The flag _websearch_interception_converted_stream is set so that the streaming response can be reconstructed later.
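To make the conversion concrete, here is a minimal Python sketch of just this part of the pre-hook. The function name `convert_stream_for_interception` and the model string are hypothetical, and the real hook in handler.py does more than this.

```python
# Hypothetical, stripped-down stand-in for the pre-hook logic quoted above
# (handler.py lines 364-369). Only the stream conversion is modelled.
FLAG = "_websearch_interception_converted_stream"


def convert_stream_for_interception(kwargs: dict) -> dict:
    """Force a non-streaming upstream call but remember that the client asked to stream."""
    if kwargs.get("stream"):
        kwargs["stream"] = False  # upstream call is made without streaming
        kwargs[FLAG] = True  # breadcrumb used later to rebuild the stream
    return kwargs


client_request = {"model": "placeholder-model", "stream": True}
converted = convert_stream_for_interception(dict(client_request))
print(converted)
# -> {'model': 'placeholder-model', 'stream': False, '_websearch_interception_converted_stream': True}
```

After this point the client's streaming intent survives only in the flag, so any later step that drops the flag also drops the intent.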

Step 2 — _prepare_followup_kwargs strips the flag:

Lines 707–721:

return {
    k: v
    for k, v in kwargs.items()
    if not k.startswith("_websearch_interception") and k not in _internal_keys
}

Any key prefixed with _websearch_interception — including _websearch_interception_converted_stream — is stripped from kwargs before the follow-up call.
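The effect of the filter can be reproduced in isolation. In this sketch `_internal_keys` holds a made-up key, since the real set is not shown in the issue, and the input dict assumes the pre-hook's `stream=False` is still present.

```python
# Hypothetical reproduction of the filter in _prepare_followup_kwargs
# (handler.py lines 707-721). The contents of _internal_keys are invented.
FLAG = "_websearch_interception_converted_stream"
_internal_keys = {"example_internal_key"}


def prepare_followup_kwargs(kwargs: dict) -> dict:
    """Drop internal keys and every key that starts with _websearch_interception."""
    return {
        k: v
        for k, v in kwargs.items()
        if not k.startswith("_websearch_interception") and k not in _internal_keys
    }


after_prehook = {"stream": False, FLAG: True, "example_internal_key": 1, "temperature": 0.2}
followup = prepare_followup_kwargs(after_prehook)
print(followup)  # -> {'stream': False, 'temperature': 0.2}
```

The prefix test is broad, so it removes the converted-stream flag together with purely internal bookkeeping keys, leaving nothing that records the client's original request to stream.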

Step 3 — _execute_agentic_loop never passes stream to the follow-up call:

Lines 758–764:

return await anthropic_messages.acreate(
    max_tokens=max_tokens,
    messages=request_patch.messages,
    model=request_patch.model or model,
    **optional_params,
    **request_patch.kwargs,
)

stream is not passed here. In anthropic_messages, original_stream is computed as follows (lines 193–195):

original_stream = stream or kwargs.get("_websearch_interception_converted_stream", False)

Since stream defaults to False and Step 2 has stripped the flag, original_stream evaluates to False. The follow-up call therefore always returns a non-streaming response, which is handed back to Claude Code even though it expects SSE chunks, so the response appears truncated.
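The boolean logic can be checked directly. This sketch copies only the `original_stream` expression; `compute_original_stream` is a hypothetical wrapper, and it assumes `stream` is a keyword parameter defaulting to `False`, as described above.

```python
FLAG = "_websearch_interception_converted_stream"


def compute_original_stream(stream: bool = False, **kwargs) -> bool:
    """Mirror of the original_stream expression in anthropic_messages (lines 193-195)."""
    return stream or kwargs.get(FLAG, False)


# What the follow-up call effectively receives after Steps 1-3: the pre-hook
# rewrote stream to False and Step 2 stripped the flag.
followup_kwargs = {"stream": False}
print(compute_original_stream(**followup_kwargs))  # -> False

# Had the flag survived, the same expression would have restored streaming.
print(compute_original_stream(**{"stream": False, FLAG: True}))  # -> True
```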

Fix Action

Fix / Workaround

In _execute_agentic_loop, read the client's original streaming intent before _prepare_followup_kwargs strips the flag, and pass it explicitly to the follow-up anthropic_messages.acreate() call:

original_stream = kwargs.get("_websearch_interception_converted_stream", False) or stream

return await anthropic_messages.acreate(
    max_tokens=max_tokens,
    messages=request_patch.messages,
    model=request_patch.model or model,
    stream=original_stream,   # <-- add this
    **optional_params,
    **request_patch.kwargs,
)
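A minimal end-to-end sketch of the fixed path follows. It assumes a stub in place of `anthropic_messages.acreate` that models only the `original_stream` check, and it assumes the forwarded kwargs do not also contain a `stream` key. `acreate_stub`, `followup_with_fix`, and the `"sse"`/`"json"` return values are illustrative, not LiteLLM code.

```python
import asyncio

FLAG = "_websearch_interception_converted_stream"


async def acreate_stub(max_tokens, messages, model, stream=False, **kwargs):
    """Hypothetical stand-in for anthropic_messages.acreate.

    Only the original_stream check is modelled: "sse" means the caller would get
    streamed chunks, "json" means a single non-streaming body.
    """
    original_stream = stream or kwargs.get(FLAG, False)
    return "sse" if original_stream else "json"


async def followup_with_fix(kwargs: dict, stream: bool = False) -> str:
    """Hypothetical follow-up call with the suggested fix applied."""
    # Capture the client's streaming intent BEFORE the flag is stripped.
    original_stream = kwargs.get(FLAG, False) or stream
    # Same prefix filter as _prepare_followup_kwargs (internal keys omitted).
    followup_kwargs = {
        k: v for k, v in kwargs.items() if not k.startswith("_websearch_interception")
    }
    # Pass the intent explicitly, as the fixed acreate() call does.
    return await acreate_stub(
        max_tokens=1024,
        messages=[],
        model="placeholder-model",
        stream=original_stream,
        **followup_kwargs,
    )


# Client sent stream=True, so the pre-hook left the flag behind.
print(asyncio.run(followup_with_fix({FLAG: True})))  # -> sse
# Client never asked to stream.
print(asyncio.run(followup_with_fix({})))  # -> json
```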


Reported against LiteLLM v1.83.14-stable.patch.3.


Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Expected behavior: The full LLM response is streamed back to Claude Code after the search results are injected.

Actual behavior: The response cuts off mid-generation. No error, no non-200 status, no log entry — it simply stops.


Suggested Fix

Pass the original stream intent explicitly to the follow-up anthropic_messages.acreate() call in _execute_agentic_loop:

original_stream = kwargs.get("_websearch_interception_converted_stream", False) or stream

return await anthropic_messages.acreate(
    max_tokens=max_tokens,
    messages=request_patch.messages,
    model=request_patch.model or model,
    stream=original_stream,   # <-- add this
    **optional_params,
    **request_patch.kwargs,
)

Because original_stream is read from kwargs before _prepare_followup_kwargs strips the flag, the follow-up call respects the client's original streaming intent.
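One caveat is worth checking before applying the fix. If a `stream` key still travels in `optional_params` or `request_patch.kwargs` (for example the `stream=False` written by the pre-hook), also passing `stream=original_stream` explicitly makes Python raise `TypeError` for a repeated keyword argument. The issue does not say whether that happens. The sketch below, with a hypothetical `call_followup` stub and `without_stream` helper, shows the failure and one way to guard against it.

```python
def call_followup(**kwargs) -> dict:
    """Hypothetical stand-in for anthropic_messages.acreate; echoes its keyword arguments."""
    return kwargs


def without_stream(params: dict) -> dict:
    """Copy of params without a "stream" key, so an explicit stream= cannot collide."""
    return {k: v for k, v in params.items() if k != "stream"}


original_stream = True
optional_params = {"temperature": 0.2}
patch_kwargs = {"stream": False}  # assumption: the pre-hook's stream=False survived

try:
    call_followup(stream=original_stream, **optional_params, **patch_kwargs)
except TypeError as exc:
    # Python rejects the same keyword supplied both explicitly and via **.
    print(f"collision: {exc}")

result = call_followup(
    stream=original_stream,
    **without_stream(optional_params),
    **without_stream(patch_kwargs),
)
print(result)  # -> {'stream': True, 'temperature': 0.2}
```

Stripping only `stream` from the forwarded dicts keeps the explicit value authoritative while leaving every other key, and Python's existing duplicate check between `optional_params` and `request_patch.kwargs`, unchanged.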

Steps to Reproduce

1. Configure LiteLLM proxy with websearch_interception and a search tool.
2. Connect Claude Code to the LiteLLM proxy via ANTHROPIC_BASE_URL.
3. Send a normal conversation message (no web search triggered); confirm streaming works fine.
4. Send a message that triggers a web search (e.g. "What are the latest AI news today?").
5. Observe that the litellm_web_search tool call completes successfully, but the final LLM response is cut off mid-generation with no error.
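To make "cut off" observable in the last step rather than judged by eye, you can capture the raw response body (for example with `curl -N` against the proxy's /v1/messages endpoint) and check it for the final event. The sketch assumes the Anthropic Messages streaming format, in which a complete stream ends with a `message_stop` event. `sse_event_types` and `looks_truncated` are illustrative helpers, not part of LiteLLM.

```python
def sse_event_types(raw: str) -> list[str]:
    """Return the `event:` names found in a raw text/event-stream body, in order."""
    return [
        line[len("event:"):].strip()
        for line in raw.splitlines()
        if line.startswith("event:")
    ]


def looks_truncated(raw: str) -> bool:
    """Treat a body without a message_stop event as an incomplete stream.

    A plain JSON body (no SSE events at all) is flagged too, which also covers
    a non-streaming response handed to a streaming client.
    """
    return "message_stop" not in sse_event_types(raw)


complete = (
    "event: message_start\ndata: {}\n\n"
    "event: content_block_delta\ndata: {}\n\n"
    "event: message_stop\ndata: {}\n\n"
)
cut_off = "event: message_start\ndata: {}\n\nevent: content_block_delta\ndata: {}\n\n"

print(looks_truncated(complete))  # -> False
print(looks_truncated(cut_off))  # -> True
```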

Relevant log output

No response

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.83.14-stable.patch.3

Twitter / LinkedIn details

No response
