litellm - 💡(How to fix) Fix [Bug]: websearch_interception does not fire on Anthropic `/v1/messages` with hosted_vllm + proxy alias model [1 participants]

GitHub stats: BerriAI/litellm#26046 (fetched 2026-04-19 15:06:08)
Comments: 0 · Participants: 1 · Timeline events: 5 · Reactions: 0
Timeline (top): labeled ×4, unsubscribed ×1

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.


Checklist

  • I have searched the existing issues and checked that no open issue duplicates this one.
  • This bug is reproducible with a minimal reproduction below.

What happened?

The websearch_interception callback does not trigger when Anthropic Messages API requests are sent through the experimental pass-through path with a proxy-config-only alias model (e.g., qwen-local). The request is forwarded to the backend vLLM instead of being intercepted by the web search handler.

Steps to reproduce

  1. Start the LiteLLM proxy with this config:
litellm_settings:
  drop_params: true
  success_callback: ["websearch_interception"]
  websearch_interception_params:
    enabled_providers: ["hosted_vllm"]
    search_tool_name: brave-search

model_list:
  - model_name: qwen-local
    litellm_params:
      model: hosted_vllm/qwen/qwen3.6-35b-a3b-fp8
      api_base: http://127.0.0.1:8038/v1
      api_key: sk-dummy

search_tools:
  - search_tool_name: brave-search
    litellm_params:
      search_provider: brave
      api_key: os.environ/BRAVE_SEARCH_API_KEY
  2. Send a standalone web search request via the Anthropic Messages API:
curl -X POST http://localhost:4000/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-dummy" \
  -d '{
    "model": "qwen-local",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Search for today AI news"}],
    "tools": [{"name": "web_search", "type": "web_search_20250305"}],
    "tool_choice": {"type": "tool", "name": "web_search"}
  }'
  3. Observe the response.

Expected behavior

The websearch_interception callback should:

  • Detect the standalone web_search request via _try_websearch_short_circuit()
  • Execute Brave Search server-side
  • Return a synthetic Anthropic response with search results — without forwarding to vLLM

(For non-standalone requests with other tools mixed in, pre_call_anthropic_messages_api() should convert the web_search tool to litellm_web_search format before forwarding.)
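The distinction between the two cases above can be sketched in Python. This is not LiteLLM's detection logic: the function names, and the rule that "standalone" means web_search is the only tool and tool_choice forces it, are assumptions made for illustration.

```python
from typing import Any, Optional

# Assumption: Anthropic's server-side web search tool uses versioned type
# strings such as "web_search_20250305".
WEB_SEARCH_TYPE_PREFIX = "web_search_"


def is_web_search_tool(tool: dict[str, Any]) -> bool:
    """Treat a tool as web search if its type carries the web_search_ prefix."""
    return str(tool.get("type", "")).startswith(WEB_SEARCH_TYPE_PREFIX)


def classify_web_search_request(
    tools: Optional[list[dict[str, Any]]],
    tool_choice: Optional[dict[str, Any]],
) -> Optional[str]:
    """Hypothetical routing decision for an Anthropic /v1/messages request.

    Returns "short_circuit" when web_search is the only tool and tool_choice
    forces it, "convert_and_forward" when a web_search tool is present in any
    other arrangement, and None when there is no web_search tool at all.
    """
    tools = tools or []
    search_names = {t.get("name") for t in tools if is_web_search_tool(t)}
    if not search_names:
        return None
    forced = (
        tool_choice is not None
        and tool_choice.get("type") == "tool"
        and tool_choice.get("name") in search_names
    )
    only_search_tools = all(is_web_search_tool(t) for t in tools)
    return "short_circuit" if forced and only_search_tools else "convert_and_forward"


# The request body from the reproduction steps in this issue:
repro_tools = [{"name": "web_search", "type": "web_search_20250305"}]
repro_choice = {"type": "tool", "name": "web_search"}
print(classify_web_search_request(repro_tools, repro_choice))  # short_circuit
```

Under this assumed rule, the reproduction request is a clear short-circuit candidate, which is why forwarding it to vLLM is unexpected.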

Actual behavior

The request is forwarded directly to vLLM. The model (Qwen) responds with a text-only answer saying it has no internet access. No web search is executed. No interception logs appear in the proxy output.

Relevant log output:

Initialized Success Callbacks - ['websearch_interception']
Proxy initialized with Search Tools: brave-search (brave)
# ... but no "Short-circuit search detected" or Brave API call logs
POST /v1/messages HTTP/1.1" 200 OK  (returns Qwen's text response, not search results)

Environment

  • LiteLLM version: 1.83.9
  • Python version: 3.12.3 (venv at ~/.venvs/litellm-proxy)
  • OS: Ubuntu 24.04.4 LTS (Noble Numbat)
  • Backend: hosted_vllm (vLLM v0.19.1-cu130, container qwen36-dgxspark)
  • Upstream model: qwen/qwen3.6-35b-a3b-fp8
  • Search provider: Brave Search
  • Endpoint: Anthropic /v1/messages via experimental pass-through path
  • Config type: Proxy config (model alias defined only in YAML, not in LiteLLM global model registry)

Additional details

Reproduction with prefixed model name

Even when using the prefixed model name instead of the alias:

curl -X POST http://localhost:4000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hosted_vllm/qwen/qwen3.6-35b-a3b-fp8",
    "messages": [{"role": "user", "content": "Search for today AI news"}],
    "tools": [{"name": "web_search", "type": "web_search_20250305"}],
    "tool_choice": {"type": "tool", "name": "web_search"},
    "max_tokens": 1024
  }'

The short-circuit still does not fire — the request still reaches vLLM.


Possible root cause

The issue appears to be in the order of operations in litellm/llms/anthropic/experimental_pass_through/messages/handler.py:

_try_websearch_short_circuit() (line ~118-164)

This function is called at the entry point of anthropic_messages() before anthropic_messages_handler() resolves the provider.

# Line 230 in handler.py — called BEFORE provider resolution
short_circuit_response = await _try_websearch_short_circuit(
    model=model,
    messages=messages,
    tools=tools,
    custom_llm_provider=custom_llm_provider,  # ← This is None at this point!
    stream=original_stream,
)

When custom_llm_provider is None (which is the case for Anthropic Messages API requests from Claude Code or curl), try_short_circuit_search() computes:

provider_str = custom_llm_provider or ""  # → ""

Then checks:

if self.enabled_providers is not None and provider_str not in self.enabled_providers:
    return None  # "" is not in ["hosted_vllm"] → exits immediately
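Taken together, the two excerpts above reduce to one membership test. The standalone Python sketch below mirrors that check outside LiteLLM (provider_gate_allows is a hypothetical name) to show why an unresolved provider can never pass a non-empty enabled_providers list:

```python
from typing import Optional


def provider_gate_allows(
    custom_llm_provider: Optional[str],
    enabled_providers: Optional[list[str]],
) -> bool:
    """Mirror of the check quoted above (a sketch, not LiteLLM's code)."""
    provider_str = custom_llm_provider or ""  # None collapses to ""
    if enabled_providers is not None and provider_str not in enabled_providers:
        return False  # corresponds to `return None` in try_short_circuit_search()
    return True


enabled = ["hosted_vllm"]
print(provider_gate_allows(None, enabled))           # False: provider not resolved yet
print(provider_gate_allows("hosted_vllm", enabled))  # True: provider resolved first
print(provider_gate_allows(None, None))              # True: no allow-list configured
```

So whenever enabled_providers is configured, the provider has to be resolved before this check runs, or the short-circuit cannot fire.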

Why litellm.get_llm_provider(model=...) doesn't help

The handler attempts to derive the provider when it's None:

if not custom_llm_provider:
    try:
        _, custom_llm_provider, _, _ = litellm.get_llm_provider(model=model)
    except Exception:
        pass  # Silently fails — provider stays None

But litellm.get_llm_provider(model="qwen-local") fails because qwen-local is a proxy-config-only alias — it's not in LiteLLM's global model registry. The alias-to-provider mapping only exists in the proxy's config YAML (model_list[].litellm_params.model = hosted_vllm/...).
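One way to close this gap, sketched here under assumptions, is to resolve the alias against the proxy's model_list before the short-circuit check and take the provider from the provider/... prefix of litellm_params.model. The helper resolve_provider_for_alias is hypothetical, and model_list is passed as an already-parsed Python structure mirroring the YAML in this issue rather than read through LiteLLM's router:

```python
from typing import Any, Optional


def resolve_provider_for_alias(model: str, model_list: list[dict[str, Any]]) -> Optional[str]:
    """Hypothetical helper: derive the provider for `model` from the proxy's model_list.

    1. If `model` matches a model_name, take the prefix of litellm_params.model.
    2. Otherwise, naively take the prefix of `model` itself (e.g. "hosted_vllm/...").
    """
    for entry in model_list:
        if entry.get("model_name") == model:
            underlying = entry.get("litellm_params", {}).get("model", "")
            if "/" in underlying:
                return underlying.split("/", 1)[0]
    if "/" in model:
        return model.split("/", 1)[0]
    return None


# Parsed equivalent of the model_list from this issue's config.yaml.
model_list = [
    {
        "model_name": "qwen-local",
        "litellm_params": {
            "model": "hosted_vllm/qwen/qwen3.6-35b-a3b-fp8",
            "api_base": "http://127.0.0.1:8038/v1",
        },
    }
]

provider = resolve_provider_for_alias("qwen-local", model_list)
print(provider)                     # hosted_vllm
print(provider in ["hosted_vllm"])  # True, so the enabled_providers check would pass
```

The prefix fallback is deliberately naive: a bare name such as qwen/qwen3.6-35b-a3b-fp8 would yield qwen, which is not a provider. A real fix would more likely reuse the proxy router's existing deployment lookup for the alias.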

_pre_call_anthropic_messages_api() hook

This callback also does not appear to be invoked in the experimental pass-through path. The anthropic_messages() function calls _execute_pre_request_hooks() which runs async_pre_request_hook() on registered callbacks, but the pre_call_anthropic_messages_api method of WebSearchInterceptionLogger is apparently not triggered through this path.

Summary

  1. custom_llm_provider is None at the _try_websearch_short_circuit() call site
  2. litellm.get_llm_provider() fails for proxy-config-only aliases
  3. provider_str becomes "", which is not in enabled_providers
  4. Short-circuit returns None immediately
  5. Request proceeds to anthropic_messages_handler() → LiteLLMMessagesToCompletionTransformationHandler → vLLM

At minimum, this affects the hosted_vllm/ path when used with a proxy-config-only alias model name, and it may affect other non-standard provider prefixes as well.

Workaround used

A self-hosted Node.js custom gateway was built that directly translates Anthropic Messages API ↔ OpenAI chat/completions, with Brave Search wired in at the application level. This bypasses the LiteLLM experimental pass-through path entirely. While effective, it loses the benefit of LiteLLM's built-in web search interception.
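For readers considering the same workaround, the core of such a gateway is a request translation. The reporter's gateway is written in Node.js and its code is not in the issue; what follows is an independent Python sketch, not that gateway. It assumes the public Anthropic Messages and OpenAI Chat Completions request shapes, handles only plain-text messages, a string system prompt, client tools with input_schema, and tool_choice, and the function name anthropic_to_openai_chat is hypothetical.

```python
from typing import Any


def anthropic_to_openai_chat(body: dict[str, Any]) -> dict[str, Any]:
    """Translate a text-only Anthropic Messages request into an OpenAI chat request.

    Sketch only: images, tool_use/tool_result blocks and streaming are ignored.
    """
    messages: list[dict[str, Any]] = []
    if isinstance(body.get("system"), str):
        messages.append({"role": "system", "content": body["system"]})
    for msg in body.get("messages", []):
        content = msg["content"]
        if isinstance(content, list):  # keep only text blocks
            content = "".join(
                block.get("text", "") for block in content if block.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})

    out: dict[str, Any] = {"model": body["model"], "messages": messages}
    if "max_tokens" in body:
        out["max_tokens"] = body["max_tokens"]

    # Only client tools (those with an input_schema) become OpenAI functions.
    # Server tools such as web_search_20250305 have no input_schema; a gateway
    # would have to execute them itself (assumption: e.g. via Brave Search).
    functions = []
    for tool in body.get("tools", []):
        if "input_schema" not in tool:
            continue
        fn: dict[str, Any] = {"name": tool["name"], "parameters": tool["input_schema"]}
        if "description" in tool:
            fn["description"] = tool["description"]
        functions.append({"type": "function", "function": fn})
    if functions:
        out["tools"] = functions

    # Map tool_choice only when it refers to tools that survived the filter.
    choice = body.get("tool_choice") or {}
    names = {f["function"]["name"] for f in functions}
    if choice.get("type") == "auto" and functions:
        out["tool_choice"] = "auto"
    elif choice.get("type") == "any" and functions:
        out["tool_choice"] = "required"
    elif choice.get("type") == "tool" and choice.get("name") in names:
        out["tool_choice"] = {"type": "function", "function": {"name": choice["name"]}}
    return out
```

For the reproduction request in this issue, the only tool is the server-side web_search tool, so the sketch forwards a plain chat request with no tools; the search itself would have to be performed by the gateway before or instead of calling the model.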


What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on ?

v1.83.9

Twitter / LinkedIn details

No response

Extent analysis

TL;DR

The websearch_interception callback does not trigger for Anthropic Messages API requests through the experimental pass-through path with a proxy-config-only alias model, causing the request to reach the backend vLLM instead of being intercepted by the web search handler.

Guidance

  • The issue appears to stem from the order of operations in litellm/llms/anthropic/experimental_pass_through/messages/handler.py: _try_websearch_short_circuit() is called before the provider has been resolved.
  • The custom_llm_provider is None when _try_websearch_short_circuit() is called, so provider_str becomes an empty string, which is not in the enabled_providers list.
  • To fix this, the provider could be resolved before the short-circuit check (for example, from the proxy's model_list entry for the alias), or _try_websearch_short_circuit() could handle a None custom_llm_provider explicitly. Changing litellm.get_llm_provider() alone is unlikely to be enough, because proxy-config-only aliases are not in the global model registry it relies on.
  • Alternatively, bypass the experimental pass-through path with a self-hosted gateway that translates Anthropic Messages API requests itself, as in the reporter's Node.js workaround.

Example

No code example is provided as the issue is complex and requires a deeper understanding of the LiteLLM codebase.

Notes

The issue is observed on the experimental pass-through path. The reporter hit it with both the proxy-config-only alias and the prefixed hosted_vllm/ model name, so it may not be limited to aliases. The user has already implemented a workaround using a self-hosted Node.js custom gateway.

Recommendation

Apply a workaround, such as using a self-hosted Node.js custom gateway, until the issue is fixed in the LiteLLM codebase. This will allow you to bypass the experimental pass-through path and correctly handle Anthropic Messages API requests.


