litellm - 💡(How to fix) Fix [Bug]: WebSearch Interception follow-up request ignores custom `api_base` / `api_key`, causing 401 and deployment cooldown cascade [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
BerriAI/litellm#26389Fetched 2026-04-24 10:36:35
View on GitHub
Comments
0
Participants
1
Timeline
3
Reactions
0
Author
Participants
Timeline (top)
labeled ×3

When using the websearch_interception callback with models behind a custom Anthropic-compatible endpoint (e.g. api_base: https://my-proxy.example.com/anthropic), the follow-up request in the agentic loop ignores the deployment's api_base and api_key and falls back to https://api.anthropic.com with ANTHROPIC_API_KEY env var. This results in a 401 Unauthorized error, which then triggers a deployment cooldown cascade that blocks all subsequent requests to that model for the cooldown duration (default 5s).

Error Message

httpx.HTTPStatusError: Client error '401 Unauthorized' for url 'https://api.anthropic.com/v1/messages'

Root Cause

Two code locations are involved:

Fix Action

Workaround

Setting the environment variable ANTHROPIC_API_BASE to the custom endpoint can mitigate this temporarily, but this is not viable when mixing real Anthropic models with third-party Anthropic-compatible endpoints (all would be routed to the same api_base).

Code Example

model_list:
     - model_name: kimi-k2.5-tencent-claude
       litellm_params:
         model: anthropic/kimi-k2.5
         api_base: https://api.lkeap.cloud.tencent.com/plan/anthropic
         api_key: os.environ/TENCENT_API_KEY
         custom_llm_provider: anthropic

---

litellm_settings:
     callbacks:
       - websearch_interception
     websearch_interception_params:
       enabled_providers:
         - anthropic

---

httpx.HTTPStatusError: Client error '401 Unauthorized' for url 'https://api.anthropic.com/v1/messages'

---

RouterRateLimitError: No deployments available for selected model, Try again in 5 seconds.
Passed model=kimi-k2.5-tencent-claude. cooldown_list=['7fbca9db-18a0-4f6d-922a-8605353125de']

---

litellm_logging_obj.model_call_details["agentic_loop_params"] = {
    "model": original_model,
    "custom_llm_provider": custom_llm_provider,
}

---

final_response = await anthropic_messages.acreate(
    max_tokens=max_tokens,
    messages=follow_up_messages,
    model=full_model_name,
    # api_key and api_base are NOT passed
    **optional_params_without_max_tokens,
    **kwargs_for_followup,
)

---

# litellm/llms/anthropic/common_utils.py:547-555
return (
    api_base                          # None
    or get_secret_str("ANTHROPIC_API_BASE")   # not set
    or get_secret_str("ANTHROPIC_BASE_URL")   # not set
    or "https://api.anthropic.com"    # <-- fallback
)

---

if litellm_logging_obj is not None:
    agentic_loop_params: Dict[str, Any] = {
        "model": original_model,
        "custom_llm_provider": custom_llm_provider,
    }
    if dynamic_api_key is not None:
        agentic_loop_params["api_key"] = dynamic_api_key
    if dynamic_api_base is not None:
        agentic_loop_params["api_base"] = dynamic_api_base
    litellm_logging_obj.model_call_details["agentic_loop_params"] = agentic_loop_params

---

_followup_api_key: Optional[str] = None
_followup_api_base: Optional[str] = None
if logging_obj is not None:
    agentic_params = logging_obj.model_call_details.get("agentic_loop_params", {})
    full_model_name = agentic_params.get("model", model)
    _followup_api_key = agentic_params.get("api_key")
    _followup_api_base = agentic_params.get("api_base")

final_response = await anthropic_messages.acreate(
    max_tokens=max_tokens,
    messages=follow_up_messages,
    model=full_model_name,
    api_key=_followup_api_key,
    api_base=_followup_api_base,
    **optional_params_without_max_tokens,
    **kwargs_for_followup,
)

---
RAW_BUFFERClick to expand / collapse

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Summary

When using the websearch_interception callback with models behind a custom Anthropic-compatible endpoint (e.g. api_base: https://my-proxy.example.com/anthropic), the follow-up request in the agentic loop ignores the deployment's api_base and api_key and falls back to https://api.anthropic.com with ANTHROPIC_API_KEY env var. This results in a 401 Unauthorized error, which then triggers a deployment cooldown cascade that blocks all subsequent requests to that model for the cooldown duration (default 5s).

Environment

  • LiteLLM Proxy (latest main branch)
  • Model configured with custom_llm_provider: anthropic and a custom api_base (e.g. Tencent Cloud Anthropic-compatible endpoint)
  • websearch_interception callback enabled in litellm_settings.callbacks
  • Request via Anthropic Messages API: POST /v1/messages?beta=true

Steps to Reproduce

  1. Add a model deployment with custom_llm_provider: anthropic and a custom api_base (not api.anthropic.com), e.g.:

    model_list:
      - model_name: kimi-k2.5-tencent-claude
        litellm_params:
          model: anthropic/kimi-k2.5
          api_base: https://api.lkeap.cloud.tencent.com/plan/anthropic
          api_key: os.environ/TENCENT_API_KEY
          custom_llm_provider: anthropic
  2. Enable websearch_interception in callbacks:

    litellm_settings:
      callbacks:
        - websearch_interception
      websearch_interception_params:
        enabled_providers:
          - anthropic
  3. Send a request to this model via the Anthropic Messages API with a web_search tool, such that the model returns a web_search tool_use response.

  4. Observe the logs.

Expected Behavior

The follow-up request (after web search results are collected) should use the same api_base and api_key as the original deployment, sending the request to https://api.lkeap.cloud.tencent.com/plan/anthropic/v1/messages.

Actual Behavior

The follow-up request is sent to https://api.anthropic.com/v1/messages with no valid API key, resulting in:

httpx.HTTPStatusError: Client error '401 Unauthorized' for url 'https://api.anthropic.com/v1/messages'

Then the Router puts the deployment into cooldown (cooldown_time: 5), causing all subsequent requests to this model to return 429 No deployments available:

RouterRateLimitError: No deployments available for selected model, Try again in 5 seconds.
Passed model=kimi-k2.5-tencent-claude. cooldown_list=['7fbca9db-18a0-4f6d-922a-8605353125de']

Root Cause

Two code locations are involved:

1. agentic_loop_params does not store api_base / api_key

File: litellm/llms/anthropic/experimental_pass_through/messages/handler.py:351-354

litellm_logging_obj.model_call_details["agentic_loop_params"] = {
    "model": original_model,
    "custom_llm_provider": custom_llm_provider,
}

Only model and custom_llm_provider are preserved. The dynamic_api_key and dynamic_api_base (returned by get_llm_provider() just above) are discarded.

2. Follow-up request has no api_base / api_key

File: litellm/integrations/websearch_interception/handler.py:783-789

final_response = await anthropic_messages.acreate(
    max_tokens=max_tokens,
    messages=follow_up_messages,
    model=full_model_name,
    # api_key and api_base are NOT passed
    **optional_params_without_max_tokens,
    **kwargs_for_followup,
)

anthropic_messages.acreate() receives api_key=None and api_base=None, so the underlying AnthropicModelInfo.get_api_base() falls back through:

# litellm/llms/anthropic/common_utils.py:547-555
return (
    api_base                          # None
    or get_secret_str("ANTHROPIC_API_BASE")   # not set
    or get_secret_str("ANTHROPIC_BASE_URL")   # not set
    or "https://api.anthropic.com"    # <-- fallback
)

3. Cooldown Cascade

The 401 from the wrong endpoint triggers the Router's cooldown mechanism for the deployment. Since there is typically only one deployment per model, all subsequent requests fail with 429 until the cooldown expires. If requests keep coming, the cooldown keeps getting renewed, creating a persistent outage.

Suggested Fix

File 1: litellm/llms/anthropic/experimental_pass_through/messages/handler.py

Store dynamic_api_key and dynamic_api_base in agentic_loop_params:

if litellm_logging_obj is not None:
    agentic_loop_params: Dict[str, Any] = {
        "model": original_model,
        "custom_llm_provider": custom_llm_provider,
    }
    if dynamic_api_key is not None:
        agentic_loop_params["api_key"] = dynamic_api_key
    if dynamic_api_base is not None:
        agentic_loop_params["api_base"] = dynamic_api_base
    litellm_logging_obj.model_call_details["agentic_loop_params"] = agentic_loop_params

File 2: litellm/integrations/websearch_interception/handler.py

Retrieve and pass api_key / api_base to the follow-up request:

_followup_api_key: Optional[str] = None
_followup_api_base: Optional[str] = None
if logging_obj is not None:
    agentic_params = logging_obj.model_call_details.get("agentic_loop_params", {})
    full_model_name = agentic_params.get("model", model)
    _followup_api_key = agentic_params.get("api_key")
    _followup_api_base = agentic_params.get("api_base")

final_response = await anthropic_messages.acreate(
    max_tokens=max_tokens,
    messages=follow_up_messages,
    model=full_model_name,
    api_key=_followup_api_key,
    api_base=_followup_api_base,
    **optional_params_without_max_tokens,
    **kwargs_for_followup,
)

Workaround

Setting the environment variable ANTHROPIC_API_BASE to the custom endpoint can mitigate this temporarily, but this is not viable when mixing real Anthropic models with third-party Anthropic-compatible endpoints (all would be routed to the same api_base).

Steps to Reproduce

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.83.7-stable.patch.1

Twitter / LinkedIn details

No response

extent analysis

TL;DR

The most likely fix is to update the agentic_loop_params to store api_base and api_key, and then pass these values to the follow-up request in the websearch_interception callback.

Guidance

  • Update litellm/llms/anthropic/experimental_pass_through/messages/handler.py to store dynamic_api_key and dynamic_api_base in agentic_loop_params.
  • Update litellm/integrations/websearch_interception/handler.py to retrieve and pass api_key and api_base to the follow-up request.
  • Verify that the follow-up request is sent to the correct api_base with the correct api_key.
  • Test the workaround by setting the environment variable ANTHROPIC_API_BASE to the custom endpoint, but note that this is not a viable long-term solution.

Example

# litellm/llms/anthropic/experimental_pass_through/messages/handler.py
agentic_loop_params = {
    "model": original_model,
    "custom_llm_provider": custom_llm_provider,
    "api_key": dynamic_api_key,
    "api_base": dynamic_api_base,
}

# litellm/integrations/websearch_interception/handler.py
_followup_api_key = agentic_params.get("api_key")
_followup_api_base = agentic_params.get("api_base")
final_response = await anthropic_messages.acreate(
    max_tokens=max_tokens,
    messages=follow_up_messages,
    model=full_model_name,
    api_key=_followup_api_key,
    api_base=_followup_api_base,
    **optional_params_without_max_tokens,
    **kwargs_for_followup,
)

Notes

The provided code changes should fix the issue, but it's essential to test them thoroughly to ensure that they work as expected. Additionally, the workaround using the `ANTH

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING