litellm - 💡(How to fix) Fix [Bug]: WebSearch Interception follow-up request ignores custom `api_base` / `api_key`, causing 401 and deployment cooldown cascade [1 participants]

litellm2026-04-24 05:46:01

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

BerriAI/litellm#26389•Fetched 2026-04-24 10:36:35

View on GitHub

Comments

Participants

Timeline

Reactions

Author

jesset

Participants

jesset

Timeline (top)

labeled ×3

When using the websearch_interception callback with models behind a custom Anthropic-compatible endpoint (e.g. api_base: https://my-proxy.example.com/anthropic), the follow-up request in the agentic loop ignores the deployment's api_base and api_key and falls back to https://api.anthropic.com with ANTHROPIC_API_KEY env var. This results in a 401 Unauthorized error, which then triggers a deployment cooldown cascade that blocks all subsequent requests to that model for the cooldown duration (default 5s).

Error Message

httpx.HTTPStatusError: Client error '401 Unauthorized' for url 'https://api.anthropic.com/v1/messages'

Root Cause

Two code locations are involved:

Fix Action

Workaround

Setting the environment variable ANTHROPIC_API_BASE to the custom endpoint can mitigate this temporarily, but this is not viable when mixing real Anthropic models with third-party Anthropic-compatible endpoints (all would be routed to the same api_base).

Code Example

model_list:
     - model_name: kimi-k2.5-tencent-claude
       litellm_params:
         model: anthropic/kimi-k2.5
         api_base: https://api.lkeap.cloud.tencent.com/plan/anthropic
         api_key: os.environ/TENCENT_API_KEY
         custom_llm_provider: anthropic

---

litellm_settings:
     callbacks:
       - websearch_interception
     websearch_interception_params:
       enabled_providers:
         - anthropic

---

httpx.HTTPStatusError: Client error '401 Unauthorized' for url 'https://api.anthropic.com/v1/messages'

---

RouterRateLimitError: No deployments available for selected model, Try again in 5 seconds.
Passed model=kimi-k2.5-tencent-claude. cooldown_list=['7fbca9db-18a0-4f6d-922a-8605353125de']

---

litellm_logging_obj.model_call_details["agentic_loop_params"] = {
    "model": original_model,
    "custom_llm_provider": custom_llm_provider,
}

---

final_response = await anthropic_messages.acreate(
    max_tokens=max_tokens,
    messages=follow_up_messages,
    model=full_model_name,
    # api_key and api_base are NOT passed
    **optional_params_without_max_tokens,
    **kwargs_for_followup,
)

---

# litellm/llms/anthropic/common_utils.py:547-555
return (
    api_base                          # None
    or get_secret_str("ANTHROPIC_API_BASE")   # not set
    or get_secret_str("ANTHROPIC_BASE_URL")   # not set
    or "https://api.anthropic.com"    # <-- fallback
)

---

if litellm_logging_obj is not None:
    agentic_loop_params: Dict[str, Any] = {
        "model": original_model,
        "custom_llm_provider": custom_llm_provider,
    }
    if dynamic_api_key is not None:
        agentic_loop_params["api_key"] = dynamic_api_key
    if dynamic_api_base is not None:
        agentic_loop_params["api_base"] = dynamic_api_base
    litellm_logging_obj.model_call_details["agentic_loop_params"] = agentic_loop_params

---

_followup_api_key: Optional[str] = None
_followup_api_base: Optional[str] = None
if logging_obj is not None:
    agentic_params = logging_obj.model_call_details.get("agentic_loop_params", {})
    full_model_name = agentic_params.get("model", model)
    _followup_api_key = agentic_params.get("api_key")
    _followup_api_base = agentic_params.get("api_base")

final_response = await anthropic_messages.acreate(
    max_tokens=max_tokens,
    messages=follow_up_messages,
    model=full_model_name,
    api_key=_followup_api_key,
    api_base=_followup_api_base,
    **optional_params_without_max_tokens,
    **kwargs_for_followup,
)

---

RAW_BUFFERClick to expand / collapse

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Summary

Environment

LiteLLM Proxy (latest main branch)
Model configured with custom_llm_provider: anthropic and a custom api_base (e.g. Tencent Cloud Anthropic-compatible endpoint)
websearch_interception callback enabled in litellm_settings.callbacks
Request via Anthropic Messages API: POST /v1/messages?beta=true

Steps to Reproduce

Add a model deployment with custom_llm_provider: anthropic and a custom api_base (not api.anthropic.com), e.g.:

model_list:
  - model_name: kimi-k2.5-tencent-claude
    litellm_params:
      model: anthropic/kimi-k2.5
      api_base: https://api.lkeap.cloud.tencent.com/plan/anthropic
      api_key: os.environ/TENCENT_API_KEY
      custom_llm_provider: anthropic

Enable websearch_interception in callbacks:

litellm_settings:
  callbacks:
    - websearch_interception
  websearch_interception_params:
    enabled_providers:
      - anthropic

Send a request to this model via the Anthropic Messages API with a web_search tool, such that the model returns a web_search tool_use response.
Observe the logs.

Expected Behavior

The follow-up request (after web search results are collected) should use the same api_base and api_key as the original deployment, sending the request to https://api.lkeap.cloud.tencent.com/plan/anthropic/v1/messages.

Actual Behavior

The follow-up request is sent to https://api.anthropic.com/v1/messages with no valid API key, resulting in:

httpx.HTTPStatusError: Client error '401 Unauthorized' for url 'https://api.anthropic.com/v1/messages'

Then the Router puts the deployment into cooldown (cooldown_time: 5), causing all subsequent requests to this model to return 429 No deployments available:

RouterRateLimitError: No deployments available for selected model, Try again in 5 seconds.
Passed model=kimi-k2.5-tencent-claude. cooldown_list=['7fbca9db-18a0-4f6d-922a-8605353125de']

Root Cause

Two code locations are involved:

1. `agentic_loop_params` does not store `api_base` / `api_key`

File: litellm/llms/anthropic/experimental_pass_through/messages/handler.py:351-354

litellm_logging_obj.model_call_details["agentic_loop_params"] = {
    "model": original_model,
    "custom_llm_provider": custom_llm_provider,
}

Only model and custom_llm_provider are preserved. The dynamic_api_key and dynamic_api_base (returned by get_llm_provider() just above) are discarded.

2. Follow-up request has no `api_base` / `api_key`

File: litellm/integrations/websearch_interception/handler.py:783-789

final_response = await anthropic_messages.acreate(
    max_tokens=max_tokens,
    messages=follow_up_messages,
    model=full_model_name,
    # api_key and api_base are NOT passed
    **optional_params_without_max_tokens,
    **kwargs_for_followup,
)

anthropic_messages.acreate() receives api_key=None and api_base=None, so the underlying AnthropicModelInfo.get_api_base() falls back through:

# litellm/llms/anthropic/common_utils.py:547-555
return (
    api_base                          # None
    or get_secret_str("ANTHROPIC_API_BASE")   # not set
    or get_secret_str("ANTHROPIC_BASE_URL")   # not set
    or "https://api.anthropic.com"    # <-- fallback
)

3. Cooldown Cascade

The 401 from the wrong endpoint triggers the Router's cooldown mechanism for the deployment. Since there is typically only one deployment per model, all subsequent requests fail with 429 until the cooldown expires. If requests keep coming, the cooldown keeps getting renewed, creating a persistent outage.

Suggested Fix

File 1: `litellm/llms/anthropic/experimental_pass_through/messages/handler.py`

Store dynamic_api_key and dynamic_api_base in agentic_loop_params:

if litellm_logging_obj is not None:
    agentic_loop_params: Dict[str, Any] = {
        "model": original_model,
        "custom_llm_provider": custom_llm_provider,
    }
    if dynamic_api_key is not None:
        agentic_loop_params["api_key"] = dynamic_api_key
    if dynamic_api_base is not None:
        agentic_loop_params["api_base"] = dynamic_api_base
    litellm_logging_obj.model_call_details["agentic_loop_params"] = agentic_loop_params

File 2: `litellm/integrations/websearch_interception/handler.py`

Retrieve and pass api_key / api_base to the follow-up request:

_followup_api_key: Optional[str] = None
_followup_api_base: Optional[str] = None
if logging_obj is not None:
    agentic_params = logging_obj.model_call_details.get("agentic_loop_params", {})
    full_model_name = agentic_params.get("model", model)
    _followup_api_key = agentic_params.get("api_key")
    _followup_api_base = agentic_params.get("api_base")

final_response = await anthropic_messages.acreate(
    max_tokens=max_tokens,
    messages=follow_up_messages,
    model=full_model_name,
    api_key=_followup_api_key,
    api_base=_followup_api_base,
    **optional_params_without_max_tokens,
    **kwargs_for_followup,
)

Workaround

Steps to Reproduce

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.83.7-stable.patch.1

Twitter / LinkedIn details

No response

extent analysis

TL;DR

The most likely fix is to update the agentic_loop_params to store api_base and api_key, and then pass these values to the follow-up request in the websearch_interception callback.

Guidance

Update litellm/llms/anthropic/experimental_pass_through/messages/handler.py to store dynamic_api_key and dynamic_api_base in agentic_loop_params.
Update litellm/integrations/websearch_interception/handler.py to retrieve and pass api_key and api_base to the follow-up request.
Verify that the follow-up request is sent to the correct api_base with the correct api_key.
Test the workaround by setting the environment variable ANTHROPIC_API_BASE to the custom endpoint, but note that this is not a viable long-term solution.

Example

# litellm/llms/anthropic/experimental_pass_through/messages/handler.py
agentic_loop_params = {
    "model": original_model,
    "custom_llm_provider": custom_llm_provider,
    "api_key": dynamic_api_key,
    "api_base": dynamic_api_base,
}

# litellm/integrations/websearch_interception/handler.py
_followup_api_key = agentic_params.get("api_key")
_followup_api_base = agentic_params.get("api_base")
final_response = await anthropic_messages.acreate(
    max_tokens=max_tokens,
    messages=follow_up_messages,
    model=full_model_name,
    api_key=_followup_api_key,
    api_base=_followup_api_base,
    **optional_params_without_max_tokens,
    **kwargs_for_followup,
)

Notes

The provided code changes should fix the issue, but it's essential to test them thoroughly to ensure that they work as expected. Additionally, the workaround using the `ANTH

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #memory optimization #batch processing #GPU compatibility #environment variable

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

litellm - 💡(How to fix) Fix [Bug]: WebSearch Interception follow-up request ignores custom `api_base` / `api_key`, causing 401 and deployment cooldown cascade [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Workaround

Code Example

Check for existing issues

What happened?

Summary

Environment

Steps to Reproduce

Expected Behavior

Actual Behavior

Root Cause

1. agentic_loop_params does not store api_base / api_key

2. Follow-up request has no api_base / api_key

3. Cooldown Cascade

Suggested Fix

File 1: litellm/llms/anthropic/experimental_pass_through/messages/handler.py

File 2: litellm/integrations/websearch_interception/handler.py

Workaround

Steps to Reproduce

Relevant log output

What part of LiteLLM is this about?

What LiteLLM version are you on ?

Twitter / LinkedIn details

extent analysis

TL;DR

Guidance

Example

Notes

Still need to ship something?

RELATED_DISCOVERY

TRENDING

1. `agentic_loop_params` does not store `api_base` / `api_key`

2. Follow-up request has no `api_base` / `api_key`

File 1: `litellm/llms/anthropic/experimental_pass_through/messages/handler.py`

File 2: `litellm/integrations/websearch_interception/handler.py`