litellm - [Bug]: APIConnectionError hardcoded in cooldown_handlers.py prevents failover to healthy deployments

BerriAI/litellm#27362 (fetched 2026-05-07 03:33:00)

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Bug Description

When using the LiteLLM proxy with multiple deployments of the same model across different Ollama hosts, failover to a healthy deployment never occurs when a host is unreachable. The router retries the same dead host repeatedly until num_retries is exhausted, then returns an error — even though other healthy deployments exist in the same model group.

Root Cause

In litellm/router_utils/cooldown_handlers.py, the _is_cooldown_required() function contains a hardcoded exclusion list:

# line 57
ignored_strings = ["APIConnectionError"]

This causes _is_cooldown_required() to return False for any exception containing "APIConnectionError" in its string representation, regardless of allowed_fails or allowed_fails_policy configuration. As a result:

  • The failed deployment is never added to the cooldown set
  • async_get_available_deployment continues to select the same dead host on every retry
  • All configured retries are wasted against the same unreachable host
  • No failover occurs

This is confirmed in debug logs, which show get_available_deployment returning the identical deployment (same api_base) on every retry attempt:

router.py:8924 - get_available_deployment for model: gemma3:12b, Selected deployment: {'litellm_params': {'api_base': 'http://calormen:11434', ...}}
router.py:2067 - litellm.acompletion(model=ollama_chat/gemma3:12b) Exception litellm.APIConnectionError: Cannot connect to host calormen:11434
router.py:8924 - get_available_deployment for model: gemma3:12b, Selected deployment: {'litellm_params': {'api_base': 'http://calormen:11434', ...}}  ← same host again
router.py:2067 - litellm.acompletion(model=ollama_chat/gemma3:12b) Exception litellm.APIConnectionError: Cannot connect to host calormen:11434
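
To make the mechanism concrete, here is a minimal sketch of a substring-based exclusion check. It is not LiteLLM's actual code: the function name `is_cooldown_required_sketch`, its parameters, and the `fails > allowed_fails` threshold rule are assumptions made for illustration. Only the `ignored_strings = ["APIConnectionError"]` value and the error text are taken from this report.

```python
def is_cooldown_required_sketch(exception_text, fails, allowed_fails, ignored_strings):
    """Hypothetical stand-in for the decision described above (not LiteLLM code)."""
    # A match in the exclusion list short-circuits the decision,
    # so the allowed_fails threshold is never consulted.
    if any(s in exception_text for s in ignored_strings):
        return False
    # Assumed threshold rule: cool down once failures exceed allowed_fails.
    return fails > allowed_fails


error = "litellm.APIConnectionError: Cannot connect to host calormen:11434"

# With the hardcoded entry, even allowed_fails=0 cannot trigger a cooldown.
blocked = is_cooldown_required_sketch(
    error, fails=1, allowed_fails=0, ignored_strings=["APIConnectionError"]
)

# With an empty list, the first failure already exceeds allowed_fails=0.
cooled = is_cooldown_required_sketch(
    error, fails=1, allowed_fails=0, ignored_strings=[]
)

print(blocked, cooled)
```

Under these assumptions the first call reports that no cooldown is required and the second reports that one is, which matches the behaviour described in the bullets above.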

Attempted Troubleshooting

The following config options were tried and had no effect due to this bug:

  • allowed_fails: 0
  • allowed_fails_policy.APIConnectionErrorAllowedFails: 0
  • background_health_checks: true
  • enable_pre_call_checks: true
  • order field on deployments
  • weight field on deployments
  • Various routing_strategy values (least-busy, simple-shuffle)

Fix

Remove "APIConnectionError" from the ignored_strings list, or replace it with an empty list:

# cooldown_handlers.py line 57
# Before:
ignored_strings = ["APIConnectionError"]

# After:
ignored_strings = []

With this fix applied, the cooldown mechanism correctly marks unreachable hosts as unhealthy after the configured number of allowed_fails, and subsequent retries are routed to other healthy deployments in the model group.
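
The expected effect of the change can be illustrated with a toy retry loop. This is a sketch under stated assumptions, not the router's implementation: `route_with_retries` is a hypothetical helper, the deployment pick is deterministic rather than shuffled, `num_retries: 3` is assumed to mean one initial attempt plus three retries, and a deployment is cooled down on its first failure (mirroring `allowed_fails: 0`).

```python
def route_with_retries(deployments, reachable, num_retries, ignored_strings):
    """Toy retry loop (hypothetical, not LiteLLM code).

    Returns (hosts tried in order, host that answered or None).
    """
    cooldown = set()
    tried = []
    for _ in range(num_retries + 1):  # initial attempt plus retries
        candidates = [d for d in deployments if d not in cooldown]
        if not candidates:
            break
        host = candidates[0]  # deterministic pick keeps the sketch reproducible
        tried.append(host)
        if host in reachable:
            return tried, host
        error = f"APIConnectionError: Cannot connect to host {host}"
        # Only errors that are not excluded put the deployment in cooldown.
        if not any(s in error for s in ignored_strings):
            cooldown.add(host)
    return tried, None


hosts = ["host1", "host2"]
up = {"host2"}  # host1 is offline

before = route_with_retries(hosts, up, 3, ["APIConnectionError"])
after = route_with_retries(hosts, up, 3, [])

print("before:", before)
print("after: ", after)
```

With the exclusion in place, every attempt in this sketch lands on host1 and no host answers. With the empty list, host1 is cooled down after the first failure and the next attempt reaches host2.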

Steps to Reproduce

  1. Configure LiteLLM proxy with multiple deployments of the same model_name across different Ollama hosts:
model_list:
  - model_name: my-model
    litellm_params:
      model: ollama_chat/llama3.1:8b
      api_base: http://host1:11434

  - model_name: my-model
    litellm_params:
      model: ollama_chat/llama3.1:8b
      api_base: http://host2:11434

router_settings:
  routing_strategy: simple-shuffle
  num_retries: 3
  allowed_fails: 0
  cooldown_time: 60
  2. Take host1 offline (stop Ollama or firewall the port)
  3. Send a chat completion request for my-model
  4. Observe that all retries go to host1 and the request fails — host2 is never tried
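
Before sending the test request, it can help to confirm from the proxy machine which hosts are actually reachable, so that a failed request can be attributed to routing rather than to both hosts being down. The helper below is a small standard-library sketch; `is_reachable` is a hypothetical name, and the host names and port 11434 simply mirror the example config above.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

For the reproduction, `is_reachable("host1", 11434)` should return False and `is_reachable("host2", 11434)` should return True. This only checks that the port accepts TCP connections, not that Ollama is serving the model.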

Additional Context

The intent of ignored_strings appears to be to avoid cooling down deployments on transient or client-side errors. However, APIConnectionError specifically indicates the host is unreachable — it is precisely the error type that should trigger cooldown and failover. Excluding it defeats the entire purpose of multi-deployment routing for self-hosted backends like Ollama.

Relevant log output

router.py:8924 - get_available_deployment for model: gemma3:12b, Selected deployment: {'litellm_params': {'api_base': 'http://calormen:11434', ...}}
router.py:2067 - litellm.acompletion(model=ollama_chat/gemma3:12b) Exception litellm.APIConnectionError: Cannot connect to host calormen:11434
router.py:8924 - get_available_deployment for model: gemma3:12b, Selected deployment: {'litellm_params': {'api_base': 'http://calormen:11434', ...}}  ← same host again
router.py:2067 - litellm.acompletion(model=ollama_chat/gemma3:12b) Exception litellm.APIConnectionError: Cannot connect to host calormen:11434

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.82.6 (also confirmed present in v1.82.1)
