litellm - 💡(How to fix) Fix [Bug]: APIConnectionError hardcoded in cooldown_handlers.py prevents failover to healthy deployments [1 participants]

Error Message

router.py:8924 - get_available_deployment for model: gemma3:12b, Selected deployment: {'litellm_params': {'api_base': 'http://calormen:11434', ...}} router.py:2067 - litellm.acompletion(model=ollama_chat/gemma3:12b) Exception litellm.APIConnectionError: Cannot connect to host calormen:11434 router.py:8924 - get_available_deployment for model: gemma3:12b, Selected deployment: {'litellm_params': {'api_base': 'http://calormen:11434', ...}} ← same host again router.py:2067 - litellm.acompletion(model=ollama_chat/gemma3:12b) Exception litellm.APIConnectionError: Cannot connect to host calormen:11434

Root Cause

In litellm/router_utils/cooldown_handlers.py, the _is_cooldown_required() function contains a hardcoded exclusion list:

# line 57
ignored_strings = ["APIConnectionError"]

This causes _is_cooldown_required() to return False for any exception containing "APIConnectionError" in its string representation, regardless of allowed_fails or allowed_fails_policy configuration. As a result:

The failed deployment is never added to the cooldown set
async_get_available_deployment continues to select the same dead host on every retry
All configured retries are wasted against the same unreachable host
No failover occurs

This is confirmed in debug logs, which show get_available_deployment returning the identical deployment (same api_base) on every retry attempt:

router.py:8924 - get_available_deployment for model: gemma3:12b, Selected deployment: {'litellm_params': {'api_base': 'http://calormen:11434', ...}}
router.py:2067 - litellm.acompletion(model=ollama_chat/gemma3:12b) Exception litellm.APIConnectionError: Cannot connect to host calormen:11434
router.py:8924 - get_available_deployment for model: gemma3:12b, Selected deployment: {'litellm_params': {'api_base': 'http://calormen:11434', ...}}  ← same host again
router.py:2067 - litellm.acompletion(model=ollama_chat/gemma3:12b) Exception litellm.APIConnectionError: Cannot connect to host calormen:11434

Fix Action

Fix

Remove "APIConnectionError" from the ignored_strings list, or replace it with an empty list:

# cooldown_handlers.py line 57
# Before:
ignored_strings = ["APIConnectionError"]

# After:
ignored_strings = []

With this fix applied, the cooldown mechanism correctly marks unreachable hosts as unhealthy after the configured number of allowed_fails, and subsequent retries are routed to other healthy deployments in the model group.

Code Example

# line 57
ignored_strings = ["APIConnectionError"]

---

router.py:8924 - get_available_deployment for model: gemma3:12b, Selected deployment: {'litellm_params': {'api_base': 'http://calormen:11434', ...}}
router.py:2067 - litellm.acompletion(model=ollama_chat/gemma3:12b) Exception litellm.APIConnectionError: Cannot connect to host calormen:11434
router.py:8924 - get_available_deployment for model: gemma3:12b, Selected deployment: {'litellm_params': {'api_base': 'http://calormen:11434', ...}}  ← same host again
router.py:2067 - litellm.acompletion(model=ollama_chat/gemma3:12b) Exception litellm.APIConnectionError: Cannot connect to host calormen:11434

---

# cooldown_handlers.py line 57
# Before:
ignored_strings = ["APIConnectionError"]

# After:
ignored_strings = []

---

model_list:
  - model_name: my-model
    litellm_params:
      model: ollama_chat/llama3.1:8b
      api_base: http://host1:11434

  - model_name: my-model
    litellm_params:
      model: ollama_chat/llama3.1:8b
      api_base: http://host2:11434

router_settings:
  routing_strategy: simple-shuffle
  num_retries: 3
  allowed_fails: 0
  cooldown_time: 60

---

router.py:8924 - get_available_deployment for model: gemma3:12b, Selected deployment: {'litellm_params': {'api_base': 'http://calormen:11434', ...}}
router.py:2067 - litellm.acompletion(model=ollama_chat/gemma3:12b) Exception litellm.APIConnectionError: Cannot connect to host calormen:11434
router.py:8924 - get_available_deployment for model: gemma3:12b, Selected deployment: {'litellm_params': {'api_base': 'http://calormen:11434', ...}}  ← same host again
router.py:2067 - litellm.acompletion(model=ollama_chat/gemma3:12b) Exception litellm.APIConnectionError: Cannot connect to host calormen:11434

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Bug Description

When using the LiteLLM proxy with multiple deployments of the same model across different Ollama hosts, failover to a healthy deployment never occurs when a host is unreachable. The router retries the same dead host repeatedly until num_retries is exhausted, then returns an error — even though other healthy deployments exist in the same model group.

Root Cause

In litellm/router_utils/cooldown_handlers.py, the _is_cooldown_required() function contains a hardcoded exclusion list:

# line 57
ignored_strings = ["APIConnectionError"]

The failed deployment is never added to the cooldown set
async_get_available_deployment continues to select the same dead host on every retry
All configured retries are wasted against the same unreachable host
No failover occurs

This is confirmed in debug logs, which show get_available_deployment returning the identical deployment (same api_base) on every retry attempt:

router.py:8924 - get_available_deployment for model: gemma3:12b, Selected deployment: {'litellm_params': {'api_base': 'http://calormen:11434', ...}}
router.py:2067 - litellm.acompletion(model=ollama_chat/gemma3:12b) Exception litellm.APIConnectionError: Cannot connect to host calormen:11434
router.py:8924 - get_available_deployment for model: gemma3:12b, Selected deployment: {'litellm_params': {'api_base': 'http://calormen:11434', ...}}  ← same host again
router.py:2067 - litellm.acompletion(model=ollama_chat/gemma3:12b) Exception litellm.APIConnectionError: Cannot connect to host calormen:11434

Attempted Troubleshooting

The following config options were tried and had no effect due to this bug:

allowed_fails: 0
allowed_fails_policy.APIConnectionErrorAllowedFails: 0
background_health_checks: true
enable_pre_call_checks: true
order field on deployments
weight field on deployments
Various routing_strategy values (least-busy, simple-shuffle)

Fix

Remove "APIConnectionError" from the ignored_strings list, or replace it with an empty list:

# cooldown_handlers.py line 57
# Before:
ignored_strings = ["APIConnectionError"]

# After:
ignored_strings = []

Steps to Reproduce

Configure LiteLLM proxy with multiple deployments of the same model_name across different Ollama hosts:

model_list:
  - model_name: my-model
    litellm_params:
      model: ollama_chat/llama3.1:8b
      api_base: http://host1:11434

  - model_name: my-model
    litellm_params:
      model: ollama_chat/llama3.1:8b
      api_base: http://host2:11434

router_settings:
  routing_strategy: simple-shuffle
  num_retries: 3
  allowed_fails: 0
  cooldown_time: 60

Take host1 offline (stop Ollama or firewall the port)
Send a chat completion request for my-model
Observe that all retries go to host1 and the request fails — host2 is never tried

Additional Context

The intent of ignored_strings appears to be to avoid cooling down deployments on transient or client-side errors. However, APIConnectionError specifically indicates the host is unreachable — it is precisely the error type that should trigger cooldown and failover. Excluding it defeats the entire purpose of multi-deployment routing for self-hosted backends like Ollama.

Relevant log output

router.py:8924 - get_available_deployment for model: gemma3:12b, Selected deployment: {'litellm_params': {'api_base': 'http://calormen:11434', ...}}
router.py:2067 - litellm.acompletion(model=ollama_chat/gemma3:12b) Exception litellm.APIConnectionError: Cannot connect to host calormen:11434
router.py:8924 - get_available_deployment for model: gemma3:12b, Selected deployment: {'litellm_params': {'api_base': 'http://calormen:11434', ...}}  ← same host again
router.py:2067 - litellm.acompletion(model=ollama_chat/gemma3:12b) Exception litellm.APIConnectionError: Cannot connect to host calormen:11434

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.82.6 (also confirmed present in v1.82.1)

Twitter / LinkedIn details

No response

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

litellm - 💡(How to fix) Fix [Bug]: APIConnectionError hardcoded in cooldown_handlers.py prevents failover to healthy deployments [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix

Code Example

Check for existing issues

What happened?

Bug Description

Root Cause

Attempted Troubleshooting

Fix

Steps to Reproduce

Steps to Reproduce

Additional Context

Relevant log output

What part of LiteLLM is this about?

What LiteLLM version are you on ?

Twitter / LinkedIn details

Still need to ship something?

TRENDING

litellm - 💡(How to fix) Fix [Bug]: APIConnectionError hardcoded in cooldown_handlers.py prevents failover to healthy deployments [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix

Code Example

Check for existing issues

What happened?

Bug Description

Root Cause

Attempted Troubleshooting

Fix

Steps to Reproduce

Steps to Reproduce

Additional Context

Relevant log output

What part of LiteLLM is this about?

What LiteLLM version are you on ?

Twitter / LinkedIn details

Still need to ship something?

RELATED_DISCOVERY

TRENDING