litellm: [Bug]: aembedding missing num_retries kwarg causes zero retries and no failover for embedding model groups

BerriAI/litellm#27363 (fetched 2026-05-07 03:32:58)

Error Message

When using the LiteLLM proxy router with multiple deployments of the same embedding model_name across different hosts, failover never occurs when a host is unreachable. The router attempts the selected deployment once, fails, and returns an error — no retries, no failover to other deployments in the model group.

Root Cause

In litellm/router.py, the aembedding() method does not set num_retries in kwargs before calling async_function_with_fallbacks(). Compare aembedding to every other router method (acompletion, acreate_file, atext_completion, etc.) which all include this line:

kwargs["num_retries"] = kwargs.get("num_retries", self.num_retries)

aembedding (missing the line — bug):

async def aembedding(self, model, input, is_async=True, **kwargs):
    try:
        kwargs["model"] = model
        kwargs["input"] = input
        kwargs["original_function"] = self._aembedding          # ✓
        # kwargs["num_retries"] = ...                           # ✗ MISSING
        self._update_kwargs_before_fallbacks(model=model, kwargs=kwargs)
        response = await self.async_function_with_fallbacks(**kwargs)

acreate_file (correct pattern — for comparison):

async def acreate_file(self, model, **kwargs):
    try:
        kwargs["model"] = model
        kwargs["original_function"] = self._acreate_file
        kwargs["num_retries"] = kwargs.get("num_retries", self.num_retries)  # ✓ present
        self._update_kwargs_before_fallbacks(model=model, kwargs=kwargs)
        response = await self.async_function_with_fallbacks(**kwargs)

Note: _update_kwargs_before_fallbacks also sets num_retries (line 2234), but it does so through kwargs.get("num_retries", self.num_retries), so a value already present in kwargs is not overwritten. The problem is that async_function_with_retries pops num_retries from kwargs (line 5564), so on the retry path the key is gone and _update_kwargs_before_fallbacks may not re-set it correctly. As a result, num_retries evaluates to 0 inside the retry loop and the request fails immediately with no retries attempted.
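The mechanism in the note can be illustrated with a toy model. This is a minimal sketch, not litellm's code: ToyRouter, _flaky_call, _with_retries, call_without_key, call_with_key, and count_attempts are all hypothetical names, and the assumption that the retry helper falls back to 0 when the key is absent is taken from the note above rather than verified against router.py.

```python
import asyncio


class ToyRouter:
    """Hypothetical stand-in for the pattern in the note; not litellm's Router."""

    def __init__(self, num_retries: int):
        self.num_retries = num_retries
        self.attempts = 0

    async def _flaky_call(self, **kwargs):
        # Simulates a deployment whose host is unreachable.
        self.attempts += 1
        raise ConnectionError("host unreachable")

    async def _with_retries(self, **kwargs):
        # Assumption: the retry helper pops the key and uses 0 when it is absent.
        num_retries = kwargs.pop("num_retries", 0)
        original_function = kwargs.pop("original_function")
        last_error = None
        for _ in range(num_retries + 1):  # one initial try plus the retries
            try:
                return await original_function(**kwargs)
            except ConnectionError as exc:
                last_error = exc
        raise last_error

    async def call_without_key(self, **kwargs):
        # Mirrors the buggy shape: num_retries is never placed in kwargs.
        kwargs["original_function"] = self._flaky_call
        return await self._with_retries(**kwargs)

    async def call_with_key(self, **kwargs):
        # Mirrors the pattern used by the other router methods.
        kwargs["original_function"] = self._flaky_call
        kwargs["num_retries"] = kwargs.get("num_retries", self.num_retries)
        return await self._with_retries(**kwargs)


def count_attempts(method_name: str, num_retries: int = 3) -> int:
    """Run one toy call and report how many times the backend was tried."""
    router = ToyRouter(num_retries=num_retries)
    try:
        asyncio.run(getattr(router, method_name)())
    except ConnectionError:
        pass
    return router.attempts
```

In this toy model the call without the key tries the backend once, while the call with the key tries it num_retries + 1 times. Whether the real retry loop counts attempts in exactly this way is not confirmed here.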

Fix Action

Fix

Add the missing line to aembedding() in router.py:

async def aembedding(self, model, input, is_async=True, **kwargs):
    try:
        kwargs["model"] = model
        kwargs["input"] = input
        kwargs["original_function"] = self._aembedding
        kwargs["num_retries"] = kwargs.get("num_retries", self.num_retries)  # ADD THIS
        self._update_kwargs_before_fallbacks(model=model, kwargs=kwargs)
        response = await self.async_function_with_fallbacks(**kwargs)
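A regression check for this fix could look roughly like the sketch below. It runs against a stub, not the real Router: StubRouter and forwarded_num_retries are hypothetical, and only the kwargs handling quoted above is reproduced. The idea is to capture the kwargs handed to async_function_with_fallbacks and confirm that num_retries is present, defaults to the router-level value, and still respects a per-call override.

```python
import asyncio
from unittest.mock import AsyncMock


class StubRouter:
    """Hypothetical stub that reproduces only the kwargs handling of the fix."""

    def __init__(self, num_retries: int):
        self.num_retries = num_retries
        # Replace the fallback machinery with a mock so its kwargs can be inspected.
        self.async_function_with_fallbacks = AsyncMock(return_value="ok")

    async def _aembedding(self, **kwargs):
        return None  # never reached; the mock above short-circuits the call

    async def aembedding(self, model, input, **kwargs):
        kwargs["model"] = model
        kwargs["input"] = input
        kwargs["original_function"] = self._aembedding
        # The line the fix adds: default to the router-level setting.
        kwargs["num_retries"] = kwargs.get("num_retries", self.num_retries)
        return await self.async_function_with_fallbacks(**kwargs)


def forwarded_num_retries(router_default: int, **call_kwargs) -> int:
    """Return the num_retries value that reaches the fallback function."""
    router = StubRouter(num_retries=router_default)
    asyncio.run(
        router.aembedding(model="mxbai-embed-large", input=["hi"], **call_kwargs)
    )
    return router.async_function_with_fallbacks.call_args.kwargs["num_retries"]
```

Using kwargs.get rather than a plain assignment is what keeps a caller-supplied num_retries from being overwritten by the router default.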

Steps to Reproduce

  1. Configure LiteLLM proxy with multiple deployments of the same embedding model_name:
model_list:
  - model_name: mxbai-embed-large
    litellm_params:
      model: ollama/mxbai-embed-large:latest
      api_base: http://host1:11434

  - model_name: mxbai-embed-large
    litellm_params:
      model: ollama/mxbai-embed-large:latest
      api_base: http://host2:11434

router_settings:
  routing_strategy: simple-shuffle
  num_retries: 3
  allowed_fails: 0
  cooldown_time: 60
  2. Take host1 offline
  3. Send an embedding request for mxbai-embed-large
  4. Observe the request fails immediately with no retries — host2 is never tried
  5. Repeat the same test with a chat completion model group — failover works correctly, confirming the bug is specific to aembedding
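The expected and observed outcomes of these steps can be sketched as a small simulation. Everything here is hypothetical (ToyDeployment, ToyGroup, run_scenario). The deployment pick is deterministic instead of shuffled, and cooling a host down on its first failure is an assumed reading of allowed_fails: 0, not litellm's verified behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeployment:
    api_base: str
    online: bool = True


@dataclass
class ToyGroup:
    """Hypothetical model group; not litellm's routing implementation."""

    deployments: list
    num_retries: int = 3
    cooled_down: set = field(default_factory=set)
    tried: list = field(default_factory=list)

    def embed(self, text: str) -> str:
        for _ in range(self.num_retries + 1):
            candidates = [
                d for d in self.deployments if d.api_base not in self.cooled_down
            ]
            if not candidates:
                break
            deployment = candidates[0]  # deterministic pick; the real strategy shuffles
            self.tried.append(deployment.api_base)
            if deployment.online:
                return f"embedding from {deployment.api_base}"
            # Assumption: allowed_fails: 0 means one failure triggers a cooldown.
            self.cooled_down.add(deployment.api_base)
        raise ConnectionError("no deployment answered")


def run_scenario(num_retries: int) -> list:
    """Return the hosts tried when host1 is offline and host2 is online."""
    group = ToyGroup(
        deployments=[
            ToyDeployment("http://host1:11434", online=False),
            ToyDeployment("http://host2:11434"),
        ],
        num_retries=num_retries,
    )
    try:
        group.embed("hello")
    except ConnectionError:
        pass
    return group.tried
```

With an effective num_retries of 0 (the behavior reported for embeddings) only host1 is tried. With num_retries of 3 the second attempt reaches host2, which is what the chat completion path is reported to do.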

Additional Context

This bug was discovered while investigating embedding failover in a multi-host Ollama setup. Chat completion failover was working correctly (after separately fixing the APIConnectionError cooldown bug; see related issue #27362). Embedding failover was still broken despite identical config, and code inspection revealed this missing line as the cause.

All other async router methods set num_retries in kwargs before calling async_function_with_fallbacks. The aembedding method appears to have been overlooked when this pattern was established or when retries were added to other methods.

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.82.6 (also confirmed present in v1.82.1)
