litellm - 💡(How to fix) Fix [Bug]: aembedding missing num_retries kwarg causes zero retries and no failover for embedding model groups [1 participants]

litellm · 2026-05-07 02:03:27

GitHub stats

BerriAI/litellm#27363 • Fetched 2026-05-07 03:32:58

Author

dredwilliams

Error Message

When using the LiteLLM proxy router with multiple deployments of the same embedding model_name across different hosts, failover never occurs when a host is unreachable. The router attempts the selected deployment once, fails, and returns an error — no retries, no failover to other deployments in the model group.

Root Cause

In litellm/router.py, the aembedding() method does not set num_retries in kwargs before calling async_function_with_fallbacks(). Compare it with the other router methods (acompletion, acreate_file, atext_completion, etc.), which all include this line:

kwargs["num_retries"] = kwargs.get("num_retries", self.num_retries)

aembedding (missing the line — bug):

async def aembedding(self, model, input, is_async=True, **kwargs):
    try:
        kwargs["model"] = model
        kwargs["input"] = input
        kwargs["original_function"] = self._aembedding          # ✓
        # kwargs["num_retries"] = ...                           # ✗ MISSING
        self._update_kwargs_before_fallbacks(model=model, kwargs=kwargs)
        response = await self.async_function_with_fallbacks(**kwargs)

acreate_file (correct pattern — for comparison):

async def acreate_file(self, model, **kwargs):
    try:
        kwargs["model"] = model
        kwargs["original_function"] = self._acreate_file
        kwargs["num_retries"] = kwargs.get("num_retries", self.num_retries)  # ✓ present
        self._update_kwargs_before_fallbacks(model=model, kwargs=kwargs)
        response = await self.async_function_with_fallbacks(**kwargs)

Note: _update_kwargs_before_fallbacks also sets num_retries (line 2234), but it uses kwargs.get("num_retries", self.num_retries), so a value already present in kwargs is not overwritten. The problem is that async_function_with_retries pops num_retries from kwargs (line 5564), so on the retry path the key is gone and _update_kwargs_before_fallbacks may not re-set it correctly. As a result, num_retries evaluates to 0 inside the retry loop, and the request fails immediately with no retries attempted.
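The symptom can be illustrated in isolation with a toy retry loop. This is a minimal sketch, not LiteLLM code: run_with_retries, call_host, and HostDown are hypothetical names, the round-robin host choice stands in for the router's real deployment selection, and treating a missing num_retries key as 0 is an assumption made to mirror the behavior described above.

```python
class HostDown(Exception):
    """Raised by the toy backend when a host is unreachable."""


def call_host(host, down_hosts):
    # Stand-in for a real embedding call to one deployment.
    if host in down_hosts:
        raise HostDown(host)
    return f"embedding from {host}"


def run_with_retries(hosts, down_hosts, **kwargs):
    """Try hosts in order, allowing `num_retries` extra attempts after the first."""
    # Assumption for illustration: a missing key is treated as zero retries.
    num_retries = kwargs.pop("num_retries", 0)
    tried = []
    for attempt in range(1 + num_retries):
        host = hosts[attempt % len(hosts)]  # toy rotation, not a real routing strategy
        tried.append(host)
        try:
            return call_host(host, down_hosts), tried
        except HostDown:
            continue
    return None, tried


hosts = ["host1", "host2"]
down = {"host1"}

# Key missing: one attempt against host1, then give up.
print(run_with_retries(hosts, down))
# Key present: the second attempt reaches host2.
print(run_with_retries(hosts, down, num_retries=3))
```

With the key absent, the loop body runs exactly once, which matches the "one attempt, then error" behavior reported for embedding requests.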

Fix

Add the missing line to aembedding() in router.py:

async def aembedding(self, model, input, is_async=True, **kwargs):
    try:
        kwargs["model"] = model
        kwargs["input"] = input
        kwargs["original_function"] = self._aembedding
        kwargs["num_retries"] = kwargs.get("num_retries", self.num_retries)  # ADD THIS
        self._update_kwargs_before_fallbacks(model=model, kwargs=kwargs)
        response = await self.async_function_with_fallbacks(**kwargs)
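The added line relies only on dict.get with a default, so a per-request value wins and the router-level value is used otherwise. The sketch below shows that semantics on a plain dict; with_default_retries is a hypothetical helper written for illustration and does not exist in LiteLLM.

```python
def with_default_retries(kwargs, router_num_retries):
    """Return a copy of kwargs with num_retries filled in if the caller did not set it."""
    prepared = dict(kwargs)
    # Same expression as the proposed fix, applied to a plain dict.
    prepared["num_retries"] = prepared.get("num_retries", router_num_retries)
    return prepared


# Caller did not pass num_retries: the router-level default is used.
print(with_default_retries({"model": "mxbai-embed-large"}, 3)["num_retries"])
# Caller passed a value, including an explicit 0: it is preserved.
print(with_default_retries({"num_retries": 0}, 3)["num_retries"])
print(with_default_retries({"num_retries": 5}, 3)["num_retries"])
```

Note that an explicit num_retries=0 from the caller is kept, because dict.get only falls back to the default when the key is absent.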

Steps to Reproduce

1. Configure LiteLLM proxy with multiple deployments of the same embedding model_name:

model_list:
  - model_name: mxbai-embed-large
    litellm_params:
      model: ollama/mxbai-embed-large:latest
      api_base: http://host1:11434

  - model_name: mxbai-embed-large
    litellm_params:
      model: ollama/mxbai-embed-large:latest
      api_base: http://host2:11434

router_settings:
  routing_strategy: simple-shuffle
  num_retries: 3
  allowed_fails: 0
  cooldown_time: 60

2. Take host1 offline.
3. Send an embedding request for mxbai-embed-large.
4. Observe that the request fails immediately with no retries; host2 is never tried.
5. Repeat the same test with a chat completion model group. Failover works correctly, confirming the bug is specific to aembedding.
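The embedding request in the steps above could be sent with a short standard-library script. This is a sketch under assumptions not stated in the report: the proxy is taken to listen on http://localhost:4000, to expose the OpenAI-compatible /v1/embeddings route, and to accept the placeholder key sk-1234. Substitute the values for your own deployment.

```python
import json
import urllib.request


def build_embedding_request(base_url, api_key, model, text):
    """Build (but do not send) a POST request for an OpenAI-style embeddings endpoint."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/embeddings",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON body; HTTP errors propagate to the caller."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Assumed address and key; adjust to your proxy before calling send(request).
request = build_embedding_request(
    "http://localhost:4000", "sk-1234", "mxbai-embed-large", "hello world"
)
```

Calling send(request) several times with host1 offline should show whether any request is answered by host2. Because routing uses simple-shuffle, a single successful call may just mean host2 was picked first.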

Additional Context

This bug was discovered while investigating embedding failover in a multi-host Ollama setup. Chat completion failover was working correctly (after separately fixing the APIConnectionError cooldown bug; see related issue #27362). Embedding failover was still broken despite identical config, and code inspection revealed this missing line as the cause.

All other async router methods set num_retries in kwargs before calling async_function_with_fallbacks. The aembedding method appears to have been overlooked when this pattern was established or when retries were added to other methods.
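One way to check for other methods with the same omission is a small AST scan of the router source. The sketch below is a heuristic written for illustration, not a tool from the LiteLLM repository: methods_missing_num_retries is a hypothetical name, SAMPLE is a made-up stand-in for router.py, and the scan only recognizes a direct assignment to kwargs["num_retries"] (Python 3.9+).

```python
import ast


def _calls_fallbacks(func):
    # True if the function body contains a call to *.async_function_with_fallbacks(...)
    return any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "async_function_with_fallbacks"
        for node in ast.walk(func)
    )


def _sets_num_retries(func):
    # True if the function body assigns to kwargs["num_retries"]
    for node in ast.walk(func):
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if (
                isinstance(target, ast.Subscript)
                and isinstance(target.value, ast.Name)
                and target.value.id == "kwargs"
                and isinstance(target.slice, ast.Constant)
                and target.slice.value == "num_retries"
            ):
                return True
    return False


def methods_missing_num_retries(source: str) -> list[str]:
    """List functions that call the fallback wrapper without setting num_retries."""
    tree = ast.parse(source)
    return sorted(
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and _calls_fallbacks(node)
        and not _sets_num_retries(node)
    )


# Made-up stand-in for router.py, reduced to the two patterns under discussion.
SAMPLE = '''
class Router:
    async def aembedding(self, model, input, **kwargs):
        kwargs["model"] = model
        # kwargs["num_retries"] = ...
        return await self.async_function_with_fallbacks(**kwargs)

    async def acreate_file(self, model, **kwargs):
        kwargs["model"] = model
        kwargs["num_retries"] = kwargs.get("num_retries", self.num_retries)
        return await self.async_function_with_fallbacks(**kwargs)
'''

print(methods_missing_num_retries(SAMPLE))
```

To scan a real checkout, pass the text of litellm/router.py instead of SAMPLE. Any names it reports are candidates to review by hand, since a method may set the value through another path.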

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.82.6 (also confirmed present in v1.82.1)
