litellm - ✅(Solved) Fix [Bug]: Local model support broken by three compounding failures in model validation, provider routing, and error classification [2 pull requests, 2 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
BerriAI/litellm#23054Fetched 2026-04-08 00:38:40
View on GitHub
Comments
2
Participants
2
Timeline
9
Reactions
3
Timeline (top)
labeled ×3commented ×2cross-referenced ×2mentioned ×1

Error Message

litellm.exceptions.NotFoundError: This model isn't mapped yet. model=qwen3-coder-30b-a3b-instruct

litellm.BadRequestError: LiteLLMBadRequestError - BadRequestError caused by ContentPolicyViolationError

Router cooldown: Deployment put on cooldown for 60s after ContentPolicyViolationError

Root Cause

This isn't limited to one feature. These bugs affect completions, tool/function calling, token management, and capability detection — essentially any usage of a local model through LiteLLM. Tool calling surfaces the most silently destructive symptom (supports_function_calling() defaulting to False for unmapped models, causing downstream tools to skip tool-calling code paths without any error), but the root causes are broader. Every downstream tool that depends on LiteLLM for local model support is affected (OpenHands, Aider, SWE-agent, and anything routing through a LiteLLM proxy).

Fix Action

Fixed

PR fix notes

PR #9347: fix(litellm): filter embedding models

Description (problem / solution / changelog)

Description

We want to filter embedding models for litellm.

Via the litellm api endpoint, we are not informed whether a model is an embedding model. We can use of a helper function to telll.

How Has This Been Tested?

Unit tests & manual

Additional Options

  • [Optional] Please cherry-pick this PR to the latest release version.
  • [Optional] Override Linear Check
<!-- This is an auto-generated description by cubic. -->

Summary by cubic

Filter out embedding-only models from the litellm available models list so only chat/completion-capable models are returned. This prevents embedding models from appearing in the UI and APIs that list usable LLMs.

  • Bug Fixes
    • Skip models with mode: "embedding" in get_litellm_available_models via is_embedding_model (uses litellm.get_model_info; unknown/custom models default to non-embedding).
    • Added is_embedding_model helper and unit tests covering OpenAI, Cohere, Bedrock, and common non-embedding models.

<sup>Written for commit ff942d48871d3410276d4c8dc543e0ada82a524f. Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->

Changed files

  • backend/onyx/server/manage/llm/api.py (modified, +5/-0)
  • backend/onyx/server/manage/llm/utils.py (modified, +15/-0)
  • backend/tests/unit/onyx/server/manage/llm/test_llm_provider_utils.py (modified, +33/-0)

PR #132: fix(engine) : validate LLM model capabilities at startup

Description (problem / solution / changelog)

Summary

This PR adds startup-time LLM capability validation in the Engine to prevent silent degradation when an incompatible model is configured.

The change validates the configured LLM model against LiteLLM metadata and enforces required capabilities used by the agent workflow.

Problem

Previously:

The Engine did not verify whether the configured LLM_MODEL supported critical capabilities required by the agent:

supports_function_calling for tool execution supports_response_schema for structured outputs used in next-speaker decisions

This could lead to:

Unsupported models starting successfully but silently skipping tool calls The model fabricating infrastructure data instead of querying real cluster state Misconfiguration only being visible in server logs and not surfaced clearly at startup

Solution

The following improvements were implemented:

  1. Startup Capability Validation

Added validate_llm_model() in engine/src/api/config/settings.py.

Behavior:

Calls LiteLLM get_model_info for the configured LLM_MODEL Verifies supports_function_calling Verifies supports_response_schema If either is missing, raises a clear and descriptive ValueError with:

model name

missing capability list

actionable compatibility link: https://models.litellm.ai/

  1. Fail-Fast Integration in App Lifespan

Integrated validation into startup flow in engine/src/api/asgi.py.

Behavior:

Validation runs after DB initialization and before app startup completes On success, logs validated model and provider at INFO level Startup fails immediately for incompatible registry-backed models

  1. Safe Fallback for Custom/Self-Hosted Models

If get_model_info fails because the model is not in LiteLLM registry:

Logs a warning Allows startup to continue

This preserves support for self-hosted/custom deployments configured via LLM_HOST.

Tests

Local validation performed for startup behavior:

Compatible model path:

LLM_MODEL set to gemini/gemini-2.5-pro Startup succeeded Validation success log emitted with provider

Unknown/custom model path:

LLM_MODEL set to a custom model id not in LiteLLM registry Warning logged Startup continued successfully

Health/docs sanity:

GET /api/v1/health/ returned 200 GET /docs returned 200

Notes:

Root path / returns 404 by design for API service Existing deprecation warnings are unrelated to this change

Compliance

✅ Follows Conventional Commits format ✅ Scope correctly set ✅ No debug statements introduced ✅ Backward-compatible for custom/self-hosted model setups ✅ Clear error handling with actionable guidance ✅ No unrelated functional changes outside startup validation path

Changed files

  • engine/src/api/asgi.py (modified, +5/-0)
  • engine/src/api/config/settings.py (modified, +44/-1)

Code Example

litellm.exceptions.NotFoundError: This model isn't mapped yet. model=qwen3-coder-30b-a3b-instruct

litellm.BadRequestError: LiteLLMBadRequestError - BadRequestError caused by ContentPolicyViolationError

Router cooldown: Deployment put on cooldown for 60s after ContentPolicyViolationError
RAW_BUFFERClick to expand / collapse

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Local models served via LM Studio, Ollama, or vLLM are fundamentally broken when routed through LiteLLM. After extensive debugging, I've identified three distinct failure mechanisms that compound each other. Each one has existing issues filed, but they haven't been addressed as the connected system-level problem they are — a local model has to pass all three gates to work, and failing any one kills the workflow.

This isn't limited to one feature. These bugs affect completions, tool/function calling, token management, and capability detection — essentially any usage of a local model through LiteLLM. Tool calling surfaces the most silently destructive symptom (supports_function_calling() defaulting to False for unmapped models, causing downstream tools to skip tool-calling code paths without any error), but the root causes are broader. Every downstream tool that depends on LiteLLM for local model support is affected (OpenHands, Aider, SWE-agent, and anything routing through a LiteLLM proxy).

Three compounding failure mechanisms

1. get_model_info() throws on any model not in the static map

LiteLLM maintains model_prices_and_context_window.json as a registry of known models. Functions like get_model_info(), supports_function_calling(), and supports_tool_choice() consult this file and throw exceptions when a model isn't listed rather than returning sensible defaults.

Local models use arbitrary names (e.g., qwen3-coder-30b-a3b-instruct, deepseek-r1:32b). They will never be in this map. The error: This model isn't mapped yet. model=<model_name>

The openai/ prefix partially works around this by inheriting OpenAI's default capabilities, but model-specific metadata (context window size, cost, capability flags) remains unavailable. This breaks token management, cost tracking, and — most silently — capability detection: supports_function_calling() defaults to False for unmapped models (#14067), which causes downstream tools to skip tool-calling code paths entirely without any error or warning.

Related issues: #13953, #13935 (upgrade from v1.74.7→v1.75.5 broke previously working local models), #14067

2. Provider routing overrides explicit api_base

get_llm_provider() determines the provider by parsing model name prefixes. When a model name triggers provider detection, LiteLLM's routing ignores the user-configured api_base and sends requests to the cloud API endpoint instead. This affects all request types — completions, chat, tool calls — because the base URL is wrong.

From #15726: "If I 'disguise' the model as an openai model (e.g., model: openai/my-model) while keeping the local api_base, LiteLLM appears to respect the configuration."

The prefix parsing also strips path components — openai/gpt-oss-120b becomes gpt-oss-120b, causing 404 errors at the wrong endpoint (#14807). And the /v1 path handling (#14502) means LiteLLM can send requests to http://127.0.0.1:1234 without appending /v1/chat/completions, getting silent failures from local servers.

Related issues: #16045, #15726, #14807, #14502

3. Error misclassification triggers 60-second cooldown cascades

LiteLLM maps various upstream errors to ContentPolicyViolationError, which triggers a 60-second deployment cooldown via _set_cooldown_deployments(). During cooldown, all subsequent requests are blocked with "Your request was blocked"-style errors — regardless of whether they involve tool calling or plain completions.

Non-policy errors from local servers get misclassified as content violations. Local servers return non-standard error formats that LiteLLM doesn't recognise, so they fall into the content policy bucket. A single misclassified error puts the entire deployment on ice for 60 seconds. Error details are also stripped in the mapping (#19328), making debugging nearly impossible.

Related issues: #5225, #16484, #19328

How these compound

  1. User configures openai/my-local-model with api_base=http://localhost:1234/v1
  2. get_model_info() can't find the model → token management breaks, capability detection defaults to False → tool calling silently skipped, context window unknown
  3. Even if that's worked around, provider routing may override the api_base depending on exact prefix used → requests go to the wrong endpoint entirely
  4. If the request does reach the local server but returns an unexpected format, it gets classified as a content policy violation → 60-second cooldown → every subsequent request fails with a misleading error

You have to thread all three needles simultaneously. Getting two out of three right still results in a broken setup.

Expected behavior

Local models behind OpenAI-compatible endpoints should work when api_base is explicitly configured, without needing to be in the static model map. Specifically:

  • get_model_info() should return conservative defaults (128k context, function calling enabled, zero cost) for unknown models instead of throwing. A model not being in the price map should never prevent inference.
  • api_base should always take precedence over provider routing inference when explicitly set.
  • Error classification should have a fallback category for unrecognised error formats rather than defaulting to content policy violation.
  • supports_function_calling() should default to True (or at minimum be configurable) for unknown models behind an OpenAI-compatible endpoint — these endpoints advertise tool support via their API.

LiteLLM's January 2026 incident report stated that "a missing entry never blocks a call" — but the router's _pre_call_checks code path does call get_model_info() in a way that can throw, contradicting this.

Steps to Reproduce

  1. Start a local inference server (LM Studio or Ollama) serving any model (e.g., qwen3-coder-30b-a3b-instruct)
  2. Configure LiteLLM with: model="openai/my-local-model", api_base="http://localhost:1234/v1", api_key="sk-dummy"
  3. Make any completion call (with or without tools defined)
  4. Observe: get_model_info() throws "This model isn't mapped yet" — breaking token management, context window detection, and capability checks. supports_function_calling() returns False, silently disabling tool calling in downstream tools.
  5. Depending on model name prefix used, api_base may be overridden by provider routing, sending requests to cloud endpoints instead of the local server
  6. If the request reaches the local server but returns any non-standard error, observe ContentPolicyViolationError misclassification and 60-second cooldown blocking all subsequent requests

Relevant log output

litellm.exceptions.NotFoundError: This model isn't mapped yet. model=qwen3-coder-30b-a3b-instruct

litellm.BadRequestError: LiteLLMBadRequestError - BadRequestError caused by ContentPolicyViolationError

Router cooldown: Deployment put on cooldown for 60s after ContentPolicyViolationError

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on ?

v1.74.x - v1.75.x (persists across versions)

Twitter / LinkedIn details

https://www.linkedin.com/in/brutchsama-jean-louis-645883178/

extent analysis

Fix Plan

To address the issues with local models served via LM Studio, Ollama, or vLLM when routed through LiteLLM, we need to make the following changes:

  • Modify get_model_info() to return conservative defaults for unknown models instead of throwing exceptions.
  • Ensure api_base takes precedence over provider routing inference when explicitly set.
  • Implement a fallback category for unrecognised error formats instead of defaulting to content policy violation.
  • Configure supports_function_calling() to default to True for unknown models behind an OpenAI-compatible endpoint.

Code Changes

Here are some example code snippets to illustrate the changes:

# Modify get_model_info() to return conservative defaults
def get_model_info(model_name):
    # Check if model is in the static map
    if model_name in model_prices_and_context_window:
        return model_prices_and_context_window[model_name]
    else:
        # Return conservative defaults for unknown models
        return {
            'context_window': 128000,
            'function_calling': True,
            'cost': 0
        }

# Ensure api_base takes precedence over provider routing inference
def get_llm_provider(model_name, api_base):
    # Check if api_base is explicitly set
    if api_base:
        return api_base
    else:
        # Determine provider by parsing model name prefixes
        # ...

# Implement a fallback category for unrecognised error formats
def classify_error(error):
    # Check if error is a content policy violation
    if isinstance(error, ContentPolicyViolationError):
        return 'content_policy_violation'
    else:
        # Fallback category for unrecognised error formats
        return 'unknown_error'

# Configure supports_function_calling() to default to True for unknown models
def supports_function_calling(model_name):
    # Check if model is in the static map
    if model_name in model_prices_and_context_window:
        return model_prices_and_context_window[model_name]['function_calling']
    else:
        # Default to True for unknown models behind an OpenAI-compatible endpoint
        return True

Verification

To verify that the fixes work, you can follow these steps:

  1. Start a local inference server (LM Studio or Ollama) serving any model.
  2. Configure LiteLLM with the modified code changes.
  3. Make a completion call with the modified LiteLLM configuration.
  4. Observe that get_model_info() returns conservative defaults for unknown models.
  5. Verify that api_base takes precedence over provider routing inference when explicitly set.
  6. Test error classification with a non-standard error format and observe that it falls back to the unknown_error category.
  7. Check that supports_function_calling() defaults to True for unknown models behind an OpenAI-compatible endpoint.

Extra Tips

To prevent regressions, it's essential to:

  • Thoroughly

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

Local models behind OpenAI-compatible endpoints should work when api_base is explicitly configured, without needing to be in the static model map. Specifically:

  • get_model_info() should return conservative defaults (128k context, function calling enabled, zero cost) for unknown models instead of throwing. A model not being in the price map should never prevent inference.
  • api_base should always take precedence over provider routing inference when explicitly set.
  • Error classification should have a fallback category for unrecognised error formats rather than defaulting to content policy violation.
  • supports_function_calling() should default to True (or at minimum be configurable) for unknown models behind an OpenAI-compatible endpoint — these endpoints advertise tool support via their API.

LiteLLM's January 2026 incident report stated that "a missing entry never blocks a call" — but the router's _pre_call_checks code path does call get_model_info() in a way that can throw, contradicting this.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING