litellm - 💡(How to fix) Fix [Bug]: Health check fails for vLLM classify models — always reports unhealthy [1 comments, 2 participants]

cdgraff · 2026-03-21T03:36:29Z

[litellm] When a vLLM classification model is registered in model list using the hosted vllm/ prefix, the LiteLLM health check always reports it as unhealthy ,… When a vLLM classification model is registered in `model_list` using the `hosted_vllm/` prefix, the LiteLLM health check always reports it as `unhealthy`, even though the model works correctly in production via the `/vllm/classify` endpoint. The health check sends a request to `/chat/completions` but classification models only expose a `/classify` endpoint — causing a `404 Not Found` error that incorrectly marks the deployment as unhealthy. ## Workaround Setting `disable_health_check: true` in `general_settings` does not resolve the issue in v1.82.3. The model works correctly in production — this is a false negative in the health check only. ### Description When a vLLM classification model is registered in `model_list` using the `hosted_vllm/` prefix, the LiteLLM health check always reports it as `unhealthy`, even though the model works correctly in production via the `/vllm/classify` endpoint. The health check sends a request to `/chat/completions` but classification models only expose a `/classify` endpoint — causing a `404 Not Found` error that incorrectly marks the deployment as unhealthy. ### LiteLLM Version v1.82.3-stable ### Config ```yaml model_list: - model_name: model/xlm-roberta-base-mention-classifier litellm_params: model: hosted_vllm/model/xlm-roberta-base-mention-classifier api_base: http://localhost:8095 ``` ### Steps to Reproduce 1. Register a vLLM classification model in `model_list` (e.g. XLM-RoBERTa) 2. Call `GET /health` — the model appears in `unhealthy_endpoints` 3. Call `POST /vllm/classify` with the same model — it works correctly ### Expected Behavior Health check should detect that the model's `mode` is `classify` and use the `/classify` endpoint instead of `/chat/completions`, OR skip the health check for classify models entirely. ### Actual Behavior Health check always calls `/chat/completions`, gets a `404 Not Found` from vLLM, and marks the model as unhealthy: ``` litellm.NotFoundError: Hosted_vllmException - {"detail":"Not Found"} httpx.HTTPStatusError: Client error '404 Not Found' for url 'http://localhost:8095/chat/completions' ``` ### Workaround Setting `disable_health_check: true` in `general_settings` does not resolve the issue in v1.82.3. The model works correctly in production — this is a false negative in the health check only. ### Related Issues - #11205 — Feature request for vLLM classify endpoint support (implemented in production via `/vllm/classify`, but health check not updated accordingly)

litellm2026-03-21 03:36:29

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

BerriAI/litellm#24265•Fetched 2026-04-08 01:08:59

View on GitHub

Comments

Participants

Timeline

Reactions

Author

cdgraff

Participants

ankandrew

cdgraff

Timeline (top)

commented ×1mentioned ×1subscribed ×1

When a vLLM classification model is registered in model_list using the hosted_vllm/ prefix, the LiteLLM health check always reports it as unhealthy, even though the model works correctly in production via the /vllm/classify endpoint.

The health check sends a request to /chat/completions but classification models only expose a /classify endpoint — causing a 404 Not Found error that incorrectly marks the deployment as unhealthy.

Error Message

litellm.NotFoundError: Hosted_vllmException - {"detail":"Not Found"} httpx.HTTPStatusError: Client error '404 Not Found' for url 'http://localhost:8095/chat/completions'

Root Cause

Fix Action

Workaround

Setting disable_health_check: true in general_settings does not resolve the issue in v1.82.3.

The model works correctly in production — this is a false negative in the health check only.

Code Example

model_list:
  - model_name: model/xlm-roberta-base-mention-classifier
    litellm_params:
      model: hosted_vllm/model/xlm-roberta-base-mention-classifier
      api_base: http://localhost:8095

---

litellm.NotFoundError: Hosted_vllmException - {"detail":"Not Found"}
httpx.HTTPStatusError: Client error '404 Not Found' for url 'http://localhost:8095/chat/completions'

RAW_BUFFERClick to expand / collapse

Description

LiteLLM Version

v1.82.3-stable

Config

model_list:
  - model_name: model/xlm-roberta-base-mention-classifier
    litellm_params:
      model: hosted_vllm/model/xlm-roberta-base-mention-classifier
      api_base: http://localhost:8095

Steps to Reproduce

Register a vLLM classification model in model_list (e.g. XLM-RoBERTa)
Call GET /health — the model appears in unhealthy_endpoints
Call POST /vllm/classify with the same model — it works correctly

Expected Behavior

Health check should detect that the model's mode is classify and use the /classify endpoint instead of /chat/completions, OR skip the health check for classify models entirely.

Actual Behavior

Health check always calls /chat/completions, gets a 404 Not Found from vLLM, and marks the model as unhealthy:

litellm.NotFoundError: Hosted_vllmException - {"detail":"Not Found"}
httpx.HTTPStatusError: Client error '404 Not Found' for url 'http://localhost:8095/chat/completions'

Workaround

Setting disable_health_check: true in general_settings does not resolve the issue in v1.82.3.

The model works correctly in production — this is a false negative in the health check only.

Related Issues

#11205 — Feature request for vLLM classify endpoint support (implemented in production via /vllm/classify, but health check not updated accordingly)

extent analysis

Fix Plan

To resolve the issue, we need to update the health check to use the correct endpoint for classification models. We can achieve this by adding a conditional check to determine the model type and use the corresponding endpoint.

Step-by-Step Solution

Update the health check code to include a conditional statement that checks the model type.
If the model is a classification model, use the /classify endpoint instead of /chat/completions.

Example code snippet:

if model_type == "classification":
    health_check_url = f"{api_base}/vllm/classify"
else:
    health_check_url = f"{api_base}/chat/completions"

Update the model_list configuration to include the model type.

model_list:
  - model_name: model/xlm-roberta-base-mention-classifier
    model_type: classification
    litellm_params:
      model: hosted_vllm/model/xlm-roberta-base-mention-classifier
      api_base: http://localhost:8095

Verification

To verify that the fix worked, call the GET /health endpoint and check that the model is no longer reported as unhealthy. Additionally, test the POST /vllm/classify endpoint to ensure that it still works correctly.

Extra Tips

Make sure to update the documentation to reflect the changes to the health check and model configuration.
Consider adding a feature flag to enable or disable the health check for classification models.
Review related issues, such as #11205, to ensure that the fix is consistent with the overall architecture and design of the system.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #ssr #installation #tensor shape #autograd error #configuration error #environment variable #network issue

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

litellm - 💡(How to fix) Fix [Bug]: Health check fails for vLLM classify models — always reports unhealthy [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Workaround

Code Example

Description

LiteLLM Version

Config

Steps to Reproduce

Expected Behavior

Actual Behavior

Workaround

Related Issues

extent analysis

Fix Plan

Step-by-Step Solution

Verification

Extra Tips

Still need to ship something?

TRENDING

litellm - 💡(How to fix) Fix [Bug]: Health check fails for vLLM classify models — always reports unhealthy [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Workaround

Code Example

Description

LiteLLM Version

Config

Steps to Reproduce

Expected Behavior

Actual Behavior

Workaround

Related Issues

extent analysis

Fix Plan

Step-by-Step Solution

Verification

Extra Tips

Still need to ship something?

RELATED_DISCOVERY

TRENDING