
litellm - ✅(Solved) Fix [Bug]: Health checks use max_completion_tokens=1, causing failures for GPT-5 models [1 pull request, 5 comments, 6 participants]

litellm · 2026-03-17 08:55:42


BerriAI/litellm#23836 • Fetched 2026-04-08 00:48:58


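Error Message

OpenAIException - Could not finish the message because max_tokens or model output limit was reached

Root Cause

Health check calls send max_completion_tokens=1, which is below the minimum output size that some providers and models (e.g. the OpenAI GPT-5 family) accept.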

PR fix notes

PR #22299: Litellm health check tokens

Repository: BerriAI/litellm
Author: Harshit28j
State: closed | merged: True
Link: https://github.com/BerriAI/litellm/pull/22299

Description (problem / solution / changelog)

Relevant issues

Address health check token overconsumption by introducing a configurable limit and a sensible default for non-wildcard models.

Pre-Submission checklist

I have Added testing in the tests/litellm/ directory (Added tests/test_litellm/proxy/test_health_check_max_tokens.py)
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai


Type

🆕 New Feature

Changes

Optimization: Updated litellm/proxy/health_check.py to default max_tokens: 1 for standard health checks. This prevents health checks (like Azure OpenAI) from generating long responses, saving cost and reducing latency.
New Config Setting: Introduced health_check_max_tokens in model_info. Users can now explicitly set the token limit for health checks in their config.yaml.
Wildcard Alignment: Updated litellm/litellm_core_utils/health_check_helpers.py to respect the configurable token limit while maintaining a safe default of 10 for wildcard-route models.
Testing: Added unit tests in tests/test_litellm/proxy/test_health_check_max_tokens.py covering default behavior, custom overrides, and wildcard routing safety.

Changed files

docs/my-website/docs/proxy/health.md (modified, +16/-0)
litellm/litellm_core_utils/health_check_helpers.py (modified, +5/-4)
litellm/proxy/health_check.py (modified, +11/-1)
poetry.lock (modified, +4/-4)
tests/test_litellm/proxy/test_health_check_max_tokens.py (added, +75/-0)



Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

The following PR introduced max_completion_tokens in health check calls with a default value of 1:

https://github.com/BerriAI/litellm/pull/22299

For some providers and models (e.g. OpenAI GPT-5 family), this value is below the minimum practical generation size accepted by the provider.

As a result:

the "Test connection" button in the UI fails
the automatic health checks mark the model as unhealthy
the error returned by the provider is typically:

OpenAIException - Could not finish the message because max_tokens or model output limit was reached

Example failing request generated by LiteLLM:

{
  "model": "gpt-5-nano",
  "messages": [
    {"role": "user", "content": "ping"}
  ],
  "max_completion_tokens": 1
}

The current workaround is to add the model-specific setting "health_check_max_tokens": 16 (or another suitable value) to the model info of each affected model.

Even with this workaround, the "Test connection" button remains broken when adding new models.
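The workaround can be sketched as follows. This is a minimal illustration written as a Python dict that mirrors a config.yaml model entry. Only the health_check_max_tokens key under model_info comes from the PR notes; the model_name and litellm_params values and the helper health_check_limit are assumptions made for this example, not LiteLLM code.

```python
from typing import Any, Dict

# Hypothetical model entry mirroring one item of a config.yaml model list.
# Only model_info.health_check_max_tokens is taken from the PR notes;
# the remaining keys and values are assumptions for illustration.
model_entry: Dict[str, Any] = {
    "model_name": "gpt-5-nano",
    "litellm_params": {"model": "openai/gpt-5-nano"},
    "model_info": {"health_check_max_tokens": 16},
}


def health_check_limit(entry: Dict[str, Any], default: int = 1) -> int:
    """Return the per-model health check token limit (hypothetical helper).

    Falls back to `default` when the entry carries no override. The
    default of 1 matches the value the issue reports for health checks.
    """
    model_info = entry.get("model_info") or {}
    value = model_info.get("health_check_max_tokens")
    return default if value is None else int(value)
```

With the override in place the helper returns 16 for this entry; an entry without model_info falls back to the default of 1, which is the case that fails for the GPT-5 models listed in the issue.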

Observed behavior

Health checks fail for models such as:

gpt-5-nano
gpt-5-mini
gpt-5.1
gpt-5.2
gpt-5.1-codex
gpt-5.2-codex
gpt-5.3-codex
possibly other providers with similar constraints

This causes the proxy to incorrectly report the model as unhealthy, even though the model works normally with regular requests.

Expected behavior

Health checks should succeed for all supported models.

Possible fixes:

increase default value
Provider-specific defaults

Steps to Reproduce

add a new model
choose openai as provider
choose gpt-5.1-codex-max as model
add credentials
click "Test connection"

Relevant log output

Error:
OpenAIException - { "error": { "message": "Invalid 'max_output_tokens': integer below minimum value. Expected a value >= 16, but got 1 instead.", "type": "invalid_request_error", "param": "max_output_tokens", "code": "integer_below_min_value" } }



curl -X POST \
  https://api.openai.com/v1/responses \
  -H 'Authorization: ***********' \
  -H 'Content-Type: application/json' \
  -d '{
  {
    "model": "gpt-5.1-codex-max",
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": [
          {
            "type": "input_text",
            "text": "Hey how's it going?"
          }
        ]
      }
    ],
    "max_output_tokens": 1
  }
  }'
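When a provider rejects the limit like this, the required minimum can be read from the error text. The sketch below assumes the message keeps the "Expected a value >= N" wording shown in the log output above; the function name minimum_from_error is hypothetical and is not part of LiteLLM.

```python
import json
import re
from typing import Optional

# Matches the wording seen in the log output above (an assumption about
# how stable that wording is across providers and versions).
_MIN_VALUE = re.compile(r"Expected a value >= (\d+)")


def minimum_from_error(raw: str) -> Optional[int]:
    """Extract the provider's minimum token value from an error string."""
    # The log line has the form "OpenAIException - { ...json... }".
    _, _, body = raw.partition(" - ")
    try:
        message = json.loads(body)["error"]["message"]
    except (ValueError, KeyError, TypeError):
        message = raw  # not JSON: search the raw text instead
    if not isinstance(message, str):
        message = raw
    match = _MIN_VALUE.search(message)
    return int(match.group(1)) if match else None


log_line = (
    'OpenAIException - { "error": { "message": "Invalid \'max_output_tokens\': '
    'integer below minimum value. Expected a value >= 16, but got 1 instead.", '
    '"type": "invalid_request_error", "param": "max_output_tokens", '
    '"code": "integer_below_min_value" } }'
)
```

Calling minimum_from_error(log_line) yields the minimum stated by the provider, which could then be used as the health_check_max_tokens value for that model. Errors without that wording, such as the "Could not finish the message" variant, return None.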

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.82.0

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

To resolve the issue, we will implement provider-specific defaults for the max_completion_tokens parameter in health check calls.

Step-by-Step Solution:

Update the default value: Increase the default value of max_completion_tokens to a minimum of 16 for OpenAI providers.
Add provider-specific defaults: Introduce a configuration option to set provider-specific defaults for max_completion_tokens.
Update the health check logic: Modify the health check logic to use the provider-specific default value if available, otherwise use the increased default value.

Example Code:

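# Provider-specific defaults for max_completion_tokens in health checks.
# Only the OpenAI minimum of 16 is confirmed by the error in this issue.
PROVIDER_DEFAULTS = {
    'openai': 16,
    # Add other providers as needed
}

# Fallback used when a provider has no specific entry
DEFAULT_MAX_COMPLETION_TOKENS = 16

# Update the health check logic
def get_max_completion_tokens(provider: str) -> int:
    """Return the provider-specific default, or the increased global default."""
    return PROVIDER_DEFAULTS.get(provider, DEFAULT_MAX_COMPLETION_TOKENS)

# Example usage:
provider = 'openai'
max_completion_tokens = get_max_completion_tokens(provider)
print(max_completion_tokens)  # Output: 16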

Verification

To verify the fix, follow these steps:

Update the max_completion_tokens default value and add provider-specific defaults.
Restart the LiteLLM service.
Test the "Test connection" button for affected models (e.g., gpt-5-nano, gpt-5.1-codex).
Verify that health checks succeed for all supported models.

Extra Tips

Monitor the logs for any errors related to max_completion_tokens and adjust the default values as needed.
Consider adding a configuration option to allow users to override the provider-specific defaults.
Review the documentation to ensure that the updated default values and provider-specific defaults are properly documented.
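The override suggested above could take the following shape. This is a sketch under stated assumptions: the function name, the provider table, and the precedence order (per-model health_check_max_tokens first, then the provider default, then a global default) are illustrative and do not describe LiteLLM's actual implementation.

```python
from typing import Any, Mapping, Optional

# Illustrative values; only the OpenAI minimum of 16 is taken from the issue.
PROVIDER_DEFAULTS: Mapping[str, int] = {"openai": 16}
GLOBAL_DEFAULT = 16


def resolve_health_check_max_tokens(
    provider: str,
    model_info: Optional[Mapping[str, Any]] = None,
) -> int:
    """Pick the health check token limit (hypothetical helper).

    Precedence: explicit per-model override, then provider default,
    then the global default.
    """
    override = (model_info or {}).get("health_check_max_tokens")
    if override is not None:
        limit = int(override)
        if limit < 1:
            raise ValueError("health_check_max_tokens must be at least 1")
        return limit
    return PROVIDER_DEFAULTS.get(provider.lower(), GLOBAL_DEFAULT)
```

Keeping the per-model override at the top of the precedence order means existing configurations that already set health_check_max_tokens continue to behave as before, while models without an override no longer inherit a limit the provider rejects.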
