litellm - ✅(Solved) Fix [Bug]: Health checks use max_completion_tokens=1, causing failures for GPT-5 models [1 pull requests, 5 comments, 6 participants]

GitHub stats
BerriAI/litellm#23836 (fetched 2026-04-08 00:48:58)
Comments: 5 · Participants: 6 · Timeline events: 11 · Reactions: 6
Timeline (top): commented ×5, labeled ×3, cross-referenced ×1, referenced ×1

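Error Message

OpenAIException - Could not finish the message because max_tokens or model output limit was reached

Root Cause

Health checks send max_completion_tokens=1 (introduced in PR #22299), which is too small for GPT-5 family models: the provider either rejects the value or stops before it can return a complete message.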

PR fix notes

PR #22299: Litellm health check tokens

Description (problem / solution / changelog)

Relevant issues

Address health check token overconsumption by introducing a configurable limit and a sensible default for non-wildcard models.

Pre-Submission checklist

  • I have added testing in the tests/litellm/ directory (added tests/test_litellm/proxy/test_health_check_max_tokens.py)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai

CI (LiteLLM team)

  • Branch creation CI run
    Link:
  • CI run for the last commit
    Link:
  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature

Changes

  • Optimization: Updated litellm/proxy/health_check.py to default to max_tokens: 1 for standard health checks. This keeps health-check calls (for example, against Azure OpenAI) from generating long responses, saving cost and reducing latency.
  • New Config Setting: Introduced health_check_max_tokens in model_info. Users can now explicitly set the token limit for health checks in their config.yaml.
  • Wildcard Alignment: Updated litellm/litellm_core_utils/health_check_helpers.py to respect the configurable token limit while maintaining a safe default of 10 for wildcard-route models.
  • Testing: Added unit tests in tests/test_litellm/proxy/test_health_check_max_tokens.py covering default behavior, custom overrides, and wildcard routing safety.
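The token-limit resolution described in the Changes list can be sketched as follows. This is a minimal illustration, not the code from PR #22299: the function name resolve_health_check_max_tokens and the is_wildcard flag are hypothetical. Only the health_check_max_tokens key and the defaults of 1 (standard models) and 10 (wildcard routes) come from the PR description.

```python
from typing import Any, Mapping, Optional

# Defaults as stated in the PR description
STANDARD_DEFAULT = 1   # non-wildcard models
WILDCARD_DEFAULT = 10  # wildcard-route models


def resolve_health_check_max_tokens(
    model_info: Optional[Mapping[str, Any]], is_wildcard: bool
) -> int:
    """Hypothetical helper: pick the max_tokens value for one health-check call."""
    # A user-configured value in model_info takes precedence for every route type
    if model_info and model_info.get("health_check_max_tokens") is not None:
        return int(model_info["health_check_max_tokens"])
    # Otherwise fall back to the route-specific default
    return WILDCARD_DEFAULT if is_wildcard else STANDARD_DEFAULT


print(resolve_health_check_max_tokens({}, is_wildcard=False))  # 1
print(resolve_health_check_max_tokens({}, is_wildcard=True))   # 10
print(resolve_health_check_max_tokens({"health_check_max_tokens": 16}, is_wildcard=False))  # 16
```

Under this resolution order, a GPT-5 model with no explicit health_check_max_tokens entry falls through to the standard default of 1, which is the value this issue reports as too small.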

Changed files

  • docs/my-website/docs/proxy/health.md (modified, +16/-0)
  • litellm/litellm_core_utils/health_check_helpers.py (modified, +5/-4)
  • litellm/proxy/health_check.py (modified, +11/-1)
  • poetry.lock (modified, +4/-4)
  • tests/test_litellm/proxy/test_health_check_max_tokens.py (added, +75/-0)


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

The following PR introduced max_completion_tokens in health check calls with a default value of 1:

https://github.com/BerriAI/litellm/pull/22299

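For some providers and models (e.g. the OpenAI GPT-5 family), this value is too small for the call to succeed: the provider either rejects it outright (the Responses API requires max_output_tokens >= 16, see the log output below) or stops before it can return a complete message.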

As a result:

  • the "Test connection" button in the UI fails
  • the automatic health checks mark the model as unhealthy
  • the error returned by the provider is typically:
OpenAIException - Could not finish the message because max_tokens or model output limit was reached

Example failing request generated by LiteLLM:

{
  "model": "gpt-5-nano",
  "messages": [
    {"role": "user", "content": "ping"}
  ],
  "max_completion_tokens": 1
}

Currently, the workaround is to add "health_check_max_tokens": 16 (or another suitable value) to the model_info of each affected model.

This does not help when adding new models: the "Test connection" button still fails for them.
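For models already in the config, the per-model workaround can be applied in bulk. This sketch assumes the proxy config has been loaded into a Python dict with the usual model_list / litellm_params / model_info layout of config.yaml; the helper name add_health_check_floor and the "gpt-5" prefix match are hypothetical choices for illustration.

```python
import copy
from typing import Any, Dict


def add_health_check_floor(
    config: Dict[str, Any], prefix: str = "gpt-5", minimum: int = 16
) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of config where matching models
    have model_info.health_check_max_tokens >= minimum."""
    patched = copy.deepcopy(config)
    for entry in patched.get("model_list", []):
        # Underlying model, e.g. "openai/gpt-5-nano"; drop any provider prefix
        model = entry.get("litellm_params", {}).get("model", "")
        if not model.split("/")[-1].startswith(prefix):
            continue
        info = entry.setdefault("model_info", {})
        # Keep an existing user value if it is already large enough
        current = info.get("health_check_max_tokens")
        if current is None or current < minimum:
            info["health_check_max_tokens"] = minimum
    return patched


config = {
    "model_list": [
        {"model_name": "gpt-5-nano", "litellm_params": {"model": "openai/gpt-5-nano"}},
        {"model_name": "gpt-4o", "litellm_params": {"model": "openai/gpt-4o"}},
    ]
}
patched = add_health_check_floor(config)
print(patched["model_list"][0]["model_info"])  # {'health_check_max_tokens': 16}
print("model_info" in patched["model_list"][1])  # False
```

Working on a deep copy leaves the loaded config untouched, so the patched version can be compared with the original before it is written back to config.yaml.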


Observed behavior

Health checks fail for models such as:

  • gpt-5-nano
  • gpt-5-mini
  • gpt-5.1
  • gpt-5.2
  • gpt-5.1-codex
  • gpt-5.2-codex
  • gpt-5.3-codex
  • possibly other providers with similar constraints

This causes the proxy to incorrectly report the model as unhealthy, even though the model works normally with regular requests.


Expected behavior

Health checks should succeed for all supported models.

Possible fixes:

  1. Increase the default value
  2. Use provider-specific defaults

Steps to Reproduce

  1. Add a new model
  2. Choose OpenAI as the provider
  3. Choose gpt-5.1-codex-max as the model
  4. Add credentials
  5. Click "Test connection"

Relevant log output

Error:
OpenAIException - { "error": { "message": "Invalid 'max_output_tokens': integer below minimum value. Expected a value >= 16, but got 1 instead.", "type": "invalid_request_error", "param": "max_output_tokens", "code": "integer_below_min_value" } }



curl -X POST \
  https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-5.1-codex-max",
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": [
          {
            "type": "input_text",
            "text": "Hey how'\''s it going?"
          }
        ]
      }
    ],
    "max_output_tokens": 1
  }'

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.82.0

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

To resolve the issue, we will implement provider-specific defaults for the max_completion_tokens parameter in health check calls.

Step-by-Step Solution:

  1. Update the default value: Increase the default value of max_completion_tokens to at least 16 for the OpenAI provider.
  2. Add provider-specific defaults: Introduce a configuration option to set provider-specific defaults for max_completion_tokens.
  3. Update the health check logic: Modify the health check logic to use the provider-specific default value if available, otherwise use the increased default value.

Example Code:

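from typing import Dict, Optional

# Define provider-specific defaults for the health-check token budget
provider_defaults: Dict[str, int] = {
    'openai': 16,  # the reported OpenAI error requires a value >= 16
    # Add other providers as needed
}

# Fallback for providers without an explicit entry
DEFAULT_MAX_COMPLETION_TOKENS = 16

# Update the health check logic
def get_max_completion_tokens(provider: str, override: Optional[int] = None) -> int:
    # An explicit per-model value (e.g. model_info's health_check_max_tokens) wins
    if override is not None:
        return override
    return provider_defaults.get(provider, DEFAULT_MAX_COMPLETION_TOKENS)

# Example usage:
provider = 'openai'
max_completion_tokens = get_max_completion_tokens(provider)
print(max_completion_tokens)  # Output: 16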

Verification

To verify the fix, follow these steps:

  1. Update the max_completion_tokens default value and add provider-specific defaults.
  2. Restart the LiteLLM service.
  3. Test the "Test connection" button for affected models (e.g., gpt-5-nano, gpt-5.1-codex).
  4. Verify that health checks succeed for all supported models.
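Step 4 can also be checked from a script rather than through the UI. This sketch assumes the proxy's /health endpoint returns JSON containing healthy_endpoints and unhealthy_endpoints lists, as described in the LiteLLM proxy health-check documentation; the helper names, the base URL http://localhost:4000, and the key sk-1234 are placeholders. The sample payload below is illustrative, not captured output.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_health(base_url: str, api_key: str) -> Dict[str, Any]:
    """Call GET {base_url}/health on a running LiteLLM proxy (placeholder URL and key)."""
    req = urllib.request.Request(
        f"{base_url}/health",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def unhealthy_models(payload: Dict[str, Any]) -> List[str]:
    """List the model names reported under 'unhealthy_endpoints'."""
    return [ep.get("model", "<unknown>") for ep in payload.get("unhealthy_endpoints", [])]


# Illustrative payload in the assumed /health response shape
sample = {
    "healthy_endpoints": [{"model": "openai/gpt-4o"}],
    "unhealthy_endpoints": [
        {
            "model": "openai/gpt-5-nano",
            "error": "Could not finish the message because max_tokens or model output limit was reached",
        }
    ],
}
print(unhealthy_models(sample))  # ['openai/gpt-5-nano']
```

Against a running proxy, unhealthy_models(fetch_health("http://localhost:4000", "sk-1234")) should return an empty list once the fix is in place.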

Extra Tips

  • Monitor the logs for any errors related to max_completion_tokens and adjust the default values as needed.
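  • Consider adding a configuration option to allow users to override the provider-specific defaults. The per-model health_check_max_tokens setting in model_info, added in PR #22299, already provides such an override.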
  • Review the documentation to ensure that the updated default values and provider-specific defaults are properly documented.
