litellm - ✅(Solved) Fix [Feature]: Allow disabling reasoning for model health checks [1 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
BerriAI/litellm#25349Fetched 2026-04-09 07:52:38
View on GitHub
Comments
0
Participants
1
Timeline
3
Reactions
0
Participants
Timeline (top)
labeled ×3

PR fix notes

PR #22299: Litellm health check tokens

Description (problem / solution / changelog)

Relevant issues

Address health check token overconsumption by introducing a configurable limit and a sensible default for non-wildcard models.

Pre-Submission checklist

  • I have Added testing in the tests/litellm/ directory (Added tests/test_litellm/proxy/test_health_check_max_tokens.py)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai

CI (LiteLLM team)

  • Branch creation CI run
    Link:
  • CI run for the last commit
    Link:
  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature

Changes

  • Optimization: Updated litellm/proxy/health_check.py to default max_tokens: 1 for standard health checks. This prevents health checks (like Azure OpenAI) from generating long responses, saving cost and reducing latency.
  • New Config Setting: Introduced health_check_max_tokens in model_info. Users can now explicitly set the token limit for health checks in their config.yaml.
  • Wildcard Alignment: Updated litellm/litellm_core_utils/health_check_helpers.py to respect the configurable token limit while maintaining a safe default of 10 for wildcard-route models.
  • Testing: Added unit tests in tests/test_litellm/proxy/test_health_check_max_tokens.py covering default behavior, custom overrides, and wildcard routing safety.

Changed files

  • docs/my-website/docs/proxy/health.md (modified, +16/-0)
  • litellm/litellm_core_utils/health_check_helpers.py (modified, +5/-4)
  • litellm/proxy/health_check.py (modified, +11/-1)
  • poetry.lock (modified, +4/-4)
  • tests/test_litellm/proxy/test_health_check_max_tokens.py (added, +75/-0)
RAW_BUFFERClick to expand / collapse

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

The Feature

Create a config entry that allows disabling reasoning for LLM health checks.

Motivation, pitch

Based on my default prompt of "Health check: respond ONLY 'ok' to confirm you are healthy.", Gemini correctly responds "ok" but I see a spend of ~200 additional reasoning tokens. This is wasteful, and for now we are vastly reducing the frequency of our health checks purely for this reason.

In https://github.com/BerriAI/litellm/pull/22299 I see that limiting max tokens is an issue, but this seems to have other unwanted side effects: https://github.com/BerriAI/litellm/issues/23836. For our use case, simply having LiteLLM pass a request to turn reasoning off for that specific API call would be far simpler and more predictable.

What part of LiteLLM is this about?

Proxy

extent analysis

TL;DR

Add a configuration option to disable reasoning for specific API calls, such as LLM health checks, to reduce unnecessary token spend.

Guidance

  • Identify the specific API call for LLM health checks and determine the requirements for disabling reasoning.
  • Investigate the existing codebase, particularly the Proxy component, to find a suitable location for adding the configuration option.
  • Consider adding a flag or parameter to the API call that indicates whether reasoning should be enabled or disabled.
  • Evaluate the potential impact of disabling reasoning on the overall functionality and accuracy of the LLM.

Example

No code snippet is provided due to lack of specific implementation details.

Notes

The solution may require modifications to the Proxy component and potentially other parts of the codebase. The exact implementation will depend on the existing architecture and design of LiteLLM.

Recommendation

Apply workaround: Add a configuration option to disable reasoning for specific API calls, as this seems to be a more targeted and predictable solution for the described use case.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING