litellm - ✅(Solved) Fix [Feature]: Allow disabling reasoning for model health checks [1 pull requests, 1 participants]

litellm2026-04-08 14:41:47

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

BerriAI/litellm#25349•Fetched 2026-04-09 07:52:38

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Daan-Kruijs

Participants

Daan-Kruijs

Timeline (top)

labeled ×3

PR fix notes

PR #22299: Litellm health check tokens

Repository: BerriAI/litellm
Author: Harshit28j
State: closed | merged: True
Link: https://github.com/BerriAI/litellm/pull/22299

Description (problem / solution / changelog)

Relevant issues

Address health check token overconsumption by introducing a configurable limit and a sensible default for non-wildcard models.

Pre-Submission checklist

I have Added testing in the tests/litellm/ directory (Added tests/test_litellm/proxy/test_health_check_max_tokens.py)
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai

CI (LiteLLM team)

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Type

🆕 New Feature

Changes

Optimization: Updated litellm/proxy/health_check.py to default max_tokens: 1 for standard health checks. This prevents health checks (like Azure OpenAI) from generating long responses, saving cost and reducing latency.
New Config Setting: Introduced health_check_max_tokens in model_info. Users can now explicitly set the token limit for health checks in their config.yaml.
Wildcard Alignment: Updated litellm/litellm_core_utils/health_check_helpers.py to respect the configurable token limit while maintaining a safe default of 10 for wildcard-route models.
Testing: Added unit tests in tests/test_litellm/proxy/test_health_check_max_tokens.py covering default behavior, custom overrides, and wildcard routing safety.

Changed files

docs/my-website/docs/proxy/health.md (modified, +16/-0)
litellm/litellm_core_utils/health_check_helpers.py (modified, +5/-4)
litellm/proxy/health_check.py (modified, +11/-1)
poetry.lock (modified, +4/-4)
tests/test_litellm/proxy/test_health_check_max_tokens.py (added, +75/-0)

RAW_BUFFERClick to expand / collapse

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

The Feature

Create a config entry that allows disabling reasoning for LLM health checks.

Motivation, pitch

Based on my default prompt of "Health check: respond ONLY 'ok' to confirm you are healthy.", Gemini correctly responds "ok" but I see a spend of ~200 additional reasoning tokens. This is wasteful, and for now we are vastly reducing the frequency of our health checks purely for this reason.

In https://github.com/BerriAI/litellm/pull/22299 I see that limiting max tokens is an issue, but this seems to have other unwanted side effects: https://github.com/BerriAI/litellm/issues/23836. For our use case, simply having LiteLLM pass a request to turn reasoning off for that specific API call would be far simpler and more predictable.

What part of LiteLLM is this about?

Proxy

extent analysis

TL;DR

Add a configuration option to disable reasoning for specific API calls, such as LLM health checks, to reduce unnecessary token spend.

Guidance

Identify the specific API call for LLM health checks and determine the requirements for disabling reasoning.
Investigate the existing codebase, particularly the Proxy component, to find a suitable location for adding the configuration option.
Consider adding a flag or parameter to the API call that indicates whether reasoning should be enabled or disabled.
Evaluate the potential impact of disabling reasoning on the overall functionality and accuracy of the LLM.

Example

No code snippet is provided due to lack of specific implementation details.

Notes

The solution may require modifications to the Proxy component and potentially other parts of the codebase. The exact implementation will depend on the existing architecture and design of LiteLLM.

Recommendation

Apply workaround: Add a configuration option to disable reasoning for specific API calls, as this seems to be a more targeted and predictable solution for the described use case.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #chain error #conversation history #tool integration #LLM response

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

litellm - ✅(Solved) Fix [Feature]: Allow disabling reasoning for model health checks [1 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

PR fix notes

PR #22299: Litellm health check tokens

Description (problem / solution / changelog)

Relevant issues

Pre-Submission checklist

CI (LiteLLM team)

Type

Changes

Changed files

Check for existing issues

The Feature

Motivation, pitch

What part of LiteLLM is this about?

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

litellm - ✅(Solved) Fix [Feature]: Allow disabling reasoning for model health checks [1 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

PR fix notes

PR #22299: Litellm health check tokens

Description (problem / solution / changelog)

Relevant issues

Pre-Submission checklist

CI (LiteLLM team)

Type

Changes

Changed files

Check for existing issues

The Feature

Motivation, pitch

What part of LiteLLM is this about?

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING