vllm - ✅(Solved) Fix [Feature]: Improve Error Message When Max Length Reached With Tool Call=Required [1 pull requests, 2 comments, 2 participants]

GitHub stats
vllm-project/vllm#36794 (fetched 2026-04-08 00:34:38)
Comments: 2 · Participants: 2 · Timeline events: 12 · Reactions: 0
Timeline (top): commented ×2, cross-referenced ×2, subscribed ×2, assigned ×1

Error Message

ModelHTTPError: status_code: 400, model_name: ai_model, body: {'message': '1 validation error for list[function-wrap[__log_extra_fields__()]]\n Invalid JSON: EOF while parsing a string at line 6 column 952 [type=json_invalid, input_value='[\n\n{\n "name": "fin...cosmetic). Although the', input_type=str]\n For further information visit https://errors.pydantic.dev/2.12/v/json_invalid', 'type': 'BadRequestError', 'param': None, 'code': 400}

Root Cause

Hey, I'm not quite sure whether to file this as a feature request or a bug report. Anyway, here is my setup:

  • tool for structured output, tool_choice="required", because I only want structured output
  • thinking_enabled
  • qwen3.5-397B-gptq-int4 (but it happens with any qwen3.5)
  • max_len=4000
  • vllm 0.17.0

What happens is that qwen3.5 thinks so much that it exhausts the 4k max len: you get a stop_reason=length and it doesn't manage to call the tool. This results in the following error message:
ModelHTTPError: status_code: 400, model_name: ai_model, body: {'message': '1 validation error for list[function-wrap[__log_extra_fields__()]]\n  Invalid JSON: EOF while parsing a string at line 6 column 952 [type=json_invalid, input_value=\'[\\n\\n{\\n    "name": "fin...cosmetic). Although the\', input_type=str]\n For further information visit https://errors.pydantic.dev/2.12/v/json_invalid', 'type': 'BadRequestError', 'param': None, 'code': 400}
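
The "EOF while parsing a string" part of the message is what a JSON parser reports when its input stops in the middle of a string value. A minimal sketch of that failure mode, assuming a made-up tool-call payload (the names below are illustrative and not taken from the report). It uses the standard-library json module rather than Pydantic, so the exact wording of the error differs from the one above:

```python
import json

# Hypothetical tool-call payload; the names are illustrative only.
complete = '[{"name": "final_answer", "parameters": {"summary": "looks fine"}}]'

# Simulate hitting the token limit: the output stops inside a string value.
truncated = complete[: complete.index("looks") + 3]


def try_parse(text: str) -> str:
    """Return 'ok' if text is valid JSON, otherwise the decoder's error message."""
    try:
        json.loads(text)
        return "ok"
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"


print(try_parse(complete))
print(try_parse(truncated))
```

The second call fails for the same underlying reason as the request in the report: the text handed to the parser ends before the JSON document does.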

Fix Action

Fixed

PR fix notes

PR #36841: [Bugfix] Fix crash when tool_choice=required exceeds max_tokens

Description (problem / solution / changelog)

Purpose

FIX https://github.com/vllm-project/vllm/issues/36794

Test Plan

see e2e

Test Result



Changed files

  • tests/entrypoints/openai/test_completion_with_function_calling.py (modified, +24/-0)
  • vllm/entrypoints/openai/chat_completion/serving.py (modified, +1/-1)
  • vllm/entrypoints/openai/engine/serving.py (modified, +11/-8)

🚀 The feature, motivation and pitch

What fixes this is increasing max_len to an arbitrarily high number, but it took me quite some time to figure out what was happening, and the error message did not help. The error above is generated by this line: https://github.com/vllm-project/vllm/blob/f3163bba6729b7bfd1e355f8b7f6670a6beb4715/vllm/entrypoints/openai/engine/serving.py#L1129 since the content is just thinking content without even a </think> token to end it.
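
To illustrate the kind of message the report asks for, here is a minimal sketch of a parser that checks why generation stopped before reporting a JSON failure. It is not the code from vLLM or from PR #36841; the function name, its parameters, and the wording of the message are all hypothetical:

```python
import json


def parse_required_tool_calls(content: str, finish_reason: str, max_tokens: int) -> list:
    """Parse tool calls from model output, with a clearer error on truncation.

    Hypothetical helper: `content` is the raw model output, `finish_reason`
    is the reason generation stopped, `max_tokens` is the configured limit.
    """
    try:
        return json.loads(content)
    except json.JSONDecodeError as exc:
        if finish_reason == "length":
            # The output was cut off, so the JSON error is only a symptom.
            raise ValueError(
                f"Generation stopped after reaching the {max_tokens}-token limit "
                "before a complete tool call was produced (tool_choice='required'). "
                "Increase the token limit or reduce reasoning output."
            ) from exc
        raise
```

With a message like this, the token limit shows up as the cause directly, instead of having to be inferred from a JSON validation error.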


extent analysis

Fix Plan

PR #36841 addresses the crash itself. As a workaround on affected versions (the report used vllm 0.17.0), raise max_len so the model can finish thinking and still emit the tool call:

  • Increase max_len, e.g., to 8000 or more, depending on the use case.
  • Alternatively, raise the limit only when it is needed:
tool_choice, thinking_enabled, max_len = "required", True, 4000  # values from the report
if tool_choice == "required" and thinking_enabled:
    max_len = max(max_len, 8000)  # leave headroom for reasoning tokens
  • If using a configuration file, update the max_len value accordingly.
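
Rather than guessing a single larger value, the limit can be raised step by step until the output is no longer truncated. A minimal sketch, assuming a hypothetical `send` callable that performs the request with the given token budget and returns a dict with a "finish_reason" key. On affected versions `send` would also have to map the 400 error shown above to a "length" result; that mapping is left to the caller:

```python
from typing import Callable, Optional


def call_with_growing_budget(
    send: Callable[[int], dict],
    start: int = 4000,
    factor: int = 2,
    ceiling: int = 32000,
) -> Optional[dict]:
    """Call send(budget) with a growing budget until output is not truncated.

    `send` is a hypothetical callable supplied by the caller. Returns the
    first non-truncated result, or None if every attempt up to `ceiling`
    was truncated.
    """
    budget = start
    while budget <= ceiling:
        result = send(budget)
        if result.get("finish_reason") != "length":
            return result
        budget *= factor  # try again with more room for reasoning tokens
    return None
```

The ceiling keeps a model that never stops reasoning from consuming an unbounded number of tokens; the default values here are arbitrary.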

Verification

To verify the workaround, rerun the request with the increased max_len value and check the following:

  • The model finishes its thinking content without reaching the max_len limit.
  • The stop_reason is not length.
  • The tool is called successfully, and its output is in the expected format.
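
These checks can be automated. A minimal sketch, assuming the response follows the OpenAI-compatible chat completion shape, where each entry of `choices` carries `finish_reason` and `message.tool_calls[].function.arguments` as a JSON string. The function name is hypothetical:

```python
import json


def verify_response(choice: dict) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if choice.get("finish_reason") == "length":
        problems.append("generation was truncated (finish_reason == 'length')")
    tool_calls = (choice.get("message") or {}).get("tool_calls") or []
    if not tool_calls:
        problems.append("no tool call in the response")
    for call in tool_calls:
        try:
            # Tool-call arguments arrive as a JSON-encoded string.
            json.loads(call["function"]["arguments"])
        except (KeyError, TypeError, json.JSONDecodeError):
            problems.append("tool call arguments are not valid JSON")
    return problems
```

Pass it one element of `choices` converted to a plain dict; checking the arguments against the tool's own schema would be a further step not shown here.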

Extra Tips

  • Consider adding a check for the case where the thinking content alone reaches the max_len limit, so that truncation is reported as such instead of as a JSON error.
  • Review the model's configuration and adjust max_len to balance latency and token cost against leaving enough room for reasoning plus the tool call.
