vllm - ✅(Solved) Fix [Feature]: Improve Error Message When Max Length Reached With Tool Call=Required [1 pull requests, 2 comments, 2 participants]

vllm • 2026-03-11 14:24:02


GitHub stats

vllm-project/vllm#36794•Fetched 2026-04-08 00:34:38


Timeline (top)

commented ×2, cross-referenced ×2, subscribed ×2, assigned ×1

Error Message

ModelHTTPError: status_code: 400, model_name: ai_model, body: {'message': '1 validation error for list[function-wrap[__log_extra_fields__()]]\n  Invalid JSON: EOF while parsing a string at line 6 column 952 [type=json_invalid, input_value=\'[\\n\\n{\\n    "name": "fin...cosmetic). Although the\', input_type=str]\n For further information visit https://errors.pydantic.dev/2.12/v/json_invalid', 'type': 'BadRequestError', 'param': None, 'code': 400}

Root Cause

Hey, I'm not quite sure whether to file this as a feature request or a bug report. Anyway, here is my setup:

tool for structured output, tool_choice="required", because I only want structured output
thinking_enabled
qwen3.5-397B-gptq-int4 (but it happens with any qwen3.5)
max_len=4000
vllm 0.17.0

What happens is that qwen3.5 thinks so much that it exhausts the 4k max len: you get a stop_reason=length and it doesn't manage to call the tool. This results in the following error message:

ModelHTTPError: status_code: 400, model_name: ai_model, body: {'message': '1 validation error for list[function-wrap[__log_extra_fields__()]]\n  Invalid JSON: EOF while parsing a string at line 6 column 952 [type=json_invalid, input_value=\'[\\n\\n{\\n    "name": "fin...cosmetic). Although the\', input_type=str]\n For further information visit https://errors.pydantic.dev/2.12/v/json_invalid', 'type': 'BadRequestError', 'param': None, 'code': 400}

Fix Action

Fixed

Fixed by PR: [Bugfix] Fix crash when tool_choice=required exceeds max_tokens (https://github.com/vllm-project/vllm/pull/36841)

PR fix notes

PR #36841: [Bugfix] Fix crash when tool_choice=required exceeds max_tokens

Repository: vllm-project/vllm
Author: chaunceyjiang
State: closed | merged: True
Link: https://github.com/vllm-project/vllm/pull/36841

Description (problem / solution / changelog)

Purpose

FIX https://github.com/vllm-project/vllm/issues/36794

Test Plan

see e2e

Test Result


Changed files

tests/entrypoints/openai/test_completion_with_function_calling.py (modified, +24/-0)
vllm/entrypoints/openai/chat_completion/serving.py (modified, +1/-1)
vllm/entrypoints/openai/engine/serving.py (modified, +11/-8)


🚀 The feature, motivation and pitch


Increasing max_len to an arbitrarily high number fixes this, but it took me quite some time to figure out what was happening, and the error message was no help in getting there. The error above is generated by this line: https://github.com/vllm-project/vllm/blob/f3163bba6729b7bfd1e355f8b7f6670a6beb4715/vllm/entrypoints/openai/engine/serving.py#L1129 since the content is just thinking content without even a </think> token to end it.
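As a rough illustration of why the message reads like a JSON problem rather than a length problem: when generation stops at the token limit, whatever text exists so far is handed to a JSON parser, and a cut-off string fails with an end-of-input error. The sketch below uses Python's standard json module as a stand-in for the pydantic validation named in the error. The tool name and arguments are made up for the example, and parse_tool_calls is a hypothetical helper, not a vLLM function.

```python
import json

# Illustrative tool-call payload; the name and arguments are assumptions.
full_output = json.dumps(
    [{"name": "final_result", "parameters": {"summary": "All checks passed."}}]
)


def parse_tool_calls(raw: str):
    """Return the parsed tool calls, or None if the text is not complete JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


# Simulate hitting the token limit partway through a string value.
truncated_output = full_output[:-10]

complete = parse_tool_calls(full_output)    # parses into a list with one call
cut_off = parse_tool_calls(truncated_output)  # None: the string never closes
```

The parser only sees malformed text, so nothing in its error says the token budget ran out; that has to be inferred from the stop reason.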


extent analysis

Fix Plan

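The crash itself is fixed upstream by PR #36841, so the primary fix is to run a vLLM build that includes that change. On an affected version, or when the model still spends its whole budget before reaching the tool call, raise the generation limit so that the thinking content and the tool call both fit. Here are the steps:

Increase the max_len parameter to a higher value, e.g., max_len=8000 or more, depending on the specific use case. (max_len is the issue author's name for the limit; the PR title refers to it as max_tokens.)
Alternatively, pick the limit dynamically when reasoning and a required tool call are combined, such as:

tool_choice, thinking_enabled, max_len = "required", True, 4000  # setup from the issue
if tool_choice == "required" and thinking_enabled:
    max_len = 8000  # or a higher value: room for the reasoning plus the tool call

If the limit comes from a configuration file, update the max_len value there.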

Verification

To verify that the fix worked, run the model with the increased max_len value and check for the following:

The model should be able to generate the thinking content without exceeding the max_len limit.
The stop_reason should not be length.
The tool should be called successfully, and the output should be in the expected format.
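The checks above can be sketched as a small helper, assuming the response is available as a plain dict in the OpenAI chat completions shape (choices[0].finish_reason and choices[0].message.tool_calls). The function name diagnose and its return labels are hypothetical. It reads finish_reason, the field where an OpenAI-compatible response reports "length"; adapt the field name if your client surfaces it differently.

```python
import json


def diagnose(response: dict) -> str:
    """Classify a chat completion response dict for this failure mode."""
    choice = response["choices"][0]
    tool_calls = choice["message"].get("tool_calls") or []

    # Generation stopped because the token limit was reached.
    if choice.get("finish_reason") == "length":
        return "truncated"

    # The model finished but never called a tool.
    if not tool_calls:
        return "no_tool_call"

    # Each tool call carries its arguments as a JSON string; check they parse.
    for call in tool_calls:
        try:
            json.loads(call["function"]["arguments"])
        except json.JSONDecodeError:
            return "invalid_arguments"
    return "ok"
```

A result of "truncated" points at the token limit, which is the condition described in this issue.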

Extra Tips

Consider adding a check to handle cases where the thinking content exceeds the max_len limit, to prevent similar issues in the future.
Review the model's configuration and adjust the max_len value as needed: a larger limit leaves room for the reasoning and the tool call, at the cost of longer and more expensive generations.
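One way to act on the first tip from the client side is to retry with a larger budget when a response stops at the limit, and to fail with an explicit message once a cap is reached. This is only a sketch under stated assumptions: send is a hypothetical callable that performs the request with the given token limit and returns the response as a dict in the OpenAI chat completions shape, and the starting value, cap, and doubling policy are arbitrary choices.

```python
from typing import Callable


class TokenBudgetExceeded(RuntimeError):
    """Raised when every attempt stopped at the token limit."""


def call_with_growing_budget(
    send: Callable[[int], dict], start: int = 4000, cap: int = 32000
) -> dict:
    """Call send(budget), doubling the budget while the response is truncated."""
    budget = start
    while True:
        response = send(budget)
        if response["choices"][0].get("finish_reason") != "length":
            return response
        if budget >= cap:
            raise TokenBudgetExceeded(
                f"Generation still hit the limit at {budget} tokens; the model "
                "may be spending the whole budget on reasoning before the tool call."
            )
        budget = min(budget * 2, cap)
```

Raising a dedicated exception keeps the length problem visible instead of letting it surface as a JSON validation error.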
