vllm - 💡(How to fix) Fix [Bug]: qwen3.5 when enable response_format json_schema outputs garbled spaces [3 comments, 3 participants]

Your current environment

Environment
Docker image: vllm/vllm-openai:nightly (image ID db683b1d4dce, created 2026-03-27)

Service startup command:

vllm serve --model /data/deploy/Qwen3.5-9B/Qwen3.5-9B --enable-auto-tool-choice --tool-call-parser qwen3_coder --reasoning-parser qwen3 --served-model-name qwen3.5-9b-test --tensor-parallel-size 1 --gpu-memory-utilization 0.9 --max-model-len 32768 --limit-mm-per-prompt {"image":12}

🐛 Describe the bug

Prompt

curl --location 'http://127.0.0.1:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{ "messages": [ { "role": "system", "content": "## 场景与角色\n\n这是一个AI监控场景。首先摄像头需要绑定CV算法（亦称“能力”）以进行事件检测，然后当事件发生时则会产生相应的告警。摄像头有名称、属地或属主，例如“xx摄像头”、“xx监控”、“银泰城的监控”。根据检测的内容，能力有相关的命名，如：人员入侵检测、烟火烟雾检测、占道经营检测等。事件发生时所产生的告警随之也有相应的类型名称，如：人员入侵、烟火烟雾等，同时事件还有时间、区域、所属摄像头、严重程度等属性。围绕AI监控场景，用户可能需要为摄像头绑定某种能力以进行事件检测，也可能需要查看某摄像头画面以了解当前情况，或者需要查看一些告警的详情，又或者需要对告警做统计分析。你负责结合对话上下文将当前用户输入分类到一个恰当的类别，并对该输入做必要改写使之能够独立完整明确地表达用户意图，以辅助下一步处理。具体地，分类如下：\n\n- 1代表事件统计分析，当用户目的是统计分析告警事件的数量、分布、趋势等情况时，属于此类。例如：“今天发生多少起xx事件”、“查询本周告警按地区（或类型）的分布情况”、“本月xx类事件的趋势”。\n- 2代表事件详情查询，当用户目的是查看告警事件的详情时，属于此类。例如：“查询今日的人员入侵”、“本周有哪些告警”、“显示xx摄像头的事件”。\n- 3代表调取监控，当用户目的是调取某摄像头的实时监控画面时，属于此类。例如：“查看大门口摄像头的画面”、“打开xx摄像头”、“调取xx摄像头”、“看下xx监控画面”。\n- 4代表创建与绑定能力，当用户目的是为摄像头创建和绑定能力时，属于此类。例如：“给门口的摄像头绑定人员入侵算法”、“xx摄像头启动对小女孩的检测”、“识别到蓝色轿车时，xx摄像头需要告警”，上述示例其实质是为某个/某些摄像头绑定某种能力，以开启相应事件的检测。\n- 5代表其它。当前用户输入可能不是一个合理有效的句子（结构混乱、语义不清、没有提供有效的信息（如“啊”“哈”）且结合上下文无法理解），或者虽然含义清楚，但不属于上述已定义的1~4有效类别，均应判为此类。\n对于类别1、2，请仔细判断，若用户意图未提及数量、分布、趋势等含义或者在两者间不易区分时，应理解为查询事件详情，即判定为类别2。\n摄像头的名称是开放的，通常会用属地或属主命名，也可能使用所绑定的能力进行命名。\n\n## 输出规则\n\n严格输出JSON对象，不要输出任何解释文字，不使用 markdown，不要加代码块等标记，只输出单纯的JSON对象，确保能够被JSON库解析。\n输出样例：\n{\n"questionType": "1", //1代表事件统计分析，2代表事件详情查询，3代表调取监控，4代表创建与绑定能力，5代表其它。\n"rewritingQuestion": "改写结果"\n}\n改写的注意事项：\n\n1. 改写结果应该语义完整明确，不依赖上下文就能够独立表达清楚用户当前的意图，并且不丢失相关信息，比如用户要绑定能力，那么为哪个/哪些摄像头绑定怎样的能力必须描述清楚。\n2. 改写结果应该仍是以用户口吻发出的。\n3. 
当questionType取5时，rewritingQuestion字段不需要存在，即此时无需改写。" }, { "content": [ { "type": "text", "text": "465564867" } ], "role": "user" } ], "model": "qwen3.5-9b-test", "max_tokens": 1000, "stream": false, "temperature": 0.1, "top_p": 0.95, "presence_penalty": 1.05, "response_format": { "type": "json_schema", "json_schema": { "name": "rewritingQuestion", "schema": { "additionalProperties": false, "type": "object", "title": "QuestionResult", "required": [ "questionType" ], "properties": { "questionType": { "type": "string", "description": "问题类型" }, "rewritingQuestion": { "type": "string", "description": "改写后的问题" } } }, "strict": true } }, "chat_template_kwargs": { "enable_thinking": false } }'

Problem description

When I change the user message arbitrarily and send the request repeatedly (usually ten or more times), the model at some point returns only whitespace characters and keeps generating them until the max_tokens limit is reached.
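To make the "vary the user message and call it repeatedly" step easier to replay, the request can be rebuilt in a small script. This is only a sketch: the URL, model name, schema, and sampling parameters are copied from the report, while `build_payload`, `send`, and `run` are hypothetical helper names, and the system prompt is left as a placeholder argument to be filled with the full prompt from the curl command.

```python
import json
import urllib.request

# Values copied from the report; adjust them to your own deployment.
URL = "http://127.0.0.1:8000/v1/chat/completions"
MODEL = "qwen3.5-9b-test"

SCHEMA = {
    "additionalProperties": False,
    "type": "object",
    "title": "QuestionResult",
    "required": ["questionType"],
    "properties": {
        "questionType": {"type": "string", "description": "问题类型"},
        "rewritingQuestion": {"type": "string", "description": "改写后的问题"},
    },
}


def build_payload(user_text, system_prompt="(paste the system prompt from the report)"):
    """Build the request body from the report with a caller-supplied user message."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": [{"type": "text", "text": user_text}]},
        ],
        "max_tokens": 1000,
        "stream": False,
        "temperature": 0.1,
        "top_p": 0.95,
        "presence_penalty": 1.05,
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "rewritingQuestion", "schema": SCHEMA, "strict": True},
        },
        "chat_template_kwargs": {"enable_thinking": False},
    }


def send(payload):
    """POST the payload and return the first choice (needs a running server)."""
    request = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["choices"][0]


def run(attempts=20):
    """Send the request with a different numeric user message each time."""
    for i in range(attempts):
        choice = send(build_payload(str(465564867 + i)))
        content = choice["message"]["content"] or ""
        print(i, choice["finish_reason"], repr(content[:60]))
```

Calling `run()` against the server prints one line per attempt; an affected attempt would be expected to show a finish_reason of "length" with content made up of spaces or newlines.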


extent analysis

TL;DR

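With response_format set to json_schema, the model intermittently emits only whitespace and keeps doing so until max_tokens is reached, so generation is cut off by the length limit rather than ending on its own. The report does not establish a root cause. The symptom is consistent with constrained decoding that permits whitespace between JSON tokens, but that is a hypothesis, not a confirmed diagnosis.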

Guidance

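Check finish_reason in the response. A value of "length" together with whitespace-only content confirms that generation ran to max_tokens; raising max_tokens will not help, while lowering it limits the cost of a bad response.
Repeat the request without presence_penalty (1.05 in the report) to see whether the penalty contributes. This is a hypothesis to test: valid JSON has to repeat the same structural tokens, which a presence penalty discourages.
Retry the request when the returned content is empty, whitespace-only, or not parseable as JSON. Timeouts are not the problem here, because the request does complete.
Repeat the request with response_format removed to check whether the behavior is tied to schema-constrained decoding or also occurs in free-form generation.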
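The checks above can be automated on the client side. The following is a minimal sketch that assumes the standard OpenAI-compatible response shape (a choice with `message.content` and `finish_reason`); `classify_choice` and its labels are hypothetical names, not part of vLLM.

```python
import json


def classify_choice(choice):
    """Label one chat completion choice as ok, whitespace_only, truncated, or invalid_json."""
    content = (choice.get("message") or {}).get("content") or ""
    if not content.strip():
        # Nothing but spaces, tabs, or newlines (or no content at all).
        return "whitespace_only"
    try:
        json.loads(content)
    except json.JSONDecodeError:
        # Unparseable output that hit the length limit was most likely cut off.
        if choice.get("finish_reason") == "length":
            return "truncated"
        return "invalid_json"
    return "ok"
```

Logging this label alongside finish_reason for every request gives a rough failure rate, which makes it easier to compare runs with and without presence_penalty or response_format.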

Example

The issue does not include a fix. The curl request in the bug description is the only reproduction available.

Notes

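The issue may be specific to Qwen3.5-9B (served here as qwen3.5-9b-test) or to the vllm/vllm-openai:nightly image, and further investigation is needed to determine the root cause. JSON syntax allows whitespace between tokens, so a schema-constrained decoder can legally emit spaces indefinitely. Older vLLM releases exposed a --guided-decoding-disable-any-whitespace option for this situation; whether the nightly image offers an equivalent setting, and whether it applies to the backend in use, should be checked against the documentation for that build.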

Recommendation

As a workaround, validate each response on the client and retry when the content is whitespace-only or does not parse as JSON. Keep max_tokens modest so that a bad response fails quickly; increasing it only lengthens the run of whitespace. This may mitigate the issue until a permanent fix is available.
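A client-side retry along these lines could look like the sketch below. It is an illustration under assumptions, not a fix for the underlying behavior: `complete_with_retry` is a hypothetical helper, and `send` stands for any caller-supplied function that posts the payload and returns one choice dictionary in the OpenAI-compatible shape.

```python
import json


def complete_with_retry(send, payload, max_attempts=3):
    """Call send(payload) until the content parses as a JSON object.

    Returns (parsed_object, attempts_used) and raises RuntimeError if every
    attempt yields empty, whitespace-only, or otherwise unparseable content.
    """
    last_finish_reason = None
    for attempt in range(1, max_attempts + 1):
        choice = send(payload)
        last_finish_reason = choice.get("finish_reason")
        content = (choice.get("message") or {}).get("content") or ""
        try:
            parsed = json.loads(content)  # whitespace-only content raises here
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed, attempt
    raise RuntimeError(
        f"no valid JSON object after {max_attempts} attempts "
        f"(last finish_reason: {last_finish_reason})"
    )
```

With temperature at 0.1 a retry may reproduce the same output, so it can be worth varying the seed or temperature between attempts; whether that helps for this model has not been verified in the report.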
