openclaw - 💡(How to fix) Fix [Feature]: Support qwen-chat-template when hosted from vLLM -- accepted by schema but ignored -- blocks audio workflows because of thinking time. [1 participants]

GitHub stats: openclaw/openclaw#72329, fetched 2026-04-27 05:31:30
Comments: 0 · Participants: 1 · Timeline: 1 (labeled ×1) · Reactions: 0

Expected Behavior

When a self-hosted Qwen model has compat.thinkingFormat: "qwen-chat-template", OpenClaw should inject chat_template_kwargs: { enable_thinking: <bool> } into the chat completions request body, matching the official Qwen + vLLM documentation.
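As a rough illustration of the expected request shape, the sketch below builds a chat completions body with chat_template_kwargs set at the top level. The helper name buildQwenChatBody and the sample message are assumptions made for this example; they are not part of OpenClaw.

```javascript
// Hypothetical helper (not OpenClaw code): builds a chat completions body
// for a Qwen model served by vLLM, with thinking toggled explicitly.
function buildQwenChatBody(modelId, messages, enableThinking) {
  return {
    model: modelId,
    messages,
    // Top-level field that the issue expects OpenClaw to inject.
    chat_template_kwargs: { enable_thinking: Boolean(enableThinking) },
  };
}

const body = buildQwenChatBody(
  "qwen/qwen3.6-27b",
  [{ role: "user", content: "Say hello." }],
  false,
);
console.log(JSON.stringify(body, null, 2));
```

Sending enable_thinking explicitly in both directions (true and false) avoids relying on whatever default the chat template applies.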

Actual Behavior

buildOpenAICompletionsParams in provider-stream-COLujAAo.js only handles two thinking formats:

  • "openrouter" → injects params.reasoning = { effort: ... }
  • Default (OpenAI) → injects params.reasoning_effort = ...

There is no branch for "qwen-chat-template", so the value is silently ignored.

Root Cause

In provider-stream-COLujAAo.js:

if (compat.thinkingFormat === "openrouter" && model.reasoning && resolvedCompletionsReasoningEffort)
    params.reasoning = { effort: resolvedCompletionsReasoningEffort };
else if (resolvedCompletionsReasoningEffort && model.reasoning && compat.supportsReasoningEffort)
    params.reasoning_effort = resolvedCompletionsReasoningEffort;
// No handling for "qwen-chat-template"

The Zod schema (zod-schema.core-BO_PdpIg.js) accepts "qwen-chat-template" as a valid thinkingFormat, and pi-ai internally has a code path that handles it, but OpenClaw's own request builder never reaches that code or injects the necessary fields.
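The silent fall-through can be reproduced in isolation. The sketch below copies the two-branch logic quoted in this issue into a stand-alone function; buildThinkingParams is an illustrative stand-in written for this example, not the actual OpenClaw function, and the "high" effort value is an assumption.

```javascript
// Stand-in for the branching quoted in the issue; not OpenClaw source.
function buildThinkingParams(model, compat, resolvedCompletionsReasoningEffort) {
  const params = {};
  if (compat.thinkingFormat === "openrouter" && model.reasoning && resolvedCompletionsReasoningEffort)
    params.reasoning = { effort: resolvedCompletionsReasoningEffort };
  else if (resolvedCompletionsReasoningEffort && model.reasoning && compat.supportsReasoningEffort)
    params.reasoning_effort = resolvedCompletionsReasoningEffort;
  // No branch matches "qwen-chat-template", so params stays empty.
  return params;
}

const qwenModel = { reasoning: true };
const qwenCompat = { thinkingFormat: "qwen-chat-template" };

// Neither call emits a Qwen-specific field, whether thinking is off or on.
const thinkingOff = buildThinkingParams(qwenModel, qwenCompat, undefined);
const thinkingOn = buildThinkingParams(qwenModel, qwenCompat, "high");
console.log(thinkingOff, thinkingOn);
```

Both results are empty objects, which matches the reported symptom: the request carries no thinking toggle, so the model behaves as if the setting did not exist.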

Reproduction

  1. Serve Qwen/Qwen3.6-27B via vLLM on a self-hosted endpoint.
  2. Configure the OpenClaw provider:

     "5090voice": {
       "baseUrl": "http://<vllm-host>:1234/v1",
       "apiKey": "notneeded",
       "api": "openai-completions",
       "models": [{
         "id": "qwen/qwen3.6-27b",
         "reasoning": true,
         "compat": { "thinkingFormat": "qwen-chat-template" }
       }]
     }

  3. Use the model. It still generates thinking tokens.
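To confirm step 3 programmatically, one option is to inspect the assistant message returned by the endpoint. The sketch below is a heuristic only: the field names reasoning_content and reasoning, and the <think> tag convention, are assumptions that depend on the vLLM version and on whether a reasoning parser is enabled. The helper name hasThinking is hypothetical.

```javascript
// Heuristic: does a chat completions message contain thinking output?
// Field names and the <think> tag format are assumptions; adjust for your server.
function hasThinking(message) {
  const reasoning = [message.reasoning_content, message.reasoning]
    .find((value) => typeof value === "string" && value.trim() !== "");
  if (reasoning) return true;

  // Fall back to scanning the visible content for a non-empty <think> block.
  const content = typeof message.content === "string" ? message.content : "";
  const match = content.match(/<think>([\s\S]*?)<\/think>/);
  return match !== null && match[1].trim() !== "";
}

console.log(hasThinking({ role: "assistant", content: "<think>plan</think>Hello" }));
console.log(hasThinking({ role: "assistant", content: "Hello" }));
```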

Environment

  • OpenClaw version: 2026.4.9
  • vLLM version: 0.19.1rc1.dev278+ge64b39ea7
  • Model: Qwen/Qwen3.6-27B
  • Hardware: RTX 5090, 32GB VRAM


Problem to solve

Allow Qwen models hosted on vLLM to respect the /think flag in Discord and other channels.

Proposed solution

Add a branch in buildOpenAICompletionsParams to handle qwen-chat-template:

else if (compat.thinkingFormat === "qwen-chat-template" && model.reasoning) {
    params.chat_template_kwargs = {
        enable_thinking: !!resolvedCompletionsReasoningEffort,
        preserve_thinking: true
    };
}

Alternatively, the documented extra_body / extraBody parameter in model params should merge arbitrary fields into the outbound request body, but that feature does not appear to be implemented either (not referenced in the built code).
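For the extra_body / extraBody alternative, a merge step could look roughly like the sketch below. The function name applyExtraBody, the shallow-merge behavior, and the rule that user-supplied fields override generated ones are all assumptions for illustration; the issue does not say how OpenClaw intends this option to behave.

```javascript
// Hypothetical merge step (not OpenClaw code): `extraBody` stands for the
// user-configured object from model params.
function applyExtraBody(params, extraBody) {
  // Ignore anything that is not a plain object.
  if (!extraBody || typeof extraBody !== "object" || Array.isArray(extraBody)) {
    return params;
  }
  // Shallow merge; in this sketch, user-supplied fields win over generated ones.
  return { ...params, ...extraBody };
}

const merged = applyExtraBody(
  { model: "qwen/qwen3.6-27b", stream: true },
  { chat_template_kwargs: { enable_thinking: false } },
);
console.log(merged);
```

A generic merge like this would cover chat_template_kwargs without a dedicated branch, at the cost of the /think flag no longer controlling the value dynamically.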

Alternatives considered

No response

Impact

Affects anyone using OpenClaw with a Qwen model hosted on vLLM. It blocks voice workflows: with thinking forced on, real-time voice is impossible, so OpenClaw cannot be used for voice with this setup.

Evidence/examples

See the reproduction steps above. Only the openrouter and OpenAI thinking formats are handled; qwen-chat-template is not.

Additional information

Thanks :)

extent analysis

TL;DR

Add a branch in buildOpenAICompletionsParams to handle "qwen-chat-template" thinking format and inject chat_template_kwargs into the request body.

Guidance

  • Verify that the compat.thinkingFormat is correctly set to "qwen-chat-template" in the model configuration.
  • Add the proposed branch in buildOpenAICompletionsParams to handle the "qwen-chat-template" thinking format.
  • Test the updated code with the provided reproduction steps to ensure the chat_template_kwargs are correctly injected into the request body.
  • If the extra_body/extraBody parameter is intended to merge arbitrary fields into the outbound request body, consider implementing this feature as an alternative solution.

Example

else if (compat.thinkingFormat === "qwen-chat-template" && model.reasoning) {
    params.chat_template_kwargs = {
        enable_thinking: !!resolvedCompletionsReasoningEffort,
        preserve_thinking: true
    };
}
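To see the branch in context, the sketch below appends it to the two existing conditions quoted in the issue and exercises three cases. buildThinkingParamsPatched is a stand-in written for this example rather than the real buildOpenAICompletionsParams, and the "high" effort value is an assumption.

```javascript
// Stand-in builder with the proposed branch appended; not OpenClaw source.
function buildThinkingParamsPatched(model, compat, resolvedCompletionsReasoningEffort) {
  const params = {};
  if (compat.thinkingFormat === "openrouter" && model.reasoning && resolvedCompletionsReasoningEffort) {
    params.reasoning = { effort: resolvedCompletionsReasoningEffort };
  } else if (resolvedCompletionsReasoningEffort && model.reasoning && compat.supportsReasoningEffort) {
    params.reasoning_effort = resolvedCompletionsReasoningEffort;
  } else if (compat.thinkingFormat === "qwen-chat-template" && model.reasoning) {
    params.chat_template_kwargs = {
      enable_thinking: !!resolvedCompletionsReasoningEffort,
      preserve_thinking: true,
    };
  }
  return params;
}

const model = { reasoning: true };
const compat = { thinkingFormat: "qwen-chat-template" };

const off = buildThinkingParamsPatched(model, compat, undefined);
const on = buildThinkingParamsPatched(model, compat, "high");
// With supportsReasoningEffort set, the earlier branch runs instead.
const shadowed = buildThinkingParamsPatched(
  model, { ...compat, supportsReasoningEffort: true }, "high");
console.log(off, on, shadowed);
```

Because the new branch sits last in the else-if chain, it is skipped whenever compat.supportsReasoningEffort is truthy and an effort is resolved. If that combination can occur for Qwen providers, placing the qwen-chat-template check earlier in the chain may be worth considering.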

Notes

The proposed solution assumes that the resolvedCompletionsReasoningEffort variable is correctly set and available in the scope of the buildOpenAICompletionsParams function.

Recommendation

Apply the proposed fix by adding a branch in buildOpenAICompletionsParams that handles the "qwen-chat-template" thinking format; this is the most direct way to address the issue.
