openclaw - 💡(How to fix) Fix [Feature]: Support qwen-chat-template when hosted from vLLM -- accepted by schema but ignored -- blocks audio workflows because of thinking time. [1 participant]

openclaw, 2026-04-26 18:59:38

openclaw/openclaw#72329 (fetched 2026-04-27 05:31:30)

Author: stavrostzagadouris

Expected Behavior

When a self-hosted Qwen model has compat.thinkingFormat: "qwen-chat-template", OpenClaw should inject chat_template_kwargs: { enable_thinking: <bool> } into the chat completions request body, matching the official Qwen + vLLM documentation.

Actual Behavior

buildOpenAICompletionsParams in provider-stream-COLujAAo.js only handles two thinking formats:

"openrouter" → injects params.reasoning = { effort: ... }
Default (OpenAI) → injects params.reasoning_effort = ... (only when compat.supportsReasoningEffort is set)

There is no branch for "qwen-chat-template", so the value is silently ignored.

Root Cause

In provider-stream-COLujAAo.js:

if (compat.thinkingFormat === "openrouter" && model.reasoning && resolvedCompletionsReasoningEffort)
    params.reasoning = { effort: resolvedCompletionsReasoningEffort };
else if (resolvedCompletionsReasoningEffort && model.reasoning && compat.supportsReasoningEffort)
    params.reasoning_effort = resolvedCompletionsReasoningEffort;
// No handling for "qwen-chat-template"
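To see the gap in isolation, here is a minimal standalone sketch that mirrors only the two branches quoted above. The function name currentThinkingParams and the shapes of the model and compat objects are assumptions for illustration; the real buildOpenAICompletionsParams builds many more fields.

```javascript
// Hypothetical standalone mirror of the two thinking branches quoted above.
// Only the fields those branches read are modelled here.
function currentThinkingParams(model, compat, resolvedCompletionsReasoningEffort) {
  const params = {};
  if (compat.thinkingFormat === "openrouter" && model.reasoning && resolvedCompletionsReasoningEffort) {
    params.reasoning = { effort: resolvedCompletionsReasoningEffort };
  } else if (resolvedCompletionsReasoningEffort && model.reasoning && compat.supportsReasoningEffort) {
    params.reasoning_effort = resolvedCompletionsReasoningEffort;
  }
  // No handling for "qwen-chat-template": nothing thinking-related is added.
  return params;
}

const qwenModel = { id: "qwen/qwen3.6-27b", reasoning: true };
const qwenCompat = { thinkingFormat: "qwen-chat-template" };

// Whether thinking is requested off (no effort) or on ("high"), the body
// carries no thinking control at all.
console.log(currentThinkingParams(qwenModel, qwenCompat, undefined)); // {}
console.log(currentThinkingParams(qwenModel, qwenCompat, "high"));    // {}
```

With no thinking control in the body, vLLM renders the chat template with its default, which according to the issue leaves thinking on.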

The Zod schema (zod-schema.core-BO_PdpIg.js) accepts "qwen-chat-template" as a valid thinkingFormat, and pi-ai internally has a code path that handles it, but OpenClaw's own request builder never reaches that code or injects the necessary fields.

Reproduction

Serve Qwen/Qwen3.6-27B via vLLM on a self-hosted endpoint.
Configure the OpenClaw provider:

"5090voice": {
    "baseUrl": "http://<vllm-host>:1234/v1",
    "apiKey": "notneeded",
    "api": "openai-completions",
    "models": [{
        "id": "qwen/qwen3.6-27b",
        "reasoning": true,
        "compat": { "thinkingFormat": "qwen-chat-template" }
    }]
}

Use the model: it still generates thinking tokens.
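Before patching OpenClaw, it can help to confirm that the vLLM server itself honors the switch by calling it directly. This is a sketch under stated assumptions: Node 18+ (global fetch), vLLM's OpenAI-compatible /v1/chat/completions route accepting a top-level chat_template_kwargs field, and a Qwen chat template that reads enable_thinking as the Qwen + vLLM documentation describes. The helper names buildDirectRequest and askDirect are hypothetical. The issue's config uses apiKey "notneeded", so no Authorization header is sent; add one if your server was started with an API key.

```javascript
// Hypothetical helper: builds a direct chat-completions request to vLLM,
// bypassing OpenClaw, with the Qwen thinking switch set explicitly.
function buildDirectRequest(baseUrl, modelId, prompt, enableThinking) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: modelId,
        messages: [{ role: "user", content: prompt }],
        // Extra variables handed to the chat template when the prompt is rendered.
        chat_template_kwargs: { enable_thinking: enableThinking },
      }),
    },
  };
}

// Sends the request with Node 18+ global fetch and returns the first choice's message.
async function askDirect(baseUrl, modelId, prompt, enableThinking) {
  const { url, init } = buildDirectRequest(baseUrl, modelId, prompt, enableThinking);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`vLLM returned HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message;
}
```

For example, askDirect("http://<vllm-host>:1234/v1", "qwen/qwen3.6-27b", "Say hi", false).then(console.log). If the reply comes back without a reasoning block when enable_thinking is false, the server side works and the problem is isolated to the request body OpenClaw builds.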

Environment

OpenClaw version: 2026.4.9
vLLM version: 0.19.1rc1.dev278+ge64b39ea7
Model: Qwen/Qwen3.6-27B
Hardware: RTX 5090, 32 GB VRAM

Problem to solve

Allow Qwen models hosted on vLLM to respect the /think flag in Discord and other channels.

Proposed solution

Add a branch in buildOpenAICompletionsParams to handle qwen-chat-template:

else if (compat.thinkingFormat === "qwen-chat-template" && model.reasoning) {
    params.chat_template_kwargs = {
        enable_thinking: !!resolvedCompletionsReasoningEffort,
        preserve_thinking: true
    };
}
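The proposed branch above can be exercised outside OpenClaw with a small standalone sketch. The function name patchedThinkingParams is hypothetical and only the thinking-related fields are modelled. One design choice differs from a plain append: the Qwen branch is placed before the generic reasoning_effort fallback, because if it came after, a model that also set compat.supportsReasoningEffort would take the reasoning_effort path whenever an effort is resolved. preserve_thinking is kept from the proposal; whether it has any effect depends on the served chat template.

```javascript
// Hypothetical standalone sketch of the proposed fix. Only thinking-related
// fields are built here; the real builder sets many more.
function patchedThinkingParams(model, compat, resolvedCompletionsReasoningEffort) {
  const params = {};
  if (compat.thinkingFormat === "openrouter" && model.reasoning && resolvedCompletionsReasoningEffort) {
    params.reasoning = { effort: resolvedCompletionsReasoningEffort };
  } else if (compat.thinkingFormat === "qwen-chat-template" && model.reasoning) {
    // Any resolved effort turns thinking on; no effort turns it off.
    params.chat_template_kwargs = {
      enable_thinking: !!resolvedCompletionsReasoningEffort,
      preserve_thinking: true, // from the issue's proposal; effect depends on the template
    };
  } else if (resolvedCompletionsReasoningEffort && model.reasoning && compat.supportsReasoningEffort) {
    params.reasoning_effort = resolvedCompletionsReasoningEffort;
  }
  return params;
}

const qwen = { id: "qwen/qwen3.6-27b", reasoning: true };
const qwenCompat = { thinkingFormat: "qwen-chat-template" };

console.log(JSON.stringify(patchedThinkingParams(qwen, qwenCompat, undefined)));
// {"chat_template_kwargs":{"enable_thinking":false,"preserve_thinking":true}}
console.log(JSON.stringify(patchedThinkingParams(qwen, qwenCompat, "high")));
// {"chat_template_kwargs":{"enable_thinking":true,"preserve_thinking":true}}
```

This only turns thinking off if resolvedCompletionsReasoningEffort is falsy when /think disables thinking; a value such as the string "off" would still coerce to true.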

Alternatively, the documented extra_body / extraBody parameter in model params should merge arbitrary fields into the outbound request body, but that feature does not appear to be implemented either (not referenced in the built code).
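If extraBody were implemented, the merge step could look roughly like the sketch below. Everything here is an assumption: the helper name mergeExtraBody, that the option arrives as a plain object from model params, and the rule that extraBody wins on conflicts with one level of object merging so it can extend an existing chat_template_kwargs instead of replacing it.

```javascript
// Hypothetical sketch of the documented-but-unimplemented extraBody option:
// user-supplied fields are merged into the outbound request body last.
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

function mergeExtraBody(params, extraBody) {
  if (!isPlainObject(extraBody)) return params;
  const merged = { ...params };
  for (const [key, value] of Object.entries(extraBody)) {
    // One level of object merging: nested objects are combined key by key,
    // everything else from extraBody overwrites the existing value.
    merged[key] = isPlainObject(value) && isPlainObject(merged[key])
      ? { ...merged[key], ...value }
      : value;
  }
  return merged;
}

const body = mergeExtraBody(
  { model: "qwen/qwen3.6-27b", messages: [], stream: true },
  { chat_template_kwargs: { enable_thinking: false } },
);
console.log(JSON.stringify(body));
// {"model":"qwen/qwen3.6-27b","messages":[],"stream":true,"chat_template_kwargs":{"enable_thinking":false}}
```

A static extraBody pins enable_thinking to one value for the model, so it could not follow per-message /think toggles; the dedicated qwen-chat-template branch can.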

Alternatives considered

No response

Impact

This affects anyone using OpenClaw with a Qwen model hosted on vLLM. It blocks any kind of voice workflow: thinking is always forced on, which makes real-time voice impossible, so OpenClaw cannot be used for voice with this setup.

Evidence/examples

See the reproduction steps above. Only the openrouter and OpenAI (default) thinking formats are handled; qwen-chat-template is not.

Additional information

Thanks :)

extent analysis

TL;DR

Add a branch in buildOpenAICompletionsParams to handle "qwen-chat-template" thinking format and inject chat_template_kwargs into the request body.

Guidance

Verify that compat.thinkingFormat is set to "qwen-chat-template" in the model configuration.
Add the proposed branch in buildOpenAICompletionsParams to handle the "qwen-chat-template" thinking format.
Test the updated code with the provided reproduction steps to ensure the chat_template_kwargs are correctly injected into the request body.
If the extra_body/extraBody parameter is intended to merge arbitrary fields into the outbound request body, consider implementing this feature as an alternative solution.
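For the testing step above, one low-effort way to see what OpenClaw actually sends is to point the provider's baseUrl at a throwaway local server that logs each request body. This is a sketch, not part of OpenClaw: the helper names, port 4321, and the choice to reply with an HTTP 500 (so the client fails fast instead of waiting for a completion) are assumptions. It runs as a CommonJS script under Node.

```javascript
const http = require("node:http");

// Pulls out only the thinking-related fields discussed in this issue.
function summarizeThinkingFields(rawBody) {
  const body = JSON.parse(rawBody);
  return {
    model: body.model,
    chat_template_kwargs: body.chat_template_kwargs ?? null,
    reasoning: body.reasoning ?? null,
    reasoning_effort: body.reasoning_effort ?? null,
  };
}

// Starts a local server that logs each request's thinking fields and answers
// with an error, since it is only meant for inspecting the outbound body.
function startInspector(port = 4321) {
  const server = http.createServer((req, res) => {
    let raw = "";
    req.on("data", (chunk) => { raw += chunk; });
    req.on("end", () => {
      try {
        console.log(req.method, req.url, summarizeThinkingFields(raw));
      } catch {
        console.log(req.method, req.url, "(non-JSON body)");
      }
      res.writeHead(500, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: { message: "inspector only" } }));
    });
  });
  server.listen(port, "127.0.0.1");
  return server;
}
```

Call startInspector() from a small script, set the provider's baseUrl to http://127.0.0.1:4321/v1, send one message with thinking disabled via /think, and check that the logged summary shows chat_template_kwargs with enable_thinking: false.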

Example

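if (compat.thinkingFormat === "openrouter" && model.reasoning && resolvedCompletionsReasoningEffort) {
    params.reasoning = { effort: resolvedCompletionsReasoningEffort };
} else if (compat.thinkingFormat === "qwen-chat-template" && model.reasoning) {
    // Qwen chat templates read enable_thinking from chat_template_kwargs;
    // checked before the generic reasoning_effort fallback so it takes precedence.
    params.chat_template_kwargs = {
        enable_thinking: !!resolvedCompletionsReasoningEffort,
        preserve_thinking: true
    };
} else if (resolvedCompletionsReasoningEffort && model.reasoning && compat.supportsReasoningEffort) {
    params.reasoning_effort = resolvedCompletionsReasoningEffort;
}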

Notes

The proposed solution assumes that the resolvedCompletionsReasoningEffort variable is correctly set and available in the scope of the buildOpenAICompletionsParams function.

Recommendation

Apply the proposed workaround by adding the branch in buildOpenAICompletionsParams to handle the "qwen-chat-template" thinking format, as this is the most straightforward solution to address the issue.
