openclaw - ✅(Solved) Fix feat: Add vLLM reasoning/thinking toggle support (enable_thinking parameter) [1 pull requests, 1 comments, 1 participants]

GitHub stats
openclaw/openclaw#54983 (fetched 2026-04-08 01:33:55)
Comments: 1 · Participants: 1 · Timeline: 3 · Reactions: 2
Timeline (top): cross-referenced ×2, commented ×1

Fix Action

Fix / Workaround

Suggested: Pass chat_template_kwargs: { enable_thinking: boolean } in OpenAI-compatible requests, similar to how Claude reasoning is handled in sessions-patch.ts

PR fix notes

PR #56674: feat(openresponses): return reasoning/thinking content in /v1/responses output

Description (problem / solution / changelog)

Summary

  • Problem: The /v1/responses endpoint does not return model reasoning/thinking content. When models produce chain-of-thought reasoning (Anthropic Claude, OpenAI o-series, Gemini thinking, Ollama thinking models), the reasoning output is silently discarded by the HTTP API. Callers have no way to observe model reasoning through the OpenResponses interface.
  • Why it matters: The Open Responses spec defines type: "reasoning" as a first-class output item (ReasoningBody) and response.reasoning.delta/response.reasoning.done as streaming events. Not implementing these means OpenClaw's OpenResponses endpoint is incomplete relative to the spec, and API callers (dashboards, orchestrators, observability tools) cannot access reasoning data.
  • What changed: The endpoint now captures reasoning content from agent thinking events and returns it as type: "reasoning" output items (non-stream) or response.reasoning.delta/response.reasoning.done SSE events (stream). When no assistant text is produced but reasoning exists, the reasoning text is used as a fallback response to avoid empty replies from thinking-only models.
  • What did NOT change (scope boundary): No config changes required. No new dependencies. Existing behavior for non-reasoning models is unchanged. The onReasoningStream callback path for channel-specific reasoning delivery (Discord, Telegram, etc.) is unmodified. This change only adds reasoning to the HTTP API surface.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Related #54496
  • Related #44513
  • Related #8364
  • Partially addresses #31449 (OpenResponses reasoning output/events only; full tool I/O/context parity still pending)
  • This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

N/A — this is a new feature implementing an existing spec item.

Regression Test Plan (if applicable)

N/A

User-visible / Behavior Changes

Non-streaming responses

When reasoning is produced, the response output array now includes a type: "reasoning" item before the assistant message:

{
  "output": [
    {
      "type": "reasoning",
      "id": "reasoning_abc123",
      "content": "Let me think about this step by step..."
    },
    {
      "type": "message",
      "role": "assistant",
      "content": [{ "type": "output_text", "text": "The answer is 42." }]
    }
  ]
}
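This TypeScript sketch shows how an API consumer might split a non-streaming body like the one above into reasoning text and answer text. The type names and the `splitOutput` helper are hypothetical. The field names (`type`, `id`, `content`, `output_text`, `text`) are copied from the example payload, not from the published schema.

```typescript
// Hypothetical client-side types mirroring the example payload above.
type ReasoningItem = { type: "reasoning"; id: string; content: string };
type OutputText = { type: "output_text"; text: string };
type MessageItem = { type: "message"; role: "assistant"; content: OutputText[] };
type OutputItem = ReasoningItem | MessageItem;

interface ResponsesBody {
  output: OutputItem[];
}

// Split a non-streaming /v1/responses body into reasoning and answer text.
// Items of any other type (e.g. tool calls) are skipped at runtime.
function splitOutput(body: ResponsesBody): { reasoning: string; answer: string } {
  const reasoning: string[] = [];
  const answer: string[] = [];
  for (const item of body.output) {
    if (item.type === "reasoning") {
      reasoning.push(item.content);
    } else if (item.type === "message") {
      for (const part of item.content) {
        if (part.type === "output_text") answer.push(part.text);
      }
    }
  }
  return { reasoning: reasoning.join("\n"), answer: answer.join("") };
}
```

Callers that do not need reasoning can ignore the `reasoning` field. This matches the mitigation listed under Risks.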

Streaming responses

Two new SSE event types are emitted during streaming:

  • response.reasoning.delta — cumulative reasoning text as it streams
  • response.reasoning.done — final reasoning text when complete
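Below is a consumer sketch for these two events, under stated assumptions:

- The event names come from the list above.
- The payload field that holds the text (`text` here) is a guess.
- `parseSse` and `collectReasoning` are hypothetical helpers that work on a fully buffered stream.

Because the delta is described as cumulative, each event replaces the stored reasoning instead of being appended to it.

```typescript
// Assumed payload shape; check the real event schema before relying on `text`.
interface ReasoningEventData {
  text?: string;
}

interface SseEvent {
  event: string;
  data: string;
}

// Minimal SSE parser for a buffered stream: events are separated by blank
// lines; each has an optional "event:" line and one or more "data:" lines.
function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      // Per the SSE format, a single leading space after "data:" is dropped.
      else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}

// Deltas are cumulative, so the latest value wins; the .done event is final.
function collectReasoning(raw: string): string {
  let reasoning = "";
  for (const { event, data } of parseSse(raw)) {
    if (event !== "response.reasoning.delta" && event !== "response.reasoning.done") continue;
    const payload = JSON.parse(data) as ReasoningEventData;
    if (typeof payload.text === "string") reasoning = payload.text;
  }
  return reasoning;
}
```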

Fallback behavior

When a model produces reasoning but no assistant text (common with thinking-only model configurations), the reasoning content is used as the response text instead of returning "No response from OpenClaw."

Diagram (if applicable)

Model produces reasoning + answer:

  [thinking events] ──→ collect reasoning ──→ output[0]: reasoning item
  [assistant events] ──→ collect text     ──→ output[1]: assistant message

Model produces reasoning only (no assistant text):

  [thinking events] ──→ collect reasoning ──→ output[0]: reasoning item
  (no assistant text)                     ──→ output[1]: assistant = reasoning fallback
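The assembly and fallback rules in the diagram can be sketched as one pure function. This is an illustration, not the code in `openresponses-http.ts`.

- `buildOutput`, its parameters and the `id` value are assumptions.
- The fallback string is the one quoted in the PR.

```typescript
// Hypothetical output item shapes, mirroring the non-streaming example.
type OutputItem =
  | { type: "reasoning"; id: string; content: string }
  | { type: "message"; role: "assistant"; content: { type: "output_text"; text: string }[] };

const NO_RESPONSE = "No response from OpenClaw.";

// Reasoning goes first (only if any was captured), then the assistant message.
// If there is reasoning but no assistant text, the reasoning becomes the reply.
function buildOutput(reasoning: string, assistantText: string, reasoningId: string): OutputItem[] {
  const output: OutputItem[] = [];
  if (reasoning.length > 0) {
    output.push({ type: "reasoning", id: reasoningId, content: reasoning });
  }
  const text =
    assistantText.length > 0 ? assistantText : reasoning.length > 0 ? reasoning : NO_RESPONSE;
  output.push({ type: "message", role: "assistant", content: [{ type: "output_text", text }] });
  return output;
}
```

Keeping this logic pure means each branch in the diagram can be unit-tested without a running model.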

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No — reasoning content was already produced by the model, it was just being discarded

Repro + Verification

Environment

  • OS: macOS 15.4 (Apple Silicon)
  • Runtime: Node 22, Bun 1.2
  • Model: any reasoning-capable model (Claude, o-series, Gemini thinking)

Steps

  1. Send a POST to /v1/responses with a reasoning-capable model
  2. Observe the response output array
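Here is a small request builder for step 1. It assumes:

- the standard Open Responses request shape (`model` and `input` in a JSON body);
- an optional bearer token;
- the global `Request` available in Node 18+ (the PR was tested on Node 22).

The base URL, token and model name are placeholders. `buildResponsesRequest` is a hypothetical helper.

```typescript
// Build (but do not send) a POST to /v1/responses.
// Note: the absolute path replaces any path component already in baseUrl.
function buildResponsesRequest(baseUrl: string, model: string, input: string, token?: string): Request {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (token) headers["Authorization"] = `Bearer ${token}`;
  return new Request(new URL("/v1/responses", baseUrl), {
    method: "POST",
    headers,
    body: JSON.stringify({ model, input, stream: false }),
  });
}

// Usage against a running gateway (placeholder URL and model name):
// const res = await fetch(buildResponsesRequest("http://localhost:8080", "your-reasoning-model", "What is 6 x 7?"));
// const body = await res.json();
// console.log(body.output.map((item: { type: string }) => item.type)); // expect "reasoning" before "message"
```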

Expected

Reasoning content appears as type: "reasoning" output item.

Actual

Same as expected.

Evidence

  • pnpm build green
  • pnpm check green (lint, format, types)
  • src/gateway/openresponses-http.test.ts — 12/12 tests pass
  • src/agents/pi-embedded-subscribe.* — 21/30 pass (9 pre-existing upstream failures, verified same count on clean upstream/main)

Human Verification (required)

  • Verified: non-stream reasoning collection, stream reasoning events, fallback from thinking to text, cleanup on error
  • Edge cases: no reasoning produced (no-op), reasoning-only response (fallback), tool-call + reasoning combined output
  • Not verified: production load, all provider-specific thinking formats

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes — adds new output items, does not remove or change existing ones
  • Config/env changes? No
  • Migration needed? No

Risks and Mitigations

  • Risk: reasoning content could be large (10k+ chars) for complex prompts, increasing response payload size
    • Mitigation: This mirrors the model's actual output; callers can ignore the reasoning item if not needed
  • Risk: double emitAgentEvent calls (raw + formatted) for channel-streaming reasoning mode
    • Mitigation: The raw-only event fires first for HTTP capture; the formatted event fires only when streamReasoning is active. HTTP consumers filter by rawText presence.
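The `rawText` filter in the second mitigation might look like this on the HTTP side. `ThinkingEvent` and `reasoningForHttp` are hypothetical names. The only detail taken from the PR is that HTTP capture keeps just the events that carry `rawText`, so the formatted duplicate is never counted twice.

```typescript
// Hypothetical event shape for thinking events.
interface ThinkingEvent {
  text?: string; // formatted text, used for channel streaming
  rawText?: string; // unformatted text, present on the raw-only event
}

// Keep only raw-only events; formatted duplicates (no rawText) are dropped.
function reasoningForHttp(events: ThinkingEvent[]): string[] {
  return events.flatMap((e) => (typeof e.rawText === "string" ? [e.rawText] : []));
}
```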

Made with Cursor

Changed files

  • src/agents/pi-embedded-subscribe.handlers.messages.ts (modified, +16/-8)
  • src/agents/pi-embedded-subscribe.handlers.types.ts (modified, +1/-0)
  • src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts (modified, +45/-0)
  • src/agents/pi-embedded-subscribe.ts (modified, +26/-9)
  • src/gateway/open-responses.schema.ts (modified, +15/-1)
  • src/gateway/openresponses-http.test.ts (modified, +131/-0)
  • src/gateway/openresponses-http.ts (modified, +182/-15)
  • src/gateway/server-chat.agent-events.test.ts (modified, +41/-0)
  • src/gateway/server-chat.ts (modified, +3/-2)
Issue description

Add support for toggling reasoning/thinking on vLLM-compatible models (e.g., Nemotron with reasoning parser enabled).

Currently, vLLM models have reasoning on by default with no way to toggle it. Claude models support /reasoning on|off, but vLLM models have no equivalent.

Suggested: Pass chat_template_kwargs: { enable_thinking: boolean } in OpenAI-compatible requests, similar to how Claude reasoning is handled in sessions-patch.ts

extent analysis

Fix Plan

To add support for toggling reasoning on vLLM-compatible models, we will modify the request payload to include a chat_template_kwargs object with an enable_thinking boolean property.

Steps

  • Update sessions-patch.ts to include the enable_thinking property in the chat_template_kwargs object.
  • Modify the API request to accept an enable_thinking parameter and pass it through in chat_template_kwargs.

Example Code

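// sessions-patch.ts
// enableThinking is a boolean, e.g. derived from the session's /reasoning on|off setting.
const chatTemplateKwargs = {
  enable_thinking: enableThinking,
};

// API request (assumes `client` is an openai-node v4+ client whose baseURL points
// at the vLLM OpenAI-compatible server; `otherParams` holds model, messages, etc.)
const response = await client.chat.completions.create({
  ...otherParams,
  // chat_template_kwargs is a vLLM extension, not part of the OpenAI request types.
  // openai-node sends unknown body fields as-is, so only the type check is silenced.
  // @ts-expect-error vLLM-specific request parameter
  chat_template_kwargs: chatTemplateKwargs,
});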

In the API request, enableThinking can be set to true or false to toggle reasoning on or off.
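A real patch should handle two details that the example leaves open:

- Existing `chat_template_kwargs` should not be overwritten.
- An unset toggle should leave the model's default alone.

Here is a hedged sketch; `ChatBody` and `withThinkingToggle` are hypothetical. `enable_thinking` only has an effect if the served model's chat template reads that variable.

```typescript
// Hypothetical OpenAI-compatible request body for a vLLM backend.
type ChatBody = {
  model: string;
  messages: { role: string; content: string }[];
  chat_template_kwargs?: Record<string, unknown>;
  [key: string]: unknown;
};

// Return a new body with enable_thinking merged into chat_template_kwargs.
// Existing kwargs are preserved; the input body is not mutated.
function withThinkingToggle(body: ChatBody, enableThinking: boolean | undefined): ChatBody {
  if (enableThinking === undefined) return body; // no toggle set: keep the model default
  return {
    ...body,
    chat_template_kwargs: { ...body.chat_template_kwargs, enable_thinking: enableThinking },
  };
}
```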

Verification

To verify the fix, send the same request once with enable_thinking set to true and once with it set to false. Confirm that reasoning output appears in the response only when the flag is true.
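One way to automate that check is to compare the toggle with whether the response message carries parsed reasoning. This assumes vLLM's reasoning parser is enabled and exposes the thinking text as `message.reasoning_content`. Some newer vLLM releases may use `message.reasoning` instead, so both fields are checked. The helper names are hypothetical.

```typescript
// Assumed subset of a chat completion message returned by vLLM.
interface ChatMessage {
  content: string | null;
  reasoning_content?: string | null;
  reasoning?: string | null;
}

// True when the message carries non-empty parsed reasoning in either field.
function hasReasoning(msg: ChatMessage): boolean {
  const r = msg.reasoning_content ?? msg.reasoning;
  return typeof r === "string" && r.trim().length > 0;
}

// The toggle works if reasoning is present exactly when it was requested.
function checkToggle(enableThinking: boolean, msg: ChatMessage): boolean {
  return hasReasoning(msg) === enableThinking;
}
```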

Extra Tips

  • Make sure to update the API documentation to reflect the new enable_thinking parameter.
  • Consider adding error handling to ensure that the enable_thinking parameter is a valid boolean value.
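For the validation tip, here is a hedged sketch of a strict parser. It accepts either a boolean or the `on`/`off` argument of a `/reasoning` command and throws on anything else rather than silently coercing it. `parseEnableThinking` is a hypothetical name.

```typescript
// Accept a boolean or "on"/"off" (case-insensitive); reject everything else.
function parseEnableThinking(value: unknown): boolean {
  if (typeof value === "boolean") return value;
  if (typeof value === "string") {
    const v = value.trim().toLowerCase();
    if (v === "on") return true;
    if (v === "off") return false;
  }
  throw new TypeError(`enable_thinking must be a boolean or "on"/"off", got ${JSON.stringify(value)}`);
}
```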
