openclaw - ✅(Solved) Fix feat: Add vLLM reasoning/thinking toggle support (enable_thinking parameter) [1 pull requests, 1 comments, 1 participants]

dennis-lynch · 2026-03-26T07:02:33Z



GitHub stats

openclaw/openclaw#54983 • Fetched 2026-04-08 01:33:55


Author

dennis-lynch

Participants

dennis-lynch

Timeline (top)

cross-referenced ×2, commented ×1

Fix Action

Fix / Workaround

Suggested: Pass chat_template_kwargs: { enable_thinking: boolean } in OpenAI-compatible requests, similar to how Claude reasoning is handled in sessions-patch.ts

PR fix notes

PR #56674: feat(openresponses): return reasoning/thinking content in /v1/responses output

Repository: openclaw/openclaw
Author: tonga54
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/56674

Description (problem / solution / changelog)

Summary

Problem: The /v1/responses endpoint does not return model reasoning/thinking content. When models produce chain-of-thought reasoning (Anthropic Claude, OpenAI o-series, Gemini thinking, Ollama thinking models), the reasoning output is silently discarded by the HTTP API. Callers have no way to observe model reasoning through the OpenResponses interface.
Why it matters: The Open Responses spec defines type: "reasoning" as a first-class output item (ReasoningBody) and response.reasoning.delta/response.reasoning.done as streaming events. Not implementing these means OpenClaw's OpenResponses endpoint is incomplete relative to the spec, and API callers (dashboards, orchestrators, observability tools) cannot access reasoning data.
What changed: The endpoint now captures reasoning content from agent thinking events and returns it as type: "reasoning" output items (non-stream) or response.reasoning.delta/response.reasoning.done SSE events (stream). When no assistant text is produced but reasoning exists, the reasoning text is used as a fallback response to avoid empty replies from thinking-only models.
What did NOT change (scope boundary): No config changes required. No new dependencies. Existing behavior for non-reasoning models is unchanged. The onReasoningStream callback path for channel-specific reasoning delivery (Discord, Telegram, etc.) is unmodified. This change only adds reasoning to the HTTP API surface.

Change Type (select all)

Feature

Scope (select all touched areas)

Gateway / orchestration
API / contracts

Linked Issue/PR

Related #54496
Related #44513
Related #8364
Partially addresses #31449 (OpenResponses reasoning output/events only; full tool I/O/context parity still pending)
This PR fixes a bug or regression: No (box left unchecked)

Root Cause / Regression History (if applicable)

N/A — this is a new feature implementing an existing spec item.

Regression Test Plan (if applicable)

N/A

User-visible / Behavior Changes

Non-streaming responses

When reasoning is produced, the response output array now includes a type: "reasoning" item before the assistant message:

{
  "output": [
    {
      "type": "reasoning",
      "id": "reasoning_abc123",
      "content": "Let me think about this step by step..."
    },
    {
      "type": "message",
      "role": "assistant",
      "content": [{ "type": "output_text", "text": "The answer is 42." }]
    }
  ]
}
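A consumer could separate the two item kinds roughly as follows. This is a minimal sketch: the types are inferred only from the example payload above, and `splitOutput` is a hypothetical helper, not part of OpenClaw or the Open Responses spec.

```typescript
// Shapes inferred from the example payload; the real schema may carry more fields.
type ReasoningItem = { type: "reasoning"; id: string; content: string };
type OutputText = { type: "output_text"; text: string };
type MessageItem = { type: "message"; role: "assistant"; content: OutputText[] };
type OutputItem = ReasoningItem | MessageItem;

interface ResponseBody {
  output: OutputItem[];
}

// Hypothetical helper: collects reasoning strings and joins assistant text parts.
function splitOutput(body: ResponseBody): { reasoning: string[]; text: string } {
  const reasoning: string[] = [];
  const parts: string[] = [];
  for (const item of body.output) {
    if (item.type === "reasoning") {
      reasoning.push(item.content);
    } else {
      for (const part of item.content) {
        parts.push(part.text);
      }
    }
  }
  return { reasoning, text: parts.join("") };
}

const exampleBody: ResponseBody = {
  output: [
    { type: "reasoning", id: "reasoning_abc123", content: "Let me think about this step by step..." },
    { type: "message", role: "assistant", content: [{ type: "output_text", text: "The answer is 42." }] },
  ],
};

const split = splitOutput(exampleBody);
console.log(split.reasoning.length, split.text);
```

Callers that do not need reasoning can simply ignore the `reasoning` array, which matches the mitigation described under Risks and Mitigations.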

Streaming responses

Two new SSE event types are emitted during streaming:

response.reasoning.delta — cumulative reasoning text as it streams
response.reasoning.done — final reasoning text when complete
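Because the delta event is described as cumulative, a client would replace its buffer on each event rather than append to it. The sketch below illustrates that point only. The payload shape (`type` plus `text`) is an assumption, since this page names the event types but does not show their fields, and `applyReasoningEvent` is a hypothetical helper.

```typescript
// Assumed payload shape; only the two event type names come from the PR description.
interface ReasoningEvent {
  type: "response.reasoning.delta" | "response.reasoning.done";
  text: string;
}

interface ReasoningState {
  text: string;
  done: boolean;
}

// Each event carries the full reasoning text so far, so the new text overwrites the old.
function applyReasoningEvent(state: ReasoningState, event: ReasoningEvent): ReasoningState {
  if (event.type === "response.reasoning.done") {
    return { text: event.text, done: true };
  }
  return { text: event.text, done: false };
}

const sampleEvents: ReasoningEvent[] = [
  { type: "response.reasoning.delta", text: "Let me" },
  { type: "response.reasoning.delta", text: "Let me think" },
  { type: "response.reasoning.done", text: "Let me think." },
];

const initialState: ReasoningState = { text: "", done: false };
const finalState = sampleEvents.reduce(applyReasoningEvent, initialState);
console.log(finalState.text, finalState.done);
```

Appending cumulative deltas would duplicate text ("Let meLet me think"), which is why the reducer overwrites.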

Fallback behavior

When a model produces reasoning but no assistant text (common with thinking-only model configurations), the reasoning content is used as the response text instead of returning "No response from OpenClaw."
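The fallback order described here can be sketched as a small pure function. The function name, the whitespace trimming, and the constant are illustrative assumptions; the PR's actual implementation in `src/gateway/openresponses-http.ts` is not shown on this page.

```typescript
// Placeholder text quoted in the PR description for the empty-reply case.
const EMPTY_REPLY = "No response from OpenClaw.";

// Hypothetical helper: prefer assistant text, then reasoning, then the placeholder.
// Treating whitespace-only strings as empty is an assumption of this sketch.
function resolveReplyText(assistantText: string, reasoningText: string): string {
  if (assistantText.trim().length > 0) return assistantText;
  if (reasoningText.trim().length > 0) return reasoningText;
  return EMPTY_REPLY;
}

console.log(resolveReplyText("", "Step 1: consider the inputs."));
```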

Diagram (if applicable)

Model produces reasoning + answer:

  [thinking events] ──→ collect reasoning ──→ output[0]: reasoning item
  [assistant events] ──→ collect text     ──→ output[1]: assistant message

Model produces reasoning only (no assistant text):

  [thinking events] ──→ collect reasoning ──→ output[0]: reasoning item
  (no assistant text)                     ──→ output[1]: assistant = reasoning fallback

Security Impact (required)

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No — reasoning content was already produced by the model, it was just being discarded

Repro + Verification

Environment

OS: macOS 15.4 (Apple Silicon)
Runtime: Node 22, Bun 1.2
Model: any reasoning-capable model (Claude, o-series, Gemini thinking)

Steps

Send a POST to /v1/responses with a reasoning-capable model
Observe the response output array
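The two steps could be scripted roughly like this. The gateway address, the model name, and the `model`/`input` body fields are assumptions (the body follows the general Responses-style request shape), and any authentication the gateway requires is omitted.

```typescript
interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the POST described in step 1 without sending it.
function buildResponsesRequest(baseUrl: string, model: string, input: string): PreparedRequest {
  return {
    url: new URL("/v1/responses", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, input }),
    },
  };
}

// Placeholder values; substitute the real gateway address and a reasoning-capable model.
const prepared = buildResponsesRequest("http://localhost:3000", "your-reasoning-model", "What is 6 x 7?");
console.log(prepared.url);

// Step 2, once a gateway is running:
//   const res = await fetch(prepared.url, prepared.init);
//   const body = await res.json();
//   console.log(body.output);
```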

Expected

Reasoning content appears as a type: "reasoning" output item.

Actual

Same as expected.

Evidence

pnpm build green
pnpm check green (lint, format, types)
src/gateway/openresponses-http.test.ts — 12/12 tests pass
src/agents/pi-embedded-subscribe.* — 21/30 pass (9 pre-existing upstream failures, verified same count on clean upstream/main)

Human Verification (required)

Verified: non-stream reasoning collection, stream reasoning events, fallback from thinking to text, cleanup on error
Edge cases: no reasoning produced (no-op), reasoning-only response (fallback), tool-call + reasoning combined output
Not verified: production load, all provider-specific thinking formats

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? Yes — adds new output items, does not remove or change existing ones
Config/env changes? No
Migration needed? No

Risks and Mitigations

Risk: reasoning content could be large (10k+ chars) for complex prompts, increasing response payload size
- Mitigation: This mirrors the model's actual output; callers can ignore the reasoning item if not needed
Risk: double emitAgentEvent calls (raw + formatted) for channel-streaming reasoning mode
- Mitigation: The raw-only event fires first for HTTP capture; the formatted event fires only when streamReasoning is active. HTTP consumers filter by rawText presence.

Made with Cursor

Changed files

src/agents/pi-embedded-subscribe.handlers.messages.ts (modified, +16/-8)
src/agents/pi-embedded-subscribe.handlers.types.ts (modified, +1/-0)
src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts (modified, +45/-0)
src/agents/pi-embedded-subscribe.ts (modified, +26/-9)
src/gateway/open-responses.schema.ts (modified, +15/-1)
src/gateway/openresponses-http.test.ts (modified, +131/-0)
src/gateway/openresponses-http.ts (modified, +182/-15)
src/gateway/server-chat.agent-events.test.ts (modified, +41/-0)
src/gateway/server-chat.ts (modified, +3/-2)

RAW_BUFFER

Add support for toggling reasoning/thinking on vLLM-compatible models (e.g., Nemotron with reasoning parser enabled).

Currently, vLLM models have reasoning ON by default with no way to toggle it. Claude models support /reasoning on|off, but vLLM has no equivalent.

Suggested: Pass chat_template_kwargs: { enable_thinking: boolean } in OpenAI-compatible requests, similar to how Claude reasoning is handled in sessions-patch.ts

extent analysis

Fix Plan

To add support for toggling reasoning on vLLM-compatible models, we will modify the request payload to include a chat_template_kwargs object with an enable_thinking boolean property.

Steps

Update the sessions-patch.ts file to include the enable_thinking property in the chat_template_kwargs object.
Modify the API request to accept an enable_thinking parameter and pass it through in the chat_template_kwargs object.

Example Code

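// sessions-patch.ts (sketch: assumes the openai v4+ Node SDK and a vLLM server;
// VLLM_BASE_URL and the model name are placeholders, not OpenClaw settings)
import OpenAI from "openai";

const openai = new OpenAI({ baseURL: process.env.VLLM_BASE_URL, apiKey: "EMPTY" });
const enableThinking = false; // boolean toggle: true = reasoning on, false = off
const otherParams = {
  model: "your-vllm-model",
  messages: [{ role: "user" as const, content: "Hello" }],
};

// chat_template_kwargs is a vLLM extension that the SDK types do not declare,
// so the request body is cast before sending.
const response = await openai.chat.completions.create({
  ...otherParams,
  chat_template_kwargs: { enable_thinking: enableThinking },
} as OpenAI.Chat.ChatCompletionCreateParamsNonStreaming);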

In the API request, enableThinking can be set to true or false to toggle reasoning on or off.

Verification

To verify the fix, test the API request with enable_thinking set to both true and false and check the response to ensure that reasoning is toggled correctly.

Extra Tips

Make sure to update the API documentation to reflect the new enable_thinking parameter.
Consider adding error handling to ensure that the enable_thinking parameter is a valid boolean value.
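The validation tip could look something like the sketch below. `withThinkingToggle` and the `ChatRequestBody` type are hypothetical names for illustration. The only field taken from the issue is `chat_template_kwargs: { enable_thinking: boolean }`, and leaving the field out when no value is supplied is an assumption about the desired default behavior.

```typescript
interface ChatRequestBody {
  model: string;
  messages: { role: string; content: string }[];
  chat_template_kwargs?: { enable_thinking: boolean };
}

// Hypothetical helper: validates the toggle and returns a new body carrying it.
function withThinkingToggle(body: ChatRequestBody, enableThinking: unknown): ChatRequestBody {
  // No value supplied: leave the body untouched so the server default applies.
  if (enableThinking === undefined) return body;
  if (typeof enableThinking !== "boolean") {
    throw new TypeError(`enable_thinking must be a boolean, got ${typeof enableThinking}`);
  }
  return {
    ...body,
    chat_template_kwargs: { ...body.chat_template_kwargs, enable_thinking: enableThinking },
  };
}

const baseBody: ChatRequestBody = {
  model: "your-vllm-model", // placeholder model name
  messages: [{ role: "user", content: "Hello" }],
};

// Exercise both settings, as the Verification step suggests.
for (const flag of [true, false]) {
  const body = withThinkingToggle(baseBody, flag);
  console.log(JSON.stringify(body.chat_template_kwargs));
}
```

Returning a new object rather than mutating the input keeps the base request reusable across both test cases.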
