openclaw - ✅(Solved) Fix [Bug]: Gemma 4 on OpenAI-compat transports — `reasoning_content` is re-sent in conversation history, violating Gemma's documented contract [1 pull requests, 1 comments, 2 participants]

chip-snomo · 2026-04-18T22:13:38Z


GitHub stats

openclaw/openclaw#68704 • Fetched 2026-04-19 15:08:28

Author: chip-snomo
Participants: chip-snomo, gongli0929
Timeline (top): commented ×1, cross-referenced ×1, referenced ×1

When using Gemma 4 models (google/gemma-4-26b-a4b, google/gemma-4-31b, etc.) via an OpenAI-compatible endpoint such as LM Studio, OpenClaw re-sends the model's prior-turn reasoning_content on subsequent turns. Google's Gemma 4 documentation explicitly states this should NOT be done:

"You must remove (strip) the model's generated thoughts from the previous turn before passing the conversation history back to the model for the next turn." — Gemma 4 Thinking Mode docs

Error Message

The exception (function-calling within a single turn, where thoughts must be preserved between tool calls in the same turn) is correctly handled for Anthropic's signed-thinking contract, but it is not the right default for Gemma on OpenAI-compatible transports.

Root Cause

Provider-aware stripping: when the target model is a Gemma 4 family model (id prefix gemma-4, or a flag on the model definition such as stripReasoningInHistory: true), do not re-attach the thinking block to history. Preserving thought blocks between tool calls within a single turn should still be allowed, because Google's guidance makes an explicit exception for that case.
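The rule described above, with its single-turn exception, can be sketched as follows. This is only an illustration: the message and block shapes are hypothetical, and the sketch assumes that the current turn starts at the most recent user message, so anything after it (for example a tool-call loop) keeps its thinking blocks.

```typescript
// Hypothetical message shapes for illustration; OpenClaw's real transcript
// types may differ.
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "toolCall"; id: string; name: string };

type Msg =
  | { role: "user"; content: string }
  | { role: "toolResult"; toolCallId: string; content: string }
  | { role: "assistant"; content: Block[] };

// Remove thinking blocks from every assistant message that precedes the most
// recent user message. Assistant messages after it belong to the current turn
// and are returned unchanged.
function stripPriorTurnThinking(messages: Msg[]): Msg[] {
  let lastUserIndex = -1;
  messages.forEach((m, i) => {
    if (m.role === "user") lastUserIndex = i;
  });
  return messages.map((m, i) => {
    if (m.role !== "assistant" || i > lastUserIndex) return m;
    return { ...m, content: m.content.filter((b) => b.type !== "thinking") };
  });
}
```

The function returns new message objects rather than mutating the stored transcript, so the full transcript (including thoughts) can still be kept for display or debugging.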

Fix Action

Fixed

Fixed by PR: fix: strip reasoning_content from conversation history for Gemma 4 models (https://github.com/openclaw/openclaw/pull/68763)

PR fix notes

PR #68763: fix: strip reasoning_content from conversation history for Gemma 4 models

Repository: openclaw/openclaw
Author: Kailigithub
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/68763

Description (problem / solution / changelog)

Problem

Closes #68704

When using Gemma 4 models via LM Studio, Ollama, or vLLM (OpenAI-compatible endpoints), reasoning_content from prior turns is re-sent in the conversation history. Google's Gemma 4 documentation explicitly states that thinking/reasoning content from prior turns must NOT be included in history.

This degrades response quality because the model receives internal reasoning from previous turns as visible context, confusing the conversation flow.

Root Cause

The transport layer captures reasoning_content as thinking blocks and persists them in the transcript. The existing dropThinkingBlocks mechanism:

1. Only applies to Anthropic Claude models (via modelId.includes("claude"))
2. Preserves thinking in the latest turn for signed-thinking replay cache matching

Neither of these is correct for Gemma 4, which needs all reasoning stripped from all turns.
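To make the difference concrete, here is a minimal sketch of the two behaviours side by side. The function names and message shapes are hypothetical and are not taken from the OpenClaw source; "latest turn" is approximated here as the last assistant message.

```typescript
// Hypothetical shapes for illustration only.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string };
type AssistantMsg = { role: "assistant"; content: ContentBlock[] };
type UserMsg = { role: "user"; content: string };
type TranscriptMsg = AssistantMsg | UserMsg;

const withoutThinking = (m: AssistantMsg): AssistantMsg => ({
  ...m,
  content: m.content.filter((b) => b.type !== "thinking"),
});

// Keep-latest: strip thinking from every assistant message except the last
// one, which is the behaviour described for dropThinkingBlocks.
function dropThinkingKeepLatest(msgs: TranscriptMsg[]): TranscriptMsg[] {
  const lastAssistant = msgs.map((m) => m.role).lastIndexOf("assistant");
  return msgs.map((m, i) =>
    m.role === "assistant" && i !== lastAssistant ? withoutThinking(m) : m,
  );
}

// Strip-all: remove thinking from every assistant message, which is what the
// PR description says Gemma 4 needs.
function stripAllThinking(msgs: TranscriptMsg[]): TranscriptMsg[] {
  return msgs.map((m) => (m.role === "assistant" ? withoutThinking(m) : m));
}
```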

Fix

Add a new dropReasoningFromHistory policy flag that strips ALL thinking blocks from ALL assistant messages (unlike dropThinkingBlocks which preserves the latest):

1. src/agents/pi-embedded-helpers/google.ts: New isGemmaModelRequiringReasoningStrip() helper detecting gemma-4+ model IDs
2. src/agents/transcript-policy.ts: Add dropReasoningFromHistory to TranscriptPolicy type, default, and merge logic; auto-enable for Gemma 4+ in buildUnownedProviderTransportReplayFallback
3. src/plugins/types.ts: Add dropReasoningFromHistory to ProviderReplayPolicy type
4. src/agents/pi-embedded-runner/thinking.ts: Export stripAllThinkingBlocks() (was module-private)
5. src/agents/pi-embedded-runner/replay-history.ts: Apply stripAllThinkingBlocks when dropReasoningFromHistory is true
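The PR diff itself is not reproduced on this page, so the following is only a rough sketch of how a default, a Gemma fallback, and a provider-supplied policy might be merged. The two flag names come from the PR description; the type shapes, the function name resolveTranscriptPolicy, and the merge order (defaults, then fallback, then explicit provider policy) are assumptions.

```typescript
// Flag names are from the PR description; everything else is hypothetical.
interface TranscriptPolicy {
  dropThinkingBlocks: boolean;
  dropReasoningFromHistory: boolean;
}
type ProviderReplayPolicy = Partial<TranscriptPolicy>;

const DEFAULT_TRANSCRIPT_POLICY: TranscriptPolicy = {
  dropThinkingBlocks: false,
  dropReasoningFromHistory: false,
};

// Assumed merge order: defaults first, then the Gemma fallback, then whatever
// the provider explicitly declares, so an explicit provider value wins.
function resolveTranscriptPolicy(
  isGemma4Plus: boolean,
  providerPolicy: ProviderReplayPolicy = {},
): TranscriptPolicy {
  const fallback: ProviderReplayPolicy = isGemma4Plus
    ? { dropReasoningFromHistory: true }
    : {};
  return { ...DEFAULT_TRANSCRIPT_POLICY, ...fallback, ...providerPolicy };
}
```

Under these assumptions, resolveTranscriptPolicy(true) enables dropReasoningFromHistory, while a provider that passes { dropReasoningFromHistory: false } keeps replaying reasoning.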

Testing

Models that trigger the new policy:

gemma-4-27b-it, gemma-4-12b-it (Ollama)
google/gemma-4-E2B-it (HuggingFace)
Future gemma-5+ models via the gemma-[4-9]|gemma-\d{2,} pattern

Models that do NOT trigger it:

gemma-2b, gemma-3-4b (not reasoning models)
Claude, GPT, other models (unchanged behavior)
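The pattern quoted above can be checked directly against the listed model IDs. In this sketch the regular expression is copied from the PR description, but the helper body is an assumption (the PR names the helper without showing its implementation here), and the pattern is applied unanchored and case-sensitively.

```typescript
// Pattern as quoted in the PR description; the surrounding helper body is an
// assumption made for illustration.
const GEMMA_REASONING_STRIP_PATTERN = /gemma-[4-9]|gemma-\d{2,}/;

function isGemmaModelRequiringReasoningStrip(modelId: string): boolean {
  return GEMMA_REASONING_STRIP_PATTERN.test(modelId);
}

const sampleIds = [
  "gemma-4-27b-it",
  "google/gemma-4-E2B-it",
  "gemma-2b",
  "gemma-3-4b",
  "gemma-7b",
];
for (const id of sampleIds) {
  console.log(id, isGemmaModelRequiringReasoningStrip(id));
}
```

One thing this makes visible: because the first alternative only looks at the character after "gemma-", a size-suffixed ID such as gemma-7b also matches. Whether that matters depends on which model IDs reach this code path.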

Impact

5 files changed, 31 insertions(+), 5 deletions(-)
Only affects transcript replay policy for Gemma 4+ models
No changes to non-Gemma model behavior

Changed files

src/agents/pi-embedded-helpers/google.ts (modified, +14/-0)
src/agents/pi-embedded-runner/replay-history.ts (modified, +6/-4)
src/agents/pi-embedded-runner/thinking.ts (modified, +1/-1)
src/agents/transcript-policy.ts (modified, +9/-0)
src/plugins/types.ts (modified, +1/-0)



Bug type

Behavior bug (incorrect outbound payload)

Beta release blocker

No

Summary

"You must remove (strip) the model's generated thoughts from the previous turn before passing the conversation history back to the model for the next turn." — Gemma 4 Thinking Mode docs

Steps to reproduce

1. Configure LM Studio (or any llama.cpp / vLLM / OpenAI-compat endpoint) serving a Gemma 4 model with thinking enabled at the chat-template level.

2. In openclaw.json, add a provider entry pointing at that endpoint:

"lmstudio": {
  "baseUrl": "http://<host>:1234/v1",
  "apiKey": "lmstudio",
  "api": "openai-responses",
  "models": [
    { "id": "google/gemma-4-26b-a4b", "name": "Gemma 4 26B A4B",
      "contextWindow": 256000, "maxTokens": 4096,
      "reasoning": true, "input": ["text", "image"],
      "cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0} }
  ]
}

3. Wire the model into an agent and have a multi-turn conversation without tool calls.

4. Capture the second-turn outbound request body (e.g. LM Studio's request log, or a MITM proxy).

Expected behavior

The assistant message from turn 1 replayed in turn 2's history should contain only the visible content, not the prior reasoning, for Gemma-family models on OpenAI-compat transports.

Actual behavior

The outbound assistant message in history includes reasoning_content: "<previous thinking>". Every subsequent turn accumulates more historical thinking in the payload.
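A captured request body can be checked for this mechanically. The sketch below assumes a chat-completions-style payload with a top-level messages array, which matches the convertMessages code path referenced in this issue; a differently shaped request would need a different accessor. The type and function names are hypothetical.

```typescript
// Assumed payload shape: { messages: [{ role, content, reasoning_content? }] }.
interface CapturedMessage {
  role: string;
  content?: unknown;
  reasoning_content?: unknown;
}
interface CapturedBody {
  messages: CapturedMessage[];
}

// Return the indexes of assistant messages that still carry non-empty
// reasoning_content in the replayed history.
function findReplayedReasoning(body: CapturedBody): number[] {
  const hits: number[] = [];
  body.messages.forEach((m, i) => {
    const r = m.reasoning_content;
    if (m.role === "assistant" && typeof r === "string" && r.trim().length > 0) {
      hits.push(i);
    }
  });
  return hits;
}
```

An empty result on the second-turn request would indicate that prior reasoning is no longer being replayed.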

Code reference

The re-attachment happens in the bundled @mariozechner/pi-ai library:

node_modules/@mariozechner/pi-ai/dist/providers/openai-completions.js — convertMessages, approximately:

const thinkingBlocks = msg.content.filter((b) => b.type === "thinking");
const nonEmptyThinkingBlocks = thinkingBlocks.filter((b) => b.thinking && b.thinking.trim().length > 0);
if (nonEmptyThinkingBlocks.length > 0) {
    ...
    const signature = nonEmptyThinkingBlocks[0].thinkingSignature;  // "reasoning_content" for Gemma/llama.cpp
    if (signature && signature.length > 0) {
        assistantMsg[signature] = nonEmptyThinkingBlocks.map((b) => b.thinking).join("\n");
    }
}

The signature round-trip is correct for providers that require reasoning-in-history (Anthropic signed thinking, some gpt-oss models). It is incorrect for Gemma 4, which explicitly forbids it.

Suggested fix

A conservative version would add a model-level opt-in flag, stripReasoningInHistory, which users set to true on Gemma entries under models.providers.<p>.models[]. A more complete version would enable it by default when the model ID matches a Gemma 4 pattern.
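Both variants can be combined so that an explicit per-model setting always wins and the ID pattern only supplies the default. This is a sketch under assumptions: stripReasoningInHistory is the flag proposed in this issue (not an existing OpenClaw option), the config interface covers only the fields needed here, and the fallback regex is illustrative.

```typescript
// Minimal, hypothetical view of the relevant part of openclaw.json.
interface ModelEntry {
  id: string;
  stripReasoningInHistory?: boolean; // proposed flag, not an existing option
}
interface ConfigSketch {
  models: { providers: Record<string, { models: ModelEntry[] }> };
}

// Illustrative default: bare or namespaced Gemma 4 IDs.
const GEMMA_4_ID = /(^|\/)gemma-4(?!\d)/;

function shouldStripReasoning(
  cfg: ConfigSketch,
  provider: string,
  modelId: string,
): boolean {
  const providerCfg = cfg.models.providers[provider];
  const entry = providerCfg
    ? providerCfg.models.find((m) => m.id === modelId)
    : undefined;
  // An explicit setting, true or false, takes precedence over the heuristic.
  const explicit = entry ? entry.stripReasoningInHistory : undefined;
  if (explicit !== undefined) return explicit;
  return GEMMA_4_ID.test(modelId);
}
```

Letting an explicit false override the pattern gives users an escape hatch if a Gemma-named model turns out to need reasoning in history, in the way the MiniMax case mentioned in this issue does.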

Impact

Context bloat that grows every turn (Gemma 4 thought channels routinely run 500-2000 tokens).
Violates Gemma's documented contract; may subtly degrade multi-turn quality as the model re-sees its own stale reasoning.
Affects anyone running Gemma 4 via LM Studio, llama.cpp server, Ollama, vLLM, or any OpenAI-compat gateway.
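The growth in the first point is easy to quantify. As a back-of-the-envelope sketch, assume every turn emits the same number of thinking tokens and nothing is trimmed from history: request k then carries the thinking from the k-1 earlier turns, and the total replayed over N requests is T·N·(N-1)/2. The function name is hypothetical.

```typescript
// Extra prompt tokens caused by replaying prior reasoning, assuming a constant
// number of thinking tokens per turn and no history trimming.
function replayedThinkingTokens(turns: number, thinkingTokensPerTurn: number) {
  // The final request carries the thinking of all turns before it.
  const lastRequest = (turns - 1) * thinkingTokensPerTurn;
  // Sum over requests 1..turns of (k - 1) * T.
  const cumulative = (thinkingTokensPerTurn * turns * (turns - 1)) / 2;
  return { lastRequest, cumulative };
}

console.log(replayedThinkingTokens(10, 500));  // { lastRequest: 4500, cumulative: 22500 }
console.log(replayedThinkingTokens(10, 2000)); // { lastRequest: 18000, cumulative: 90000 }
```

So at the 500 to 2000 token range quoted above, a ten-turn conversation would resend roughly 22,500 to 90,000 thinking tokens in total.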

OpenClaw version

2026.4.15 (041266a)

Operating system

macOS 15.x (Darwin 25.3.0)

Install method

curl -fsSL https://openclaw.ai/install.sh | bash

Model

google/gemma-4-26b-a4b (also reproducible on gemma-4-31b)

Provider / routing chain

openclaw → lmstudio provider (openai-responses api) → LM Studio server → Gemma 4

Additional notes

Google's official guidance: https://ai.google.dev/gemma/docs/capabilities/thinking — "Pass back the conversation history... without the thought channel content"
Related but distinct: #65533 (MiniMax, which actually needs reasoning preserved — inverse case). This highlights that the right behavior is provider-specific.
Related: #61995 (MiniMax thinking suppression), #62127/#62411 (Gemma 4 native Google provider thinking-off semantics).

extent analysis

TL;DR

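The re-attachment of reasoning_content happens in the convertMessages function of the bundled @mariozechner/pi-ai library. The issue can be fixed by stripping prior-turn reasoning from the message history for Gemma 4 models, either at that point or, as PR #68763 does, in OpenClaw's transcript replay policy before the history reaches the transport.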

Guidance

Identify the model type and check if it's a Gemma 4 model before re-attaching the thinking block to history.
Add a model-level opt-in stripReasoningInHistory that users can set in models.providers.<p>.models[].stripReasoningInHistory = true for Gemma 4 models.
Modify the convertMessages function to check for the stripReasoningInHistory flag and strip the reasoning_content if it's set to true.
Test the fix by running a multi-turn conversation with a Gemma 4 model and verifying that the reasoning_content is not included in the message history.

Example

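// Match bare and namespaced IDs, e.g. "gemma-4-31b" and "google/gemma-4-26b-a4b".
// (startsWith('gemma-4') would miss the "google/" prefix used in this issue.)
const isGemma4Model = (modelId) => /(^|\/)gemma-4(?!\d)/.test(modelId);
// Drop reasoning_content from an outbound history message for Gemma 4 models.
const stripReasoningContent = (msg, modelId) => {
  if (isGemma4Model(modelId)) {
    delete msg.reasoning_content;
  }
  return msg;
};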

Notes

The re-attachment lives in the @mariozechner/pi-ai library, but PR #68763 applies the fix in OpenClaw's replay policy instead. The issue author additionally proposes a stripReasoningInHistory flag so that users can opt in per model.

Recommendation

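PR #68763 was still open when this page was fetched. Until a fix ships, stripping reasoning_content from replayed assistant messages for Gemma 4 models only is the most targeted workaround, because it leaves providers that require reasoning in history (Anthropic signed thinking, MiniMax) unchanged.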
