openclaw - ✅(Solved) Fix [Bug]: Gemma 4 on OpenAI-compat transports — `reasoning_content` is re-sent in conversation history, violating Gemma's documented contract [1 pull requests, 1 comments, 2 participants]

GitHub stats for openclaw/openclaw#68704 (fetched 2026-04-19 15:08:28)
Comments: 1 · Participants: 2 · Timeline events: 3 (commented ×1, cross-referenced ×1, referenced ×1) · Reactions: 0

When using Gemma 4 models (google/gemma-4-26b-a4b, google/gemma-4-31b, etc.) via an OpenAI-compatible endpoint such as LM Studio, OpenClaw re-sends the model's prior-turn reasoning_content on subsequent turns. Google's Gemma 4 documentation explicitly states this should NOT be done:

"You must remove (strip) the model's generated thoughts from the previous turn before passing the conversation history back to the model for the next turn." — Gemma 4 Thinking Mode docs

The one exception is function calling within a single turn, where thoughts must be preserved between tool calls in that turn. OpenClaw handles that case correctly for Anthropic's signed-thinking contract, but preserving thinking is not the right default for Gemma on OpenAI-compatible transports.

Fix Action

Fixed

PR fix notes

PR #68763: fix: strip reasoning_content from conversation history for Gemma 4 models

Description (problem / solution / changelog)

Problem

Closes #68704

When using Gemma 4 models via LM Studio, Ollama, or vLLM (OpenAI-compatible endpoints), reasoning_content from prior turns is re-sent in the conversation history. Google's Gemma 4 documentation explicitly states that thinking/reasoning content must NOT be included in history.

This degrades response quality because the model receives internal reasoning from previous turns as visible context, confusing the conversation flow.

Root Cause

The transport layer captures reasoning_content as thinking blocks and persists them in the transcript. The existing dropThinkingBlocks mechanism:

  1. Only applies to Anthropic Claude models (via modelId.includes("claude"))
  2. Preserves thinking in the latest turn for signed-thinking replay cache matching

Neither of these is correct for Gemma 4, which needs all reasoning stripped from all turns.

Fix

Add a new dropReasoningFromHistory policy flag that strips ALL thinking blocks from ALL assistant messages (unlike dropThinkingBlocks which preserves the latest):

  1. src/agents/pi-embedded-helpers/google.ts — New isGemmaModelRequiringReasoningStrip() helper detecting gemma-4+ model IDs
  2. src/agents/transcript-policy.ts — Add dropReasoningFromHistory to TranscriptPolicy type, default, and merge logic; auto-enable for Gemma 4+ in buildUnownedProviderTransportReplayFallback
  3. src/plugins/types.ts — Add dropReasoningFromHistory to ProviderReplayPolicy type
  4. src/agents/pi-embedded-runner/thinking.ts — Export stripAllThinkingBlocks() (was module-private)
  5. src/agents/pi-embedded-runner/replay-history.ts — Apply stripAllThinkingBlocks when dropReasoningFromHistory is true
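The difference between the existing and the new flag can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from PR #68763: the message shapes are simplified, `applyReplayPolicy` is a hypothetical name, and "latest" is taken to mean the most recent assistant message.

```typescript
// Simplified content blocks and messages (assumed shapes, not OpenClaw's types).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string };

interface ReplayAssistantMessage {
  role: "assistant";
  content: ContentBlock[];
}
interface ReplayUserMessage {
  role: "user";
  content: string;
}
type ReplayMessage = ReplayAssistantMessage | ReplayUserMessage;

// Hypothetical subset of the transcript policy flags named in the PR.
interface ReplayPolicyFlags {
  dropThinkingBlocks: boolean; // existing: strip all but the latest assistant message
  dropReasoningFromHistory: boolean; // new: strip thinking from every assistant message
}

// Return a copy of the message without thinking blocks (input is not mutated).
function withoutThinking(msg: ReplayAssistantMessage): ReplayAssistantMessage {
  return { ...msg, content: msg.content.filter((b) => b.type !== "thinking") };
}

function applyReplayPolicy(history: ReplayMessage[], policy: ReplayPolicyFlags): ReplayMessage[] {
  // Index of the most recent assistant message; -1 if there is none.
  let latestAssistant = -1;
  history.forEach((m, i) => {
    if (m.role === "assistant") latestAssistant = i;
  });

  return history.map((m, i) => {
    if (m.role !== "assistant") return m;
    // New policy: no thinking survives in any replayed assistant message.
    if (policy.dropReasoningFromHistory) return withoutThinking(m);
    // Existing policy: keep thinking only on the latest assistant message.
    if (policy.dropThinkingBlocks && i !== latestAssistant) return withoutThinking(m);
    return m;
  });
}
```

With `dropThinkingBlocks` alone, the newest assistant message keeps its reasoning, which is exactly what Gemma 4 should not receive; `dropReasoningFromHistory` removes it everywhere.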

Testing

Models that trigger the new policy:

  • gemma-4-27b-it, gemma-4-12b-it (Ollama)
  • google/gemma-4-E2B-it (HuggingFace)
  • Future gemma-5+ models via the gemma-[4-9]|gemma-\d{2,} pattern

Models that do NOT trigger it:

  • gemma-2b, gemma-3-4b (not reasoning models)
  • Claude, GPT, other models (unchanged behavior)
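The pattern quoted above can be checked against the listed model IDs. This is a minimal sketch: `requiresReasoningStrip` is a hypothetical stand-in for `isGemmaModelRequiringReasoningStrip()`, and lower-casing the ID first is an assumption that the PR description does not state. Because the pattern is unanchored, vendor prefixes such as `google/` do not affect the match.

```typescript
// Pattern quoted in the PR description. How the real helper applies it
// (case handling, anchoring) is an assumption in this sketch.
const GEMMA_REASONING_PATTERN = /gemma-[4-9]|gemma-\d{2,}/;

// Hypothetical stand-in for isGemmaModelRequiringReasoningStrip().
function requiresReasoningStrip(modelId: string): boolean {
  // Lower-case so "Gemma-4" and "gemma-4" are treated alike (assumption).
  return GEMMA_REASONING_PATTERN.test(modelId.toLowerCase());
}
```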

Impact

  • 5 files changed, 31 insertions(+), 5 deletions(-)
  • Only affects transcript replay policy for Gemma 4+ models
  • No changes to non-Gemma model behavior

Changed files

  • src/agents/pi-embedded-helpers/google.ts (modified, +14/-0)
  • src/agents/pi-embedded-runner/replay-history.ts (modified, +6/-4)
  • src/agents/pi-embedded-runner/thinking.ts (modified, +1/-1)
  • src/agents/transcript-policy.ts (modified, +9/-0)
  • src/plugins/types.ts (modified, +1/-0)

Original issue (raw text)

Bug type

Behavior bug (incorrect outbound payload)

Beta release blocker

No

Steps to reproduce

  1. Configure LM Studio (or any llama.cpp / vLLM / OpenAI-compat endpoint) serving a Gemma 4 model with thinking enabled at the chat-template level.
  2. In openclaw.json, add a provider entry pointing at that endpoint:
    "lmstudio": {
      "baseUrl": "http://<host>:1234/v1",
      "apiKey": "lmstudio",
      "api": "openai-responses",
      "models": [
        { "id": "google/gemma-4-26b-a4b", "name": "Gemma 4 26B A4B",
          "contextWindow": 256000, "maxTokens": 4096,
          "reasoning": true, "input": ["text", "image"],
          "cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0} }
      ]
    }
  3. Wire the model into an agent and have a multi-turn conversation without tool calls.
  4. Capture the second-turn outbound request body (e.g. LM Studio's request log, or an mitm proxy).
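To check step 4 mechanically, a small helper can scan a captured request body for replayed reasoning. This sketch assumes a Chat Completions-style body with a `messages` array and the `reasoning_content` field name; `findReplayedReasoning` is a hypothetical helper, and other request shapes would need a different traversal.

```typescript
// Assumed shape of one message in a Chat Completions-style request body.
interface OutboundMessage {
  role: string;
  content?: unknown;
  reasoning_content?: unknown;
}

interface OutboundBody {
  messages?: OutboundMessage[];
}

// Hypothetical helper: returns the indices of assistant messages that still
// carry non-empty reasoning_content, i.e. prior-turn thinking replayed in history.
function findReplayedReasoning(rawBody: string): number[] {
  const body = JSON.parse(rawBody) as OutboundBody;
  const hits: number[] = [];
  (body.messages ?? []).forEach((msg, index) => {
    const reasoning = msg.reasoning_content;
    if (msg.role === "assistant" && typeof reasoning === "string" && reasoning.trim().length > 0) {
      hits.push(index);
    }
  });
  return hits;
}
```

An empty result for the second-turn body means no prior reasoning was replayed; any returned index points at an assistant message that reproduces this bug.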

Expected behavior

The assistant message from turn 1 replayed in turn 2's history should contain only the visible content, not the prior reasoning, for Gemma-family models on OpenAI-compat transports.

Actual behavior

The outbound assistant message in history includes reasoning_content: "<previous thinking>". Every subsequent turn accumulates more historical thinking in the payload.

Code reference

The re-attachment happens in the bundled @mariozechner/pi-ai library:

node_modules/@mariozechner/pi-ai/dist/providers/openai-completions.js, in convertMessages, approximately:

const thinkingBlocks = msg.content.filter((b) => b.type === "thinking");
const nonEmptyThinkingBlocks = thinkingBlocks.filter((b) => b.thinking && b.thinking.trim().length > 0);
if (nonEmptyThinkingBlocks.length > 0) {
    ...
    const signature = nonEmptyThinkingBlocks[0].thinkingSignature;  // "reasoning_content" for Gemma/llama.cpp
    if (signature && signature.length > 0) {
        assistantMsg[signature] = nonEmptyThinkingBlocks.map((b) => b.thinking).join("\n");
    }
}

The signature round-trip is correct for providers that require reasoning-in-history (Anthropic signed thinking, some gpt-oss models). It is incorrect for Gemma 4, which explicitly forbids it.
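Using the excerpt above, a provider-aware switch could sit directly in the re-attachment logic. This is a minimal sketch: `attachReasoning` and `stripReasoning` are hypothetical names, and the part shapes are simplified rather than pi-ai's actual types. The merged PR #68763 does not patch this function; it removes thinking blocks earlier, during OpenClaw's transcript replay.

```typescript
// Simplified stand-ins for pi-ai's content blocks (assumed shapes).
interface ThinkingPart {
  type: "thinking";
  thinking: string;
  thinkingSignature?: string; // e.g. "reasoning_content" for Gemma/llama.cpp
}
interface TextPart {
  type: "text";
  text: string;
}
type AssistantPart = ThinkingPart | TextPart;

// Mirrors the excerpt above, plus a hypothetical `stripReasoning` switch
// that a provider-aware caller would set for Gemma 4 models.
function attachReasoning(
  parts: AssistantPart[],
  assistantMsg: Record<string, unknown>,
  stripReasoning: boolean,
): void {
  if (stripReasoning) return; // Never echo prior thinking back to the model.
  const nonEmpty = parts.filter(
    (p): p is ThinkingPart => p.type === "thinking" && p.thinking.trim().length > 0,
  );
  if (nonEmpty.length === 0) return;
  const signature = nonEmpty[0].thinkingSignature;
  if (signature && signature.length > 0) {
    // The signature names the outbound field, e.g. assistantMsg.reasoning_content.
    assistantMsg[signature] = nonEmpty.map((p) => p.thinking).join("\n");
  }
}
```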

Suggested fix

Provider-aware stripping: when the target model is a Gemma 4 family model (id prefix gemma-4, or with a flag on the model definition such as stripReasoningInHistory: true), do not re-attach the thinking block to history. Preserving thought blocks between tool calls within a single turn should still be allowed, because Google's guidance makes that explicit exception.

A conservative version would add a model-level opt-in (stripReasoningInHistory) that users can set in models.providers.<p>.models[].stripReasoningInHistory = true for Gemma. A more complete version would enable it by default based on a model-ID pattern match.
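A minimal sketch of how the proposed opt-in, an ID-pattern default, and Google's in-turn exception could fit together. Everything here is hypothetical: `stripReasoningInHistory` is the flag this issue proposes (not a confirmed config key), the entry shapes are simplified, and "the turn in progress" is assumed to mean every message after the last user message. The merged PR #68763 takes a blunter approach and strips thinking from all assistant messages.

```typescript
// Simplified transcript entries (assumed shapes, not OpenClaw's types).
type Entry =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string; thinking?: string; toolCall?: string }
  | { role: "tool"; result: string };

// Hypothetical model-definition subset. stripReasoningInHistory is the
// flag proposed in this issue, not a confirmed OpenClaw config key.
interface ModelDefinition {
  id: string;
  stripReasoningInHistory?: boolean;
}

// Assumed default: Gemma 4 family IDs, with or without a vendor prefix.
function looksLikeGemma4(id: string): boolean {
  return /(^|\/)gemma-4/i.test(id);
}

// An explicit setting wins (including false); otherwise use the ID pattern.
function shouldStripReasoning(model: ModelDefinition): boolean {
  return model.stripReasoningInHistory ?? looksLikeGemma4(model.id);
}

// Drop thinking from assistant messages in earlier turns, but keep it on
// assistant messages after the last user message (the turn in progress,
// e.g. between tool calls), per the exception in Google's guidance.
function stripPriorTurnThinking(history: Entry[]): Entry[] {
  let lastUser = -1;
  history.forEach((e, i) => {
    if (e.role === "user") lastUser = i;
  });
  return history.map((e, i) => {
    if (e.role !== "assistant" || i > lastUser) return e;
    const copy = { ...e };
    delete copy.thinking;
    return copy;
  });
}

// Leave history untouched unless stripping applies to this model.
function prepareHistory(model: ModelDefinition, history: Entry[]): Entry[] {
  return shouldStripReasoning(model) ? stripPriorTurnThinking(history) : history;
}
```

Leaving the flag unset gives the "more complete" behavior for Gemma 4 IDs, while an explicit `false` or `true` lets users override the pattern in either direction.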

Impact

  • Context bloat that grows every turn (Gemma 4 thought channels routinely run 500-2000 tokens).
  • Violates Gemma's documented contract; may subtly degrade multi-turn quality as the model re-sees its own stale reasoning.
  • Affects anyone running Gemma 4 via LM Studio, llama.cpp server, Ollama, vLLM, or any OpenAI-compat gateway.
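A back-of-envelope estimate of that growth, assuming each assistant turn produces a constant r tokens of reasoning and nothing is compacted: request k replays the reasoning from the k−1 earlier turns, so the total grows quadratically with conversation length.

```latex
\text{replayed reasoning in request } k = (k-1)\,r,
\qquad
\sum_{k=1}^{n} (k-1)\,r = \frac{n(n-1)}{2}\,r
```

For n = 10 and r between 500 and 2,000 tokens (the range quoted above), request 10 alone carries 4,500 to 18,000 tokens of stale reasoning, and the conversation re-sends 22,500 to 90,000 such tokens in total.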

OpenClaw version

2026.4.15 (041266a)

Operating system

macOS 15.x (Darwin 25.3.0)

Install method

curl -fsSL https://openclaw.ai/install.sh | bash

Model

google/gemma-4-26b-a4b (also reproducible on gemma-4-31b)

Provider / routing chain

openclaw → lmstudio provider (openai-responses api) → LM Studio server → Gemma 4

Additional notes

  • Google's official guidance: https://ai.google.dev/gemma/docs/capabilities/thinking — "Pass back the conversation history... without the thought channel content"
  • Related but distinct: #65533 (MiniMax, which actually needs reasoning preserved — inverse case). This highlights that the right behavior is provider-specific.
  • Related: #61995 (MiniMax thinking suppression), #62127/#62411 (Gemma 4 native Google provider thinking-off semantics).

Extent analysis

TL;DR

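The issue can be fixed by not re-sending prior-turn reasoning_content for Gemma 4 models. One option is to change the convertMessages function in the bundled @mariozechner/pi-ai library; the merged PR #68763 instead strips the thinking blocks in OpenClaw's transcript replay policy, so they are no longer present when convertMessages runs.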

Guidance

  • Identify the model type and check if it's a Gemma 4 model before re-attaching the thinking block to history.
  • Add a model-level opt-in stripReasoningInHistory that users can set in models.providers.<p>.models[].stripReasoningInHistory = true for Gemma 4 models.
  • Modify the convertMessages function to check for the stripReasoningInHistory flag and strip the reasoning_content if it's set to true.
  • Test the fix by running a multi-turn conversation with a Gemma 4 model and verifying that the reasoning_content is not included in the message history.

Example

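// Match Gemma 4 IDs with or without a vendor prefix, e.g. "google/gemma-4-26b-a4b".
// (A plain startsWith('gemma-4') check misses prefixed IDs like the one in this issue.)
const isGemma4Model = (modelId) => /(^|\/)gemma-4/i.test(modelId);

// Remove prior-turn reasoning from an outbound assistant message for Gemma 4 models.
const stripReasoningContent = (msg, modelId) => {
  if (msg.role === "assistant" && isGemma4Model(modelId)) {
    delete msg.reasoning_content;
  }
  return msg;
};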

Notes

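As written, this analysis targets the @mariozechner/pi-ai library and the opt-in stripReasoningInHistory flag proposed in the issue. The merged PR #68763 takes the issue's other route: it adds a dropReasoningFromHistory replay-policy flag in OpenClaw and enables it automatically for Gemma 4+ model IDs.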

Recommendation

Prefer the upstream fix from PR #68763. If a local stopgap is needed before that fix is available in your installed version, patching convertMessages to skip reasoning_content for Gemma 4 models is the most targeted change.
