openclaw - Fix Ollama provider: missing response-level reasoning stripper for Kimi models causes inline reasoning leak to chat

Summary

When using ollama/kimi-k2.6:cloud (and likely kimi-k2.5:cloud) with Think: off, the model's inline reasoning text leaks into the visible chat output. The gateway correctly sends think: false (native Ollama) and thinking: { type: "disabled" } (Moonshot wrapper) on outgoing requests, but the model still emits reasoning text inline, separated from the actual response by a boundary delimiter. The Ollama provider has no response-level stripper for this inline reasoning, unlike the opencode-go provider, which has stripOpencodeGoKimiReasoningPayload.

Environment

  • OpenClaw version: 2026.5.22 (a374c3a)
  • Provider: ollama
  • Model: ollama/kimi-k2.6:cloud (also observed with kimi-k2.5:cloud)
  • Runtime: Think: off
  • OS: Windows 10.0.26200 (x64)
  • Ollama base URL: http://192.168.1.72:11434

Reproduction

Config (relevant excerpt)

{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://192.168.1.72:11434",
        api: "ollama",
        models: [
          {
            id: "kimi-k2.6:cloud",
            name: "kimi-k2.6:cloud",
            reasoning: false,
            params: {
              num_ctx: 262144
            }
          }
        ]
      }
    }
  },
  agents: {
    defaults: {
      model: { primary: "ollama/kimi-k2.6:cloud" }
    }
  }
}

Steps

  1. Set primary model to ollama/kimi-k2.6:cloud
  2. Ensure thinking / Think is off
  3. Send any message that triggers the agent
  4. Observe the assistant response in chat

Actual behavior

The visible response contains the model's internal reasoning monologue, followed by a boundary delimiter, then the actual response. Example from session history:

"The user is asking what projects we've worked on so far. Based on my memory files...

...Let me provide a clear summary. So far we've got **3 active projects** tracked:"

The text before the boundary delimiter is raw reasoning that should be internal only.

Expected behavior

Only the text after the reasoning delimiter should be visible to the user. The reasoning block should be stripped, discarded, or stored as internal metadata — never rendered as chat content.

Root cause analysis

  1. Request side is correct. createConfiguredOllamaCompatStreamWrapper applies both:

    • createOllamaThinkingWrapper(..., false) → sets think: false on native Ollama payload
    • createMoonshotThinkingWrapper(..., "disabled") → sets thinking: { type: "disabled" }
  2. Model ignores the disable signal. kimi-k2.6 still outputs reasoning inline, likely because the Ollama API passthrough doesn't propagate the disable parameter correctly to the underlying model, or because the model inherently emits reasoning regardless.

  3. Missing response stripper. The opencode-go provider has stripOpencodeGoKimiReasoningPayload, which:

    • Deletes reasoning, reasoning_details, reasoning_content, reasoning_text fields
    • Filters out type: "thinking" / type: "reasoning" content parts from messages
    • Replaces stripped content with [assistant reasoning omitted]

    The ollama provider has no equivalent response sanitizer for Kimi models.
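
The three stripping steps listed for opencode-go can be illustrated with a minimal sketch. This is not the source of stripOpencodeGoKimiReasoningPayload; the message shape, the function name stripReasoningFromMessage, and the choice to insert the placeholder only when stripping would leave the message empty are all assumptions made for illustration.

```typescript
// Hypothetical sketch, NOT the real stripOpencodeGoKimiReasoningPayload.
// The message and content-part shapes below are assumed for illustration.
type ContentPart = { type: string; text?: string; [key: string]: unknown };

interface AssistantMessage {
  role: string;
  content: string | ContentPart[];
  [key: string]: unknown;
}

const REASONING_FIELDS = [
  "reasoning",
  "reasoning_details",
  "reasoning_content",
  "reasoning_text",
] as const;
const REASONING_PART_TYPES = new Set(["thinking", "reasoning"]);
const OMITTED_PLACEHOLDER = "[assistant reasoning omitted]";

function stripReasoningFromMessage(message: AssistantMessage): AssistantMessage {
  // Shallow copy so the caller's object is not mutated.
  const cleaned: AssistantMessage = { ...message };

  // Step 1: delete the top-level reasoning fields.
  for (const field of REASONING_FIELDS) {
    delete cleaned[field];
  }

  // Step 2: drop content parts typed as thinking or reasoning.
  if (Array.isArray(cleaned.content)) {
    const kept = cleaned.content.filter(
      (part) => !REASONING_PART_TYPES.has(part.type),
    );
    // Step 3 (assumed behavior): if nothing is left, keep a placeholder part.
    cleaned.content =
      kept.length > 0 ? kept : [{ type: "text", text: OMITTED_PLACEHOLDER }];
  }
  return cleaned;
}
```

Note that a field-based sanitizer like this would not catch the leak described here, because the Kimi reasoning arrives inside the plain text content rather than in a separate field or content part.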

Cross-references

  • #81988 - opencode-go/kimi-k2.6: reasoning field leaks through passthrough replay policy (same family, different provider)
  • #83812 - opencode-go/kimi-k2.6 sends unsupported reasoning_details in replayed messages (request-side fix for opencode-go)
  • #6470 - Discord: reasoning content posted as regular messages (general reasoning leak class)
  • ollama/ollama#10456 - Ollama-level discussion on disabling thinking mode

Suggested fix

Add a Kimi-specific response sanitizer in the Ollama provider, analogous to what exists for opencode-go:

Option A (provider-level): In createConfiguredOllamaCompatStreamWrapper, when isOllamaCloudKimiModelRef(modelId) is true, wrap the stream with a response interceptor that strips inline reasoning text from assistant message content before it reaches the user.
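
A rough sketch of what the Option A interceptor might do per streamed chunk, assuming the stream delivers assistant text as string chunks. The class name InlineReasoningFilter and its push/flush interface are hypothetical, and the delimiter is passed in as a parameter because its exact value is not given here.

```typescript
// Hypothetical helper for a stream wrapper; not part of openclaw.
// It withholds text until the reasoning delimiter has been seen.
class InlineReasoningFilter {
  private readonly delimiter: string;
  private buffer = "";
  private delimiterSeen = false;

  constructor(delimiter: string) {
    this.delimiter = delimiter;
  }

  /** Feed one streamed chunk; returns the text that is safe to show (may be ""). */
  push(chunk: string): string {
    if (this.delimiterSeen) return chunk;

    // Accumulate so a delimiter split across two chunks is still found.
    this.buffer += chunk;
    const index = this.buffer.indexOf(this.delimiter);
    if (index === -1) return "";

    this.delimiterSeen = true;
    const answer = this.buffer.slice(index + this.delimiter.length).trimStart();
    this.buffer = "";
    return answer;
  }

  /** Call at end of stream; releases held text if no delimiter ever arrived. */
  flush(): string {
    const rest = this.delimiterSeen ? "" : this.buffer;
    this.buffer = "";
    return rest;
  }
}
```

The trade-off in this design is latency: nothing is shown until the delimiter arrives, so a reply that contains no reasoning block only appears when flush() is called at the end of the stream.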

Option B (gateway-level): Add a general stripInlineReasoningFromAssistantText utility in the message processing pipeline that recognizes the reasoning delimiter (or an equivalent marker) and splits off or omits the reasoning portion.
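
For Option B, a minimal sketch of such a utility operating on a complete message. The function name comes from the proposal, but the signature, the return shape, and the delimiter list are assumptions. The reasoning is returned separately so a caller could keep it as internal metadata rather than discard it.

```typescript
// Sketch only: signature and return shape are assumed, not taken from openclaw.
interface StrippedAssistantText {
  visible: string;          // text that may be rendered in chat
  reasoning: string | null; // text before the delimiter, or null if none found
}

function stripInlineReasoningFromAssistantText(
  text: string,
  delimiters: readonly string[],
): StrippedAssistantText {
  // Delimiters are tried in list order; the first one present in the text wins.
  for (const delimiter of delimiters) {
    if (delimiter.length === 0) continue; // an empty delimiter would match everywhere
    const index = text.indexOf(delimiter);
    if (index !== -1) {
      return {
        reasoning: text.slice(0, index).trim(),
        visible: text.slice(index + delimiter.length).trim(),
      };
    }
  }
  // No delimiter present: treat the whole text as the visible answer.
  return { visible: text, reasoning: null };
}
```

Because the split happens at the first occurrence of the delimiter, an answer that itself contains the delimiter later on is left intact after the split point.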

Option C (model registry): Mark kimi-k2.6:cloud and kimi-k2.5:cloud under the Ollama provider as reasoning: true with a reasoningOutputMode: "inline" so the gateway knows to apply stripping regardless of what the model claims.
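
Option C might look like the following registry sketch. The reasoningOutputMode field is the one proposed above and does not exist yet; the ModelEntry shape, the "separate" mode value, and the helper needsInlineReasoningStrip are hypothetical names used only to show how the gateway could decide when to strip.

```typescript
// Hypothetical registry shape. "inline" comes from the proposal;
// "separate" is an assumed value for models that use a dedicated reasoning field.
type ReasoningOutputMode = "inline" | "separate";

interface ModelEntry {
  id: string;
  reasoning: boolean;
  reasoningOutputMode?: ReasoningOutputMode;
}

const ollamaModels: ModelEntry[] = [
  { id: "kimi-k2.6:cloud", reasoning: true, reasoningOutputMode: "inline" },
  { id: "kimi-k2.5:cloud", reasoning: true, reasoningOutputMode: "inline" },
  { id: "llama3.2:3b", reasoning: false },
];

// True when the registry says this model emits reasoning inside its text content.
function needsInlineReasoningStrip(
  modelId: string,
  models: readonly ModelEntry[],
): boolean {
  const entry = models.find((model) => model.id === modelId);
  return entry?.reasoningOutputMode === "inline";
}
```

With this approach the stripping decision depends on registry data rather than on the Think setting, so it still applies when the model ignores the disable signal.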

Workarounds

  • Switch to a non-reasoning model (e.g., llama3.2:3b, gemma4)
  • Modify the model's Ollama Modelfile to inject a system prompt forbidding reasoning output

Impact

  • Severity: Medium-High — leaks internal decision-making and planning to user-visible chat
  • Affected channels: All (webchat, Discord, Telegram, etc.)
  • Frequency: Every multi-turn assistant message with ollama/kimi-k2.6:cloud
<img width="452" height="679" alt="Image" src="https://github.com/user-attachments/assets/648db7c4-abd0-478a-b53f-d785091202b8" />
