openclaw - 💡(How to fix) Fix bug: openai-codex-responses provider leaks internal reasoning into assistantTexts (gpt-5.5) [2 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#72738Fetched 2026-04-28 06:32:47
View on GitHub
Comments
2
Participants
2
Timeline
3
Reactions
0
Timeline (top)
commented ×2closed ×1

Internal reasoning from the openai-codex-responses provider (gpt-5.5) appears in trajectory assistantTexts and gets forwarded to user-facing channels (e.g. Feishu groups via feishu_im_user_message). End users see English reasoning fragments mid-Chinese conversation.

This is structurally similar to the deepseek thinking leak from #72044, but for a different provider (openai-codex-responses, not openai-completions).

Related issues (different root causes):

  • #72044 — deepseek-v4-pro reasoning_content fix incomplete (sister issue, different provider)
  • #39032 — subagent tool-failure reasoning leak (different scope: subagent, tool-error trigger)
  • #67092 — </think> parsing failure (different mechanism)
  • #45041 — output-level filter feature request (orthogonal: this issue is a provider-level sanitization bug)

Error Message

  • #39032 — subagent tool-failure reasoning leak (different scope: subagent, tool-error trigger)

Root Cause

Related issues (different root causes):

  • #72044 — deepseek-v4-pro reasoning_content fix incomplete (sister issue, different provider)
  • #39032 — subagent tool-failure reasoning leak (different scope: subagent, tool-error trigger)
  • #67092 — </think> parsing failure (different mechanism)
  • #45041 — output-level filter feature request (orthogonal: this issue is a provider-level sanitization bug)

Fix Action

Fix / Workaround

Workaround (none satisfactory)

RAW_BUFFERClick to expand / collapse

Summary

Internal reasoning from the openai-codex-responses provider (gpt-5.5) appears in trajectory assistantTexts and gets forwarded to user-facing channels (e.g. Feishu groups via feishu_im_user_message). End users see English reasoning fragments mid-Chinese conversation.

This is structurally similar to the deepseek thinking leak from #72044, but for a different provider (openai-codex-responses, not openai-completions).

Related issues (different root causes):

  • #72044 — deepseek-v4-pro reasoning_content fix incomplete (sister issue, different provider)
  • #39032 — subagent tool-failure reasoning leak (different scope: subagent, tool-error trigger)
  • #67092 — </think> parsing failure (different mechanism)
  • #45041 — output-level filter feature request (orthogonal: this issue is a provider-level sanitization bug)

Environment

  • OpenClaw v2026.4.24
  • macOS 25.4.0 (Apple Silicon)
  • Provider: openai-codex (Responses API)
  • Model: gpt-5.5
  • Channel: feishu_im_user_message → Feishu group

Symptom

Main agent (agentId: main) using gpt-5.5 over openai-codex-responses produces user-facing assistant text where internal reasoning fragments appear unsanitized. The fragments have a distinctive telegraphic style: terse English with omitted articles ("Need write", "Need maybe", "first run") — clearly internal model reasoning, not normal assistant output style.

Reproduction

  1. Run main agent with provider: openai-codex + modelId: gpt-5.5
  2. Drive a multi-turn task involving spawning subagents and chained tool calls
  3. Inspect trajectory data.assistantTexts — internal reasoning fragments appear

Evidence (session 9aa089ac-ebb8-4fba-a8ca-f8a5676d66f8)

Trajectory file:

~/.openclaw/agents/main/sessions/9aa089ac-ebb8-4fba-a8ca-f8a5676d66f8.trajectory.jsonl

7 leaked reasoning fragments captured in trajectory assistantTexts:

"Need write SESSION updated? already perhaps but new choice. Need write not use edit. Then handoff files. Need include outdated GPT-5.3-codex maybe user supplied; but requires live? ..."

"Need maybe also fetch codex page for basic; but enough. Need maybe domestic preparation details user asks: network, OpenAI account, payment, install. ..."

"Need spawn. Need use agents_list? for reasoning available? probably. But can call sessions_spawn agentId reasoning. Explicit model. Use task concise but include handoff path ..."

"Need since note says don't poll, wait completions. Need maybe update plan completed spawn. Then sessions_yield."

"Need progress update every 2 min; for now immediately maybe user-visible? final? Since in tool phase, can send normal? But final later. ..."

"Need update records. TopicPipeline decision_status maybe 已发布 or 已完成? Skill says Joulian 反馈已发布后 update TopicPipeline decision_status=已发布 ..."

"Need maybe update SESSION-STATE. Also maybe update memory daily? User confirms publication; valuable. AGENTS says WAL if user message has decision/preference/key status ..."

Expected

Internal reasoning content (whatever the codex Responses API emits as "reasoning") should be filtered out of assistantTexts and out of any messaging-tool content the agent produces, similar to how deepseek's reasoning_content is intended to be sanitized (per the partial fix discussed in #72044).

Impact

  • User in Feishu sees English reasoning fragments mid-Chinese conversation (jarring UX in a Chinese-only content workflow)
  • Erodes trust in agent output when "thinking out loud" leaks into visible messages
  • Affects content-factory production workflow (assistantTexts → feishu_im_user_message → end user)
  • Also pollutes trajectories used for audit/debugging

Workaround (none satisfactory)

Prompt-level instructions cannot suppress provider-level reasoning emission (per #45041 research notes — "more reasoning steps = more leakage", and prompts work only ~80% reliably). This needs runtime sanitization in the openai-codex-responses provider — symmetric to whatever path handles deepseek's reasoning_content separation.

extent analysis

TL;DR

The issue can be fixed by implementing runtime sanitization in the openai-codex-responses provider to filter out internal reasoning content from assistantTexts.

Guidance

  • Review the openai-codex-responses provider code to identify where internal reasoning content is being emitted and how it can be filtered out.
  • Investigate the deepseek provider's reasoning_content sanitization mechanism (mentioned in #72044) as a potential reference for implementing similar sanitization in the openai-codex-responses provider.
  • Consider adding a post-processing step to assistantTexts to remove any internal reasoning fragments before they are sent to user-facing channels.
  • Verify that the sanitization mechanism is working correctly by inspecting the assistantTexts trajectory and checking for the presence of internal reasoning fragments.

Example

No code example is provided as the issue does not contain sufficient information about the provider's implementation details.

Notes

The issue is specific to the openai-codex-responses provider and the gpt-5.5 model, and the fix may need to be tailored to this specific configuration. Additionally, the issue is similar to #72044, but the root cause and solution may differ.

Recommendation

Apply a workaround by implementing runtime sanitization in the openai-codex-responses provider, as prompt-level instructions are not sufficient to suppress provider-level reasoning emission.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING