openclaw - ✅ (Solved) Fix [Bug]: Telegram/Codex still leaks commentary-phase tool-call trace text with multilingual garbage tokens in 2026.3.13 [1 pull request, 5 comments, 5 participants]

openclaw/openclaw#52084 (fetched 2026-04-08 01:15:52)

OpenClaw can still leak internal commentary / tool-call trace text into Telegram user-visible messages in 2026.3.13 when using openai-codex/gpt-5.4.

The leaked text includes:

  • raw tool-call-like JSON arguments
  • internal routing fragments like to=functions.exec / to=functions.process
  • internal-looking labels like analysis / wait
  • random multilingual garbage tokens such as 久久免费, 大香蕉, 平台主管

This looks very similar to earlier issues in the same family, but with a few important differences:

  • reproduced on Telegram DM, not just Discord / Feishu / Olvid
  • reproduced on current OpenClaw 2026.3.13
  • reproduced with openai-codex/gpt-5.4, not just older Codex variants
  • local transcript evidence shows the leaked string is already stored as assistant text with phase: "commentary", next to a valid structured tool call

PR fix notes

PR #59643: fix(agents): preserve commentary/final_answer phase separation

Description (problem / solution / changelog)

Summary

  • Problem: assistant turns that contain both commentary and final_answer text can be flattened into one visible output, which leaks commentary into user-facing replies and can produce duplicate or malformed final delivery.
  • Why it matters: this breaks the expected final-only user experience, causes duplicate replies after tool/send paths, and corrupts replay/context because mixed-phase text is persisted and replayed ambiguously.
  • What changed: preserved phase separation end-to-end across stored-message conversion, replay/input-item rebuilding, WebSocket partial phase propagation, and visible extraction/delivery so user-visible output prefers final_answer while still falling back safely when no final text exists.
  • What did NOT change (scope boundary): this does not globally redefine every text extractor in the repo, does not change tool-call semantics, and does not attempt a broader phase-aware audit outside the main OpenAI WS -> embedded subscribe -> visible delivery path.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #59150
  • Related #56198
  • Related #58892
  • Related #52084
  • Related #25592
  • Related #44467
  • Related PR #30479
  • Related PR #57484
  • This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

  • Root cause: assistant text phase truth already existed at block level, but several layers still flattened mixed-phase text. Stored assistant messages could carry both commentary and final-answer blocks under one misleading top-level phase; replay collapsed them back together; stream partials could lose item-phase attribution; and visible extraction/delivery then consumed flattened text instead of phase-aware text.
  • Missing detection / guardrail: there was no focused regression coverage for mixed-phase stored messages, phase-aware replay splitting, signature-only phased partials, or text_end / message_end delivery interactions where commentary previews must be suppressed/replaced by final output.
  • Prior context (git blame, prior PR, issue, or refactor if known): this bug family overlaps longstanding leakage/duplication reports in #59150, #56198, #58892, #52084, #25592, and #44467. Adjacent prior art includes #30479 (stripping raw user-facing protocol leakage) and #57484 (commentary-delivery semantics on a channel path), but those did not fix mixed-phase persistence/replay/delivery end-to-end.
  • Why this regressed now: the issue is not a single recent regression; it is an accumulated phase-separation gap that became more visible once commentary, block replies, tool sends, and final-answer delivery all coexisted on the same assistant turn path.
  • If unknown, what was ruled out: ruled out “display-only” root cause. Investigation confirmed the problem begins upstream in stored-message conversion/replay semantics, not only in final rendering.
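
To make the flattening concrete, here is a minimal Python sketch (OpenClaw itself is TypeScript). It assumes text blocks shaped like the transcript excerpt in issue #52084, where the phase sits inside a JSON-encoded textSignature. The names block_phase and split_by_phase are hypothetical and are not functions from this PR.

```python
import json
from itertools import groupby


def block_phase(block):
    """Return the phase stored in a block's JSON textSignature, or None."""
    try:
        return json.loads(block.get("textSignature") or "{}").get("phase")
    except (ValueError, TypeError, AttributeError):
        return None


def split_by_phase(blocks):
    """Group consecutive text blocks by phase instead of flattening them."""
    text_blocks = [b for b in blocks if b.get("type") == "text"]
    return [
        (phase, "".join(b["text"] for b in group))
        for phase, group in groupby(text_blocks, key=block_phase)
    ]


# Illustrative message: one commentary block followed by one final-answer block.
message = [
    {"type": "text", "text": "checking the job",
     "textSignature": '{"v":1,"phase":"commentary"}'},
    {"type": "text", "text": "Done: 3 files.",
     "textSignature": '{"v":1,"phase":"final_answer"}'},
]

# Phase-blind flattening loses the boundary between the two blocks.
flattened = "".join(b["text"] for b in message)
# Phase-aware splitting keeps one (phase, text) segment per phase run.
segments = split_by_phase(message)
```

The flattened string is what a phase-blind consumer would see; the segments list keeps enough information for a later layer to decide which text is user-visible.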

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • src/agents/openai-ws-stream.test.ts
    • src/agents/pi-embedded-utils.test.ts
    • src/agents/pi-embedded-subscribe.handlers.messages.test.ts
    • src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.emits-block-replies-text-end-does-not.test.ts
    • src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-append-text-end-content-is.test.ts
    • src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-duplicate-text-end-repeats-full.test.ts
  • Scenario the test should lock in: mixed commentary/final stored messages stay phase-separated on replay; stream partials preserve phase; visible extraction prefers final_answer -> commentary -> legacy/unphased; commentary text_end block replies are suppressed until final delivery; final replacement at message_end works; and duplicate/prefix-extension regressions remain fixed.
  • Why this is the smallest reliable guardrail: the bug spans stored-message conversion, stream partial attribution, and delivery seams. Unit-only coverage at one layer would miss the cross-layer collapse that caused the visible duplication/leak.
  • Existing test that already covers this (if any): none before this branch for the mixed-phase end-to-end path.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

  • Visible assistant output now prefers final_answer text when both commentary and final-answer phases exist in one turn.
  • Commentary-only previews are no longer allowed to leak through as the final visible reply in the main embedded delivery path.
  • When commentary streamed first and final text arrives later, the final visible reply replaces the preview instead of duplicating it.
  • If no final-answer text exists, commentary/unphased fallback still works instead of producing an empty reply.
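
The preference order described above (final_answer first, then commentary, then legacy/unphased text) can be sketched as follows. This is an illustration in Python, not the PR's TypeScript code; it assumes the text has already been split into (phase, text) pairs, and visible_text is a hypothetical name.

```python
from typing import Optional

# Preference order described in the PR: final_answer, then commentary, then unphased.
PHASE_PRIORITY = ("final_answer", "commentary", None)


def visible_text(segments: list[tuple[Optional[str], str]]) -> str:
    """Return the text of the highest-priority phase that has any content."""
    for phase in PHASE_PRIORITY:
        chosen = [text for seg_phase, text in segments if seg_phase == phase]
        if any(chosen):
            return "".join(chosen)
    # No text at all: the caller decides how to handle an empty reply.
    return ""


mixed = [("commentary", "polling the process"), ("final_answer", "Done.")]
commentary_only = [("commentary", "polling the process")]
```

With the mixed turn only the final answer is selected; with the commentary-only turn the fallback keeps the reply from being empty.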

Diagram (if applicable)

Before:
[mixed commentary + final_answer blocks]
  -> [stored/replayed as flattened assistant text]
  -> [visible extractor concatenates all text]
  -> [commentary leak and/or duplicate final reply]

After:
[mixed commentary + final_answer blocks]
  -> [phase preserved in storage/replay/partials]
  -> [visible extractor prefers final_answer]
  -> [commentary preview suppressed/replaced]
  -> [single intended final visible reply]
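
The "After" flow can be approximated with a small buffer that holds phased text until the turn ends. This is a simplified Python sketch under stated assumptions: the event names text_end and message_end come from the PR description, while the TurnDelivery class and its methods are hypothetical and much simpler than the real delivery path.

```python
class TurnDelivery:
    """Buffers phased text for one assistant turn and emits a single reply."""

    def __init__(self):
        self.final_parts = []
        self.fallback_parts = []

    def on_text_end(self, phase, text):
        # Nothing is delivered at text_end; non-final text is only kept
        # as a fallback in case no final answer arrives.
        if phase == "final_answer":
            self.final_parts.append(text)
        else:
            self.fallback_parts.append(text)
        return None

    def on_message_end(self):
        # Final text, when present, replaces any buffered commentary.
        parts = self.final_parts or self.fallback_parts
        return "".join(parts)


delivery = TurnDelivery()
delivery.on_text_end("commentary", "polling the process")
delivery.on_text_end("final_answer", "Done.")
reply = delivery.on_message_end()
```

Holding commentary back until message_end is one way to get a single reply per turn; a real implementation also has to handle streamed previews that were already shown.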

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Ubuntu (local dev host)
  • Runtime/container: local Node/pnpm repo checkout
  • Model/provider: OpenAI WS / embedded subscribe path
  • Integration/channel (if any): embedded delivery path with Telegram/Discord-adjacent visible-output semantics
  • Relevant config (redacted): default OpenAI WS / embedded subscribe test harnesses; no special secrets required

Steps

  1. Produce or replay an assistant turn containing both commentary and final_answer text blocks.
  2. Observe stored/replayed assistant content and the user-visible delivery path.
  3. Confirm whether visible output leaks commentary, duplicates final delivery, or collapses mixed phases.

Expected

  • Stored and replayed assistant content preserves phase boundaries.
  • User-visible extraction prefers final_answer when present.
  • Commentary previews are suppressed or replaced rather than duplicated.

Actual

  • Before this fix: mixed-phase turns could flatten into one visible assistant reply, leak commentary, or produce duplicate final delivery.
  • On this branch: phase separation is preserved through replay and delivery, and the targeted duplicate/leak regressions are covered by tests.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)
  • local transcript evidence showed assistant turns with both commentary and final_answer blocks in a single message
  • targeted regression slice passed: 8 suites / 164 tests
  • fresh independent review loop passed for storage/replay, stream-phase, visible-delivery, and holistic closure before branch submission

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • inspected mixed-phase transcript evidence and confirmed commentary + final-answer coexistence in one assistant turn
    • verified stored-message/replay, stream partial propagation, and visible delivery changes in the touched files
    • ran the targeted regression slice and confirmed 8 suites / 164 tests passed
    • confirmed the branch diff remains scoped to the intended 10 files
  • Edge cases checked:
    • commentary leaking through text_end block replies
    • final replacement at message_end after commentary streamed first
    • legacy/unphased + phased replay collapse
    • signature-only phased partials without top-level partial.phase
    • prefix-extension and duplicate text_end regressions
  • What you did not verify:
    • full-repo tsc --noEmit on this host (prior typecheck attempts were memory-constrained)
    • a broader follow-up audit of other phase-blind helper paths such as src/agents/tools/sessions-helpers.ts

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: other assistant-text consumers outside the main embedded delivery path may still use phase-blind flattening and show adjacent inconsistencies.
    • Mitigation: this PR keeps scope tight to the verified main issue path and leaves sessions-helpers parity as explicit follow-up watchpoint rather than silently broadening behavior.
  • Risk: replay/delivery edge cases could regress around partial/final transitions.
    • Mitigation: regression coverage now locks in mixed-phase replay splitting, phase-aware partial attribution, commentary suppression, final replacement, and duplicate/text-end edge cases.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/openai-ws-message-conversion.ts (modified, +35/-13)
  • src/agents/openai-ws-stream.test.ts (modified, +490/-30)
  • src/agents/openai-ws-stream.ts (modified, +145/-12)
  • src/agents/pi-embedded-subscribe.handlers.messages.test.ts (modified, +169/-0)
  • src/agents/pi-embedded-subscribe.handlers.messages.ts (modified, +84/-32)
  • src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-append-text-end-content-is.test.ts (modified, +1/-0)
  • src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-duplicate-text-end-repeats-full.test.ts (modified, +4/-1)
  • src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.emits-block-replies-text-end-does-not.test.ts (modified, +468/-1)
  • src/agents/pi-embedded-utils.test.ts (modified, +76/-1)
  • src/agents/pi-embedded-utils.ts (modified, +108/-6)

Steps to reproduce

  1. Use OpenClaw in a Telegram direct chat
  2. Use openai-codex/gpt-5.4 as the runtime model
  3. Trigger a multi-step task involving long-running local tools, e.g.
    • exec to download media / prepare files
    • background exec
    • repeated process poll
  4. Wait for the task to run through several tool rounds
  5. Observe that an occasional malformed internal-looking message can be sent to Telegram before the final answer

In the observed case, this happened during a local transcription workflow (download media → extract audio → run local MLX Whisper → poll background process).

Expected behavior

Only intentional user-facing natural language should reach Telegram.

The following should never be visible to the user:

  • commentary-only text
  • raw tool-call traces
  • to=functions.*
  • analysis / wait
  • malformed / garbage multilingual tokens

Actual behavior

Telegram received malformed internal-looking messages such as:

{"action":"poll","sessionId":"glow-mist","timeout":30000} to=functions.process 久久免费热在线精品functions.process კომენტary to=functions.process ,大香蕉analysis to=functions.process wait

Other similar leaked messages also included:

  • to=functions.exec plus mixed-script garbage between tool-routing fragments
  • another process poll leak with 平台总代理 and a different random multilingual token before analysis
  • raw JSON tool arguments appearing as plain text immediately before a valid structured toolCall

Key evidence

The important part is that the malformed string was already present in the assistant message content, not in the tool result.

A simplified structure from the local transcript looked like this:

{
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "{\"action\":\"poll\",\"sessionId\":\"glow-mist\",\"timeout\":30000} to=functions.process 久久免费热在线精品functions.process კომენტary to=functions.process ,大香蕉analysis to=functions.process wait",
      "textSignature": "{\"v\":1,\"id\":\"...\",\"phase\":\"commentary\"}"
    },
    {
      "type": "toolCall",
      "name": "process",
      "arguments": {
        "action": "poll",
        "sessionId": "glow-mist",
        "timeout": 30000
      }
    }
  ]
}

Additional observations:

  • the adjacent toolCall is valid and structured normally
  • surrounding toolResult entries were clean
  • the first malformed text blocks appeared before any suspicious tool output that could explain them
  • the garbage tokens did not appear to come from the media being processed or the tool outputs themselves
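
A transcript can be scanned for this pattern with a short script. The Python sketch below assumes messages shaped like the simplified structure above; commentary_before_tool_call is a hypothetical helper, and the example text is shortened from the leaked string.

```python
import json


def commentary_before_tool_call(message):
    """Return indexes of commentary text blocks directly followed by a toolCall."""
    content = message.get("content", [])
    hits = []
    for i, (block, nxt) in enumerate(zip(content, content[1:])):
        if block.get("type") != "text" or nxt.get("type") != "toolCall":
            continue
        try:
            phase = json.loads(block.get("textSignature") or "{}").get("phase")
        except (ValueError, TypeError, AttributeError):
            phase = None
        if phase == "commentary":
            hits.append(i)
    return hits


message = {
    "role": "assistant",
    "content": [
        {
            "type": "text",
            "text": '{"action":"poll","sessionId":"glow-mist","timeout":30000} to=functions.process wait',
            "textSignature": '{"v":1,"id":"...","phase":"commentary"}',
        },
        {
            "type": "toolCall",
            "name": "process",
            "arguments": {"action": "poll", "sessionId": "glow-mist", "timeout": 30000},
        },
    ],
}
suspects = commentary_before_tool_call(message)
```

Any index returned here marks assistant text that was stored as commentary right before a structured tool call, which is the situation described in this report.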

Root cause hypothesis

This looks like two problems compounding:

  1. Model/output-side failure

    • during tool-calling, the model sometimes emits malformed commentary text instead of staying purely in structured tool-call mode
    • the malformed commentary text can contain raw tool-call transcript fragments plus random multilingual junk tokens
  2. OpenClaw delivery/sanitization failure

    • OpenClaw stores that malformed text as assistant text / phase: "commentary"
    • that commentary text is later forwarded to Telegram instead of being suppressed or sanitized

So this looks less like a compromised tool and more like:

  • malformed model tool-call/commentary output
  • plus a runtime/outbound filtering gap

Why this report may still be useful even if related issues exist

This report adds:

  • Telegram confirmation for the same bug family
  • 2026.3.13 confirmation that the issue is still present
  • gpt-5.4 / Codex runtime confirmation
  • direct evidence that the leaked payload is stored as assistant commentary text before delivery

Related issues

This seems related to:

  • #30441 — Codex subagent emits raw function-call syntax and hallucinated tokens to user chat
  • #24376 — Telegram stream + reasoning leaks toolUse intermediate status text into user-visible replies
  • #25592 — Text between tool calls leaks to messaging channels
  • #30704 — Internal tool-call trace leaked into user-visible chat message
  • #41435 — Internal tool-call/commentary payload leaked into user-visible Feishu DM messages
  • #44905 — Discord leaks internal tool-call traces (commentary, to=functions.*, raw JSON)
  • #45271 — Model does tool-calling narrations since 2026.3.7

If maintainers think this is a duplicate, happy to close it in favor of the best canonical issue.

Environment

  • OpenClaw: 2026.3.13
  • OS: macOS arm64
  • Channel: Telegram DM
  • Runtime model: openai-codex/gpt-5.4
  • Trigger pattern: multi-step tool workflow with long-running exec + repeated process poll

(Deliberately omitting any personal paths, chat IDs, usernames, or private transcript locations from this public report.)

extent analysis

Fix Plan

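To stop internal commentary and tool-call traces from reaching Telegram, OpenClaw needs to filter assistant text before it is sent to the user. A minimal approach has three steps:

  • Filter commentary text: before sending a text block to Telegram, read its phase (stored in the block's textSignature in the transcript above). If the phase is "commentary", do not send it.
  • Drop tool-call traces: if the text contains a tool-routing fragment (e.g., to=functions.*), treat the whole block as a leaked trace and suppress it. Stripping only the matched substring would leave the raw JSON arguments and garbage tokens behind.
  • Do not filter by character set: removing all non-ASCII characters, or everything between braces, would also corrupt legitimate non-English replies and replies that contain JSON. The garbage tokens are handled by suppressing the whole leaked block instead.

Example code (a Python sketch for illustration; sanitize_block and block_phase are hypothetical helpers, not OpenClaw APIs):

import json
import re

# Internal routing fragments such as "to=functions.exec".
TOOL_ROUTE_RE = re.compile(r"to=functions\.\w+")

def block_phase(block):
    # The phase lives inside the JSON-encoded textSignature, not on the text itself.
    try:
        return json.loads(block.get("textSignature") or "{}").get("phase")
    except (ValueError, TypeError, AttributeError):
        return None

def sanitize_block(block):
    # Suppress commentary-phase text
    if block_phase(block) == "commentary":
        return None

    text = block.get("text", "")
    # Suppress blocks that contain tool-call routing fragments
    if TOOL_ROUTE_RE.search(text):
        return None

    # Keep non-ASCII characters so legitimate non-English replies survive
    return text.strip() or None

# Example usage:
block = {
    "type": "text",
    "text": "{\"action\":\"poll\",\"sessionId\":\"glow-mist\",\"timeout\":30000} to=functions.process 久久免费热在线精品functions.process კომენტary to=functions.process ,大香蕉analysis to=functions.process wait",
    "textSignature": "{\"v\":1,\"id\":\"...\",\"phase\":\"commentary\"}",
}
visible_text = sanitize_block(block)
if visible_text:
    # Send visible_text to Telegram
    print(visible_text)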

Verification

To verify that the fix worked, test the OpenClaw system with the same trigger pattern that caused the issue. Check that the leaked internal commentary and tool-call traces are no longer visible in the Telegram messages.
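
One way to automate part of this check is to scan the messages actually sent to Telegram for the routing fragment named in this issue. The Python sketch below assumes the outbound messages have been collected as plain strings; find_leaks is a hypothetical helper, and the marker pattern covers only to=functions.* fragments, not every possible leak.

```python
import re

# Routing fragment named in the issue, e.g. "to=functions.exec".
TOOL_ROUTE_RE = re.compile(r"to=functions\.\w+")


def find_leaks(outbound_messages):
    """Return the outbound messages that still contain a tool-routing fragment."""
    return [m for m in outbound_messages if TOOL_ROUTE_RE.search(m)]


sent = [
    "Transcription finished. The file is ready.",
    "转录完成。",  # legitimate non-English reply; must not be flagged
    '{"action":"poll","sessionId":"glow-mist","timeout":30000} to=functions.process wait',
]
leaks = find_leaks(sent)
```

An empty result after rerunning the trigger workflow is a necessary signal, not a sufficient one, since leaked text without a routing fragment would not be caught by this pattern.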

Extra Tips

  • Regularly review and update the sanitization and filtering logic to ensure that it remains effective against new types of leaked text.
  • Consider implementing additional logging and monitoring to detect and respond to any future instances of leaked text.
