
openclaw - ✅(Solved) Fix [Bug]: Telegram/Codex still leaks commentary-phase tool-call trace text with multilingual garbage tokens in 2026.3.13 [1 pull requests, 5 comments, 5 participants]

hyspacex · 2026-03-22T04:53:23Z




openclaw/openclaw#52084 (fetched 2026-04-08 01:15:52)



PR fix notes

PR #59643: fix(agents): preserve commentary/final_answer phase separation

Repository: openclaw/openclaw
Author: ringlochid
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/59643

Description (problem / solution / changelog)

Summary

Problem: assistant turns that contain both commentary and final_answer text can be flattened into one visible output, which leaks commentary into user-facing replies and can produce duplicate or malformed final delivery.
Why it matters: this breaks the expected final-only user experience, causes duplicate replies after tool/send paths, and corrupts replay/context because mixed-phase text is persisted and replayed ambiguously.
What changed: preserved phase separation end-to-end across stored-message conversion, replay/input-item rebuilding, WebSocket partial phase propagation, and visible extraction/delivery so user-visible output prefers final_answer while still falling back safely when no final text exists.
What did NOT change (scope boundary): this does not globally redefine every text extractor in the repo, does not change tool-call semantics, and does not attempt a broader phase-aware audit outside the main OpenAI WS -> embedded subscribe -> visible delivery path.

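Change Type (select all)

- [x] Bug fix
- [ ] Feature
- [x] Refactor required for the fix
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

Scope (select all touched areas)

- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [x] Memory / storage
- [x] Integrations
- [x] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra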

Linked Issue/PR

Closes #59150
Related #56198
Related #58892
Related #52084
Related #25592
Related #44467
Related PR #30479
Related PR #57484
This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

Root cause: assistant text phase truth already existed at block level, but several layers still flattened mixed-phase text. Stored assistant messages could carry both commentary and final-answer blocks under one misleading top-level phase; replay collapsed them back together; stream partials could lose item-phase attribution; and visible extraction/delivery then consumed flattened text instead of phase-aware text.
Missing detection / guardrail: there was no focused regression coverage for mixed-phase stored messages, phase-aware replay splitting, signature-only phased partials, or text_end / message_end delivery interactions where commentary previews must be suppressed/replaced by final output.
Prior context (git blame, prior PR, issue, or refactor if known): this bug family overlaps longstanding leakage/duplication reports in #59150, #56198, #58892, #52084, #25592, and #44467. Adjacent prior art includes #30479 (stripping raw user-facing protocol leakage) and #57484 (commentary-delivery semantics on a channel path), but those did not fix mixed-phase persistence/replay/delivery end-to-end.
Why this regressed now: the issue is not a single recent regression; it is an accumulated phase-separation gap that became more visible once commentary, block replies, tool sends, and final-answer delivery all coexisted on the same assistant turn path.
If unknown, what was ruled out: ruled out “display-only” root cause. Investigation confirmed the problem begins upstream in stored-message conversion/replay semantics, not only in final rendering.
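
The replay flattening described in the root cause can be illustrated with a minimal Python sketch. The block shape (`text`, `phase`) and the names `flatten` and `split_by_phase` are assumptions made for illustration; they are not OpenClaw's actual types, and the real fix lives in the TypeScript files listed under Changed files.

```python
from itertools import groupby
from typing import Optional, TypedDict


class TextBlock(TypedDict):
    """Hypothetical shape of a stored assistant text block."""
    text: str
    phase: Optional[str]  # "commentary", "final_answer", or None for legacy text


def flatten(blocks: list[TextBlock]) -> str:
    """Phase-blind replay: every block collapses into one string."""
    return "\n".join(b["text"] for b in blocks)


def split_by_phase(blocks: list[TextBlock]) -> list[tuple[Optional[str], str]]:
    """Phase-aware replay: consecutive blocks with the same phase stay together,
    and a phase change starts a new replay item."""
    return [
        (phase, "\n".join(b["text"] for b in run))
        for phase, run in groupby(blocks, key=lambda b: b["phase"])
    ]


stored: list[TextBlock] = [
    {"text": "Polling the background process.", "phase": "commentary"},
    {"text": "The transcription is ready.", "phase": "final_answer"},
]

# One replay item per phase run instead of a single merged string.
replay_items = split_by_phase(stored)
```

With `flatten`, the commentary line and the final answer become indistinguishable downstream; with `split_by_phase`, each replay item keeps the phase it was produced in.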

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- [x] Unit test
- [x] Seam / integration test
- [ ] End-to-end test
- [ ] Existing coverage already sufficient
Target test or file:
- src/agents/openai-ws-stream.test.ts
- src/agents/pi-embedded-utils.test.ts
- src/agents/pi-embedded-subscribe.handlers.messages.test.ts
- src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.emits-block-replies-text-end-does-not.test.ts
- src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-append-text-end-content-is.test.ts
- src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-duplicate-text-end-repeats-full.test.ts
Scenario the test should lock in: mixed commentary/final stored messages stay phase-separated on replay; stream partials preserve phase; visible extraction prefers final_answer -> commentary -> legacy/unphased; commentary text_end block replies are suppressed until final delivery; final replacement at message_end works; and duplicate/prefix-extension regressions remain fixed.
Why this is the smallest reliable guardrail: the bug spans stored-message conversion, stream partial attribution, and delivery seams. Unit-only coverage at one layer would miss the cross-layer collapse that caused the visible duplication/leak.
Existing test that already covers this (if any): none before this branch for the mixed-phase end-to-end path.
If no new test is added, why not: N/A

User-visible / Behavior Changes

Visible assistant output now prefers final_answer text when both commentary and final-answer phases exist in one turn.
Commentary-only previews are no longer allowed to leak through as the final visible reply in the main embedded delivery path.
When commentary streamed first and final text arrives later, the final visible reply replaces the preview instead of duplicating it.
If no final-answer text exists, commentary/unphased fallback still works instead of producing an empty reply.
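
The preference order described above (final_answer, then commentary, then legacy/unphased) could look roughly like this in Python. This is a sketch under assumptions: the block dictionaries and the function name `extract_visible_text` are hypothetical and only mirror the behavior the PR describes, not its actual code.

```python
# Preference order from the PR description: final_answer, then commentary,
# then legacy/unphased text (represented here as phase None).
PHASE_PRIORITY = ("final_answer", "commentary", None)


def extract_visible_text(blocks: list[dict]) -> str:
    """Return text from the highest-priority phase that has non-empty text."""
    for phase in PHASE_PRIORITY:
        parts = [
            b["text"]
            for b in blocks
            if b.get("type") == "text"
            and b.get("phase") == phase
            and b["text"].strip()
        ]
        if parts:
            return "\n".join(parts)
    return ""


mixed = [
    {"type": "text", "text": "Checking the job status.", "phase": "commentary"},
    {"type": "text", "text": "Done: 3 files processed.", "phase": "final_answer"},
]
commentary_only = [mixed[0]]

visible_mixed = extract_visible_text(mixed)
visible_fallback = extract_visible_text(commentary_only)
```

The fallback matters: a turn with no final-answer text still yields a reply instead of an empty message.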

Diagram (if applicable)

Before:
[mixed commentary + final_answer blocks]
  -> [stored/replayed as flattened assistant text]
  -> [visible extractor concatenates all text]
  -> [commentary leak and/or duplicate final reply]

After:
[mixed commentary + final_answer blocks]
  -> [phase preserved in storage/replay/partials]
  -> [visible extractor prefers final_answer]
  -> [commentary preview suppressed/replaced]
  -> [single intended final visible reply]
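
The "After" flow can be sketched as a small per-turn state holder. The class `TurnDelivery` and its method names are hypothetical; they borrow the `text_end` / `message_end` event names from the PR text and assume that commentary is held back until the turn finishes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TurnDelivery:
    """Hypothetical delivery state for one assistant turn."""
    preview: Optional[str] = None  # commentary seen so far (held back, not sent)
    final: Optional[str] = None    # final_answer text, once it arrives
    sent: list[str] = field(default_factory=list)

    def on_text_end(self, text: str, phase: Optional[str]) -> None:
        # Commentary block replies are held back instead of being sent right away.
        if phase == "final_answer":
            self.final = text
        else:
            self.preview = text

    def on_message_end(self) -> None:
        # Exactly one visible reply: final text wins, the preview is the fallback.
        reply = self.final if self.final is not None else self.preview
        if reply:
            self.sent.append(reply)


turn = TurnDelivery()
turn.on_text_end("Let me poll the process.", "commentary")
turn.on_text_end("Transcription finished.", "final_answer")
turn.on_message_end()
```

After `on_message_end`, `turn.sent` holds a single reply, so the commentary preview is replaced and nothing is delivered twice.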

Security Impact (required)

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: Ubuntu (local dev host)
Runtime/container: local Node/pnpm repo checkout
Model/provider: OpenAI WS / embedded subscribe path
Integration/channel (if any): embedded delivery path with Telegram/Discord-adjacent visible-output semantics
Relevant config (redacted): default OpenAI WS / embedded subscribe test harnesses; no special secrets required

Steps

Produce or replay an assistant turn containing both commentary and final_answer text blocks.
Observe stored/replayed assistant content and the user-visible delivery path.
Confirm whether visible output leaks commentary, duplicates final delivery, or collapses mixed phases.

Expected

Stored and replayed assistant content preserves phase boundaries.
User-visible extraction prefers final_answer when present.
Commentary previews are suppressed or replaced rather than duplicated.

Actual

Before this fix: mixed-phase turns could flatten into one visible assistant reply, leak commentary, or produce duplicate final delivery.
On this branch: phase separation is preserved through replay and delivery, and the targeted duplicate/leak regressions are covered by tests.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)
local transcript evidence showed assistant turns with both commentary and final_answer blocks in a single message
targeted regression slice passed: 8 suites / 164 tests
fresh independent review loop passed for storage/replay, stream-phase, visible-delivery, and holistic closure before branch submission

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios:
- inspected mixed-phase transcript evidence and confirmed commentary + final-answer coexistence in one assistant turn
- verified stored-message/replay, stream partial propagation, and visible delivery changes in the touched files
- ran the targeted regression slice and confirmed 8 suites / 164 tests passed
- confirmed the branch diff remains scoped to the intended 10 files
Edge cases checked:
- commentary leaking through text_end block replies
- final replacement at message_end after commentary streamed first
- legacy/unphased + phased replay collapse
- signature-only phased partials without top-level partial.phase
- prefix-extension and duplicate text_end regressions
What you did not verify:
- full-repo tsc --noEmit on this host (prior typecheck attempts were memory-constrained)
- a broader follow-up audit of other phase-blind helper paths such as src/agents/tools/sessions-helpers.ts
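
One plausible reading of the "prefix-extension and duplicate text_end" edge cases is sketched below. The function `merge_text_end` is hypothetical, and the exact semantics in OpenClaw may differ; the sketch only assumes that a `text_end` event can repeat or extend text that was already streamed.

```python
def merge_text_end(streamed: str, text_end: str) -> str:
    """Combine already-streamed text with the content of a text_end event."""
    if text_end == streamed:
        # text_end repeats the full text: nothing new to add.
        return streamed
    if text_end.startswith(streamed):
        # text_end extends the streamed prefix: take it as the complete text.
        return text_end
    # Unrelated content: keep both, in arrival order.
    return streamed + text_end


repeated = merge_text_end("Hello", "Hello")
extended = merge_text_end("Hel", "Hello")
appended = merge_text_end("Hello ", "world")
```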

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: other assistant-text consumers outside the main embedded delivery path may still use phase-blind flattening and show adjacent inconsistencies.
- Mitigation: this PR keeps scope tight to the verified main issue path and leaves sessions-helpers parity as an explicit follow-up watchpoint rather than silently broadening behavior.
Risk: replay/delivery edge cases could regress around partial/final transitions.
- Mitigation: regression coverage now locks in mixed-phase replay splitting, phase-aware partial attribution, commentary suppression, final replacement, and duplicate/text-end edge cases.

Changed files

CHANGELOG.md (modified, +1/-0)
src/agents/openai-ws-message-conversion.ts (modified, +35/-13)
src/agents/openai-ws-stream.test.ts (modified, +490/-30)
src/agents/openai-ws-stream.ts (modified, +145/-12)
src/agents/pi-embedded-subscribe.handlers.messages.test.ts (modified, +169/-0)
src/agents/pi-embedded-subscribe.handlers.messages.ts (modified, +84/-32)
src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-append-text-end-content-is.test.ts (modified, +1/-0)
src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-duplicate-text-end-repeats-full.test.ts (modified, +4/-1)
src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.emits-block-replies-text-end-does-not.test.ts (modified, +468/-1)
src/agents/pi-embedded-utils.test.ts (modified, +76/-1)
src/agents/pi-embedded-utils.ts (modified, +108/-6)


Summary

OpenClaw can still leak internal commentary / tool-call trace text into Telegram user-visible messages in 2026.3.13 when using openai-codex/gpt-5.4.

The leaked text includes:

raw tool-call-like JSON arguments
internal routing fragments like to=functions.exec / to=functions.process
internal-looking labels like analysis / wait
random multilingual garbage tokens such as 久久免费, 大香蕉, 平台主管

This looks very similar to earlier issues in the same family, but with a few important differences:

reproduced on Telegram DM, not just Discord / Feishu / Olvid
reproduced on current OpenClaw 2026.3.13
reproduced with openai-codex/gpt-5.4, not just older Codex variants
local transcript evidence shows the leaked string is already stored as assistant text with phase: "commentary", next to a valid structured tool call

Steps to reproduce

Use OpenClaw in a Telegram direct chat
Use openai-codex/gpt-5.4 as the runtime model
Trigger a multi-step task involving long-running local tools, e.g.
- exec to download media / prepare files
- background exec
- repeated process poll
Wait for the task to run through several tool rounds
Observe that an occasional malformed internal-looking message can be sent to Telegram before the final answer

In the observed case, this happened during a local transcription workflow (download media → extract audio → run local MLX Whisper → poll background process).

Expected behavior

Only intentional user-facing natural language should reach Telegram.

The following should never be visible to the user:

commentary-only text
raw tool-call traces
to=functions.*
analysis / wait
malformed / garbage multilingual tokens

Actual behavior

Telegram received malformed internal-looking messages such as:

{"action":"poll","sessionId":"glow-mist","timeout":30000} to=functions.process 久久免费热在线精品functions.process კომენტary to=functions.process ,大香蕉analysis to=functions.process wait

Other similar leaked messages also included:

to=functions.exec plus mixed-script garbage between tool-routing fragments
another process poll leak with 平台总代理 and a different random multilingual token before analysis
raw JSON tool arguments appearing as plain text immediately before a valid structured toolCall

Key evidence

The important part is that the malformed string was already present in the assistant message content, not in the tool result.

A simplified structure from the local transcript looked like this:

{
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "{\"action\":\"poll\",\"sessionId\":\"glow-mist\",\"timeout\":30000} to=functions.process 久久免费热在线精品functions.process კომენტary to=functions.process ,大香蕉analysis to=functions.process wait",
      "textSignature": "{\"v\":1,\"id\":\"...\",\"phase\":\"commentary\"}"
    },
    {
      "type": "toolCall",
      "name": "process",
      "arguments": {
        "action": "poll",
        "sessionId": "glow-mist",
        "timeout": 30000
      }
    }
  ]
}

Additional observations:

the adjacent toolCall is valid and structured normally
surrounding toolResult entries were clean
the first malformed text blocks appeared before any suspicious tool output that could explain them
the garbage tokens did not appear to come from the media being processed or the tool outputs themselves
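
In the transcript above, the phase is not a top-level field of the text block; it sits inside the JSON string stored in `textSignature`. A minimal sketch of reading it back follows. The helper name `phase_from_signature` is an assumption, and the `"..."` values are kept elided exactly as in the report.

```python
import json
from typing import Optional


def phase_from_signature(block: dict) -> Optional[str]:
    """Read the phase from a text block's textSignature, if one is present."""
    raw = block.get("textSignature")
    if not isinstance(raw, str):
        return None
    try:
        signature = json.loads(raw)
    except json.JSONDecodeError:
        return None
    phase = signature.get("phase") if isinstance(signature, dict) else None
    return phase if isinstance(phase, str) else None


block = {
    "type": "text",
    "text": "...",
    "textSignature": "{\"v\":1,\"id\":\"...\",\"phase\":\"commentary\"}",
}

phase = phase_from_signature(block)
```

A delivery layer that only checks a top-level phase field would miss this block, which is the "signature-only" case the PR calls out.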

Root cause hypothesis

This looks like two problems compounding:

Model/output-side failure
- during tool-calling, the model sometimes emits malformed commentary text instead of staying purely in structured tool-call mode
- the malformed commentary text can contain raw tool-call transcript fragments plus random multilingual junk tokens
OpenClaw delivery/sanitization failure
- OpenClaw stores that malformed text as assistant text / phase: "commentary"
- that commentary text is later forwarded to Telegram instead of being suppressed or sanitized

So this looks less like a compromised tool and more like:

malformed model tool-call/commentary output
plus a runtime/outbound filtering gap

Why this report may still be useful even if related issues exist

This report adds:

Telegram confirmation for the same bug family
2026.3.13 confirmation that the issue is still present
gpt-5.4 / Codex runtime confirmation
direct evidence that the leaked payload is stored as assistant commentary text before delivery

Related issues

This seems related to:

#30441 — Codex subagent emits raw function-call syntax and hallucinated tokens to user chat
#24376 — Telegram stream + reasoning leaks toolUse intermediate status text into user-visible replies
#25592 — Text between tool calls leaks to messaging channels
#30704 — Internal tool-call trace leaked into user-visible chat message
#41435 — Internal tool-call/commentary payload leaked into user-visible Feishu DM messages
#44905 — Discord leaks internal tool-call traces (commentary, to=functions.*, raw JSON)
#45271 — Model does tool-calling narrations since 2026.3.7

If maintainers think this is a duplicate, happy to close it in favor of the best canonical issue.

Environment

OpenClaw: 2026.3.13
OS: macOS arm64
Channel: Telegram DM
Runtime model: openai-codex/gpt-5.4
Trigger pattern: multi-step tool workflow with long-running exec + repeated process poll

(Deliberately omitting any personal paths, chat IDs, usernames, or private transcript locations from this public report.)

extent analysis

Fix Plan

To stop internal commentary and tool-call traces from reaching Telegram, OpenClaw needs to filter and sanitize assistant text before it is sent to the user. The steps:

Filter commentary text: before sending any text block to Telegram, check whether its phase is "commentary". If so, do not send it.
Remove tool-call traces: use regular expressions to strip text that matches the shape of a tool-call trace (e.g., to=functions.*, raw JSON tool arguments).
Sanitize text: remove remaining garbage tokens. The example below does this by stripping all non-ASCII characters, which also removes legitimate non-English text, so treat that step as a last-resort backstop rather than a general filter.

Example code:

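import re

def sanitize_text(block):
    """Return sendable text for an assistant text block, or None to suppress it.

    Assumes the caller has already resolved the block's phase (in the
    transcript above it sits inside textSignature).
    """
    # Drop commentary-phase blocks entirely
    if block.get("phase") == "commentary":
        return None

    text = block.get("text", "")

    # Remove tool-call traces
    text = re.sub(r'to=functions\.\w+', '', text)  # routing fragments
    text = re.sub(r'\{[^{}]*\}', '', text)         # flat JSON argument objects

    # Sanitize text
    # Caution: this also strips legitimate non-English text
    text = re.sub(r'[^\x00-\x7F]+', '', text)  # remove non-ASCII characters

    # Collapse leftover whitespace; an empty result means nothing to send
    text = re.sub(r'\s+', ' ', text).strip()
    return text or None

# Example usage:
block = {
    "phase": "commentary",
    "text": "{\"action\":\"poll\",\"sessionId\":\"glow-mist\",\"timeout\":30000} to=functions.process 久久免费热在线精品functions.process კომენტary to=functions.process ,大香蕉analysis to=functions.process wait",
}
sanitized_text = sanitize_text(block)
if sanitized_text:
    # Send sanitized_text to Telegram
    print(sanitized_text)
# Prints nothing here: the block is commentary-phase, so it is suppressed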

Verification

To verify that the fix worked, test the OpenClaw system with the same trigger pattern that caused the issue. Check that the leaked internal commentary and tool-call traces are no longer visible in the Telegram messages.

Extra Tips

Regularly review and update the sanitization and filtering logic to ensure that it remains effective against new types of leaked text.
Consider implementing additional logging and monitoring to detect and respond to any future instances of leaked text.
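
For the monitoring tip, a small detector over outbound messages is one option. This is a sketch only: the patterns are taken from the leaked samples in this report, the name `find_leak_markers` is hypothetical, and a real deployment would need a broader and regularly reviewed pattern list.

```python
import re

# Markers taken from the leaked samples in this report. The list is illustrative.
LEAK_PATTERNS = [
    re.compile(r"to=functions\.\w+"),             # internal routing fragment
    re.compile(r'\{"action":"\w+","sessionId"'),  # raw tool-call arguments as text
]


def find_leak_markers(message: str) -> list[str]:
    """Return the leak-like fragments found in an outbound message."""
    hits: list[str] = []
    for pattern in LEAK_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(message))
    return hits


leaked = '{"action":"poll","sessionId":"glow-mist","timeout":30000} to=functions.process wait'
markers = find_leak_markers(leaked)
clean = find_leak_markers("The transcription is ready.")
```

Logging a non-empty result just before a message is sent gives a cheap signal that a leak slipped past the phase-aware filtering.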


