
openclaw - ✅ (Solved) Fix: Telegram direct session can emit exec-looking replies with zero tool calls [1 pull request, 1 comment, 2 participants]

rennesdailylife-oss · 2026-04-04T16:36:35Z



GitHub stats

openclaw/openclaw#60955 • Fetched 2026-04-08 02:45:11

Author: rennesdailylife-oss

Participants: neeravmakwana, rennesdailylife-oss

Timeline (top): cross-referenced ×2, commented ×1

In a Telegram direct/main session, OpenClaw can reply with exec-looking text such as “I will install/check this now” and a fenced bash block, but the turn contains zero real tool calls and zero tool results.

This is not just a generic “assistant acknowledged but did nothing” case. In this failure mode, the chat surface makes the reply look operationally real, but the session log shows that nothing actually executed.

This looks related to #40631 and overlaps with the broader “action illusion” concern in #47213, but this report adds a Telegram-specific reproduction with concrete session evidence.


PR fix notes

PR #61132: fix(agents): block unbacked exec-looking replies

Repository: openclaw/openclaw
Author: neeravmakwana
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/61132

Description (problem / solution / changelog)

Summary

Problem: assistant turns could emit first-person execution-intent text plus runnable fenced shell blocks even when no real tool call or tool result existed.
Why it matters: Telegram/direct replies could look operationally real, which creates false execution confidence and hidden idle time.
What changed: added a narrow guard in the embedded assistant reply pipeline that rewrites strong unbacked execution-intent shell replies to a truthful fallback unless the turn already has real tool activity, and added regression coverage for blocked and tool-backed cases.
What did NOT change (scope boundary): this does not change legitimate tool-backed exec/approval replies, Telegram formatting, or general command/example rendering.
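A minimal Python sketch can make the guard concrete. It is not the PR's TypeScript code. The regular expressions, the `guard_reply` name, and the fallback wording are all hypothetical, and the sketch assumes the caller already knows whether the turn had real tool activity.

```python
import re

# Hypothetical heuristics: first-person execution intent plus a shell-tagged fence with a body.
INTENT_RE = re.compile(r"\b(?:I['’]ll|I will|let me)\s+(?:install|check|run)\b", re.IGNORECASE)
SHELL_FENCE_RE = re.compile(r"```(?:bash|sh|shell|zsh)[ \t]*\n\s*\S")

# Hypothetical truthful fallback text.
FALLBACK = "I did not run any command in this turn, so nothing has been installed or checked yet."


def guard_reply(text: str, has_tool_activity: bool) -> str:
    """Rewrite an unbacked execution-intent shell reply to a truthful fallback."""
    if has_tool_activity:
        return text  # tool-backed turns are delivered unchanged
    if INTENT_RE.search(text) and SHELL_FENCE_RE.search(text):
        return FALLBACK  # the reply claims to run something, but nothing ran
    return text  # ordinary prose or instructional examples pass through
```

The guard requires both signals and skips the check entirely once tool activity exists. This keeps it within the stated scope boundary: tool-backed replies and plain command examples are left alone.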

Change Type: Bug fix

Scope: Skills / tool execution, Integrations

Linked Issue/PR

Closes #60955
Related #40631
This PR fixes a bug or regression

Root Cause (if applicable)

Root cause: the assistant visible-text path stripped downgraded/internal tool markers, but it still treated ordinary first-person shell fences as safe user-facing text even when the turn had no tool lifecycle at all.
Missing detection / guardrail: there was no guard tying execution-intent shell text to actual tool activity before the reply was emitted to chat surfaces.
Contributing context (if known): Telegram renders fenced shell blocks as code blocks, which made the no-tool reply look especially operationally real.
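The missing guardrail is the link between the reply text and the turn's tool lifecycle. The following sketch shows hypothetical per-turn tracking. The event names `tool_execution_start` and `tool_execution_end` come from this PR's repro steps. The `toolCallId` field and the `TurnToolState` class are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TurnToolState:
    """Hypothetical per-turn record of tool lifecycle events; create a fresh one per turn."""

    started: set = field(default_factory=set)
    finished: set = field(default_factory=set)

    def on_event(self, event: dict) -> None:
        # Only the two lifecycle events matter here; other event types are ignored.
        if event.get("type") == "tool_execution_start":
            self.started.add(event.get("toolCallId"))
        elif event.get("type") == "tool_execution_end":
            self.finished.add(event.get("toolCallId"))

    @property
    def has_tool_activity(self) -> bool:
        # Any start or end event counts as real tool activity for this turn.
        return bool(self.started or self.finished)

    @property
    def pending(self) -> set:
        # Tool calls that started but have not reported an end event yet.
        return self.started - self.finished
```

A reply guard could consult `has_tool_activity` before any text reaches a chat surface such as Telegram.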

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file:
- src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts
- src/agents/pi-embedded-utils.test.ts
Scenario the test should lock in: block a no-tool assistant turn that says it will run/install/check something and includes a runnable fenced shell block, while preserving similar text once real tool activity exists.
Why this is the smallest reliable guardrail: the bug lives in the embedded assistant reply seam that decides what reaches user-facing chat delivery, so a seam-level test exercises the exact emission path without requiring a full Telegram e2e harness.
Existing test that already covers this (if any): none.
If no new test is added, why not: N/A.

User-visible / Behavior Changes

Assistant turns that claim they are running a shell command and include a runnable fenced shell block without any real tool activity now return a truthful fallback instead of fake execution-looking text.
Legitimate tool-backed exec/approval flows continue to render normally.

Diagram (if applicable)

Before:
[user asks to run something] -> [assistant emits "I'll run/check this" + fenced shell block] -> [reply delivered with no tool activity]

After:
[user asks to run something] -> [assistant emits execution-intent shell text without tool activity] -> [guard rewrites reply] -> [truthful fallback delivered]

Security Impact (required)

New permissions/capabilities? (No)
Secrets/tokens handling changed? (No)
New/changed network calls? (No)
Command/tool execution surface changed? (No)
Data access scope changed? (No)
If any Yes, explain risk + mitigation:

Repro + Verification

Environment

OS: macOS 15 / Darwin 25
Runtime/container: local Node 22 + pnpm workspace
Model/provider: simulated embedded assistant session
Integration/channel (if any): assistant reply pipeline with Telegram-style shell-fence rendering behavior reproduced locally
Relevant config (redacted): default local test config

Steps

Simulate an assistant turn with text like "I will install/check this now." followed by a runnable fenced bash block.
Do not emit any tool_execution_start or tool_execution_end events for that turn.
Observe the emitted assistant reply payload.
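A small replay harness can mirror the steps above: feed one simulated turn's events in order, then inspect what would be emitted. This is a hypothetical sketch. Only `tool_execution_start` and `tool_execution_end` come from the steps; the `text_delta` and `message_end` event names, the event dictionaries, and `replay_turn` are assumptions. The harness buffers text until `message_end` because the intent phrase and the shell fence can arrive in different chunks.

```python
from typing import Callable, Dict, List


def replay_turn(
    events: List[Dict[str, str]],
    is_exec_intent: Callable[[str], bool],
    fallback: str,
) -> List[str]:
    """Replay one simulated turn and return the reply payloads it would emit."""
    emitted: List[str] = []
    text_parts: List[str] = []
    had_tool_activity = False
    for event in events:
        kind = event["type"]
        if kind in ("tool_execution_start", "tool_execution_end"):
            had_tool_activity = True
        elif kind == "text_delta":
            # Buffer deltas: a per-chunk check could miss a phrase split across chunks.
            text_parts.append(event["text"])
        elif kind == "message_end":
            text = "".join(text_parts)
            text_parts.clear()
            if is_exec_intent(text) and not had_tool_activity:
                text = fallback
            emitted.append(text)
    return emitted
```

When the event list contains no tool events, the harness emits the fallback. Prepending a `tool_execution_start` event lets the original text through.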

Expected

No execution-looking reply should be delivered as if work started successfully without a real tool event.

Actual

Before this fix, the reply pipeline emitted the execution-intent text and fenced shell block as user-visible output even though no tool activity existed.

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

Verified scenarios:
- Reproduced the bad no-tool assistant turn locally and confirmed the current code emitted the fake execution-looking reply before the fix.
- Verified the new fallback path with focused tests and local reproduction against the patched reply pipeline.
- Verified a similar reply still passes when the turn has real tool activity.
Edge cases checked:
- tool-backed replies are not blocked
- fenced shell examples without first-person execution-intent wording are not matched by the detector
- both streaming/message-end and final assistant text paths are covered by the guard
What you did not verify:
- a live Telegram roundtrip in a real chat session
- the full pnpm test suite did not complete cleanly in this branch because it failed in untouched ACP coverage at src/acp/control-plane/manager.test.ts (tracks parented direct ACP turns in the task registry)

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

Backward compatible? (Yes)
Config/env changes? (No)
Migration needed? (No)
If yes, exact upgrade steps:

Risks and Mitigations

Risk: the guard could over-match benign instructional shell examples.
- Mitigation: detection is intentionally narrow and requires both first-person execution-intent wording and a runnable fenced shell block, and tool-backed turns bypass it.

AI-assisted: yes.

Made with Cursor

Changed files

src/agents/pi-embedded-subscribe.handlers.messages.ts (modified, +36/-5)
src/agents/pi-embedded-subscribe.handlers.tools.test.ts (modified, +4/-0)
src/agents/pi-embedded-subscribe.handlers.tools.ts (modified, +7/-0)
src/agents/pi-embedded-subscribe.handlers.types.ts (modified, +11/-0)
src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts (modified, +182/-0)
src/agents/pi-embedded-subscribe.ts (modified, +64/-1)
src/agents/pi-embedded-utils.test.ts (modified, +30/-0)
src/agents/pi-embedded-utils.ts (modified, +26/-0)
src/agents/pi-tool-handler-state.test-helpers.ts (modified, +4/-0)

Issue #60955: full text

Summary

This looks related to #40631 and overlaps with the broader “action illusion” concern in #47213, but this report adds a Telegram-specific reproduction with concrete session evidence.

Environment

OpenClaw: 2026.4.2 (d74a122)
Channel: Telegram direct chat
Session key: agent:main:telegram:direct:6840724503
Session id: 51bab8c2-9264-41e2-8cae-ff04fee105fe
Model: kimi/k2p5
Provider: kimi-coding
Host timezone for the observed session: Asia/Shanghai
Observed on: 2026-04-05 (session messages were recorded on 2026-04-04 UTC / 2026-04-05 Beijing time)

Reproduction

Real session log: /Users/openclaw/.openclaw/agents/main/sessions/51bab8c2-9264-41e2-8cae-ff04fee105fe.jsonl

Timeline from the JSONL

The user asked the Telegram-session agent to install a CLI via: npx -y @tencent-weixin/openclaw-weixin-cli@latest install
The assistant replied with natural-language confirmation plus a fenced bash block containing: exec npx -y @tencent-weixin/openclaw-weixin-cli@latest install
Later, when asked whether installation had completed, the assistant again replied with a “let me check” style message and another fenced bash block containing a command.
The assistant even acknowledged that it had previously only printed the command rather than actually running it.
Across the entire session segment, there were still:
- no assistant_toolcalls
- no toolCall
- no toolresults
- no toolResult
- no execution artifact indicating that exec was actually invoked

Concrete evidence

From the session log:

Message id a7303949: assistant says it will install and prints an exec ... install block
Message id 7a10c165: assistant says it had only printed the command before, then prints another exec ... block to “really check”

Local inspection of the session file showed:

assistant_toolcalls = []
toolresults = []

So this was not a hidden successful execution with missing user feedback. It was a zero-action turn sequence that looked like execution from the user side.
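The local inspection can be scripted. This sketch assumes only that the session file is JSON Lines. The marker names come from this report, but their position in each record is unknown, so the script searches every key and string value recursively. `scan_lines` and `count_tool_markers` are hypothetical helpers.

```python
import json
from collections import Counter
from typing import Any, Iterable

# Marker names quoted in this report; their position in the schema is an assumption.
TOOL_MARKERS = {"assistant_toolcalls", "toolCall", "toolresults", "toolResult"}


def count_tool_markers(node: Any, counts: Counter) -> None:
    """Recursively count non-empty marker keys and marker string values."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in TOOL_MARKERS and value:  # an empty list such as [] is not activity
                counts[key] += 1
            count_tool_markers(value, counts)
    elif isinstance(node, list):
        for item in node:
            count_tool_markers(item, counts)
    elif isinstance(node, str) and node in TOOL_MARKERS:
        counts[node] += 1  # e.g. {"type": "toolCall"}


def scan_lines(lines: Iterable[str]) -> Counter:
    """Parse JSONL lines and tally tool markers across all records."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():  # skip blank lines
            count_tool_markers(json.loads(line), counts)
    return counts
```

Iterating an open session file yields its lines, so you can pass the file handle directly to `scan_lines`. An empty `Counter` means no tool markers were found, which matches the `[]` results quoted above.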

Expected behavior

One of the following should happen:

If the model intends to run a command, the runtime must produce a real tool call and corresponding tool lifecycle.
If execution cannot start, the assistant should fail fast with a blocked/error reply instead of printing an exec-looking code block.
A user-visible reply that claims to be “doing” or “checking” should not be accepted as a successful operational turn unless an action event actually exists.

Actual behavior

Telegram chat receives a confident operational reply.
The reply contains an exec-looking command block.
The runtime produces no actual tool call.
The session later remains effectively idle while the user believes work may be in progress.

Why this is especially problematic

This is worse than a vague acknowledgement because the message format strongly implies that execution has already been delegated to the tool layer.

In practice it creates:

false execution confidence
hidden idle time
extra user polling
session trust loss
ambiguity about whether the problem is transport, model, approval, or runtime orchestration

Suggested fix direction

Add an output-validation gate for execution-intent replies in chat channels, especially Telegram direct/main sessions.
If assistant text contains execution-intent patterns like “I’ll install/check/run” or fenced exec ... blocks, require one of:
- an actual tool call in the same turn, or
- a structured blocked/error state explaining why execution could not start
Do not deliver exec-looking placeholder text to the user if there is no backing tool event.
Emit explicit telemetry for “execution-intent text without tool call” so this can be detected automatically.
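The suggested gate could return a structured result instead of raw text. A blocked turn would then carry its reason and be counted. This sketch rests on several assumptions: detection happens elsewhere and is passed in as `is_exec_intent`, the `GateResult` shape and status strings are hypothetical, and a module-level `Counter` stands in for a real telemetry client.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical telemetry sink; a real deployment would use its own metrics client.
TELEMETRY: Counter = Counter()


@dataclass(frozen=True)
class GateResult:
    status: str                   # "deliver" or "blocked"
    text: str                     # what the chat channel should receive
    reason: Optional[str] = None  # why execution could not start, if blocked


def gate_reply(text: str, is_exec_intent: bool, tool_call_count: int) -> GateResult:
    """Deliver execution-intent text only when the same turn has a real tool call."""
    if is_exec_intent and tool_call_count == 0:
        TELEMETRY["execution_intent_without_tool_call"] += 1
        return GateResult(
            status="blocked",
            text="Execution did not start: no tool call was made for this request.",
            reason="no_tool_call",
        )
    return GateResult(status="deliver", text=text)
```

Returning a status alongside the text lets a channel such as Telegram render a blocked state explicitly, instead of inferring it from the message wording.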

Notes

I can provide sanitized excerpts from the JSONL if needed, but the key point is straightforward: in this Telegram direct session, the runtime accepted exec-looking assistant text as a completed reply even though no real tool execution occurred.

Extended analysis

TL;DR

Implement an output-validation gate for execution-intent replies in Telegram direct/main sessions to ensure actual tool calls or structured error states.

Guidance

Verify the session log for assistant_toolcalls and toolresults to confirm the absence of real tool execution.
Check the assistant's reply for execution-intent patterns like "I'll install/check/run" or fenced exec ... blocks.
Require an actual tool call or a structured blocked/error state explaining why execution could not start for execution-intent replies.
Emit explicit telemetry for "execution-intent text without tool call" to detect this issue automatically.

Example

A potential code snippet to validate execution-intent replies could involve checking for specific patterns in the assistant's text and verifying the presence of tool calls:

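import logging
import re

# Illustrative patterns (assumed, not exhaustive): first-person intent, or a fenced block starting with exec.
INTENT_RE = re.compile(r"\b(?:I['’]ll|I will|let me)\s+(?:install|check|run)\b", re.IGNORECASE)
EXEC_FENCE_RE = re.compile(r"```(?:bash|sh|shell)?[ \t]*\n\s*exec\b")

def validate_execution_intent(reply_text, tool_calls):
    """Return False when the reply signals execution intent but the turn has no tool call."""
    if (INTENT_RE.search(reply_text) or EXEC_FENCE_RE.search(reply_text)) and not tool_calls:
        # Emit telemetry for "execution-intent text without tool call".
        logging.warning("execution-intent text without tool call")
        return False  # caller should deliver a blocked/error reply instead
    return True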

Notes

The provided guidance and example are based on the information given in the issue and may require adjustments based on the actual implementation details of the OpenClaw system.

Recommendation

Apply a workaround by implementing the suggested output-validation gate for execution-intent replies in Telegram direct/main sessions to prevent false execution confidence and hidden idle time. This will help ensure that the assistant's replies accurately reflect the actual execution of tools.
