openclaw - ✅(Solved) Fix Telegram direct session can emit exec-looking replies with zero tool calls [1 pull request, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#60955 (fetched 2026-04-08 02:45:11)

  • Comments: 1
  • Participants: 2
  • Timeline: 3
  • Reactions: 0
  • Timeline (top): cross-referenced ×2, commented ×1

In a Telegram direct/main session, OpenClaw can reply with exec-looking text such as “I will install/check this now” and a fenced bash block, but the turn contains zero real tool calls and zero tool results.

This is not just a generic “assistant acknowledged but did nothing” case. In this failure mode, the chat surface makes the reply look operationally real, but the session log shows that nothing actually executed.

This looks related to #40631 and overlaps with the broader “action illusion” concern in #47213, but this report adds a Telegram-specific reproduction with concrete session evidence.

PR fix notes

PR #61132: fix(agents): block unbacked exec-looking replies

Description (problem / solution / changelog)

Summary

  • Problem: assistant turns could emit first-person execution-intent text plus runnable fenced shell blocks even when no real tool call or tool result existed.
  • Why it matters: Telegram/direct replies could look operationally real, which creates false execution confidence and hidden idle time.
  • What changed: added a narrow guard in the embedded assistant reply pipeline that rewrites strong unbacked execution-intent shell replies to a truthful fallback unless the turn already has real tool activity, and added regression coverage for blocked and tool-backed cases.
  • What did NOT change (scope boundary): this does not change legitimate tool-backed exec/approval replies, Telegram formatting, or general command/example rendering.
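The guard described in the summary can be sketched roughly as follows. This is a minimal illustration in Python, not the code from PR #61132 (which lives in TypeScript files under src/agents/). The regular expressions, the `TurnState` class, the function names, and the fallback wording are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # built programmatically so this sketch contains no literal fence

# Hypothetical patterns; the real detector in the PR is not shown in the source.
FIRST_PERSON_INTENT = re.compile(
    r"\b(?:I will|I'll|I’ll|let me)\s+(?:\w+\s+){0,2}?(?:run|install|check|execute)\b",
    re.IGNORECASE,
)
SHELL_FENCE = re.compile(
    FENCE + r"(?:bash|sh|shell|zsh)[ \t]*\n(.+?)" + FENCE, re.DOTALL
)

# Hypothetical fallback wording.
FALLBACK = "I have not run anything yet: no tool call was started for this turn."


@dataclass
class TurnState:
    """Tracks whether the current turn saw any real tool lifecycle event."""
    has_tool_activity: bool = False


def looks_like_unbacked_exec(text: str) -> bool:
    """True only when first-person intent wording AND a non-empty shell fence are present."""
    fence = SHELL_FENCE.search(text)
    has_intent = FIRST_PERSON_INTENT.search(text) is not None
    return has_intent and fence is not None and fence.group(1).strip() != ""


def guard_reply(text: str, turn: TurnState) -> str:
    """Return the text unchanged for tool-backed turns, else rewrite fake exec replies."""
    if turn.has_tool_activity:
        return text  # legitimate tool-backed exec/approval replies pass through
    if looks_like_unbacked_exec(text):
        return FALLBACK
    return text  # plain instructional examples are left alone
```

Requiring both signals keeps the match narrow: a shell example introduced with "You can install it with:" passes through, while "I will install/check this now." plus a bash fence is rewritten unless the turn already has tool activity.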

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #60955
  • Related #40631
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: the assistant visible-text path stripped downgraded/internal tool markers, but it still treated ordinary first-person shell fences as safe user-facing text even when the turn had no tool lifecycle at all.
  • Missing detection / guardrail: there was no guard tying execution-intent shell text to actual tool activity before the reply was emitted to chat surfaces.
  • Contributing context (if known): Telegram renders fenced shell blocks as code blocks, which made the no-tool reply look especially operationally real.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts
    • src/agents/pi-embedded-utils.test.ts
  • Scenario the test should lock in: block a no-tool assistant turn that says it will run/install/check something and includes a runnable fenced shell block, while preserving similar text once real tool activity exists.
  • Why this is the smallest reliable guardrail: the bug lives in the embedded assistant reply seam that decides what reaches user-facing chat delivery, so a seam-level test exercises the exact emission path without requiring a full Telegram e2e harness.
  • Existing test that already covers this (if any): none.
  • If no new test is added, why not: N/A.

User-visible / Behavior Changes

  • Assistant turns that claim they are running a shell command and include a runnable fenced shell block without any real tool activity now return a truthful fallback instead of fake execution-looking text.
  • Legitimate tool-backed exec/approval flows continue to render normally.

Diagram (if applicable)

Before:
[user asks to run something] -> [assistant emits "I'll run/check this" + fenced shell block] -> [reply delivered with no tool activity]

After:
[user asks to run something] -> [assistant emits execution-intent shell text without tool activity] -> [guard rewrites reply] -> [truthful fallback delivered]

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS 15 / Darwin 25
  • Runtime/container: local Node 22 + pnpm workspace
  • Model/provider: simulated embedded assistant session
  • Integration/channel (if any): assistant reply pipeline with Telegram-style shell-fence rendering behavior reproduced locally
  • Relevant config (redacted): default local test config

Steps

  1. Simulate an assistant turn with text like “I will install/check this now.” followed by a runnable fenced bash block.
  2. Do not emit any tool_execution_start or tool_execution_end events for that turn.
  3. Observe the emitted assistant reply payload.
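The reproduction steps above can be approximated with a small in-memory simulation. This is a sketch only: the event names tool_execution_start and tool_execution_end come from the steps, while the `message_end` event name, the dict-shaped events, the `SimulatedTurn` class, and the `example-cli` package are hypothetical stand-ins.

```python
from dataclasses import dataclass, field

FENCE = "`" * 3  # avoids a literal fence inside this example

# Event names from the reproduction steps; the event shape is an assumption.
TOOL_EVENTS = {"tool_execution_start", "tool_execution_end"}


@dataclass
class SimulatedTurn:
    """Collects the events of one assistant turn."""
    events: list = field(default_factory=list)

    def emit(self, event_type: str, **payload) -> None:
        self.events.append({"type": event_type, **payload})

    def has_tool_activity(self) -> bool:
        return any(e["type"] in TOOL_EVENTS for e in self.events)

    def assistant_text(self) -> str:
        # "message_end" is a hypothetical name for the final-text event.
        return "".join(
            e.get("text", "") for e in self.events if e["type"] == "message_end"
        )


def build_no_tool_turn() -> SimulatedTurn:
    """Step 1 and 2: exec-looking text, and deliberately no tool events."""
    turn = SimulatedTurn()
    turn.emit(
        "message_end",
        text="I will install/check this now.\n"
        + FENCE + "bash\nnpx -y example-cli install\n" + FENCE,
    )
    return turn
```

For step 3, a turn built this way reports no tool activity even though its text contains a runnable-looking command, which is the condition the fix is meant to catch.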

Expected

  • No execution-looking reply should be delivered as if work started successfully without a real tool event.

Actual

  • Before this fix, the reply pipeline emitted the execution-intent text and fenced shell block as user-visible output even though no tool activity existed.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios:
    • Reproduced the bad no-tool assistant turn locally and confirmed the current code emitted the fake execution-looking reply before the fix.
    • Verified the new fallback path with focused tests and local reproduction against the patched reply pipeline.
    • Verified a similar reply still passes when the turn has real tool activity.
  • Edge cases checked:
    • tool-backed replies are not blocked
    • fenced shell examples without first-person execution-intent wording are not matched by the detector
    • both streaming/message-end and final assistant text paths are covered by the guard
  • What you did not verify:
    • a live Telegram roundtrip in a real chat session
    • a clean run of the full pnpm test suite: in this branch it failed in untouched ACP coverage at src/acp/control-plane/manager.test.ts (test: tracks parented direct ACP turns in the task registry)

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Risks and Mitigations

  • Risk: the guard could over-match benign instructional shell examples.
    • Mitigation: detection is intentionally narrow and requires both first-person execution-intent wording and a runnable fenced shell block, and tool-backed turns bypass it.

AI-assisted: yes.

Made with Cursor

Changed files

  • src/agents/pi-embedded-subscribe.handlers.messages.ts (modified, +36/-5)
  • src/agents/pi-embedded-subscribe.handlers.tools.test.ts (modified, +4/-0)
  • src/agents/pi-embedded-subscribe.handlers.tools.ts (modified, +7/-0)
  • src/agents/pi-embedded-subscribe.handlers.types.ts (modified, +11/-0)
  • src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts (modified, +182/-0)
  • src/agents/pi-embedded-subscribe.ts (modified, +64/-1)
  • src/agents/pi-embedded-utils.test.ts (modified, +30/-0)
  • src/agents/pi-embedded-utils.ts (modified, +26/-0)
  • src/agents/pi-tool-handler-state.test-helpers.ts (modified, +4/-0)
Original issue report

Environment

  • OpenClaw: 2026.4.2 (d74a122)
  • Channel: Telegram direct chat
  • Session key: agent:main:telegram:direct:6840724503
  • Session id: 51bab8c2-9264-41e2-8cae-ff04fee105fe
  • Model: kimi/k2p5
  • Provider: kimi-coding
  • Host timezone for the observed session: Asia/Shanghai
  • Observed on: 2026-04-05 (session messages were recorded on 2026-04-04 UTC / 2026-04-05 Beijing time)

Reproduction

Real session log: /Users/openclaw/.openclaw/agents/main/sessions/51bab8c2-9264-41e2-8cae-ff04fee105fe.jsonl

Timeline from the JSONL

  1. The user asked the Telegram-session agent to install a CLI via: npx -y @tencent-weixin/openclaw-weixin-cli@latest install
  2. The assistant replied with natural-language confirmation plus a fenced bash block containing: exec npx -y @tencent-weixin/openclaw-weixin-cli@latest install
  3. Later, when asked whether installation had completed, the assistant again replied with a “let me check” style message and another fenced bash block containing a command.
  4. The assistant even acknowledged that it had previously only printed the command rather than actually running it.
  5. Across the entire session segment, there were still:
    • no assistant_toolcalls
    • no toolCall
    • no toolresults
    • no toolResult
    • no execution artifact indicating that exec was actually invoked

Concrete evidence

From the session log:

  • Message id a7303949: assistant says it will install and prints an exec ... install block
  • Message id 7a10c165: assistant says it had only printed the command before, then prints another exec ... block to “really check”

Local inspection of the session file showed:

  • assistant_toolcalls = []
  • toolresults = []

So this was not a hidden successful execution with missing user feedback. It was a zero-action turn sequence that looked like execution from the user side.
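A local inspection like the one described above could be scripted along these lines. This is a hedged sketch: the key names (toolCall, toolResult, assistant_toolcalls, toolresults) are the ones mentioned in this report, but where they appear inside each JSONL record is not shown in the source, so the code searches every nesting level. The function names are hypothetical.

```python
import json
from pathlib import Path

# Key names taken from the report; their position in a record is an assumption.
TOOL_KEYS = ("toolCall", "toolResult", "assistant_toolcalls", "toolresults")


def _walk(node):
    """Yield every (key, value) pair found anywhere in a nested JSON value."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield key, value
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def count_tool_evidence(lines):
    """Count non-empty tool-related entries across JSONL lines."""
    counts = {key: 0 for key in TOOL_KEYS}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for key, value in _walk(record):
            if key in counts:
                # Empty lists such as assistant_toolcalls = [] are not evidence.
                if value not in (None, [], {}):
                    counts[key] += 1
            elif key == "type" and isinstance(value, str) and value in counts:
                # Some logs may store the kind as a value, e.g. {"type": "toolCall"}.
                counts[value] += 1
    return counts


def count_tool_evidence_in_file(path):
    """Convenience wrapper for a session .jsonl file on disk."""
    with Path(path).open(encoding="utf-8") as handle:
        return count_tool_evidence(handle)
```

If every count comes back as zero for a session in which the assistant claimed to be running commands, that matches the zero-action pattern reported here.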

Expected behavior

One of the following should happen:

  1. If the model intends to run a command, the runtime must produce a real tool call and corresponding tool lifecycle.
  2. If execution cannot start, the assistant should fail fast with a blocked/error reply instead of printing an exec-looking code block.
  3. A user-visible reply that claims to be “doing” or “checking” should not be accepted as a successful operational turn unless an action event actually exists.

Actual behavior

  • Telegram chat receives a confident operational reply.
  • The reply contains an exec-looking command block.
  • The runtime produces no actual tool call.
  • The session later remains effectively idle while the user believes work may be in progress.

Why this is especially problematic

This is worse than a vague acknowledgement because the message format strongly implies that execution has already been delegated to the tool layer.

In practice it creates:

  • false execution confidence
  • hidden idle time
  • extra user polling
  • session trust loss
  • ambiguity about whether the problem is transport, model, approval, or runtime orchestration

Suggested fix direction

  • Add an output-validation gate for execution-intent replies in chat channels, especially Telegram direct/main sessions.
  • If assistant text contains execution-intent patterns like “I’ll install/check/run” or fenced exec ... blocks, require one of:
    • an actual tool call in the same turn, or
    • a structured blocked/error state explaining why execution could not start
  • Do not deliver exec-looking placeholder text to the user if there is no backing tool event.
  • Emit explicit telemetry for “execution-intent text without tool call” so this can be detected automatically.
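The last two suggestions, a structured blocked state and explicit telemetry, might look something like the sketch below. Nothing here reflects OpenClaw's actual telemetry or reply schema: the `BlockedReply` fields, the logger name, the counter, and the function name are assumptions chosen for illustration.

```python
import logging
from collections import Counter
from dataclasses import asdict, dataclass

logger = logging.getLogger("openclaw.sketch.exec_intent")  # hypothetical logger name

# In-process counter standing in for a real metrics backend.
TELEMETRY = Counter()


@dataclass(frozen=True)
class BlockedReply:
    """Hypothetical structured state returned instead of exec-looking text."""
    status: str
    reason: str
    channel: str


def block_unbacked_exec(channel: str, reason: str) -> dict:
    """Record the event and return a structured blocked state for the channel."""
    TELEMETRY["execution_intent_without_tool_call"] += 1
    logger.warning(
        "execution-intent text without tool call",
        extra={"channel_name": channel, "block_reason": reason},
    )
    return asdict(BlockedReply(status="blocked", reason=reason, channel=channel))
```

Counting these events separately from ordinary errors is what would let the failure be detected automatically rather than through user reports.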

Notes

I can provide sanitized excerpts from the JSONL if needed, but the key point is straightforward: in this Telegram direct session, the runtime accepted exec-looking assistant text as a completed reply even though no real tool execution occurred.

extent analysis

TL;DR

Implement an output-validation gate for execution-intent replies in Telegram direct/main sessions to ensure actual tool calls or structured error states.

Guidance

  • Inspect the session log for assistant_toolcalls and toolresults entries to confirm that no real tool execution occurred.
  • Check the assistant's reply for execution-intent patterns like "I'll install/check/run" or fenced exec ... blocks.
  • Require an actual tool call or a structured blocked/error state explaining why execution could not start for execution-intent replies.
  • Emit explicit telemetry for "execution-intent text without tool call" to detect this issue automatically.

Example

A potential code snippet to validate execution-intent replies could involve checking for specific patterns in the assistant's text and verifying the presence of tool calls:

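import re

# Illustrative patterns only; OpenClaw's real detector may differ.
EXECUTION_INTENT = re.compile(
    r"\bI(?:'|’)ll (?:install|check|run)\b|^exec\s", re.IGNORECASE | re.MULTILINE
)

def validate_execution_intent(reply_text, tool_calls):
    """Return True if the reply may be delivered unchanged."""
    if EXECUTION_INTENT.search(reply_text) and not tool_calls:
        # Emit telemetry for "execution-intent text without tool call" here;
        # the caller should then return an error or blocked state.
        return False
    return True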

Notes

The provided guidance and example are based on the information given in the issue and may require adjustments based on the actual implementation details of the OpenClaw system.

Recommendation

Implement the suggested output-validation gate for execution-intent replies in Telegram direct/main sessions to prevent false execution confidence and hidden idle time. PR #61132 takes this approach with a narrow guard in the embedded assistant reply pipeline, so that the assistant's replies reflect whether tools actually ran.
