openclaw - 💡(How to fix) Fix Silent text-reply drops with visibleReplies="message_tool" — improve observability [2 comments, 3 participants]

openclaw • 2026-05-06 10:04:10

openclaw/openclaw#78405•Fetched 2026-05-07 03:37:18

This makes it look exactly like a delivery infrastructure bug (websocket dropped, rate limit, payload malformed, etc.) and sent us down a 5+ hour debug rabbit hole on what is, in retrospect, a working-as-designed config gate.

Error Message

When messages.groupChat.visibleReplies = "message_tool" is set (a sensible config for tool-driven workflows), any normal text reply that an agent emits on a guild/group chat is silently dropped at dispatch time. There is no log line, no error reaction, and no entry in delivery-queue/; only the typing indicator briefly flashes. The session ends with status: success and the agent's reply is in the session jsonl, but the user receives nothing: nothing arrives on Discord, there is no file in ~/.openclaw/delivery-queue/, and there is no error in gateway.log or gateway.err.log.

The gate itself is if (suppressDelivery) return; at dispatch-C8IbdPmU.js:~898 (the message_tool_only gate). Like the isReasoning drop sites listed under Root Cause, it is a silent early-return. There's no central dropPayload(reason, sessionKey, payload) helper that emits a structured log. Adding one, even at WARN once-per-session-per-reason, would make the entire class of "I configured the gate, why is it silent?" bugs trivially diagnosable.

Typing indicator lifecycle is decoupled from delivery success. agent-runner.runtime (~line 1516) arms typing on dispatch start, and runPreparedChannelTurnCore clears it at the end regardless of whether dispatchResult.queuedFinal is true. So the user sees "writing…" then nothing, which strongly implies "agent crashed during generation" rather than "agent succeeded but reply was suppressed". Tying the typing-cleared signal to "actually queued a final" (or surfacing the difference via the error reaction in 9e97cdb2) would change the UX from "broken" to "configured-out".

Log a single WARN when a text payload is suppressed because visibleReplies = "message_tool" and the agent didn't emit a message tool call. Concretely, in the source equivalent of dist/dispatch-C8IbdPmU.js:~898, call logOnce(sessionKey, "reply.suppressed", "warn", { ... }) before the return, with the fields shown under Code Example. logOnce(sessionKey, key, ...) would dedupe per-session-per-reason to avoid spam. The same pattern applied at the four isReasoning drop sites (with reason: "isReasoning") would also resolve the broader observability gap. Even a single console.warn without dedup would be a 100x improvement over the current silent return.
Discord error reaction for dropped finals — 9e97cdb2 fix(discord): fail dropped final reply delivery (on main, not yet released as of 2026.5.5) is exactly this. Strong +1 to ship it.

Root Cause

Drop logic is duplicated at 4+ sites with no shared "this payload was suppressed because X" channel. For isReasoning alone we found:
- src/auto-reply/reply/dispatch-from-config.ts:~1390 — if (payload.isReasoning === true) return; after markProgress() (so the turn looks alive externally).
- src/auto-reply/reply/agent-runner.ts:~806, 815 — .filter(p => !p.isReasoning).
- extensions/discord/src/monitor/message-handler.process.ts:~431 — if (payload.isReasoning) return; in the Discord deliver callback, no log.
- src/auto-reply/reply/agent-runner-execution.ts:~2221 — finalization filter (p) => !p.isError && !p.isReasoning && hasOutboundReplyContent(p).

Fix Action

Fix / Workaround

Workaround: messages.groupChat.visibleReplies = "automatic". Done — replies flow normally.

Code Example

{
  "messages": { "groupChat": { "visibleReplies": "message_tool" } },
  "channels": {
    "discord": {
      "guilds": { "<guild>": { "channels": { "<channel>": { "requireMention": false } } } }
    }
  },
  "bindings": [
    { "agentId": "maitre-de-jeu",
      "match": { "channel": "discord", "peer": { "kind": "channel", "id": "<channel>" } } }
  ],
  "agents": {
    "list": [{ "id": "maitre-de-jeu", "model": "ollama/gemma4:26b-nvfp4", ... }]
  }
}

---

// Before:
if (suppressDelivery) return;

// Suggested:
if (suppressDelivery) {
  logOnce(sessionKey, "reply.suppressed", "warn", {
    agentId,
    chatType,
    reason: "visibleReplies=message_tool but no message tool call emitted",
    payloadKind: payload.type,
    hint: 'Set messages.groupChat.visibleReplies="automatic" or instruct agent to use the `message` tool.',
  });
  return;
}

Silent text-reply drops with `messages.groupChat.visibleReplies = "message_tool"` — improve observability

Summary

The pain (please read this before triage)

I want to be transparent about what this cost so the priority of the suggested fixes is clear:

~6h of live debugging on a working day, with the user (a busy CIO with limited windows) reposting test messages every few minutes and reporting "writing… still nothing" 15+ times in a row.
2 unnecessary gateway restarts with public service-interruption notices on Discord.
2 npm upgrades chasing the wrong fix (2026.5.3-1 → 2026.5.4, then briefly investigating 2026.5.5).
Touched OpenRouter billing to bump a key limit chasing a false-positive 429 on the compaction model — that turned out to be a real but unrelated problem that piled on confusion.
Five separate hypotheses bisected and discarded before finding the actual gate (see "Why it was hard to diagnose" below). Each one looked plausible, each one cost an hour-ish of repro + log inspection.
One spawned coding agent run + one repository clone + grep through minified dist/ to finally locate dispatch-C8IbdPmU.js:898's if (suppressDelivery) return;. Without source access this would have been unfixable.

The frustrating part is that the user had configured the system correctly per the docs. The config knob visibleReplies = "message_tool" is documented, but its silent-drop semantics for text-only replies on group chats aren't called out, and the runtime gives zero hint when it fires. The fix on the user side is a one-line config change. The fix on the OpenClaw side could be one log line. That asymmetry is what hurts.

Repro

OpenClaw 2026.5.4 / 2026.5.5 (probably older too).

openclaw.json (relevant):

{
  "messages": { "groupChat": { "visibleReplies": "message_tool" } },
  "channels": {
    "discord": {
      "guilds": { "<guild>": { "channels": { "<channel>": { "requireMention": false } } } }
    }
  },
  "bindings": [
    { "agentId": "maitre-de-jeu",
      "match": { "channel": "discord", "peer": { "kind": "channel", "id": "<channel>" } } }
  ],
  "agents": {
    "list": [{ "id": "maitre-de-jeu", "model": "ollama/gemma4:26b-nvfp4", ... }]
  }
}

The MJ agent is a freeform Q&A assistant — it returns plain text content, never calls the message tool itself.

1. User posts a question on #jdr.
2. Discord typing indicator appears briefly.
3. Trajectory shows: session.started → prompt.submitted → model.completed (stopReason: stop, content: [{type: text, len: 285}]) → trace.artifacts (finalStatus: success) → session.ended (status: success).
4. Nothing arrives on Discord. No file in ~/.openclaw/delivery-queue/. No error in gateway.log or gateway.err.log.

Workaround: messages.groupChat.visibleReplies = "automatic". Done — replies flow normally.

Why it was hard to diagnose

I bisected through (in order):

1. Discord WebSocket inbound dead → upgraded 2026.5.3-1 → 2026.5.4 (fixed). Red herring (a real issue, but not the delivery one).
2. OpenRouter compaction 429 → bumped key limit (real but unrelated; only triggered for sessions above the compaction threshold).
3. isReasoning=true filter at 4 sites in src/auto-reply/ and extensions/discord/ → added thinkingDefault: "off" → didn't fix it.
4. Compared MJ vs ops (which works): both have a model string and thinking off. The diff was tools.profile, params.maxTokens, tools.fs.workspaceOnly. None of those gate delivery; red herrings.
5. Per-agent messages.groupChat.visibleReplies override → schema rejects it (Unrecognized key: "messages" in agents.list[].messages).
6. Finally found dispatch-C8IbdPmU.js:898 → if (suppressDelivery) return; and source-reply-delivery-mode-BtZkiZoZ.js:8, where chatType === "group" || chatType === "channel" flips mode to "message_tool_only" unless visibleReplies is automatic. Then suppressAutomaticSourceDelivery = sourceReplyDeliveryMode === "message_tool_only".
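
The gate found in step 6 can be sketched roughly as follows. This is a reconstruction from the minified bundle, not the actual source: the function names, the type shapes, and the "direct" chat type are assumptions made for illustration.

```typescript
// Hypothetical reconstruction of the delivery-mode gate; only the string
// literals "group", "channel", "automatic", "message_tool" and
// "message_tool_only" come from the bundle, everything else is assumed.
type ChatType = "direct" | "group" | "channel";
type VisibleReplies = "automatic" | "message_tool";
type SourceReplyDeliveryMode = "automatic" | "message_tool_only";

function resolveSourceReplyDeliveryMode(
  chatType: ChatType,
  visibleReplies: VisibleReplies,
): SourceReplyDeliveryMode {
  // Group and channel chats flip to message_tool_only unless the config says automatic.
  const isSharedChat = chatType === "group" || chatType === "channel";
  return isSharedChat && visibleReplies !== "automatic" ? "message_tool_only" : "automatic";
}

function suppressAutomaticSourceDelivery(
  chatType: ChatType,
  visibleReplies: VisibleReplies,
): boolean {
  // When this is true, a plain text completion is never delivered to the chat.
  return resolveSourceReplyDeliveryMode(chatType, visibleReplies) === "message_tool_only";
}
```

Under these assumptions, a freeform agent in a Discord channel with visibleReplies = "message_tool" always hits the suppressed branch, which matches what MJ experienced.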

The reason ops, bourse, second-brain work in the same config: their workflows are scripted/Lobster pipelines that explicitly call the message tool. MJ uses freeform LLM completion → silently dropped.

Code review observations (from grepping `dist/` and reasoning about flow)

While bisecting we read enough of the bundled code to spot a few patterns that, taken together, explain why this class of bug stays invisible. None of these are critical on their own, but the compounding effect is what produced the 6h debug. Sharing in case useful for refactor priorities:

1. Drop logic is duplicated at 4+ sites with no shared "this payload was suppressed because X" channel. For isReasoning alone we found:
- src/auto-reply/reply/dispatch-from-config.ts:~1390: if (payload.isReasoning === true) return; after markProgress() (so the turn looks alive externally).
- src/auto-reply/reply/agent-runner.ts:~806, 815: .filter(p => !p.isReasoning).
- extensions/discord/src/monitor/message-handler.process.ts:~431: if (payload.isReasoning) return; in the Discord deliver callback, no log.
- src/auto-reply/reply/agent-runner-execution.ts:~2221: finalization filter (p) => !p.isError && !p.isReasoning && hasOutboundReplyContent(p).
On top of these, dispatch-C8IbdPmU.js:~898 has if (suppressDelivery) return; (the message_tool_only gate). Each of these is a silent early-return. There's no central dropPayload(reason, sessionKey, payload) helper that emits a structured log. Adding one, even at WARN once-per-session-per-reason, would make the entire class of "I configured the gate, why is it silent?" bugs trivially diagnosable.
2. observeOnly admission appears to be dead code for Discord. src/channels/turn/types.ts declares the four admission kinds (dispatch | observeOnly | handled | drop) and the docs at docs/plugins/sdk-channel-turn.md describe observeOnly as a generic mechanism. But in extensions/discord/src/monitor/message-handler.process.ts (the resolveTurn callback at ~line 623), nothing ever returns admission: observeOnly. We initially thought MJ was being silently moved to observeOnly, which would have been a great signal if that path actually existed. If it's WhatsApp-broadcast-only by design, the doc should say so explicitly to save future debuggers.
3. messages.statusReactions.enabled and messages.ackReaction interact non-obviously. dist/prepare-BS4Cqtja.js:~1019-1034 shows that when both are enabled, statusReactionsWillHandle = true skips the standalone ack write, and the ackReaction value is reused as initialEmoji for the queued slot. We figured this out by reading dist/channel-feedback-B3QykyPj.js:5-16 (the DEFAULT_EMOJIS table) plus pipeline.runtime-BhQgUds4.js:636. The interaction isn't documented; users who set both expect either two reactions or independent control. Worth a doc table or a runtime log line "ackReaction is being delegated to statusReactions.queued".
4. Schema strictness on agents.list[].messages. Top-level messages.groupChat.visibleReplies exists but the per-agent override is rejected (Unrecognized key: "messages"). This forced us to flip the global default to automatic for a single freeform agent, when keeping message_tool global and exempting MJ would have been cleaner. Allowing a small subset of messages.* overrides per-agent (at minimum groupChat.visibleReplies, ackReaction, statusReactions) seems both useful and low-risk given the rest of the config is already deeply per-agent customizable.
5. Typing indicator lifecycle is decoupled from delivery success. agent-runner.runtime (~line 1516) arms typing on dispatch start, and runPreparedChannelTurnCore clears it at the end regardless of whether dispatchResult.queuedFinal is true. So the user sees "writing…" then nothing, which strongly implies "agent crashed during generation" rather than "agent succeeded but reply was suppressed". Tying the typing-cleared signal to "actually queued a final" (or surfacing the difference via the error reaction in 9e97cdb2) would change the UX from "broken" to "configured-out".
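
To make the first observation concrete, here is a minimal sketch of what a shared drop helper might look like. Nothing like it exists in the code we read: createDropPayload, the DropRecord shape, the reason strings, and the injected warn sink are all hypothetical, and only the dropPayload(reason, sessionKey, payload) call shape comes from the observation above.

```typescript
// Hypothetical central drop helper; every name and type here is an assumption.
type DropReason = "isReasoning" | "message_tool_only" | "isError";

interface DroppedPayload {
  type: string; // e.g. "text"
}

interface DropRecord {
  event: "reply.dropped";
  reason: DropReason;
  sessionKey: string;
  payloadKind: string;
}

// The sink is injected so the helper can feed any logger.
type WarnSink = (record: DropRecord) => void;

function createDropPayload(warn: WarnSink) {
  // Remembers which (session, reason) pairs have already been reported.
  const reported = new Set<string>();

  return function dropPayload(
    reason: DropReason,
    sessionKey: string,
    payload: DroppedPayload,
  ): void {
    const dedupeKey = `${sessionKey}|${reason}`;
    if (reported.has(dedupeKey)) return; // once per session per reason
    reported.add(dedupeKey);
    warn({ event: "reply.dropped", reason, sessionKey, payloadKind: payload.type });
  };
}
```

Each silent `return` or `.filter(...)` site would then call dropPayload(...) before discarding the payload. The Set in this sketch grows without bound, so a real version would need to evict entries when a session ends.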

None of these are bugs in isolation — but together they form a "many silent gates, no single observability surface" pattern. A single log channel + a doctor heuristic for the most common signature (status=success / 0 visible final) would close 80% of the gap.
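
The "status=success / 0 visible final" signature is simple enough to check mechanically. The sketch below assumes a hypothetical SessionSummary record; the field names are invented for illustration and do not reflect OpenClaw's actual trajectory schema or any existing doctor check.

```typescript
// Hypothetical per-session summary; field names are assumptions.
interface SessionSummary {
  agentId: string;
  status: "success" | "error";
  textContentBlocks: number;    // text blocks in the final model completion
  visibleRepliesQueued: number; // finals actually handed to the channel
}

// Returns agent ids with at least `minHits` sessions that succeeded,
// produced text, and queued no visible reply.
function findSilentlySuppressedAgents(
  sessions: SessionSummary[],
  minHits: number = 1,
): string[] {
  const hits = new Map<string, number>();
  for (const s of sessions) {
    const suspicious =
      s.status === "success" && s.textContentBlocks > 0 && s.visibleRepliesQueued === 0;
    if (suspicious) hits.set(s.agentId, (hits.get(s.agentId) ?? 0) + 1);
  }
  return Array.from(hits.entries())
    .filter(([, count]) => count >= minHits)
    .map(([agentId]) => agentId)
    .sort();
}
```

A threshold above 1 would reduce noise from agents that legitimately answer only through tool calls in some turns.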

Suggested improvements (any of these would have saved the time)

1. Log a single WARN when a text payload is suppressed because visibleReplies = "message_tool" and the agent didn't emit a message tool call. Concretely, in the source equivalent of dist/dispatch-C8IbdPmU.js:~898:
```
// Before:
if (suppressDelivery) return;

// Suggested:
if (suppressDelivery) {
  logOnce(sessionKey, "reply.suppressed", "warn", {
    agentId,
    chatType,
    reason: "visibleReplies=message_tool but no message tool call emitted",
    payloadKind: payload.type,
    hint: 'Set messages.groupChat.visibleReplies="automatic" or instruct agent to use the `message` tool.',
  });
  return;
}
```
logOnce(sessionKey, key, ...) would dedupe per-session-per-reason to avoid spam. The same pattern applied at the four isReasoning drop sites (with reason: "isReasoning") would also resolve the broader observability gap. Even a single console.warn without dedup would be a 100x improvement over the current silent return.
2. Discord error reaction for dropped finals: 9e97cdb2 fix(discord): fail dropped final reply delivery (on main, not yet released as of 2026.5.5) is exactly this. Strong +1 to ship it.
3. Doc: /docs/gateway/config-agents.md mentions visibleReplies but doesn't say "in message_tool_only mode, text-only completions are dropped". Add an explicit warning box, e.g.:

⚠️ When visibleReplies = "message_tool" (and chat type is group/channel), agents that don't call the message tool will have their text replies silently suppressed. Use "automatic" if you mix scripted (tool-call) and freeform (text-content) agents on the same gateway.
4. openclaw doctor heuristic: flag agents whose recent sessions ended with status=success, contained text content, but had 0 visible replies queued. That's the smoking gun signature.
5. Per-agent messages.groupChat.visibleReplies override: currently agents.list[].messages is rejected by the schema. Allowing per-agent override would let users keep message_tool global (correct for ops/bourse/etc.) while exempting the freeform agents. Cleaner than flipping the global default.
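
For the per-agent override, the resolution rule could be as small as "agent value wins, else global value". The sketch below is an assumption about how that might look: the per-agent messages key is exactly what the schema rejects today, and the interfaces are trimmed-down stand-ins for the real config types.

```typescript
// Hypothetical config shapes; agents.list[].messages is NOT accepted by the
// current schema, it is shown here only to illustrate the proposal.
type VisibleReplies = "automatic" | "message_tool";

interface MessagesConfig {
  groupChat?: { visibleReplies?: VisibleReplies };
}

interface AgentConfig {
  id: string;
  messages?: MessagesConfig; // proposed per-agent override
}

interface GatewayConfig {
  messages?: MessagesConfig;
  agents: { list: AgentConfig[] };
}

// Agent-level value wins; otherwise fall back to the global one.
// Returns undefined when neither is set, leaving the built-in default to the caller.
function resolveVisibleReplies(
  config: GatewayConfig,
  agentId: string,
): VisibleReplies | undefined {
  const agent = config.agents.list.find((a) => a.id === agentId);
  return (
    agent?.messages?.groupChat?.visibleReplies ??
    config.messages?.groupChat?.visibleReplies
  );
}
```

With this rule, message_tool could stay the global setting for ops and bourse while maitre-de-jeu alone opts into automatic.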

Environment

OpenClaw 2026.5.4 → 2026.5.5
Node 25.9
macOS 26.3.1 (arm64), Mac mini M4 Pro
Ollama 0.23.0 (gemma4:26b-nvfp4)
Discord channel guild + bot, requireMention: false
12 agents total, only the freeform Q&A ones (MJ on Discord, career on WhatsApp earlier today) hit this; cron / Lobster agents were unaffected.

Happy to provide trajectory dumps or further repro details. Thanks for the project!
