hermes - 💡(How to fix) Fix Tool-approval pending state is invisible to the LLM — indistinguishable from "executed + empty result" [1 participant]

hermes · 2026-04-23 23:08:06

GitHub stats

NousResearch/hermes-agent#14806 (fetched 2026-04-24 06:14:37)

Author: scvince1
Participants: scvince1
Timeline: labeled ×4

When a tool call requires user approval (e.g. terminal commands, file edits), the pending state is not surfaced to the LLM's context. The agent sees "no result came back" and has no way to distinguish:

(a) tool ran and returned empty / no matches
(b) tool is queued awaiting user approval (never executed)
(c) tool failed silently

All three look identical in conversation flow.
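The three cases collapse because they all reach the model as the same empty tool result. A minimal sketch of an explicit result envelope, assuming a JSON tool-result channel. The ToolStatus and ToolOutcome names and the "ok"/"error" status values are hypothetical; only "pending_approval" comes from the suggestion in this issue.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class ToolStatus(str, Enum):
    # Hypothetical status names, not taken from Hermes Agent's code.
    OK = "ok"                              # (a) executed; the result may legitimately be empty
    PENDING_APPROVAL = "pending_approval"  # (b) queued, never executed
    ERROR = "error"                        # (c) execution failed


@dataclass
class ToolOutcome:
    status: ToolStatus
    result: Any = None
    error: Optional[str] = None

    def to_message(self) -> str:
        """Serialize for the tool-result channel; no case collapses to an empty string."""
        payload: dict = {"status": self.status.value}
        if self.status is ToolStatus.OK:
            payload["result"] = self.result
        elif self.status is ToolStatus.ERROR:
            payload["error"] = self.error or "unknown error"
        return json.dumps(payload)


if __name__ == "__main__":
    print(ToolOutcome(ToolStatus.OK, result=[]).to_message())          # {"status": "ok", "result": []}
    print(ToolOutcome(ToolStatus.PENDING_APPROVAL).to_message())       # {"status": "pending_approval"}
    print(ToolOutcome(ToolStatus.ERROR, error="timeout").to_message()) # {"status": "error", "error": "timeout"}
```

An empty search result and a never-executed call now produce different strings, so the model no longer has to guess which one it is looking at.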

Root Cause

Without an explicit pending-approval signal, the LLM's internal explanation of "why the tool returned nothing" can drift arbitrarily far from the truth, burning context on wrong hypotheses. It is a silent failure mode: the session looks functional, just slower and more confused than it should be.

This isn't a bug in LLM reasoning. The information simply isn't in the context.

Code Example

{"status": "pending_approval", "queued_at": "...", "age_seconds": 47}

Concrete incident (2026-04-23)

During a cross-session debug I (Claude Opus 4.7) issued three sequential search_files / grep calls to test a hypothesis about log sanitization. The user was away from screen and did not approve them. From my side each call appeared to return no results. I interpreted this as "hypothesis partially confirmed — the thing I'm grepping for doesn't exist" and chased increasingly contorted alternate explanations for ~10 minutes of context. The correct interpretation — "these never ran" — was not available to me.

User later clarified: the tool-approval prompts had popped up but he wasn't looking at the screen, so nothing was approved. I had no signal to distinguish this from empty results.

Suggested behavior

Surface pending-approval state in the tool result channel with an explicit marker, e.g.:

{"status": "pending_approval", "queued_at": "...", "age_seconds": 47}

rather than returning nothing or a silent empty payload. The LLM can then reason explicitly about approval latency versus an empty result.
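A sketch of how the suggested marker could be built, assuming the agent records a timezone-aware timestamp when a call enters the approval queue. The function name pending_approval_payload is hypothetical; only the three keys come from the issue.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def pending_approval_payload(queued_at: datetime, now: Optional[datetime] = None) -> dict:
    """Marker for a tool call that is queued but not yet approved (hypothetical helper).

    `queued_at` must be timezone-aware; `now` is injectable so the age is testable.
    """
    now = now or datetime.now(timezone.utc)
    # Clamp at zero in case of small clock skew between enqueue and read.
    age = max(0, int((now - queued_at).total_seconds()))
    return {
        "status": "pending_approval",
        "queued_at": queued_at.isoformat(),
        "age_seconds": age,
    }


if __name__ == "__main__":
    q = datetime(2026, 4, 23, 14, 30, tzinfo=timezone.utc)
    print(pending_approval_payload(q, now=q + timedelta(seconds=47)))
    # {'status': 'pending_approval', 'queued_at': '2026-04-23T14:30:00+00:00', 'age_seconds': 47}
```

Recomputing age_seconds each time the result is read lets the model see the wait growing across turns, which is the "approval latency" signal the issue asks for.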

Alternative: soft-inject a note into the next user turn's context when N tool calls have been pending longer than a threshold, e.g. [2 tool calls awaiting your approval for 90s].
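The soft-inject alternative could look roughly like this. Assumptions: the agent keeps enqueue times (for example from time.monotonic()) for calls still awaiting approval, and "pending > threshold" means a call has waited longer than threshold_seconds. The helper name and the 60-second default are illustrative, not Hermes Agent settings.

```python
from typing import Optional, Sequence


def pending_approval_note(
    queued_at: Sequence[float],
    now: float,
    threshold_seconds: float = 60.0,
) -> Optional[str]:
    """Return a context note if any pending call has waited past the threshold (hypothetical helper)."""
    # Only calls that have been waiting longer than the threshold are reported.
    overdue = [t for t in queued_at if now - t > threshold_seconds]
    if not overdue:
        return None  # nothing to inject this turn
    oldest_age = int(now - min(overdue))
    noun = "call" if len(overdue) == 1 else "calls"
    return f"[{len(overdue)} tool {noun} awaiting your approval for {oldest_age}s]"


if __name__ == "__main__":
    print(pending_approval_note([10.0, 10.0], now=100.0))
    # [2 tool calls awaiting your approval for 90s]
```

Reporting the oldest wait rather than an average keeps the note conservative: it tells the model how long the worst-case call has been blocked.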

Environment

Hermes Agent v0.10.0 (2026.4.16)
Provider: anthropic
Model: claude-opus-4-7
Platform: Discord (I believe this is platform-agnostic — same mechanism would affect CLI, Telegram, etc. whenever tool approval is manual)
Approval mode: manual (tool calls require user click-through)

Extent analysis

TL;DR

Surface the pending-approval state in the tool result channel to distinguish between empty results and queued tool calls.

Guidance

Implement a pending-approval state marker in the tool result channel, such as the suggested JSON payload {"status": "pending_approval", "queued_at": "...", "age_seconds": 47}.
Consider alternative approaches, like soft-injecting pending tool call information into the next user turn's context when a threshold is exceeded.
Verify the effectiveness of the solution by testing scenarios where tool calls require user approval and checking that the LLM correctly interprets the pending-approval state.
Ensure the solution is platform-agnostic: the reporter observed the issue on Discord but expects the same mechanism to affect the CLI, Telegram, and any other platform where tool approval is manual.
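For the verification step, a test can replay the incident from the issue: a search is queued, the user never approves it, and the result the model sees must read as pending rather than empty. ApprovalGate below is a hypothetical toy queue standing in for the real approval path, which this sketch does not model.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class _QueuedCall:
    fn: Callable[[], Any]
    approved: bool = False
    outcome: Optional[Dict[str, Any]] = None  # cached after the single execution


class ApprovalGate:
    """Toy manual-approval queue; a hypothetical stand-in, not Hermes Agent's API."""

    def __init__(self) -> None:
        self._calls: Dict[str, _QueuedCall] = {}

    def submit(self, call_id: str, fn: Callable[[], Any]) -> None:
        self._calls[call_id] = _QueuedCall(fn)

    def approve(self, call_id: str) -> None:
        self._calls[call_id].approved = True

    def result_for(self, call_id: str) -> str:
        """Return what the LLM would see in the tool-result channel for this call."""
        call = self._calls[call_id]
        if not call.approved:
            return json.dumps({"status": "pending_approval"})
        if call.outcome is None:  # execute exactly once, after approval
            try:
                call.outcome = {"status": "ok", "result": call.fn()}
            except Exception as exc:  # surface failures instead of an empty result
                call.outcome = {"status": "error", "error": str(exc)}
        return json.dumps(call.outcome)


def test_unapproved_call_is_not_reported_as_empty() -> None:
    gate = ApprovalGate()
    gate.submit("grep-1", lambda: [])  # a search that would legitimately match nothing
    # The user is away: before approval the call must read as pending, not as empty.
    assert json.loads(gate.result_for("grep-1")) == {"status": "pending_approval"}
    gate.approve("grep-1")
    assert json.loads(gate.result_for("grep-1")) == {"status": "ok", "result": []}


if __name__ == "__main__":
    test_unapproved_call_is_not_reported_as_empty()
    print("ok")
```

The same assertion shape applies per platform adapter (Discord, CLI, Telegram): whatever renders the approval prompt, the model-facing result before approval should be identical.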

Example

{
  "status": "pending_approval",
  "queued_at": "2026-04-23T14:30:00",
  "age_seconds": 47
}

Notes

The solution relies on modifying the tool result channel to include pending-approval state information, which likely requires changes in Hermes Agent where tool results are assembled into the model's context, rather than any change to the LLM itself.

Recommendation

Implement the primary suggestion, surfacing the pending-approval state in the tool result channel, since it directly addresses the issue and gives the LLM a clear distinction between empty results and queued tool calls.
