
openclaw - ✅(Solved) Fix [Bug]: enqueueSystemEvent not deduplicated by runId/contextKey — agents cascade duplicate exec approval prompts under new IDs, locking ecosystem [1 pull requests]

StepCodex · 2026-04-20T22:13:23Z



Under load, enqueueSystemEvent does not deduplicate queued exec approval requests by runId or contextKey. When a heartbeat run times out and the gateway fails over, the replacement attempt re-queues the same exec call with a fresh approval ID. Each retry surfaces a new Telegram approval prompt for the identical command, cascading until the operator kills the gateway. Left alone, it saturates the approval channel fast enough to risk system-level memory pressure.

Reproduced repeatably on a multi-agent install. Filing now so it can be fixed before users with directPolicy: "allow" + high-frequency heartbeats discover it the hard way.

Error Message

Continual approval prompts delivered to Telegram seconds apart, each for the identical command but under a different approval ID.

Root Cause

enqueueSystemEvent has no compound (runId, contextKey) key for queued exec approval events, so each failover retry of the same exec call enqueues a fresh approval instead of collapsing into, or cancelling, the one already pending.

Fix Action

Fix / Workaround

Workaround in place

All 11 agent heartbeats set to every: "999h" (circuit breaker)
No agent work resumes on a normal schedule until this is fixed or a dedup workaround exists at the exec-approvals layer

PR fix notes

PR #3: fix(approvals): accept allow-always source metadata

Repository: iamlukethedev/openclaw
Author: iamlukethedev
State: open | merged: False
Link: https://github.com/iamlukethedev/openclaw/pull/3

Description (problem / solution / changelog)

Summary

Problem: Telegram/native allow-always approvals already persist allowlist entries with source: "allow-always", but the gateway exec.approvals.set schema rejected that field as unexpected.
Why it matters: local approvals files and the gateway push path disagree on the allowlist contract, so gateway sync can fail even when the on-disk approvals file is valid for the runtime.
What changed: src/gateway/protocol/schema/exec-approvals.ts now accepts optional source: "allow-always" on allowlist entries, matching the existing persisted approvals type.
What did NOT change (scope boundary): no approval decision logic, persistence behavior, or Telegram/native reply handling changed in this PR; it only aligns the gateway protocol schema with existing stored data.
AI-assisted: Yes. I implemented the protocol fix and verified it locally with targeted tests and changed-scope checks.

Change Type (select all)

Bug fix

Scope (select all touched areas)

Gateway / orchestration
API / contracts

Linked Issue/PR

Closes #69482
Related #69478
This PR fixes a bug or regression

Root Cause (if applicable)

Root cause: the durable approvals store type and normalization path already preserved source: "allow-always", but the gateway protocol schema for exec.approvals.set and exec.approvals.node.set did not allow that field.
Missing detection / guardrail: there was no validator regression test covering a real allowlist payload that included source: "allow-always".
Contributing context (if known): the issue report shows this surfaced through Telegram "Allow always", but the mismatch is fundamentally between persisted approvals data and the gateway wire contract.

Regression Test Plan (if applicable)

Coverage level that should have caught this: Unit test
Target test or file: src/gateway/protocol/index.test.ts
Scenario the test should lock in: validateExecApprovalsSetParams and validateExecApprovalsNodeSetParams accept allowlist entries that include source: "allow-always".
Why this is the smallest reliable guardrail: the bug is a protocol/schema rejection, so validator tests hit the exact contract boundary that failed without needing a live Telegram or gateway sync flow.
Existing test that already covers this (if any): none.
If no new test is added, why not: N/A.

User-visible / Behavior Changes

Gateway approval set payloads may now include persisted allow-always source metadata without being rejected as invalid.

Diagram (if applicable)

Before:
[allowlist entry with source="allow-always"] -> [exec.approvals.set validator] -> [reject unexpected property]

After:
[allowlist entry with source="allow-always"] -> [exec.approvals.set validator] -> [accept payload]

Security Impact (required)

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No

Repro + Verification

Environment

OS: Ubuntu Linux in Cursor Cloud
Runtime/container: Node 22 + pnpm workspace
Model/provider: N/A
Integration/channel (if any): Gateway exec approvals protocol validation
Relevant config (redacted): approvals payload with allowlist entry { pattern: "/usr/bin/curl", source: "allow-always" }

Steps

1. Construct an approvals file payload containing an allowlist entry with source: "allow-always".
2. Validate it through the gateway protocol path for exec.approvals.set or exec.approvals.node.set.
3. Observe whether the validator accepts or rejects the payload.

Expected

The gateway protocol should accept allowlist entries that match the already-persisted approvals type, including source: "allow-always".

Actual

Before this fix, the schema rejected the payload because source was not part of ExecApprovalsAllowlistEntrySchema.
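The before/after contract can be illustrated with a minimal, dependency-free sketch. It is not the code in src/gateway/protocol/schema/exec-approvals.ts, whose schema library and full field list are not shown here; validateEntry, AllowlistEntry, and the acceptSource flag are hypothetical, and the entry is reduced to pattern plus source for clarity.

```typescript
// Illustrative stand-in for a strict object schema (no undeclared properties allowed).
// Only `pattern` and `source` are modelled; the real schema may declare more fields.
type AllowlistEntry = Record<string, unknown>;
type ValidationResult = { ok: true } | { ok: false; error: string };

function validateEntry(entry: AllowlistEntry, acceptSource: boolean): ValidationResult {
  const pattern = entry.pattern;
  if (typeof pattern !== "string" || pattern.length === 0) {
    return { ok: false, error: "pattern must be a non-empty string" };
  }
  // Before the fix only `pattern` is declared; after it, an optional `source` is too.
  const declared = acceptSource ? ["pattern", "source"] : ["pattern"];
  for (const key of Object.keys(entry)) {
    if (!declared.includes(key)) {
      return { ok: false, error: `unexpected property: ${key}` };
    }
  }
  // When present, `source` must be the literal "allow-always".
  if ("source" in entry && entry.source !== "allow-always") {
    return { ok: false, error: 'source must be "allow-always"' };
  }
  return { ok: true };
}

// Allowlist entry taken from the PR's repro config.
const entry: AllowlistEntry = { pattern: "/usr/bin/curl", source: "allow-always" };
console.log(validateEntry(entry, false)); // { ok: false, error: 'unexpected property: source' }
console.log(validateEntry(entry, true)); // { ok: true }
```

Restricting source to the single literal "allow-always", rather than accepting any string, mirrors the PR's stated goal of not broadening unrelated schema fields.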

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

Verified scenarios: ran pnpm test src/gateway/protocol/index.test.ts; ran pnpm check:changed.
Edge cases checked: both gateway set and node set validators accept allow-always source metadata; no unrelated schema fields were broadened.
What you did not verify: live Telegram approval flow or a real openclaw approvals set --gateway round-trip against a running gateway.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No

Risks and Mitigations

Risk: this only fixes the schema-mismatch slice of the issue; if another runtime path strips or mishandles source, that would need a follow-up.
Mitigation: this PR is intentionally scoped to the concrete validator rejection, and the new tests lock the contract in place.


Changed files

src/gateway/protocol/index.test.ts (modified, +45/-1)
src/gateway/protocol/schema/exec-approvals.ts (modified, +1/-0)

Code Example

ps aux | grep -E "contextstored|vllm|openclaw" | grep -v grep | awk '{print $11}' | sort -n | tail -5


Bug type

Regression (worked before, now fails)

Beta release blocker

Summary

Reproduced repeatably on a multi-agent install. Filing now so it can be fixed before users with directPolicy: "allow" + high-frequency heartbeats discover it the hard way.

Steps to reproduce

What the exec call is

Routine health-check probe issued from Maelcum's heartbeat:

ps aux | grep -E "contextstored|vllm|openclaw" | grep -v grep | awk '{print $11}' | sort -n | tail -5

The command misses the current allowlist, so with ask: "on-miss" an approval prompt is expected on first encounter. The bug is that it fires again and again, each time under a new approval ID, for the same run intent.
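As a rough illustration of why the probe prompts at all, the sketch below models an allowlist check under ask: "on-miss". It is hypothetical: OpenClaw's real matching rules are not described in this issue (the PR's example entry uses a full path, /usr/bin/curl), and both the bare-executable-name allowlist and the rule that every pipeline segment must match are assumptions made only for illustration.

```typescript
// Hypothetical model of security: "allowlist" with ask: "on-miss".
// Assumptions: the allowlist holds bare executable names, and every pipeline
// segment must match for the command to run without a prompt.
type Decision = "allow" | "ask";

// Split a shell pipeline on "|" outside quotes and return each segment's first word.
function pipelineExecutables(command: string): string[] {
  const segments: string[] = [];
  let current = "";
  let quote: string | null = null;
  for (let i = 0; i < command.length; i++) {
    const ch = command.charAt(i);
    if (quote !== null) {
      if (ch === quote) quote = null; // closing quote
      current += ch;
    } else if (ch === '"' || ch === "'") {
      quote = ch; // opening quote; pipes inside are literal
      current += ch;
    } else if (ch === "|") {
      segments.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  segments.push(current);
  return segments.map((s) => s.trim().split(/\s+/)[0] ?? "");
}

// Any executable missing from the allowlist triggers an operator prompt.
function decide(command: string, allowlist: readonly string[]): Decision {
  const missing = pipelineExecutables(command).filter((exe) => !allowlist.includes(exe));
  return missing.length === 0 ? "allow" : "ask";
}

const probe = `ps aux | grep -E "contextstored|vllm|openclaw" | grep -v grep | awk '{print $11}' | sort -n | tail -5`;
console.log(pipelineExecutables(probe)); // [ 'ps', 'grep', 'grep', 'awk', 'sort', 'tail' ]
console.log(decide(probe, ["ps", "grep"])); // ask
```

The splitter has to respect quotes because the probe's grep pattern itself contains "|"; a naive split would treat vllm and openclaw as separate commands.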

Not a duplicate of

I looked for upstream issues that might cover this and found three that are adjacent but distinct:

#66487 — heartbeat prompt drops completion body (peek-not-consume on a different queue event path, not the approval queue)
#14191 — heartbeat routes to wrong session queue (routing bug, not a dedup bug)
#36325 — deliver:false hooks still inject via enqueueSystemEvent (delivery flag bypass, not retry dedup)

None of these address the approval-event retry path or the (runId, contextKey) dedup gap.

Workaround in place

All 11 agent heartbeats set to every: "999h" (circuit breaker)
No agent work resumes on a normal schedule until this is fixed or a dedup workaround exists at the exec-approvals layer

Related bug (filing separately)

Telegram /approve allow-always writes a source field into the approvals allowlist entry that openclaw approvals set --file then rejects as unexpected on push. Will cross-reference the issue once filed.

Expected behavior

Either:

1. enqueueSystemEvent deduplicates queued exec approval events by (agentId, contextKey) or (runId, contextKey), coalescing retries into the already-pending prompt; or
2. When a run fails over, any exec approval events it queued are cancelled before the replacement run is allowed to enqueue new ones.

Today, neither happens.

[bug-30-log-excerpt-clean.txt](https://github.com/user-attachments/files/26913097/bug-30-log-excerpt-clean.txt)

Actual behavior

Observed behavior

Continual approval prompts delivered to Telegram seconds apart, each for the identical command but under a different approval ID:

befadc79-10bd-4e78-b1a4-9e2f546fd3c5
871d7305-c1cc-412c-9393-d538e99e4ae1
etc.

A screenshot of two consecutive prompts is attached to the issue.

Gateway log (/tmp/openclaw/openclaw-2026-04-18.log) shows the cascade signature (excerpt attached):

stuck session: sessionId=maelcum sessionId=<uuid> sessionKey=agent:maelcum:telegram:direct:<user_id> — age ticking up by ~30s per line, crossing 462s before intervention
embedded_run_failover_decision failoverReason=timeout — cycling through the provider chain: vllm-fast → vllm-brain → openrouter/z-ai/glm-5
Heartbeat re-firing and regenerating the run under fresh runIds while the prior attempt is still pending approval

Each failover attempt re-enters enqueueSystemEvent carrying the same exec call, but the event queue has no compound key covering the (runId, contextKey) pair — so the prior queued approval does not cancel or collapse, and a new one is enqueued instead.
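A toy replay makes the missing key concrete. It assumes, for illustration, that failover attempts within one run share a runId and a contextKey and that a heartbeat re-fire starts a fresh runId, as the log excerpt suggests; the event shape, the IDs, and the promptsSent helper are all hypothetical. Keying only by approval ID, which is fresh on every retry, sends one prompt per attempt, matching the observed cascade.

```typescript
// Toy replay of the cascade. Event fields and values are hypothetical.
interface ApprovalEvent {
  agentId: string;
  runId: string;
  contextKey: string;
  approvalId: string;
}

// Same exec call retried through the provider chain, then re-fired under a fresh runId.
const attempts: ApprovalEvent[] = [
  { agentId: "maelcum", runId: "run-1", contextKey: "probe", approvalId: "a1" }, // vllm-fast
  { agentId: "maelcum", runId: "run-1", contextKey: "probe", approvalId: "a2" }, // vllm-brain
  { agentId: "maelcum", runId: "run-1", contextKey: "probe", approvalId: "a3" }, // openrouter
  { agentId: "maelcum", runId: "run-2", contextKey: "probe", approvalId: "a4" }, // heartbeat re-fire
];

// Count prompts reaching the operator when pending events are deduplicated by keyOf.
function promptsSent(events: ApprovalEvent[], keyOf: (e: ApprovalEvent) => string): number {
  const pending = new Set<string>();
  let sent = 0;
  for (const e of events) {
    const key = keyOf(e);
    if (!pending.has(key)) {
      pending.add(key);
      sent++;
    }
  }
  return sent;
}

console.log(promptsSent(attempts, (e) => e.approvalId)); // 4
console.log(promptsSent(attempts, (e) => `${e.runId}:${e.contextKey}`)); // 2
console.log(promptsSent(attempts, (e) => `${e.agentId}:${e.contextKey}`)); // 1
```

Under these assumptions, (runId, contextKey) still lets the re-fired heartbeat through, while (agentId, contextKey) collapses all four attempts into a single prompt.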

OpenClaw version

2026.4.14 (323493f)

Operating system

macOS 26.4.1

Install method

npm global, latest stable as of filing

Model

mlx-community/Qwen3.5-9B-OptiQ-4bit (local, via rapid-mlx 0.3.12)

Provider / routing chain

openclaw -> vllm-fast (localhost:8001, rapid-mlx 0.3.12) -> Qwen3.5-9B-OptiQ-4bit

Additional provider/model setup details

Environment

Host: macOS, Mac Mini M4 Pro, 48 GB unified memory
Gateway: launchd-supervised, loopback bind, port 18789
Heartbeat: every: "3h", directPolicy: "allow", target: "telegram", lightContext: true
Exec approval policy: defaults.security: "allowlist", ask: "on-miss", askFallback: "deny"; maelcum uses host defaults

Logs, screenshots, and evidence

## Attached evidence

1. Screenshot of two consecutive approval prompts with different IDs for the same command
2. `bug-30-log-excerpt.txt` — 60 lines of the cascade from the gateway log

Impact and severity

Impact

Saturates the approval channel — every cascade cycle produces a new Telegram prompt
Fast enough to outrun manual intervention; forcing a gateway restart (openclaw gateway restart) is the only reliable stop
On installs with many agents sharing a channel, one stuck agent can drown all approval prompts for every other agent
Forced me to set all 11 heartbeats to every: "999h" as a circuit breaker while the bug is unresolved — effectively disabling the ecosystem's scheduled work layer

Additional information

No response

extent analysis

TL;DR

The most likely fix involves modifying the enqueueSystemEvent function to deduplicate queued exec approval events by (runId, contextKey) or (agentId, contextKey).

Guidance

Review the enqueueSystemEvent function to understand how it handles event queuing and deduplication.
Consider a compound key for the event queue. Because the log shows re-fired heartbeats arriving under fresh runIds, (agentId, contextKey) may collapse retries more reliably than (runId, contextKey) alone.
Investigate the on-miss allowlist policy and its interaction with the enqueueSystemEvent function to ensure that it does not contribute to the duplication issue.
Evaluate the feasibility of cancelling pending approval events when a run fails over, as an alternative solution.
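The second option in the issue, cancelling a run's queued approvals when it fails over, could look roughly like the registry below. This is a sketch under stated assumptions: PendingApprovals, cancelRun, and the event shape are hypothetical, and how a cancelled prompt would be retracted in Telegram is left to the caller.

```typescript
// Hypothetical pending-approval registry for option 2: cancel a run's queued
// approvals on failover, before the replacement run may enqueue new ones.
interface PendingApproval {
  approvalId: string;
  runId: string;
  contextKey: string;
}

class PendingApprovals {
  private readonly byId = new Map<string, PendingApproval>();

  enqueue(approval: PendingApproval): void {
    this.byId.set(approval.approvalId, approval);
  }

  // Call on a failover decision for runId, before the replacement run starts.
  // Returns the cancelled approval IDs so their prompts can be retracted.
  cancelRun(runId: string): string[] {
    const cancelled = Array.from(this.byId.values())
      .filter((a) => a.runId === runId)
      .map((a) => a.approvalId);
    for (const id of cancelled) this.byId.delete(id);
    return cancelled;
  }

  size(): number {
    return this.byId.size;
  }
}

const queue = new PendingApprovals();
queue.enqueue({ approvalId: "a1", runId: "run-1", contextKey: "probe" });
queue.enqueue({ approvalId: "b1", runId: "run-9", contextKey: "other" });
console.log(queue.cancelRun("run-1")); // [ 'a1' ]
console.log(queue.size()); // 1
```

Compared with deduplication, cancellation bounds outstanding prompts to those of the live run but still sends one new prompt per replacement attempt, so it limits channel saturation rather than eliminating repeated prompts.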

Example

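// Sketch of a potential fix (option 1). Event shape and queue are hypothetical; removal on approve/deny is omitted.
type ExecApprovalEvent = { agentId: string; runId: string; contextKey: string; approvalId: string };

const queue = new Map<string, ExecApprovalEvent>();

// Returns the approval ID the operator should answer.
function enqueueSystemEvent(event: ExecApprovalEvent): string {
  // Keyed on (agentId, contextKey) because re-fired heartbeats arrive under fresh runIds.
  const key = `${event.agentId}:${event.contextKey}`;
  const pending = queue.get(key);
  if (pending) {
    // Coalesce into the already-pending prompt; keep its ID so the operator's reply still matches.
    return pending.approvalId;
  }
  queue.set(key, event);
  return event.approvalId;
}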

Notes

The provided information suggests that the issue is specific to the enqueueSystemEvent function and its handling of event deduplication. However, without access to the full codebase, it is difficult to provide a definitive solution. The suggested fix may require modifications to the event queue data structure and the enqueueSystemEvent function.

Recommendation

Modify the enqueueSystemEvent function to deduplicate queued exec approval events; this is the most direct way to address the issue. It should stop the approval channel from saturating while a more comprehensive solution is worked out.
