openclaw - ✅(Solved) Fix ACP gateway bridge: 'queue owner unavailable' for Claude (and intermittently Codex) sessions [1 pull requests, 2 comments, 2 participants]

GitHub stats
openclaw/openclaw#58659 (fetched 2026-04-08 01:59:37)
Comments: 2 · Participants: 2 · Timeline events: 4 · Reactions: 0
Timeline (top): closed ×1, commented ×1, cross-referenced ×1, locked ×1

Error Message

acpx ensureSession retaining dead named session with recoverable status: session=agent:claude:acp:<uuid> status=dead summary=queue owner unavailable

⇄ res ✗ agent errorCode=UNAVAILABLE errorMessage=AcpRuntimeError: acpx exited with code 1 code=ACP_TURN_FAILED

Workaround

Using subagent runtime instead of acp runtime works reliably for all agents, but loses the persistent session / thread-bound conversation capability that ACP provides.

PR fix notes

PR #58669: fix(acpx): repair queue owner session recovery

Description (problem / solution / changelog)

Summary

  • Problem: ACP gateway sessions could keep a named session in status=dead when acpx reported queue owner unavailable, then hand that dead handle to the first prompt.
  • Why it matters: sessions_spawn could still fail on the first turn for Claude and intermittently for other ACP agents even though the runtime logged that the dead session was "recoverable".
  • What changed: the acpx runtime now repairs that dead named session by creating a replacement owner, resuming the backend session when a stable session id is available, and falling back to a fresh named session when it is not.
  • What did NOT change (scope boundary): this does not change unrelated ACP status handling, config defaults, or non-queue-owner dead-session recovery.
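
The repair decision described in the summary can be sketched roughly as follows. This is only an illustration of the control flow, not the code in extensions/acpx/src/runtime.ts: the `SessionStatus` shape, the `stableSessionId` field name, and the `planQueueOwnerRepair` helper are all hypothetical.

```typescript
// Hypothetical shapes: the real ACPX status payload and runtime API may differ.
interface SessionStatus {
  status: "alive" | "dead";
  summary?: string;
  // Assumed field name for the "stable session id" mentioned in the PR.
  stableSessionId?: string;
}

type RepairAction =
  | { kind: "keep" }                       // session is healthy, use it as-is
  | { kind: "resume"; sessionId: string }  // new owner, resume backend session
  | { kind: "fresh" }                      // new owner, fresh named session
  | { kind: "other-recovery" };            // dead for another reason, out of scope

function planQueueOwnerRepair(s: SessionStatus): RepairAction {
  if (s.status !== "dead") return { kind: "keep" };

  const queueOwnerGone = (s.summary ?? "").includes("queue owner unavailable");
  // Non-queue-owner dead sessions are left to whatever handling already exists.
  if (!queueOwnerGone) return { kind: "other-recovery" };

  // Prefer continuity when an id is available; otherwise start over.
  return s.stableSessionId
    ? { kind: "resume", sessionId: s.stableSessionId }
    : { kind: "fresh" };
}
```

The point of the sketch is that a dead queue-owner session is never returned unchanged: it is either resumed or replaced.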

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #58659
  • Related #56855
  • This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

  • Root cause: extensions/acpx/src/runtime.ts treated status=dead with summary~="queue owner unavailable" as a session to keep, so ensureSession() could return a dead handle that then failed on the first prompt.
  • Missing detection / guardrail: the regression tests only asserted that OpenClaw stopped replacing those dead sessions in a loop; they did not assert that the repaired session was usable for the next turn.
  • Prior context (git blame, prior PR, issue, or refactor if known): the queue-owner path was special-cased in 1c95c41c37 to avoid an infinite replace loop.
  • Why this regressed now: the earlier fix avoided the loop but still left the queue-owner recovery path returning the dead named session instead of repairing it.
  • If unknown, what was ruled out: ruled out a simple TTL-only fix because the reported failure still reproduces after increasing queueOwnerTtlSeconds.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/acpx/src/runtime.test.ts
  • Scenario the test should lock in: when status reports dead plus queue owner unavailable, the runtime should repair the named session, resume when a stable session id exists, and fall back to a fresh named session when it does not.
  • Why this is the smallest reliable guardrail: the bug lives entirely in the ACPX runtime control-flow around sessions ensure, status, and sessions new.
  • Existing test that already covers this (if any): the prior dead-session tests covered the queue-owner branch but asserted the wrong behavior for this issue.
  • If no new test is added, why not: N/A
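
A minimal sketch of the kind of guardrail described above, using an in-memory fake instead of the real ACPX mock runtime. Every name here (`FakeAcpx`, `resumableId`, the `resume:`/`fresh` call labels) is hypothetical and does not reflect the actual fixtures in runtime.test.ts. The key idea is to assert that the handle returned by `ensureSession` survives a prompt, not merely that no replace loop occurs.

```typescript
// All names are illustrative assumptions, not the project's real test fixtures.
type Status = { status: "alive" | "dead"; summary?: string; resumableId?: string };

class FakeAcpx {
  calls: string[] = [];
  private statuses = new Map<string, Status>();

  constructor(name: string, initial: Status) {
    this.statuses.set(name, initial);
  }

  status(name: string): Status {
    this.calls.push("status");
    return this.statuses.get(name) ?? { status: "dead" };
  }

  // Creates a replacement owner; records whether it resumed or started fresh.
  newSession(name: string, resumeId?: string): void {
    this.calls.push(resumeId ? `resume:${resumeId}` : "fresh");
    this.statuses.set(name, { status: "alive" });
  }

  // A prompt against a dead session fails the way the issue describes.
  prompt(name: string): string {
    if (this.statuses.get(name)?.status !== "alive") {
      throw new Error("ACP_TURN_FAILED");
    }
    return "ok";
  }
}

function ensureSession(acpx: FakeAcpx, name: string): string {
  const s = acpx.status(name);
  const queueOwnerGone = (s.summary ?? "").includes("queue owner unavailable");
  if (s.status === "dead" && queueOwnerGone) {
    acpx.newSession(name, s.resumableId);
  }
  return name;
}

// Usage: the returned handle must be usable for the first turn.
const acpx = new FakeAcpx("named", { status: "dead", summary: "queue owner unavailable" });
const handle = ensureSession(acpx, "named");
acpx.prompt(handle); // does not throw once the session has been repaired
```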

User-visible / Behavior Changes

  • ACP sessions that hit queue owner unavailable during initialization now repair the dead named session instead of returning a dead handle to the first turn.
  • When ACPX exposes a stable session id for that dead session, OpenClaw resumes it to preserve continuity.

Diagram (if applicable)

Before:
[sessions ensure] -> [status=dead queue owner unavailable] -> [retain dead session] -> [first prompt fails]

After:
[sessions ensure] -> [status=dead queue owner unavailable] -> [repair named session owner] -> [first prompt uses repaired session]

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS (Darwin 25.3.0)
  • Runtime/container: Node 22 / pnpm workspace
  • Model/provider: ACPX mock runtime in tests; issue repro targets Claude and Codex ACP agents
  • Integration/channel (if any): ACP gateway sessions
  • Relevant config (redacted): acpx.permissionMode=approve-all, acpx.nonInteractivePermissions=deny, acpx.queueOwnerTtlSeconds=30

Steps

  1. Create or ensure an ACP named session.
  2. Return status=dead with summary=queue owner unavailable from status --session.
  3. Continue initialization and attempt the first turn.

Expected

  • The runtime repairs the dead named session before returning the handle.
  • If a stable session id is present, the repair resumes that session.
  • If no stable session id is present, the runtime falls back to a fresh named session.

Actual

  • Before this change, the runtime kept the dead named session and the first prompt could still fail with ACP_TURN_FAILED.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: updated the queue-owner recovery tests, added the no-resumable-id fallback case, ran pnpm test -- extensions/acpx/src/runtime.test.ts -t "queue owner unavailable|ensure fallback|no resumable id", ran pnpm test:extension acpx, and ran pnpm build.
  • Edge cases checked: dead queue-owner status after sessions ensure, dead queue-owner status after ensure failure fallback, and missing ids in the status payload.
  • What you did not verify: a live Claude/Codex gateway repro in this environment, and pnpm check still reports unrelated pre-existing type failures in extensions/diffs/src/language-hints.test.ts and src/plugins/contracts/plugin-sdk-subpaths.test.ts.
  • AI assistance / testing note: prepared with AI assistance, manually reviewed before opening, and validated with the focused ACPX test lanes above.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps:

Risks and Mitigations

  • Risk: some ACPX backends may report queue-owner failures without stable ids, which forces a fresh named session instead of a resume.
    • Mitigation: the runtime only takes that fallback when status provides no resumable id at all, and the new regression test locks in that branch.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • extensions/acpx/src/runtime.test.ts (modified, +35/-6)
  • extensions/acpx/src/runtime.ts (modified, +95/-41)
  • extensions/acpx/src/test-utils/runtime-fixtures.ts (modified, +5/-3)

Bug Description

ACP sessions spawned through the OpenClaw gateway (sessions_spawn with runtime: "acp") consistently fail with "queue owner unavailable" for the Claude agent, and intermittently for Codex. The same agents work perfectly when invoked directly via acpx CLI.

Environment

  • OpenClaw version: 2026.3.31 (213a704)
  • acpx version: 0.4.0
  • Node: v22.22.1
  • OS: macOS (arm64, Darwin 25.3.0)
  • Claude ACP package: @agentclientprotocol/[email protected]
  • Codex ACP package: @zed-industries/[email protected]

Steps to Reproduce

  1. Configure ACP plugin in openclaw.json:

     "acpx": {
       "enabled": true,
       "config": {
         "permissionMode": "approve-all",
         "timeoutSeconds": 300,
         "queueOwnerTtlSeconds": 30,
         "nonInteractivePermissions": "deny"
       }
     }

  2. Spawn a Claude ACP session via gateway:

     sessions_spawn({ runtime: "acp", agentId: "claude", task: "Hello" })

  3. Session immediately dies with "queue owner unavailable".

What Works vs What Fails

| Method | Claude | Codex |
| --- | --- | --- |
| acpx claude exec "test" (direct CLI) | ✅ Works | ✅ Works |
| sessions_spawn via gateway | ❌ Fails | ⚠️ Intermittent |

Error Logs

Gateway log shows repeated cycle:

acpx ensureSession retaining dead named session with recoverable status:
  session=agent:claude:acp:<uuid> status=dead summary=queue owner unavailable

⇄ res ✗ agent errorCode=UNAVAILABLE
  errorMessage=AcpRuntimeError: acpx exited with code 1
  code=ACP_TURN_FAILED

The session is created, but the queue owner dies before the first prompt is delivered. The gateway then retains the dead session as "recoverable" and tries again, but fails the same way.
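
To check whether a gateway log contains this failure signature, a small parser along these lines may help. It is a sketch that assumes the log keeps the `session=… status=… summary=…` layout shown above; the function names are hypothetical and not part of OpenClaw or acpx.

```typescript
interface DeadSessionLog {
  session: string;
  status: string;
  summary: string;
}

// Extracts the key=value fields from an "ensureSession retaining dead named
// session" log entry. Returns null when the text does not match that layout.
function parseEnsureSessionLog(text: string): DeadSessionLog | null {
  const m = /session=(\S+)\s+status=(\S+)\s+summary=(.+)/.exec(text);
  if (!m) return null;
  return {
    session: m[1] ?? "",
    status: m[2] ?? "",
    summary: (m[3] ?? "").trim(),
  };
}

// True when the entry matches the failure reported in this issue.
function isQueueOwnerFailure(text: string): boolean {
  const entry = parseEnsureSessionLog(text);
  return (
    entry !== null &&
    entry.status === "dead" &&
    entry.summary.includes("queue owner unavailable")
  );
}
```

Because the regular expression allows any whitespace between fields, it works whether the entry is on one line or wrapped across two as in the excerpt above.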

Exit codes observed: 1 (Claude), 3 (Codex on one occasion).

Investigation Done

  1. queueOwnerTtlSeconds — Was originally 0.1s (way too low). Bumped to 30s. This did NOT fix the issue for Claude, though it briefly helped Codex.

  2. Deprecated package — Updated Claude from deprecated @zed-industries/[email protected] to @agentclientprotocol/[email protected]. Did NOT fix the gateway bridge issue (direct CLI still works fine with either).

  3. nonInteractivePermissions — Changed from wrong value to "deny" in both ~/.acpx/config.json and openclaw.json. No effect on gateway bridge.

  4. permissionMode: "approve-all" — Confirmed set. No effect.

  5. Gateway restart — Performed multiple restarts. Issue persists across restarts.

Suspected Root Cause

The issue appears to be in the gateway ACP plugin's ensureSession → createNamedSession lifecycle. The acpx process is spawned but the queue owner (the bridge between gateway and acpx) loses connection or times out before the first prompt can be delivered. This is a timing/lifecycle issue in the gateway plugin, not in the agent packages themselves.

The fact that acpx <agent> exec works perfectly from CLI confirms the agents and their packages are fine — the problem is specifically in how the gateway manages the acpx session bridge.

Expected Behavior

sessions_spawn with runtime: "acp" should reliably create persistent sessions for all supported agents (claude, codex, pi, etc).

Workaround

Using subagent runtime instead of acp runtime works reliably for all agents, but loses the persistent session / thread-bound conversation capability that ACP provides.
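
One way to apply this workaround programmatically is to try the acp runtime first and fall back to the subagent runtime only on the failure seen in this issue. The sketch below makes several assumptions: `SpawnFn` stands in for sessions_spawn (its real signature and return type are not given here), the runtime value `"subagent"` is a guess at the literal, and the thrown error is assumed to carry `ACP_TURN_FAILED` in its message.

```typescript
// Assumed shapes; adjust to the real sessions_spawn contract.
type Runtime = "acp" | "subagent";

interface SpawnRequest {
  runtime: Runtime;
  agentId: string;
  task: string;
}

type SpawnFn = (req: SpawnRequest) => Promise<string>;

async function spawnWithFallback(
  spawn: SpawnFn,
  agentId: string,
  task: string,
): Promise<{ runtime: Runtime; result: string }> {
  try {
    const result = await spawn({ runtime: "acp", agentId, task });
    return { runtime: "acp", result };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Fall back only on the failure from this issue; rethrow anything else.
    if (!message.includes("ACP_TURN_FAILED")) throw err;
    const result = await spawn({ runtime: "subagent", agentId, task });
    return { runtime: "subagent", result };
  }
}
```

Returning the runtime that actually served the request lets the caller know when persistent, thread-bound conversation is unavailable for that session.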

extent analysis

TL;DR

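  • Raising queueOwnerTtlSeconds alone does not resolve the "queue owner unavailable" failure: the reporter raised it from 0.1s to 30s without fixing Claude, and PR #58669 ruled out a TTL-only fix. The fix is the acpx runtime change in PR #58669, which repairs the dead named session before the first prompt.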

Guidance

  • Confirm that queueOwnerTtlSeconds in openclaw.json is not set unreasonably low (the reporter's original value was 0.1s), but do not expect a higher value alone to fix the failure; it still reproduced at 30s.
  • The fault lies in the ensureSession recovery path in extensions/acpx/src/runtime.ts, which retained a named session with status=dead and summary "queue owner unavailable" and handed it to the first prompt.
  • Look for the "ensureSession retaining dead named session" line in the gateway log to confirm that you are hitting this path.
  • After picking up PR #58669, re-test sessions_spawn with runtime: "acp" against Claude and Codex; the PR was verified against the ACPX mock runtime, not a live gateway.

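Example

Configuration with queueOwnerTtlSeconds raised from 30 to 120 seconds. Treat this as a diagnostic setting only: in the investigation above, raising the TTL to 30 seconds did not fix the Claude failure.

"acpx": {
  "enabled": true,
  "config": {
    "permissionMode": "approve-all",
    "timeoutSeconds": 300,
    "queueOwnerTtlSeconds": 120,
    "nonInteractivePermissions": "deny"
  }
}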

Notes

  • The issue seems to be related to the timing and lifecycle of the gateway ACP plugin, rather than the agent packages themselves.
  • The fact that using subagent runtime works reliably but loses persistent session capabilities suggests that the issue is specific to the ACP session management in the gateway.

Recommendation

  • Apply the fix: upgrade to an OpenClaw build that includes PR #58669. Until then, use the subagent runtime if persistent sessions are not required. Increasing queueOwnerTtlSeconds (e.g., to 120 seconds) is easy to revert but is not expected to resolve the failure on its own.
