openclaw - ✅(Solved) Fix ACP gateway bridge: 'queue owner unavailable' for Claude (and intermittently Codex) sessions [1 pull requests, 2 comments, 2 participants]

aaronkennard1 · 2026-04-01T01:54:53Z

GitHub stats

openclaw/openclaw#58659•Fetched 2026-04-08 01:59:37

Author: aaronkennard1
Participants: aaronkennard1, neeravmakwana
Timeline (top): closed ×1, commented ×1, cross-referenced ×1, locked ×1

Error Message

acpx ensureSession retaining dead named session with recoverable status: session=agent:claude:acp:<uuid> status=dead summary=queue owner unavailable

⇄ res ✗ agent errorCode=UNAVAILABLE errorMessage=AcpRuntimeError: acpx exited with code 1 code=ACP_TURN_FAILED

Workaround

Using subagent runtime instead of acp runtime works reliably for all agents, but loses the persistent session / thread-bound conversation capability that ACP provides.

PR fix notes

PR #58669: fix(acpx): repair queue owner session recovery

Repository: openclaw/openclaw
Author: neeravmakwana
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/58669

Description (problem / solution / changelog)

Summary

Problem: ACP gateway sessions could keep a named session in status=dead when acpx reported queue owner unavailable, then hand that dead handle to the first prompt.
Why it matters: sessions_spawn could still fail on the first turn for Claude and intermittently for other ACP agents even though the runtime logged that the dead session was "recoverable".
What changed: the acpx runtime now repairs that dead named session by creating a replacement owner, resuming the backend session when a stable session id is available, and falling back to a fresh named session when it is not.
What did NOT change (scope boundary): this does not change unrelated ACP status handling, config defaults, or non-queue-owner dead-session recovery.

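Change Type (select all)

- [x] Bug fix
- [ ] Feature
- [ ] Refactor required for the fix
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

Scope (select all touched areas)

- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [x] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra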

Linked Issue/PR

Closes #58659
Related #56855
[x] This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

Root cause: extensions/acpx/src/runtime.ts treated status=dead with summary~="queue owner unavailable" as a session to keep, so ensureSession() could return a dead handle that then failed on the first prompt.
Missing detection / guardrail: the regression tests only asserted that OpenClaw stopped replacing those dead sessions in a loop; they did not assert that the repaired session was usable for the next turn.
Prior context (git blame, prior PR, issue, or refactor if known): the queue-owner path was special-cased in 1c95c41c37 to avoid an infinite replace loop.
Why this regressed now: the earlier fix avoided the loop but still left the queue-owner recovery path returning the dead named session instead of repairing it.
If unknown, what was ruled out: ruled out a simple TTL-only fix because the reported failure still reproduces after increasing queueOwnerTtlSeconds.
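
The condition named in the root cause can be written as a small predicate. This is a minimal sketch, not the code in extensions/acpx/src/runtime.ts: the `SessionStatus` shape and the function name are assumptions for illustration, and the case-insensitive substring match is only one reading of `summary~=`.

```typescript
// Hypothetical shape of what `status --session` reports; field names are assumed.
interface SessionStatus {
  status: string;
  summary?: string;
}

// True when a session is dead specifically because its queue owner is gone.
function isQueueOwnerUnavailable(s: SessionStatus): boolean {
  return s.status === "dead" && /queue owner unavailable/i.test(s.summary ?? "");
}

// Before the fix, a match here meant "keep the session".
// After the fix, a match means "repair the session before returning the handle".
console.log(isQueueOwnerUnavailable({ status: "dead", summary: "queue owner unavailable" })); // true
```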

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- [x] Unit test
- [ ] Seam / integration test
- [ ] End-to-end test
- [ ] Existing coverage already sufficient
Target test or file: extensions/acpx/src/runtime.test.ts
Scenario the test should lock in: when status reports dead plus queue owner unavailable, the runtime should repair the named session, resume when a stable session id exists, and fall back to a fresh named session when it does not.
Why this is the smallest reliable guardrail: the bug lives entirely in the ACPX runtime control-flow around sessions ensure, status, and sessions new.
Existing test that already covers this (if any): the prior dead-session tests covered the queue-owner branch but asserted the wrong behavior for this issue.
If no new test is added, why not: N/A

User-visible / Behavior Changes

ACP sessions that hit queue owner unavailable during initialization now repair the dead named session instead of returning a dead handle to the first turn.
When ACPX exposes a stable session id for that dead session, OpenClaw resumes it to preserve continuity.

Diagram (if applicable)

Before:
[sessions ensure] -> [status=dead queue owner unavailable] -> [retain dead session] -> [first prompt fails]

After:
[sessions ensure] -> [status=dead queue owner unavailable] -> [repair named session owner] -> [first prompt uses repaired session]

Security Impact (required)

New permissions/capabilities? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) No
Command/tool execution surface changed? (Yes/No) No
Data access scope changed? (Yes/No) No
If any Yes, explain risk + mitigation:

Repro + Verification

Environment

OS: macOS 25.3.0
Runtime/container: Node 22 / pnpm workspace
Model/provider: ACPX mock runtime in tests; issue repro targets Claude and Codex ACP agents
Integration/channel (if any): ACP gateway sessions
Relevant config (redacted): acpx.permissionMode=approve-all, acpx.nonInteractivePermissions=deny, acpx.queueOwnerTtlSeconds=30

Steps

Create or ensure an ACP named session.
Return status=dead with summary=queue owner unavailable from status --session.
Continue initialization and attempt the first turn.

Expected

The runtime repairs the dead named session before returning the handle.
If a stable session id is present, the repair resumes that session.
If no stable session id is present, the runtime falls back to a fresh named session.
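
The three expectations above amount to one decision per status payload. The sketch below models only that decision, under stated assumptions: `StatusPayload`, its `stableSessionId` field, and `planRepair` are hypothetical names, and the real runtime performs the repair through acpx commands rather than returning a plan object.

```typescript
// Hypothetical status payload; `stableSessionId` is an assumed field name.
interface StatusPayload {
  status: string;
  summary?: string;
  stableSessionId?: string;
}

type RepairPlan =
  | { kind: "use-existing" } // session is not dead: return it as-is
  | { kind: "resume"; sessionId: string } // replacement owner, same backend session
  | { kind: "fresh" } // replacement owner, new named session
  | { kind: "other-recovery" }; // dead for another reason: path not changed by this PR

function planRepair(p: StatusPayload): RepairPlan {
  if (p.status !== "dead") return { kind: "use-existing" };
  if (!/queue owner unavailable/i.test(p.summary ?? "")) return { kind: "other-recovery" };
  // Fall back to a fresh session only when no resumable id is present at all.
  const id = p.stableSessionId?.trim();
  return id ? { kind: "resume", sessionId: id } : { kind: "fresh" };
}
```

Keeping the decision separate from the side effects makes each branch easy to cover in a unit test, which is the coverage level the PR targets.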

Actual

Before this change, the runtime kept the dead named session and the first prompt could still fail with ACP_TURN_FAILED.

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

Verified scenarios: updated the queue-owner recovery tests, added the no-resumable-id fallback case, ran pnpm test -- extensions/acpx/src/runtime.test.ts -t "queue owner unavailable|ensure fallback|no resumable id", ran pnpm test:extension acpx, and ran pnpm build.
Edge cases checked: dead queue-owner status after sessions ensure, dead queue-owner status after ensure failure fallback, and missing ids in the status payload.
What you did not verify: a live Claude/Codex gateway repro in this environment. Separately, pnpm check still reports unrelated pre-existing type failures in extensions/diffs/src/language-hints.test.ts and src/plugins/contracts/plugin-sdk-subpaths.test.ts.
AI assistance / testing note: prepared with AI assistance, manually reviewed before opening, and validated with the focused ACPX test lanes above.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Migration needed? (Yes/No) No
If yes, exact upgrade steps:

Risks and Mitigations

Risk: some ACPX backends may report queue-owner failures without stable ids, which forces a fresh named session instead of a resume.
- Mitigation: the runtime only takes that fallback when status provides no resumable id at all, and the new regression test locks in that branch.

Made with Cursor

Changed files

CHANGELOG.md (modified, +1/-0)
extensions/acpx/src/runtime.test.ts (modified, +35/-6)
extensions/acpx/src/runtime.ts (modified, +95/-41)
extensions/acpx/src/test-utils/runtime-fixtures.ts (modified, +5/-3)

Bug Description

ACP sessions spawned through the OpenClaw gateway (sessions_spawn with runtime: "acp") consistently fail with "queue owner unavailable" for the Claude agent, and intermittently for Codex. The same agents work perfectly when invoked directly via acpx CLI.

Environment

OpenClaw version: 2026.3.31 (213a704)
acpx version: 0.4.0
Node: v22.22.1
OS: macOS (arm64, Darwin 25.3.0)
Claude ACP package: @agentclientprotocol/[email protected]
Codex ACP package: @zed-industries/[email protected]

Steps to Reproduce

Configure ACP plugin in openclaw.json:

"acpx": {
  "enabled": true,
  "config": {
    "permissionMode": "approve-all",
    "timeoutSeconds": 300,
    "queueOwnerTtlSeconds": 30,
    "nonInteractivePermissions": "deny"
  }
}

Spawn a Claude ACP session via gateway:

sessions_spawn({ runtime: "acp", agentId: "claude", task: "Hello" })

Session immediately dies with "queue owner unavailable"

What Works vs What Fails

| Method | Claude | Codex |
| --- | --- | --- |
| `acpx claude exec "test"` (direct CLI) | ✅ Works | ✅ Works |
| `sessions_spawn` via gateway | ❌ Fails | ⚠️ Intermittent |

Error Logs

Gateway log shows repeated cycle:

acpx ensureSession retaining dead named session with recoverable status:
  session=agent:claude:acp:<uuid> status=dead summary=queue owner unavailable

⇄ res ✗ agent errorCode=UNAVAILABLE
  errorMessage=AcpRuntimeError: acpx exited with code 1
  code=ACP_TURN_FAILED

The session is created, but the queue owner dies before the first prompt is delivered. The gateway then retains the dead session as "recoverable" and tries again, but fails the same way.

Exit codes observed: 1 (Claude), 3 (Codex on one occasion).
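
When scanning gateway logs for this failure, the `session=`, `status=`, and `summary=` fields of the "retaining dead named session" entry can be pulled out with a regular expression. This is a sketch based only on the log text quoted above; the function and type names are made up for illustration, and other log entries may use a different layout.

```typescript
interface DeadSessionLog {
  session: string;
  status: string;
  summary: string;
}

// Extracts session, status, and summary from one log entry.
// Returns null when the three fields are not present in that order.
function parseDeadSessionLog(entry: string): DeadSessionLog | null {
  // `m` flag: `$` matches at the end of the summary line, so the entry may span lines.
  const m = /session=(\S+)\s+status=(\S+)\s+summary=(.+)$/m.exec(entry);
  if (!m) return null;
  const [, session = "", status = "", summary = ""] = m;
  return { session, status, summary: summary.trim() };
}

const parsed = parseDeadSessionLog(
  "acpx ensureSession retaining dead named session with recoverable status:\n" +
    "  session=agent:claude:acp:<uuid> status=dead summary=queue owner unavailable",
);
console.log(parsed?.summary);
```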

Investigation Done

queueOwnerTtlSeconds — Was originally 0.1s (way too low). Bumped to 30s. This did NOT fix the issue for Claude, though it briefly helped Codex.
Deprecated package — Updated Claude from deprecated @zed-industries/[email protected] to @agentclientprotocol/[email protected]. Did NOT fix the gateway bridge issue (direct CLI still works fine with either).
nonInteractivePermissions — Changed from wrong value to "deny" in both ~/.acpx/config.json and openclaw.json. No effect on gateway bridge.
permissionMode: "approve-all" — Confirmed set. No effect.
Gateway restart — Performed multiple restarts. Issue persists across restarts.
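
The first item suggests a cheap sanity check on the TTL before deeper debugging. The sketch below flags values such as the original 0.1 s. The field names come from the configuration in this report, but `MIN_TTL_SECONDS` is a hypothetical floor chosen for illustration and not a documented acpx limit. Passing this check does not fix the bridge failure, since the issue still reproduced at 30 s.

```typescript
// Fields taken from the acpx config block in this report.
interface AcpxConfig {
  permissionMode: string;
  timeoutSeconds: number;
  queueOwnerTtlSeconds: number;
  nonInteractivePermissions: string;
}

// Hypothetical floor: the report only says 0.1 s was far too low.
const MIN_TTL_SECONDS = 1;

// Returns human-readable warnings; an empty array means nothing was flagged.
function ttlWarnings(cfg: AcpxConfig): string[] {
  const warnings: string[] = [];
  const ttl = cfg.queueOwnerTtlSeconds;
  if (!Number.isFinite(ttl) || ttl < MIN_TTL_SECONDS) {
    warnings.push(`queueOwnerTtlSeconds=${ttl} is below the assumed floor of ${MIN_TTL_SECONDS}s`);
  }
  return warnings;
}
```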

Suspected Root Cause

The issue appears to be in the gateway ACP plugin's ensureSession → createNamedSession lifecycle. The acpx process is spawned but the queue owner (the bridge between gateway and acpx) loses connection or times out before the first prompt can be delivered. This is a timing/lifecycle issue in the gateway plugin, not in the agent packages themselves.

The fact that acpx <agent> exec works perfectly from CLI confirms the agents and their packages are fine — the problem is specifically in how the gateway manages the acpx session bridge.

Expected Behavior

sessions_spawn with runtime: "acp" should reliably create persistent sessions for all supported agents (claude, codex, pi, etc).

Workaround

Using subagent runtime instead of acp runtime works reliably for all agents, but loses the persistent session / thread-bound conversation capability that ACP provides.
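
The workaround can be automated as a try-then-fall-back wrapper. This is a sketch under assumptions: `spawn` is a hypothetical injected function standing in for `sessions_spawn`, and the literal `"subagent"` is an assumed value for the runtime field, since the report names the runtime but does not show the call.

```typescript
// The request shape mirrors the sessions_spawn call shown in this report.
// The literal "subagent" is an assumed value for the workaround runtime.
type Runtime = "acp" | "subagent";

interface SpawnRequest {
  runtime: Runtime;
  agentId: string;
  task: string;
}

// Hypothetical injected wrapper around the real sessions_spawn tool.
type SpawnFn<T> = (req: SpawnRequest) => Promise<T>;

async function spawnWithFallback<T>(
  spawn: SpawnFn<T>,
  agentId: string,
  task: string,
): Promise<{ runtime: Runtime; result: T }> {
  try {
    const result = await spawn({ runtime: "acp", agentId, task });
    return { runtime: "acp", result };
  } catch {
    // ACP failed (for example with ACP_TURN_FAILED): retry without persistence.
    const result = await spawn({ runtime: "subagent", agentId, task });
    return { runtime: "subagent", result };
  }
}
```

The returned `runtime` field lets the caller see when the fallback was taken, which matters because a subagent session does not keep the persistent, thread-bound conversation.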

Extended analysis

TL;DR

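Raising queueOwnerTtlSeconds is not a reliable fix for "queue owner unavailable" when spawning ACP sessions through the OpenClaw gateway: the reporter's increase from 0.1 s to 30 s did not help Claude, and PR #58669 ruled out a TTL-only fix. The merged change repairs the dead named session in the acpx runtime instead.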

Guidance

Check that queueOwnerTtlSeconds in openclaw.json is not set unreasonably low (the reporter's original 0.1 s was), but do not expect a higher value alone to resolve the failure.
Review the ensureSession and createNamedSession lifecycle in the gateway ACP plugin; PR #58669 changes it so that a dead named session reporting queue owner unavailable is repaired rather than returned.
Consider adding logging or debugging in the gateway plugin to capture the sequence of events leading up to the "queue owner unavailable" error.
When testing sessions_spawn, confirm that the first prompt succeeds, not only that the session is created.

Example

queueOwnerTtlSeconds raised from 30 to 120 seconds:

"acpx": {
  "enabled": true,
  "config": {
    "permissionMode": "approve-all",
    "timeoutSeconds": 300,
    "queueOwnerTtlSeconds": 120,
    "nonInteractivePermissions": "deny"
  }
}

Notes

The issue seems to be related to the timing and lifecycle of the gateway ACP plugin, rather than the agent packages themselves.
The fact that using subagent runtime works reliably but loses persistent session capabilities suggests that the issue is specific to the ACP session management in the gateway.

Recommendation

Apply the fix: use a build that includes PR #58669. Until then, the subagent runtime is the workaround the reporter found reliable, at the cost of persistent sessions. Increasing queueOwnerTtlSeconds (e.g., to 120 seconds) is easy to try and revert, but at 30 seconds it did not resolve the issue for Claude.
