
openclaw - 💡(How to fix) Fix ACP diagnostics blur Gemini capacity stalls with dead-session signals [2 comments, 1 participants]

openclaw · 2026-03-26 08:05:26


GitHub stats

openclaw/openclaw#55028•Fetched 2026-04-08 01:33:31


Author

anyech

Participants

anyech

Timeline (top)

commented ×2

Error Message

Even if the exact provider error cannot always be normalized perfectly, a clearer distinction between slow/stalled, capacity-limited, and actually dead would make ACP debugging much easier.



Summary

When an ACP-backed Gemini run is slow or capacity-stressed, OpenClaw can surface a confusing mix of symptoms that make a recoverable provider/backoff situation look like local ACP/session failure.

In one real sessions_spawn(runtime="acp", agentId="gemini") smoke test on OpenClaw 2026.3.24, the same run emitted all of the following in sequence:

- a no-output warning:
  - gemini has produced no output for 60s. It may be waiting for interactive input.
- ACP session-health warnings suggesting the session was dead/unavailable:
  - acpx ensureSession replacing dead named session ... summary=queue owner unavailable
- and then later the run resumed output and completed successfully.

That makes it hard to tell whether the system is dealing with:

a truly dead ACP/session path
a slow-but-recovering child run
or upstream Gemini quota/capacity/backoff behavior

Environment

OpenClaw: 2026.3.24
Install kind: global npm/pnpm package install
OS: Ubuntu 22.04 LTS (arm64)
ACP agent under test: Gemini

Why this seems like a real diagnostics/surfacing gap

The underlying Gemini/ACP path does not appear broadly broken:

normal shell-side update verification was healthy
the ACP smoke test eventually completed successfully

At the same time, direct Gemini/ACP testing can fail with more specific upstream errors such as:

daily quota exhausted
RESOURCE_EXHAUSTED
MODEL_CAPACITY_EXHAUSTED
provider-side capacity exhaustion messages

So the current operator-facing surface appears to collapse several distinct states into very generic symptoms:

no output for 60s
may be waiting for interactive input
queue owner unavailable
dead named session
generic acpx exited with code 1
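
One way to keep these states from collapsing is to classify raw ACP or provider output against known markers before it reaches the operator. The sketch below is an illustration under stated assumptions, not OpenClaw code: the function name `classify_symptom`, the category labels, and the mapping of `RESOURCE_EXHAUSTED` to a quota category are all hypothetical. Only the marker strings come from the messages quoted above.

```python
from typing import Optional

# Marker strings are taken from the messages quoted in this issue.
# Category names and the marker-to-category mapping are assumptions.
# Order matters: provider-specific markers are checked before generic ones,
# because a single line may contain more than one marker.
MARKERS = [
    ("MODEL_CAPACITY_EXHAUSTED", "provider_capacity"),
    ("RESOURCE_EXHAUSTED", "provider_quota"),
    ("daily quota exhausted", "provider_quota"),
    ("queue owner unavailable", "session_health"),
    ("dead named session", "session_health"),
    ("produced no output", "stall"),
    ("exited with code", "generic_exit"),
]


def classify_symptom(line: str) -> Optional[str]:
    """Return the first matching category for a log line, or None."""
    lowered = line.lower()
    for marker, category in MARKERS:
        if marker.lower() in lowered:
            return category
    return None
```

With a classifier like this, a line that carries a provider marker would be reported as a provider condition even when a generic stall or exit-code message appears alongside it.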

Expected behavior

OpenClaw should distinguish these cases more clearly during ACP-backed runs:

no output yet / still starting
provider capacity or quota exhaustion
retry / fallback in progress
session actually dead / unrecoverable
output resumed after temporary stall
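
The timing-related cases in this list (still starting, stalled, output resumed) can be told apart with a small tracker that remembers when output was last seen. This is a minimal sketch, assuming a 60-second threshold like the one in the quoted warning. The names `RunState` and `StallTracker` are hypothetical, and the tracker deliberately does not decide whether a session is dead or capacity-limited, since that needs provider or session information.

```python
from enum import Enum


class RunState(Enum):
    STARTING = "no_output_yet"
    STALLED = "stalled"
    STREAMING = "streaming"
    RESUMED = "output_resumed"


class StallTracker:
    """Tracks output timing for one run (hypothetical helper)."""

    def __init__(self, started_at: float, stall_after: float = 60.0):
        self.stall_after = stall_after
        # Until the first output arrives, "last activity" is the start time.
        self.last_activity = started_at
        self.seen_output = False

    def poll(self, now: float) -> RunState:
        """Report the current state without recording any output."""
        if now - self.last_activity >= self.stall_after:
            return RunState.STALLED
        return RunState.STREAMING if self.seen_output else RunState.STARTING

    def on_output(self, now: float) -> RunState:
        """Record output; report RESUMED if the run had been stalled."""
        was_stalled = self.poll(now) is RunState.STALLED
        self.last_activity = now
        self.seen_output = True
        return RunState.RESUMED if was_stalled else RunState.STREAMING
```

Emitting an explicit resumed state when output arrives after a stall would tell the operator that the earlier warning no longer applies.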

Actual behavior

A single capacity-stressed or slow Gemini ACP run can look like a local ACP/session failure before later succeeding.

Why this matters

The current messages can push operators toward the wrong conclusion:

treating provider capacity stress like local ACP breakage
treating a recoverable run like a dead session
or treating a generic exit code as a local regression

Suggested direction

Instead of only surfacing generic stall/dead-session symptoms, propagate more structured ACP child/runtime state upward when available, for example:

capacity_exhausted
quota_exhausted
retrying
falling_back
all_lanes_exhausted
output_resumed
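
As a rough illustration of what "structured state" could look like when propagated upward, the sketch below wraps the state names listed above in an enum and a small serializable event. Everything beyond the six state strings is an assumption: the `AcpStateEvent` type, its fields (`agent_id`, `recoverable`, `detail`), and the JSON shape are hypothetical and not part of any existing OpenClaw interface.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class AcpRunState(str, Enum):
    CAPACITY_EXHAUSTED = "capacity_exhausted"
    QUOTA_EXHAUSTED = "quota_exhausted"
    RETRYING = "retrying"
    FALLING_BACK = "falling_back"
    ALL_LANES_EXHAUSTED = "all_lanes_exhausted"
    OUTPUT_RESUMED = "output_resumed"


@dataclass
class AcpStateEvent:
    """Hypothetical event a child run could send to its parent."""

    state: AcpRunState
    agent_id: str
    recoverable: bool  # assumption: the emitter knows whether to keep waiting
    detail: Optional[str] = None  # raw upstream message, when available

    def to_json(self) -> str:
        payload = asdict(self)
        payload["state"] = self.state.value  # serialize the plain string
        return json.dumps(payload, sort_keys=True)
```

A parent process that receives such events could render "retrying after capacity error" instead of inferring a dead session from silence.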

Related / nearby issues

These may be adjacent but do not seem to cover this exact diagnostics framing:

#15287
#37869
#43496

If useful, I can provide a tighter sanitized repro packet with the exact user-visible message sequence and the eventual-success outcome.

extent analysis

Fix Plan

To address the issue of unclear symptoms for ACP-backed Gemini runs, we need to modify the OpenClaw code to propagate more structured ACP child/runtime state upward. Here are the steps:

1. Modify the sessions_spawn function to catch and handle specific upstream error codes from the Gemini ACP run, such as RESOURCE_EXHAUSTED and MODEL_CAPACITY_EXHAUSTED.
2. Introduce state codes or messages to distinguish between the different states, such as:
   - capacity_exhausted
   - quota_exhausted
   - retrying
   - falling_back
   - all_lanes_exhausted
   - output_resumed
3. Update the logging mechanism to surface these state codes and messages to the operator.

Example code snippet:

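import logging

def run_existing_spawn(runtime, agentId):
    """Hypothetical placeholder for the existing spawn logic."""
    raise NotImplementedError

def sessions_spawn(runtime="acp", agentId="gemini", retrying=False, falling_back=False):
    # Surface recovery states explicitly so they do not look like a stall.
    if retrying:
        logging.info("Retrying")
        return "retrying"
    if falling_back:
        logging.info("Falling back")
        return "falling_back"
    try:
        return run_existing_spawn(runtime, agentId)
    except Exception as e:
        message = str(e)
        # Check the more specific marker first: one error may carry both.
        if "MODEL_CAPACITY_EXHAUSTED" in message:
            logging.error("Capacity exhausted")
            return "capacity_exhausted"
        # Assumption: a bare RESOURCE_EXHAUSTED indicates quota exhaustion.
        if "RESOURCE_EXHAUSTED" in message:
            logging.error("Quota exhausted")
            return "quota_exhausted"
        raise  # not a recognized provider error; keep the original failure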

Verification

To verify the fix, run the sessions_spawn function with a Gemini ACP run that is expected to encounter capacity or quota exhaustion. Check the logs for the new error codes and messages, and ensure that they are correctly surfaced to the operator.
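
Waiting for real capacity exhaustion is unreliable, so one option is to inject a simulated upstream failure and inspect what gets logged. The sketch below is self-contained and makes several assumptions: `spawn_with_diagnostics` is a hypothetical stand-in for the patched spawn path, the logger name `acp.diagnostics` is invented, and the simulated error text is made up apart from the `MODEL_CAPACITY_EXHAUSTED` marker.

```python
import logging

logger = logging.getLogger("acp.diagnostics")  # hypothetical logger name


def spawn_with_diagnostics(run):
    """Hypothetical stand-in for the patched spawn path."""
    try:
        return run()
    except RuntimeError as exc:
        if "MODEL_CAPACITY_EXHAUSTED" in str(exc):
            logger.error("capacity_exhausted: %s", exc)
            return "capacity_exhausted"
        raise


class ListHandler(logging.Handler):
    """Collects formatted log messages so a check can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def fake_capacity_failure():
    # Simulates the upstream error instead of waiting for real exhaustion.
    raise RuntimeError("MODEL_CAPACITY_EXHAUSTED: simulated for testing")


def verify_capacity_is_surfaced():
    """Run the simulated failure and return (result, captured messages)."""
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        result = spawn_with_diagnostics(fake_capacity_failure)
    finally:
        logger.removeHandler(handler)
    return result, handler.messages
```

The same pattern could be repeated for each state code, with the real spawn path substituted for the stand-in once it accepts an injectable runner.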

Extra Tips

Make sure to handle the new error codes and messages in the operator-facing UI to provide clear and actionable information.
Consider introducing a retry mechanism with exponential backoff to handle temporary capacity or quota exhaustion.
Review the related issues (#15287, #37869, #43496) to ensure that this fix does not introduce any regressions.
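
The retry tip can be combined with the proposed state names so that each backoff step is reported rather than silent. The following is a minimal sketch of exponential backoff with jitter across fallback lanes. `CapacityError`, `run_with_backoff`, the lane names, and the `report` callback are all hypothetical, and the delay parameters are arbitrary illustrative defaults.

```python
import random
import time
from typing import Callable, Optional, Sequence


class CapacityError(Exception):
    """Hypothetical error for provider capacity or quota exhaustion."""


def _print_state(state: str, lane: Optional[str]) -> None:
    print(f"{state} lane={lane}")


def run_with_backoff(
    lanes: Sequence[str],
    attempt: Callable[[str], str],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    report: Callable[[str, Optional[str]], None] = _print_state,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try each lane in order, retrying with exponential backoff."""
    for lane_index, lane in enumerate(lanes):
        if lane_index > 0:
            report("falling_back", lane)
        for retry in range(max_retries + 1):
            try:
                return attempt(lane)
            except CapacityError:
                if retry == max_retries:
                    break  # this lane is exhausted; move to the next one
                # Double the delay each retry, cap it, then add jitter.
                delay = min(max_delay, base_delay * 2 ** retry)
                delay *= random.uniform(0.5, 1.0)
                report("retrying", lane)
                sleep(delay)
    report("all_lanes_exhausted", None)
    return None
```

Passing `report` and `sleep` as parameters keeps the loop testable and lets the caller route state changes to whatever operator-facing surface is available.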
