openclaw - ✅(Solved) Fix [Bug]: abortable(activeSession.prompt()) creates zombie Agent loop when signal is pre-aborted [2 pull requests, 1 comment, 2 participants]

zhumengzhu · 2026-04-30T04:48:05Z

When `params.abortSignal` is already aborted before `activeSession.prompt()` is called (e.g. rapid consecutive messages with `messages.queue.mode: "interrupt"`), `abortable()` immediately rejects but the `prompt()` async chain has already started. The floating Promise creates a new `Agent._runLoop()` with a fresh `abortController` that nobody ever aborts, causing the Agent to loop indefinitely calling the LLM after the attempt has exited. Observed: 2617 LLM calls over 103 minutes from a single zombie run.

GitHub stats

openclaw/openclaw#74859 • Fetched 2026-05-01 05:40:42

Author: zhumengzhu

Participants: clawsweeper[bot], zhumengzhu

Timeline (top): cross-referenced ×3, referenced ×3, commented ×1


Error Message

No error is surfaced to the user. The only error strings involved appear in the reproduction test: the mock stream throws new Error("Request was aborted.") with name AbortError, and the pre-aborted signal is created with AbortSignal.abort(new Error("second message arrived")).

The Agent continues calling the LLM indefinitely after the attempt has returned. Each iteration: ~90k input tokens + ~35 output tokens, stopReason always toolUse, tools always throw AbortError (caught as error result), model retries the same tool call. The loop never terminates unless the process restarts.

All tool results are "Aborted" (error result).

Why the inner loop never stops: Agent._runLoop() in pi-agent-core only exits on stopReason === "error" | "aborted". The zombie's signal is never aborted. Tools throw AbortError (from the outer runAbortController.signal), but this is caught as an error tool result, so the model retries indefinitely.

Root Cause

Root cause: await abortable(activeSession.prompt(effectivePrompt)) in attempt.ts (introduced in 016693a1f). JavaScript evaluates activeSession.prompt() first (starting the async chain), then abortable() races it. When the signal is pre-aborted, abortable() rejects immediately but the floating Promise from prompt() creates a new Agent._runLoop() with a fresh abortController that nobody ever aborts.
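The evaluation-order point above can be reproduced in isolation. The following is a minimal sketch, not the code from attempt.ts: the two-argument `abortable()` and the `prompt()` stub are hypothetical stand-ins written for this illustration. It only shows that a function-call argument is evaluated before the callee runs, so a guard has to sit in front of the call expression.

```typescript
// Hypothetical stand-in for abortable(): rejects at once if the signal is already aborted.
function abortable<T>(signal: AbortSignal, work: Promise<T>): Promise<T> {
  if (signal.aborted) {
    return Promise.reject(new Error("aborted"));
  }
  return new Promise<T>((resolve, reject) => {
    signal.addEventListener("abort", () => reject(new Error("aborted")), { once: true });
    work.then(resolve, reject);
  });
}

const events: string[] = [];

// Hypothetical prompt(): records that background work has started.
function prompt(): Promise<void> {
  events.push("prompt started");
  return Promise.resolve();
}

function unguarded(signal: AbortSignal): Promise<void> {
  // prompt() runs while the argument is evaluated, before abortable() checks anything.
  return abortable(signal, prompt());
}

function guarded(signal: AbortSignal): Promise<void> {
  if (signal.aborted) {
    events.push("guard threw");
    return Promise.reject(new Error("aborted"));
  }
  return abortable(signal, prompt());
}

const preAborted = AbortSignal.abort();
unguarded(preAborted).catch(() => {}); // rejects, but prompt() has already started
guarded(preAborted).catch(() => {}); // rejects without ever calling prompt()

console.log(events); // ["prompt started", "guard threw"]
```

Both calls reject, yet only the unguarded one leaves work running behind it.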

Fix Action

Fixed

Fixed by PR: fix(attempt): prevent zombie Agent loop when abort arrives before prompt() (https://github.com/openclaw/openclaw/pull/74979)
Fixed by PR: fix(agents): prevent zombie Agent loop when abort signal is pre-aborted (#74859) (https://github.com/openclaw/openclaw/pull/75012)

PR fix notes

PR #74979: fix(attempt): prevent zombie Agent loop when abort arrives before prompt()

Repository: openclaw/openclaw
Author: zhumengzhu
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/74979

Description (problem / solution / changelog)

Summary

Problem: When params.abortSignal is already aborted before activeSession.prompt() is called, JS evaluates prompt() first (starting the async chain), then abortable() rejects immediately. The floating Promise creates a new Agent._runLoop() with a fresh abortController that nobody ever aborts — the Agent loops indefinitely after the attempt exits. Observed in production: 2617 LLM calls over 103 minutes from a single zombie run.
Why it matters: Silent unbounded LLM cost; no user-visible symptom. Triggered deterministically by queue.mode: interrupt + rapid consecutive messages, and probabilistically by timeout-compaction retries.
What changed: (1) Pre-prompt guard in attempt.ts — check runAbortController.signal.aborted before calling activeSession.prompt() and throw early, so no floating Promise is created. (2) Defensive agent.abort() + clearAllQueues() in the finally block as a backstop for aborts that arrive mid-prompt.
What did NOT change: No behavior change for normal (non-aborted) runs. No config added. maxLlmCallsPerRun hardening deferred to a follow-up per ClawSweeper's triage recommendation.

Change Type (select all)

Bug fix

Scope (select all touched areas)

Gateway / orchestration

Linked Issue/PR

Closes #74859
This PR fixes a bug or regression

Root Cause (if applicable)

Root cause: await abortable(activeSession.prompt(...)) — JS evaluates the argument activeSession.prompt() before abortable() runs its abort check. When the signal is pre-aborted, a floating Promise escapes into Agent._runLoop() with a fresh abortController that is never signaled.
Missing detection / guardrail: No pre-call abort state check before prompt(); finally block did not terminate an escaped Agent.
Contributing context: void activeSession.abort() in abortRun() is fire-and-forget and only aborts agent.abortController — which is undefined until _runLoop() starts. If abort arrives before that, the call is a no-op.
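The contributing context above (an abort that arrives before the loop has created its controller is a no-op) can be sketched with a toy class. `TinyAgent` is a hypothetical stand-in written for this note and is not the pi-agent-core `Agent`; it only mirrors the described shape: an optional `abortController` that exists only while a run is active.

```typescript
// Hypothetical stand-in for the Agent described above; not the real implementation.
class TinyAgent {
  private abortController?: AbortController;
  iterations = 0;

  abort(): void {
    // No-op while no loop is running, because abortController is still undefined.
    this.abortController?.abort();
  }

  async run(maxIterations: number): Promise<void> {
    this.abortController = new AbortController(); // fresh controller per run
    const { signal } = this.abortController;
    try {
      while (!signal.aborted && this.iterations < maxIterations) {
        this.iterations += 1;
        await Promise.resolve(); // stands in for one LLM call
      }
    } finally {
      this.abortController = undefined;
    }
  }
}

const early = new TinyAgent();
early.abort(); // arrives before run(): nothing to abort
const earlyRun = early.run(5); // loop starts anyway and runs to its limit

const late = new TinyAgent();
const lateRun = late.run(5);
late.abort(); // arrives after run() started: stops the loop

Promise.all([earlyRun, lateRun]).then(() => {
  console.log(early.iterations, late.iterations); // 5 1
});
```

In the sketch the early abort is lost entirely, which is why the PR adds a guard in the caller instead of relying on `abort()` alone.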

Regression Test Plan (if applicable)

Coverage level: [x] Unit test
Target file: src/agents/pi-embedded-runner/run/attempt.abort-before-prompt.test.ts
Scenarios locked in:
1. Pre-aborted signal → prompt() is skipped → zero LLM calls after attempt exits
2. Normal (non-aborted) signal → prompt() executes normally, not affected by the guard
Why this is the smallest reliable guardrail: Uses a real Agent + mock streamFn with a call counter. No network, no gateway stack. Deterministic — the signal is already aborted at call time, eliminating any race.

User-visible / Behavior Changes

None. Aborted runs already returned immediately; this prevents hidden background activity that was invisible to users anyway.

Diagram (if applicable)

Before:
  abort arrives → abortRun() → void activeSession.abort() [no-op, abortController=undefined]
                → activeSession.prompt() starts _runLoop, creates fresh abortController
                → abortable() rejects → attempt exits
                → _runLoop loops forever (fresh controller never aborted)

After:
  abort arrives → abortRun() → runAbortController.signal.aborted = true
                → pre-prompt guard: signal.aborted → throw AbortError (prompt() never called)
                → attempt exits cleanly
                → finally: agent.abort() + clearAllQueues() [backstop]
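The "After" flow in the diagram can be sketched as a small function. This is an illustration under assumptions, not the patch itself: `SessionLike`, `runAttempt`, and the local `makeAbortError` are hypothetical names (the last one borrowed from PR #75012's snippet), and the real attempt.ts carries much more state.

```typescript
// Hypothetical minimal session surface for this sketch.
interface SessionLike {
  prompt(text: string): Promise<void>;
  abort(): void;
}

// Local stand-in; the real helper in the codebase may differ.
function makeAbortError(): Error {
  const err = new Error("aborted");
  err.name = "AbortError";
  return err;
}

async function runAttempt(
  session: SessionLike,
  signal: AbortSignal,
  text: string,
): Promise<"ok" | "aborted"> {
  try {
    // Pre-prompt guard: bail out before prompt() is ever evaluated.
    if (signal.aborted) throw makeAbortError();
    await session.prompt(text);
    return "ok";
  } catch (err) {
    if (err instanceof Error && err.name === "AbortError") return "aborted";
    throw err;
  } finally {
    // Backstop: if an abort was seen at any point, make sure the session stops.
    if (signal.aborted) session.abort();
  }
}

const log: string[] = [];
const session: SessionLike = {
  prompt: async () => { log.push("prompt"); },
  abort: () => { log.push("abort"); },
};

const abortedOutcome = runAttempt(session, AbortSignal.abort(), "first message");
abortedOutcome.then((outcome) => console.log(outcome, log)); // aborted [ 'abort' ]
```

With a pre-aborted signal the session never sees a `prompt()` call, and the `finally` block still issues one `abort()` as the safety net.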

Security Impact (required)

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No

Repro + Verification

Environment

OS: Linux
Runtime: Node 22
Model/provider: Any (bug is model-agnostic)
Relevant config: messages.queue.mode: interrupt

Steps

1. Configure messages.queue.mode: interrupt
2. Send a message; within < 1s send a second message
3. Observe the first run's attempt exits with durationMs < 50ms

Expected

No LLM calls after the attempt exits

Actual (before fix)

LLM calls continue for minutes/hours after the attempt exits; embedded run done never appears for the zombie

Evidence

Failing test before + passing after: attempt.abort-before-prompt.test.ts — 2/2 green on patched code; the pre-aborted path produces zero LLM calls after attempt exits.

Human Verification (required)

Verified scenarios: unit tests pass (attempt.abort-before-prompt.test.ts — 2/2 green); pre-aborted signal path skips prompt() confirmed by code review.
Edge cases checked: agent.abort() called on a completed run where abortController is undefined — optional chain makes it a no-op.
What I did not verify: live end-to-end with a real LLM provider and interrupt mode (will follow up in the issue).

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No

Risks and Mitigations

Risk: agent.abort() in finally could theoretically interrupt a normally completing run if _runLoop hasn't cleared abortController yet.
- Mitigation: Agent.abort() calls this.abortController?.abort() — when _runLoop completes normally it sets abortController = undefined, making the call a documented no-op. Covered by the "normal completion" test case.

Changed files

src/agents/pi-embedded-runner/run/attempt.abort-before-prompt.test.ts (added, +306/-0)
src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test-support.ts (modified, +4/-0)
src/agents/pi-embedded-runner/run/attempt.ts (modified, +28/-0)

PR #75012: fix(agents): prevent zombie Agent loop when abort signal is pre-aborted (#74859)

Repository: openclaw/openclaw
Author: RichardCao
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/75012

Description (problem / solution / changelog)

Summary

Prevents the embedded runner from starting a floating Agent loop when the abort signal is already aborted before activeSession.prompt() is called.

Closes #74859

Root Cause

When params.abortSignal is pre-aborted (e.g. rapid consecutive messages with messages.queue.mode: "interrupt"):

1. onAbort() fires → abortRun() → runAbortController.abort() + void activeSession.abort()
2. Code still reaches await abortable(activeSession.prompt(prompt))
3. JavaScript evaluates activeSession.prompt() first (starts Agent _runLoop with a fresh internal abortController)
4. Then abortable() wraps/rejects immediately
5. The Agent loop runs with nobody to abort its internal controller → zombie

Observed: 2617 LLM calls over 103 minutes from a single zombie run.

Fix

Two changes in src/agents/pi-embedded-runner/run/attempt.ts:

1. Pre-prompt abort guard

Before every activeSession.prompt() call, check if the run is already aborted. If so, throw makeAbortError() instead of starting a prompt that would create an unabortable Agent loop.

if (aborted || runAbortController.signal.aborted) {
  throw makeAbortError(runAbortController.signal);
}

2. Finally-block safety net

Add void activeSession.abort() in the finally block when the run was aborted or timed out. This catches rare race windows where a prompt was started just before the pre-abort guard fires.

if (aborted || timedOut) {
  void activeSession.abort();
}

Uses void (not await) to avoid blocking the cleanup path.

Testing

All 127 existing tests in attempt.test.ts pass:

 Test Files  1 passed (1)
      Tests  127 passed (127)

Risk Assessment

Low risk. Both changes are additive guards:

The pre-prompt guard only fires when the run is already aborted — no change to the happy path
The finally-block activeSession.abort() is best-effort fire-and-forget, same as the existing call in abortRun()
The catch block at line 2768 correctly handles the thrown AbortError (existing isRunnerAbortError check)

Changed files

src/agents/pi-embedded-runner/run/attempt.ts (modified, +11/-0)

RAW_BUFFER

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Summary

Steps to reproduce

1. Configure messages.queue.mode: "interrupt" in openclaw.json
2. Send a message to the agent
3. Within < 1 second, send a second message (interrupt mode aborts the first run)
4. Observe that the first run's Agent continues calling the LLM in the background after the attempt has exited

Alternatively, run the reproduction test below which uses a pre-aborted AbortSignal to simulate the same condition deterministically.

<details> <summary>agent-zombie-loop.test.ts (click to expand)</summary>

import { Agent, type AgentMessage } from "@mariozechner/pi-agent-core";
import type { Api, Message, Model } from "@mariozechner/pi-ai";
import { afterEach, beforeEach, describe, expect, it } from "vitest";
import {
  createDefaultEmbeddedSession,
  getHoisted,
  resetEmbeddedAttemptHarness,
  testModel,
} from "./attempt.spawn-workspace.test-support.js";

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));
const mockModel = testModel as unknown as Model<Api>;

const mockTool = {
  name: "mock_tool",
  label: "Mock Tool",
  description: "mock",
  parameters: { type: "object" as const, properties: {} },
  execute: async () => ({ content: [{ type: "text" as const, text: "Aborted" }], details: {} }),
};

function createToolUseStreamFn(tracker: { count: number }) {
  return async (_model: unknown, _context: unknown, options?: { signal?: AbortSignal }) => {
    tracker.count += 1;
    await sleep(5);
    if (options?.signal?.aborted) {
      const err = new Error("Request was aborted.");
      err.name = "AbortError";
      throw err;
    }
    const message = {
      role: "assistant" as const,
      content: [
        { type: "toolCall" as const, id: `call_${tracker.count}`, name: "mock_tool", arguments: {} },
      ],
      usage: { input: 70, output: 51, cacheRead: 0, cacheWrite: 0, totalTokens: 121, cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 } },
      stopReason: "toolUse" as const,
      timestamp: Date.now(),
    };
    return {
      [Symbol.asyncIterator]() {
        let done = false;
        return { async next() { if (!done) { done = true; return { done: false, value: { type: "done", message } }; } return { done: true, value: undefined }; } };
      },
      async result() { return message; },
    } as never;
  };
}

const hoisted = getHoisted();

describe("Agent zombie loop (upstream bug)", () => {
  beforeEach(() => { resetEmbeddedAttemptHarness(); });

  it("bug: abort-before-prompt produces floating Promise, Agent loops after attempt exits", { timeout: 10_000 }, async () => {
    const tracker = { count: 0 };
    const agent = new Agent({
      initialState: { systemPrompt: "test", model: mockModel, tools: [mockTool] },
      streamFn: createToolUseStreamFn(tracker),
      convertToLlm: (msgs: AgentMessage[]): Message[] =>
        msgs.filter((m) => ["user", "assistant", "toolResult"].includes(m.role)) as Message[],
    });

    hoisted.createAgentSessionMock.mockResolvedValue({
      session: createDefaultEmbeddedSession({
        prompt: async (_session, prompt) => {
          agent.prompt(prompt).catch(() => {});
          await sleep(50);
        },
      }),
    });

    const abortSignal = AbortSignal.abort(new Error("second message arrived"));
    const { runEmbeddedAttempt } = await import("./attempt.js");

    await runEmbeddedAttempt({
      sessionId: "zombie-test", sessionKey: "agent:main:main",
      sessionFile: "/tmp/zombie-test.jsonl", workspaceDir: "/tmp", agentDir: "/tmp",
      config: {}, prompt: "first message", timeoutMs: 5_000, runId: "zombie-run",
      provider: "openai", modelId: "gpt-test", model: mockModel,
      authStorage: { getApiKey: async () => undefined } as never,
      modelRegistry: {} as never, thinkLevel: "off",
      senderIsOwner: true, disableMessageTool: true, abortSignal,
    });

    const countAtExit = tracker.count;
    await sleep(500);
    const countAfterWait = tracker.count;

    console.log(`LLM calls at exit=${countAtExit}, after 500ms=${countAfterWait}, delta=${countAfterWait - countAtExit}`);
    expect(countAfterWait).toBeGreaterThan(countAtExit);

    agent.abort();
    agent.clearAllQueues?.();
    await agent.waitForIdle();
  });
});

</details>

Expected behavior

When a run is aborted (via interrupt mode, timeout, or RPC), the Agent should stop all LLM calls promptly. No floating Promises should outlive the attempt lifecycle.

Actual behavior

The Agent continues calling the LLM indefinitely after the attempt has returned. Each iteration: ~90k input tokens + ~35 output tokens, stopReason always toolUse, tools always throw AbortError (caught as error result), model retries the same tool call. Loop never terminates unless the process restarts.
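To put the per-iteration figures in scale, here is a rough back-of-the-envelope calculation. It assumes, purely for illustration, that the approximate ~90k input and ~35 output tokens per iteration apply to the 2617-call, 103-minute case reported in this issue; the issue does not state which case those token figures were measured on.

```typescript
// Figures taken from the issue text; the per-call token counts are approximate.
const llmCalls = 2617;
const durationMinutes = 103;
const inputTokensPerCall = 90_000; // "~90k" in the issue
const outputTokensPerCall = 35; // "~35" in the issue

const callsPerMinute = Math.round((llmCalls / durationMinutes) * 10) / 10;
const totalInputTokens = llmCalls * inputTokensPerCall;
const totalOutputTokens = llmCalls * outputTokensPerCall;

console.log(callsPerMinute); // 25.4
console.log(totalInputTokens); // 235530000
console.log(totalOutputTokens); // 91595
```

Under that assumption a single zombie run would consume on the order of 235 million input tokens, which is why the issue rates the severity as high despite the absence of any visible symptom.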

OpenClaw version

All releases since v2026.1.20 (bug introduced in commit 016693a1f on 2026-01-18)

Operating system

Linux (also reproducible on macOS)

Install method

pnpm dev / npm global

Model

Any model (bug is model-agnostic; the loop is in the Agent runtime, not the LLM)

Provider / routing chain

Any provider (bug is provider-agnostic)

Additional provider/model setup details

NOT_ENOUGH_INFO

Logs, screenshots, and evidence

Production observations across 3 independent cases:

| Case | Trigger | Duration | LLM calls |
|------|---------|----------|-----------|
| 1 | timeout-compaction retry | 76 min | ~2130 |
| 2 | timeout-compaction retry | 2+ hours | ~952 (log truncated) |
| 3 | user rapid messages (652ms apart) | 103 min | 2617 |

Log signature of a zombie run:

embedded run prompt end durationMs=<very small, e.g. 22-26ms> (abortable() rejected immediately)
Continued model.usage stopReason=toolUse lines after run cleanup for the same runId
All tool results are "Aborted" (error result)
embedded run done never appears

Impact and severity

Affected: Any user with messages.queue.mode: "interrupt" who sends rapid consecutive messages
Severity: High (silent resource drain, potential large API cost)
Frequency: Near-deterministic with interrupt mode + rapid messages; lower probability via timeout-compaction
Consequence: Unbounded LLM API cost, server resource exhaustion, no user-visible indication of the problem

Additional information

Why the inner loop never stops: Agent._runLoop() in pi-agent-core only exits on stopReason === "error" | "aborted". The zombie's signal is never aborted. Tools throw AbortError (from the outer runAbortController.signal), but this is caught as an error tool result — the model retries indefinitely.
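The exit condition described above can be simulated without any LLM. This is a toy model under stated assumptions: `fakeModelTurn`, `runTool`, and `simulateLoop` are hypothetical names, the loop's two signals are passed in explicitly, and a `demoLimit` exists only so the demonstration terminates. The real `_runLoop()` has no such limit, which is the point.

```typescript
type StopReason = "toolUse" | "error" | "aborted";

// Hypothetical model turn: always asks for another tool call, as seen in the zombie logs.
function fakeModelTurn(): StopReason {
  return "toolUse";
}

// Hypothetical tool wrapper: an aborted outer signal becomes an error *result*, not a loop exit.
function runTool(outerSignal: AbortSignal): { isError: boolean; text: string } {
  return outerSignal.aborted
    ? { isError: true, text: "Aborted" }
    : { isError: false, text: "ok" };
}

function simulateLoop(
  loopSignal: AbortSignal, // the loop's own (fresh) controller signal
  outerSignal: AbortSignal, // the run-level signal that tools observe
  demoLimit: number, // exists only so this demo terminates
): { turns: number; exitedBy: string } {
  let turns = 0;
  while (turns < demoLimit) {
    if (loopSignal.aborted) return { turns, exitedBy: "aborted" };
    turns += 1;
    const stop = fakeModelTurn();
    if (stop === "error" || stop === "aborted") return { turns, exitedBy: stop };
    runTool(outerSignal); // the error result is fed back and the loop continues
  }
  return { turns, exitedBy: "demoLimit" };
}

const outerAborted = AbortSignal.abort();
const zombie = simulateLoop(new AbortController().signal, outerAborted, 1000);
const healthy = simulateLoop(AbortSignal.abort(), outerAborted, 1000);

console.log(zombie); // { turns: 1000, exitedBy: 'demoLimit' }
console.log(healthy); // { turns: 0, exitedBy: 'aborted' }
```

Only the signal the loop itself watches can stop it; aborting the outer signal changes tool results but not the loop's control flow.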

Why the circuit breaker doesn't fire: Tool wrapper order is abort-check (outer) → loop-detection (inner). The abort throw short-circuits before the loop detector ever runs.
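The wrapper-order effect can be shown with two toy decorators. This is a sketch only: `withAbortCheck` and `withLoopDetection` are hypothetical, and the counter-based detector is a simplification standing in for whatever the real loop detector does.

```typescript
type Tool = () => string;

// Throws before delegating when the signal is aborted.
function withAbortCheck(tool: Tool, signal: AbortSignal): Tool {
  return () => {
    if (signal.aborted) throw new Error("AbortError");
    return tool();
  };
}

// Simplified detector: trips once it has seen more than `limit` calls.
function withLoopDetection(tool: Tool, stats: { seen: number }, limit: number): Tool {
  return () => {
    stats.seen += 1;
    if (stats.seen > limit) throw new Error("loop detected");
    return tool();
  };
}

function callRepeatedly(tool: Tool, times: number): string[] {
  const outcomes: string[] = [];
  for (let i = 0; i < times; i += 1) {
    try {
      outcomes.push(tool());
    } catch (err) {
      outcomes.push((err as Error).message);
    }
  }
  return outcomes;
}

const aborted = AbortSignal.abort();
const base: Tool = () => "ok";

// Order described in the issue: abort check outside, loop detection inside.
const statsA = { seen: 0 };
const outA = callRepeatedly(withAbortCheck(withLoopDetection(base, statsA, 3), aborted), 5);

// Reversed order: the detector observes every call, including aborted ones.
const statsB = { seen: 0 };
const outB = callRepeatedly(withLoopDetection(withAbortCheck(base, aborted), statsB, 3), 5);

console.log(statsA.seen, outA); // 0, five "AbortError" entries
console.log(statsB.seen, outB); // 5, three "AbortError" then two "loop detected"
```

With the abort check on the outside, the detector's counter never moves, which matches the observation that the circuit breaker did not fire.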

Proposed 3-layer fix:

1. Pre-prompt guard: check aborted state before calling activeSession.prompt(), which eliminates the floating Promise at source
2. finally block: call agent.abort() + agent.clearAllQueues() during attempt cleanup, which terminates any escaped Agent
3. Per-run LLM call hard cap: shared counter across attempts, configurable via agents.defaults.maxLlmCallsPerRun, as the ultimate safety net independent of abort signal propagation
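The third layer was deferred in PR #74979, so no implementation exists in the source. The following is one possible shape, offered as a sketch: `LlmCallBudget` and `runUntilCapped` are hypothetical names, and how the cap would be wired to `agents.defaults.maxLlmCallsPerRun` is left open.

```typescript
// Hypothetical shared budget; one instance would be shared by all attempts of a run.
class LlmCallBudget {
  private used = 0;
  private readonly maxCalls: number;

  constructor(maxCalls: number) {
    this.maxCalls = maxCalls;
  }

  // Call once before every LLM request; throws when the run has spent its budget.
  consume(): void {
    if (this.used >= this.maxCalls) {
      throw new Error(`LLM call cap reached (${this.maxCalls})`);
    }
    this.used += 1;
  }

  get callsMade(): number {
    return this.used;
  }
}

// Simulated zombie loop: every turn wants another LLM call and no abort ever arrives.
function runUntilCapped(budget: LlmCallBudget): string {
  for (;;) {
    try {
      budget.consume();
    } catch (err) {
      return (err as Error).message;
    }
  }
}

const budget = new LlmCallBudget(50);
const reason = runUntilCapped(budget);

console.log(budget.callsMade, reason); // 50 "LLM call cap reached (50)"
```

The value of this layer is that it does not depend on any signal reaching the loop: even if both earlier guards are bypassed, the run stops after a bounded number of calls.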

extent analysis

TL;DR

To fix the issue, implement a pre-prompt guard to check the aborted state before calling activeSession.prompt(), ensuring that no floating Promise is created when the signal is pre-aborted.

Guidance

Check the aborted state of the signal before calling activeSession.prompt() to prevent the creation of a floating Promise.
Implement a finally block to call agent.abort() and agent.clearAllQueues() during attempt cleanup to terminate any escaped Agent.
Consider introducing a per-run LLM call hard cap, configurable via agents.defaults.maxLlmCallsPerRun, as an ultimate safety net independent of abort signal propagation.
Review the tool wrapper order: the abort check currently wraps the loop detector, so the abort throw short-circuits before the detector runs. Consider letting the loop detector observe aborted calls so the circuit breaker can still fire.

Example

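if (runAbortController.signal.aborted) {
  // Already aborted: throw before prompt() is evaluated so no Agent loop starts.
  throw makeAbortError(runAbortController.signal);
}
await abortable(activeSession.prompt(effectivePrompt));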

Notes

The proposed 3-layer fix provides a comprehensive solution to the issue, addressing the root cause and introducing additional safety measures to prevent similar problems in the future.

Recommendation

Apply the proposed 3-layer fix, starting with the pre-prompt guard, to ensure that the Agent stops all LLM calls promptly when a run is aborted.
