openclaw - ✅(Solved) Fix [Bug]: chat.abort fails when called immediately after chat.send due to race condition [2 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#84176 (fetched 2026-05-20 03:43:10)
  • Comments: 1
  • Participants: 2
  • Timeline events: 10 (labeled ×7, cross-referenced ×2, commented ×1)
  • Reactions: 1

When chat.abort is called immediately after chat.send (with the same runId), the abort sometimes fails: it returns aborted: false because the run's abort controller has not been registered yet.

The early return is in src/gateway/server-methods/chat.ts (lines 1927-1931):

const active = context.chatAbortControllers.get(runId);
if (!active) {
  respond(true, { ok: true, aborted: false, runIds: [] });
  return;
}
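A minimal TypeScript sketch makes the window visible. It is a hypothetical simplification, not the code in chat.ts: the `Map` stands in for `context.chatAbortControllers`, and `chatAbort`, `chatSendBeforeFix`, and `demo` are illustrative names. The send path registers its controller only after an awaited preparation step, so an abort issued before that step resolves takes the early-return path quoted above.

```typescript
// Hypothetical, simplified model of the Gateway's abort registry.
// Names and shapes are illustrative, not OpenClaw's real handlers.
type AbortResult = { ok: true; aborted: boolean; runIds: string[] };

// Stand-in for context.chatAbortControllers.
const chatAbortControllers = new Map<string, AbortController>();

// Same logic as the quoted lookup: an unknown runId is reported as
// "nothing to abort" instead of as an error.
function chatAbort(runId: string): AbortResult {
  const active = chatAbortControllers.get(runId);
  if (!active) {
    return { ok: true, aborted: false, runIds: [] };
  }
  active.abort();
  return { ok: true, aborted: true, runIds: [runId] };
}

// Pre-fix ordering (simplified): the controller is registered only
// after the awaited preparation step resolves.
async function chatSendBeforeFix(
  runId: string,
  prepare: () => Promise<void>,
): Promise<void> {
  await prepare(); // e.g. attachment parsing or a model-catalog lookup
  chatAbortControllers.set(runId, new AbortController());
  // ...agent dispatch would start here
}

// Issue an abort while preparation is still pending.
async function demo(runId: string): Promise<AbortResult> {
  let release!: () => void;
  const preparation = new Promise<void>((resolve) => {
    release = () => resolve();
  });
  const send = chatSendBeforeFix(runId, () => preparation);
  const result = chatAbort(runId); // runs before the controller exists
  release();
  await send;
  return result;
}

demo("run-1").then((result) => console.log(result));
// prints { ok: true, aborted: false, runIds: [] }
```

When `demo` resolves, a controller for the run is registered but was never aborted, which matches the report: the abort finds nothing to cancel while the run keeps going.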

Root Cause

chat.send registered its abort controller in chatAbortControllers only after awaited attachment/model preparation finished. A chat.abort arriving inside that window finds no entry for the runId and returns aborted: false, so the run proceeds without being cancelled.

Fix Action

Fixed

PR fix notes

PR #84274: [codex] fix gateway chat.abort send race

Description (problem / solution / changelog)

Summary

  • register chat.send runs in chatAbortControllers before awaited attachment/model preparation so immediate chat.abort calls can still find the active run
  • short-circuit pre-dispatch sends that were already aborted so they do not enqueue a chat run or reach dispatchInboundMessage
  • update the attachment parsing race test and add a focused regression that aborts during attachment preparation
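The first two bullets amount to reordering the send path: register, then await, then check the signal before dispatch. The sketch below is hypothetical and is not the diff from PR #84274; `chatSendAfterFix`, `prepare`, and `dispatch` are illustrative names, and cleanup is deliberately simplified.

```typescript
// Hypothetical sketch of "register first, await second, check before dispatch".
// Illustrative only; this is not the diff from PR #84274.
type AbortResult = { ok: true; aborted: boolean; runIds: string[] };
type SendOutcome = "dispatched" | "aborted-before-dispatch";

const chatAbortControllers = new Map<string, AbortController>();

function chatAbort(runId: string): AbortResult {
  const active = chatAbortControllers.get(runId);
  if (!active) return { ok: true, aborted: false, runIds: [] };
  active.abort();
  return { ok: true, aborted: true, runIds: [runId] };
}

async function chatSendAfterFix(
  runId: string,
  prepare: () => Promise<void>,
  dispatch: () => void,
): Promise<SendOutcome> {
  // 1. Register before the first await, so an immediate chat.abort finds the run.
  const controller = new AbortController();
  chatAbortControllers.set(runId, controller);
  try {
    // 2. Awaited attachment/model preparation.
    await prepare();
    // 3. Short-circuit: a run aborted during preparation is never dispatched.
    if (controller.signal.aborted) return "aborted-before-dispatch";
    dispatch(); // stand-in for enqueueing and dispatching the run
    return "dispatched";
  } finally {
    // Simplified: the entry is dropped once dispatch returns; a real handler
    // keeps it for the run's lifetime and only removes an entry it owns.
    chatAbortControllers.delete(runId);
  }
}

async function demo(runId: string) {
  let release!: () => void;
  const preparation = new Promise<void>((resolve) => {
    release = () => resolve();
  });
  let dispatchCount = 0;
  const send = chatSendAfterFix(runId, () => preparation, () => {
    dispatchCount += 1;
  });
  const abortResult = chatAbort(runId); // arrives during preparation
  release();
  const outcome = await send;
  return { abortResult, outcome, dispatchCount };
}

demo("run-1").then(({ abortResult, outcome, dispatchCount }) =>
  console.log(abortResult.aborted, outcome, dispatchCount),
);
// prints true aborted-before-dispatch 0
```

The `signal.aborted` check after the await is what keeps an already-cancelled send from enqueueing a run, which is the short-circuit described in the second bullet.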

Root Cause

chat.send only registered its abort controller after awaited attachment preparation finished, while chat.abort returns aborted: false immediately when the runId is absent. An immediate abort during image/file preparation could therefore miss the in-flight run window.

Verification

  • node scripts/run-vitest.mjs src/gateway/server.chat.gateway-server-chat-b.test.ts src/gateway/server-methods/chat.abort-authorization.test.ts src/gateway/chat-abort.test.ts

Real behavior proof

  • Behavior addressed: chat.abort no longer misses an active chat.send run when the abort arrives during attachment preparation.
  • Real environment tested: Local Gateway Vitest harness on rebased upstream/main.
  • Exact steps or command run after this patch: node scripts/run-vitest.mjs src/gateway/server.chat.gateway-server-chat-b.test.ts src/gateway/server-methods/chat.abort-authorization.test.ts src/gateway/chat-abort.test.ts
  • Evidence after fix: Added a regression in src/gateway/server.chat.gateway-server-chat-b.test.ts that starts chat.send with an image attachment, blocks attachment preparation on model-catalog resolution, issues chat.abort immediately, and asserts aborted: true, active-run cleanup, and no downstream dispatch.
  • Observed result after fix: The immediate abort succeeds during attachment preparation and the targeted Gateway suite passes on latest upstream/main.
  • What was not tested: No live gateway/manual UI repro, no broad pnpm check*, and no Crabbox/Testbox run.

Fixes #84176

Changed files

  • src/gateway/server-methods/chat.ts (modified, +130/-114)
  • src/gateway/server.chat.gateway-server-chat-b.test.ts (modified, +144/-4)

PR #84306: Fix chat.abort during attachment send preprocessing

Description (problem / solution / changelog)

Summary

  • Problem: chat.abort could miss a just-started chat.send run when the abort arrived while attachment/model preprocessing was still awaited before abort-controller registration.
  • Solution: register the run's abort controller after validation/send-policy/idempotency/active-dedupe checks and before awaited preprocessing, while preserving the existing public started / in_flight response shapes.
  • What changed: abort now wins during preprocessing, duplicate sends during preprocessing return in_flight, stale cleanup uses the active-run identity guard, and abort-before-dispatch cleans pre-dispatch offloaded media/staged files.
  • What did NOT change (scope boundary): no protocol/schema/status-field addition, no raw Pi transcript writes, no agent RPC abort helper change, and no channel adapter change.
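The active-run identity guard mentioned above can be illustrated by comparing entry objects rather than keys: cleanup removes a map entry only if it is still the exact object that this send registered. This is a hypothetical sketch; `ActiveRun`, `registerRun`, and `clearRunIfOwned` are illustrative names, and the PR's guard may compare different fields.

```typescript
// Hypothetical illustration of an identity-guarded cleanup; the names are
// not OpenClaw's, and the real guard may compare different fields.
interface ActiveRun {
  runId: string;
  controller: AbortController;
}

const activeRuns = new Map<string, ActiveRun>();

function registerRun(runId: string): ActiveRun {
  const entry: ActiveRun = { runId, controller: new AbortController() };
  activeRuns.set(runId, entry);
  return entry;
}

// Remove the entry only if the map still holds this exact object, so a
// late cleanup from an earlier registration cannot evict a newer run.
function clearRunIfOwned(entry: ActiveRun): boolean {
  if (activeRuns.get(entry.runId) !== entry) {
    return false; // another registration owns the key now; leave it alone
  }
  activeRuns.delete(entry.runId);
  return true;
}

// A stale cleanup for the first registration does not remove the second.
const first = registerRun("run-1");
const second = registerRun("run-1");
console.log(clearRunIfOwned(first)); // prints false
console.log(activeRuns.get("run-1") === second); // prints true
console.log(clearRunIfOwned(second)); // prints true
```

Comparing object identity instead of only the runId key is what makes a late cleanup harmless.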

AI-assisted disclosure: this PR was prepared with AI assistance from NexCore; I verified the diff, tests, typecheck, and live Gateway proof listed below.

Motivation

  • Fixes #84176 by removing the abort-controller registration race in the WebChat/Gateway chat.send path.
  • Related #84274 covers the same issue and surface, but at submission time it was still a draft with mock-only proof. This PR includes real Gateway proof plus the pre-dispatch cleanup polish for the cancellation path.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #84176
  • Related #84274
  • This PR fixes a bug or regression

Real behavior proof

  • Behavior or issue addressed: chat.abort called immediately after an attachment chat.send can find and cancel the just-started run during attachment preprocessing; the send ack shape remains {runId,status:"started"}, and no stale active run remains afterward.

  • Real environment tested: Local OpenClaw dev Gateway from this branch, loopback WebSocket ws://127.0.0.1:19176, protocol 4, isolated temporary OpenClaw home under /tmp, and a device-signed operator GatewayClient with operator.read / operator.write scopes.

  • Exact steps or command run after this patch: Started a local dev Gateway from this checkout with channels/plugins disabled for isolation, then ran a Node GatewayClient proof script that sends chat.send with a 3 MiB attachment and immediately sends chat.abort for the same idempotency/run id without awaiting the send ack.

Command shape:

OPENCLAW_SKIP_CHANNELS=1 OPENCLAW_DISABLE_BUNDLED_PLUGINS=1 \
OPENCLAW_GATEWAY_TOKEN=<temporary-token> \
node scripts/run-node.mjs --dev gateway --allow-unconfigured \
  --port 19176 --bind loopback --auth token --token <temporary-token> \
  --compact --cli-backend-logs

OPENCLAW_HOME=<tmp-openclaw-home> OPENCLAW_PROFILE=dev \
OPENCLAW_STATE_DIR=<tmp-openclaw-home>/.openclaw-dev \
OPENCLAW_CONFIG_PATH=<tmp-openclaw-home>/.openclaw-dev/openclaw.json \
OPENCLAW_GATEWAY_URL=ws://127.0.0.1:19176 \
OPENCLAW_GATEWAY_TOKEN=<temporary-token> \
node --import tsx /tmp/openclaw-84176-live-proof.mjs
  • Evidence after fix:

Terminal capture from the local Gateway run, copied live output:

{
  "environment": {
    "gatewayUrl": "ws://127.0.0.1:19176",
    "protocol": 4,
    "serverVersion": "2026.5.19",
    "authScopes": [
      "operator.pairing",
      "operator.read",
      "operator.write"
    ],
    "defaultAgentId": "dev",
    "beforeSessionCount": 0
  },
  "request": {
    "runId": "proof-abort-race-1779222835236",
    "sessionKey": "main",
    "attachmentCount": 1,
    "attachmentBytes": 3145728,
    "sentAbortImmediatelyAfterSendWithoutAwaitingSendAck": true
  },
  "results": {
    "abort": {
      "ok": true,
      "payload": {
        "ok": true,
        "aborted": true,
        "runIds": [
          "proof-abort-race-1779222835236"
        ]
      }
    },
    "send": {
      "ok": true,
      "payload": {
        "runId": "proof-abort-race-1779222835236",
        "status": "started"
      }
    },
    "secondAbortAfterSendSettled": {
      "ok": true,
      "payload": {
        "ok": true,
        "aborted": false,
        "runIds": []
      }
    },
    "history": {
      "ok": true,
      "sessionKey": "main",
      "messageCount": 0,
      "messages": []
    },
    "afterSessionCount": 0,
    "capturedEventSummary": [
      {
        "event": "health"
      },
      {
        "event": "health"
      },
      {
        "event": "chat",
        "runId": "proof-abort-race-1779222835236",
        "sessionKey": "main",
        "state": "aborted"
      },
      {
        "event": "agent",
        "runId": "proof-abort-race-1779222835236",
        "sessionKey": "main",
        "stream": "lifecycle",
        "dataPhase": "end",
        "dataStatus": "cancelled",
        "dataAborted": true
      }
    ]
  }
}
  • Observed result after fix: The immediate abort returned aborted:true for the in-flight run while chat.send still returned the existing accepted ack shape. After the send settled, a second abort returned aborted:false/runIds:[], chat.history had messageCount:0, and the dev Gateway session count stayed 0, showing no stale active run or persisted WebChat turn from the cancelled pre-dispatch send. The only run events captured were abort/cancel lifecycle events (chat state:"aborted", agent lifecycle status:"cancelled").

  • What was not tested: Manual browser UI reproduction, remote/non-loopback Gateway, and real model-provider dispatch after preprocessing; the proof intentionally cancels before dispatch.
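The proof script (/tmp/openclaw-84176-live-proof.mjs) is not included in the PR text. Its key step, issuing chat.abort before the chat.send ack arrives, can be sketched against a hypothetical `GatewayRpc` interface. The `request` method, the `idempotencyKey` and `message` parameter names, and the fake transport are assumptions, not the real GatewayClient API.

```typescript
// Hypothetical RPC surface; GatewayClient's real API may differ.
interface GatewayRpc {
  request(method: string, params: Record<string, unknown>): Promise<unknown>;
}

// Issue chat.send and chat.abort back to back. The abort is sent without
// waiting for the send ack, which is the window the fix has to cover.
async function sendThenAbortImmediately(rpc: GatewayRpc, runId: string) {
  const sendAck = rpc.request("chat.send", {
    sessionKey: "main",
    idempotencyKey: runId, // hypothetical parameter name
    message: "describe this image", // hypothetical payload
  });
  const abortAck = rpc.request("chat.abort", { sessionKey: "main", runId });
  const [send, abort] = await Promise.all([sendAck, abortAck]);
  return { send, abort };
}

// Fake transport for illustration: records when each request is issued
// and answered, with chat.send deliberately slower (like a large upload).
function createRecordingRpc(log: string[]): GatewayRpc {
  return {
    async request(method) {
      log.push(`request ${method}`);
      const delayMs = method === "chat.send" ? 20 : 0;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      log.push(`response ${method}`);
      return { ok: true };
    },
  };
}

const log: string[] = [];
sendThenAbortImmediately(createRecordingRpc(log), "run-1").then(() =>
  console.log(log.join(" -> ")),
);
// prints request chat.send -> request chat.abort -> response chat.abort -> response chat.send
```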

Root Cause (if applicable)

  • Root cause: chat.send registered its AbortController only after awaited attachment/model preprocessing, while chat.abort can only cancel a run it can find in chatAbortControllers.
  • Missing detection / guardrail: coverage did not previously pin the abort-before-registration window for attachment sends.
  • Contributing context (if known): attachment preprocessing can await model-catalog checks and media staging before the agent dispatch path begins.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • src/gateway/server.chat.gateway-server-chat-b.test.ts
    • src/gateway/server-methods/chat.directive-tags.test.ts
    • existing abort authorization/persistence tests and src/gateway/chat-abort.test.ts
  • Scenario the test should lock in:
    • abort by run id while attachment preprocessing is pending
    • duplicate send during preprocessing returns in_flight
    • abort wins over preprocessing/staging errors once cancellation happened
    • pre-dispatch media cleanup runs when abort wins before dispatch
  • Why this is the smallest reliable guardrail:
    • the race lives inside the Gateway chat.send / chat.abort server-method boundary, before agent dispatch and without needing a real model provider.
  • Existing test that already covers this (if any):
    • none for the abort-before-registration preprocessing window before this PR.
  • If no new test is added, why not:
    • N/A, new tests are added.
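The scenario "abort wins over preprocessing/staging errors once cancellation happened" is an ordering rule in the error path: if the signal is already aborted, an error from preprocessing is reported as an abort. A hypothetical sketch; `preprocessWithAbort` and `PreprocessOutcome` are illustrative names, not OpenClaw APIs.

```typescript
// Hypothetical sketch: once the run has been aborted, a later preprocessing or
// staging failure is reported as an abort rather than as a send error.
type PreprocessOutcome =
  | { kind: "ready" }
  | { kind: "aborted" }
  | { kind: "error"; error: unknown };

async function preprocessWithAbort(
  signal: AbortSignal,
  preprocess: () => Promise<void>,
): Promise<PreprocessOutcome> {
  try {
    await preprocess();
  } catch (error) {
    // The abort takes precedence over any error seen after cancellation.
    if (signal.aborted) return { kind: "aborted" };
    return { kind: "error", error };
  }
  // A successful preprocessing step still must not dispatch an aborted run.
  return signal.aborted ? { kind: "aborted" } : { kind: "ready" };
}

// Usage: cancellation happens mid-staging, then staging throws.
const controller = new AbortController();
preprocessWithAbort(controller.signal, async () => {
  controller.abort();
  throw new Error("staging failed");
}).then((outcome) => console.log(outcome.kind)); // prints aborted
```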

User-visible / Behavior Changes

Users can abort a Gateway/WebChat chat.send immediately after sending, including while attachment preprocessing is still in progress. Public response shapes stay compatible: accepted sends still return {runId,status:"started"} and duplicate in-flight sends still return {runId,status:"in_flight"}.
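The two compatible ack shapes and the duplicate-send rule can be modeled with a small type and an in-memory set of active run ids. This is a hypothetical sketch: `ChatSendAck`, `acceptSend`, and `finishRun` are illustrative names, and the real handler also runs validation, send-policy, and idempotency checks before this point.

```typescript
// Hypothetical model of the two public ack shapes; the in-memory Set stands
// in for the Gateway's active-run bookkeeping.
type ChatSendAck = { runId: string; status: "started" | "in_flight" };

const activeRunIds = new Set<string>();

// A second send for a run that is already active, including one still in
// attachment preprocessing, gets the in_flight ack instead of a new run.
function acceptSend(runId: string): ChatSendAck {
  if (activeRunIds.has(runId)) {
    return { runId, status: "in_flight" };
  }
  activeRunIds.add(runId);
  return { runId, status: "started" };
}

// Called when the run finishes or is aborted before dispatch.
function finishRun(runId: string): void {
  activeRunIds.delete(runId);
}

console.log(acceptSend("run-1").status); // prints started
console.log(acceptSend("run-1").status); // prints in_flight
finishRun("run-1");
console.log(acceptSend("run-1").status); // prints started
```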

Diagram (if applicable)

Before:
chat.send -> await attachment/model preprocessing -> register AbortController -> dispatch
chat.abort during preprocessing -> no controller found -> aborted:false

After:
chat.send -> register AbortController -> await attachment/model preprocessing -> dispatch
chat.abort during preprocessing -> controller found -> aborted:true -> no dispatch

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux
  • Runtime/container: local source checkout, Node v22.22.1, pnpm 11.1.0
  • Model/provider: no live model provider required; proof cancels before dispatch
  • Integration/channel (if any): local Gateway WebSocket on loopback
  • Relevant config (redacted): isolated /tmp OpenClaw dev profile with a temporary Gateway token; channels/plugins disabled for proof isolation

Steps

  1. Confirm branch hygiene against fresh upstream/main.
  2. Run focused Gateway abort/chat suites.
  3. Run pnpm tsgo.
  4. Run the live local Gateway proof shown above.
  5. Run the repo proof-policy evaluator against the Real behavior proof block.

Expected

  • immediate chat.abort returns aborted:true for the in-flight run
  • chat.send keeps returning {runId,status:"started"}
  • no downstream dispatch/persisted WebChat turn from the cancelled pre-dispatch send
  • real behavior proof checker accepts the proof block
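The first three expectations, plus the stale-run check from the observed result, can be verified mechanically against the captured proof output. The sketch is hypothetical: it is not the repo's proof-policy evaluator, and it only reuses the field names from that JSON.

```typescript
// Hypothetical checker over the captured proof output. Field names are
// copied from that JSON; the checker itself is not part of either PR.
interface AbortPayload { ok: boolean; aborted: boolean; runIds: string[] }
interface SendPayload { runId: string; status: string }
interface RpcResult<P> { ok: boolean; payload: P }

interface ProofResults {
  abort: RpcResult<AbortPayload>;
  send: RpcResult<SendPayload>;
  secondAbortAfterSendSettled: RpcResult<AbortPayload>;
  history: { ok: boolean; messageCount: number };
  afterSessionCount: number;
}

// Returns the expectations that failed; an empty array means all held.
function checkProof(runId: string, r: ProofResults): string[] {
  const failures: string[] = [];
  if (!r.abort.payload.aborted || !r.abort.payload.runIds.includes(runId)) {
    failures.push("immediate chat.abort did not abort the in-flight run");
  }
  if (r.send.payload.status !== "started" || r.send.payload.runId !== runId) {
    failures.push('chat.send ack is not {runId, status: "started"}');
  }
  const second = r.secondAbortAfterSendSettled.payload;
  if (second.aborted || second.runIds.length > 0) {
    failures.push("an active run was still registered after the send settled");
  }
  if (r.history.messageCount !== 0 || r.afterSessionCount !== 0) {
    failures.push("the cancelled send left a persisted turn or session");
  }
  return failures;
}

const runId = "proof-abort-race-1779222835236";
const observed: ProofResults = {
  abort: { ok: true, payload: { ok: true, aborted: true, runIds: [runId] } },
  send: { ok: true, payload: { runId, status: "started" } },
  secondAbortAfterSendSettled: { ok: true, payload: { ok: true, aborted: false, runIds: [] } },
  history: { ok: true, messageCount: 0 },
  afterSessionCount: 0,
};
console.log(checkProof(runId, observed)); // prints []
```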

Actual

  • focused suites passed: 10 files, 211 tests
  • pnpm tsgo passed
  • live Gateway proof produced aborted:true, status:"started", second abort aborted:false, chat.history.messageCount:0, and session count 0
  • scripts/github/real-behavior-proof-policy.mjs evaluator returned status:"passed"

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Post-fix commands run:

git diff --check
node scripts/run-vitest.mjs src/gateway/server.chat.gateway-server-chat-b.test.ts src/gateway/server-methods/chat.directive-tags.test.ts src/gateway/server-methods/chat.abort-authorization.test.ts src/gateway/server-methods/chat.abort-persistence.test.ts src/gateway/chat-abort.test.ts
pnpm tsgo

Result: git diff --check passed; focused suites passed 10 files / 211 tests; pnpm tsgo passed.

Human Verification (required)

  • Verified scenarios:
    • branch contains only the intended two commits and three files over fresh upstream/main
    • immediate abort during attachment preprocessing succeeds in a live local Gateway proof
    • duplicate and cleanup/error follow-up tests pass
    • no protocol/schema/status-field addition and no raw Pi transcript writes were introduced
  • Edge cases checked:
    • duplicate same-run send while preprocessing is pending
    • abort during preprocessing before dispatch
    • abort after preprocessing/staging error once cancellation already happened
    • pre-dispatch offloaded media cleanup when abort wins
    • abort authorization/persistence helper coverage still passes
  • What you did not verify:
    • manual browser UI repro
    • remote/non-loopback Gateway
    • real model-provider dispatch after preprocessing

Review Conversations

  • No bot review conversations exist yet for this PR.
  • No unresolved review conversations have been addressed by this PR yet.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: registering the abort controller earlier could expose a run to abort/maintenance cleanup before dispatch.
    • Mitigation: registration still happens only after validation/send-policy/idempotency/active-dedupe checks, and cleanup uses the active-run identity guard.
  • Risk: abort-before-dispatch could leave staged/offloaded media behind.
    • Mitigation: abort-before-dispatch cleanup deletes tracked media-store refs and sandbox-relative staged files; tests cover the cleanup path.
  • Risk: changing cancellation timing could alter public response shapes.
    • Mitigation: the implementation preserves existing {runId,status:"started"} and {runId,status:"in_flight"} response shapes, with tests covering both.
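The second mitigation (delete tracked media-store refs and staged files when abort wins before dispatch) can be sketched as a per-send tracker. The class name, the `remove` callback, the ref strings, and the assumption that a dispatched run owns its media are all illustrative, not OpenClaw's media-store API.

```typescript
// Hypothetical tracker for media staged or offloaded before dispatch. The
// class and the remove callback are illustrative, not OpenClaw's media-store API.
class PreDispatchMedia {
  private readonly refs: string[] = [];
  private dispatched = false;

  track(ref: string): void {
    this.refs.push(ref);
  }

  // Assumption: after dispatch the run owns these files, so abort cleanup skips them.
  markDispatched(): void {
    this.dispatched = true;
  }

  // Best-effort cleanup when abort wins before dispatch: try every ref and
  // return the ones that were actually removed.
  async cleanup(remove: (ref: string) => Promise<void>): Promise<string[]> {
    if (this.dispatched) return [];
    const removed: string[] = [];
    for (const ref of this.refs.splice(0)) {
      try {
        await remove(ref);
        removed.push(ref);
      } catch {
        // Keep going: one failed delete should not strand the rest.
      }
    }
    return removed;
  }
}

const media = new PreDispatchMedia();
media.track("media-store:example-ref"); // hypothetical ref format
media.track("staged/example.png"); // hypothetical sandbox-relative path
media.cleanup(async () => {}).then((removed) => console.log(removed.length)); // prints 2
```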

Changed files

  • src/gateway/server-methods/chat.directive-tags.test.ts (modified, +170/-0)
  • src/gateway/server-methods/chat.ts (modified, +77/-25)
  • src/gateway/server.chat.gateway-server-chat-b.test.ts (modified, +149/-4)
Original issue report

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Steps to reproduce

  1. Call chat.send with multiple images for recognition
  2. Immediately call chat.abort with the same runId
  3. Observe that chat.abort returns {ok:true, aborted:false, runIds:[]} even though the run is still active

Expected behavior

The chat.abort call should successfully abort the chat run even when called immediately after chat.send. The abort controller registration and the abort operation should be properly synchronized to prevent this race condition.

Actual behavior

Intermittently, chat.abort called immediately after chat.send fails to abort the run.

OpenClaw version

2026.5.7

Operating system

Ubuntu 22.04

Install method

pnpm dev

Model

qwen3.7

Provider / routing chain

default config

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response
