openclaw - ✅(Solved) Fix [Bug]: Agent loop does not terminate after final response when Queued messages exist in context — causes full task replay [1 pull requests, 5 comments, 3 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#50956Fetched 2026-04-08 01:06:18
View on GitHub
Comments
5
Participants
3
Timeline
10
Reactions
0
Author
Participants
Timeline (top)
commented ×5cross-referenced ×2labeled ×2referenced ×1

Environment

  • OpenClaw version: 2026.3.12 (6472949) (confirmed on session dated 2026-03-12)
  • Channel: POPO IM (custom integration via aigateway-xxxxx-com)
  • Model: kimi-k2.5
  • OS: Linux server (Debian 12)

Description

When an external message arrives while the agent is busy executing a task, OpenClaw queues it as [Queued messages while agent was busy] and inserts it into the session's parentId chain. After the current task completes and the assistant outputs a final summary (no toolCall), the agent loop does not terminate. Instead, OpenClaw continues invoking the model, which sees the unhandled Queued user message in context and proceeds to fully replay the previous task — generating 10+ consecutive assistant messages within ~1.5 seconds, all without any real tool execution.

Restarting OpenClaw does not fix the issue. Deleting the session JSONL files and restarting does.

Reproduction Steps

  1. Configure an agent with an external IM integration (POPO, Telegram, etc.)
  2. Send a message that triggers a long multi-step task (multiple toolCall/toolResult cycles)
  3. While the agent is executing, send a new message from the external channel
  4. Observe [Queued messages while agent was busy] inserted into the session
  5. Wait for the current task to complete (assistant outputs final text-only summary)
  6. Observe: the agent does not stop — it immediately continues generating new assistant messages

Expected Behavior

After the assistant outputs a final response with no toolCall, the agent loop should terminate. Queued messages should be handled as a new turn, not as continuation of the current loop.

Actual Behavior

The agent loop continues. The model is called again with the full context (which includes the Queued user message) and begins replaying the previously completed task — outputting 10+ consecutive assistant messages at ~150ms intervals, none of which execute any tools.

Session Log Evidence

Analyzed from JSONL session logs (431 lines, session e999671b-...):

MetricValue
Total toolCall records145
Total toolResult records in replay sequences0
[Queued messages while agent was busy] user messages20
Consecutive assistant sequences (length > 1)20 groups
Longest consecutive assistant sequence11 messages in 1.36 seconds

The replayed content is byte-for-byte identical to earlier messages in the session:

Line 149 (original): "好的,我来更新配置:1. 每天提醒加入本日天气..."
Line 182 (replay):   "好的,我来更新配置:1. 每天提醒加入本日天气..."  ← identical

Timing of the 11-message replay sequence:

03:10:34.296Z → normal final summary (line 181)
03:10:34.502Z → replay begins (206ms later)
03:10:34.653Z → (151ms)
03:10:34.772Z → (119ms)
...
03:10:35.655Z → ends (11 messages, 1.36 seconds total)

Replay sequence length scales with context size — when context had ~50 messages, sequences were 2–3 messages long; by the time context reached 180 messages, sequences grew to 11.

Root Cause (Hypothesis)

The agent loop termination condition does not distinguish between:

  1. A Queued message already present in historical context (inserted mid-task)
  2. A new user message arriving after the current task completed

After the assistant produces a text-only final response, the loop should stop. Instead it appears to scan the full session chain for any unanswered user message — finds the Queued message — and invokes the model again.

Why Deleting Session Files Fixes It

The corrupted context chain is persisted in the JSONL file. On restart, OpenClaw resumes from the same poisoned context. Deleting the file removes the chain entirely, so the next session starts clean.

Workaround

Delete session JSONL files and restart. (Restart alone is insufficient.)

Related Issues

  • #30604Followup queue delivers same message multiple times when agent is busy: upstream/related at the queue layer. PR #46170 was opened to fix it but closed by the author without merging.
  • #35092/new does not flush queued messages: corroborates why session deletion is required for recovery.
  • #50892 — Discord collect-mode duplicate delivery: superficially similar but different mechanism; confirmed not the same issue.

The core problem described here — agent loop not terminating after final response when Queued messages exist in context — does not appear to have an existing tracking issue.

Root Cause

Root Cause (Hypothesis)

Fix Action

Fix / Workaround

Workaround

PR fix notes

PR #51298: fix(agent-loop): terminate loop after final response when queue items predate run start (#50956)

Description (problem / solution / changelog)

Summary

  • Problem: When queued messages exist in session history, the agent loop replays the previous task infinitely after completing. Items enqueued before the current run started are treated as new messages, causing the agent to re-process them.
  • Why it matters: Agents enter an infinite loop after completing their task, consuming tokens and possibly producing duplicate responses.
  • What changed: Added runStartedAt timestamp to scheduleFollowupDrain and finalizeWithFollowup. In the drain loop, items enqueued before the run started are dropped (stale context). Only items enqueued AFTER the run started are processed.
  • What did NOT change: New messages (enqueued after run start) still flow normally. All other queue logic unchanged.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #50956

User-visible / Behavior Changes

Agent loops that previously ran infinitely after completing will now terminate correctly. The agent will only respond to messages that arrived after the current run ended.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: macOS / Linux
  • Runtime/container: Node.js
  • Model/provider: any
  • Integration/channel: any

Steps

  1. Have a session with queued messages in context from a previous run
  2. Send a new message that produces a final text-only response (no tool calls)
  3. Before fix: agent loop replays the previous task infinitely
  4. After fix: agent terminates after the final response

Expected

  • Agent terminates after sending final text-only response
  • New messages arriving AFTER run ends are still processed

Actual (before fix)

  • Agent replays previous task in infinite loop

Evidence

  • Code change: stale queue items filtered in drain.ts by comparing enqueuedAt vs runStartedAt

Human Verification (required)

  • Verified build compiles cleanly (pnpm build passes)
  • Logic: runStartedAt = Date.now() captured before agent run; items with enqueuedAt <= runStartedAt are stale
  • Edge case: runStartedAt is optional; when not passed (e.g. kickFollowupDrainIfIdle), no filtering applied (backward compatible)

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes (runStartedAt is optional)
  • Config/env changes? No
  • Migration needed? No

Failure Recovery (if this breaks)

  • How to disable: remove runStartedAt parameter from finalizeWithFollowup calls in agent-runner.ts
  • Files to restore: src/auto-reply/reply/agent-runner.ts, src/auto-reply/reply/agent-runner-helpers.ts, src/auto-reply/reply/queue/drain.ts
  • Bad symptoms: messages dropped unexpectedly → check runStartedAt timestamp accuracy

Risks and Mitigations

  • Risk: A message enqueued exactly at runStartedAt (same millisecond) could be dropped
    • Mitigation: Extremely unlikely in practice; uses <= to drop items from before/at run start, ensuring items arriving strictly after are processed

Changed files

  • src/auto-reply/reply/agent-runner-helpers.ts (modified, +2/-1)
  • src/auto-reply/reply/agent-runner.ts (modified, +5/-4)
  • src/auto-reply/reply/queue/drain.ts (modified, +18/-0)

Code Example

Line 149 (original): "好的,我来更新配置:1. 每天提醒加入本日天气..."
  Line 182 (replay):   "好的,我来更新配置:1. 每天提醒加入本日天气..."  ← identical

---

03:10:34.296Z → normal final summary (line 181)
  03:10:34.502Z → replay begins (206ms later)
  03:10:34.653Z  (151ms)
  03:10:34.772Z  (119ms)
  ...
  03:10:35.655Z → ends (11 messages, 1.36 seconds total)

---
RAW_BUFFERClick to expand / collapse

Bug type

Behavior bug (incorrect output/state without crash)

Summary

Environment

  • OpenClaw version: 2026.3.12 (6472949) (confirmed on session dated 2026-03-12)
  • Channel: POPO IM (custom integration via aigateway-xxxxx-com)
  • Model: kimi-k2.5
  • OS: Linux server (Debian 12)

Description

When an external message arrives while the agent is busy executing a task, OpenClaw queues it as [Queued messages while agent was busy] and inserts it into the session's parentId chain. After the current task completes and the assistant outputs a final summary (no toolCall), the agent loop does not terminate. Instead, OpenClaw continues invoking the model, which sees the unhandled Queued user message in context and proceeds to fully replay the previous task — generating 10+ consecutive assistant messages within ~1.5 seconds, all without any real tool execution.

Restarting OpenClaw does not fix the issue. Deleting the session JSONL files and restarting does.

Reproduction Steps

  1. Configure an agent with an external IM integration (POPO, Telegram, etc.)
  2. Send a message that triggers a long multi-step task (multiple toolCall/toolResult cycles)
  3. While the agent is executing, send a new message from the external channel
  4. Observe [Queued messages while agent was busy] inserted into the session
  5. Wait for the current task to complete (assistant outputs final text-only summary)
  6. Observe: the agent does not stop — it immediately continues generating new assistant messages

Expected Behavior

After the assistant outputs a final response with no toolCall, the agent loop should terminate. Queued messages should be handled as a new turn, not as continuation of the current loop.

Actual Behavior

The agent loop continues. The model is called again with the full context (which includes the Queued user message) and begins replaying the previously completed task — outputting 10+ consecutive assistant messages at ~150ms intervals, none of which execute any tools.

Session Log Evidence

Analyzed from JSONL session logs (431 lines, session e999671b-...):

MetricValue
Total toolCall records145
Total toolResult records in replay sequences0
[Queued messages while agent was busy] user messages20
Consecutive assistant sequences (length > 1)20 groups
Longest consecutive assistant sequence11 messages in 1.36 seconds

The replayed content is byte-for-byte identical to earlier messages in the session:

Line 149 (original): "好的,我来更新配置:1. 每天提醒加入本日天气..."
Line 182 (replay):   "好的,我来更新配置:1. 每天提醒加入本日天气..."  ← identical

Timing of the 11-message replay sequence:

03:10:34.296Z → normal final summary (line 181)
03:10:34.502Z → replay begins (206ms later)
03:10:34.653Z → (151ms)
03:10:34.772Z → (119ms)
...
03:10:35.655Z → ends (11 messages, 1.36 seconds total)

Replay sequence length scales with context size — when context had ~50 messages, sequences were 2–3 messages long; by the time context reached 180 messages, sequences grew to 11.

Root Cause (Hypothesis)

The agent loop termination condition does not distinguish between:

  1. A Queued message already present in historical context (inserted mid-task)
  2. A new user message arriving after the current task completed

After the assistant produces a text-only final response, the loop should stop. Instead it appears to scan the full session chain for any unanswered user message — finds the Queued message — and invokes the model again.

Why Deleting Session Files Fixes It

The corrupted context chain is persisted in the JSONL file. On restart, OpenClaw resumes from the same poisoned context. Deleting the file removes the chain entirely, so the next session starts clean.

Workaround

Delete session JSONL files and restart. (Restart alone is insufficient.)

Related Issues

  • #30604Followup queue delivers same message multiple times when agent is busy: upstream/related at the queue layer. PR #46170 was opened to fix it but closed by the author without merging.
  • #35092/new does not flush queued messages: corroborates why session deletion is required for recovery.
  • #50892 — Discord collect-mode duplicate delivery: superficially similar but different mechanism; confirmed not the same issue.

The core problem described here — agent loop not terminating after final response when Queued messages exist in context — does not appear to have an existing tracking issue.

Steps to reproduce

  1. Configure an agent with an external IM integration (POPO, Telegram, etc.)
  2. Send a message that triggers a long multi-step task (multiple toolCall/toolResult cycles)
  3. While the agent is executing, send a new message from the external channel
  4. Observe [Queued messages while agent was busy] inserted into the session
  5. Wait for the current task to complete (assistant outputs final text-only summary)
  6. Observe: the agent does not stop — it immediately continues generating new assistant messages

Expected behavior

After the assistant outputs a final response with no toolCall, the agent loop should terminate. Queued messages should be handled as a new turn, not as continuation of the current loop.

Actual behavior

The agent loop continues. The model is called again with the full context (which includes the Queued user message) and begins replaying the previously completed task — outputting 10+ consecutive assistant messages at ~150ms intervals, none of which execute any tools.

OpenClaw version

2026.3.12 (6472949)

Operating system

Debin12

Install method

No response

Model

kimi-k2.5

Provider / routing chain

openclaw -> aigw.xxx.com -> kimi

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response

extent analysis

Fix Plan

To address the issue of the agent loop not terminating after a final response when queued messages exist in the context, we need to modify the termination condition to distinguish between queued messages already present in the historical context and new user messages arriving after the current task completed.

Here are the steps to implement the fix:

  1. Modify the Agent Loop Termination Condition:

    • Check if there are any new user messages that arrived after the current task completed.
    • If yes, do not terminate the loop but instead handle the new message as a new turn.
    • If not, and the current task has completed with a final text-only summary, terminate the loop.
  2. Implement a Mechanism to Track New User Messages:

    • Introduce a flag or a timestamp to mark when the current task started and completed.
    • When a new user message arrives, check if it arrived after the current task completed. If so, mark it as a new message to be handled in a new turn.
  3. Update the Context Handling:

    • When handling a new user message, ensure that the context is updated correctly to reflect the new turn.
    • Remove or ignore any queued messages that were part of the previous task's context to prevent replaying the previous task.

Example code snippet in Python to illustrate the modified termination condition and new message handling:

def check_termination_condition(current_task_completed, new_user_message_arrived):
    if current_task_completed and not new_user_message_arrived:
        # Terminate the loop if the task is completed and no new user message has arrived
        return True
    elif new_user_message_arrived:
        # Handle the new user message as a new turn
        handle_new_user_message()
        return False
    else:
        # Continue the loop if the task is not completed or a new user message has arrived
        return False

def handle_new_user_message():
    # Update the context to reflect the new turn
    update_context()
    # Remove or ignore any queued messages from the previous task's context
    clear_queued_messages()

# Example usage
current_task_completed = True
new_user_message_arrived = False

if check_termination_condition(current_task_completed, new_user_message_arrived):
    # Terminate the agent loop
    terminate_agent_loop()
else:
    # Continue the agent loop
    continue_agent_loop()

Verification

To verify that the fix worked, follow these steps:

  1. Reproduce the Issue: Follow the reproduction steps provided in the issue description to reproduce the problem.
  2. Apply the Fix: Implement the modified termination condition and new message handling mechanism as described in the fix plan.
  3. Test the Fix: Repeat the reproduction steps after applying the fix to ensure that the agent loop terminates correctly after a final response when queued messages exist in the context.
  4. Verify the Behavior: Observe the agent's

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

After the assistant outputs a final response with no toolCall, the agent loop should terminate. Queued messages should be handled as a new turn, not as continuation of the current loop.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING