openclaw - ✅ (Solved) Fix Feishu: duplicate final replies can occur after model failover from rate-limited primary [1 pull request, 3 comments, 3 participants]

GitHub stats: openclaw/openclaw#49381 (fetched 2026-04-08 00:55:44) · Comments: 3 · Participants: 3 · Timeline events: 4 (commented ×3, cross-referenced ×1) · Reactions: 0

In Feishu direct messages, a single user message can sometimes result in two final assistant replies when the primary model fails with rate limiting and OpenClaw falls back to a secondary model.

This appears to be more than a Feishu rendering problem. Based on logs, the likely issue is in the failover + final reply delivery coordination path.


PR fix notes

PR #59771: Fix Feishu streaming recovery after transient reply errors

Description (problem / solution / changelog)

Summary

  • keep the active Feishu streaming card open when onError fires for a reply item
  • add a regression test covering partial streaming content followed by a transient error and a recovered final reply
  • preserve the existing idle close path for genuinely incomplete replies

Why

We reproduced a Feishu duplicate-reply bug locally on OpenClaw 2026.4.1 where a single inbound DM could produce:

  • one visible streaming card carrying partial/final-looking text from an errored attempt
  • then a second visible final reply after the upstream run recovered

The proximate cause is that createFeishuReplyDispatcher() currently calls closeStreaming() inside onError. That finalizes the active card too early, so any later failover/retry has to create a second card/message.

With this change, transient reply errors no longer finalize the current streaming card. If the run later recovers, the final text closes the original card instead of producing a second visible reply.

Refs: #49381
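The change above can be illustrated with a small Python model of the two behaviours. This is a hypothetical sketch, not the TypeScript in extensions/feishu/src/reply-dispatcher.ts: FakeChannel, ReplyDispatcher, close_on_error, and the on_partial / on_error / on_final / on_idle hooks are illustrative stand-ins. With close_on_error=True (the pre-fix behaviour) the transient error finalizes the card and the recovered final opens a second message; with close_on_error=False the recovered final closes the original card, and on_idle still closes a genuinely incomplete reply.

```python
class FakeChannel:
    """Records messages the way a chat channel would show them (hypothetical)."""

    def __init__(self):
        self.messages = []  # each entry: {"text": str, "closed": bool}

    def open_card(self, text):
        self.messages.append({"text": text, "closed": False})
        return len(self.messages) - 1

    def update_card(self, card_id, text):
        self.messages[card_id]["text"] = text

    def close_card(self, card_id, text):
        self.messages[card_id].update(text=text, closed=True)


class ReplyDispatcher:
    def __init__(self, channel, close_on_error):
        self.channel = channel
        self.close_on_error = close_on_error  # True mimics the pre-fix behaviour
        self.card_id = None  # the active (still open) streaming card, if any

    def _close_with_current_text(self):
        text = self.channel.messages[self.card_id]["text"]
        self.channel.close_card(self.card_id, text)
        self.card_id = None

    def on_partial(self, text):
        if self.card_id is None:
            self.card_id = self.channel.open_card(text)
        else:
            self.channel.update_card(self.card_id, text)

    def on_error(self, err):
        if self.close_on_error and self.card_id is not None:
            # Old behaviour: finalize now, so any recovery must open a new message
            self._close_with_current_text()
        # New behaviour: keep the card open and wait for a recovered final or idle close

    def on_final(self, text):
        if self.card_id is None:
            self.card_id = self.channel.open_card(text)
        self.channel.close_card(self.card_id, text)
        self.card_id = None

    def on_idle(self):
        # Genuinely incomplete reply: close whatever partial text the card holds
        if self.card_id is not None:
            self._close_with_current_text()


def simulate(close_on_error):
    channel = FakeChannel()
    d = ReplyDispatcher(channel, close_on_error)
    d.on_partial("Partial answer from the rate-limited attempt")
    d.on_error("rate_limit")
    d.on_final("Final answer from the fallback model")
    return channel.messages
```

Here simulate(True) leaves two visible messages and simulate(False) leaves one. The trade-off is that a card kept open after an error depends on the idle close path to be finalized if no recovery ever arrives.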

Testing

  • pnpm exec vitest run --config vitest.extensions.config.ts extensions/feishu/src/reply-dispatcher.test.ts
  • pnpm exec vitest run --config vitest.extensions.config.ts extensions/feishu/src/reply-dispatcher.test.ts extensions/feishu/src/streaming-card.test.ts

Changed files

  • extensions/feishu/src/reply-dispatcher.test.ts (modified, +39/-0)
  • extensions/feishu/src/reply-dispatcher.ts (modified, +20/-4)
  • extensions/feishu/src/streaming-card.test.ts (modified, +46/-1)
  • extensions/feishu/src/streaming-card.ts (modified, +14/-0)

Original issue body


Environment

  • OpenClaw version: 2026.3.13
  • Channel: feishu
  • Connection mode: websocket
  • Feishu channel config tested with:
    • renderMode: "card"
    • streaming: false
    • blockStreaming: false
  • Primary model in observed runs: openai-codex/gpt-5.4
  • Fallback model in observed runs: cliproxyapi/gpt-5.4

What happens

Observed behavior pattern:

  1. User sends a single Feishu DM.
  2. OpenClaw dispatches the request once.
  3. Primary model (openai-codex/gpt-5.4) fails with rate limit.
  4. OpenClaw falls back to cliproxyapi/gpt-5.4.
  5. In some runs, Feishu logs show:
    • dispatch complete (queuedFinal=true, replies=2)

This suggests that one inbound message / one dispatch can produce two final replies.

Important finding

Disabling Feishu streaming did not fully solve the issue:

  • streaming: false
  • blockStreaming: false

After disabling both, duplicate final replies could still occur.

That makes this look less like a card streaming problem and more like a final reply dedupe / failover delivery coordination issue.
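If the duplicate comes from final-reply coordination rather than card streaming, one option is an idempotency check keyed by the inbound message and consulted by every final-delivery path. The Python sketch below is hypothetical: FinalReplyDeduper, deliver_final, message_id, and the capacity bound are illustrative assumptions, not OpenClaw code. It keeps a bounded, insertion-ordered record of message IDs that already received a final reply, so memory stays bounded in a long-running gateway.

```python
from collections import OrderedDict


class FinalReplyDeduper:
    """Remembers which inbound messages already got a final reply (hypothetical helper)."""

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self._seen = OrderedDict()  # message_id -> None, oldest first

    def claim(self, message_id):
        """Return True the first time a message_id is claimed, False afterwards."""
        if message_id in self._seen:
            self._seen.move_to_end(message_id)  # keep recently seen IDs from being evicted
            return False
        self._seen[message_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest entry
        return True


def deliver_final(deduper, message_id, text, send):
    # Every final-delivery path (direct, queued, post-failover) goes through this check.
    # The ID is claimed before sending, so a failed send is not retried here.
    if deduper.claim(message_id):
        send(text)
        return True
    return False
```

Claiming before sending favours "at most one reply" over "at least one reply"; a deployment that prefers retries on send failure would need to release the claim when send raises.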

Sanitized log excerpts

Duplicate final reply after fallback success

[agent/embedded] embedded run agent end: ... isError=true model=gpt-5.4 provider=openai-codex error=⚠️ API rate limit reached. Please try again later.
[model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai-codex/gpt-5.4 candidate=openai-codex/gpt-5.4 reason=rate_limit next=cliproxyapi/gpt-5.4
[model-fallback/decision] model fallback decision: decision=candidate_succeeded requested=openai-codex/gpt-5.4 candidate=cliproxyapi/gpt-5.4 reason=unknown next=none
[feishu] feishu[main]: dispatch complete (queuedFinal=true, replies=2)

Another run where only one reply was sent

[agent/embedded] embedded run agent end: ... isError=true model=gpt-5.4 provider=openai-codex error=⚠️ API rate limit reached. Please try again later.
[model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai-codex/gpt-5.4 candidate=openai-codex/gpt-5.4 reason=rate_limit next=cliproxyapi/gpt-5.4
[model-fallback/decision] model fallback decision: decision=candidate_succeeded requested=openai-codex/gpt-5.4 candidate=cliproxyapi/gpt-5.4 reason=unknown next=none
[feishu] feishu[main]: dispatch complete (queuedFinal=true, replies=1)

Cooldown probe behavior also observed

[model-fallback/decision] model fallback decision: decision=probe_cooldown_candidate requested=openai-codex/gpt-5.4 candidate=openai-codex/gpt-5.4 reason=rate_limit next=cliproxyapi/gpt-5.4
[agent/embedded] probing cooldowned auth profile for openai-codex/gpt-5.4 due to rate_limit unavailability
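To measure how often this happens, the dispatch complete lines can be scanned for replies greater than 1 and paired with the most recent fallback decision. A rough Python sketch; the regular expressions are written against the sanitized excerpts above and are an assumption about the real log format, which may differ.

```python
import re

DISPATCH_RE = re.compile(r"dispatch complete \(queuedFinal=(\w+), replies=(\d+)\)")
FALLBACK_RE = re.compile(r"model fallback decision: (.*)")


def parse_kv(text):
    """Turn 'decision=x requested=y ...' into a dict."""
    return dict(pair.split("=", 1) for pair in text.split() if "=" in pair)


def find_duplicate_dispatches(lines):
    """Yield (line_no, replies, last_fallback_decision) for dispatches with replies > 1."""
    last_fallback = None
    for line_no, line in enumerate(lines, start=1):
        fb = FALLBACK_RE.search(line)
        if fb:
            last_fallback = parse_kv(fb.group(1))
            continue
        m = DISPATCH_RE.search(line)
        if m:
            replies = int(m.group(2))
            if replies > 1:
                yield line_no, replies, last_fallback
            last_fallback = None  # the next dispatch starts a fresh correlation window
```

Run over a log file (for example find_duplicate_dispatches(open(path))), each hit shows whether the duplicate followed a candidate_succeeded fallback, which is the pattern this issue describes.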

Why this matters

This has two user-visible / operational effects:

  1. Duplicate user-facing replies in Feishu
  2. Extra model invocations / higher cost, because the system frequently attempts the rate-limited primary model first and then falls back

What I checked already

  • Upgraded from 2026.3.8 to 2026.3.13
  • Disabled Feishu streaming
  • Disabled Feishu block streaming
  • Confirmed Feishu is running in websocket mode
  • Confirmed only one gateway process is active
  • Confirmed duplicate replies can still happen even with streaming disabled

Likely root cause

My current hypothesis:

  • The issue is in reply finalization after failover, not only in Feishu channel rendering.
  • A single dispatch may pass through multiple final-delivery paths after fallback succeeds.
  • Existing queuedFinal / final reply coordination appears insufficient for this Feishu + failover scenario.
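One way to make "one final reply per dispatch" survive failover is to buffer finals per attempt and flush exactly one when the failover loop settles. The Python sketch below is an illustration under assumptions: DispatchState, begin_attempt, offer_final, and commit are hypothetical names, and it assumes the failover layer can signal when it moves to the next candidate model and when it is done. Unlike a plain "first final wins" flag, a final left over from the rate-limited attempt is discarded, so the reply that goes out is the fallback model's.

```python
class DispatchState:
    """Buffers final replies per failover attempt and flushes exactly one (hypothetical)."""

    def __init__(self, send):
        self.send = send          # delivers text to the channel
        self.current_attempt = 0
        self.pending = None       # (attempt, text) of the latest buffered final
        self.committed = False

    def begin_attempt(self):
        # Called when failover moves to the next candidate model; older finals become stale
        self.current_attempt += 1
        self.pending = None
        return self.current_attempt

    def offer_final(self, attempt, text):
        # Buffer instead of sending: this attempt may still fail and be superseded
        if attempt == self.current_attempt and not self.committed:
            self.pending = (attempt, text)

    def commit(self):
        """Called once when the failover loop settles; sends at most one final reply."""
        if self.committed or self.pending is None:
            return False
        self.committed = True
        self.send(self.pending[1])
        return True
```

Buffering delays visible output until the run settles, so this fits the non-streaming configuration tested in this report (streaming: false); a streamed card needs the keep-the-card-open approach from PR #59771 instead.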

Possible improvements

A few ideas that might help:

  1. Stronger single-final-reply guard per dispatch

    • Ensure one inbound dispatch can only emit one final reply regardless of failover path.
  2. Optional model-level cooldown / temporary bypass

    • Today auth-profile cooldown exists, but the system still probes the primary model path aggressively.
    • A model-level temporary bypass after repeated rate_limit failures could reduce both duplicate-reply risk and wasted attempts.
  3. Configurable disable for transient cooldown probing

    • Something like an opt-out for probe_cooldown_candidate behavior might help environments where repeated probing is undesirable.
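Ideas 2 and 3 could be combined in a small candidate-ordering helper: count consecutive rate_limit failures per model, skip the model for a while once a threshold is reached, and only probe it during that window if probing is enabled. This is a hypothetical Python sketch; ModelBypass, threshold, bypass_seconds, and allow_probe are illustrative names, not existing OpenClaw settings. Time is passed in explicitly to keep it testable.

```python
from dataclasses import dataclass, field


@dataclass
class ModelBypass:
    threshold: int = 3            # consecutive rate_limit failures before bypassing
    bypass_seconds: float = 300.0
    allow_probe: bool = True      # False = never probe a bypassed model (idea 3)
    failures: dict = field(default_factory=dict)      # model -> consecutive failures
    bypass_until: dict = field(default_factory=dict)  # model -> timestamp

    def record_rate_limit(self, model, now):
        # The count only resets on success, so a model that fails again right
        # after its window expires is bypassed again immediately.
        self.failures[model] = self.failures.get(model, 0) + 1
        if self.failures[model] >= self.threshold:
            self.bypass_until[model] = now + self.bypass_seconds

    def record_success(self, model):
        self.failures.pop(model, None)
        self.bypass_until.pop(model, None)

    def is_bypassed(self, model, now):
        return self.bypass_until.get(model, float("-inf")) > now

    def candidates(self, chain, now, probe=False):
        """Fallback chain with bypassed models removed; a permitted probe keeps the full order."""
        if probe and self.allow_probe:
            return list(chain)
        return [m for m in chain if not self.is_bypassed(m, now)]
```

With the chain from this report, two rate_limit failures on openai-codex/gpt-5.4 (threshold=2) would send subsequent requests straight to cliproxyapi/gpt-5.4 until the window expires, avoiding the wasted primary attempt.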

Question

Is this a known issue in the Feishu + failover path?

If needed, I can provide more sanitized logs / reproduction notes, but I intentionally removed all private chat identifiers and user identifiers here.

Extended analysis

Fix Plan

To address the issue of duplicate final replies in Feishu direct messages after a model fallback, we will implement a stronger single-final-reply guard per dispatch. This involves ensuring that one inbound dispatch can only emit one final reply regardless of the failover path.

Step 1: Implement a Dispatch-Level Flag

Introduce a flag at the dispatch level to track whether a final reply has been sent. This flag will be checked before sending any final reply, preventing duplicates.

class Dispatch:
    def __init__(self, send):
        # send: callable that delivers a reply to the channel (assumed interface)
        self.send = send
        self.final_reply_sent = False

    def send_final_reply(self, reply):
        if self.final_reply_sent:
            # A final reply already went out for this dispatch: drop the duplicate
            return False
        self.send(reply)
        self.final_reply_sent = True
        return True

Step 2: Integrate with Model Fallback Logic

Modify the model fallback logic to respect the dispatch-level flag. When a fallback occurs, check the flag before sending a final reply from the secondary model.

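class RateLimitError(Exception):
    """Raised by a model callable when the provider rate-limits the request (assumed)."""


def handle_model_fallback(dispatch, primary_model, secondary_model, prompt):
    # primary_model / secondary_model: callables mapping a prompt to reply text (assumed)
    try:
        reply = primary_model(prompt)
    except RateLimitError:
        # Primary is rate-limited: fall back to the secondary model
        reply = secondary_model(prompt)
    # Whichever model answered, the dispatch-level flag allows only one final reply
    return dispatch.send_final_reply(reply)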

Step 3: Optional Model-Level Cooldown

Implement an optional model-level cooldown or temporary bypass after repeated rate_limit failures to reduce duplicate-reply risk and wasted attempts.

from datetime import datetime, timedelta


class Model:
    def __init__(self):
        self.cooldown_until = None

    def is_on_cooldown(self):
        return self.cooldown_until is not None and self.cooldown_until > datetime.now()

    def trigger_cooldown(self, duration):
        self.cooldown_until = datetime.now() + duration


def handle_rate_limit_failure(model, duration=timedelta(minutes=30)):
    if model.is_on_cooldown():
        # Already cooling down: leave the window unchanged so callers keep bypassing this model
        return
    # Rate limit outside a cooldown window: start a temporary bypass
    model.trigger_cooldown(duration)

Verification

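To verify the fix, watch Feishu direct messages for duplicate final replies after model fallbacks. In the logs, a dispatch complete line that follows a candidate_succeeded fallback decision should report replies=1 rather than replies=2. Attempts to send a second final reply should now be stopped by the dispatch-level flag; logging them makes it easy to confirm the guard is firing.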

Extra Tips

  • Regularly review and adjust the cooldown durations and logic to balance preventing duplicate replies against system responsiveness.
  • Consider implementing a configurable disable for transient cooldown probing to accommodate different environment needs.
