openclaw - ✅(Solved) Fix Feishu: duplicate final replies can occur after model failover from rate-limited primary [1 pull request, 3 comments, 3 participants]

openclaw · 2026-03-18 02:30:28

openclaw/openclaw#49381 • Fetched 2026-04-08 00:55:44

In Feishu direct messages, a single user message can sometimes result in two final assistant replies when the primary model fails with rate limiting and OpenClaw falls back to a secondary model.

This appears to be more than a Feishu rendering problem. Based on logs, the likely issue is in the failover + final reply delivery coordination path.

Error Message

[agent/embedded] embedded run agent end: ... isError=true model=gpt-5.4 provider=openai-codex error=⚠️ API rate limit reached. Please try again later.

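Root Cause

The issue's working hypothesis is that reply finalization after failover, not Feishu rendering alone, lets a single dispatch pass through more than one final-delivery path. For the streaming case, PR #59771 names a proximate cause: createFeishuReplyDispatcher() calls closeStreaming() inside onError, which finalizes the active card too early and forces a later failover or retry to create a second card/message.

The bug has two user-visible / operational effects:

- Duplicate user-facing replies in Feishu
- Extra model invocations and higher cost, because the system frequently attempts the rate-limited primary model first and then falls back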

Fix Action

Fix / Workaround

The issue reports no confirmed workaround: duplicate final replies still occurred with streaming: false and blockStreaming: false. The only linked fix is PR #59771, listed as open and not merged, which targets the streaming-card path. The reproduction steps and log excerpts appear in the issue body further down this page.

PR fix notes

PR #59771: Fix Feishu streaming recovery after transient reply errors

Repository: openclaw/openclaw
Author: Vicky-v7
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/59771

Description (problem / solution / changelog)

Summary

- Keep the active Feishu streaming card open when onError fires for a reply item
- Add a regression test covering partial streaming content followed by a transient error and a recovered final reply
- Preserve the existing idle close path for genuinely incomplete replies

Why

We reproduced a Feishu duplicate-reply bug locally on OpenClaw 2026.4.1 where a single inbound DM could produce:

- one visible streaming card carrying partial/final-looking text from an errored attempt
- then a second visible final reply after the upstream run recovered

The proximate cause is that createFeishuReplyDispatcher() currently calls closeStreaming() inside onError. That finalizes the active card too early, so any later failover/retry has to create a second card/message.

With this change, transient reply errors no longer finalize the current streaming card. If the run later recovers, the final text closes the original card instead of producing a second visible reply.
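
The effect of that change can be illustrated with a minimal Python sketch. This is not the PR's code (the real change is in TypeScript, in extensions/feishu/src/reply-dispatcher.ts). The names FakeCardClient, ReplyDispatcher, on_partial, on_error, on_final and the close_on_error switch are hypothetical stand-ins, and the only thing modeled is whether the error handler finalizes the active card. The idle close path mentioned in the summary is left out.

```python
class FakeCardClient:
    """Hypothetical stand-in for the Feishu card API; records what the user would see."""

    def __init__(self):
        self.cards = []  # one dict per visible card/message

    def create(self, text):
        self.cards.append({"text": text, "closed": False})
        return len(self.cards) - 1

    def update(self, card_id, text):
        self.cards[card_id]["text"] = text

    def close(self, card_id):
        self.cards[card_id]["closed"] = True


class ReplyDispatcher:
    """Sketch of a reply dispatcher; close_on_error=True models the pre-fix behavior."""

    def __init__(self, client, close_on_error):
        self.client = client
        self.close_on_error = close_on_error
        self.active_card = None  # id of the card still open for streaming

    def on_partial(self, text):
        if self.active_card is None:
            self.active_card = self.client.create(text)
        else:
            self.client.update(self.active_card, text)

    def on_error(self):
        # Pre-fix behavior: finalize the card right away, so a later retry needs a new one.
        if self.close_on_error and self.active_card is not None:
            self.client.close(self.active_card)
            self.active_card = None

    def on_final(self, text):
        # Reuse the open card if there is one; otherwise a second card is created.
        self.on_partial(text)
        self.client.close(self.active_card)
        self.active_card = None


def visible_cards(close_on_error):
    client = FakeCardClient()
    dispatcher = ReplyDispatcher(client, close_on_error)
    dispatcher.on_partial("partial text from the rate-limited attempt")
    dispatcher.on_error()  # transient error on the primary model
    dispatcher.on_final("final text from the fallback model")
    return len(client.cards)


print(visible_cards(close_on_error=True))   # 2
print(visible_cards(close_on_error=False))  # 1
```

With close_on_error=True the user sees two cards for one inbound message; with it off, the recovered final text lands in the original card.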

Refs: #49381

Testing

pnpm exec vitest run --config vitest.extensions.config.ts extensions/feishu/src/reply-dispatcher.test.ts
pnpm exec vitest run --config vitest.extensions.config.ts extensions/feishu/src/reply-dispatcher.test.ts extensions/feishu/src/streaming-card.test.ts

Changed files

extensions/feishu/src/reply-dispatcher.test.ts (modified, +39/-0)
extensions/feishu/src/reply-dispatcher.ts (modified, +20/-4)
extensions/feishu/src/streaming-card.test.ts (modified, +46/-1)
extensions/feishu/src/streaming-card.ts (modified, +14/-0)

Raw issue text

Summary

In Feishu direct messages, a single user message can sometimes result in two final assistant replies when the primary model fails with rate limiting and OpenClaw falls back to a secondary model.

This appears to be more than a Feishu rendering problem. Based on logs, the likely issue is in the failover + final reply delivery coordination path.

Environment

OpenClaw version: 2026.3.13
Channel: feishu
Connection mode: websocket
Feishu channel config tested with:
- renderMode: "card"
- streaming: false
- blockStreaming: false
Primary model in observed runs: openai-codex/gpt-5.4
Fallback model in observed runs: cliproxyapi/gpt-5.4

What happens

Observed behavior pattern:

User sends a single Feishu DM.
OpenClaw dispatches the request once.
Primary model (openai-codex/gpt-5.4) fails with rate limit.
OpenClaw falls back to cliproxyapi/gpt-5.4.
In some runs, Feishu logs show:
- dispatch complete (queuedFinal=true, replies=2)

This suggests that one inbound message / one dispatch can produce two final replies.

Important finding

Disabling Feishu streaming did not fully solve the issue:

streaming: false
blockStreaming: false

After disabling both, duplicate final replies could still occur.

That makes this look less like a card streaming problem and more like a final reply dedupe / failover delivery coordination issue.

Sanitized log excerpts

Duplicate final reply after fallback success

[agent/embedded] embedded run agent end: ... isError=true model=gpt-5.4 provider=openai-codex error=⚠️ API rate limit reached. Please try again later.
[model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai-codex/gpt-5.4 candidate=openai-codex/gpt-5.4 reason=rate_limit next=cliproxyapi/gpt-5.4
[model-fallback/decision] model fallback decision: decision=candidate_succeeded requested=openai-codex/gpt-5.4 candidate=cliproxyapi/gpt-5.4 reason=unknown next=none
[feishu] feishu[main]: dispatch complete (queuedFinal=true, replies=2)

Another run where only one reply was sent

[agent/embedded] embedded run agent end: ... isError=true model=gpt-5.4 provider=openai-codex error=⚠️ API rate limit reached. Please try again later.
[model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai-codex/gpt-5.4 candidate=openai-codex/gpt-5.4 reason=rate_limit next=cliproxyapi/gpt-5.4
[model-fallback/decision] model fallback decision: decision=candidate_succeeded requested=openai-codex/gpt-5.4 candidate=cliproxyapi/gpt-5.4 reason=unknown next=none
[feishu] feishu[main]: dispatch complete (queuedFinal=true, replies=1)

Cooldown probe behavior also observed

[model-fallback/decision] model fallback decision: decision=probe_cooldown_candidate requested=openai-codex/gpt-5.4 candidate=openai-codex/gpt-5.4 reason=rate_limit next=cliproxyapi/gpt-5.4
[agent/embedded] probing cooldowned auth profile for openai-codex/gpt-5.4 due to rate_limit unavailability
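
To find affected runs in a larger log, a small scanner can flag every dispatch summary line that reports more than one reply. The sketch below assumes the line format is exactly the one shown in the sanitized excerpts above; the function name find_duplicate_dispatches is made up for illustration and is not part of OpenClaw.

```python
import re

# Matches the Feishu dispatch summary line as it appears in the excerpts above.
DISPATCH_RE = re.compile(r"dispatch complete \(queuedFinal=(true|false), replies=(\d+)\)")


def find_duplicate_dispatches(lines):
    """Return (line_number, reply_count) for every dispatch with more than one reply."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = DISPATCH_RE.search(line)
        if match and int(match.group(2)) > 1:
            hits.append((number, int(match.group(2))))
    return hits


sample = [
    "[model-fallback/decision] model fallback decision: decision=candidate_succeeded "
    "requested=openai-codex/gpt-5.4 candidate=cliproxyapi/gpt-5.4 reason=unknown next=none",
    "[feishu] feishu[main]: dispatch complete (queuedFinal=true, replies=2)",
    "[feishu] feishu[main]: dispatch complete (queuedFinal=true, replies=1)",
]
print(find_duplicate_dispatches(sample))  # [(2, 2)]
```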

Why this matters

This has two user-visible / operational effects:

Duplicate user-facing replies in Feishu
Extra model invocations / higher cost, because the system frequently attempts the rate-limited primary model first and then falls back

What I checked already

Upgraded from 2026.3.8 to 2026.3.13
Disabled Feishu streaming
Disabled Feishu block streaming
Confirmed Feishu is running in websocket mode
Confirmed only one gateway process is active
Confirmed duplicate replies can still happen even with streaming disabled

Likely root cause

My current hypothesis:

The issue is in reply finalization after failover, not only in Feishu channel rendering.
A single dispatch may pass through multiple final-delivery paths after fallback succeeds.
Existing queuedFinal / final reply coordination appears insufficient for this Feishu + failover scenario.

Possible improvements

A few ideas that might help:

Stronger single-final-reply guard per dispatch
- Ensure one inbound dispatch can only emit one final reply regardless of failover path.
Optional model-level cooldown / temporary bypass
- Today auth-profile cooldown exists, but the system still probes the primary model path aggressively.
- A model-level temporary bypass after repeated rate_limit failures could reduce both duplicate-reply risk and wasted attempts.
Configurable disable for transient cooldown probing
- Something like an opt-out for probe_cooldown_candidate behavior might help environments where repeated probing is undesirable.
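
The model-level bypass idea could look roughly like the following. This is a sketch of the proposal only, not existing OpenClaw behavior: the ModelBypass class, its threshold and bypass_seconds parameters, and the choice to fall back to the full list when every model is bypassed are all assumptions made for illustration.

```python
import time


class ModelBypass:
    """Hypothetical model-level bypass: skip a model after repeated rate_limit failures."""

    def __init__(self, threshold=3, bypass_seconds=300.0, clock=time.monotonic):
        self.threshold = threshold            # failures needed to start a bypass
        self.bypass_seconds = bypass_seconds  # how long the model is skipped
        self.clock = clock                    # injectable so the logic can be tested
        self.failures = {}                    # model name -> rate_limit failures since last success
        self.bypass_until = {}                # model name -> clock value when the bypass ends

    def record_rate_limit(self, model):
        self.failures[model] = self.failures.get(model, 0) + 1
        if self.failures[model] >= self.threshold:
            self.bypass_until[model] = self.clock() + self.bypass_seconds

    def record_success(self, model):
        # A success clears both the failure count and any active bypass.
        self.failures.pop(model, None)
        self.bypass_until.pop(model, None)

    def is_bypassed(self, model):
        until = self.bypass_until.get(model)
        return until is not None and until > self.clock()

    def candidates(self, ordered_models):
        """Return models to try, in order, leaving out bypassed ones."""
        usable = [m for m in ordered_models if not self.is_bypassed(m)]
        # If everything is bypassed, keep the original order so the request is still attempted.
        return usable or list(ordered_models)


now = [0.0]  # fake clock so the example is deterministic
bypass = ModelBypass(threshold=2, bypass_seconds=60.0, clock=lambda: now[0])
models = ["openai-codex/gpt-5.4", "cliproxyapi/gpt-5.4"]

bypass.record_rate_limit("openai-codex/gpt-5.4")
print(bypass.candidates(models))  # both models, primary first
bypass.record_rate_limit("openai-codex/gpt-5.4")
print(bypass.candidates(models))  # ['cliproxyapi/gpt-5.4']
now[0] = 61.0
print(bypass.candidates(models))  # both models again
```

While the bypass is active the primary is never attempted, so the failed attempt that precedes each fallback in the logs above would not happen at all.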

Question

Is this a known issue in the Feishu + failover path?

If needed, I can provide more sanitized logs / reproduction notes, but I intentionally removed all private chat identifiers and user identifiers here.

extent analysis

Fix Plan

To address the issue of duplicate final replies in Feishu direct messages after a model fallback, we will implement a stronger single-final-reply guard per dispatch. This involves ensuring that one inbound dispatch can only emit one final reply regardless of the failover path.

Step 1: Implement a Dispatch-Level Flag

Introduce a flag at the dispatch level to track whether a final reply has been sent. This flag will be checked before sending any final reply, preventing duplicates.

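class Dispatch:
    """Tracks whether a final reply was already sent for one inbound message."""

    def __init__(self, send=print):
        self._send = send  # hypothetical callable that delivers a reply to the channel
        self.final_reply_sent = False

    def send_final_reply(self, reply):
        """Send the reply once; return False for any later duplicate attempt."""
        if self.final_reply_sent:
            # Duplicate final reply for this dispatch: drop it (a real system would log it).
            return False
        self._send(reply)
        self.final_reply_sent = True
        return True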

Step 2: Integrate with Model Fallback Logic

Modify the model fallback logic to respect the dispatch-level flag. When a fallback occurs, check the flag before sending a final reply from the secondary model.

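def handle_model_fallback(dispatch, primary_model, secondary_model, prompt):
    # primary_model and secondary_model are hypothetical callables (prompt -> reply text)
    # that raise RuntimeError on failure, for example on a rate limit.
    try:
        reply = primary_model(prompt)
    except RuntimeError:
        # Primary failed: fall back to the secondary model.
        reply = secondary_model(prompt)
    # The dispatch-level flag allows at most one final reply, whichever model answered.
    return dispatch.send_final_reply(reply)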

Step 3: Optional Model-Level Cooldown

Implement an optional model-level cooldown or temporary bypass after repeated rate_limit failures to reduce duplicate-reply risk and wasted attempts.

from datetime import datetime, timedelta

class Model:
    def __init__(self):
        self.cooldown_until = None  # datetime until which the model is skipped

    def is_on_cooldown(self):
        return self.cooldown_until is not None and self.cooldown_until > datetime.now()

    def trigger_cooldown(self, duration):
        self.cooldown_until = datetime.now() + duration

def handle_rate_limit_failure(model):
    if model.is_on_cooldown():
        # Already cooling down: the caller should bypass this model for now.
        return
    # First failure outside a cooldown window: start one (30 minutes is an illustrative value).
    model.trigger_cooldown(duration=timedelta(minutes=30))

Verification

To verify that the fix worked, monitor the Feishu direct messages for duplicate final replies after model fallbacks. Also, check the logs for attempts to send duplicate final replies, which should now be prevented by the dispatch-level flag.
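
Before relying on production logs, the guard can be exercised in a small simulation. The sketch below assumes the duplicate comes from several delivery paths racing to send the final reply for the same dispatch, which is the issue's hypothesis and is not confirmed. FinalReplyGuard and deliver_from_many_paths are hypothetical names, and the guard is keyed by a dispatch id and protected by a lock so that it also holds under concurrency.

```python
import threading


class FinalReplyGuard:
    """Hypothetical at-most-once guard keyed by dispatch id."""

    def __init__(self):
        self._lock = threading.Lock()
        self._finalized = set()  # dispatch ids that already sent a final reply

    def claim(self, dispatch_id):
        """Return True for the first caller per dispatch id, False afterwards."""
        with self._lock:
            if dispatch_id in self._finalized:
                return False
            self._finalized.add(dispatch_id)
            return True


def deliver_from_many_paths(guard, dispatch_id, path_count):
    """Simulate several delivery paths racing to send the final reply."""
    sent = []

    def path(name):
        if guard.claim(dispatch_id):
            sent.append(name)  # stands in for the real channel send

    threads = [threading.Thread(target=path, args=(f"path-{i}",)) for i in range(path_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sent


guard = FinalReplyGuard()
first = deliver_from_many_paths(guard, "dispatch-1", path_count=8)
second = deliver_from_many_paths(guard, "dispatch-2", path_count=8)
print(len(first), len(second))  # 1 1
```

A long-running process would also need to evict old dispatch ids from the set; that is omitted here.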

Extra Tips

Regularly review and adjust the cooldown durations and logic to balance between preventing duplicate replies and minimizing the impact on system responsiveness.
Consider implementing a configurable disable for transient cooldown probing to accommodate different environment needs.
