openclaw - ✅(Solved) Fix Agent run timeout during tool execution misclassified as LLM timeout, triggers unnecessary model fallback [1 pull requests, 3 comments, 3 participants]

andychu666 · 2026-03-22T08:24:12Z

[openclaw] Agent run timeout during long tool execution e.g. process poll is misclassified as "LLM request timed out", triggering unnecessary model fallback —… Agent run timeout during long tool execution (e.g. `process(poll)`) is misclassified as "LLM request timed out", triggering unnecessary model fallback — even though the primary model responded correctly. # PR #320: feat: OpenClaw adapter refactor + board UI redesign - Repository: charannyk06/conductor-oss - Author: charannyk06 - State: closed | merged: True - Link: https://github.com/charannyk06/conductor-oss/pull/320 ## Description (problem / solution / changelog) ## Summary Two-part PR combining OpenClaw adapter improvements with a board UI redesign inspired by Composio Agent Orchestrator and OpenAI Symphony. **Commit 1: OpenClaw adapter refactor** - Restructured Rust payload types (`ChatPayload` → `AgentPayload`, simplified stream handling) - Switched from `chat.send` to `agent` method for gateway communication - Added timeout handling (`timeout_ms`, `wait_timeout_ms`) to send requests - Updated dispatcher routes for new adapter shape - Frontend: suppress model/reasoning selection when agent is "openclaw" (backend manages config) - Hide model controls in DispatcherPreferenceChips for OpenClaw agent - Added test for OpenClaw model selection suppression **Commit 2: Board UI redesign** - New `CIBadge` component: standalone CI status badge with individual check list - New `DiffSizeBadge` component: compact diff size pill (+/- with XS-XL label) - Complete `SessionCard` rewrite: attention-aware styling, alert pills, quick reply, done variant, dynamic borders - `WorkspaceOverview`: replaced simple session buttons with rich SessionCard components - Symphony-style stat card captions ("Agent sessions in progress", "Cleared to land", etc.) - Kanban column captions ("Agents working", "Human needed", "Ready to land") - Added `additions`/`deletions` fields to `DashboardPR` type Closes # ## User-Facing Release Notes - Session cards now show CI status, diff size, alert pills, and inline agent messaging directly on the card - Dashboard stat cards include descriptive captions explaining each metric - Kanban board columns show contextual subtitles explaining their purpose - Done sessions render as compact cards with expandable detail panels - Cards visually indicate priority: merge-ready sessions get a green border glow ## Type of Change - [ ] Bug fix - [x] New feature - [ ] Breaking change - [ ] Plugin addition / modification - [ ] Documentation update - [ ] Refactor / chore ## Checklist - [x] `pnpm build` passes with no errors - [x] `pnpm typecheck` passes with no errors - [x] `cargo clippy --workspace -- -D warnings` passes - [x] `cargo test --workspace` passes - [x] No `any` types introduced without justification - [x] No secrets or credentials committed - [x] User-facing release notes are filled in plain English - [x] PR title follows conventional commits ## Testing ```bash # Rust checks cargo clippy --workspace -- -D warnings cargo test --workspace # Frontend checks bun run --cwd packages/core build bun run --cwd packages/cli build bun run --cwd packages/web build bun run --cwd packages/web typecheck ``` ## Screenshots / Demo Board column captions and new session cards are visible in the dashboard at `http://localhost:3000`. Cards show: - Activity dot with pulse animation for active agents - CI status badges and diff size pills - Color-coded alert pills for issues needing attention - Inline quick-reply for messaging agents - Done card variant with expandable detail panel ## Summary by CodeRabbit ## Release Notes * **New Features** * Added CI status badge component with dynamic check counts and links to pull request checks * Added diff size badge displaying additions and deletions * **Improvements** * Refactored session cards with new interactive selection model and consolidated badge displays * Enhanced workspace dashboard with captions for statistics and improved session list layout * Workspace board now displays role captions beneath column headers * Optimized model and reasoning control visibility for better UI flow * Updated OpenClaw installation guidance to reference backend runtime settings ## Changed files - `crates/conductor-executors/src/agents/openclaw.rs` (modified, +299/-209) - `crates/conductor-server/src/routes/agents.rs` (modified, +1/-1) - `crates/conductor-server/src/routes/dispatcher.rs` (modified, +43/-4) - `packages/web/src/components/CIBadge.tsx` (added, +140/-0) - `packages/web/src/components/DiffSizeBadge.tsx` (added, +30/-0) - `packages/web/src/components/SessionCard.tsx` (modified, +438/-277) - `packages/web/src/components/board/WorkspaceKanban.tsx` (modified, +17/-0) - `packages/web/src/components/dispatcher/DispatcherPreferenceChips.tsx` (modified, +58/-55) - `packages/

openclaw2026-03-22 08:24:12

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#52147•Fetched 2026-04-08 01:15:05

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Participants

Timeline (top)

commented ×3cross-referenced ×3mentioned ×1subscribed ×1

Agent run timeout during long tool execution (e.g. process(poll)) is misclassified as "LLM request timed out", triggering unnecessary model fallback — even though the primary model responded correctly.

Error Message

WARN embedded run timeout: runId=<redacted> sessionId=<redacted> timeoutMs=600000 DEBUG run cleanup: runId=<redacted> sessionId=<redacted> aborted=true timedOut=true WARN embedded_run_failover_decision: stage=assistant decision=fallback_model failoverReason=timeout provider=anthropic model=claude-opus-4-6 timedOut=true status=408 ERROR lane task error: lane=main durationMs=630119 error="FailoverError: LLM request timed out."

Root Cause

Failover condition (line 111112):

if (!aborted && failoverFailure || timedOut && !timedOutDuringCompaction) {
    // → enters fallback branch even when timeout was caused by tool execution
}

PR fix notes

PR #320: feat: OpenClaw adapter refactor + board UI redesign

Repository: charannyk06/conductor-oss
Author: charannyk06
State: closed | merged: True
Link: https://github.com/charannyk06/conductor-oss/pull/320

Description (problem / solution / changelog)

Summary

Two-part PR combining OpenClaw adapter improvements with a board UI redesign inspired by Composio Agent Orchestrator and OpenAI Symphony.

Commit 1: OpenClaw adapter refactor

Restructured Rust payload types (ChatPayload → AgentPayload, simplified stream handling)
Switched from chat.send to agent method for gateway communication
Added timeout handling (timeout_ms, wait_timeout_ms) to send requests
Updated dispatcher routes for new adapter shape
Frontend: suppress model/reasoning selection when agent is "openclaw" (backend manages config)
Hide model controls in DispatcherPreferenceChips for OpenClaw agent
Added test for OpenClaw model selection suppression

Commit 2: Board UI redesign

New CIBadge component: standalone CI status badge with individual check list
New DiffSizeBadge component: compact diff size pill (+/- with XS-XL label)
Complete SessionCard rewrite: attention-aware styling, alert pills, quick reply, done variant, dynamic borders
WorkspaceOverview: replaced simple session buttons with rich SessionCard components
Symphony-style stat card captions ("Agent sessions in progress", "Cleared to land", etc.)
Kanban column captions ("Agents working", "Human needed", "Ready to land")
Added additions/deletions fields to DashboardPR type

Closes #

User-Facing Release Notes

Session cards now show CI status, diff size, alert pills, and inline agent messaging directly on the card
Dashboard stat cards include descriptive captions explaining each metric
Kanban board columns show contextual subtitles explaining their purpose
Done sessions render as compact cards with expandable detail panels
Cards visually indicate priority: merge-ready sessions get a green border glow

Type of Change

Checklist

pnpm build passes with no errors
pnpm typecheck passes with no errors
cargo clippy --workspace -- -D warnings passes
cargo test --workspace passes
No any types introduced without justification
No secrets or credentials committed
User-facing release notes are filled in plain English
PR title follows conventional commits

Testing

# Rust checks
cargo clippy --workspace -- -D warnings
cargo test --workspace

# Frontend checks
bun run --cwd packages/core build
bun run --cwd packages/cli build
bun run --cwd packages/web build
bun run --cwd packages/web typecheck

Screenshots / Demo

Board column captions and new session cards are visible in the dashboard at http://localhost:3000. Cards show:

Activity dot with pulse animation for active agents
CI status badges and diff size pills
Color-coded alert pills for issues needing attention
Inline quick-reply for messaging agents
Done card variant with expandable detail panel

Summary by CodeRabbit

Release Notes

New Features
- Added CI status badge component with dynamic check counts and links to pull request checks
- Added diff size badge displaying additions and deletions
Improvements
- Refactored session cards with new interactive selection model and consolidated badge displays
- Enhanced workspace dashboard with captions for statistics and improved session list layout
- Workspace board now displays role captions beneath column headers
- Optimized model and reasoning control visibility for better UI flow
- Updated OpenClaw installation guidance to reference backend runtime settings

Changed files

crates/conductor-executors/src/agents/openclaw.rs (modified, +299/-209)
crates/conductor-server/src/routes/agents.rs (modified, +1/-1)
crates/conductor-server/src/routes/dispatcher.rs (modified, +43/-4)
packages/web/src/components/CIBadge.tsx (added, +140/-0)
packages/web/src/components/DiffSizeBadge.tsx (added, +30/-0)
packages/web/src/components/SessionCard.tsx (modified, +438/-277)
packages/web/src/components/board/WorkspaceKanban.tsx (modified, +17/-0)
packages/web/src/components/dispatcher/DispatcherPreferenceChips.tsx (modified, +58/-55)
packages/web/src/features/dashboard/components/WorkspaceOverview.tsx (modified, +14/-108)
packages/web/src/lib/agentModelSelection.test.ts (modified, +17/-0)
packages/web/src/lib/agentModelSelection.ts (modified, +28/-0)
packages/web/src/lib/knownAgents.ts (modified, +1/-1)
packages/web/src/lib/types.ts (modified, +2/-0)

Code Example

WARN  embedded run timeout: runId=<redacted> sessionId=<redacted> timeoutMs=600000
DEBUG run cleanup: runId=<redacted> sessionId=<redacted> aborted=true timedOut=true
WARN  embedded_run_failover_decision: stage=assistant decision=fallback_model failoverReason=timeout provider=anthropic model=claude-opus-4-6 timedOut=true status=408
ERROR lane task error: lane=main durationMs=630119 error="FailoverError: LLM request timed out."

---

// Line 109601-109602
let timedOut = false;
let timedOutDuringCompaction = false;

---

if (!aborted && failoverFailure || timedOut && !timedOutDuringCompaction) {
    // → enters fallback branch even when timeout was caused by tool execution
}

---

if (timedOut && !timedOutDuringCompaction && payloads.length === 0) return {
    payloads: [{ text: "Request timed out before a response was generated...", isError: true }],
    // ...
};

---

// Current: only compaction is exempt
if (timedOut && !timedOutDuringCompaction) { /* failover */ }

// Proposed: tool execution also exempt
if (timedOut && !timedOutDuringCompaction && !timedOutDuringToolExecution) { /* failover */ }

// Or better: general timeout cause
if (timedOut && timeoutCause === 'llm_request') { /* failover */ }

RAW_BUFFERClick to expand / collapse

Bug type

Agent behavior

Summary

Steps to reproduce

Configure an agent with claude-opus-4-6 as primary model and a fallback model chain
In a session, have the agent spawn a background process via exec (background: true)
Have the agent monitor the process using process(poll) with 2-minute poll intervals
Wait for the total run time to exceed DEFAULT_AGENT_TIMEOUT_SECONDS (600s / 10 minutes)
Observe: the run is aborted with "FailoverError: LLM request timed out." and falls back to the next model in the chain

Alternatively: any tool call that takes a long time (browser automation, long exec, repeated process polling) can trigger the same behavior.

Expected behavior

Tool execution time should not count toward the LLM request timeout. The primary model had already responded successfully and the agent was executing tool calls — no LLM request was in flight when the timeout fired.

Precedent: PR #46889 added timedOutDuringCompaction to exempt compaction operations from triggering failover. Long-running tool execution should receive the same treatment.

Actual behavior

The run-level timer fires after 600s regardless of what the agent is doing. When it fires during tool execution:

abortRun(true) sets timedOut = true
The failover decision checks timedOut && !timedOutDuringCompaction → enters fallback branch
A FailoverError: LLM request timed out. is thrown
The system falls over to the next configured model
The fallback model receives the full context (~61K tokens) but has no useful work to do (the original task was already being handled)

Gateway log sequence (redacted):

WARN  embedded run timeout: runId=<redacted> sessionId=<redacted> timeoutMs=600000
DEBUG run cleanup: runId=<redacted> sessionId=<redacted> aborted=true timedOut=true
WARN  embedded_run_failover_decision: stage=assistant decision=fallback_model failoverReason=timeout provider=anthropic model=claude-opus-4-6 timedOut=true status=408
ERROR lane task error: lane=main durationMs=630119 error="FailoverError: LLM request timed out."

The primary model (Opus) had completed its response. The agent was in a process(poll) loop monitoring a background worker:

Poll 1: 2 min wait → got output
Poll 2: 2 min wait → got output
Poll 3: 2 min wait → got output
Poll 4: 2 min wait → got output
Poll 5: started, aborted at ~17s by run timeout

Total tool execution time: ~10 minutes. No LLM request was pending.

OpenClaw version

2026.3.13 (61d171a)

Operating system

Ubuntu 24.04.4 LTS, Linux 6.17.0-19-generic x86_64

Install method

npm (global)

Model

claude-opus-4-6 (Anthropic) — primary model that was "timed out"

Provider / routing chain

anthropic/claude-opus-4-6 → openrouter/google/gemini-3.1-pro-preview (fallback #1) → openrouter/openai/gpt-5.4 (fallback #2) → ollama/qwen3.5:9b (fallback #3)

Additional provider/model setup details

Default agent timeout is the built-in DEFAULT_AGENT_TIMEOUT_SECONDS = 600. No custom agents.defaults.timeoutSeconds override was set.

The fallback model (gemini-3.1-pro-preview via OpenRouter) consumed ~61K input tokens at $0.145 cost but produced 0 completion tokens — confirming it had no useful work to do.

Logs, screenshots, and evidence

Source code analysis (minified bundle `auth-profiles-DDVivXkv.js`)

Timeout declaration and abort:

// Line 109601-109602
let timedOut = false;
let timedOutDuringCompaction = false;

Failover condition (line 111112):

if (!aborted && failoverFailure || timedOut && !timedOutDuringCompaction) {
    // → enters fallback branch even when timeout was caused by tool execution
}

User-facing error (line 111176):

if (timedOut && !timedOutDuringCompaction && payloads.length === 0) return {
    payloads: [{ text: "Request timed out before a response was generated...", isError: true }],
    // ...
};

Note: timedOutDuringCompaction is the only exemption. There is no timedOutDuringToolExecution or equivalent.

Compaction precedent (PR #46889)

The timedOutDuringCompaction mechanism proves the design intent: certain non-LLM operations should not trigger failover. Tool execution is a missing case.

Impact and severity

Affected: Any agent using long-running tools (process polling, browser automation, multi-minute exec tasks) with a fallback model chain configured
Severity: Blocks workflow + unnecessary cost — the fallback model is invoked with the full conversation context but produces no useful output
Frequency: Deterministic — any tool execution exceeding DEFAULT_AGENT_TIMEOUT_SECONDS (600s) will trigger this
Consequence: (1) Wasted API cost on the fallback model, (2) original task is interrupted mid-execution, (3) misleading "LLM request timed out" error when the LLM was not involved

Additional information

Suggested fix

Add a timedOutDuringToolExecution flag (or refactor to a general timeoutCause enum) so tool execution time is exempt from the failover path, consistent with the existing compaction exemption:

// Current: only compaction is exempt
if (timedOut && !timedOutDuringCompaction) { /* failover */ }

// Proposed: tool execution also exempt
if (timedOut && !timedOutDuringCompaction && !timedOutDuringToolExecution) { /* failover */ }

// Or better: general timeout cause
if (timedOut && timeoutCause === 'llm_request') { /* failover */ }

Alternatively, the run deadline could be extended while tool execution is actively in flight (same approach as PR #46889 does for compaction).

extent analysis

Fix Plan

To resolve the issue, we need to add a mechanism to exempt tool execution time from triggering the failover path. We can achieve this by introducing a timedOutDuringToolExecution flag or a more general timeoutCause enum.

Step-by-Step Solution:

Add a timedOutDuringToolExecution flag: Introduce a new flag to track whether the timeout occurred during tool execution.
Update the failover condition: Modify the failover condition to check for both timedOutDuringCompaction and timedOutDuringToolExecution.
Set the timedOutDuringToolExecution flag: Update the code to set the timedOutDuringToolExecution flag when a timeout occurs during tool execution.

Example Code:

// Add a timedOutDuringToolExecution flag
let timedOutDuringToolExecution = false;

// Update the failover condition
if (timedOut && !timedOutDuringCompaction && !timedOutDuringToolExecution) {
    // Enter fallback branch
}

// Set the timedOutDuringToolExecution flag
if (toolExecutionInProgress && timedOut) {
    timedOutDuringToolExecution = true;
}

Alternatively, you can use a timeoutCause enum to make the code more scalable:

// Define a timeoutCause enum
const TimeoutCause = {
    LLM_REQUEST: 'llm_request',
    TOOL_EXECUTION: 'tool_execution',
    COMPACTION: 'compaction'
};

// Update the failover condition
if (timedOut && timeoutCause === TimeoutCause.LLM_REQUEST) {
    // Enter fallback branch
}

// Set the timeoutCause
if (toolExecutionInProgress && timedOut) {
    timeoutCause = TimeoutCause.TOOL_EXECUTION;
}

Verification

To verify that the fix worked, you can test the following scenarios:

Run a tool execution that exceeds the DEFAULT_AGENT_TIMEOUT_SECONDS (600s) and verify that the failover path is not triggered.
Run a tool execution that completes within the DEFAULT_AGENT_TIMEOUT_SECONDS (600s) and verify that the failover path is not triggered.
Run an LLM request that times out and verify that the failover path is triggered.

Extra Tips

Make sure to update the documentation to reflect the changes made to the failover condition.
Consider adding logging to track the timedOutDuringToolExecution flag or timeoutCause enum to help with debugging.
Review the code to ensure that the timedOutDuringToolExecution flag or timeoutCause enum is properly reset after each tool execution or LLM request.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

Precedent: PR #46889 added timedOutDuringCompaction to exempt compaction operations from triggering failover. Long-running tool execution should receive the same treatment.

#api #ssr #retrieval issue #search optimization #request timeout

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

openclaw - ✅(Solved) Fix Agent run timeout during tool execution misclassified as LLM timeout, triggers unnecessary model fallback [1 pull requests, 3 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

PR fix notes

PR #320: feat: OpenClaw adapter refactor + board UI redesign

Description (problem / solution / changelog)

Summary

User-Facing Release Notes

Type of Change

Checklist

Testing

Screenshots / Demo

Summary by CodeRabbit

Release Notes

Changed files

Code Example

Bug type

Summary

Steps to reproduce

Expected behavior

Actual behavior

OpenClaw version

Operating system

Install method

Model

Provider / routing chain

Additional provider/model setup details

Logs, screenshots, and evidence

Source code analysis (minified bundle auth-profiles-DDVivXkv.js)

Compaction precedent (PR #46889)

Impact and severity

Additional information

Suggested fix

extent analysis

Fix Plan

Step-by-Step Solution:

Example Code:

Verification

Extra Tips

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING

Source code analysis (minified bundle `auth-profiles-DDVivXkv.js`)