openclaw - ✅(Solved) Fix: LLM requests intermittently time out at ~65s in 2026.4.2 despite SDK default of 600s [1 pull request, 2 comments, 1 participant]

GitHub stats
openclaw/openclaw#60174 (fetched 2026-04-08 02:35:29)
Comments: 2 · Participants: 1 · Timeline events: 9 · Reactions: 0
Timeline (top): commented ×2, subscribed ×2, closed ×1, cross-referenced ×1

OpenClaw 2026.4.2 introduces intermittent ~65-second LLM request timeouts that did not occur in 2026.4.1. The timeout hits every configured provider in the same run, which points to a client-side cause rather than an upstream API issue.

Error Message

[agent/embedded] Profile github-copilot:github timed out. Trying next account...
[agent/embedded] embedded run failover decision: stage=assistant decision=fallback_model reason=timeout
[diagnostic] lane task error: lane=main durationMs=65831 error="FailoverError: LLM request timed out."
[model-fallback/decision] model fallback decision: decision=candidate_failed reason=unknown
Embedded agent failed before reply: All models failed (3): github-copilot/claude-opus-4.6: LLM request timed out. (unknown) | local-lb/claude-opus-4-6: LLM request timed out. (unknown) | github-copilot/claude-opus-4.6-1m: LLM request timed out. (unknown)

Root Cause

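PR #60363 attributes the timeouts to transport selection: when Claude models are configured under the github-copilot provider without an explicit api field, OpenClaw defaults to openai-responses, which does not use Anthropic prompt caching. Time-to-first-token then grows with context size, and for large sessions the prefill outlasts the client-side DEFAULT_LLM_IDLE_TIMEOUT_MS (60s), producing "FailoverError: LLM request timed out" long before the SDK's 600s default or the configured timeoutSeconds would apply.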

Fix Action

Workaround

Downgrade to 2026.4.1:

sudo npm i -g openclaw@2026.4.1
openclaw gateway restart

PR fix notes

PR #60363: fix(providers): auto-select anthropic-messages API for Claude models on GitHub Copilot

Description (problem / solution / changelog)

Problem

When Claude models are configured under the github-copilot provider without an explicit api field, OpenClaw defaults to openai-responses (OpenAI chat/completions format). This format does not support Anthropic prompt caching, causing:

  1. Time-to-first-token scales linearly with context size: without caching, every request prefills the full context
  2. LLM idle timeouts for large sessions: at ~100k+ tokens, prefill exceeds DEFAULT_LLM_IDLE_TIMEOUT_MS (60s), causing FailoverError: LLM request timed out
  3. All fallback models fail in sequence, because the timeout is enforced on the client side, not by the API

This affects any user who configures GitHub Copilot with Claude models without explicitly setting api: anthropic-messages.
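Point 2 explains why neither the SDKs' 600s default nor timeoutSeconds: 900 helps: an idle deadline that restarts on every streamed chunk can fire long before any overall deadline if the first token never arrives. The sketch below is a hypothetical model, not OpenClaw's code; only the 60s value comes from DEFAULT_LLM_IDLE_TIMEOUT_MS as named in the PR, while simulateStream, the Outcome type, and treating 900s as the overall limit are illustrative assumptions.

```typescript
// Hypothetical model of an idle timeout vs. an overall timeout; not OpenClaw's code.
// All times are milliseconds since the request was sent.

const IDLE_TIMEOUT_MS = 60_000; // the 60s DEFAULT_LLM_IDLE_TIMEOUT_MS named in the PR
const OVERALL_TIMEOUT_MS = 900_000; // assumed to mirror agents.defaults.timeoutSeconds: 900

type Outcome =
  | { kind: "completed"; atMs: number }
  | { kind: "idle-timeout"; atMs: number }
  | { kind: "overall-timeout"; atMs: number };

// Walks through chunk arrivals (sorted ascending) and the end of the stream.
// The idle deadline restarts at request start and after every chunk;
// the overall deadline is fixed. Whichever deadline passes first wins.
function simulateStream(
  chunkArrivalsMs: number[],
  endMs: number,
  idleMs: number = IDLE_TIMEOUT_MS,
  overallMs: number = OVERALL_TIMEOUT_MS,
): Outcome {
  let lastActivityMs = 0; // sending the request counts as activity
  for (const t of [...chunkArrivalsMs, endMs]) {
    const idleDeadline = lastActivityMs + idleMs;
    if (t > Math.min(idleDeadline, overallMs)) {
      return idleDeadline <= overallMs
        ? { kind: "idle-timeout", atMs: idleDeadline }
        : { kind: "overall-timeout", atMs: overallMs };
    }
    lastActivityMs = t;
  }
  return { kind: "completed", atMs: endMs };
}

// Small context: first token after ~3s, then steady streaming.
console.log(simulateStream([3_250, 3_750, 4_250], 10_000));
// Large context: prefill delays the first token to 70s.
console.log(simulateStream([70_000, 71_000], 80_000));
```

In this model the first call completes at 10,000 ms, while the second reports an idle timeout at 60,000 ms even though the overall limit is 900,000 ms.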

Solution

Add inferAnthropicApiForCopilotClaude() in src/agents/pi-embedded-runner/model.ts that automatically selects anthropic-messages as the transport API when:

  • Provider is github-copilot
  • Model ID contains claude

The explicit api config field still takes precedence (non-breaking).
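The selection rule can be sketched as follows. This is a hypothetical reconstruction from the PR description only: the ModelRef type, the resolveApi helper, and "openai-responses" as the final fallback are assumptions, and the real inferAnthropicApiForCopilotClaude() in src/agents/pi-embedded-runner/model.ts may take different arguments.

```typescript
// Hypothetical reconstruction of the rule described in PR #60363.
// ModelRef, resolveApi, and the "openai-responses" fallback are assumptions.

interface ModelRef {
  provider: string; // e.g. "github-copilot"
  id: string; // e.g. "claude-opus-4.6-1m"
}

// Infers anthropic-messages only for Claude models served by github-copilot.
function inferAnthropicApiForCopilotClaude(model: ModelRef): string | undefined {
  const isCopilot = model.provider === "github-copilot";
  const isClaude = model.id.includes("claude");
  return isCopilot && isClaude ? "anthropic-messages" : undefined;
}

// Precedence: explicit config value, then the inferred value, then the pre-fix default.
function resolveApi(model: ModelRef, explicitApi?: string): string {
  return explicitApi ?? inferAnthropicApiForCopilotClaude(model) ?? "openai-responses";
}

console.log(resolveApi({ provider: "github-copilot", id: "claude-opus-4.6-1m" }));
```

Because resolveApi checks the explicit value first, a user who already sets api keeps that behaviour, which is what makes the change non-breaking.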

Benchmark

Tested on GitHub Copilot Enterprise (api.enterprise.githubcopilot.com), streaming mode, same prompt content:

Prompt size    Anthropic API (TTFB)   OpenAI API (TTFB)   Speedup
~3k tokens     0.89s                  3.25s               3.6x
~15k tokens    0.71s                  4.75s               6.7x
~60k tokens    0.69s                  9.47s               13.5x

With Anthropic Messages API, TTFB stays constant (~0.7-0.9s) regardless of prompt size due to prompt caching. With OpenAI API, it grows linearly.
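In the Anthropic Messages API, prompt caching is requested per content block with a cache_control marker; a later request that repeats the same prefix up to that block can reuse it instead of prefilling it again, provided the prefix exceeds a model-specific minimum length. The sketch below only builds a request body to show where the marker goes. Whether OpenClaw or the GitHub Copilot endpoint places breakpoints this way is an assumption, as are the helper name buildCachedRequest and the max_tokens value.

```typescript
// Illustrative Anthropic Messages API request body with a cache breakpoint.
// Helper name and max_tokens are placeholders; no network call is made.

interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface Message {
  role: "user" | "assistant";
  content: TextBlock[];
}

interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: TextBlock[];
  messages: Message[];
}

function buildCachedRequest(
  model: string,
  systemPrompt: string,
  turns: { role: "user" | "assistant"; text: string }[],
): MessagesRequest {
  const messages = turns.map(
    (t): Message => ({ role: t.role, content: [{ type: "text", text: t.text }] }),
  );
  // Mark the newest block as a cache breakpoint: the prefix up to here may be
  // cached, so a follow-up request sharing this prefix only prefills new tokens.
  const last = messages[messages.length - 1];
  if (last) {
    last.content[last.content.length - 1].cache_control = { type: "ephemeral" };
  }
  return {
    model,
    max_tokens: 1024, // placeholder
    system: [{ type: "text", text: systemPrompt }],
    messages,
  };
}

const body = buildCachedRequest("claude-opus-4.6", "You are a helpful agent.", [
  { role: "user", text: "Summarize the log above." },
]);
console.log(JSON.stringify(body, null, 2));
```

Only the last block carries the marker, so each new turn extends a prefix that the previous request may already have cached.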

Impact

  • Eliminates unexpected timeouts for GitHub Copilot + Claude users with large contexts
  • ~4-14x faster time-to-first-token depending on context size
  • Zero config change required for existing users
  • Users who explicitly set api are unaffected

Fixes #60174

Changed files

  • src/agents/pi-embedded-runner/model.ts (modified, +36/-5)



Environment

  • OpenClaw version: 2026.4.2 (d74a122) — regressed; 2026.4.1 (da64a97) works fine
  • OS: Ubuntu Linux (x64)
  • Node: v22.22.0
  • Providers tested: GitHub Copilot (claude-opus-4.6, claude-opus-4.6-1m) and custom Anthropic-compatible LB (local-lb/claude-opus-4-6)
  • Channel: Feishu (WebSocket mode)

Problem

LLM requests time out after roughly 65 seconds (±5s) with FailoverError: LLM request timed out. The configured agents.defaults.timeoutSeconds: 900 has no effect on this timeout.

Key observations

  1. All providers fail: the GitHub Copilot API, the local Anthropic-compatible LB, and the fallback models each time out at ~65s, one after another
  2. Direct API calls work fine: curl to the same endpoints from the same machine returns in 2-9 seconds (tested with 30k-token prompts)
  3. Intermittent pattern: short sessions after /new often work; failures increase as session context grows (50k+ tokens)
  4. Both SDKs default to 600s: the OpenAI SDK and the Anthropic SDK each define DEFAULT_TIMEOUT=6e5 (600,000 ms)
  5. Downgrading to 2026.4.1 resolves the issue

Timeline from logs (2026-04-03, all times CST)

Time    Provider          Duration   Result
14:00   copilot/4.6-1m    67s        ❌ timeout
14:04   copilot           14s        ✅ ok
14:05   copilot           ~90s       ❌ timeout
14:10   copilot           29s        ✅ ok
14:17   copilot           2s         ✅ ok (short reply)
15:00   all 3 models      65s ×3     ❌ all failed
15:24   all 3 models      65s ×3     ❌ all failed
15:35   all 3 models      65s ×3     ❌ all failed
16:18   copilot/4.6-1m    30s        ✅ ok (after restart)


Config (relevant parts)

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "github-copilot/claude-opus-4.6-1m",
        "fallbacks": ["local-lb/claude-opus-4-6"]
      },
      "timeoutSeconds": 900,
      "contextTokens": 900000
    }
  },
  "models": {
    "providers": {
      "local-lb": {
        "baseUrl": "http://172.16.2.4:8000",
        "api": "anthropic-messages"
      },
      "github-copilot": {
        "baseUrl": "https://api.enterprise.githubcopilot.com",
        "headers": {
          "Editor-Version": "vscode/1.111.0",
          "Editor-Plugin-Version": "copilot-chat/0.39.1",
          "Copilot-Integration-Id": "vscode-chat"
        }
      }
    }
  }
}
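PR #60363 notes that only setups without an explicit api field on the github-copilot provider are affected, so adding "api": "anthropic-messages" to that provider (as the local-lb entry already does) may work as an interim workaround on 2026.4.2. The helper below is hypothetical and not part of OpenClaw; it applies that edit to a parsed copy of the config above and leaves an existing api value untouched. A provider-level api presumably also applies to any non-Claude models served through github-copilot, which this config does not use.

```typescript
// Hypothetical helper, not part of OpenClaw: returns a copy of a parsed config
// with an explicit "api" on the github-copilot provider, if none is set yet.

interface ProviderConfig {
  baseUrl?: string;
  api?: string;
  headers?: Record<string, string>;
}

interface OpenClawConfig {
  models?: { providers?: Record<string, ProviderConfig> };
  [key: string]: unknown;
}

function withCopilotAnthropicApi(config: OpenClawConfig): OpenClawConfig {
  const providers = config.models?.providers ?? {};
  const copilot = providers["github-copilot"];
  // Leave the config alone if there is no copilot provider or api is already explicit.
  if (!copilot || copilot.api) return config;
  return {
    ...config,
    models: {
      ...config.models,
      providers: { ...providers, "github-copilot": { ...copilot, api: "anthropic-messages" } },
    },
  };
}

const patched = withCopilotAnthropicApi({
  models: {
    providers: {
      "local-lb": { baseUrl: "http://172.16.2.4:8000", api: "anthropic-messages" },
      "github-copilot": { baseUrl: "https://api.enterprise.githubcopilot.com" },
    },
  },
});
console.log(JSON.stringify(patched.models, null, 2));
```

The input object is not mutated, so the original config can be kept for comparison before writing the result back to disk.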

Reproduction

  1. Install OpenClaw 2026.4.2
  2. Configure GitHub Copilot + any Anthropic-compatible provider as fallback
  3. Have a multi-turn conversation (5+ turns, accumulating ~50k tokens of context)
  4. Observe ~65s timeouts on LLM requests, with all providers failing in sequence


Expected behavior

LLM requests should respect the SDK default timeout of 600s (or the configured timeoutSeconds), not an implicit ~65s limit.

Extent analysis

TL;DR

Downgrade to OpenClaw version 2026.4.1 to resolve the intermittent 65-second LLM request timeouts.

Guidance

  • The issue seems to be specific to OpenClaw version 2026.4.2, as downgrading to 2026.4.1 resolves the problem.
  • The fact that all providers fail simultaneously with a consistent timeout of ~65 seconds suggests a client-side issue rather than an upstream API problem.
  • The configured timeoutSeconds of 900 has no effect on this timeout, implying a hardcoded or implicit client-side timeout in OpenClaw; PR #60363 identifies it as DEFAULT_LLM_IDLE_TIMEOUT_MS (60s).
  • To verify the issue, try reproducing the problem with a multi-turn conversation and observe the timeouts.

Example

No code change is needed for the downgrade itself. The underlying fix, PR #60363, changes how the transport API is selected in src/agents/pi-embedded-runner/model.ts.

Notes

The issue report itself does not state a root cause. PR #60363 later attributes it to Claude models on the github-copilot provider defaulting to the openai-responses API, which skips Anthropic prompt caching and lets long prefills exceed the 60s idle timeout. Downgrading to 2026.4.1 is effective but is a stopgap, not a long-term solution.

Recommendation

Apply the workaround by downgrading to OpenClaw 2026.4.1. Treat it as temporary, and move to a release that includes the fix from PR #60363 once one is available.
