openclaw - ✅(Solved) Fix LLM requests intermittently timeout at ~65s in 2026.4.2 despite SDK default of 600s [1 pull requests, 2 comments, 1 participants]

openclaw 2026-04-03 08:42:55


openclaw/openclaw#60174•Fetched 2026-04-08 02:35:29


Author

ottodeng

Participants

ottodeng

Timeline (top)

commented ×2, subscribed ×2, closed ×1, cross-referenced ×1

OpenClaw 2026.4.2 introduces intermittent ~65-second LLM request timeouts that did not occur in 2026.4.1. The timeout affects all configured providers simultaneously, ruling out upstream API issues.

Error Message

[agent/embedded] Profile github-copilot:github timed out. Trying next account...
[agent/embedded] embedded run failover decision: stage=assistant decision=fallback_model reason=timeout
[diagnostic] lane task error: lane=main durationMs=65831 error="FailoverError: LLM request timed out."
[model-fallback/decision] model fallback decision: decision=candidate_failed reason=unknown
Embedded agent failed before reply: All models failed (3): github-copilot/claude-opus-4.6: LLM request timed out. (unknown) | local-lb/claude-opus-4-6: LLM request timed out. (unknown) | github-copilot/claude-opus-4.6-1m: LLM request timed out. (unknown)

Root Cause

Fix Action

Workaround

Downgrade to 2026.4.1:

sudo npm i -g openclaw@2026.4.1
openclaw gateway restart

PR fix notes

PR #60363: fix(providers): auto-select anthropic-messages API for Claude models on GitHub Copilot

Repository: openclaw/openclaw
Author: ottodeng
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/60363

Description (problem / solution / changelog)

Problem

When Claude models are configured under the github-copilot provider without an explicit api field, OpenClaw defaults to openai-responses (OpenAI chat/completions format). This format does not support Anthropic prompt caching, causing:

Time-to-first-token scales linearly with context size — no caching means full prefill every request
LLM idle timeouts for large sessions — at ~100k+ tokens, prefill exceeds DEFAULT_LLM_IDLE_TIMEOUT_MS (60s), causing FailoverError: LLM request timed out
All fallback models fail in sequence — because the timeout is on the client side, not the API

This affects any user who configures GitHub Copilot with Claude models without explicitly setting api: anthropic-messages.
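To make the client-side nature of the failure concrete, here is a minimal sketch of how an idle timer can end a request long before a much larger overall timeout applies. It is a simplified model, not OpenClaw's code: `simulateRequest`, `Outcome`, and the way the two timers interact are assumptions for illustration. Only the 60s idle value (DEFAULT_LLM_IDLE_TIMEOUT_MS in the PR notes) and the 600s SDK default come from this page.

```typescript
// Hypothetical model of two client-side timers. Names are illustrative,
// not OpenClaw's API.
type Outcome =
  | { kind: "completed"; atMs: number }
  | { kind: "idle-timeout"; atMs: number }
  | { kind: "total-timeout"; atMs: number };

/**
 * chunkTimesMs: sorted arrival times of streamed chunks, in ms since the
 * request was sent. The last entry marks the end of the response.
 * idleMs: maximum allowed gap without any chunk.
 * totalMs: maximum allowed duration of the whole request.
 */
function simulateRequest(
  chunkTimesMs: number[],
  idleMs: number,
  totalMs: number,
): Outcome {
  let lastActivity = 0; // the idle timer starts when the request is sent

  const timeoutFrom = (since: number): Outcome => {
    const idleDeadline = since + idleMs;
    return idleDeadline <= totalMs
      ? { kind: "idle-timeout", atMs: idleDeadline }
      : { kind: "total-timeout", atMs: totalMs };
  };

  // No chunk ever arrives: whichever timer is earlier fires.
  if (chunkTimesMs.length === 0) return timeoutFrom(lastActivity);

  for (const t of chunkTimesMs) {
    const deadline = Math.min(lastActivity + idleMs, totalMs);
    if (t > deadline) return timeoutFrom(lastActivity);
    lastActivity = t; // each chunk resets the idle timer
  }
  return { kind: "completed", atMs: lastActivity };
}

const IDLE_MS = 60_000; // value quoted in the PR notes
const SDK_TOTAL_MS = 600_000; // SDK default quoted in the issue

// First token after 70s of prefill: the idle timer fires first.
console.log(simulateRequest([70_000, 71_000], IDLE_MS, SDK_TOTAL_MS));
// First token after 9.47s: the request completes normally.
console.log(simulateRequest([9_470, 12_000], IDLE_MS, SDK_TOTAL_MS));
```

Under this model, raising the overall timeout (for example to 900s) changes nothing as long as the first chunk arrives later than the idle limit, which matches the reported behavior of timeoutSeconds.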

Solution

Add inferAnthropicApiForCopilotClaude() in src/agents/pi-embedded-runner/model.ts that automatically selects anthropic-messages as the transport API when:

Provider is github-copilot
Model ID contains claude

The explicit api config field still takes precedence (non-breaking).
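The selection rule described above can be sketched as follows. The PR notes give only the function name and the two conditions, so everything else here is an assumption: the `ModelRef` and `TransportApi` shapes, the name `resolveTransportApi`, the case-insensitive match on "claude", and the use of openai-responses as the fallback for every other case.

```typescript
// Hypothetical shapes: the real types in
// src/agents/pi-embedded-runner/model.ts are not shown in the PR notes.
type TransportApi = "anthropic-messages" | "openai-responses";

interface ModelRef {
  provider: string;
  modelId: string;
  api?: TransportApi; // explicit value from config, if the user set one
}

function resolveTransportApi(ref: ModelRef): TransportApi {
  // An explicit api field always wins, so existing configs are unaffected.
  if (ref.api !== undefined) return ref.api;

  // Assumption: the match on "claude" is case-insensitive.
  const isCopilotClaude =
    ref.provider === "github-copilot" &&
    ref.modelId.toLowerCase().includes("claude");

  // Assumption: openai-responses is the default for everything else.
  return isCopilotClaude ? "anthropic-messages" : "openai-responses";
}

console.log(
  resolveTransportApi({ provider: "github-copilot", modelId: "claude-opus-4.6" }),
);
```

Checking the explicit field first is what keeps the change non-breaking: inference only runs when the user has expressed no preference.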

Benchmark

Tested on GitHub Copilot Enterprise (api.enterprise.githubcopilot.com), streaming mode, same prompt content:

Prompt size	Anthropic API (TTFB)	OpenAI API (TTFB)	Speedup
~3k tokens	0.89s	3.25s	3.6x
~15k tokens	0.71s	4.75s	6.7x
~60k tokens	0.69s	9.47s	13.5x

With Anthropic Messages API, TTFB stays constant (~0.7-0.9s) regardless of prompt size due to prompt caching. With OpenAI API, it grows linearly.
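For readers who want to check the Speedup column, this small sketch recomputes the ratio from the TTFB values in the table. The type and function names are illustrative. Because the table's TTFB values are already rounded, the recomputed ratios can differ from the published column by a few tenths.

```typescript
interface BenchmarkRow {
  promptTokens: number; // approximate prompt size from the table
  anthropicTtfbS: number; // TTFB in seconds via the Anthropic Messages API
  openaiTtfbS: number; // TTFB in seconds via the OpenAI API
}

// Values copied from the benchmark table.
const rows: BenchmarkRow[] = [
  { promptTokens: 3_000, anthropicTtfbS: 0.89, openaiTtfbS: 3.25 },
  { promptTokens: 15_000, anthropicTtfbS: 0.71, openaiTtfbS: 4.75 },
  { promptTokens: 60_000, anthropicTtfbS: 0.69, openaiTtfbS: 9.47 },
];

// Speedup is the ratio of the slower TTFB to the faster one.
function speedup(row: BenchmarkRow): number {
  return row.openaiTtfbS / row.anthropicTtfbS;
}

for (const row of rows) {
  console.log(`~${row.promptTokens} tokens: ${speedup(row).toFixed(2)}x`);
}
```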

Impact

Eliminates unexpected timeouts for GitHub Copilot + Claude users with large contexts
~4-14x faster time-to-first-token depending on context size
Zero config change required for existing users
Users who explicitly set api are unaffected

Fixes #60174

Changed files

src/agents/pi-embedded-runner/model.ts (modified, +36/-5)


Summary

Environment

OpenClaw version: 2026.4.2 (d74a122) — regressed; 2026.4.1 (da64a97) works fine
OS: Ubuntu Linux (x64)
Node: v22.22.0
Providers tested: GitHub Copilot (claude-opus-4.6, claude-opus-4.6-1m) and custom Anthropic-compatible LB (local-lb/claude-opus-4-6)
Channel: Feishu (WebSocket mode)

Problem

LLM requests time out after roughly 65 seconds (65±5 s) with FailoverError: LLM request timed out. The configured agents.defaults.timeoutSeconds: 900 has no effect on this timeout.

Key observations

All providers fail together: GitHub Copilot API, local Anthropic-compatible LB, and fallback models all time out at ~65s in sequence
Direct API calls work fine: curl to the same endpoints from the same machine returns in 2-9 seconds (tested with 30k token prompts)
Intermittent pattern: short sessions after /new often work; failures increase as session context grows (50k+ tokens)
Both SDKs default to 600s: OpenAI SDK DEFAULT_TIMEOUT=6e5 and Anthropic SDK DEFAULT_TIMEOUT=6e5 are both 600,000ms
Downgrading to 2026.4.1 resolves the issue

Timeline from logs (2026-04-03, all times CST)

Time	Provider	Duration	Result
14:00	copilot/4.6-1m	67s	❌ timeout
14:04	copilot	14s	✅ ok
14:05	copilot	~90s	❌ timeout
14:10	copilot	29s	✅ ok
14:17	copilot	2s	✅ ok (short reply)
15:00	all 3 models	65s×3	❌ all failed
15:24	all 3 models	65s×3	❌ all failed
15:35	all 3 models	65s×3	❌ all failed
16:18	copilot/4.6-1m	30s	✅ ok (after restart)

Relevant log lines

[agent/embedded] Profile github-copilot:github timed out. Trying next account...
[agent/embedded] embedded run failover decision: stage=assistant decision=fallback_model reason=timeout
[diagnostic] lane task error: lane=main durationMs=65831 error="FailoverError: LLM request timed out."
[model-fallback/decision] model fallback decision: decision=candidate_failed reason=unknown
Embedded agent failed before reply: All models failed (3): github-copilot/claude-opus-4.6: LLM request timed out. (unknown) | local-lb/claude-opus-4-6: LLM request timed out. (unknown) | github-copilot/claude-opus-4.6-1m: LLM request timed out. (unknown)
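When comparing many such failures, it can help to pull the duration out of the [diagnostic] line programmatically. The sketch below assumes the line format is exactly as shown above. `parseLaneError` and `LaneError` are illustrative names, not part of OpenClaw.

```typescript
interface LaneError {
  lane: string;
  durationMs: number;
  error: string;
}

// Extracts lane, duration, and error text from a "lane task error" line.
// Returns null when the line does not match the assumed format.
function parseLaneError(line: string): LaneError | null {
  const m = /lane task error: lane=(\S+) durationMs=(\d+) error="([^"]*)"/.exec(line);
  if (m === null) return null;
  return { lane: m[1], durationMs: Number(m[2]), error: m[3] };
}

const sample =
  '[diagnostic] lane task error: lane=main durationMs=65831 error="FailoverError: LLM request timed out."';
console.log(parseLaneError(sample));
```

Collecting durationMs across failures makes it easy to confirm whether they cluster near one fixed value, which would point to a client-side timer rather than variable upstream latency.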

Config (relevant parts)

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "github-copilot/claude-opus-4.6-1m",
        "fallbacks": ["local-lb/claude-opus-4-6"]
      },
      "timeoutSeconds": 900,
      "contextTokens": 900000
    }
  },
  "models": {
    "providers": {
      "local-lb": {
        "baseUrl": "http://172.16.2.4:8000",
        "api": "anthropic-messages"
      },
      "github-copilot": {
        "baseUrl": "https://api.enterprise.githubcopilot.com",
        "headers": {
          "Editor-Version": "vscode/1.111.0",
          "Editor-Plugin-Version": "copilot-chat/0.39.1",
          "Copilot-Integration-Id": "vscode-chat"
        }
      }
    }
  }
}
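In this config the github-copilot provider has no api field, which is the condition the notes for PR #60363 describe. A rough way to spot that condition is sketched below. The `Config` shape is reduced to the fields shown above, and `claudeProvidersWithoutApi` is a hypothetical helper, not an OpenClaw function. Note that local-lb already sets api: anthropic-messages and was still reported as timing out, so this check covers only the case from the PR notes.

```typescript
interface ProviderConfig {
  baseUrl?: string;
  api?: string;
  headers?: Record<string, string>;
}

interface Config {
  agents: { defaults: { model: { primary: string; fallbacks: string[] } } };
  models: { providers: Record<string, ProviderConfig> };
}

// Returns providers that serve a configured Claude model but set no api field.
// Model references are assumed to have the form "provider/modelId".
function claudeProvidersWithoutApi(config: Config): string[] {
  const { primary, fallbacks } = config.agents.defaults.model;
  const flagged = new Set<string>();
  for (const ref of [primary, ...fallbacks]) {
    const slash = ref.indexOf("/");
    if (slash < 0) continue; // not in provider/modelId form
    const provider = ref.slice(0, slash);
    const modelId = ref.slice(slash + 1);
    const api = config.models.providers[provider]?.api;
    if (modelId.toLowerCase().includes("claude") && api === undefined) {
      flagged.add(provider);
    }
  }
  return Array.from(flagged);
}

const issueConfig: Config = {
  agents: {
    defaults: {
      model: {
        primary: "github-copilot/claude-opus-4.6-1m",
        fallbacks: ["local-lb/claude-opus-4-6"],
      },
    },
  },
  models: {
    providers: {
      "local-lb": { baseUrl: "http://172.16.2.4:8000", api: "anthropic-messages" },
      "github-copilot": { baseUrl: "https://api.enterprise.githubcopilot.com" },
    },
  },
};

console.log(claudeProvidersWithoutApi(issueConfig));
```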

Reproduction

Install OpenClaw 2026.4.2
Configure GitHub Copilot + any Anthropic-compatible provider as fallback
Have a multi-turn conversation (5+ turns, accumulating ~50k tokens of context)
Observe ~65s timeouts on LLM requests, with all providers failing in sequence

Workaround

Downgrade to 2026.4.1:

sudo npm i -g openclaw@2026.4.1
openclaw gateway restart

Expected behavior

LLM requests should respect the SDK default timeout of 600s (or the configured timeoutSeconds), not an implicit ~65s limit.

extent analysis

TL;DR

Downgrade to OpenClaw version 2026.4.1 to resolve the intermittent 65-second LLM request timeouts.

Guidance

The issue seems to be specific to OpenClaw version 2026.4.2, as downgrading to 2026.4.1 resolves the problem.
The fact that all providers fail simultaneously with a consistent timeout of ~65 seconds suggests a client-side issue rather than an upstream API problem.
The configured timeoutSeconds of 900 has no effect on this timeout, which points to a separate client-side limit in OpenClaw. The notes for PR #60363 name DEFAULT_LLM_IDLE_TIMEOUT_MS (60s) as that limit.
To verify the issue, try reproducing the problem with a multi-turn conversation and observe the timeouts.

Example

The issue includes no source code; the relevant artifacts are the log lines and the config in the issue body.

Notes

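The issue body does not state a root cause; it only ties the regression to OpenClaw 2026.4.2. PR #60363 (closed, not merged) attributes the timeouts to prompt prefill exceeding DEFAULT_LLM_IDLE_TIMEOUT_MS (60s) when Claude models on github-copilot run without api: anthropic-messages. Downgrading to 2026.4.1 is effective, but it may not be a long-term solution.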

Recommendation

Apply the workaround by downgrading to OpenClaw 2026.4.1, which the reporter confirms resolves the issue. Treat it as temporary: PR #60363 was closed without being merged, so this page does not confirm a released fix.
