1. A configurable `models.providers.ollama.requestTimeoutMs` (or equivalent) key so local model latency can be accommodated. 2. Alternatively: the gateway streaming path should begin the timeout clock from first-token receipt, not from connection open — since local models may have high prompt-eval latency before any tokens stream.

openclaw - 💡(How to fix) Fix Ollama: gateway times out at 60s despite model responding in <10s via direct curl [1 comments, 2 participants]

openclaw2026-04-08 04:06:07

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#62911•Fetched 2026-04-09 08:00:51

View on GitHub

Comments

Participants

Timeline

Reactions

Author

interchainlive

Participants

interchainlive

steipete

Timeline (top)

closed ×1commented ×1

OpenClaw's gateway times out on all Ollama requests after exactly 60 seconds, while the same model responds correctly in <10s via direct curl. The timeout fires even with a pre-warmed model already resident in VRAM. The Ollama stack is healthy — the break is in the gateway's HTTP streaming path to Ollama.

Error Message

[agent/embedded] embedded run failover decision: reason=timeout provider=ollama/qwen2.5:3b [diagnostic] lane task error: lane=main durationMs=60371 error="FailoverError: LLM request timed out." [diagnostic] lane task error: lane=session:agent:quinn:main durationMs=60373 [model-fallback/decision] candidate_failed reason=unknown next=ollama/qwen2.5:7b [diagnostic] lane task error: lane=main durationMs=60733 error="FailoverError: LLM request timed out."

Root Cause

Fix Action

Workaround

Direct curl http://localhost:11434/api/chat works correctly. openclaw agent against Ollama-backed agents is non-functional on this hardware until resolved.

Code Example

[agent/embedded] embedded run failover decision: reason=timeout provider=ollama/qwen2.5:3b
[diagnostic] lane task error: lane=main durationMs=60371 error="FailoverError: LLM request timed out."
[diagnostic] lane task error: lane=session:agent:quinn:main durationMs=60373
[model-fallback/decision] candidate_failed reason=unknown next=ollama/qwen2.5:7b
[diagnostic] lane task error: lane=main durationMs=60733 error="FailoverError: LLM request timed out."

RAW_BUFFERClick to expand / collapse

Summary

Diagnostic evidence

Test	Method	Result
`qwen2.5:3b` 5-token response	`curl /api/generate stream:false`	✅ 9.7s
`qwen2.5:3b` 5-token response	`curl /api/chat stream:false`	✅ 25.6s
`qwen2.5:3b` first token	`curl /api/chat stream:true`	✅ 4.2s
`openclaw agent --agent quinn` (same prompt)	gateway path	❌ 60s timeout, every time

Both models in the fallback chain (qwen2.5:3b, qwen2.5:7b) time out at the same threshold. The gateway logs show the timeout fires in the lane task layer:

[agent/embedded] embedded run failover decision: reason=timeout provider=ollama/qwen2.5:3b
[diagnostic] lane task error: lane=main durationMs=60371 error="FailoverError: LLM request timed out."
[diagnostic] lane task error: lane=session:agent:quinn:main durationMs=60373
[model-fallback/decision] candidate_failed reason=unknown next=ollama/qwen2.5:7b
[diagnostic] lane task error: lane=main durationMs=60733 error="FailoverError: LLM request timed out."

Key observations

Not a cold-start problem. Pre-warmed model (already resident, 2279MB VRAM), expires_at confirming it was loaded — same 60s timeout.
Not a model performance problem. Direct curl /api/chat stream:true returns first token in 4.2 seconds.
Hardcoded 60s timeout. Both fallback models hit exactly 60s. No config key found in configuration-reference.md or schema to adjust this. agents.defaults.timeoutSeconds: 1200 and openclaw agent --timeout do not affect it.
Gateway-first, embedded-second, both fail. The diagnostic output shows Gateway agent failed; falling back to embedded — the embedded path uses the same timeout, so fallback does not help.
Exit code unreliable. openclaw agent returned exit 0 on the first failure run and exit 1 on the second — cannot be used for programmatic pass/fail detection.

Environment

OS: Ubuntu 24.04.4 LTS (kernel 6.17.0-20-generic)
OpenClaw: v2026.4.2
Hardware: Lenovo V15 G4 ABP — AMD Ryzen 7 7000, 40GB RAM, AMD Radeon iGPU (Vulkan)
Ollama: localhost:11434, OLLAMA_VULKAN=1, OLLAMA_FLASH_ATTENTION=1
Model: qwen2.5:3b (primary), qwen2.5:7b (fallback)
Provider config: baseUrl: "http://127.0.0.1:11434", api: "ollama", apiKey: "ollama-local"

Expected behavior

A configurable models.providers.ollama.requestTimeoutMs (or equivalent) key so local model latency can be accommodated.
Alternatively: the gateway streaming path should begin the timeout clock from first-token receipt, not from connection open — since local models may have high prompt-eval latency before any tokens stream.

Workaround

Direct curl http://localhost:11434/api/chat works correctly. openclaw agent against Ollama-backed agents is non-functional on this hardware until resolved.

extent analysis

TL;DR

The most likely fix is to adjust the timeout configuration in the OpenClaw gateway to accommodate the local model latency.

Guidance

Investigate the OpenClaw configuration to find a potential timeout setting that can be adjusted to a value higher than 60 seconds, allowing the local model to respond without timing out.
Verify that the agents.defaults.timeoutSeconds and openclaw agent --timeout options do not affect the hardcoded 60s timeout, and look for alternative configuration options.
Consider implementing a workaround by using direct curl requests to the Ollama API, as this has been shown to work correctly.
Review the OpenClaw documentation and source code to determine if there is a configurable timeout setting, such as models.providers.ollama.requestTimeoutMs, that can be used to accommodate local model latency.

Example

No code snippet is provided as the issue does not imply a specific code change, but rather a configuration adjustment.

Notes

The exact configuration option to adjust the timeout is not specified in the provided information, so further investigation into the OpenClaw documentation and source code is necessary.

Recommendation

Apply a workaround by using direct curl requests to the Ollama API until a configurable timeout setting is found or implemented in the OpenClaw gateway.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

A configurable models.providers.ollama.requestTimeoutMs (or equivalent) key so local model latency can be accommodated.
Alternatively: the gateway streaming path should begin the timeout clock from first-token receipt, not from connection open — since local models may have high prompt-eval latency before any tokens stream.

#api #cache issue #memory leak #API versioning #request timeout

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

openclaw - 💡(How to fix) Fix Ollama: gateway times out at 60s despite model responding in <10s via direct curl [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Workaround

Code Example

Summary

Diagnostic evidence

Key observations

Environment

Expected behavior

Workaround

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

FAQ

Expected behavior

Still need to ship something?

TRENDING

openclaw - 💡(How to fix) Fix Ollama: gateway times out at 60s despite model responding in <10s via direct curl [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Workaround

Code Example

Summary

Diagnostic evidence

Key observations

Environment

Expected behavior

Workaround

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING