openclaw - 💡(How to fix) Fix Provider timeoutSeconds is theatre — embedded-runner (120s) and lane (210s) watchdogs fire above it [1 comments, 2 participants]

openclaw • 2026-05-10 07:19:21

openclaw/openclaw#80153•Fetched 2026-05-11 03:18:17

Author

jpazvd

Participants

clawsweeper[bot]

jpazvd

Timeline (top)

closed ×1, commented ×1, cross-referenced ×1

Setting models.providers.ollama.timeoutSeconds: 600 in ~/.openclaw/openclaw.json does not extend the actual runtime ceiling for an Ollama provider call. Two internal watchdogs intercept the call before the provider timeout has a chance to apply:

Embedded-runner watchdog: 120 s — fires regardless of provider config.
Lane watchdog: 210 s — fires regardless of provider config.

The provider's timeoutSeconds value is therefore a nominal upper bound that the inner watchdogs override whenever it is set above 120 s. On hardware where cold-start latency for a 14B-class model exceeds 120 s (a Mac Mini M4 Pro 64 GB has been measured at 180–300 s for a qwen2.5:14b cold start under contention), the bot reports "model timeout" errors that the user cannot fix by raising timeoutSeconds.
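Before blaming the watchdogs, it can help to confirm that the 600 s value is actually present in the config file. A minimal sketch, assuming ~/.openclaw/openclaw.json parses as strict JSON (it will fail if the file contains comments or trailing commas) and that the dotted path from this issue maps directly onto nested objects. read_dotted and provider_timeout are hypothetical helpers, not part of OpenClaw.

```python
import json
from pathlib import Path


def read_dotted(config: dict, dotted_path: str, default=None):
    """Walk a nested dict along a dotted path such as 'a.b.c'."""
    node = config
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def provider_timeout(config_path: Path, provider: str = "ollama"):
    """Return the configured timeoutSeconds for a provider, or None if unset."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return read_dotted(config, f"models.providers.{provider}.timeoutSeconds")


# Example call (path taken from this issue):
# provider_timeout(Path.home() / ".openclaw" / "openclaw.json")
```

If this returns 600 and calls are still killed at 120 s, the config is being read correctly and the limit is coming from somewhere else.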

Error Message

2026-05-08 21:42:18 [agent/embedded] error: model invocation timed out after 120s (provider timeout was 600s)
2026-05-08 21:42:18 [agent/embedded] watchdog: lane closed; spawning replacement

Upstream issue draft — OpenClaw

Repo: https://github.com/openclaw/openclaw/issues/new
Title: Provider timeoutSeconds is theatre — embedded-runner (120s) and lane (210s) watchdogs fire above it

Summary

Embedded-runner watchdog: 120 s — fires regardless of provider config.
Lane watchdog: 210 s — fires regardless of provider config.

Symptom

2026-05-08 21:42:18 [agent/embedded] error: model invocation timed out after 120s (provider timeout was 600s)
2026-05-08 21:42:18 [agent/embedded] watchdog: lane closed; spawning replacement

On 2026-05-08 alone there were 163 model-timeout events, even after OLLAMA_KEEP_ALIVE=-1 was set. That setting keeps the model resident, which reduces cold starts but does not eliminate them: the first turn after an Ollama daemon restart, or after the every-12h KV-cache flush, still trips the watchdog.
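A count like this can be reproduced from the gateway log. The sketch below is a hypothetical helper, not an OpenClaw tool; it assumes every timeout is logged in the same shape as the two lines quoted in this issue. The log file location is not given here, so pass in whatever file your install writes.

```python
import re
from collections import Counter

# Pattern taken from the log line quoted in this issue.
TIMEOUT_RE = re.compile(
    r"model invocation timed out after (\d+)s \(provider timeout was (\d+)s\)"
)


def count_timeouts(lines):
    """Count timeout events keyed by (fired_after_s, provider_timeout_s)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


# Example call, with log_path pointing at your gateway log:
# with open(log_path, encoding="utf-8") as f:
#     print(count_timeouts(f))
```

Grouping by both numbers makes the mismatch visible: a large count under a key such as (120, 600) means calls are being cut off well below the configured provider timeout.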

Root cause (analysis from gateway log inspection)

There appear to be three independent timeout layers that aren't aware of each other:

1. Provider timeoutSeconds (config, default 60s, can be raised to 600s).
2. Embedded-runner watchdog: 120s hardcoded floor, fires inside the agent runtime regardless of provider config.
3. Lane watchdog: 210s hardcoded floor at the gateway lane layer.

A user who raises timeoutSeconds to 600 still has their calls killed at 120 s by layer (2). The config setting silently has no effect for any cold-start scenario above 120s.
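The interaction described above can be modelled as three independent timers where the shortest one always fires first. This is only a sketch of the observed behaviour, not OpenClaw code. The constants are the values reported in this issue, and the function name is hypothetical.

```python
# Watchdog values as reported in this issue (assumed fixed, not read from config).
EMBEDDED_RUNNER_WATCHDOG_S = 120
LANE_WATCHDOG_S = 210


def effective_ceiling(provider_timeout_s: int) -> tuple[int, str]:
    """Return (seconds, layer_name) for whichever timeout layer fires first."""
    layers = {
        "provider timeoutSeconds": provider_timeout_s,
        "embedded-runner watchdog": EMBEDDED_RUNNER_WATCHDOG_S,
        "lane watchdog": LANE_WATCHDOG_S,
    }
    first = min(layers, key=layers.get)
    return layers[first], first


print(effective_ceiling(600))  # (120, 'embedded-runner watchdog')
print(effective_ceiling(60))   # (60, 'provider timeoutSeconds')
```

Under this model, any timeoutSeconds value above 120 produces the same 120 s ceiling, which matches the log line "timed out after 120s (provider timeout was 600s)".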

Proposed fix

Either:

(a) Make embedded-runner and lane watchdogs respect models.providers.<provider>.timeoutSeconds as the controlling value (preferred — keeps a single source of truth in config).
(b) Surface the watchdog values as configurable in models.providers.<provider>.embeddedRunnerTimeoutSeconds and lanes.timeoutSeconds, with documentation that the min of all three applies.

Either way, the user should be able to set ONE knob and have it apply.
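One way option (a) could look is sketched below, purely as an illustration. The function and key names are hypothetical, and the defaults are the watchdog values reported in this issue. The idea is that each outer layer is widened so it never fires before the layer inside it.

```python
def resolve_watchdogs(
    provider_timeout_s: int,
    runner_default_s: int = 120,
    lane_default_s: int = 210,
) -> dict:
    """Derive watchdog timeouts so the provider timeout is the controlling value."""
    # The runner watchdog keeps its default unless the provider asks for longer.
    runner = max(runner_default_s, provider_timeout_s)
    # The lane watchdog must never fire before the runner watchdog.
    lane = max(lane_default_s, runner)
    return {"embedded_runner": runner, "lane": lane}


print(resolve_watchdogs(600))  # {'embedded_runner': 600, 'lane': 600}
print(resolve_watchdogs(60))   # {'embedded_runner': 120, 'lane': 210}
```

With this rule a short provider timeout leaves the existing watchdog defaults untouched, while a long one raises both of them, so the single timeoutSeconds knob decides the ceiling in either case.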

Workaround in place

OLLAMA_KEEP_ALIVE=-1 reduces cold-start frequency to "first turn after daemon restart" only.
OLLAMA_FLASH_ATTENTION=1 + OLLAMA_KV_CACHE_TYPE=q8_0 reduce per-turn latency.
OLLAMA_NUM_PARALLEL=2 allows two concurrent calls without lane contention.

These don't solve the watchdog-floor problem, just reduce the frequency of hitting it.
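For reference, the workaround settings above can be collected in one place and applied when the Ollama server is started. A minimal sketch, assuming the daemon is launched by hand with `ollama serve`; if it is started by the macOS app or a service manager, the variables have to be set there instead. ollama_env is a hypothetical helper.

```python
import os

# Values exactly as listed in the workaround above.
OLLAMA_WORKAROUND_ENV = {
    "OLLAMA_KEEP_ALIVE": "-1",
    "OLLAMA_FLASH_ATTENTION": "1",
    "OLLAMA_KV_CACHE_TYPE": "q8_0",
    "OLLAMA_NUM_PARALLEL": "2",
}


def ollama_env(base=None) -> dict:
    """Return a copy of the environment with the workaround variables applied."""
    env = dict(os.environ if base is None else base)
    env.update(OLLAMA_WORKAROUND_ENV)
    return env


# Example launch (assumes the `ollama` binary is on PATH):
# import subprocess
# subprocess.Popen(["ollama", "serve"], env=ollama_env())
```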

Environment

openclaw 2026.5.2 (8b2a6e5)
macOS 25.4.0 (Darwin 25.4.0) / arm64 / Mac Mini M4 Pro 64 GB
Ollama 0.20.2 (MLX backend auto-engaged on Apple Silicon)
Model under test: qwen2.5:14b (Q4_K_M, ~9 GB resident)

What I'd expect

timeoutSeconds: 600 in models.providers.ollama allows a 600 s ceiling for cold-start. Currently it's silently ignored.

Related

Pairs with feedback_destructive_remote_ops_need_chat_text and feedback_concrete_example_soul_edits only tangentially. Tracked locally as Operational #3 in docs/18-chatbot-improvement-plan.md. Blocks any Phase 5 bake-off attempt to bench cold-start latency honestly: the 120s watchdog masks differences between models on first call.

#latency issue #model loading #dependency error #configuration error #environment variable
