openclaw - 💡(How to fix) Fix Provider timeoutSeconds is theatre — embedded-runner (120s) and lane (210s) watchdogs fire above it [1 comments, 2 participants]

openclaw/openclaw#80153 (fetched 2026-05-11 03:18:17)
Status: closed · 1 comment · 2 participants · 1 cross-reference · 2 reactions


Upstream issue draft — OpenClaw

Repo: https://github.com/openclaw/openclaw/issues/new
Title: Provider timeoutSeconds is theatre — embedded-runner (120s) and lane (210s) watchdogs fire above it

Summary

Setting models.providers.ollama.timeoutSeconds: 600 in ~/.openclaw/openclaw.json does not extend the actual runtime ceiling for an Ollama provider call. Two internal watchdogs intercept the call before the provider timeout has a chance to apply:

  • Embedded-runner watchdog: 120 s — fires regardless of provider config.
  • Lane watchdog: 210 s — fires regardless of provider config.

The provider's timeoutSeconds value is therefore dominated by the inner watchdogs: any value above 120 s never takes effect. On hardware where cold-start latency for a 14B-class model exceeds 120 s (a Mac Mini M4 Pro 64 GB has been measured at 180–300 s for a qwen2.5:14b cold start under contention), the bot surfaces "model timeout" errors that the user cannot fix by raising timeoutSeconds.
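To illustrate why raising timeoutSeconds cannot help, here is a minimal Python sketch of three nested deadlines. It assumes the layers nest as lane around embedded runner around provider call; the issue does not show OpenClaw's actual code, so every name here is hypothetical, and the times are scaled down so the demo runs in well under a second.

```python
import asyncio

# Scale: 1 simulated second = 2 ms, so the demo finishes quickly.
SCALE = 0.002
PROVIDER_TIMEOUT_S = 600  # models.providers.ollama.timeoutSeconds
RUNNER_WATCHDOG_S = 120   # embedded-runner watchdog (hardcoded, per the issue)
LANE_WATCHDOG_S = 210     # lane watchdog (hardcoded, per the issue)


class LayerTimeout(Exception):
    """Records which layer's deadline expired first."""

    def __init__(self, layer: str, seconds: int) -> None:
        super().__init__(f"{layer} timed out after {seconds}s")
        self.layer = layer
        self.seconds = seconds


async def guarded(layer: str, seconds: int, coro):
    # Convert a plain timeout into a labelled one so the caller can tell
    # which layer actually fired.
    try:
        return await asyncio.wait_for(coro, seconds * SCALE)
    except asyncio.TimeoutError:
        raise LayerTimeout(layer, seconds) from None


async def model_call(cold_start_s: int) -> str:
    # Hypothetical stand-in for the Ollama request: sleeps for the cold start.
    await asyncio.sleep(cold_start_s * SCALE)
    return "completion"


async def invoke(cold_start_s: int) -> str:
    # Lane wraps the embedded runner, which wraps the provider call.
    provider = guarded("provider", PROVIDER_TIMEOUT_S, model_call(cold_start_s))
    runner = guarded("embedded-runner", RUNNER_WATCHDOG_S, provider)
    try:
        return await guarded("lane", LANE_WATCHDOG_S, runner)
    except LayerTimeout as exc:
        return str(exc)


if __name__ == "__main__":
    for cold_start in (60, 180, 300):
        print(cold_start, "->", asyncio.run(invoke(cold_start)))
```

With these values a 60 s cold start completes, while the 180 s and 300 s cold starts measured on the Mac Mini are both cut off by the 120 s layer; the 600 s provider deadline never gets a chance to fire.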

Symptom

2026-05-08 21:42:18 [agent/embedded] error: model invocation timed out after 120s (provider timeout was 600s)
2026-05-08 21:42:18 [agent/embedded] watchdog: lane closed; spawning replacement

There were 163 model-timeout events on 2026-05-08 alone, even after OLLAMA_KEEP_ALIVE=-1 was set. That setting keeps the model resident, which reduces but does not eliminate cold starts: the first turn after an Ollama daemon restart, or after the KV-cache flush that runs every 12 h, still trips the watchdog.
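A count like the one above can be reproduced from the gateway log. The sketch below assumes every timeout is logged in exactly the format of the first Symptom line; the function name is hypothetical, and the log file location is not given in the issue, so point it at whichever gateway log you inspected. It counts only timeouts that fired before the provider's own deadline, i.e. the ones the config cannot influence.

```python
import re
from collections import Counter

# Matches the error line quoted in the Symptom section, e.g.
# "2026-05-08 21:42:18 [agent/embedded] error: model invocation timed out
#  after 120s (provider timeout was 600s)"
TIMEOUT_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) \S+ \[agent/embedded\] error: "
    r"model invocation timed out after (?P<fired>\d+)s "
    r"\(provider timeout was (?P<provider>\d+)s\)"
)


def count_masked_timeouts(lines) -> Counter:
    """Per day, count timeouts that fired before the provider's deadline."""
    per_day = Counter()
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m and int(m["fired"]) < int(m["provider"]):
            per_day[m["date"]] += 1
    return per_day
```

Usage: `with open(log_path) as f: print(count_masked_timeouts(f))`, where `log_path` is your own gateway log.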

Root cause (analysis from gateway log inspection)

There appear to be three independent timeout layers that aren't aware of each other:

  1. Provider timeoutSeconds (config, default 60s, can be raised to 600s).
  2. Embedded-runner watchdog: 120 s hardcoded limit, fires inside the agent runtime regardless of provider config.
  3. Lane watchdog: 210 s hardcoded limit at the gateway lane layer.

A user who raises timeoutSeconds to 600 still has their calls killed at 120 s by layer (2). The config setting therefore silently has no effect on any cold start that takes longer than 120 s.
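The arithmetic behind this is simply that the shortest deadline wins. A minimal sketch, assuming the three layers are independent and the hardcoded values are the ones observed in the log (the class name is hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeoutLayers:
    provider_s: int               # models.providers.<provider>.timeoutSeconds
    embedded_runner_s: int = 120  # hardcoded, per the issue
    lane_s: int = 210             # hardcoded, per the issue

    def effective_ceiling(self) -> tuple[str, int]:
        """Return the layer that fires first and its deadline in seconds."""
        layers = {
            "provider": self.provider_s,
            "embedded-runner": self.embedded_runner_s,
            "lane": self.lane_s,
        }
        first = min(layers, key=layers.get)
        return first, layers[first]
```

`TimeoutLayers(600).effective_ceiling()` gives `("embedded-runner", 120)`, and so does any provider value from 121 upward; only values at or below 120 are actually honoured.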

Proposed fix

Either:

  • (a) Make embedded-runner and lane watchdogs respect models.providers.<provider>.timeoutSeconds as the controlling value (preferred — keeps a single source of truth in config).
  • (b) Surface the watchdog values as configurable in models.providers.<provider>.embeddedRunnerTimeoutSeconds and lanes.timeoutSeconds, with documentation that the min of all three applies.

Either way, the user should be able to set ONE knob and have it apply.
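Both options can be combined so that one knob is the default and explicit overrides stay possible. The sketch below is hypothetical: it uses the key names proposed in (b), which do not exist in OpenClaw today, and the safety margins are invented so that each outer layer waits slightly longer than the layer it wraps. It also reports any layer that would undercut the provider timeout, so the setting is never silently ignored.

```python
DEFAULT_PROVIDER_TIMEOUT_S = 60  # default stated in the root-cause section
RUNNER_MARGIN_S = 5              # hypothetical margin over the provider
LANE_MARGIN_S = 10               # hypothetical margin over the runner


def resolve_timeouts(config: dict, provider: str) -> dict[str, int]:
    """Option (a) by default, option (b) when explicit overrides are set."""
    p = config["models"]["providers"][provider]
    provider_s = p.get("timeoutSeconds", DEFAULT_PROVIDER_TIMEOUT_S)
    # (a): derive each watchdog from the provider value, outer layers last.
    # (b): honour explicit overrides using the key names proposed above.
    runner_s = p.get("embeddedRunnerTimeoutSeconds", provider_s + RUNNER_MARGIN_S)
    lane_s = config.get("lanes", {}).get("timeoutSeconds", runner_s + LANE_MARGIN_S)
    return {"provider": provider_s, "embedded_runner": runner_s, "lane": lane_s}


def masked_layers(resolved: dict[str, int]) -> list[str]:
    """List every layer whose deadline fires before the provider's."""
    return [
        f"{layer} ({secs}s) fires before provider timeout ({resolved['provider']}s)"
        for layer, secs in resolved.items()
        if layer != "provider" and secs < resolved["provider"]
    ]
```

With only `timeoutSeconds: 600` set, this resolves to 600 / 605 / 615 s and reports nothing; pinning the watchdogs back to 120 s and 210 s produces two warnings instead of a silent override.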

Workaround in place

  • OLLAMA_KEEP_ALIVE=-1 reduces cold-start frequency to "first turn after daemon restart" only.
  • OLLAMA_FLASH_ATTENTION=1 + OLLAMA_KV_CACHE_TYPE=q8_0 reduce per-turn latency.
  • OLLAMA_NUM_PARALLEL=2 allows two concurrent calls without lane contention.

These don't solve the hardcoded-watchdog problem; they only reduce how often it is hit.
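For reference, a minimal sketch of starting the Ollama server with the workaround settings listed above. The helper names are hypothetical; the environment variable names and values are the ones from the list. If Ollama runs as the macOS menu-bar app rather than via `ollama serve`, Ollama's FAQ describes setting these with `launchctl setenv` instead.

```python
import os
import shutil
import subprocess

# Ollama server settings from the workaround list.
WORKAROUND_ENV = {
    "OLLAMA_KEEP_ALIVE": "-1",       # keep loaded models resident indefinitely
    "OLLAMA_FLASH_ATTENTION": "1",   # enable flash attention
    "OLLAMA_KV_CACHE_TYPE": "q8_0",  # 8-bit quantized KV cache
    "OLLAMA_NUM_PARALLEL": "2",      # serve two requests per model concurrently
}


def ollama_env(base=None) -> dict:
    """Copy the base environment (default: os.environ) and apply the workaround."""
    env = dict(os.environ if base is None else base)
    env.update(WORKAROUND_ENV)
    return env


def start_ollama() -> subprocess.Popen:
    """Launch `ollama serve` in the background with the workaround settings."""
    exe = shutil.which("ollama")
    if exe is None:
        raise FileNotFoundError("ollama is not on PATH")
    return subprocess.Popen([exe, "serve"], env=ollama_env())
```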

Environment

  • openclaw 2026.5.2 (8b2a6e5)
  • macOS 25.4.0 (Darwin 25.4.0) / arm64 / Mac Mini M4 Pro 64 GB
  • Ollama 0.20.2 (MLX backend auto-engaged on Apple Silicon)
  • Model under test: qwen2.5:14b (Q4_K_M, ~9 GB resident)

What I'd expect

Setting timeoutSeconds: 600 in models.providers.ollama should allow up to 600 s for a cold start. Currently the setting is silently ignored.

Related

Only tangentially related to feedback_destructive_remote_ops_need_chat_text and feedback_concrete_example_soul_edits. Tracked locally as Operational #3 in docs/18-chatbot-improvement-plan.md. This blocks any Phase 5 bake-off attempt to benchmark cold-start latency honestly, because the 120 s watchdog masks differences between models on the first call.
