openclaw - 💡(How to fix) Fix [Feature/Bug]: Expose undici connect.timeout for Ollama provider + make fallback decision consistent on reason=timeout [1 participants]

openclaw · 2026-04-19 03:19:05

GitHub stats

openclaw/openclaw#68796•Fetched 2026-04-19 15:07:19

Author

Juankcba

Participants

Juankcba

Two related asks affecting remote-Ollama setups:

Feature: Expose the undici connect.timeout (TCP connect) so operators running Ollama on a different host (LAN, tailnet) can tolerate slow connection establishment. Today the effective value is undici's default of 10 s, because the dispatcher never sets it, and it cannot be changed from openclaw.json.
Bug: For the same reason=timeout from ollama/<model>, the failover decision sometimes resolves to decision=fallback_model (and recovers via the configured chain) and sometimes to decision=surface_error (no fallback attempted). The chosen path appears non-deterministic for what looks like the same failure mode.

Error Message

error=LLM request failed: network connection error. rawError=fetch failed | Connect Timeout Error (attempted address: 192.168.68.82:11434, timeout: 10000ms)

Result: error surfaced to user, no fallback attempted. ❌

Both runs hit the same Connect Timeout Error to the same address with the same configured chain. The branching point that produces surface_error vs fallback_model is not obvious from the logs.

For LAN/tailnet Ollama the first request after warm-up of a large model (or transient packet loss) easily exceeds 10 s, surfacing the Connect Timeout Error even though the host is healthy. There is no openclaw.json knob to raise it; only an upstream-source patch helps.

Fix Action

Fix / Workaround

dist/undici-global-dispatcher-yJO9KyXW.js builds the global dispatcher with bodyTimeout and headersTimeout honoring the embedded-run timeout (per #63175), but the connect block only sets autoSelectFamily / autoSelectFamilyAttemptTimeout — never a connect.timeout. Effective TCP connect timeout is undici's default (10 s).

For LAN/tailnet Ollama the first request after warm-up of a large model (or transient packet loss) easily exceeds 10 s, surfacing the Connect Timeout Error even though the host is healthy. There is no openclaw.json knob to raise it; only an upstream-source patch helps.
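A minimal sketch of what setting the missing field could look like, assuming the dispatcher is built from a plain options object. undici's `Agent` accepts a `connect` object whose `timeout` field bounds connection establishment. The names `buildDispatcherOptions`, `runTimeoutMs`, and `connectTimeoutMs` are hypothetical and chosen for illustration; this is not OpenClaw's actual code, and the `autoSelectFamilyAttemptTimeout` value is a placeholder.

```typescript
// Hypothetical shape of the options object handed to undici's Agent constructor.
interface DispatcherOptions {
  bodyTimeout: number;
  headersTimeout: number;
  connect: {
    autoSelectFamily: boolean;
    autoSelectFamilyAttemptTimeout: number;
    timeout: number; // connection-establishment timeout in ms
  };
}

// undici's default connect timeout, as reported in the error message (10000ms).
const UNDICI_DEFAULT_CONNECT_TIMEOUT_MS = 10_000;

function buildDispatcherOptions(
  runTimeoutMs: number,
  connectTimeoutMs?: number,
): DispatcherOptions {
  return {
    // Body and header timeouts follow the embedded-run timeout.
    bodyTimeout: runTimeoutMs,
    headersTimeout: runTimeoutMs,
    connect: {
      autoSelectFamily: true,
      autoSelectFamilyAttemptTimeout: 300, // placeholder value (assumption)
      // The field the issue asks for: fall back to undici's default when unset.
      timeout: connectTimeoutMs ?? UNDICI_DEFAULT_CONNECT_TIMEOUT_MS,
    },
  };
}

// Usage with undici (not executed here):
//   import { Agent, setGlobalDispatcher } from "undici";
//   setGlobalDispatcher(new Agent(buildDispatcherOptions(600_000, 30_000)));
```

Keeping the default equal to undici's own means existing setups behave the same unless an operator opts in to a longer timeout.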

Environment

OpenClaw 2026.4.15
Host: Linux (Ubuntu), gateway running as systemd user service
Ollama: remote, http://192.168.68.82:11434 (LAN)
Configured chain (relevant entry):
- primary: ollama/gemma4-team
- fallbacks: ["ollama/qwen72b-team"] (and ultimately claude-cli/claude-opus-4-7 from defaults)

Evidence — fallback inconsistency

Same gateway, same target (ollama/gemma4-team → 192.168.68.82:11434), same reason=timeout, two different decisions within ~50 minutes:

Run A — 2adb9632-2f8d-4ab0-a1bc-80c4d9107b0e (01:53 UTC):

```
[agent/embedded] embedded run agent end: ... isError=true model=gemma4-team provider=ollama error=LLM request failed: network connection error. rawError=fetch failed | Connect Timeout Error (attempted address: 192.168.68.82:11434, timeout: 10000ms)
[agent/embedded] embedded run failover decision: ... stage=assistant decision=fallback_model reason=timeout from=ollama/gemma4-team profile=-
[model-fallback/decision] model fallback decision: decision=candidate_failed requested=ollama/gemma4-team candidate=ollama/gemma4-team reason=timeout next=claude-cli/claude-opus-4-7
[model-fallback/decision] model fallback decision: decision=candidate_succeeded requested=ollama/gemma4-team candidate=claude-cli/claude-opus-4-7 reason=unknown next=none
```

Result: recovered via fallback to Opus 4.7. ✅

Run B — a4d41d27-de26-42f2-8f36-5037132eff5c (02:42 UTC):

```
[agent/embedded] embedded run failover decision: ... stage=assistant decision=surface_error reason=timeout from=ollama/gemma4-team profile=-
```

Result: error surfaced to user, no fallback attempted. ❌

Both runs hit the same Connect Timeout Error to the same address with the same configured chain. The branching point that produces surface_error vs fallback_model is not obvious from the logs.
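To make the expected behavior concrete, here is a sketch of a decision function that depends only on its inputs, so the same reason and the same remaining chain always give the same decision. The logs do not reveal the project's real decision logic, so everything here is an assumption: `decideFailover`, `FailoverInput`, and `remainingCandidates` are hypothetical names, and only the `fallback_model` and `surface_error` labels come from the log output.

```typescript
type FailoverDecision = "fallback_model" | "surface_error";

// Hypothetical input shape; field names loosely mirror the log fields.
interface FailoverInput {
  reason: string; // e.g. "timeout"
  from: string; // e.g. "ollama/gemma4-team"
  remainingCandidates: string[]; // chain entries not yet tried, in order
}

interface FailoverResult {
  decision: FailoverDecision;
  next: string | null;
}

function decideFailover(input: FailoverInput): FailoverResult {
  const next = input.remainingCandidates[0];
  // A timeout with at least one untried candidate always falls back.
  if (input.reason === "timeout" && next !== undefined) {
    return { decision: "fallback_model", next };
  }
  // Otherwise there is nothing left to try, or the reason is not handled here.
  return { decision: "surface_error", next: null };
}

const result = decideFailover({
  reason: "timeout",
  from: "ollama/gemma4-team",
  remainingCandidates: ["ollama/qwen72b-team", "claude-cli/claude-opus-4-7"],
});
console.log(result.decision, result.next);
```

If the real branch consults extra state (for example an exhausted chain or a per-run flag), logging that state alongside the decision would make a case like Run B explainable.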

Evidence — connect.timeout not configurable

Asks

Add a config knob — e.g. models.providers.<id>.connectTimeoutMs or a global infra.net.connectTimeoutMs — that propagates into the undici Agent({ connect: { timeout } }).
Audit the failover decision branch so reason=timeout from=ollama/* is treated consistently — either always fallback_model (preferred) or document the conditions that produce surface_error so operators know when to expect it.
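The first ask names two possible locations for the knob. One way they could coexist is a simple precedence rule, sketched below: a per-provider value wins over a global one, and undici's 10 s default applies when neither is set. Both key paths are the proposals from this issue, not existing openclaw.json settings, and the precedence order and the `resolveConnectTimeoutMs` helper are assumptions.

```typescript
// Hypothetical slice of openclaw.json covering only the proposed keys.
interface NetConfig {
  models?: { providers?: Record<string, { connectTimeoutMs?: number }> };
  infra?: { net?: { connectTimeoutMs?: number } };
}

const DEFAULT_CONNECT_TIMEOUT_MS = 10_000; // undici's default

function resolveConnectTimeoutMs(config: NetConfig, providerId: string): number {
  // Checked in precedence order: provider-specific first, then global.
  const candidates = [
    config.models?.providers?.[providerId]?.connectTimeoutMs,
    config.infra?.net?.connectTimeoutMs,
  ];
  for (const value of candidates) {
    // Ignore missing, non-finite, or non-positive values.
    if (typeof value === "number" && Number.isFinite(value) && value > 0) {
      return value;
    }
  }
  return DEFAULT_CONNECT_TIMEOUT_MS;
}

const config: NetConfig = {
  models: { providers: { ollama: { connectTimeoutMs: 30_000 } } },
  infra: { net: { connectTimeoutMs: 15_000 } },
};
console.log(resolveConnectTimeoutMs(config, "ollama")); // provider override
console.log(resolveConnectTimeoutMs(config, "claude-cli")); // global value
```

The resolved number is what would be passed as `timeout` inside the `connect` object of the undici `Agent`.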

Workarounds I rejected

Patching dist/undici-global-dispatcher-yJO9KyXW.js locally — reverted on next npm i -g openclaw.
Pre-warming the model — masks the timeout but doesn't fix the fallback-policy inconsistency.

Happy to provide a sanitized openclaw.json, more log slices, or test a candidate fix.

extent analysis

TL;DR

To address the inconsistent fallback behavior and non-configurable TCP connect timeout, consider adding a config knob for connectTimeoutMs and auditing the failover decision branch to ensure consistent treatment of reason=timeout errors.

Guidance

Add a configuration option, such as models.providers.<id>.connectTimeoutMs or infra.net.connectTimeoutMs, to allow operators to adjust the TCP connect timeout.
Review the failover decision logic to identify the conditions that lead to surface_error instead of fallback_model for reason=timeout errors and ensure consistent behavior.
Verify that the connectTimeoutMs value is properly propagated to the undici Agent configuration.
Test the updated configuration with various network conditions to ensure the fallback policy works as expected.

Example

No code snippet is provided as the issue requires changes to the underlying configuration and logic.

Notes

The current implementation of the failover decision branch may contain subtle conditions that affect the choice between fallback_model and surface_error. A thorough review of the code and logs is necessary to identify and address these conditions.

Recommendation

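No configuration-only workaround exists today. Until the project ships a connectTimeoutMs option and a consistent failover policy for reason=timeout, the only interim measures are the ones the author tried and rejected: patching the built dispatcher locally (reverted on the next npm i -g openclaw) or pre-warming the model (which masks the timeout but leaves the fallback inconsistency). A permanent fix requires changes to the underlying codebase.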
