openclaw - 💡(How to fix) Fix LLM idle timeout cannot be configured for long local llama.cpp requests in 2026.5.3-1 [2 comments, 2 participants]

GitHub stats: openclaw/openclaw#77744, fetched 2026-05-06 06:22:04
Comments: 2 · Participants: 2 · Timeline events: 3 (commented ×2, subscribed ×1) · Reactions: 3


Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

After updating to OpenClaw 2026.5.3-1, long local llama.cpp requests time out even though the backend is still actively processing the prompt.

The old agents.defaults.llm.idleTimeoutSeconds config is rejected as an unrecognized key. The suggested models.providers.<id>.timeoutSeconds config is accepted and hot reloaded, but it does not prevent the chat from being cut off during long local model prefill.

This looks like the idle watchdog is still using another timeout path that is not configurable through the current schema.

Steps to reproduce

  1. Install or update OpenClaw to 2026.5.3-1.
  2. Run OpenClaw in Docker.
  3. Configure a local llama.cpp / OpenAI-compatible provider.
  4. Use a large-context local model, for example qwen3.6-35b.
  5. Configure the agent primary model as llamacpp/qwen3.6-35b.
  6. Send a very large prompt, around 150k tokens, through webchat.
  7. Observe that llama.cpp keeps processing the prompt, but OpenClaw cancels before the model produces a reply.
  8. Try adding agents.defaults.llm.idleTimeoutSeconds.
  9. Observe OpenClaw rejects the config with Unrecognized key: "llm".
  10. Try using models.providers.llamacpp.timeoutSeconds.
  11. Observe the config is accepted and hot reloaded, but the chat still times out.

Expected behavior

OpenClaw should wait for the local provider according to the configured timeout, especially when the provider is still processing a long prefill and has not failed.

A long local llama.cpp prefill should not be treated as a failed model request merely because no output token has been produced yet.
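The distinction drawn above can be sketched as a watchdog that applies one long deadline while waiting for the first chunk (the prefill phase) and a separate, shorter idle deadline between later chunks. This is not OpenClaw's implementation: the class name `StreamWatchdog` and both timeout parameters are hypothetical and exist only to illustrate the idea.

```python
import time
from typing import Callable, Optional


class StreamWatchdog:
    """Tracks whether a streaming reply has exceeded its allowed silence.

    Hypothetical helper: names and parameters are illustrative only.
    """

    def __init__(
        self,
        first_chunk_timeout_s: float,
        idle_timeout_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.first_chunk_timeout_s = first_chunk_timeout_s
        self.idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._started_at = clock()
        self._last_chunk_at: Optional[float] = None

    def on_chunk(self) -> None:
        """Record that the backend produced output."""
        self._last_chunk_at = self._clock()

    def expired(self) -> bool:
        """Return True if the request should be treated as timed out."""
        now = self._clock()
        if self._last_chunk_at is None:
            # Still in prefill: no output yet, so use the long deadline.
            return now - self._started_at > self.first_chunk_timeout_s
        # Output has started: use the shorter between-chunk deadline.
        return now - self._last_chunk_at > self.idle_timeout_s
```

With a design along these lines, a request that has produced no token yet would only fail once the first-chunk deadline passes, regardless of how short the between-chunk idle limit is.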

Actual behavior

OpenClaw cuts off the agent request before the local llama.cpp backend finishes prompt prefill and produces a reply.

The llama.cpp server is still alive and processing. It is not dead. The request is cancelled from the OpenClaw side before completion.

When fallbacks are configured, OpenClaw then tries fallback providers even though the local model was still working.

OpenClaw version

2026.5.3-1

Operating system

Docker container on NAS, with local llama.cpp backend running on Mac.

Install method

Docker / custom image based on ghcr.io/openclaw/openclaw:latest.

Model

OpenClaw model id:

llamacpp/qwen3.6-35b

Local backend model:

Qwen3.6-35B-A3B-GGUF

Qwen3.6-35B-A3B-Q6_K.gguf

Provider / routing chain

OpenClaw → local llama.cpp OpenAI-compatible server → /v1/responses

Additional provider/model setup details

Provider config:

"models": {
  "mode": "merge",
  "providers": {
    "llamacpp": {
      "baseUrl": "http://cerebro-mac:8080/v1",
      "apiKey": "dummy",
      "api": "openai-responses",
      "timeoutSeconds": 14400,
      "models": [
        {
          "id": "qwen3.6-35b",
          "name": "Qwen 3.6 35B A3B local llama.cpp",
          "reasoning": true,
          "input": ["text"],
          "contextWindow": 262144,
          "maxTokens": 100000
        }
      ]
    }
  }
}

Agent config:

"agents": {
  "defaults": {
    "timeoutSeconds": 14400,
    "model": {
      "primary": "llamacpp/qwen3.6-35b",
      "fallbacks": []
    },
    "compaction": {
      "timeoutSeconds": 10800
    },
    "models": {
      "llamacpp/qwen3.6-35b": {
        "timeoutSeconds": 14400,
        "streaming": true
      }
    }
  }
}

Config that no longer works:

{
  "agents": {
    "defaults": {
      "llm": {
        "idleTimeoutSeconds": 3600
      }
    }
  }
}

This raises:

agents.defaults: Unrecognized key: "llm"

models.providers.llamacpp.timeoutSeconds is accepted by hot reload, but the long request still gets cancelled.
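Because timeouts are set in several places in this report, it can help to list every timeout-like key a config actually carries before experimenting. The sketch below is a generic walk over parsed JSON; the helper name `find_timeout_keys` is made up for illustration, and the script knows nothing about which of these keys OpenClaw honors.

```python
from typing import Any, Iterator, Tuple


def find_timeout_keys(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, value) for every key whose name contains 'timeout'."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if "timeout" in key.lower() and not isinstance(value, (dict, list)):
                yield child, value
            yield from find_timeout_keys(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_timeout_keys(item, f"{path}[{index}]")


# Trimmed copy of the settings quoted in this report.
config = {
    "models": {"providers": {"llamacpp": {"timeoutSeconds": 14400}}},
    "agents": {
        "defaults": {
            "timeoutSeconds": 14400,
            "compaction": {"timeoutSeconds": 10800},
            "models": {"llamacpp/qwen3.6-35b": {"timeoutSeconds": 14400}},
        }
    },
}

timeouts = dict(find_timeout_keys(config))
for dotted_path, seconds in timeouts.items():
    print(f"{dotted_path} = {seconds}")
```

Changing one listed value at a time and re-running the failing request is a simple way to see whether any of them affects the cancellation.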

Logs, screenshots, and evidence

OpenClaw logs show fallback and timeout behavior:

Embedded agent failed before reply: All models failed
llamacpp/qwen3.6-35b: LLM request timed out

Config reload confirms models.providers.llamacpp.timeoutSeconds is accepted:

config hot reload applied (... models.providers.llamacpp.timeoutSeconds ...)

Adding agents.defaults.llm.idleTimeoutSeconds fails:

config reload skipped (invalid config): agents.defaults: Unrecognized key: "llm"

The local llama.cpp backend receives the request and starts processing a very large prompt:

task.n_tokens = 152780
prompt processing progress ...
n_tokens = 67584
srv stop: cancel task
done request: POST /v1/responses 200

This suggests the backend was still processing and did not crash. The request was cancelled before the model had a chance to produce a reply.
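To put a number on how far prefill had progressed, the two token counts in the excerpt can be compared. The sketch below assumes only the two line shapes quoted above (`task.n_tokens = N` for the prompt size, and a later bare `n_tokens = M` for progress); real llama.cpp logs carry more fields, so the patterns may need adjusting.

```python
import re
from typing import Optional


def prefill_progress(log_text: str) -> Optional[float]:
    """Return processed/total prompt tokens from a log excerpt, or None."""
    total = re.search(r"task\.n_tokens\s*=\s*(\d+)", log_text)
    # The negative lookbehind skips the 'task.n_tokens' line itself.
    done = re.findall(r"(?<![.\w])n_tokens\s*=\s*(\d+)", log_text)
    if total is None or not done:
        return None
    return int(done[-1]) / int(total.group(1))


log_excerpt = """\
task.n_tokens = 152780
prompt processing progress ...
n_tokens = 67584
srv stop: cancel task
"""

fraction = prefill_progress(log_excerpt)
print(f"prefill reached {fraction:.1%} before cancellation")
```

For the excerpt in this report, the request was cancelled with a little under half of the prompt processed, which is consistent with a client-side cutoff rather than a backend failure.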

Impact and severity

Affected: local large-context llama.cpp users.

Severity: Critical for long-running local model workflows.

Frequency: 100% on very large prompts since updating to 2026.5.3-1.

Consequence: long local model requests cannot complete because OpenClaw cancels before the first reply.

Additional information

This appears to be a regression or schema/config mismatch around idle timeout behavior.

The old agents.defaults.llm.idleTimeoutSeconds path is rejected.

The new suggested models.providers.<id>.timeoutSeconds path is accepted but does not stop the idle watchdog cancellation.

Question:

What is the correct supported config key in 2026.5.3-1 to increase the idle watchdog timeout for long local model prefill?

Is there a separate hidden/default idle timeout still active even when models.providers.<id>.timeoutSeconds is set?

extent analysis

TL;DR

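The issue does not yet state a confirmed fix. As reported, models.providers.<id>.timeoutSeconds is accepted and hot reloaded but does not stop the cancellation, and the older agents.defaults.llm.idleTimeoutSeconds key is rejected by the schema. The open question is which configuration key, if any, controls the idle watchdog for long local model prefill in OpenClaw 2026.5.3-1.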

Guidance

  1. Review OpenClaw documentation: Check the official documentation and release notes for 2026.5.3-1 for renamed or removed timeout keys. The schema now rejects agents.defaults.llm.idleTimeoutSeconds, so look for where, if anywhere, the idle timeout setting moved.
  2. Experiment with configuration keys: The reporter's config already sets models.providers.<id>.timeoutSeconds, agents.defaults.timeoutSeconds, and the per-model timeoutSeconds to 14400 without effect, so change one key at a time and note whether the time until cancellation changes.
  3. Inspect OpenClaw logs closely: Look for messages around "LLM request timed out" that indicate which timeout fired, and compare their timing with the "srv stop: cancel task" line in the llama.cpp log.
  4. Test with smaller prompts: Gradually increase the prompt size to find where OpenClaw starts cancelling requests, and record the elapsed time before each cancellation. A constant elapsed time would point to a fixed timeout value.
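The prompt-size search in step 4 can be done as a binary search rather than a linear sweep. The sketch below is a minimal illustration: `completes` is a hypothetical callable that would send a prompt of the given size and report whether a reply arrived, and the demo replaces it with a stand-in so the code runs without a server. Since the cutoff is presumably time-based, any threshold found this way depends on the backend's prefill speed.

```python
from typing import Callable


def largest_passing_size(
    completes: Callable[[int], bool], low: int, high: int
) -> int:
    """Binary-search the largest prompt size in [low, high] that completes.

    Assumes completes(low) is True, completes(high) is False, and that
    success is monotonic in prompt size (bigger prompts never do better).
    """
    while high - low > 1:
        mid = (low + high) // 2
        if completes(mid):
            low = mid
        else:
            high = mid
    return low


# Stand-in for a real request: pretend anything above 60_000 tokens is cut off.
# The 60_000 figure is invented for this demo, not a measured threshold.
def fake_completes(n_tokens: int) -> bool:
    return n_tokens <= 60_000


threshold = largest_passing_size(fake_completes, low=1_000, high=150_000)
print(f"largest prompt that completed: {threshold} tokens")
```

Dividing the threshold size by the observed prefill speed gives an estimate of the effective timeout in seconds, which can then be compared against the configured values.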

Example

The issue does not include code for the watchdog itself, so no fix can be shown here. The provider config from the issue, with models.providers.llamacpp.timeoutSeconds set, is a starting point for experiments, although the reporter found that this setting alone does not prevent the cancellation. Adjust the timeoutSeconds value as needed:

"models": {
  "providers": {
    "llamacpp": {
      "baseUrl": "http://cerebro-mac:8080/v1",
      "apiKey": "dummy",
      "api": "openai-responses",
      "timeoutSeconds": 14400,
      "models": [
        {
          "id": "qwen3.6-35b",
          "name": "Qwen 3.6 35B A3B local llama.cpp",
          "reasoning": true,
          "input": ["text"],
          "contextWindow": 262144,
          "maxTokens": 100000
        }
      ]
    }
  }
}
