openclaw: Performance: Slow Ollama qwen3:14b prompt ingestion in long-context OpenClaw runs

GitHub stats
openclaw/openclaw#62267, fetched 2026-04-08 03:07:00
Comments
0
Participants
1
Timeline
0
Reactions
0

Local Ollama works correctly when called directly through a small proxy and through short direct tests, but in OpenClaw-style long-context runs it becomes impractically slow and falls back to cloud models.

The key finding is:

  • short direct tests are fast and correct
  • long-context runs are not failing at generation
  • they are spending almost all time in prompt ingestion / prompt evaluation

This makes the local model effectively unusable as a primary OpenClaw model for real multi-turn sessions.

Bug type

Performance / local Ollama integration

Environment

  • OpenClaw: [fill your exact version]
  • Ollama: [fill your exact version]
  • OS: macOS on Apple Silicon (Mac mini M3)
  • Model:
  • Transport: local Ollama via native proxy
  • Proxy behavior:
  • forces
  • forces
  • forces
  • logs request size and upstream latency
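The logging behavior described above (request size and upstream latency) could be sketched roughly as follows. This is not the reporter's proxy: `forward_with_log`, `send`, and the record keys are hypothetical names chosen for illustration, and the actual transport to Ollama is left to whatever callable is passed in.

```python
import time
from typing import Any, Callable, Dict, Tuple


def forward_with_log(
    body: bytes,
    send: Callable[[bytes], bytes],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[bytes, Dict[str, Any]]:
    """Forward `body` upstream via `send` and record size and latency.

    `send` is a hypothetical callable that performs the real upstream
    request (for example, an HTTP POST to the local Ollama server).
    """
    start = clock()
    response = send(body)
    record = {
        "request_bytes": len(body),        # size of the prompt payload
        "response_bytes": len(response),   # size of the upstream reply
        "upstream_s": clock() - start,     # wall-clock upstream latency
    }
    return response, record
```

Logging `request_bytes` next to `upstream_s` for every call makes it easier to see whether latency grows with payload size, which is the pattern this report describes.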

What I verified

  1. Local Ollama itself is working.
  2. The proxy is working.
  3. Short direct tests are fast and correct:
  • short Chinese answer
  • strict number output
  • strict JSON output
  4. So this is not a basic Ollama connectivity problem.

Reproduction

  1. Configure OpenClaw to use local Ollama .
  2. Run a realistic multi-turn session or a long-context request.
  3. Observe fallback to a cloud model.
  4. Compare with direct local test against the same model using a controlled proxy / direct request.
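A minimal sketch of the direct comparison in step 4, assuming Ollama is listening on its default local port and using the `/api/generate` endpoint with the `num_ctx` option. The helper names, the repeat count, and the `num_ctx` value are illustrative assumptions, not the reporter's actual test harness.

```python
import json
import urllib.request
from typing import Any, Dict

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, context_block: str, repeats: int,
                  question: str, num_ctx: int) -> Dict[str, Any]:
    """Build a non-streaming request whose prompt repeats a context block."""
    prompt = (context_block + "\n") * repeats + question
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # one JSON object with timing fields
        "options": {"num_ctx": num_ctx},  # context window requested for this call
    }


def run_direct(payload: Dict[str, Any], url: str = OLLAMA_URL,
               timeout_s: float = 600.0) -> Dict[str, Any]:
    """POST the payload to Ollama and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `run_direct(build_payload("qwen3:14b", block, 200, "Summarize the above.", 8192))` with and without a large `repeats` value gives a baseline to compare against the same model when it is driven by OpenClaw.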

Direct test results

Short tests are healthy:

  • trivial prompt returns in about 0.3–1.4s
  • strict JSON / strict number / short Chinese reply all succeed

Long-context direct test:

  • repeated Chinese context block
  • result:

This strongly suggests the main bottleneck is prompt ingestion, not token generation.
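One way to check this split is to read the timing fields Ollama includes in a non-streaming response (`prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, with durations in nanoseconds). The sketch below only does the arithmetic; the function name and output keys are hypothetical.

```python
from typing import Any, Dict

NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds


def timing_breakdown(resp: Dict[str, Any]) -> Dict[str, float]:
    """Split an Ollama response's timing into prompt ingestion vs generation."""
    prompt_s = resp.get("prompt_eval_duration", 0) / NS_PER_S
    gen_s = resp.get("eval_duration", 0) / NS_PER_S
    prompt_tokens = resp.get("prompt_eval_count", 0)
    gen_tokens = resp.get("eval_count", 0)
    measured = prompt_s + gen_s
    return {
        "prompt_s": prompt_s,
        "gen_s": gen_s,
        # Throughput for each phase; 0.0 when the phase did not run.
        "prompt_tok_per_s": prompt_tokens / prompt_s if prompt_s else 0.0,
        "gen_tok_per_s": gen_tokens / gen_s if gen_s else 0.0,
        # Fraction of measured time spent ingesting the prompt.
        "prompt_share": prompt_s / measured if measured else 0.0,
    }
```

A `prompt_share` close to 1.0 on long-context requests would support the conclusion that ingestion, not generation, dominates the run.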

Expected behavior

If OpenClaw is configured with a local Ollama model and a constrained runtime context, it should:

  • respect the effective runtime cap in practice
  • avoid pathological prompt-ingestion latency
  • avoid falling back when the model is actually healthy for shorter local tasks

Actual behavior

OpenClaw-style runs repeatedly fall back even though:

  • the local model is loaded
  • direct short tests are healthy
  • the model can answer correctly

In practice, the local model spends more than 100 seconds ingesting the prompt and is no longer viable as the primary model.

Why I think this is an OpenClaw integration/runtime issue

The same local model behaves well in direct short tests. The slowdown appears only once the OpenClaw-style context packaging / runtime behavior is involved.

This looks related to:

  • runtime context budgeting
  • prompt packaging / multi-turn message expansion
  • failover thresholds interacting badly with local prompt-ingestion latency
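To make the context-budgeting idea concrete, here is a rough sketch of trimming a multi-turn history to a token budget before it is sent to a local model. It does not reflect how OpenClaw actually packages prompts; `trim_history` and `estimate_tokens` are hypothetical, and the 4-characters-per-token estimate is a crude placeholder that will be inaccurate for Chinese text in particular.

```python
from typing import Dict, List

Message = Dict[str, str]  # {"role": ..., "content": ...}


def estimate_tokens(text: str) -> int:
    """Crude placeholder estimate; a real tokenizer should replace this."""
    return max(1, len(text) // 4)


def trim_history(messages: List[Message], budget: int) -> List[Message]:
    """Keep system messages plus the newest turns that fit within `budget`."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    remaining = budget - sum(estimate_tokens(m["content"]) for m in system)

    kept: List[Message] = []
    # Walk from newest to oldest and stop at the first turn that does not fit,
    # so the retained history stays contiguous.
    for msg in reversed(rest):
        cost = estimate_tokens(msg["content"])
        if cost > remaining:
            break
        kept.append(msg)
        remaining -= cost
    return system + list(reversed(kept))
```

Dropping the oldest turns first is only one possible policy; summarizing older turns would preserve more information at the cost of an extra model call.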

Questions

  1. Is OpenClaw expected to compact / trim more aggressively for local Ollama models?
  2. Is there a recommended ceiling specifically for local Qwen 14B-class models?
  3. Could failover logic distinguish “slow prompt ingestion on healthy local model” from actual model failure?
  4. Is this related to the other reports where OpenClaw appears to send a much larger effective context than expected?
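Question 3 could be approached along these lines: treat a request that is slow to produce its first token, on a model that still passes a short health probe, differently from a request that errored. This is a hypothetical sketch, not OpenClaw's failover logic; `Attempt`, `Verdict`, `classify`, and the budget parameter are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    HEALTHY = "healthy"
    SLOW_INGESTION = "slow_ingestion"
    FAILED = "failed"


@dataclass
class Attempt:
    error: Optional[str]            # transport or model error, if any
    first_token_s: Optional[float]  # seconds until first token; None if none yet
    elapsed_s: float                # seconds since the request was sent
    health_probe_ok: bool           # did a short probe prompt succeed?


def classify(attempt: Attempt, first_token_budget_s: float) -> Verdict:
    """Separate real failures from a healthy model that is slow to ingest."""
    if attempt.error is not None or not attempt.health_probe_ok:
        return Verdict.FAILED
    # If no token has arrived yet, the time waited so far is the lower bound.
    waited = (attempt.first_token_s
              if attempt.first_token_s is not None
              else attempt.elapsed_s)
    if waited > first_token_budget_s:
        return Verdict.SLOW_INGESTION
    return Verdict.HEALTHY
```

A caller could respond to `SLOW_INGESTION` by trimming context and retrying locally, and reserve cloud fallback for `FAILED`.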

Additional notes

I can provide:

  • proxy logs
  • exact config snippets
  • reproducible direct-vs-OpenClaw comparisons
  • screenshots of fallback behavior

extent analysis

TL;DR

The most likely fix is to optimize OpenClaw's prompt ingestion and context packaging for local Ollama models to reduce latency.

Guidance

  • Investigate OpenClaw's runtime context budgeting and prompt packaging to identify potential bottlenecks.
  • Check if there are any configuration options to compact or trim the context more aggressively for local Ollama models.
  • Review the failover logic to distinguish between slow prompt ingestion on a healthy local model and actual model failure.
  • Compare the effective context size sent by OpenClaw with the expected size to determine if it's related to the reported issues of larger-than-expected context sizes.

Example

No specific code snippet can be provided without more information on the OpenClaw and Ollama implementations. However, reviewing the proxy logs and exact config snippets may help identify the root cause of the issue.

Notes

The issue seems to be related to the interaction between OpenClaw's runtime behavior and the local Ollama model's prompt ingestion latency. The fact that short direct tests are fast and correct suggests that the local model itself is not the problem.

Recommendation

Reduce the size of the prompt OpenClaw sends to local Ollama models, either through configuration options or by changing how OpenClaw packages context. This is likely to cut prompt-ingestion latency enough to keep the local model viable as the primary model.
