openclaw - Fix Performance: Slow Ollama qwen3:14b prompt ingestion in long-context OpenClaw runs

BenSHPD · 2026-04-07T03:08:48Z



openclaw/openclaw#62267•Fetched 2026-04-08 03:07:00


Local Ollama works correctly when called directly through a small proxy and through short direct tests, but in OpenClaw-style long-context runs it becomes impractically slow and falls back to cloud models.

The key finding is:

- short direct tests are fast and correct
- long-context runs are not failing at generation
- they are spending almost all time in prompt ingestion / prompt evaluation

This makes the local model effectively unusable as a primary OpenClaw model for real multi-turn sessions.


Bug type

Performance / local Ollama integration


Environment

- OpenClaw: [fill your exact version]
- Ollama: [fill your exact version]
- OS: macOS on Apple Silicon (Mac mini M3)
- Model:
- Transport: local Ollama via native proxy
- Proxy behavior:
  - forces
  - forces
  - forces
  - logs request size and upstream latency

What I verified

1. Local Ollama itself is working.
2. The proxy is working.
3. Short direct tests are fast and correct:
   - short Chinese answer
   - strict number output
   - strict JSON output
4. So this is not a basic Ollama connectivity problem.

Reproduction

1. Configure OpenClaw to use local Ollama.
2. Run a realistic multi-turn session or a long-context request.
3. Observe fallback to a cloud model.
4. Compare with a direct local test against the same model using a controlled proxy / direct request.

Direct test results

Short tests are healthy:

- trivial prompt returns in about 0.3–1.4s
- strict JSON / strict number / short Chinese reply all succeed

Long-context direct test:

- repeated Chinese context block
- result:

This strongly suggests the main bottleneck is prompt ingestion, not token generation.
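A minimal sketch of how the split between prompt ingestion and generation could be measured directly. It relies on the timing fields that Ollama's generate endpoint reports in nanoseconds (`prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`). The helper names `split_timing` and `measure` are hypothetical, the model tag is taken from the issue title, and the URL assumes Ollama's default local port rather than the proxy used in this report.

```python
import json
import urllib.request

NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds


def split_timing(resp: dict) -> dict:
    """Separate prompt-evaluation time from generation time in one response."""
    prompt_s = resp.get("prompt_eval_duration", 0) / NS_PER_S
    gen_s = resp.get("eval_duration", 0) / NS_PER_S
    prompt_tokens = resp.get("prompt_eval_count", 0)
    gen_tokens = resp.get("eval_count", 0)
    busy = prompt_s + gen_s
    return {
        "prompt_s": prompt_s,
        "gen_s": gen_s,
        "prompt_tok_per_s": prompt_tokens / prompt_s if prompt_s else 0.0,
        "gen_tok_per_s": gen_tokens / gen_s if gen_s else 0.0,
        # Fraction of model-busy time spent ingesting the prompt.
        "prompt_share": prompt_s / busy if busy else 0.0,
    }


def measure(prompt: str, model: str = "qwen3:14b",
            url: str = "http://localhost:11434/api/generate") -> dict:
    """Send one non-streaming request to a local Ollama server (assumed URL)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as r:
        return split_timing(json.load(r))
```

A `prompt_share` close to 1.0 on a long-context request, together with a normal `gen_tok_per_s`, would be consistent with the conclusion above that ingestion rather than generation dominates.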

Expected behavior

If OpenClaw is configured with a local Ollama model and a constrained runtime context, it should:

- respect the effective runtime cap in practice
- avoid pathological prompt-ingestion latency
- avoid falling back when the model is actually healthy for shorter local tasks

Actual behavior

OpenClaw-style runs repeatedly fall back even though:

- the local model is loaded
- direct short tests are healthy
- the model can answer correctly

In practice, the local model spends 100s+ ingesting the prompt and is no longer viable as the primary model.
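To make the relationship between prompt size, ingestion speed, and a failover timeout concrete, here is a back-of-the-envelope sketch. The function names and all numbers in the usage note are illustrative assumptions, not measurements from this report or values taken from OpenClaw.

```python
def ingestion_seconds(prompt_tokens: int, prompt_tok_per_s: float) -> float:
    """Estimated time to ingest a prompt at a measured prompt-eval rate."""
    if prompt_tok_per_s <= 0:
        raise ValueError("prompt_tok_per_s must be positive")
    return prompt_tokens / prompt_tok_per_s


def max_prompt_tokens(timeout_s: float, prompt_tok_per_s: float,
                      reserve_s: float = 0.0) -> int:
    """Largest prompt that can be ingested before a timeout fires.

    reserve_s is time held back for generating the first tokens.
    """
    usable = max(timeout_s - reserve_s, 0.0)
    return int(usable * prompt_tok_per_s)
```

For example, at a hypothetical 100 tokens/s of prompt evaluation, a 12,000-token prompt takes about 120 s to ingest, while a 60 s timeout with 10 s reserved for output leaves room for roughly 5,000 prompt tokens.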

Why I think this is an OpenClaw integration/runtime issue

The same local model behaves well in direct short tests. The slowdown appears only once the OpenClaw-style context packaging / runtime behavior is involved.

This looks related to:

- runtime context budgeting
- prompt packaging / multi-turn message expansion
- failover thresholds interacting badly with local prompt-ingestion latency
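As an illustration of what context budgeting could look like in general, here is a minimal sketch that keeps system messages and then the newest turns that fit a token budget. This is not OpenClaw's actual packaging logic. `trim_to_budget` and `rough_tokens` are hypothetical names, and the default counter (one token per character) is a deliberately crude, conservative stand-in for a real tokenizer.

```python
from typing import Callable


def rough_tokens(text: str) -> int:
    """Crude estimate: one token per character (assumption, not a tokenizer)."""
    return len(text)


def trim_to_budget(messages: list[dict], budget: int,
                   count: Callable[[str], int] = rough_tokens) -> list[dict]:
    """Keep all system messages, then the newest turns that fit in budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(count(m["content"]) for m in system)
    kept: list[dict] = []
    for m in reversed(rest):  # walk from newest to oldest
        cost = count(m["content"])
        if used + cost > budget:
            break  # stop at the first turn that does not fit
        kept.append(m)
        used += cost
    return system + kept[::-1]  # restore chronological order
```

Dropping whole older turns is only one possible policy. Summarizing older turns instead would preserve more information at the cost of an extra model call.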

Questions

1. Is OpenClaw expected to compact / trim more aggressively for local Ollama models?
2. Is there a recommended ceiling specifically for local Qwen 14B-class models?
3. Could failover logic distinguish “slow prompt ingestion on healthy local model” from actual model failure?
4. Is this related to the other reports where OpenClaw appears to send a much larger effective context than expected?
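To illustrate the distinction asked about in question 3, a minimal sketch of a classifier that separates "slow but healthy" from "failed". Every name here (`Attempt`, `Verdict`, `classify`) is hypothetical and does not describe OpenClaw's failover code. The `model_loaded` flag is assumed to come from some external check, such as asking the local server which models are currently running.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    HEALTHY = "healthy"
    SLOW_INGESTION = "slow_ingestion"
    FAILED = "failed"


@dataclass
class Attempt:
    elapsed_s: float         # wall-clock time since the request was sent
    first_token_seen: bool   # has any output token arrived yet?
    transport_error: bool    # connection refused, reset, HTTP 5xx, ...
    model_loaded: bool       # assumed result of a separate liveness check


def classify(a: Attempt, timeout_s: float) -> Verdict:
    """Separate real failures from a healthy model still ingesting its prompt."""
    if a.transport_error or not a.model_loaded:
        return Verdict.FAILED
    if not a.first_token_seen and a.elapsed_s >= timeout_s:
        # Model is up and no error occurred: likely still evaluating the prompt.
        return Verdict.SLOW_INGESTION
    return Verdict.HEALTHY
```

With a verdict like `SLOW_INGESTION`, a runtime could choose to shrink the context and retry locally instead of switching to a cloud model.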

Additional notes

I can provide:

- proxy logs
- exact config snippets
- reproducible direct-vs-OpenClaw comparisons
- screenshots of fallback behavior

extent analysis

TL;DR

The most likely fix is to optimize OpenClaw's prompt ingestion and context packaging for local Ollama models to reduce latency.

Guidance

- Investigate OpenClaw's runtime context budgeting and prompt packaging to identify potential bottlenecks.
- Check if there are any configuration options to compact or trim the context more aggressively for local Ollama models.
- Review the failover logic to distinguish between slow prompt ingestion on a healthy local model and actual model failure.
- Compare the effective context size sent by OpenClaw with the expected size to determine if it's related to the reported issues of larger-than-expected context sizes.

Example

No specific code snippet can be provided without more information on the OpenClaw and Ollama implementations. However, reviewing the proxy logs and exact config snippets may help identify the root cause of the issue.

Notes

The issue seems to be related to the interaction between OpenClaw's runtime behavior and the local Ollama model's prompt ingestion latency. The fact that short direct tests are fast and correct suggests that the local model itself is not the problem.

Recommendation

Apply a workaround by optimizing OpenClaw's prompt ingestion and context packaging for local Ollama models, as this is likely to reduce the latency and make the local model viable as the primary model. This may involve adjusting configuration options or modifying the OpenClaw code to better handle local models.
