openclaw - ✅ (Solved) fix(agents): context overflow not detected for llama.cpp server provider [1 pull request, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#64180 (fetched 2026-04-11 06:16:00)
Comments: 1
Participants: 2
Timeline: 8
Reactions: 0
Timeline (top): referenced ×5, closed ×1, commented ×1, cross-referenced ×1

Error Message

When llama.cpp returns its native overflow error, openclaw does not classify it as a context overflow, so auto-compaction never triggers and the run fails, surfacing the raw upstream error to the user:

error=400 request (66202 tokens) exceeds the available context size (65536 tokens), try increasing it

Fix Action

Fixed

PR fix notes

PR #64196: fix(agents): detect llama.cpp slot overflow as context overflow

Description (problem / solution / changelog)

Summary

  • Adds a llama.cpp-shaped regex to PROVIDER_CONTEXT_OVERFLOW_PATTERNS so isContextOverflowError() recognises the native overflow message from llama.cpp's slot manager (used directly or behind any api: "openai-completions" proxy).
  • Adds 4 unit test cases covering the new pattern (3 parameterised messages + 1 end-to-end isContextOverflowError() assertion).

Closes #64180

The Bug

Self-hosted llama.cpp HTTP servers are very common (ghcr.io/ggml-org/llama.cpp:server-cuda and similar). When a prompt overshoots a slot's --ctx-size, llama.cpp returns:

400 request (66202 tokens) exceeds the available context size (65536 tokens), try increasing it

This message slips past every existing detector:

| Check | Why it misses |
| --- | --- |
| request_too_large | wrong wording |
| context length exceeded / maximum context length | llama.cpp says "context size", not "context length" |
| prompt is too long / prompt too long | llama.cpp says "request (N tokens) exceeds…" |
| exceeds model context window / context_window_exceeded | wrong wording |
| 413 + too large | llama.cpp returns 400 |
| \binput token count exceeds the maximum number of input tokens\b (Bedrock) | wrong wording |
| \binput is too long for this model\b (Bedrock/Mistral generic) | wrong wording |
| \binput exceeds the maximum number of tokens\b (Vertex) | wrong wording |
| \bollama error:\s*context length exceeded\b (Ollama) | no ollama error: prefix |
| \btotal tokens?.*exceeds? (Cohere) | message has "(N tokens)", not "total tokens" |

The generic candidate pre-check (PROVIDER_CONTEXT_OVERFLOW_SIGNAL_RE + PROVIDER_CONTEXT_OVERFLOW_ACTION_RE) does pass — the message contains "request"/"tokens"/"context"/"exceeds" — but no concrete pattern matches, so matchesProviderContextOverflow() returns false. The agent runner sees isContextOverflowError() === false, never enters the compaction branch, and the user gets the raw upstream 400 instead of an automatic compact + retry.
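The two-stage flow described above can be sketched as follows. This is a minimal illustration, not the project's code: `SIGNAL_RE` and `ACTION_RE` are hypothetical stand-ins (the real pre-check regexes are not shown in this PR), and the pattern list contains only the Ollama pattern quoted in the table.

```typescript
// Hypothetical stand-ins for the real pre-check regexes, which this PR does not show.
const SIGNAL_RE = /\b(?:request|prompt|input|tokens?|context)\b/i;
const ACTION_RE = /\b(?:exceeds?|exceeded|too long|too large|overflow)\b/i;

// One pre-fix pattern, as quoted in the PR description (Ollama).
const PATTERNS_BEFORE_FIX: RegExp[] = [
  /\bollama error:\s*context length exceeded(?:,\s*too many tokens)?\b/i,
];

function matchesOverflow(message: string, patterns: RegExp[]): boolean {
  // Stage 1: cheap candidate gate, so most errors never reach the pattern list.
  if (!SIGNAL_RE.test(message) || !ACTION_RE.test(message)) return false;
  // Stage 2: at least one concrete provider pattern must match.
  return patterns.some((re) => re.test(message));
}

const llamaCppMessage =
  "400 request (66202 tokens) exceeds the available context size (65536 tokens), try increasing it";

// The message looks like an overflow candidate...
const passesGate = SIGNAL_RE.test(llamaCppMessage) && ACTION_RE.test(llamaCppMessage);
// ...but no concrete pattern recognises it, so detection fails.
const detectedBeforeFix = matchesOverflow(llamaCppMessage, PATTERNS_BEFORE_FIX);

console.log({ passesGate, detectedBeforeFix });
```

Under these assumptions the message passes the gate but is still reported as "not an overflow", which is the gap the PR closes.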

Same class of bug as #58839 (Bedrock/Ollama/Cohere), just for a different provider.

The Fix

One regex added next to the existing provider patterns in src/agents/pi-embedded-helpers/provider-error-patterns.ts:

// llama.cpp HTTP server (often used directly or behind an OpenAI-compatible
// shim) returns "request (N tokens) exceeds the available context size
// (M tokens), try increasing it" when the prompt overshoots a slot's
// ctx-size. Wording is from the upstream slot manager and is stable.
// Example: "400 request (66202 tokens) exceeds the available context size (65536 tokens), try increasing it"
/\b(?:request|prompt) \(\d[\d,]*\s*tokens?\) exceeds (?:the )?available context size\b/i,

The pattern is anchored on the stable slot-manager wording ((?:request|prompt) (N tokens) exceeds (the )?available context size) so it can't accidentally swallow unrelated provider errors. The existing candidate pre-check still gates the regex evaluation, keeping cost negligible.
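As a quick sanity check, the regex can be exercised in isolation. The sketch below uses the pattern exactly as quoted in the PR and the three messages listed in its test section; the non-matching strings are illustrative examples chosen here, not the project's actual "unrelated errors" fixtures.

```typescript
// Pattern copied verbatim from the PR.
const LLAMA_CPP_OVERFLOW_RE =
  /\b(?:request|prompt) \(\d[\d,]*\s*tokens?\) exceeds (?:the )?available context size\b/i;

// Messages taken from the PR's test section.
const shouldMatch: string[] = [
  "400 request (66202 tokens) exceeds the available context size (65536 tokens), try increasing it",
  "request (130000 tokens) exceeds available context size (131072 tokens)",
  "prompt (8500 tokens) exceeds the available context size (8192 tokens), try increasing it",
];

// Illustrative negatives (assumed for this sketch, not from the repository).
const shouldNotMatch: string[] = [
  "ollama error: context length exceeded",
  "request exceeds the maximum size",
  "400 bad request: invalid JSON body",
];

const matched = shouldMatch.map((m) => LLAMA_CPP_OVERFLOW_RE.test(m));
const rejected = shouldNotMatch.map((m) => !LLAMA_CPP_OVERFLOW_RE.test(m));

console.log({ matched, rejected });
```

Because the regex has no `g` flag, repeated `test()` calls carry no `lastIndex` state between messages.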

Tests

  • provider-error-patterns.test.ts parameterised cases:
    • "400 request (66202 tokens) exceeds the available context size (65536 tokens), try increasing it" (verbatim production payload)
    • "request (130000 tokens) exceeds available context size (131072 tokens)" (no the, no status prefix)
    • "prompt (8500 tokens) exceeds the available context size (8192 tokens), try increasing it" (the prompt alternation)
  • New end-to-end case under describe("isContextOverflowError with provider patterns") asserting that isContextOverflowError(<verbatim message>) === true, which is what the agent runner actually calls.

Targeted run:

$ pnpm vitest run src/agents/pi-embedded-helpers/provider-error-patterns.test.ts
 Test Files  1 passed (1)
      Tests  26 passed (26)

Broader vitest run shows 2 pre-existing failures in src/channels/plugins/contracts/group-policy.fallback.contract.test.ts — those are not touched by this PR (the change is scoped to two files in src/agents/pi-embedded-helpers/).

Test plan

  • Targeted unit tests pass (26/26 in provider-error-patterns.test.ts)
  • Pattern only matches llama.cpp's specific wording (verified by the existing "does not match unrelated errors" cases)
  • Manual: verify on a llama.cpp deployment that compaction now triggers when a session overshoots --ctx-size

AI-assisted

  • Drafted with Claude Code (Claude Opus 4.6, 1M context)
  • Lightly tested (targeted unit tests pass; manual verification still pending in our prod env)
  • I understand what the code does — added one regex to an existing fallback list and added matching test coverage

🤖 Generated with Claude Code

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/pi-embedded-helpers/provider-error-patterns.test.ts (modified, +14/-0)
  • src/agents/pi-embedded-helpers/provider-error-patterns.ts (modified, +7/-0)


Bug Description

isContextOverflowError() and matchesProviderContextOverflow() in src/agents/pi-embedded-helpers/ detect context window overflow for major providers (Anthropic, OpenAI, Bedrock, Ollama, Cohere, Vertex), but the llama.cpp HTTP server (used directly or via vLLM-style local deployments behind api: "openai-completions") is not covered.

When llama.cpp returns its native overflow error, openclaw does not classify it as a context overflow → auto-compaction never triggers → the run fails surfacing the raw upstream error to the user.

Reproduction

Run any llama.cpp server (e.g. ghcr.io/ggml-org/llama.cpp:server-cuda) with a fixed context size, configure it as an openai-completions provider in openclaw, and let an agent build up a context that overshoots the per-slot context.

Llama.cpp returns:

400 request (66202 tokens) exceeds the available context size (65536 tokens), try increasing it

Openclaw surfaces it directly to the user instead of triggering compaction:

[agent] embedded run agent end: ... isError=true
  model=qwen3.5-35b-a3b
  provider=llamacpp-deep
  error=400 request (66202 tokens) exceeds the available context size (65536 tokens), try increasing it
  rawError=400 request (66202 tokens) exceeds the available context size (65536 tokens), try increasing it
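To make the size of the overshoot concrete, the two token counts can be pulled out of the message. `parseLlamaCppOverflow` below is a hypothetical helper written for this illustration; it is not part of openclaw or llama.cpp.

```typescript
interface OverflowInfo {
  requested: number; // tokens in the request
  available: number; // per-slot context size
  overshoot: number; // requested - available
}

// Hypothetical helper: extracts both token counts from llama.cpp's overflow message.
function parseLlamaCppOverflow(message: string): OverflowInfo | null {
  const m =
    /\((\d[\d,]*)\s*tokens?\) exceeds (?:the )?available context size \((\d[\d,]*)\s*tokens?\)/i.exec(
      message,
    );
  if (!m) return null;
  // Strip thousands separators before converting, in case a count is formatted as "66,202".
  const requested = Number(m[1].replace(/,/g, ""));
  const available = Number(m[2].replace(/,/g, ""));
  return { requested, available, overshoot: requested - available };
}

const info = parseLlamaCppOverflow(
  "400 request (66202 tokens) exceeds the available context size (65536 tokens), try increasing it",
);
console.log(info); // { requested: 66202, available: 65536, overshoot: 666 }
```

In the reported run the session was only 666 tokens over the slot's limit, so even a modest compaction would have been enough to let the retry succeed.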

Why It Isn't Detected

Walking through isContextOverflowError() in src/agents/pi-embedded-helpers/errors.ts against the message above, none of the existing string/regex checks match:

| Check | Matches? |
| --- | --- |
| request_too_large | No |
| invalid_argument + maximum number of tokens | No |
| request exceeds the maximum size | No |
| context length exceeded | No (llama.cpp says "context size", not "context length") |
| maximum context length | No |
| prompt is too long / prompt too long | No |
| exceeds model context window | No |
| model token limit | No |
| input exceeds + maximum number of tokens | No |
| request size exceeds + context window/length | No |
| context overflow: | No |
| exceed context limit | No |
| exceeds the model's maximum context | No |
| max_tokens + exceed + context | No |
| input length + exceed + context | No |
| 413 + too large | No (llama.cpp returns 400) |
| context_window_exceeded | No |
| Chinese proxy patterns | No |

It then falls through to matchesProviderContextOverflow() in src/agents/pi-embedded-helpers/provider-error-patterns.ts. The candidate pre-check (PROVIDER_CONTEXT_OVERFLOW_SIGNAL_RE + PROVIDER_CONTEXT_OVERFLOW_ACTION_RE) passes — the message contains "request", "tokens", "context" and "exceeds" — but none of the concrete PROVIDER_CONTEXT_OVERFLOW_PATTERNS regexes match either:

| Pattern | Matches? |
| --- | --- |
| \binput token count exceeds the maximum number of input tokens\b | No |
| \binput is too long for this model\b | No |
| \binput exceeds the maximum number of tokens\b | No |
| \bollama error:\s*context length exceeded(?:,\s*too many tokens)?\b | No |
| \btotal tokens?.*exceeds? (?:the )?(?:model(?:'s)? )?(?:max\|maximum\|limit) | No |
| \binput (?:is )?too long for (?:the )?model\b | No |

So isContextOverflowError() returns false and the agent runner never calls into the compaction path.

Impact

  • Any user pointing openclaw at a llama.cpp HTTP server (a very common self-hosted setup) hits this the first time their session grows past the per-slot context.
  • The run aborts with an opaque error instead of compacting and retrying.
  • Compaction's whole job — rescuing oversized sessions — silently doesn't apply to llama.cpp.
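The intended compact-and-retry behaviour can be sketched as below. This is a simplified, synchronous illustration: `runWithCompaction`, `compact`, and `fakeCall` are hypothetical names, and the "drop the oldest half" strategy is a placeholder for whatever openclaw's real compaction pipeline does. Only the regex comes from the issue.

```typescript
// Pattern quoted from the issue's proposed fix.
const OVERFLOW_RE =
  /\b(?:request|prompt) \(\d[\d,]*\s*tokens?\) exceeds (?:the )?available context size\b/i;

type ModelCall = (messages: string[]) => string;

function isOverflow(err: unknown): boolean {
  return err instanceof Error && OVERFLOW_RE.test(err.message);
}

// Placeholder compaction: keep only the newer half of the conversation.
function compact(messages: string[]): string[] {
  return messages.slice(Math.ceil(messages.length / 2));
}

function runWithCompaction(call: ModelCall, messages: string[], maxRetries = 2): string {
  let current = messages;
  for (let attempt = 0; ; attempt++) {
    try {
      return call(current);
    } catch (err) {
      // Retry only for recognised overflows, and only while there is something left to drop.
      const canRetry = isOverflow(err) && attempt < maxRetries && current.length > 1;
      if (!canRetry) throw err;
      current = compact(current);
    }
  }
}

// Fake provider: rejects any conversation longer than two messages with llama.cpp's wording.
const fakeCall: ModelCall = (messages) => {
  if (messages.length > 2) {
    throw new Error(
      "400 request (66202 tokens) exceeds the available context size (65536 tokens), try increasing it",
    );
  }
  return `ok:${messages.length}`;
};

const history = Array.from({ length: 8 }, (_, i) => `turn ${i}`);
const result = runWithCompaction(fakeCall, history);
console.log(result); // "ok:2" after two compactions (8 -> 4 -> 2)
```

If `isOverflow` returns false, as it effectively did for llama.cpp before the fix, the first error is rethrown immediately and the retry loop never gets a chance to run.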

This is the same class of bug as #58839, but for a different provider.

Expected Behavior

isContextOverflowError() (via matchesProviderContextOverflow()) should detect llama.cpp's native overflow wording so the existing compaction pipeline kicks in automatically, just like it does for OpenAI / Anthropic / Bedrock / Ollama / Cohere.

Proposed Fix

Add a llama.cpp-shaped pattern to PROVIDER_CONTEXT_OVERFLOW_PATTERNS in src/agents/pi-embedded-helpers/provider-error-patterns.ts. Llama.cpp's wording is stable across versions:

/\b(?:request|prompt) \(\d[\d,]*\s*tokens?\) exceeds (?:the )?available context size\b/i

Plus a matching test case in src/agents/pi-embedded-helpers/provider-error-patterns.test.ts. PR to follow.

Affected Code

  • src/agents/pi-embedded-helpers/provider-error-patterns.ts: PROVIDER_CONTEXT_OVERFLOW_PATTERNS
  • src/agents/pi-embedded-helpers/provider-error-patterns.test.ts: coverage

Environment

  • OpenClaw v2026.4.9
  • Provider: llama.cpp HTTP server (ghcr.io/ggml-org/llama.cpp:server-cuda) configured via api: "openai-completions"
  • Model: Qwen3.5-35B-A3B (any model — the error is from llama.cpp's slot manager, not the model)

extent analysis

TL;DR

Add a new regex pattern to PROVIDER_CONTEXT_OVERFLOW_PATTERNS in src/agents/pi-embedded-helpers/provider-error-patterns.ts to match llama.cpp's native overflow error message.

Guidance

  • Update PROVIDER_CONTEXT_OVERFLOW_PATTERNS with the proposed regex pattern: /\b(?:request|prompt) \(\d[\d,]*\s*tokens?\) exceeds (?:the )?available context size\b/i
  • Add a test case in src/agents/pi-embedded-helpers/provider-error-patterns.test.ts to cover the new pattern
  • Verify that the updated isContextOverflowError() function correctly detects llama.cpp's overflow error and triggers the compaction pipeline
  • Test the fix against a llama.cpp server with a session whose prompt exceeds the per-slot context size

Example

// src/agents/pi-embedded-helpers/provider-error-patterns.ts
const PROVIDER_CONTEXT_OVERFLOW_PATTERNS = [
  // ... existing patterns ...
  /\b(?:request|prompt) \(\d[\d,]*\s*tokens?\) exceeds (?:the )?available context size\b/i,
];

Notes

The proposed fix assumes that the regex pattern accurately matches llama.cpp's native overflow error message. If the error message changes in future versions of llama.cpp, the pattern may need to be updated.

Recommendation

Apply the fix by adding the new regex pattern to PROVIDER_CONTEXT_OVERFLOW_PATTERNS. This lets the compaction pipeline trigger correctly for llama.cpp servers, so runs compact and retry instead of aborting with an opaque error.
