openclaw - ✅(Solved) Fix GLM 5.1 / non-frontier models: stopReason=error + output=0 turns silently abort with no retry path [1 pull requests, 1 participants]

openclaw · 2026-04-17 21:26:50

GitHub stats

openclaw/openclaw#68281•Fetched 2026-04-18 05:53:22

Author

Chased1k

Participants

Chased1k

This is distinct from #68076. #68076 describes stopReason=stop payloads=0 caused by an OpenRouter baseUrl misconfiguration. The case here is stopReason=error from a provider that did return a valid wire response indicating the turn errored — no config issue, just a transient model/transport crash that has no existing retry path for non-frontier models.

Error Message

When ollama/glm-5.1:cloud (and, less commonly, other models) ends a turn with stopReason="error", usage.output=0, and empty content[], the embedded runner logs incomplete turn detected: … stopReason=error payloads=0 and bails. The user sees either no reply or the generic "Agent couldn't generate a response" payload. A one-word nudge ("boop", "continue", ".") from the user reliably recovers, which indicates the next attempt would have succeeded if the harness had retried on its own.

Root Cause

Fix Action

Fixed

Fixed by PR: fix(pi-embedded-runner): retry silent stopReason=error turns (non-frontier models) (https://github.com/openclaw/openclaw/pull/68310)

PR fix notes

PR #68310: fix(pi-embedded-runner): retry silent stopReason=error turns (non-frontier models)

Repository: openclaw/openclaw
Author: Chased1k
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/68310

Description (problem / solution / changelog)

Summary

Adds a narrow, model-agnostic resubmission in the embedded runner attempt loop for turns that end with stopReason="error" + usage.output=0 + empty content[]. Bounded at 3 retries. Placed before the incompleteTurnText surface-to-user return so it actually gets a chance to fire.
Primary symptom: ollama/glm-5.1:cloud occasionally ends a turn this way after a successful tool-call sequence. The existing empty-response retry path (resolveEmptyResponseRetryInstruction in src/agents/pi-embedded-runner/run/incomplete-turn.ts) is gated on the strict-agentic contract (gpt-5 family only), so non-frontier models fall through to "incomplete turn detected" with payloads=0 and no recovery — the user sees no reply and has to nudge.
Also observed rarely on claude-opus-4-7. The guard conditions (error + zero output + empty content) are themselves vendor-neutral, so no model gating.

Closes #68281.

Why the content-empty guard is load-bearing

A reasoning-only error, where the assistant produced thinking blocks before the turn errored, also has output=0 against the typed usage shape. Without the content.length === 0 check, the retry fires for those turns too and breaks the existing "planning-only retry when the assistant ended in error" coverage. Empty content is what distinguishes a silent crash from "the model produced hidden tokens before erroring."
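The guard described above can be sketched as a small predicate. This is an illustration only: the `AssistantTurn` shape and the name `isSilentErrorTurn` are assumptions made for the example, not the runner's actual types or helpers.

```typescript
// Hypothetical turn shape for illustration; the real runner types may differ.
interface AssistantTurn {
  stopReason: string;
  usage?: { output?: number };
  content?: ReadonlyArray<{ type: string }>;
}

// True only for the "silent crash" signature: error + zero output + empty content.
function isSilentErrorTurn(turn: AssistantTurn): boolean {
  const outputTokens = turn.usage?.output ?? 0;
  const blockCount = turn.content?.length ?? 0;
  return turn.stopReason === "error" && outputTokens === 0 && blockCount === 0;
}

// A silent crash: nothing was produced before the turn errored.
const silentCrash: AssistantTurn = {
  stopReason: "error",
  usage: { output: 0 },
  content: [],
};

// A reasoning-only error: output is still 0, but a thinking block exists,
// so the content-empty check keeps it out of the new retry.
const reasoningOnlyError: AssistantTurn = {
  stopReason: "error",
  usage: { output: 0 },
  content: [{ type: "thinking" }],
};

// A legitimate silent reply (stopReason=stop) is also left alone.
const silentReply: AssistantTurn = {
  stopReason: "stop",
  usage: { output: 0 },
  content: [],
};
```

Only `silentCrash` matches. The other two differ from it in exactly one field each, which is why all three conditions are needed together.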

Evidence

Corpus scan across ~412 session files in a long-running agent showed 23 user-nudge events ("boop" / "ahem" / "." / "continue"). 19/23 immediately follow a turn matching exactly this signature (stopReason=error, output=0, empty content, ollama/glm-5.1:cloud). See linked issue for more detail.

Test plan

New src/agents/pi-embedded-runner/run.empty-error-retry.test.ts:
- Retries for ollama/glm-5.1:cloud → succeeds on 2nd attempt
- Caps at MAX_EMPTY_ERROR_RETRIES = 3 → 4 total attempts → surfaces incomplete-turn error
- Does NOT retry when output > 0 (preserve produced text)
- Does NOT retry when stopReason=stop + output=0 (NO_REPLY path)
- Retries for anthropic/claude-opus-4-7 too (model-agnostic)
- pnpm test src/agents/pi-embedded-runner → 629/629 pass locally on this branch
- pnpm test src/agents/pi-embedded-runner/run.empty-error-retry.test.ts → 5/5 pass
- oxfmt / oxlint clean
- CI (will populate after push)

Notes

Related but distinct from #68076: that bug is stopReason=stop payloads=0 caused by an OpenRouter baseUrl misconfiguration (a config fix). This PR addresses the stopReason=error variant (a transient model/transport crash with no existing retry path for non-frontier models). Test case 4 explicitly leaves the stop + output=0 scenario alone.
No changelog entry added per the repo guideline that pure bug fixes without user-visible wording changes can land without one; happy to add ### Fixes: GLM 5.1 / non-frontier models auto-retry silent error turns. to the active version block if preferred.

🤖 Generated with Claude Code

Changed files

src/agents/pi-embedded-runner/run.empty-error-retry.test.ts (added, +156/-0)
src/agents/pi-embedded-runner/run.ts (modified, +44/-1)

Code Example

stopReason=error, usage.output=0, content=[]
provider=ollama, model=glm-5.1:cloud

Bug type

Behavior bug (incorrect output/state without crash)

Summary

Why there's no retry today

The existing empty-response retry (resolveEmptyResponseRetryInstruction in src/agents/pi-embedded-runner/run/incomplete-turn.ts) is gated on isStrictAgenticSupportedProviderModel, which returns true only for openai / openai-codex / mock-openai with the gpt-5 family (src/agents/execution-contract.ts:46). Non-frontier models (ollama/glm, many openrouter models, some anthropic cases) fall through to the "incomplete turn detected" branch with no recovery.

Other retry paths don't cover this either:

planningOnlyRetry / reasoningOnlyRetry — require non-empty content
continuationAttempts — stopReason=length only
timeoutCompaction — only on transport timeout
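The coverage gap can be pictured with a rough model of the paths listed above. Everything here is a stand-in written for illustration: `TurnSignature`, `isStrictAgenticStandIn`, and `candidateRetryPath` are hypothetical names, the gpt-5 prefix match is an assumption, and the real conditions in the runner are more detailed than this.

```typescript
// Illustrative model of the retry coverage described in the issue; not the runner's code.
interface TurnSignature {
  provider: string;
  model: string;
  stopReason: string;
  contentBlocks: number;
  transportTimedOut: boolean;
}

// Stand-in for the strict-agentic gate. The provider list comes from the issue text;
// matching the gpt-5 family by name prefix is an assumption.
function isStrictAgenticStandIn(provider: string, model: string): boolean {
  const providers = new Set(["openai", "openai-codex", "mock-openai"]);
  return providers.has(provider) && model.startsWith("gpt-5");
}

// Returns the name of a retry path that could apply, or null if none does.
// "Could apply" is deliberate: each path has further conditions not modelled here.
function candidateRetryPath(turn: TurnSignature): string | null {
  if (turn.transportTimedOut) return "timeoutCompaction";
  if (turn.stopReason === "length") return "continuationAttempts";
  if (turn.contentBlocks > 0) return "planningOnlyRetry / reasoningOnlyRetry";
  if (isStrictAgenticStandIn(turn.provider, turn.model)) return "emptyResponseRetry";
  return null; // falls through to "incomplete turn detected"
}

const glmSilentError: TurnSignature = {
  provider: "ollama",
  model: "glm-5.1:cloud",
  stopReason: "error",
  contentBlocks: 0,
  transportTimedOut: false,
};

console.log(candidateRetryPath(glmSilentError)); // null
```

Under this model the same signature on a gpt-5 model would reach the empty-response retry, while the GLM turn matches nothing, which is the gap the issue describes.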

Evidence

Corpus scan across ~412 session files in a long-running agent showed 23 user-nudge events ("boop" / "ahem" / "." / "continue" / "do it"). 19/23 immediately follow a turn with exactly this signature:

stopReason=error, usage.output=0, content=[]
provider=ollama, model=glm-5.1:cloud

The remaining 4 follow clean stopReason=stop intent-announce turns (separate issue — prompt steer, not retry).

Concrete example (redacted session id): turn at 15:18:21 logs stop=error out=0 content=[]; user sends "boop" at 15:25:22; next turn succeeds with the intended output.
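A scan of this kind could look roughly like the sketch below. The `SessionEntry` shape, the `countNudges` helper, and the sample data are all hypothetical. The issue does not describe the session file format or the script that was actually used.

```typescript
// Hypothetical, simplified session record; the real session file format is not shown in the issue.
interface SessionEntry {
  role: "user" | "assistant";
  text?: string;
  stopReason?: string;
  outputTokens?: number;
  contentBlocks?: number;
}

// Nudge phrases quoted in the issue.
const NUDGES = new Set(["boop", "ahem", ".", "continue", "do it"]);

function isSilentError(entry: SessionEntry): boolean {
  return (
    entry.role === "assistant" &&
    entry.stopReason === "error" &&
    (entry.outputTokens ?? 0) === 0 &&
    (entry.contentBlocks ?? 0) === 0
  );
}

// Counts user nudges and how many of them immediately follow a silent-error turn.
function countNudges(entries: readonly SessionEntry[]): { nudges: number; afterSilentError: number } {
  let nudges = 0;
  let afterSilentError = 0;
  let previous: SessionEntry | undefined;
  for (const entry of entries) {
    const isNudge = entry.role === "user" && NUDGES.has((entry.text ?? "").trim().toLowerCase());
    if (isNudge) {
      nudges++;
      if (previous !== undefined && isSilentError(previous)) afterSilentError++;
    }
    previous = entry;
  }
  return { nudges, afterSilentError };
}

// Made-up sample data, not taken from the corpus.
const sample: SessionEntry[] = [
  { role: "assistant", stopReason: "error", outputTokens: 0, contentBlocks: 0 },
  { role: "user", text: "boop" },
  { role: "assistant", stopReason: "stop", outputTokens: 42, contentBlocks: 1 },
  { role: "user", text: "continue" },
  { role: "user", text: "please summarise the results" },
];

console.log(countNudges(sample)); // { nudges: 2, afterSilentError: 1 }
```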

Steps to reproduce

1. Use ollama or an OpenRouter-hosted provider with glm-5.1:cloud (or similar non-frontier model).
2. Run an agentic session with multiple tool-call rounds.
3. Observe occasional turns where the log line incomplete turn detected: … stopReason=error payloads=0 appears and the user sees no reply.

Reliable repro is hard because the provider-side failure is intermittent, but it recurs within minutes on a moderately active agent.

Expected behavior

The harness retries a silent-error turn (bounded — 3 retries is plenty), model-agnostic, without reissuing any different prompt. Only when all retries fail should the "incomplete turn detected" error surface to the user.

Actual behavior

Turn aborts silently. User has to nudge to recover.

Proposed fix

Add a narrow retry block in src/agents/pi-embedded-runner/run.ts inside the attempt loop, before the if (incompleteTurnText) surface-to-user return. Conditions:

incompleteTurnText set and not aborted / promptError / timedOut
sessionLastAssistant.stopReason === "error"
(usage.output ?? 0) === 0
(content.length ?? 0) === 0 (load-bearing: excludes reasoning-only error turns that existing coverage relies on)
retry counter under MAX_EMPTY_ERROR_RETRIES = 3

If all match, bump the counter and continue the attempt loop to resubmit the same request.

No model gating. The signature (error + zero output + empty content) is itself vendor-neutral.
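As a rough, self-contained model of the bounded retry, the sketch below runs the loop against a scripted sequence of attempts. It is synchronous for brevity, whereas the real attempt loop is presumably asynchronous. `AttemptResult`, `runWithEmptyErrorRetry`, and `scriptedAttempts` are hypothetical names; only `MAX_EMPTY_ERROR_RETRIES = 3` comes from the proposal.

```typescript
const MAX_EMPTY_ERROR_RETRIES = 3;

// Hypothetical per-attempt summary; the real runner tracks more state than this.
interface AttemptResult {
  stopReason: string;
  outputTokens: number;
  contentBlocks: number;
  aborted?: boolean;
  timedOut?: boolean;
}

interface RunOutcome {
  attempts: number;
  result: AttemptResult;
  retriesExhausted: boolean;
}

// Resubmits the same request while the attempt ends in a silent error,
// up to MAX_EMPTY_ERROR_RETRIES extra attempts.
function runWithEmptyErrorRetry(runAttempt: () => AttemptResult): RunOutcome {
  let emptyErrorRetries = 0;
  let attempts = 0;
  while (true) {
    const result = runAttempt();
    attempts++;
    const silentError =
      !result.aborted &&
      !result.timedOut &&
      result.stopReason === "error" &&
      result.outputTokens === 0 &&
      result.contentBlocks === 0;
    if (silentError && emptyErrorRetries < MAX_EMPTY_ERROR_RETRIES) {
      emptyErrorRetries++;
      continue; // same request, no changed prompt
    }
    // Either the turn is usable or the retry budget is spent.
    return { attempts, result, retriesExhausted: silentError };
  }
}

// Test helper: returns the scripted results in order, then repeats the last one.
function scriptedAttempts(first: AttemptResult, ...rest: AttemptResult[]): () => AttemptResult {
  const queue = [first, ...rest];
  let last = first;
  return () => {
    last = queue.shift() ?? last;
    return last;
  };
}

const silent: AttemptResult = { stopReason: "error", outputTokens: 0, contentBlocks: 0 };
const ok: AttemptResult = { stopReason: "stop", outputTokens: 120, contentBlocks: 1 };

const recovered = runWithEmptyErrorRetry(scriptedAttempts(silent, ok));
const exhausted = runWithEmptyErrorRetry(scriptedAttempts(silent));
console.log(recovered.attempts, exhausted.attempts); // 2 4
```

Keeping the counter outside the loop body is what bounds the total at four attempts. A provider that keeps failing still reaches the existing incomplete-turn error.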

New coverage

Five Vitest cases via loadRunOverflowCompactionHarness():

Retries on ollama/glm-5.1:cloud → succeeds on 2nd attempt.
Caps at 3 retries → 4 total attempts → surfaces incomplete-turn error.
Does NOT retry when output > 0 (model produced text before erroring).
Does NOT retry when stopReason=stop + output=0 (NO_REPLY path; legit silent reply).
Retries for anthropic/claude-opus-4-7 too — confirms model-agnostic.

Full suite (591/591 in pi-embedded-runner) green. Oxfmt/oxlint clean.

A draft PR is open separately with the three-file change (run.ts + a system-prompt steer for the smaller intent-announce cluster + the test file).

OpenClaw version

2026.4.14 (reproduced); also reviewed against current main.

Operating system

macOS

Install method

npm (global)

Model

ollama / glm-5.1:cloud (primary); also observed on anthropic/claude-opus-4-7 (rare)

Provider / routing chain

local

extent analysis

TL;DR

Add a retry mechanism for silent-error turns with a bounded number of retries to handle intermittent provider-side failures.

Guidance

Identify turns with stopReason="error", usage.output=0, and empty content[] to determine when a retry is needed.
Implement a retry counter with a maximum number of retries (e.g., 3) to prevent infinite loops.
Modify the src/agents/pi-embedded-runner/run.ts file to include a retry block before the if (incompleteTurnText) return statement.
Ensure the retry mechanism is model-agnostic and does not rely on specific provider or model configurations.

Example

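const MAX_EMPTY_ERROR_RETRIES = 3;
let retryCounter = 0; // declared outside the attempt loop so it persists across attempts

// ... inside the attempt loop, before the if (incompleteTurnText) return:

if (
  incompleteTurnText &&
  !aborted && !promptError && !timedOut && // skip turns that ended for other reasons
  sessionLastAssistant.stopReason === "error" &&
  (usage.output ?? 0) === 0 && // treat a missing output count as zero
  (content.length ?? 0) === 0 && // empty content excludes reasoning-only error turns
  retryCounter < MAX_EMPTY_ERROR_RETRIES
) {
  retryCounter++;
  continue; // resubmit the same request
}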

Notes

The proposed fix assumes that the intermittent provider-side failures are transient and can be recovered from by retrying. If the failures are persistent or caused by an underlying issue, the retries will be exhausted and further debugging will be necessary.

Recommendation

Apply the proposed workaround by adding a retry mechanism to handle silent-error turns, as it provides a bounded number of retries to recover from intermittent provider-side failures.
