openclaw - ✅(Solved) Fix Codex app-server watchdog can timeout after status-only assistant item before next tool call [3 pull requests, 1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#77984Fetched 2026-05-06 06:18:15
View on GitHub
Comments
1
Participants
2
Timeline
6
Reactions
2
Timeline (top)
cross-referenced ×3referenced ×2commented ×1

OpenClaw's Codex app-server turn-completion watchdog can fire when the assistant emits a status/progress sentence as a rawResponseItem/completed notification, then stays internally silent for more than 60s before the next tool call or turn/completed.

This showed up in daily isolated cron runs that had already completed research tool calls. The last visible assistant output was a transitional sentence like "I'm writing the report...", but no write tool call or terminal event followed within the hardcoded 60s window, so OpenClaw aborted the attempt.

Root Cause

A common assistant behavior is to emit a short status sentence before starting a file write or final synthesis phase. With the Codex app-server watchdog, that sentence can become the last notification and start a 60s idle window even though the model may still be doing useful internal work or preparing the next tool call.

The current workaround is prompt-level mitigation: instruct agents not to announce the write phase and to make the next action a tool call. That works in our probe, but it is brittle and has to be applied to each affected scheduled prompt.

Fix Action

Fix / Workaround

The current workaround is prompt-level mitigation: instruct agents not to announce the write phase and to make the next action a tool call. That works in our probe, but it is brittle and has to be applied to each affected scheduled prompt.

PR fix notes

PR #77989: fix(codex): don't reset completion-idle watch on rawResponseItem/completed (#77984)

Description (problem / solution / changelog)

Fixes #77984.

Root cause

handleNotification calls touchTurnCompletionActivity for every incoming notification, including rawResponseItem/completed. This reschedules the 60-second turnCompletionIdleWatch timer. When the model emits a status sentence ("I'm writing the report now…") as a rawResponseItem/completed and then stays internally silent for >60s before issuing the next tool call, the timer fires and aborts the run — a false positive.

rawResponseItem/completed is mid-turn assistant text progress, not a post-tool-call idle signal. The 60s timer exists to detect when the model has stopped responding after a tool call completes. Resetting it on a text notification defeats that purpose.

Fix

Add skipCompletionWatch option to touchTurnCompletionActivity. In handleNotification, pass skipCompletionWatch: true for rawResponseItem/completed so the activity timestamp and lastActivityReason are updated (preserving diagnostics) but the 60s completion-idle timer is not rescheduled.

The 30-minute turnTerminalIdleWatch still runs unconditionally for genuine hangs.

+    const isRawResponseProgress = notification.method === "rawResponseItem/completed";
+    touchTurnCompletionActivity(`notification:${notification.method}`, {
+      skipCompletionWatch: isRawResponseProgress,
+    });

Notes

  • touchTurnCompletionActivity is a closure-private function in run-attempt.ts; no external callers.
  • The fix does not change the 30-min terminal-idle watch behavior.
  • The two affected cron sessions from the issue (f415c132, 3c0be461) had lastActivityReason=notification:rawResponseItem/completed at timeout — consistent with this fix.

Changed files

  • extensions/codex/src/app-server/run-attempt.ts (modified, +16/-3)
  • extensions/ollama/provider-discovery.test.ts (modified, +22/-0)
  • extensions/ollama/src/discovery-shared.ts (modified, +7/-1)
  • src/agents/skills/plugin-skills.ts (modified, +1/-1)

PR #78006: fix(daemon): fall back to Startup-folder on localized schtasks access-denied (#77993)

Description (problem / solution / changelog)

Fixes #77993.

Root cause

shouldFallbackToStartupEntry matched only the English string "Access is denied". On Windows hosts with a non-English locale, schtasks /Create emits the localized equivalent — for example Spanish "Acceso denegado" — which did not match the regex. The fallback to the per-user Startup-folder launcher was skipped and the install threw schtasks create failed: ERROR: Acceso denegado.

Fix

Extend the denial-string check to cover the most common locale variants:

LocaleString
Englishaccess is denied (existing)
Spanishacceso denegado
Germanzugriff verweigert
Frenchaccès refusé
Italianaccesso negato

All checks are case-insensitive.

Test

New test in schtasks.startup-fallback.test.ts drives installGatewayScheduledTask with stderr: "ERROR: Acceso denegado." and confirms the Startup-folder launcher is created (matching the English coverage added in the existing test).

All 14 startup-fallback tests pass.

Changed files

  • extensions/codex/src/app-server/run-attempt.ts (modified, +16/-3)
  • extensions/ollama/provider-discovery.test.ts (modified, +22/-0)
  • extensions/ollama/src/discovery-shared.ts (modified, +7/-1)
  • src/agents/skills/plugin-skills.ts (modified, +1/-1)
  • src/daemon/schtasks.startup-fallback.test.ts (modified, +13/-0)
  • src/daemon/schtasks.ts (modified, +4/-0)

PR #78014: Expose Codex turn completion idle timeout config

Description (problem / solution / changelog)

Summary

  • expose plugins.entries.codex.config.appServer.turnCompletionIdleTimeoutMs
  • keep default behavior at 60000ms
  • keep this separate from appServer.requestTimeoutMs
  • wire config/schema/manifest/docs/runtime and add regressions

Fixes #77984.

Verification

  • Local workflow batch verification passed before PR branch extraction.
  • Targeted regressions live in Codex app-server config/run-attempt tests.

Changed files

  • docs/plugins/codex-harness.md (modified, +24/-20)
  • extensions/codex/openclaw.plugin.json (modified, +10/-0)
  • extensions/codex/src/app-server/config.test.ts (modified, +15/-0)
  • extensions/codex/src/app-server/config.ts (modified, +8/-0)
  • extensions/codex/src/app-server/run-attempt.test.ts (modified, +74/-0)
  • extensions/codex/src/app-server/run-attempt.ts (modified, +1/-1)
  • extensions/codex/src/app-server/schema-normalization-runtime-contract.test.ts (modified, +1/-0)

Code Example

{
  "plugins": {
    "entries": {
      "codex": {
        "config": {
          "appServer": {
            "turnCompletionIdleTimeoutMs": 120000
          }
        }
      }
    }
  }
}
RAW_BUFFERClick to expand / collapse

Summary

OpenClaw's Codex app-server turn-completion watchdog can fire when the assistant emits a status/progress sentence as a rawResponseItem/completed notification, then stays internally silent for more than 60s before the next tool call or turn/completed.

This showed up in daily isolated cron runs that had already completed research tool calls. The last visible assistant output was a transitional sentence like "I'm writing the report...", but no write tool call or terminal event followed within the hardcoded 60s window, so OpenClaw aborted the attempt.

Environment

  • OpenClaw: 2026.5.4 (325df3e)
  • Codex plugin package: @openclaw/[email protected]
  • Provider/model: openai-codex/gpt-5.5
  • Runtime: Codex app-server harness (agentRuntime.id=codex, agentHarnessId=codex)
  • Workload: isolated scheduled Researcher report crons with web research + markdown write/copy/verify phase

Observed failing sessions

Two independent cron sessions failed with the same shape:

  1. f415c132-e0b2-434b-884d-1121468d394e

    • Cron: Daily Researcher OpenClaw Ecosystem Deep Dive
    • Timeout event: turn.completion_idle_timeout
    • Timeout data included idleMs=60000, timeoutMs=60000, lastActivityReason=notification:rawResponseItem/completed
    • Last visible assistant text was a transitional status sentence: "I'm writing the report file now, then I'll copy it..."
  2. 3c0be461-1f3b-4dd1-b236-31b2f2c6e126

    • Cron: Daily Researcher AI Industry Briefing
    • Timeout event: turn.completion_idle_timeout
    • Timeout data included idleMs=60000, timeoutMs=60000, lastActivityReason=notification:rawResponseItem/completed
    • Last visible assistant text was a transitional status sentence: "I'm writing the report and then I'll copy and verify all three destinations."

In both cases prompt-token usage was around 54-55% of a 200k context, below the timeout-compaction threshold we found locally, so this did not appear to be the high-token compaction recovery path.

Local code evidence

In the externalized Codex extension bundle:

  • /root/.openclaw/extensions/codex/dist/run-attempt-BNbVe-IG.js
  • CODEX_TURN_COMPLETION_IDLE_TIMEOUT_MS = 6e4
  • The watchdog timeout is resolved via resolveCodexTurnCompletionIdleTimeoutMs(...).
  • Activity is touched for notifications via touchTurnCompletionActivity("notification:<method>").
  • The last activity reason in both failures was notification:rawResponseItem/completed.

The config schema appears to expose plugins.entries.codex.config.appServer.requestTimeoutMs, but not a field for the turn-completion idle timeout. In local dist/config-CkkoMeqF.js, the appServer config object includes fields such as mode, transport, command, args, url, authToken, headers, clearEnv, requestTimeoutMs, approvalPolicy, sandbox, approvalsReviewer, serviceTier, and defaultWorkspaceDir, but no turnCompletionIdleTimeoutMs knob.

Control probe

We ran a controlled one-off probe with the same general workload shape: web fetches, then a markdown report write/verify phase. The prompt explicitly forbade transitional assistant text before the write phase:

After the research calls finish, your next action must be a bash or file-write tool call that writes the markdown report. Do not announce that you are writing.

Result:

  • Session: codex-watchdog-probe-b-20260505T171203Z
  • Evidence dir: /root/.openclaw/workspace-tender/tmp/codex-watchdog-probe-b-20260505T171203Z
  • Runtime confirmed: openai-codex/gpt-5.5, agentRuntime.id=codex, agentHarnessId=codex
  • Duration: 73049ms, longer than the 60s watchdog window
  • Tool calls: 13 (web_fetch + bash), failures: 0
  • Output report: 8313 bytes
  • No turn idle timed out warning observed

This suggests the problematic pattern is not long tool execution itself. Long tool phases emit enough notifications. The risky shape is status-only assistant text followed by internal silence before the next tool call or terminal turn event.

Why this matters

A common assistant behavior is to emit a short status sentence before starting a file write or final synthesis phase. With the Codex app-server watchdog, that sentence can become the last notification and start a 60s idle window even though the model may still be doing useful internal work or preparing the next tool call.

The current workaround is prompt-level mitigation: instruct agents not to announce the write phase and to make the next action a tool call. That works in our probe, but it is brittle and has to be applied to each affected scheduled prompt.

Suggested directions

Possible fixes, either of which would help:

  1. Expose the watchdog as config, for example:
{
  "plugins": {
    "entries": {
      "codex": {
        "config": {
          "appServer": {
            "turnCompletionIdleTimeoutMs": 120000
          }
        }
      }
    }
  }
}
  1. Revisit how rawResponseItem/completed is treated by the turn-completion watchdog. A completed assistant status item may not mean the turn is now idle in the same way as a lack of app-server notifications; it can be an intermediate model item before the next tool call.

  2. Emit a more specific diagnostic when this timeout fires, including the last assistant text/item type and whether the previous event was status-only assistant content versus an active tool call.

Notes

This is related in spirit to prior Codex app-server terminal-notification hardening, but the observed failure here is more specific: terminal events do not merely disappear after turn/start; the app-server emits a non-terminal assistant item (rawResponseItem/completed) and then the watchdog times out waiting for subsequent progress.

extent analysis

TL;DR

Increase the turnCompletionIdleTimeoutMs value or modify the watchdog to handle rawResponseItem/completed notifications differently to prevent premature timeouts.

Guidance

  • Consider increasing the turnCompletionIdleTimeoutMs value to a higher number, such as 120000, to give the model more time to complete its internal work before timing out.
  • Review how rawResponseItem/completed notifications are handled by the turn-completion watchdog and consider treating them as intermediate model items rather than idle events.
  • Emit more specific diagnostic information when the timeout fires, including the last assistant text and item type, to help identify the cause of the issue.

Example

{
  "plugins": {
    "entries": {
      "codex": {
        "config": {
          "appServer": {
            "turnCompletionIdleTimeoutMs": 120000
          }
        }
      }
    }
  }
}

Notes

The suggested solutions assume that the issue is caused by the watchdog timing out too quickly after receiving a rawResponseItem/completed notification. However, the root cause may be more complex and require further investigation.

Recommendation

Apply workaround by increasing the turnCompletionIdleTimeoutMs value, as this is a simpler and more straightforward solution that can be implemented immediately.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - ✅(Solved) Fix Codex app-server watchdog can timeout after status-only assistant item before next tool call [3 pull requests, 1 comments, 2 participants]