openclaw - ✅(Solved) Fix GPT-5.4 / Codex agentic runtime parity in OpenClaw [5 pull requests, 11 comments, 2 participants]

GitHub stats
openclaw/openclaw#64227 · Fetched 2026-04-11 06:15:54
Comments: 11
Participants: 2
Timeline: 27
Reactions: 0
Timeline (top): cross-referenced ×12, commented ×11, mentioned ×2, subscribed ×2

Fix Action

Fixed

PR fix notes

PR #64241: agents: add strict-agentic execution contract and revise update_plan semantics

Description (problem / solution / changelog)

Summary

This is PR 1 of the GPT-5.4 / Codex agentic runtime parity program tracked in #64227 and scoped by #64228.

It adds an opt-in strict-agentic execution contract for embedded Pi agents and revises update_plan so it behaves like structured progress state instead of user-visible filler. The runtime now stops treating plan-only turns as acceptable completion in strict-agentic mode and fails closed with an explicit blocked response after the retry cap.

PR 1 is intentionally GPT-5-first: in this slice, strict-agentic only activates for embedded Pi openai and openai-codex GPT-5-family runs. Unsupported providers/models keep default behavior unless tools.experimental.planTool is explicitly enabled.
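The activation rule above, together with the update_plan enablement rules listed in this PR, can be sketched as a pair of small predicates. This is a minimal illustration, not the code in the PR: the `RunContext` shape, the `"default"` contract value, and the `gpt-5` prefix check are assumptions made for the example. Only the config key names and the `strict-agentic` value come from the description.

```typescript
// Hypothetical shapes for illustration; only the config key names in the
// comments are taken from the PR description.
type ExecutionContract = "default" | "strict-agentic"; // "default" is an assumed name

interface RunContext {
  provider: string; // e.g. "openai" or "openai-codex"
  modelId: string; // e.g. "gpt-5.4"
  executionContract?: ExecutionContract; // agents.defaults.embeddedPi.executionContract
  planTool?: boolean; // tools.experimental.planTool (undefined = not configured)
}

const STRICT_AGENTIC_PROVIDERS = new Set(["openai", "openai-codex"]);

// Assumption: GPT-5-family models are recognized by a "gpt-5" id prefix.
function isGpt5Family(modelId: string): boolean {
  return modelId.toLowerCase().startsWith("gpt-5");
}

// strict-agentic is opt-in and only activates on supported provider/model pairs.
function isStrictAgenticActive(ctx: RunContext): boolean {
  return (
    ctx.executionContract === "strict-agentic" &&
    STRICT_AGENTIC_PROVIDERS.has(ctx.provider) &&
    isGpt5Family(ctx.modelId)
  );
}

function isUpdatePlanEnabled(ctx: RunContext): boolean {
  // An explicit setting always wins, including planTool: false.
  if (ctx.planTool !== undefined) return ctx.planTool;
  // Otherwise only supported strict-agentic runs auto-enable the tool.
  return isStrictAgenticActive(ctx);
}
```

Checking the explicit `planTool` value first is what lets `planTool: false` override a configured strict-agentic contract, as the change list requires.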

What changed

  • add agents.defaults.embeddedPi.executionContract plus per-agent override support
  • resolve strict-agentic behavior through explicit-agent-aware lookup so no-session-key / hook / cron-style flows use the right agent config
  • remove default OpenAI/Codex auto-enable for update_plan
  • auto-enable update_plan only for explicit tools.experimental.planTool or supported strict-agentic GPT-5 runs
  • make update_plan non-chatty and tolerant of extra step fields
  • honor planTool: false even when strict-agentic is configured
  • treat update_plan as non-progress in plan-only retry logic
  • detect both prose plans and structured bullet plans before retrying or failing closed
  • give supported strict-agentic runs two plan-only retries before returning an explicit blocked state
  • keep the slice embedded-Pi-only rather than broadening into a provider-agnostic execution-contract framework
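The plan-only retry behavior in the list above can be sketched as a bounded loop. This is a rough illustration under stated assumptions: `Turn`, `RunOutcome`, `isPlanOnlyTurn`, and `runStrictAgenticSketch` are hypothetical names, and the regular expressions are a crude stand-in for whatever prose and bullet plan detection the PR actually implements. The retry cap of two and the rule that update_plan does not count as progress are taken from the description.

```typescript
interface Turn {
  toolCalls: string[]; // names of tools called during the turn
  text: string; // assistant text produced in the turn
}

type RunOutcome =
  | { status: "completed"; text: string; attempts: number }
  | { status: "blocked"; reason: string; attempts: number };

const MAX_PLAN_ONLY_RETRIES = 2; // from the PR: two plan-only retries

// Assumption: a turn is plan-only when it makes no tool call other than
// update_plan and its text looks like a plan (prose or bullets).
function isPlanOnlyTurn(turn: Turn): boolean {
  const realToolCalls = turn.toolCalls.filter((name) => name !== "update_plan");
  if (realToolCalls.length > 0) return false;
  const prosePlan = /\b(i will|i'll|my plan is)\b/i.test(turn.text);
  const bulletPlan = /^\s*(?:[-*•]|\d+[.)])\s+/m.test(turn.text);
  return prosePlan || bulletPlan;
}

// Attempt 0 is the initial turn; attempts 1..MAX_PLAN_ONLY_RETRIES are retries.
function runStrictAgenticSketch(nextTurn: (attempt: number) => Turn): RunOutcome {
  for (let attempt = 0; attempt <= MAX_PLAN_ONLY_RETRIES; attempt++) {
    const turn = nextTurn(attempt);
    if (!isPlanOnlyTurn(turn)) {
      return { status: "completed", text: turn.text, attempts: attempt + 1 };
    }
  }
  // Fail closed: report an explicit blocked state instead of accepting a plan.
  return {
    status: "blocked",
    reason: "model kept planning without acting",
    attempts: MAX_PLAN_ONLY_RETRIES + 1,
  };
}
```

The loop is synchronous only to keep the sketch short; a real runner would await each model turn.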

Why

GPT-5.4 / Codex currently stalls too easily after planning or recap-style turns. This slice lets agents opt in to a fix at the runtime-contract level without changing the default execution mode for every agent.

Non-goals

  • does not supersede #62989
  • does not implement #38780
  • does not address harness/plugin scope from #63452
  • embedded-Pi-only slice

Builds on prior groundwork

  • #38736
  • #37558
  • #55535
  • #57166
  • #58299

Validation

Focused checks run:

  • CI=1 pnpm vitest run src/agents/openclaw-tools.update-plan.test.ts
  • CI=1 pnpm vitest run src/agents/tools/update-plan-tool.test.ts
  • CI=1 pnpm vitest run src/agents/pi-embedded-runner/run.incomplete-turn.test.ts
  • CI=1 pnpm vitest run src/config/zod-schema.agent-defaults.test.ts
  • CI=1 pnpm vitest run src/agents/system-prompt.test.ts
  • CI=1 pnpm vitest run src/agents/tool-catalog.test.ts
  • CI=1 pnpm vitest run src/agents/openclaw-tools.sessions.test.ts
  • CI=1 pnpm vitest run src/agents/openclaw-tools.nodes-workspace-guard.test.ts
  • CI=1 pnpm vitest run src/agents/pi-embedded-runner/system-prompt.test.ts
  • CI=1 pnpm vitest run extensions/openai/transport-policy.test.ts
  • CI=1 pnpm vitest run src/plugin-sdk/provider-tools.test.ts

Linked issues

  • Closes #64228
  • Refs #64227
  • Refs #62854
  • Refs #47213
  • Refs #62989

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/.generated/config-baseline.sha256 (modified, +3/-3)
  • docs/.generated/plugin-sdk-api-baseline.sha256 (modified, +2/-2)
  • docs/gateway/configuration-reference.md (modified, +2/-2)
  • src/agents/agent-scope.ts (modified, +37/-0)
  • src/agents/openclaw-tools.registration.ts (modified, +16/-12)
  • src/agents/openclaw-tools.ts (modified, +12/-3)
  • src/agents/openclaw-tools.update-plan.test.ts (modified, +165/-11)
  • src/agents/pi-embedded-runner/run.incomplete-turn.test.ts (modified, +118/-0)
  • src/agents/pi-embedded-runner/run.ts (modified, +54/-3)
  • src/agents/pi-embedded-runner/run/incomplete-turn.ts (modified, +39/-3)
  • src/agents/pi-tools.ts (modified, +1/-0)
  • src/agents/tools/update-plan-tool.test.ts (modified, +26/-1)
  • src/agents/tools/update-plan-tool.ts (modified, +12/-7)
  • src/config/schema.base.generated.ts (modified, +56/-2)
  • src/config/schema.help.ts (modified, +7/-1)
  • src/config/schema.labels.ts (modified, +3/-0)
  • src/config/types.agent-defaults.ts (modified, +7/-0)
  • src/config/types.agents.ts (modified, +6/-1)
  • src/config/types.tools.ts (modified, +2/-2)
  • src/config/zod-schema.agent-defaults.test.ts (modified, +9/-0)
  • src/config/zod-schema.agent-defaults.ts (modified, +1/-0)
  • src/config/zod-schema.agent-runtime.ts (modified, +6/-0)

PR #64286: openai-codex: fix auth scope handling and classify provider/runtime failures

Description (problem / solution / changelog)

Summary

This is PR 2 of the GPT-5.4 / Codex agentic runtime parity program tracked in #64227 and scoped by #64229.

It fixes the maintained-source OpenAI Codex OAuth scope gap in OpenClaw's login wrapper and adds a separate provider/runtime failure taxonomy that makes auth-scope, refresh, HTML 403, proxy, DNS, timeout, schema, sandbox-blocked, and replay-invalid failures observable in logs and easier to explain to users.

What changed

  • normalize OpenAI Codex authorize URLs so the required scopes are always present:
    • openid
    • profile
    • email
    • offline_access
    • model.request
    • api.responses.write
  • add classifyProviderRuntimeFailureKind(...) as a typed provider/runtime failure classifier
  • keep the older failover-reason contract intact instead of widening it in this slice
  • thread providerRuntimeFailureKind through embedded-run observation fields and lifecycle logging
  • surface more truthful user-facing copy for:
    • OAuth refresh failures
    • missing OpenAI Codex scopes
    • HTML 403 auth failures
    • proxy/tunnel misroutes
    • replay-invalid failures
  • add focused regressions for scope failures, refresh failures, HTML 403, proxy, DNS, timeout, schema, sandbox-blocked, and replay-invalid paths
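The scope normalization in the first bullet can be illustrated with the standard `URL` API. This is a sketch, not the wrapper's actual code: the function name and the example host are hypothetical, and it assumes scopes travel in a space-delimited `scope` query parameter, as is conventional for OAuth authorize URLs. The six scope names are the ones listed above.

```typescript
// Scope names as listed in the PR description.
const REQUIRED_CODEX_SCOPES = [
  "openid",
  "profile",
  "email",
  "offline_access",
  "model.request",
  "api.responses.write",
];

// Hypothetical helper: ensure every required scope is present on the URL,
// keeping any scopes that were already there and their original order.
function normalizeAuthorizeUrlScopes(authorizeUrl: string): string {
  const url = new URL(authorizeUrl);
  const existing = (url.searchParams.get("scope") ?? "")
    .split(/\s+/)
    .filter(Boolean);
  const merged = [...existing];
  for (const scope of REQUIRED_CODEX_SCOPES) {
    if (!merged.includes(scope)) merged.push(scope);
  }
  url.searchParams.set("scope", merged.join(" "));
  return url.toString();
}

// Example with a placeholder host (not the real authorize endpoint).
const normalized = normalizeAuthorizeUrlScopes(
  "https://auth.example.com/oauth/authorize?client_id=abc&scope=openid+profile",
);
const scopesAfter = new URL(normalized).searchParams.get("scope");
```

Because existing scopes are kept and missing ones are appended, running the helper twice yields the same URL, which makes it safe to apply on every login attempt.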

Why

GPT-5.4 / Codex failures in OpenClaw are still too easy to misdiagnose as generic model stops. This slice makes the auth/runtime layer tell the truth before we move on to tool-contract and parity-harness work.

Non-goals

  • does not implement tool compatibility work from #64230
  • does not implement permission truthfulness work from #64231
  • does not implement replay/liveness hardening from #64232
  • does not implement the benchmark harness from #64233
  • does not widen the generic failover-reason enum for every caller in this slice

Builds on prior groundwork

  • #45176
  • #48592
  • #53702
  • #55206
  • #44019

Validation

Focused checks run:

  • CI=1 pnpm exec vitest run src/commands/openai-codex-oauth.test.ts src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts src/agents/failover-error.test.ts src/agents/pi-embedded-error-observation.test.ts src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts
  • repo hook gate during commit:
    • pnpm check:no-conflict-markers
    • pnpm tool-display:check
    • pnpm check:host-env-policy:swift
    • pnpm tsgo
    • node scripts/prepare-extension-package-boundary-artifacts.mjs
    • pnpm lint
    • pnpm lint:webhook:no-low-level-body-read
    • pnpm lint:auth:no-pairing-store-group
    • pnpm lint:auth:pairing-account-scope

Linked issues

  • Closes #64229
  • Refs #64227
  • Refs #64133
  • Refs #64174
  • Refs #64092
  • Refs #57399
  • Refs #62672

Changed files

  • src/agents/failover-error.test.ts (modified, +10/-0)
  • src/agents/pi-embedded-error-observation.test.ts (modified, +14/-0)
  • src/agents/pi-embedded-error-observation.ts (modified, +23/-4)
  • src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts (modified, +67/-0)
  • src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (modified, +79/-0)
  • src/agents/pi-embedded-helpers.ts (modified, +2/-0)
  • src/agents/pi-embedded-helpers/errors.ts (modified, +219/-4)
  • src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts (modified, +22/-0)
  • src/agents/pi-embedded-subscribe.handlers.lifecycle.ts (modified, +16/-3)
  • src/commands/openai-codex-oauth.test.ts (modified, +28/-3)
  • src/plugins/provider-openai-codex-oauth.ts (modified, +40/-1)

PR #64300: agents: add OpenAI/Codex tool compatibility and replay/liveness state

Description (problem / solution / changelog)

Summary

  • keep the provider-owned OpenAI/Codex tool-compat layer via the existing provider hook surface
  • add replay/liveness state surfacing so long-running embedded runs stop disappearing silently
  • compact the original Contracts 2 and 5 into one execution-correctness PR in the GPT-5.4 / Codex parity program tracked by #64227
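One way to picture the replay/liveness surfacing is a small result type that every attempt reports, merged so that a retry cannot erase an earlier verdict. The four state names are the ones this PR's change list uses; the `AttemptResult` shape and the `mergeAttempts` helper are hypothetical and only illustrate the idea of keeping replay-invalid "sticky" across retries.

```typescript
// State names taken from the PR's change list.
type LivenessState = "working" | "paused" | "blocked" | "abandoned";

// Hypothetical per-attempt result shape.
interface AttemptResult {
  liveness: LivenessState;
  // true once a mutating tool side effect means the turn cannot be replayed safely
  replayInvalid: boolean;
}

// Merge attempts in order: the latest liveness state is reported, but
// replayInvalid stays true if any earlier attempt set it.
function mergeAttempts(attempts: AttemptResult[]): AttemptResult {
  const last = attempts[attempts.length - 1];
  if (last === undefined) {
    throw new Error("mergeAttempts requires at least one attempt");
  }
  let replayInvalid = false;
  for (const attempt of attempts) {
    if (attempt.replayInvalid) replayInvalid = true;
  }
  return { liveness: last.liveness, replayInvalid };
}
```

In this sketch a compaction retry that ends in `working` still reports `replayInvalid: true` if an earlier attempt had already mutated state.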

Scope

  • Refs #64230
  • Refs #64232
  • Refs #64227
  • combines provider-owned tool compatibility with replay/liveness hardening
  • no auth / permission truthfulness changes in this PR
  • no self-elected continuation scope from #38780
  • no benchmark harness work from #64233

What changed

  • add an openai tool-compat family to buildProviderToolCompatFamilyHooks(...)
  • gate the family to native OpenAI/OpenAI Codex response routes only
  • normalize provider-owned parameter-free and missing-object-shape tool schemas for strict OpenAI/Codex routes
  • surface provider-owned diagnostics for remaining strict-schema incompatibilities
  • attach the compat hooks in extensions/openai/index.ts so OpenAI and OpenAI Codex providers both expose them
  • add replay/liveness state to embedded run results and lifecycle surfaces
  • classify replay/liveness outcomes as observable working, paused, blocked, or abandoned states instead of silent disappearance
  • preserve replay-invalid truth across compaction retries after mutating tool side effects
  • add focused regressions for replay/liveness surfacing alongside the existing tool-compat coverage
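The schema normalization described above might look roughly like the following. This is a sketch under assumptions: `ToolDefinition`, `NormalizedTool`, and `normalizeToolParametersSketch` are hypothetical names, and the exact shape strict OpenAI/Codex routes require is assumed here to be a top-level object schema with a `properties` map. The real hook lives behind `buildProviderToolCompatFamilyHooks(...)`, whose signature is not shown in the PR description.

```typescript
type JsonSchema = Record<string, unknown>;

// Hypothetical tool shape for illustration.
interface ToolDefinition {
  name: string;
  parameters?: JsonSchema;
}

interface NormalizedTool {
  tool: ToolDefinition;
  diagnostics: string[]; // incompatibilities that could not be fixed automatically
}

function normalizeToolParametersSketch(tool: ToolDefinition): NormalizedTool {
  const params = tool.parameters;

  // Parameter-free tool: supply an explicit empty object schema.
  if (params === undefined || Object.keys(params).length === 0) {
    return {
      tool: { ...tool, parameters: { type: "object", properties: {} } },
      diagnostics: [],
    };
  }

  // Missing object shape: fill in type and/or properties, keep other keys.
  if (params.type === undefined || params.type === "object") {
    return {
      tool: {
        ...tool,
        parameters: {
          ...params,
          type: "object",
          properties: params.properties ?? {},
        },
      },
      diagnostics: [],
    };
  }

  // Anything else is left untouched and reported instead of silently rewritten.
  return {
    tool,
    diagnostics: [
      `${tool.name}: top-level schema type "${String(params.type)}" is not an object`,
    ],
  };
}
```

Returning diagnostics alongside the tool mirrors the PR's split between schemas that are normalized and incompatibilities that are only surfaced.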

Validation

  • pnpm build
  • CI=1 pnpm exec vitest run src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts src/agents/pi-embedded-subscribe.handlers.compaction.test.ts src/agents/pi-embedded-subscribe.handlers.tools.test.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test.ts src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts

Non-goals

  • does not supersede #64229 or #64231
  • does not add tool-name or argument aliases
  • does not change generic runner behavior outside provider-owned hooks and replay/liveness surfacing

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • extensions/openai/index.test.ts (modified, +78/-0)
  • extensions/openai/index.ts (modified, +3/-0)
  • src/agents/pi-embedded-runner/run.incomplete-turn.test.ts (modified, +43/-0)
  • src/agents/pi-embedded-runner/run.overflow-compaction.test.ts (modified, +23/-0)
  • src/agents/pi-embedded-runner/run.timeout-triggered-compaction.test.ts (modified, +1/-0)
  • src/agents/pi-embedded-runner/run.ts (modified, +80/-0)
  • src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test-support.ts (modified, +6/-0)
  • src/agents/pi-embedded-runner/run/attempt.ts (modified, +18/-5)
  • src/agents/pi-embedded-runner/run/incomplete-turn.ts (modified, +45/-0)
  • src/agents/pi-embedded-runner/run/retry-limit.ts (modified, +5/-0)
  • src/agents/pi-embedded-runner/run/types.ts (modified, +7/-0)
  • src/agents/pi-embedded-runner/types.ts (modified, +4/-0)
  • src/agents/pi-embedded-subscribe.handlers.compaction.ts (modified, +4/-0)
  • src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts (modified, +67/-0)
  • src/agents/pi-embedded-subscribe.handlers.lifecycle.ts (modified, +27/-1)
  • src/agents/pi-embedded-subscribe.handlers.tools.test.ts (modified, +92/-0)
  • src/agents/pi-embedded-subscribe.handlers.tools.ts (modified, +5/-0)
  • src/agents/pi-embedded-subscribe.handlers.types.ts (modified, +6/-0)
  • src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts (modified, +38/-0)
  • src/agents/pi-embedded-subscribe.ts (modified, +21/-0)
  • src/agents/pi-embedded-subscribe.types.ts (modified, +2/-0)
  • src/auto-reply/reply/dispatch-from-config.ts (modified, +2/-2)
  • src/plugin-sdk/provider-tools.test.ts (modified, +244/-0)
  • src/plugin-sdk/provider-tools.ts (modified, +286/-1)
  • src/plugins/contracts/provider-family-plugin-tests.test.ts (modified, +1/-0)

PR #64332: agents: make elevated full truthful and emit explicit full-access hints

Description (problem / solution / changelog)

Summary

  • make /elevated full truth-surfacing explicit in embedded sandbox metadata
  • stop advertising auto-approved full access when it is unavailable for the current runtime
  • tell the agent not to suggest /elevated full when the session cannot provide it

Refs #64231 Part of #64227

What changed

  • extends embedded elevated metadata with fullAccessAvailable and fullAccessBlockedReason
  • adds resolveEmbeddedFullAccessState(...) so the truth state is computed once and reused
  • updates the embedded system prompt to advertise /elevated full only when auto-approved host exec is actually available
  • updates the current exec session hint to call out unavailable full access and steer the model toward ask / on
  • adds focused regression coverage for unavailable-full prompt and sandbox-info behavior
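The compute-once pattern behind `resolveEmbeddedFullAccessState(...)` can be sketched as follows. The two output field names, `fullAccessAvailable` and `fullAccessBlockedReason`, come from the PR; everything else is an assumption for illustration, including the input shape, the reason strings, the hint wording, and the `Sketch` function names. The real function's signature is not shown in the description.

```typescript
// Field names from the PR description.
interface EmbeddedFullAccessState {
  fullAccessAvailable: boolean;
  fullAccessBlockedReason?: string;
}

// Hypothetical inputs describing the current runtime.
interface ElevatedRuntimeInfo {
  elevatedAllowed: boolean; // elevated mode is permitted for this session at all
  hostExecAutoApproved: boolean; // runtime can auto-approve host exec
}

// Compute the truth state once so the prompt and the exec hint cannot disagree.
function resolveFullAccessStateSketch(info: ElevatedRuntimeInfo): EmbeddedFullAccessState {
  if (!info.elevatedAllowed) {
    return {
      fullAccessAvailable: false,
      fullAccessBlockedReason: "elevated mode is not allowed for this session",
    };
  }
  if (!info.hostExecAutoApproved) {
    return {
      fullAccessAvailable: false,
      fullAccessBlockedReason: "auto-approved host exec is unavailable in this runtime",
    };
  }
  return { fullAccessAvailable: true };
}

// Illustrative hint text only; the real prompt wording is not in the PR body.
function buildElevatedHintSketch(state: EmbeddedFullAccessState): string {
  if (state.fullAccessAvailable) {
    return "Elevated levels available: ask, on, full.";
  }
  const reason = state.fullAccessBlockedReason ?? "unavailable";
  return `Elevated levels available: ask, on. Do not suggest /elevated full (${reason}).`;
}
```

Both the system prompt and the exec session hint would read from the same resolved state, which is the point of computing it in one place.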

Non-goals

  • no new permission system
  • no exec enforcement rewrite in bash-tools.exec
  • no replay / continuation changes
  • no auth or tool-compat scope

Validation

  • direct targeted assertions on the new sandbox/prompt path using node --import tsx
  • added focused tests in:
    • src/agents/pi-embedded-runner.buildembeddedsandboxinfo.test.ts
    • src/agents/system-prompt.test.ts
    • src/auto-reply/reply/get-reply-run.exec-hint.test.ts

Notes

  • this stays intentionally narrow to the truth-surfacing slice for #64231
  • existing blocked/unavailable exec enforcement remains in the current runtime path; this PR makes the agent-facing contract match reality sooner

Changed files

  • src/agents/bash-tools.exec-types.ts (modified, +3/-0)
  • src/agents/pi-embedded-runner.buildembeddedsandboxinfo.test.ts (modified, +34/-1)
  • src/agents/pi-embedded-runner/sandbox-info.ts (modified, +40/-3)
  • src/agents/pi-embedded-runner/types.ts (modified, +4/-0)
  • src/agents/system-prompt.test.ts (modified, +30/-1)
  • src/agents/system-prompt.ts (modified, +46/-7)
  • src/auto-reply/reply/get-reply-run.exec-hint.test.ts (modified, +13/-0)
  • src/auto-reply/reply/get-reply-run.ts (modified, +30/-1)

PR #64439: openai-codex: classify runtime failures and make full access truthful

Description (problem / solution / changelog)

Summary

This is the compact runtime-truthfulness slice of the GPT-5.4 / Codex parity program tracked in #64227.

It combines the original Contract 1 auth/runtime truthfulness work from #64229 with the Contract 4 permission truthfulness work from #64231, so OpenClaw tells the truth about both provider/runtime failures and whether /elevated full is actually available.

Scope

  • Closes #64229
  • Closes #64231
  • Refs #64227
  • combines auth/runtime failure classification with truthful full-access surfacing
  • no tool-compat or replay/liveness scope in this PR
  • no benchmark harness scope in this PR

What changed

  • normalize OpenAI Codex authorize URLs so the required scopes are always present:
    • openid
    • profile
    • email
    • offline_access
    • model.request
    • api.responses.write
  • add typed provider/runtime failure classification for:
    • auth_scope
    • auth_refresh
    • auth_html_403
    • proxy
    • dns
    • timeout
    • schema
    • sandbox_blocked
    • replay_invalid
    • unknown
  • thread providerRuntimeFailureKind through embedded-run observation fields and lifecycle logging
  • surface more truthful user-facing copy for scope failures, refresh failures, HTML 403 auth failures, proxy/tunnel misroutes, and replay-invalid failures
  • extend embedded elevated metadata with fullAccessAvailable and fullAccessBlockedReason
  • advertise /elevated full only when auto-approved host exec is actually available for the current runtime
  • update current exec hints so unavailable full access is explained precisely instead of being suggested as if it were always possible
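A typed classifier over the failure kinds listed above could be sketched like this. Only the kind names come from the PR. The `FailureSignal` input shape, the matching heuristics, and the function name are assumptions made for the example, and the actual `classifyProviderRuntimeFailureKind(...)` may inspect entirely different signals. The error codes used (`ENOTFOUND`, `EAI_AGAIN`, `ETIMEDOUT`) are standard Node.js system error codes, and 407 is the HTTP Proxy Authentication Required status.

```typescript
// Kind names as listed in the PR description.
type ProviderRuntimeFailureKind =
  | "auth_scope"
  | "auth_refresh"
  | "auth_html_403"
  | "proxy"
  | "dns"
  | "timeout"
  | "schema"
  | "sandbox_blocked"
  | "replay_invalid"
  | "unknown";

// Hypothetical input shape for illustration.
interface FailureSignal {
  status?: number; // HTTP status, if any
  contentType?: string; // response Content-Type, if any
  code?: string; // Node-style error code, e.g. "ENOTFOUND"
  message: string;
}

// Heuristics below are illustrative assumptions, checked most specific first.
function classifyFailureSketch(signal: FailureSignal): ProviderRuntimeFailureKind {
  const message = signal.message.toLowerCase();
  const isHtml = (signal.contentType ?? "").includes("text/html");

  if (signal.status === 403 && isHtml) return "auth_html_403";
  if (message.includes("missing scope") || message.includes("insufficient scope")) return "auth_scope";
  if (message.includes("refresh token")) return "auth_refresh";
  if (signal.status === 407 || message.includes("proxy")) return "proxy";
  if (signal.code === "ENOTFOUND" || signal.code === "EAI_AGAIN") return "dns";
  if (signal.code === "ETIMEDOUT" || message.includes("timed out")) return "timeout";
  if (message.includes("invalid schema")) return "schema";
  if (message.includes("sandbox")) return "sandbox_blocked";
  if (message.includes("replay")) return "replay_invalid";
  return "unknown";
}
```

Returning a closed union rather than a free-form string lets lifecycle logging and user-facing copy switch exhaustively on the kind, with `unknown` as the explicit fallback.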

Validation

  • full repo check stack completed while landing the combined branch commits
  • pnpm exec vitest run src/commands/openai-codex-oauth.test.ts src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts src/agents/failover-error.test.ts src/agents/pi-embedded-error-observation.test.ts src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts src/agents/pi-embedded-runner.buildembeddedsandboxinfo.test.ts src/agents/system-prompt.test.ts src/auto-reply/reply/get-reply-run.exec-hint.test.ts

Non-goals

  • does not supersede #64230 or #64232
  • does not widen the generic failover-reason enum for every caller in this slice
  • does not introduce a new permission system
  • does not change exec enforcement in bash-tools.exec

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • extensions/qa-lab/src/live-transports/telegram/telegram-live.runtime.test.ts (modified, +6/-2)
  • src/agents/bash-tools.exec-types.ts (modified, +3/-0)
  • src/agents/pi-embedded-error-observation.test.ts (modified, +14/-0)
  • src/agents/pi-embedded-error-observation.ts (modified, +23/-4)
  • src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts (modified, +71/-0)
  • src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (modified, +84/-0)
  • src/agents/pi-embedded-helpers.ts (modified, +2/-0)
  • src/agents/pi-embedded-helpers/errors.ts (modified, +224/-4)
  • src/agents/pi-embedded-runner.buildembeddedsandboxinfo.test.ts (modified, +65/-1)
  • src/agents/pi-embedded-runner/sandbox-info.ts (modified, +33/-3)
  • src/agents/pi-embedded-runner/types.ts (modified, +4/-0)
  • src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts (modified, +22/-0)
  • src/agents/pi-embedded-subscribe.handlers.lifecycle.ts (modified, +6/-2)
  • src/agents/system-prompt.test.ts (modified, +30/-1)
  • src/agents/system-prompt.ts (modified, +46/-7)
  • src/auto-reply/reply.directive.directive-behavior.defaults-think-low-reasoning-capable-models-no.test.ts (modified, +2/-1)
  • src/auto-reply/reply/commands-system-prompt.test.ts (modified, +34/-0)
  • src/auto-reply/reply/commands-system-prompt.ts (modified, +12/-0)
  • src/auto-reply/reply/get-reply-run.exec-hint.test.ts (modified, +13/-0)
  • src/auto-reply/reply/get-reply-run.media-only.test.ts (modified, +8/-4)
  • src/auto-reply/reply/get-reply-run.ts (modified, +25/-1)
  • src/commands/openai-codex-oauth.test.ts (modified, +53/-3)
  • src/media/base64.ts (modified, +32/-3)
  • src/plugins/provider-openai-codex-oauth.test.ts (added, +24/-0)
  • src/plugins/provider-openai-codex-oauth.ts (modified, +47/-1)

OpenClaw GPT-5.4 / Codex Parity Program

This tracker is the source of truth for the GPT-5.4 / Codex parity program.

Execution model

The original product architecture is still the same 6-contract design:

  1. provider transport/auth correctness
  2. tool contract/schema compatibility
  3. same-turn execution correctness
  4. permission truthfulness
  5. replay/continuation/liveness correctness
  6. benchmark/release gate

Execution is compacted into 4 real upstream PRs for reviewability and active-PR-limit pressure. The 6 child issues remain the product map; the 4 PRs are the merge units.

Live merge units

  • PR A — strict-agentic execution: #64241
    • owns Contract 3
  • PR B — runtime truthfulness: #64439
    • owns Contracts 1 and 4
  • PR C — execution correctness: #64300
    • owns Contracts 2 and 5
  • PR D — parity harness / release gate: #64441
    • owns Contract 6

Mapping back to the original child issues

  • #64228 -> PR A (#64241)
  • #64229 + #64231 -> PR B (#64439)
  • #64230 + #64232 -> PR C (#64300)
  • #64233 -> PR D (#64441)
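The mapping between contracts, child issues, and merge units can be written down as data and checked mechanically. The sketch below transcribes the numbers from the lists above; the `MergeUnit` type and the helper name are hypothetical and exist only to show that every one of the six contracts has exactly one owning PR.

```typescript
interface MergeUnit {
  label: string;
  pr: number;
  contracts: number[]; // contract numbers from the 6-contract design
  childIssues: number[];
}

// Transcribed from the tracker's "Live merge units" and mapping lists.
const MERGE_UNITS: MergeUnit[] = [
  { label: "PR A", pr: 64241, contracts: [3], childIssues: [64228] },
  { label: "PR B", pr: 64439, contracts: [1, 4], childIssues: [64229, 64231] },
  { label: "PR C", pr: 64300, contracts: [2, 5], childIssues: [64230, 64232] },
  { label: "PR D", pr: 64441, contracts: [6], childIssues: [64233] },
];

const CONTRACT_COUNT = 6;

// Count how many merge units own each contract (index 0 is unused).
function countContractOwners(units: MergeUnit[]): number[] {
  const owners: number[] = new Array<number>(CONTRACT_COUNT + 1).fill(0);
  for (const unit of units) {
    for (const contract of unit.contracts) {
      owners[contract] = (owners[contract] ?? 0) + 1;
    }
  }
  return owners;
}

const ownerCounts = countContractOwners(MERGE_UNITS);
// Contracts whose owner count is not exactly one (missing or doubly owned).
const misownedContracts: number[] = [];
for (let contract = 1; contract <= CONTRACT_COUNT; contract++) {
  if (ownerCounts[contract] !== 1) misownedContracts.push(contract);
}
```

With the tracker's data, `misownedContracts` comes out empty: the four merge units partition the six contracts with no gaps and no overlap.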

Merge order

  1. PR A (#64241)
  2. PR B (#64439)
  3. PR C (#64300)
  4. PR D (#64441)

PR D is the proof layer. It should not block review of runtime-correctness slices.

Completion gate

The project is only complete when all of these are true:

  • PR A, PR B, and PR C are merged and stable
  • PR D is merged and produces a readable parity report plus a machine-readable verdict
  • GPT-5.4 no longer stalls after planning
  • GPT-5.4 no longer fakes progress or tool completion
  • GPT-5.4 no longer gives false /elevated full guidance
  • replay/liveness failures are surfaced as explicit states, not silent disappearance
  • the parity gate shows GPT-5.4 matches or beats Opus 4.6 on the agreed metrics

Extent analysis

TL;DR

To complete the GPT-5.4 / Codex parity program, ensure all four PRs (A, B, C, and D) are merged and stable, and verify that GPT-5.4 meets the specified completion criteria.

Guidance

  • Review and merge PRs in the specified order (A, B, C, and then D) to maintain dependency and correctness.
  • Verify that each PR is stable before moving to the next. In particular, PR A, PR B, and PR C must be merged and stable before PR D is finalized.
  • After merging all PRs, check that GPT-5.4 no longer exhibits issues like stalling, faking progress, or providing false guidance, and that replay/liveness failures are properly surfaced.
  • Use the parity gate to confirm that GPT-5.4 matches or beats Opus 4.6 on the agreed metrics, indicating successful completion of the parity program.

Notes

The completion of the project depends on the successful merge and stability of all four PRs, as well as the fulfillment of specific behavioral and performance criteria for GPT-5.4.

Recommendation

Follow the specified merge order and verify each PR's stability before proceeding, so that dependencies are managed correctly and the final parity report and verdict from PR D are reliable.
