openclaw - ✅(Solved) Fix GPT-5.4 / Codex agentic runtime parity in OpenClaw [5 pull requests, 11 comments, 2 participants]

100yenadmin · 2026-04-10T09:54:53Z


GitHub stats

openclaw/openclaw#64227 · Fetched 2026-04-11 06:15:54

Author: 100yenadmin

Participants: 100yenadmin, steipete

Timeline (top): cross-referenced ×12, commented ×11, mentioned ×2, subscribed ×2

Fix Action

Fixed

Fixed by PR: agents: add strict-agentic execution contract and revise update_plan semantics (https://github.com/openclaw/openclaw/pull/64241)
Related PR, closed without merging (its scope was later folded into #64439): openai-codex: fix auth scope handling and classify provider/runtime failures (https://github.com/openclaw/openclaw/pull/64286)
Fixed by PR: agents: add OpenAI/Codex tool compatibility and replay/liveness state (https://github.com/openclaw/openclaw/pull/64300)
Related PR, closed without merging (its scope was later folded into #64439): agents: make elevated full truthful and emit explicit full-access hints (https://github.com/openclaw/openclaw/pull/64332)
Fixed by PR: openai-codex: classify runtime failures and make full access truthful (https://github.com/openclaw/openclaw/pull/64439)

PR fix notes

PR #64241: agents: add strict-agentic execution contract and revise update_plan semantics

Repository: openclaw/openclaw
Author: 100yenadmin
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/64241

Description (problem / solution / changelog)

Summary

This is PR 1 of the GPT-5.4 / Codex agentic runtime parity program tracked in #64227 and scoped by #64228.

It adds an opt-in strict-agentic execution contract for embedded Pi agents and revises update_plan so it behaves like structured progress state instead of user-visible filler. The runtime now stops treating plan-only turns as acceptable completion in strict-agentic mode and fails closed with an explicit blocked response after the retry cap.

PR 1 is intentionally GPT-5-first: in this slice, strict-agentic only activates for embedded Pi openai and openai-codex GPT-5-family runs. Unsupported providers/models keep default behavior unless tools.experimental.planTool is explicitly enabled.
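The gating rules in this summary and in the "What changed" list can be sketched as two small predicates. This is a minimal TypeScript sketch, not OpenClaw's implementation: the `RunContext` shape, the `"default"` / `"strict-agentic"` values, the function names, and the assumption that "GPT-5-family" means a model id starting with `gpt-5` are all hypothetical. Only the supported providers (`openai`, `openai-codex`), the explicit `planTool` override, and the rule that `planTool: false` always wins come from the PR description.

```typescript
// Hypothetical sketch: names, config shape, and model-id check are illustrative.
type ExecutionContract = "default" | "strict-agentic";

interface RunContext {
  provider: string; // e.g. "openai", "openai-codex", "anthropic"
  modelId: string; // e.g. "gpt-5.4"
  contract: ExecutionContract; // resolved from defaults plus per-agent override
  planTool?: boolean; // tools.experimental.planTool; undefined means "not set"
}

const STRICT_AGENTIC_PROVIDERS = new Set(["openai", "openai-codex"]);

// Strict-agentic takes effect only for supported providers running a
// GPT-5-family model (assumed here: the model id starts with "gpt-5").
function isStrictAgenticActive(ctx: RunContext): boolean {
  return (
    ctx.contract === "strict-agentic" &&
    STRICT_AGENTIC_PROVIDERS.has(ctx.provider) &&
    ctx.modelId.toLowerCase().startsWith("gpt-5")
  );
}

// update_plan is no longer auto-enabled for every OpenAI/Codex run:
// an explicit planTool setting decides first, then strict-agentic activation.
function shouldEnableUpdatePlan(ctx: RunContext): boolean {
  if (ctx.planTool === false) return false; // explicit opt-out always wins
  if (ctx.planTool === true) return true; // explicit opt-in, any provider
  return isStrictAgenticActive(ctx);
}
```

Checking `planTool: false` first is what lets an explicit opt-out override strict-agentic auto-enable.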

What changed

add agents.defaults.embeddedPi.executionContract plus per-agent override support
resolve strict-agentic behavior through explicit-agent-aware lookup so no-session-key / hook / cron-style flows use the right agent config
remove default OpenAI/Codex auto-enable for update_plan
auto-enable update_plan only for explicit tools.experimental.planTool or supported strict-agentic GPT-5 runs
make update_plan non-chatty and tolerant of extra step fields
honor planTool: false even when strict-agentic is configured
treat update_plan as non-progress in plan-only retry logic
detect both prose plans and structured bullet plans before retrying or failing closed
give supported strict-agentic runs two plan-only retries before returning an explicit blocked state
keep the slice embedded-Pi-only rather than broadening into a provider-agnostic execution-contract framework
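The plan-only retry behavior described above amounts to a small decision function. This is a minimal TypeScript illustration under stated assumptions: the `TurnSummary` shape, the plan-detection regexes, and all names are hypothetical. Only the two-retry cap, treating `update_plan` as non-progress, detecting both prose and bullet plans, and failing closed with an explicit blocked state come from the PR description.

```typescript
// Hypothetical sketch: the turn shape, regexes, and names are illustrative.
interface TurnSummary {
  toolCalls: string[]; // names of tools the model invoked this turn
  text: string; // assistant-visible text for the turn
}

type TurnDecision =
  | { kind: "accept" }
  | { kind: "retry"; attempt: number }
  | { kind: "blocked"; reason: string };

const MAX_PLAN_ONLY_RETRIES = 2;

// update_plan records progress state; calling it is not progress by itself.
const NON_PROGRESS_TOOLS = new Set(["update_plan"]);

const LIST_LINE = /^\s*(?:[-*•]|\d+[.)])\s+\S/;
const PROSE_PLAN = /\b(?:I will|I'll|I'm going to|next,? I|first,? I|the plan is)\b/i;

// Detect both structured bullet/numbered plans and prose plans.
function looksLikePlan(text: string): boolean {
  const listLines = text.split("\n").filter((line) => LIST_LINE.test(line));
  return listLines.length >= 2 || PROSE_PLAN.test(text);
}

function isPlanOnlyTurn(turn: TurnSummary): boolean {
  const madeProgress = turn.toolCalls.some((name) => !NON_PROGRESS_TOOLS.has(name));
  return !madeProgress && looksLikePlan(turn.text);
}

// Two plan-only retries, then fail closed with an explicit blocked state.
function decideAfterTurn(turn: TurnSummary, planOnlyRetriesUsed: number): TurnDecision {
  if (!isPlanOnlyTurn(turn)) return { kind: "accept" };
  if (planOnlyRetriesUsed < MAX_PLAN_ONLY_RETRIES) {
    return { kind: "retry", attempt: planOnlyRetriesUsed + 1 };
  }
  return { kind: "blocked", reason: "model kept returning a plan without acting" };
}
```

Regex heuristics like these are only a starting point: a real runner would also have to avoid misreading a short final answer that happens to contain a list as a plan.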

Why

GPT-5.4 / Codex currently stalls too easily after planning or recap-style turns. This slice makes that behavior fixable at the runtime-contract level as an opt-in, without changing the default execution mode for every agent.

Non-goals

does not supersede #62989
does not implement #38780
does not address harness/plugin scope from #63452
embedded-Pi-only slice

Builds on prior groundwork

#38736
#37558
#55535
#57166
#58299

Validation

Focused checks run:

CI=1 pnpm vitest run src/agents/openclaw-tools.update-plan.test.ts
CI=1 pnpm vitest run src/agents/tools/update-plan-tool.test.ts
CI=1 pnpm vitest run src/agents/pi-embedded-runner/run.incomplete-turn.test.ts
CI=1 pnpm vitest run src/config/zod-schema.agent-defaults.test.ts
CI=1 pnpm vitest run src/agents/system-prompt.test.ts
CI=1 pnpm vitest run src/agents/tool-catalog.test.ts
CI=1 pnpm vitest run src/agents/openclaw-tools.sessions.test.ts
CI=1 pnpm vitest run src/agents/openclaw-tools.nodes-workspace-guard.test.ts
CI=1 pnpm vitest run src/agents/pi-embedded-runner/system-prompt.test.ts
CI=1 pnpm vitest run extensions/openai/transport-policy.test.ts
CI=1 pnpm vitest run src/plugin-sdk/provider-tools.test.ts

Linked issues

Closes #64228
Refs #64227
Refs #62854
Refs #47213
Refs #62989

Changed files

CHANGELOG.md (modified, +1/-0)
docs/.generated/config-baseline.sha256 (modified, +3/-3)
docs/.generated/plugin-sdk-api-baseline.sha256 (modified, +2/-2)
docs/gateway/configuration-reference.md (modified, +2/-2)
src/agents/agent-scope.ts (modified, +37/-0)
src/agents/openclaw-tools.registration.ts (modified, +16/-12)
src/agents/openclaw-tools.ts (modified, +12/-3)
src/agents/openclaw-tools.update-plan.test.ts (modified, +165/-11)
src/agents/pi-embedded-runner/run.incomplete-turn.test.ts (modified, +118/-0)
src/agents/pi-embedded-runner/run.ts (modified, +54/-3)
src/agents/pi-embedded-runner/run/incomplete-turn.ts (modified, +39/-3)
src/agents/pi-tools.ts (modified, +1/-0)
src/agents/tools/update-plan-tool.test.ts (modified, +26/-1)
src/agents/tools/update-plan-tool.ts (modified, +12/-7)
src/config/schema.base.generated.ts (modified, +56/-2)
src/config/schema.help.ts (modified, +7/-1)
src/config/schema.labels.ts (modified, +3/-0)
src/config/types.agent-defaults.ts (modified, +7/-0)
src/config/types.agents.ts (modified, +6/-1)
src/config/types.tools.ts (modified, +2/-2)
src/config/zod-schema.agent-defaults.test.ts (modified, +9/-0)
src/config/zod-schema.agent-defaults.ts (modified, +1/-0)
src/config/zod-schema.agent-runtime.ts (modified, +6/-0)

PR #64286: openai-codex: fix auth scope handling and classify provider/runtime failures

Repository: openclaw/openclaw
Author: 100yenadmin
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/64286

Description (problem / solution / changelog)

Summary

This is PR 2 of the GPT-5.4 / Codex agentic runtime parity program tracked in #64227 and scoped by #64229.

It fixes the maintained-source OpenAI Codex OAuth scope gap in OpenClaw's login wrapper and adds a separate provider/runtime failure taxonomy that makes auth-scope, refresh, HTML 403, proxy, DNS, timeout, schema, sandbox-blocked, and replay-invalid failures observable in logs and easier to explain to users.

What changed

normalize OpenAI Codex authorize URLs so the required scopes are always present:
- openid
- profile
- email
- offline_access
- model.request
- api.responses.write
add classifyProviderRuntimeFailureKind(...) as a typed provider/runtime failure classifier
keep the older failover-reason contract intact instead of widening it in this slice
thread providerRuntimeFailureKind through embedded-run observation fields and lifecycle logging
surface more truthful user-facing copy for:
- OAuth refresh failures
- missing OpenAI Codex scopes
- HTML 403 auth failures
- proxy/tunnel misroutes
- replay-invalid failures
add focused regressions for scope failures, refresh failures, HTML 403, proxy, DNS, timeout, schema, sandbox-blocked, and replay-invalid paths
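Scope normalization of this kind can be done with the standard WHATWG `URL` API. This is a minimal sketch, assuming the scopes travel in a single space-delimited `scope` query parameter as in RFC 6749; the function name and the example host are hypothetical, and this is not the code in `src/plugins/provider-openai-codex-oauth.ts`. The scope list itself comes from the PR description.

```typescript
// Hypothetical sketch; scope list taken from the PR, everything else illustrative.
const REQUIRED_CODEX_SCOPES = [
  "openid",
  "profile",
  "email",
  "offline_access",
  "model.request",
  "api.responses.write",
] as const;

// Ensure the OAuth `scope` query parameter (space-delimited) contains every
// required scope, preserving any extra scopes that were already requested.
function normalizeCodexAuthorizeUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  const existing = (url.searchParams.get("scope") ?? "")
    .split(/\s+/)
    .filter((s) => s.length > 0);
  const merged = [...existing];
  for (const scope of REQUIRED_CODEX_SCOPES) {
    if (!merged.includes(scope)) merged.push(scope);
  }
  url.searchParams.set("scope", merged.join(" "));
  return url.toString();
}
```

Merging instead of overwriting keeps any extra scopes the caller already requested, and running the function twice yields the same URL.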

Why

GPT-5.4 / Codex failures in OpenClaw are still too easy to misdiagnose as generic model stops. This slice makes the auth/runtime layer tell the truth before we move on to tool-contract and parity-harness work.

Non-goals

does not implement tool compatibility work from #64230
does not implement permission truthfulness work from #64231
does not implement replay/liveness hardening from #64232
does not implement the benchmark harness from #64233
does not widen the generic failover-reason enum for every caller in this slice

Builds on prior groundwork

#45176
#48592
#53702
#55206
#44019

Validation

Focused checks run:

CI=1 pnpm exec vitest run src/commands/openai-codex-oauth.test.ts src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts src/agents/failover-error.test.ts src/agents/pi-embedded-error-observation.test.ts src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts
repo hook gate during commit:
- pnpm check:no-conflict-markers
- pnpm tool-display:check
- pnpm check:host-env-policy:swift
- pnpm tsgo
- node scripts/prepare-extension-package-boundary-artifacts.mjs
- pnpm lint
- pnpm lint:webhook:no-low-level-body-read
- pnpm lint:auth:no-pairing-store-group
- pnpm lint:auth:pairing-account-scope

Linked issues

Closes #64229
Refs #64227
Refs #64133
Refs #64174
Refs #64092
Refs #57399
Refs #62672

Changed files

src/agents/failover-error.test.ts (modified, +10/-0)
src/agents/pi-embedded-error-observation.test.ts (modified, +14/-0)
src/agents/pi-embedded-error-observation.ts (modified, +23/-4)
src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts (modified, +67/-0)
src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (modified, +79/-0)
src/agents/pi-embedded-helpers.ts (modified, +2/-0)
src/agents/pi-embedded-helpers/errors.ts (modified, +219/-4)
src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts (modified, +22/-0)
src/agents/pi-embedded-subscribe.handlers.lifecycle.ts (modified, +16/-3)
src/commands/openai-codex-oauth.test.ts (modified, +28/-3)
src/plugins/provider-openai-codex-oauth.ts (modified, +40/-1)

PR #64300: agents: add OpenAI/Codex tool compatibility and replay/liveness state

Repository: openclaw/openclaw
Author: 100yenadmin
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/64300

Description (problem / solution / changelog)

Summary

keep the provider-owned OpenAI/Codex tool-compat layer via the existing provider hook surface
add replay/liveness state surfacing so long-running embedded runs stop disappearing silently
compact the original Contracts 2 and 5 into one execution-correctness PR in the GPT-5.4 / Codex parity program tracked by #64227
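One way to make "stop disappearing silently" concrete is to force every in-flight run into one of the four explicit states this PR names (working, paused, blocked, abandoned). The sketch below is hypothetical: the `RunObservation` fields, the five-minute idle threshold, and the precedence order are assumptions for illustration, not values taken from `src/agents/pi-embedded-runner/run.ts`.

```typescript
// Hypothetical sketch: field names and the idle threshold are illustrative.
type LivenessState = "working" | "paused" | "blocked" | "abandoned";

interface RunObservation {
  awaitingUserInput: boolean; // e.g. waiting on an approval prompt
  blockedReason?: string; // e.g. a fail-closed or sandbox block
  msSinceLastEvent: number; // time since the run last emitted any event
}

const ABANDONED_AFTER_MS = 5 * 60 * 1000; // assumed idle threshold

// Every in-flight run maps to exactly one explicit state, so a stalled run is
// reported as "abandoned" instead of silently disappearing. Waiting on the user
// takes precedence over idleness: a paused run is not an abandoned one.
function classifyLiveness(obs: RunObservation): LivenessState {
  if (obs.blockedReason !== undefined) return "blocked";
  if (obs.awaitingUserInput) return "paused";
  if (obs.msSinceLastEvent > ABANDONED_AFTER_MS) return "abandoned";
  return "working";
}

function livenessStatusLine(obs: RunObservation): string {
  const state = classifyLiveness(obs);
  return state === "blocked" ? `blocked: ${obs.blockedReason}` : state;
}
```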

Scope

Refs #64230
Refs #64232
Refs #64227
combines provider-owned tool compatibility with replay/liveness hardening
no auth / permission truthfulness changes in this PR
no self-elected continuation scope from #38780
no benchmark harness work from #64233

What changed

add an openai tool-compat family to buildProviderToolCompatFamilyHooks(...)
gate the family to native OpenAI/OpenAI Codex response routes only
normalize provider-owned parameter-free and missing-object-shape tool schemas for strict OpenAI/Codex routes
surface provider-owned diagnostics for remaining strict-schema incompatibilities
attach the compat hooks in extensions/openai/index.ts so OpenAI and OpenAI Codex providers both expose them
add replay/liveness state to embedded run results and lifecycle surfaces
classify replay/liveness outcomes as observable working, paused, blocked, or abandoned states instead of silent disappearance
preserve replay-invalid truth across compaction retries after mutating tool side effects
add focused regressions for replay/liveness surfacing alongside the existing tool-compat coverage
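The schema-normalization step can be sketched as follows. It assumes OpenAI's documented strict-mode constraints for function schemas (a top-level object, every property listed in `required`, `additionalProperties: false`); the PR's actual rules in `src/plugin-sdk/provider-tools.ts` may differ, and the function and type names here are hypothetical. Parameter-free and missing-object-shape schemas are rewritten; anything else is left as-is and reported as a diagnostic, mirroring the split between normalization and diagnostics in the list above.

```typescript
// Hypothetical sketch; not the provider-tools.ts implementation.
type JsonSchema = Record<string, unknown>;

interface StrictCompatResult {
  schema: JsonSchema;
  diagnostics: string[];
}

function emptyObjectSchema(): JsonSchema {
  return { type: "object", properties: {}, required: [], additionalProperties: false };
}

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Parameter-free: no schema, an empty schema, or an object schema with no properties.
function isParameterFree(schema: JsonSchema | undefined): boolean {
  if (schema === undefined || Object.keys(schema).length === 0) return true;
  const props = schema.properties;
  return (
    schema.type === "object" &&
    (props === undefined || (isPlainObject(props) && Object.keys(props).length === 0))
  );
}

function normalizeToolSchemaForStrict(toolName: string, input: JsonSchema | undefined): StrictCompatResult {
  if (isParameterFree(input)) return { schema: emptyObjectSchema(), diagnostics: [] };

  const schema: JsonSchema = { ...(input ?? {}) }; // shallow copy; never mutate the caller's schema
  const diagnostics: string[] = [];
  const props = schema.properties;

  // Missing object shape: properties are present but `type` was omitted.
  if (schema.type === undefined && isPlainObject(props)) schema.type = "object";

  if (schema.type !== "object" || !isPlainObject(props)) {
    diagnostics.push(`${toolName}: top-level schema must be an object with properties`);
    return { schema, diagnostics };
  }

  // Report, rather than silently rewrite, the remaining strict-mode mismatches.
  const req = schema.required;
  const required = Array.isArray(req) ? req.filter((r): r is string => typeof r === "string") : [];
  const notRequired = Object.keys(props).filter((name) => !required.includes(name));
  if (notRequired.length > 0) {
    diagnostics.push(`${toolName}: not listed in required: ${notRequired.join(", ")}`);
  }
  if (schema.additionalProperties !== false) {
    diagnostics.push(`${toolName}: additionalProperties is not false`);
  }
  return { schema, diagnostics };
}
```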

Validation

pnpm build
CI=1 pnpm exec vitest run src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts src/agents/pi-embedded-subscribe.handlers.compaction.test.ts src/agents/pi-embedded-subscribe.handlers.tools.test.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test.ts src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts

Non-goals

does not supersede #64229 or #64231
does not add tool-name or argument aliases
does not change generic runner behavior outside provider-owned hooks and replay/liveness surfacing

Changed files

CHANGELOG.md (modified, +1/-0)
extensions/openai/index.test.ts (modified, +78/-0)
extensions/openai/index.ts (modified, +3/-0)
src/agents/pi-embedded-runner/run.incomplete-turn.test.ts (modified, +43/-0)
src/agents/pi-embedded-runner/run.overflow-compaction.test.ts (modified, +23/-0)
src/agents/pi-embedded-runner/run.timeout-triggered-compaction.test.ts (modified, +1/-0)
src/agents/pi-embedded-runner/run.ts (modified, +80/-0)
src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test-support.ts (modified, +6/-0)
src/agents/pi-embedded-runner/run/attempt.ts (modified, +18/-5)
src/agents/pi-embedded-runner/run/incomplete-turn.ts (modified, +45/-0)
src/agents/pi-embedded-runner/run/retry-limit.ts (modified, +5/-0)
src/agents/pi-embedded-runner/run/types.ts (modified, +7/-0)
src/agents/pi-embedded-runner/types.ts (modified, +4/-0)
src/agents/pi-embedded-subscribe.handlers.compaction.ts (modified, +4/-0)
src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts (modified, +67/-0)
src/agents/pi-embedded-subscribe.handlers.lifecycle.ts (modified, +27/-1)
src/agents/pi-embedded-subscribe.handlers.tools.test.ts (modified, +92/-0)
src/agents/pi-embedded-subscribe.handlers.tools.ts (modified, +5/-0)
src/agents/pi-embedded-subscribe.handlers.types.ts (modified, +6/-0)
src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts (modified, +38/-0)
src/agents/pi-embedded-subscribe.ts (modified, +21/-0)
src/agents/pi-embedded-subscribe.types.ts (modified, +2/-0)
src/auto-reply/reply/dispatch-from-config.ts (modified, +2/-2)
src/plugin-sdk/provider-tools.test.ts (modified, +244/-0)
src/plugin-sdk/provider-tools.ts (modified, +286/-1)
src/plugins/contracts/provider-family-plugin-tests.test.ts (modified, +1/-0)

PR #64332: agents: make elevated full truthful and emit explicit full-access hints

Repository: openclaw/openclaw
Author: 100yenadmin
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/64332

Description (problem / solution / changelog)

Summary

make /elevated full truth-surfacing explicit in embedded sandbox metadata
stop advertising auto-approved full access when it is unavailable for the current runtime
tell the agent not to suggest /elevated full when the session cannot provide it

Refs #64231; part of #64227

What changed

extends embedded elevated metadata with fullAccessAvailable and fullAccessBlockedReason
adds resolveEmbeddedFullAccessState(...) so the truth state is computed once and reused
updates the embedded system prompt to advertise /elevated full only when auto-approved host exec is actually available
updates the current exec session hint to call out unavailable full access and steer the model toward ask / on
adds focused regression coverage for unavailable-full prompt and sandbox-info behavior
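Computing the truth state once and reusing it for both the system prompt and the exec hint might look like the sketch below. It is hypothetical: `computeFullAccessState`, its three boolean inputs, the reason strings, and the prompt wording are illustrative only. The PR names `resolveEmbeddedFullAccessState(...)` and the `fullAccessAvailable` / `fullAccessBlockedReason` fields, but the notes do not show their signatures.

```typescript
// Hypothetical sketch: inputs, reasons, and prompt text are illustrative.
interface ElevatedInputs {
  elevatedEnabled: boolean; // is /elevated permitted for this session at all?
  hostExecAvailable: boolean; // can this runtime execute on the host?
  autoApproveAllowed: boolean; // does policy allow auto-approved host exec?
}

interface FullAccessState {
  fullAccessAvailable: boolean;
  fullAccessBlockedReason?: string;
}

// Resolve the truth state once; every surface below reads from it.
function computeFullAccessState(i: ElevatedInputs): FullAccessState {
  if (!i.elevatedEnabled) {
    return { fullAccessAvailable: false, fullAccessBlockedReason: "elevated mode is disabled for this session" };
  }
  if (!i.hostExecAvailable) {
    return { fullAccessAvailable: false, fullAccessBlockedReason: "host exec is unavailable in this runtime" };
  }
  if (!i.autoApproveAllowed) {
    return { fullAccessAvailable: false, fullAccessBlockedReason: "auto-approved host exec is not permitted" };
  }
  return { fullAccessAvailable: true };
}

// Advertise /elevated full only when it is actually available.
function elevatedPromptLine(state: FullAccessState): string {
  return state.fullAccessAvailable
    ? "/elevated full is available for auto-approved host exec."
    : `Do not suggest /elevated full (${state.fullAccessBlockedReason}); prefer the ask or on modes instead.`;
}
```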

Non-goals

no new permission system
no exec enforcement rewrite in bash-tools.exec
no replay / continuation changes
no auth or tool-compat scope

Validation

direct targeted assertions on the new sandbox/prompt path using node --import tsx
added focused tests in:
- src/agents/pi-embedded-runner.buildembeddedsandboxinfo.test.ts
- src/agents/system-prompt.test.ts
- src/auto-reply/reply/get-reply-run.exec-hint.test.ts

Notes

this stays intentionally narrow to the truth-surfacing slice for #64231
existing blocked/unavailable exec enforcement remains in the current runtime path; this PR makes the agent-facing contract match reality sooner

Changed files

src/agents/bash-tools.exec-types.ts (modified, +3/-0)
src/agents/pi-embedded-runner.buildembeddedsandboxinfo.test.ts (modified, +34/-1)
src/agents/pi-embedded-runner/sandbox-info.ts (modified, +40/-3)
src/agents/pi-embedded-runner/types.ts (modified, +4/-0)
src/agents/system-prompt.test.ts (modified, +30/-1)
src/agents/system-prompt.ts (modified, +46/-7)
src/auto-reply/reply/get-reply-run.exec-hint.test.ts (modified, +13/-0)
src/auto-reply/reply/get-reply-run.ts (modified, +30/-1)

PR #64439: openai-codex: classify runtime failures and make full access truthful

Repository: openclaw/openclaw
Author: 100yenadmin
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/64439

Description (problem / solution / changelog)

Summary

This is the compact runtime-truthfulness slice of the GPT-5.4 / Codex parity program tracked in #64227.

It combines the original Contract 1 auth/runtime truthfulness work from #64229 with the Contract 4 permission truthfulness work from #64231, so OpenClaw tells the truth about both provider/runtime failures and whether /elevated full is actually available.

Scope

Closes #64229
Closes #64231
Refs #64227
combines auth/runtime failure classification with truthful full-access surfacing
no tool-compat or replay/liveness scope in this PR
no benchmark harness scope in this PR

What changed

normalize OpenAI Codex authorize URLs so the required scopes are always present:
- openid
- profile
- email
- offline_access
- model.request
- api.responses.write
add typed provider/runtime failure classification for:
- auth_scope
- auth_refresh
- auth_html_403
- proxy
- dns
- timeout
- schema
- sandbox_blocked
- replay_invalid
- unknown
thread providerRuntimeFailureKind through embedded-run observation fields and lifecycle logging
surface more truthful user-facing copy for scope failures, refresh failures, HTML 403 auth failures, proxy/tunnel misroutes, and replay-invalid failures
extend embedded elevated metadata with fullAccessAvailable and fullAccessBlockedReason
advertise /elevated full only when auto-approved host exec is actually available for the current runtime
update current exec hints so unavailable full access is explained precisely instead of being suggested as if it were always possible
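A rough sketch of how a classifier could map raw failure signals onto the kinds listed above. The kind names come from the PR; the `FailureSignal` shape, the regex heuristics, their precedence, and the name `classifyFailureKindSketch` are hypothetical and do not reproduce `classifyProviderRuntimeFailureKind(...)`.

```typescript
// Hypothetical sketch; only the kind names are taken from the PR.
type ProviderRuntimeFailureKind =
  | "auth_scope" | "auth_refresh" | "auth_html_403" | "proxy" | "dns"
  | "timeout" | "schema" | "sandbox_blocked" | "replay_invalid" | "unknown";

interface FailureSignal {
  status?: number; // HTTP status, if any
  code?: string; // Node-style error code, e.g. "ENOTFOUND"
  message: string; // error message or response body text
}

function classifyFailureKindSketch(sig: FailureSignal): ProviderRuntimeFailureKind {
  const msg = sig.message.toLowerCase();
  // An HTML page on a 403 usually means an auth gateway, not a model error.
  if (sig.status === 403 && /<\s*(?:!doctype\s+)?html/.test(msg)) return "auth_html_403";
  if (/insufficient[_ ]scope|missing (?:required )?scope/.test(msg)) return "auth_scope";
  if (/invalid_grant|refresh[_ ]token/.test(msg)) return "auth_refresh";
  if (sig.status === 407 || /\bproxy\b|\btunnel\b/.test(msg)) return "proxy";
  if (sig.code === "ENOTFOUND" || sig.code === "EAI_AGAIN") return "dns";
  if (sig.code === "ETIMEDOUT" || /timed? ?out/.test(msg)) return "timeout";
  if (/sandbox/.test(msg) && /blocked|denied/.test(msg)) return "sandbox_blocked";
  if (/replay/.test(msg) && /invalid/.test(msg)) return "replay_invalid";
  if (/schema/.test(msg)) return "schema";
  return "unknown"; // explicit fallback keeps unclassified failures visible
}
```

Ordering matters in a heuristic classifier like this: the HTML 403 check runs first so a gateway error page is not swallowed by a broader rule, and the explicit `unknown` fallback keeps anything unmatched observable rather than dropped.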

Validation

full repo check stack completed while landing the combined branch commits
pnpm exec vitest run src/commands/openai-codex-oauth.test.ts src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts src/agents/failover-error.test.ts src/agents/pi-embedded-error-observation.test.ts src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts src/agents/pi-embedded-runner.buildembeddedsandboxinfo.test.ts src/agents/system-prompt.test.ts src/auto-reply/reply/get-reply-run.exec-hint.test.ts

Non-goals

does not supersede #64230 or #64232
does not widen the generic failover-reason enum for every caller in this slice
does not introduce a new permission system
does not change exec enforcement in bash-tools.exec

Changed files

CHANGELOG.md (modified, +1/-0)
extensions/qa-lab/src/live-transports/telegram/telegram-live.runtime.test.ts (modified, +6/-2)
src/agents/bash-tools.exec-types.ts (modified, +3/-0)
src/agents/pi-embedded-error-observation.test.ts (modified, +14/-0)
src/agents/pi-embedded-error-observation.ts (modified, +23/-4)
src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts (modified, +71/-0)
src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (modified, +84/-0)
src/agents/pi-embedded-helpers.ts (modified, +2/-0)
src/agents/pi-embedded-helpers/errors.ts (modified, +224/-4)
src/agents/pi-embedded-runner.buildembeddedsandboxinfo.test.ts (modified, +65/-1)
src/agents/pi-embedded-runner/sandbox-info.ts (modified, +33/-3)
src/agents/pi-embedded-runner/types.ts (modified, +4/-0)
src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts (modified, +22/-0)
src/agents/pi-embedded-subscribe.handlers.lifecycle.ts (modified, +6/-2)
src/agents/system-prompt.test.ts (modified, +30/-1)
src/agents/system-prompt.ts (modified, +46/-7)
src/auto-reply/reply.directive.directive-behavior.defaults-think-low-reasoning-capable-models-no.test.ts (modified, +2/-1)
src/auto-reply/reply/commands-system-prompt.test.ts (modified, +34/-0)
src/auto-reply/reply/commands-system-prompt.ts (modified, +12/-0)
src/auto-reply/reply/get-reply-run.exec-hint.test.ts (modified, +13/-0)
src/auto-reply/reply/get-reply-run.media-only.test.ts (modified, +8/-4)
src/auto-reply/reply/get-reply-run.ts (modified, +25/-1)
src/commands/openai-codex-oauth.test.ts (modified, +53/-3)
src/media/base64.ts (modified, +32/-3)
src/plugins/provider-openai-codex-oauth.test.ts (added, +24/-0)
src/plugins/provider-openai-codex-oauth.ts (modified, +47/-1)


OpenClaw GPT-5.4 / Codex Parity Program

This tracker is the source of truth for the GPT-5.4 / Codex parity program.

Execution model

The original product architecture is still the same 6-contract design:

provider transport/auth correctness
tool contract/schema compatibility
same-turn execution correctness
permission truthfulness
replay/continuation/liveness correctness
benchmark/release gate

Execution is compacted into 4 upstream PRs to keep review manageable and to stay within the active-PR limit. The 6 child issues remain the product map; the 4 PRs are the merge units.

Live merge units

PR A — strict-agentic execution: #64241
- owns Contract 3
PR B — runtime truthfulness: #64439
- owns Contracts 1 and 4
PR C — execution correctness: #64300
- owns Contracts 2 and 5
PR D — parity harness / release gate: #64441
- owns Contract 6

Mapping back to the original child issues

#64228 -> PR A (#64241)
#64229 + #64231 -> PR B (#64439)
#64230 + #64232 -> PR C (#64300)
#64233 -> PR D (#64441)

Merge order

PR A (#64241)
PR B (#64439)
PR C (#64300)
PR D (#64441)

PR D is the proof layer. It should not block review of runtime-correctness slices.

Completion gate

The project is only complete when all of these are true:

PR A, PR B, and PR C are merged and stable
PR D is merged and produces a readable parity report plus a machine-readable verdict
GPT-5.4 no longer stalls after planning
GPT-5.4 no longer fakes progress or fake tool completion
GPT-5.4 no longer gives false /elevated full guidance
replay/liveness failures are surfaced as explicit states, not silent disappearance
the parity gate shows GPT-5.4 matches or beats Opus 4.6 on the agreed metrics

extent analysis

TL;DR

To complete the GPT-5.4 / Codex parity program, ensure all four PRs (A, B, C, and D) are merged and stable, and verify that GPT-5.4 meets the specified completion criteria.

Guidance

Review and merge PRs in the specified order (A, B, C, and then D) so that dependencies between them are respected.
Verify that each PR is stable before proceeding to the next one, especially ensuring that PR A, B, and C are merged and stable before finalizing PR D.
After merging all PRs, check that GPT-5.4 no longer stalls after planning, no longer fakes progress or tool completion, and no longer gives false /elevated full guidance, and that replay/liveness failures are surfaced as explicit states.
Use the parity gate to confirm that GPT-5.4 matches or beats Opus 4.6 on the agreed metrics, indicating successful completion of the parity program.

Notes

The completion of the project depends on the successful merge and stability of all four PRs, as well as the fulfillment of specific behavioral and performance criteria for GPT-5.4.

Recommendation

Follow the specified merge order and confirm each PR is stable before moving to the next. This keeps dependencies correctly managed and makes the final parity report and verdict from PR D reliable.
