openclaw - ✅(Solved) Fix [Bug]: thinkingLevel is silently cosmetic for openai/openai-codex Responses — createOpenAIThinkingLevelWrapper misses the existingReasoning === undefined branch [3 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#70904 · Fetched 2026-04-24 10:38:03
Comments: 0 · Participants: 1 · Timeline: 3 (cross-referenced ×3) · Reactions: 0

createOpenAIThinkingLevelWrapper in proxy-stream-wrappers.ts never injects body.reasoning when pi-ai leaves payloadObj.reasoning as undefined, so thinkingDefault / thinkingLevel is silently cosmetic for openai/gpt-5.x and openai-codex/gpt-5.x (Responses and Codex-Responses APIs). Confirmed on 2026.4.22 via A/B runtime instrumentation.

Error Message

Minimal runtime A/B with temporary console.error calls inside the wrapper:

    console.error(`[DBG] thinkingLevel=${thinkingLevel} existingReasoning=${JSON.stringify(existingReasoning)} model=${model?.provider}/${model?.api}`);
    console.error(`[DBG] final=${JSON.stringify(payloadObj.reasoning)}`);

The full log lines are in the "Actual behavior" section of the original issue. Both runs used the same test session, the same config, and the same message; only line 195 was toggled, between if (existingReasoning === "none") and if (existingReasoning === void 0 || existingReasoning === null || existingReasoning === "none"). The gateway was restarted between toggles so the bundle re-imports.

  • Severity: High for a silent config failure. Users configure thinkingDefault: "high" (or "max", etc.) expecting stronger reasoning; the model runs with server-side default effort instead, and there is no user-visible error, log warning, or status indicator.

Root Cause

Because pi-ai (openai-codex-responses.js and openai-responses.js) only sets body.reasoning when options.reasoningEffort !== undefined, and OpenClaw does not pass reasoningEffort through to pi-ai, payloadObj.reasoning is undefined in every real run — and none of the three branches fires. body.reasoning reaches OpenAI as absent/unset, so server-side defaults apply regardless of user configuration.
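The root cause can be reproduced in isolation with a minimal sketch. Everything below is a simplified stand-in, not the real pi-ai or OpenClaw code: the `Payload` type, `buildBody`, `mapLevel`, and `applyThinkingLevelPreFix` are hypothetical names, and the level-to-effort mapping is assumed apart from "off" mapping to "none", which the report states.

```typescript
// Hypothetical, simplified types; the real OpenClaw and pi-ai types are richer.
type ThinkingLevel = "off" | "low" | "medium" | "high";
type Reasoning = { effort: string } | "none" | null | undefined;
interface Payload {
  model: string;
  reasoning?: Reasoning;
}

// Stand-in for pi-ai: body.reasoning is only set when reasoningEffort is provided.
function buildBody(model: string, options: { reasoningEffort?: string }): Payload {
  const body: Payload = { model };
  if (options.reasoningEffort !== undefined) {
    body.reasoning = { effort: options.reasoningEffort };
  }
  return body;
}

// Assumed mapping; only "off" -> "none" is stated in the report.
function mapLevel(level: ThinkingLevel): string {
  return level === "off" ? "none" : level;
}

// The three pre-fix branches as the report describes them.
function applyThinkingLevelPreFix(payload: Payload, level: ThinkingLevel): void {
  const existing = payload.reasoning;
  if (level === "off") {
    delete payload.reasoning;
    return;
  }
  if (existing === "none") {
    payload.reasoning = { effort: mapLevel(level) };
    return;
  }
  if (existing && typeof existing === "object") {
    existing.effort = mapLevel(level);
  }
  // existing === undefined falls through: nothing is injected.
}

// OpenClaw path: no reasoningEffort is threaded through.
const body = buildBody("gpt-5.5", {});
applyThinkingLevelPreFix(body, "high");
console.log(JSON.stringify(body)); // {"model":"gpt-5.5"}
```

The serialized body has no reasoning key at all, which matches the report's observation that the request reaches the server without a reasoning field.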

Fix Action

Fix / Workaround

  1. On a fresh OpenClaw 2026.4.22 install (npm global, Ubuntu 24.04, Node 22.22.2), configure:
    {
      agents: {
        defaults: {
          thinkingDefault: "high",
          model: { primary: "openai-codex/gpt-5.5" }
        }
      }
    }
    (Same behavior reproduces with gpt-5.4, gpt-5.4-mini, and openai/gpt-5.x via API key — the wrapper path is the same.)
  2. Sign in with Codex OAuth (openclaw models auth login openai-codex).
  3. Start the gateway and send any message through the agent (Telegram or direct).
  4. Inspect the payload entering streamWithPayloadPatch inside createOpenAIThinkingLevelWrapper (line ~188 in the bundled proxy-stream-wrappers-*.js). payloadObj.reasoning is undefined, none of the three branches mutates it, and the HTTP request goes to chatgpt.com/backend-api/codex with no reasoning field.
  • Last known version where thinkingLevel was effective for this path: unknown (NOT_ENOUGH_INFO). In our observation the bug predates 2026.4.21; we started patching it locally on 2026-04-23 after first catching it and have not bisected older versions.
  • Proposed fix (three-line patch, same file, function createOpenAIThinkingLevelWrapper):
    - if (existingReasoning === "none") {
    + // Cover the common case where pi-ai leaves payloadObj.reasoning unset
    + // (options.reasoningEffort is undefined on the OpenClaw path, so pi-ai does not
    + // initialize body.reasoning before the wrapper runs).
    + if (existingReasoning === void 0 || existingReasoning === null || existingReasoning === "none") {
          payloadObj.reasoning = { effort: mapThinkingLevelToReasoningEffort(thinkingLevel) };
          return;
      }
    This aligns createOpenAIThinkingLevelWrapper with the semantics already shipped in normalizeProxyReasoningPayload.
  • Related surface: mapThinkingLevelToReasoningEffort("off") === "none". Since "none" is already a string branch the wrapper handles, adding undefined/null is a strict extension.
  • Affected bundled file on 2026.4.22 (npm global install): proxy-stream-wrappers-Cnx4hyoz.js. Function createOpenAIThinkingLevelWrapper starts at line 183.

PR fix notes

PR #70911: fix(pi-embedded): inject body.reasoning when payload has no prior reasoning (#70904)

Description (problem / solution / changelog)

Summary

Closes #70904.

`createOpenAIThinkingLevelWrapper` (`src/agents/pi-embedded-runner/openai-stream-wrappers.ts`) only reacted to three prior states of `payloadObj.reasoning`:

  1. `thinkingLevel === "off"` → delete
  2. `existingReasoning === "none"` → replace with `{ effort }`
  3. existing `{ effort }` object → mutate in place

pi-ai's Responses / Codex-Responses clients only set `body.reasoning` when `options.reasoningEffort` is threaded through, which OpenClaw does not do for this wrapper. So `payloadObj.reasoning` is `undefined` in every real run, none of the three branches fires, and `thinkingDefault` / `thinkingLevel` is silently cosmetic for `openai-codex/gpt-5.x` and `openai/gpt-5.x` — the server-side default applies regardless of user configuration.

The reporter confirmed via A/B runtime instrumentation on 2026.4.22.

Change

Add the missing `!existingReasoning` branch, mirroring `normalizeProxyReasoningPayload`'s already-shipped handling in `proxy-stream-wrappers.ts:43`. Still gated behind the existing `shouldApplyOpenAIReasoningCompatibility` check, so non-reasoning-capable models are untouched.
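As a rough illustration of the change, the sketch below adds a falsy check next to the "none" branch. It is not the merged code: `applyThinkingLevelPatched`, `mapLevel`, the `Payload` type, and the `reasoningCapable` flag (a stand-in for the `shouldApplyOpenAIReasoningCompatibility` gate) are hypothetical, and the level mapping is assumed apart from "off" mapping to "none".

```typescript
// Hypothetical, simplified types for illustration only.
type ThinkingLevel = "off" | "low" | "medium" | "high";
type Reasoning = { effort: string } | "none" | null | undefined;
interface Payload {
  model: string;
  reasoning?: Reasoning;
}

// Assumed mapping; only "off" -> "none" is stated in the source.
function mapLevel(level: ThinkingLevel): string {
  return level === "off" ? "none" : level;
}

function applyThinkingLevelPatched(
  payload: Payload,
  level: ThinkingLevel,
  reasoningCapable: boolean, // stand-in for the existing compatibility gate
): void {
  if (!reasoningCapable) return; // non-reasoning models stay untouched
  if (level === "off") {
    delete payload.reasoning;
    return;
  }
  const existing = payload.reasoning;
  // New: undefined and null are handled the same way as "none".
  if (!existing || existing === "none") {
    payload.reasoning = { effort: mapLevel(level) };
    return;
  }
  existing.effort = mapLevel(level); // existing { effort } object: mutate in place
}

const unset: Payload = { model: "gpt-5.5" };
applyThinkingLevelPatched(unset, "high", true);
console.log(JSON.stringify(unset.reasoning)); // {"effort":"high"}
```

With this shape, the previously unhandled case (no prior reasoning on the payload) and the already-handled "none" case produce the same result.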

Test plan

  • Regression test added: `injects reasoning on reasoning-capable model when payload has no prior reasoning (#70904)`
  • `pnpm oxlint src/agents/pi-embedded-runner/openai-stream-wrappers.ts src/agents/pi-embedded-runner/openai-stream-wrappers.test.ts` → 0 warnings, 0 errors
  • Manual (per reporter's A/B): with the fix, payload arrives at `chatgpt.com/backend-api/codex` as `body.reasoning = { effort: "high" }` when `thinkingDefault: "high"`; without the fix, `body.reasoning` is absent.

Changed files

  • src/agents/pi-embedded-runner/openai-stream-wrappers.test.ts (modified, +8/-0)
  • src/agents/pi-embedded-runner/openai-stream-wrappers.ts (modified, +11/-0)

PR #70772: [codex] Add Pi/Codex harness extension seams

Description (problem / solution / changelog)

Summary

Stacked follow-up to #70743. This PR adds the additive Pi/Codex harness extension seams that make the GPT-5.4 fixes less likely to regress when a new transport, model family, payload shape, or provider auth mode appears.

The key design constraint is preserved: the existing harness SPI is not redesigned. Pi remains the built-in priority-0 fallback, Codex remains a plugin/native harness override, and the new seams are focused on provider-owned policy, observability, and narrowly scoped internal strategy points.

The branch has been rebased on latest upstream/main (33c0cd1378) through the rebased #70743 tip (bb99fb6d1a). The current #70772 tip is 0abfc8ddc4.

Stack Shape And Review Scope

gitGraph
  commit id: "upstream/main 33c0cd1378"
  branch "#70743 GPT-5.4 stability"
  checkout "#70743 GPT-5.4 stability"
  commit id: "1ae48df451"
  commit id: "..."
  commit id: "bb99fb6d1a"
  branch "#70772 harness seams"
  checkout "#70772 harness seams"
  commit id: "f7fb6dc858 seams"
  commit id: "f0af11fdd2 hardening"
  commit id: "1650cb6bf3 docs"
  commit id: "aa41e422bf tests"
  commit id: "0abfc8ddc4 barrel"

Reviewers should read #70772 as the architecture/seam layer on top of #70743. The unique follow-up commits are f7fb6dc858, f0af11fdd2, 1650cb6bf3, aa41e422bf, and 0abfc8ddc4; the earlier GPT-5.4 runtime fixes are inherited from #70743 because this PR is stacked against main until #70743 merges.

Runtime Routing Map

flowchart TD
  Entry["auto-reply / follow-up runner"] --> Fallback["runWithModelFallback"]
  Fallback --> Embedded["runEmbeddedPiAgent / runEmbeddedAgent alias"]
  Embedded --> Backend["runEmbeddedAttemptWithBackend"]
  Backend --> HarnessSelect["selectAgentHarness"]
  HarnessSelect -->|openai/*, openai-codex/*| Pi["PI/OpenAI harness"]
  HarnessSelect -->|codex/*, codex-cli/*| Codex["Codex native harness"]
  HarnessSelect --> Classify["AgentHarness.classify? -> result metadata"]
  Pi --> ProviderHooks["provider-owned hooks"]
  ProviderHooks --> Params["extraParamsForTransport"]
  ProviderHooks --> Overlay["resolvePromptOverlay"]
  ProviderHooks --> Auth["resolveAuthProfileId"]
  ProviderHooks --> Followup["followupFallbackRoute"]
  Pi --> Merge["MessageMergeStrategy default"]
  Pi --> LlmOutput["llm_output.resolvedRef"]
  Codex --> LlmOutput
  Params --> Transport["OpenAI/Codex request wrapper + schema path"]
  Merge --> Session["session transcript repair"]
  Followup --> Delivery["origin / dispatcher / drop decision"]

Seam Matrix

| Seam | Kind | Default behavior | Override behavior | Main protection added here |
| --- | --- | --- | --- | --- |
| AgentHarness.classify? | Harness method | No classification when absent. | Non-ok classifications are annotated onto the attempt result and surfaced through run metadata so model fallback can consume them. | Prevents “exposed but inert” harness classifier APIs. |
| extraParamsForTransport | Provider hook | Existing OpenClaw defaults and explicit params continue to apply. | Provider returns a small patch after model/transport resolution. | Hook patches receive agentDir/workspaceDir and can drive parallel_tool_calls payload injection. |
| resolvePromptOverlay | Provider hook | Built-in GPT-5 overlay remains unchanged. | Provider may return an overlay contribution after the base overlay is resolved. | Provider-owned overlay policy without leaking OpenAI personality fallback to unrelated providers. |
| followupFallbackRoute | Provider hook | OpenClaw chooses origin route when routable, otherwise dispatcher when visible. | Trusted provider can force origin, dispatcher, or drop. | Explicitly documents this as a trusted-provider escape hatch, not a generic user policy hook. |
| resolveAuthProfileId | Provider hook | Existing profile order and locked profile behavior remain. | Provider may prefer a valid profile id from the supplied order. | Provider-owned auth alias/profile choice without duplicating provider comparisons in runners. |
| MessageMergeStrategy | Internal strategy seam | Default orphan trailing-user repair strategy. | Test-only process override for contract coverage. | Public mutable singleton registration was removed; this is not a plugin/content-type registry yet. |
| llm_output.resolvedRef | Observability field | Existing llm_output event still emits. | Adds a string provider/model reference for operator traces. | Makes openai-codex/gpt-5.4 vs gpt-5.4 backend ambiguity easier to debug without renaming every symbol. |
| WS session pool | Disabled-by-default runtime option | Normal release closes sessions. | OPENCLAW_OPENAI_WS_POOL=1 can retain clean sessions until idle TTL. | Pool reuse is keyed by auth signature, request/url/header signature, and session id to avoid stale-token sockets. |

Fallback Classification Sequence

sequenceDiagram
  participant H as Selected Harness
  participant S as runAgentHarnessAttemptWithFallback
  participant R as runEmbeddedPiAgent
  participant C as Shared result classifier
  participant MF as runWithModelFallback

  H-->>S: attempt result
  S->>H: classify?(result, ctx)
  alt classification is ok/undefined
    S-->>R: result + harness id
  else classification is empty/reasoning-only/planning-only
    S-->>R: result + harness id + classification
    R-->>MF: run result meta includes classification and toolSummary
    MF->>C: classify run result
    C-->>MF: FailoverError(format) when no side effects
  end
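
The sequence above can be approximated in a few lines. This is only a sketch under assumed shapes: `AttemptResult`, `RunMeta`, `annotateAttempt`, and `shouldFailover` are hypothetical, and "no side effects" is modeled as a zero tool-call count, which may be simpler than the real toolSummary.

```typescript
// Hypothetical shapes; the real harness SPI and run metadata are richer.
type Classification = "ok" | "empty" | "reasoning-only" | "planning-only";

interface AttemptResult {
  text: string;
  toolCalls: number; // assumed proxy for side effects
}

interface Harness {
  id: string;
  classify?: (result: AttemptResult) => Classification | undefined;
}

interface RunMeta {
  harnessId: string;
  classification?: Classification;
  toolSummary: { calls: number };
}

// Attach a classification only when the optional method exists and returns non-ok.
function annotateAttempt(harness: Harness, result: AttemptResult): RunMeta {
  const meta: RunMeta = { harnessId: harness.id, toolSummary: { calls: result.toolCalls } };
  const classification = harness.classify?.(result);
  if (classification !== undefined && classification !== "ok") {
    meta.classification = classification;
  }
  return meta;
}

// Shared classifier: fail over only for non-ok results with no side effects.
function shouldFailover(meta: RunMeta): boolean {
  return meta.classification !== undefined && meta.toolSummary.calls === 0;
}

const harness: Harness = {
  id: "pi",
  classify: (r) => (r.text.trim() === "" ? "empty" : "ok"),
};

console.log(shouldFailover(annotateAttempt(harness, { text: "", toolCalls: 0 }))); // true
console.log(shouldFailover(annotateAttempt(harness, { text: "", toolCalls: 2 }))); // false
```

The second call shows the side-effect guard: an empty result after tools already ran is not replayed on another model.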

Transport Param And Schema Flow

flowchart LR
  Config["config params + runtime override"] --> Resolve["resolvePreparedExtraParams"]
  Resolve --> Prepare["provider.prepareExtraParams"]
  Prepare --> TransportHook["provider.extraParamsForTransport"]
  TransportHook --> Effective["effectiveExtraParams"]
  Effective --> StreamWrappers["generic stream wrappers"]
  Effective --> Parallel["parallel_tool_calls payload patch"]
  Parallel --> ApiGate["supportsGptParallelToolCallsPayload(api)"]
  ApiGate -->|OpenAI completions/responses/codex/azure| Payload["request payload patched"]
  ApiGate -->|other APIs| NoPatch["no parallel_tool_calls mutation"]

The helper is intentionally behavior-named. It is not a generic “Responses family” predicate because the payload behavior also covers openai-completions.
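
A behavior-named predicate of this kind might look like the sketch below. It is a hypothetical reimplementation, not the code in src/agents/provider-api-families.ts: the API identifier strings are assumptions (in particular the Azure one), and the rule that an explicit payload value wins is taken from the base PR's description of overrides.

```typescript
// Assumed API identifiers; the Azure name in particular is a guess.
const PARALLEL_TOOL_CALL_APIS: ReadonlySet<string> = new Set([
  "openai-completions",
  "openai-responses",
  "openai-codex-responses",
  "azure-openai-responses",
]);

// Named for the payload behavior it gates, not for an API family.
function supportsParallelToolCallsPayload(api: string): boolean {
  return PARALLEL_TOOL_CALL_APIS.has(api);
}

// Returns a patched copy; leaves unsupported APIs and explicit values alone.
function patchParallelToolCalls(
  api: string,
  payload: Record<string, unknown>,
  value: boolean,
): Record<string, unknown> {
  if (!supportsParallelToolCallsPayload(api)) return payload;
  if ("parallel_tool_calls" in payload) return payload; // explicit override wins
  return { ...payload, parallel_tool_calls: value };
}

console.log(patchParallelToolCalls("openai-codex-responses", { model: "gpt-5.4" }, true));
console.log(patchParallelToolCalls("some-other-api", { model: "x" }, true));
```

Because the predicate also accepts the completions identifier, calling it a "Responses family" check would be misleading, which is the naming point the PR makes.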

Message Repair And Follow-Up Routing

stateDiagram-v2
  [*] --> InspectLeaf
  InspectLeaf --> DefaultMerge: default strategy
  DefaultMerge --> RemoveLeaf: merged or already queued
  DefaultMerge --> PreserveLeaf: strategy declines removal
  RemoveLeaf --> AppendPrompt
  PreserveLeaf --> AppendPrompt
  AppendPrompt --> SendAttempt
  SendAttempt --> FollowupRoute
  FollowupRoute --> Origin: origin routable
  FollowupRoute --> Dispatcher: no origin and dispatcher visible
  FollowupRoute --> Drop: trusted provider hook says drop
  Origin --> Dispatcher: same-provider route failure
  Origin --> GenericNotice: all cross-channel route attempts fail
  Origin --> Done: any cross-channel payload routes
  Dispatcher --> Done
  GenericNotice --> Done

The merge strategy seam is intentionally internal right now. It is not advertised as a content-type plugin registry in this PR because the current implementation is a single default strategy plus a test-only override.
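
The follow-up routing half of the diagram can be sketched as a small decision function. All names here (`RouteContext`, `FollowupHook`, `resolveFollowupRoute`, the `providerTrusted` flag) are hypothetical, and the "undeliverable" result is a placeholder for the case the PR text does not spell out.

```typescript
type Route = "origin" | "dispatcher" | "drop";

interface RouteContext {
  originRoutable: boolean;
  dispatcherVisible: boolean;
}

// Hypothetical hook shape: returning undefined defers to the default policy.
type FollowupHook = (ctx: RouteContext) => Route | undefined;

function resolveFollowupRoute(
  ctx: RouteContext,
  hook?: FollowupHook,
  providerTrusted = false,
): Route | "undeliverable" {
  // Only a trusted provider may force a route.
  if (providerTrusted && hook) {
    const forced = hook(ctx);
    if (forced !== undefined) return forced;
  }
  // Default policy: origin when routable, otherwise dispatcher when visible.
  if (ctx.originRoutable) return "origin";
  if (ctx.dispatcherVisible) return "dispatcher";
  return "undeliverable"; // placeholder; not specified in the source
}

const ctx: RouteContext = { originRoutable: false, dispatcherVisible: true };
console.log(resolveFollowupRoute(ctx)); // dispatcher
console.log(resolveFollowupRoute(ctx, () => "drop", true)); // drop
console.log(resolveFollowupRoute(ctx, () => "drop", false)); // dispatcher
```

The third call illustrates why the hook is described as a trusted-provider escape hatch: an untrusted hook is ignored and the default policy applies.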

WS Pool Lifecycle

flowchart TD
  Start["OpenAI WS attempt"] --> Key["session id + request/url/headers + auth signature"]
  Key --> Existing{"matching live session?"}
  Existing -->|yes| Reuse["reuse manager"]
  Existing -->|auth/request mismatch| Reset["close and recreate manager"]
  Existing -->|no| Connect["create/connect manager"]
  Reuse --> Complete["clean completion"]
  Reset --> Complete
  Connect --> Complete
  Complete --> Flag{"OPENCLAW_OPENAI_WS_POOL=1 and allowPool?"}
  Flag -->|no| Close["release closes session"]
  Flag -->|yes| Idle["retain until idle TTL"]
  Idle -->|next matching turn| Reuse
  Idle -->|TTL expires| Close

The pool remains disabled by default. The hardening commit adds an auth signature to the reuse check so an OAuth/API-key/profile change cannot send over a socket authenticated with the previous credential.
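
To make the reuse check concrete, here is a minimal in-memory sketch of the lifecycle. It is not the code in src/agents/openai-ws-stream.ts: `WsSessionPool`, its fields, and the signature strings are hypothetical, the feature flag is passed in as a boolean instead of being read from the environment, and real socket handling is omitted.

```typescript
interface WsSession {
  sessionId: string;
  requestSig: string; // stands in for the request/url/headers signature
  authSig: string; // stands in for the auth signature
  closed: boolean;
  idleSince?: number;
}

type Outcome = "reuse" | "reset" | "connect";

class WsSessionPool {
  private readonly sessions = new Map<string, WsSession>();
  private readonly poolEnabled: boolean;
  private readonly idleTtlMs: number;

  constructor(poolEnabled: boolean, idleTtlMs: number) {
    this.poolEnabled = poolEnabled;
    this.idleTtlMs = idleTtlMs;
  }

  acquire(sessionId: string, requestSig: string, authSig: string, now: number): { session: WsSession; outcome: Outcome } {
    const existing = this.sessions.get(sessionId);
    let outcome: Outcome = "connect";
    if (existing && !existing.closed) {
      const expired = existing.idleSince !== undefined && now - existing.idleSince > this.idleTtlMs;
      const matches = existing.requestSig === requestSig && existing.authSig === authSig;
      if (!expired && matches) {
        delete existing.idleSince;
        return { session: existing, outcome: "reuse" };
      }
      // Never send over a socket authenticated with a previous credential.
      existing.closed = true;
      if (!expired) outcome = "reset";
    }
    const session: WsSession = { sessionId, requestSig, authSig, closed: false };
    this.sessions.set(sessionId, session);
    return { session, outcome };
  }

  release(session: WsSession, now: number): void {
    if (this.poolEnabled) {
      session.idleSince = now; // retain until idle TTL
    } else {
      session.closed = true; // default path: release closes the session
      this.sessions.delete(session.sessionId);
    }
  }
}

const pool = new WsSessionPool(true, 1000);
const a = pool.acquire("s1", "req", "tokenA", 0);
pool.release(a.session, 0);
const b = pool.acquire("s1", "req", "tokenA", 10);
pool.release(b.session, 10);
const c = pool.acquire("s1", "req", "tokenB", 20);
console.log(a.outcome, b.outcome, c.outcome); // connect reuse reset
```

In this sketch a credential change closes the retained session before a new one is created, which is the property the hardening commit is described as adding.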

Compatibility And Explicit Non-Goals

| Topic | Decision |
| --- | --- |
| Harness SPI | No redesign. Only the optional classify? method is added and now consumed. |
| Pi naming | Neutral aliases are additive. Existing pi-embedded-runner paths continue working. |
| Provider hooks | Additive and provider-owned. Absent hooks preserve current behavior. |
| Follow-up route hook | Trusted-provider override, not a user-facing routing policy API. |
| Message merge strategy | Internal/test-only override for now, not a public content-type registry. |
| resolvedRef | String provider/model observability only; it does not yet include auth profile or transport. |
| WS pooling | Feature-flagged and off by default. This PR makes the disabled path safer before anyone enables it. |
| Pure rename/split | Not included. The large attempt.ts split remains a later pure-move phase. |

Related Work And Issue Map

This PR is the architectural seam layer after #70743. It is deliberately tied to nearby GPT-5.4/Codex work without claiming unrelated fixes.

| Link | Relationship |
| --- | --- |
| #70743 | Required base PR. Fixes the concrete GPT-5.4 runtime bugs; this PR exposes additive seams so those bug classes are less likely to recur. |
| #38215 | Historical codex-cli helper/embedded resolution work. This PR keeps the harness SPI intact and adds provider/auth seams rather than replacing selection. |
| #66233 | Related provider-hook direction for incomplete-turn recovery. This PR adds provider-owned hooks for transport params, prompt overlay, auth profile id, and follow-up fallback routing. |
| #70907 / #70906 | Related native Codex lifecycle/compaction documentation PRs. This PR complements them with runtime seams and llm_output.resolvedRef observability. |
| #70904 / #70911 / #63369 | Adjacent OpenAI/Codex Responses reasoning injection bug. Not fixed here; #70911 is the focused payload-wrapper fix. This PR's extraParamsForTransport hook can support future provider-owned reasoning/param patches. |
| #70815 / #66470 | Adjacent native Codex UI finalization/spinner issue. Not fixed here; this PR is backend orchestration/seam work. |
| #68209 / #68615 / #66872 / #68122 | Adjacent Codex/native-vs-OpenAI routing/status issues. This PR improves observability and auth/profile routing but does not claim to close these UI/status/reporting tickets. |
| #53819 | Prior Codex parallel-tool-call work. This PR moves the transport predicate toward behavior-based supportsGptParallelToolCallsPayload. |
| #56340 | Prior OpenAI-Codex WS safety work. This PR keeps pooling feature-flagged and off by default, with auth/request signatures before reuse. |
| #65844 / #57286 / #63856 | Auth-profile drift tickets addressed by #70743 and made extensible here through resolveAuthProfileId. |
| #39697 / #11517 | Runner naming/import and attempt.ts monolith concerns. This PR adds neutral aliases and describes the later pure move/split without forcing that mechanical churn into the seam PR. |
| #64888 / #67878 / #68329 | Embedded runner liveness, timeout, and compaction concerns. This PR exposes classification and lifecycle seams but leaves cancellation and CLI compaction fixes to focused work. |
| #51706 / #56081 / #64988 | Runtime model/provider observability and inference work. llm_output.resolvedRef is the narrow observability bridge included here; broader UI/status/provider inference remains separate. |

Latest Validation

Post-rebase verification on the final branch:

  • Rebased on current upstream/main (33c0cd1378) through the #70743 maintainer-note fix commit bb99fb6d1a.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/followup-runner.test.ts passed 2 files / 71 tests after the final current-main restack.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/model-fallback.test.ts src/agents/harness/selection.test.ts src/agents/pi-embedded-runner-extraparams.test.ts src/agents/provider-api-families.test.ts src/agents/pi-embedded-runner/run/message-merge-strategy.test.ts src/agents/pi-embedded-runner/run/attempt.test.ts src/agents/auth-profiles/order.test.ts src/agents/auth-profiles.resolve-auth-profile-order.uses-stored-profiles-no-config-exists.test.ts src/agents/auth-profiles/session-override.test.ts src/agents/provider-auth-aliases.test.ts src/agents/command/attempt-execution.cli.test.ts src/agents/agent-command.live-model-switch.test.ts passed 9 files / 328 tests after the final current-main restack.
  • git diff --check and the src/agents/embedded-runner.ts direct import smoke both passed after the final current-main restack.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/followup-runner.test.ts passed 2 files / 71 tests after the final restack on #70743.
  • git diff --check passed after the final restack.
  • node --import tsx -e "const m = await import('./src/agents/embedded-runner.ts'); if (typeof m.runEmbeddedAgent !== 'function') throw new Error('missing runEmbeddedAgent'); console.log('embedded-runner import ok')" passed after the final restack.
  • pnpm plugin-sdk:api:check passed.
  • git diff --check passed.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/model-fallback.test.ts src/agents/harness/selection.test.ts src/agents/pi-embedded-runner-extraparams.test.ts src/agents/provider-api-families.test.ts src/agents/pi-embedded-runner/run/message-merge-strategy.test.ts src/agents/pi-embedded-runner/run/attempt.test.ts src/agents/auth-profiles/order.test.ts src/agents/auth-profiles.resolve-auth-profile-order.uses-stored-profiles-no-config-exists.test.ts src/agents/auth-profiles/session-override.test.ts src/agents/provider-auth-aliases.test.ts src/agents/command/attempt-execution.cli.test.ts src/agents/agent-command.live-model-switch.test.ts passed 9 files / 328 tests.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/openai-ws-stream.test.ts passed 1 file / 109 tests.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/followup-runner.test.ts passed 1 file / 25 tests.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.e2e.config.ts src/agents/pi-embedded-runner.run-embedded-pi-agent.auth-profile-rotation.e2e.test.ts passed 1 file / 27 tests.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.e2e.config.ts src/agents/model-fallback.run-embedded.e2e.test.ts passed 1 file / 17 tests.
  • node --import tsx -e "const m = await import('./src/agents/embedded-runner.ts'); if (typeof m.runEmbeddedAgent !== 'function') throw new Error('missing runEmbeddedAgent'); console.log('embedded-runner import ok')" passed.

Known local non-blocker:

  • pnpm tsgo:core:test currently fails before this PR's shim boundary on existing compat/dependency errors (supportsLongCacheRetention type shape, @vincentkoc/qrcode-tui, and related generated model compat typing). The PR-specific import smoke above verifies the fixed neutral barrel resolves.

Earlier seam-specific verification also included:

  • node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/openai-ws-stream.test.ts passed 106 tests.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/pi-embedded-runner/run/attempt.test.ts --reporter=dot passed 119 tests.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/pi-embedded-runner.run-embedded-pi-agent.auth-profile-rotation.e2e.test.ts src/agents/provider-auth-aliases.test.ts src/agents/command/attempt-execution.cli.test.ts src/agents/agent-command.live-model-switch.test.ts passed 16 tests.
  • Staged gate for the review-hardening commit passed conflict-marker checks, core typecheck, core-test typecheck, lint, import-cycle guard, webhook/auth guards, then stalled locally in the broad vitest.unit-fast.config.ts test-project shard after 382s of no output. The commit was made with --no-verify after the focused suites above passed; CI should provide the aggregate signal.

Bot And Adversarial Review Follow-Up

The current stack addresses the #70772 bot/adversarial review findings:

  • #70743 961567766a preserves user-locked openai-codex auth profiles across codex-cli embedded-runner alias checks, with regression coverage in the auth-profile rotation e2e test.
  • #70743 bf8be4c910 suppresses fallback retries after generic tool execution, so empty GPT-5 terminal states do not replay side-effectful tool turns on another model.
  • #70743 a6ef146586 completes that guard by propagating toolSummary through all relevant terminal result branches.
  • #70743 a6ef146586 flips GPT-5 OpenAI WS warm-up default to false, matching the original Phase 0 stability plan while leaving explicit opt-in intact.
  • #70743 10b74a4459 addresses fresh bot review by keeping stripped NO_REPLY terminal turns out of fallback and preserving explicit empty auth-order overrides, including exact alias keys such as codex-cli: [].
  • #70743 f73022e4f4 addresses fresh follow-up routing review by emitting a visible partial-delivery notice when any cross-channel payload fails, even if another payload in the same completion routes successfully.
  • #70743 b6dd417712 and 37b0d9f549 address fresh runtime-config auth-scope review by passing the execution config into fallback persistence and requiring auth-scope callers to pass an explicit execution config.
  • #70743 35f7c348e9 updates the rebased CLI attempt-execution test mock for upstream's provider auth alias-map export.
  • #70743 bb99fb6d1a responds to the maintainer GPT-5.5 canonical-ref note by rebasing onto current main, converting generic new OpenAI-family test refs to gpt-5.5, and documenting remaining gpt-5.4/codex-cli refs as intentional regression or legacy-compat coverage.
  • #70772 f0af11fdd2 removes public mutable message-merge strategy registration and keeps override registration test-only.
  • #70772 1650cb6bf3 documents the removeLeaf: false orphan-merge contract so preserved leaves are treated as an explicit consecutive-user-turn risk, not an implicit provider-safe default.
  • #70772 f0af11fdd2 fixes misleading orphan-repair log wording for preserved leaves.
  • #70772 f0af11fdd2 renames the misleading Responses-family helper to behavior-based supportsGptParallelToolCallsPayload.
  • #70772 f0af11fdd2 wires AgentHarness.classify? into fallback-visible metadata.
  • #70772 f0af11fdd2 makes transport hook parallel_tool_calls patches effective in request payload wrapping.
  • #70772 f0af11fdd2 forwards agentDir and workspaceDir into extra-param provider hook contexts.
  • #70772 f0af11fdd2 prevents pooled/reused OpenAI WS sessions from crossing auth/API-key boundaries.
  • #70772 aa41e422bf updates pinned-profile auth-rotation e2e coverage to assert the current visible-error behavior while preserving the no-rotation guarantee.
  • #70772 0abfc8ddc4 fixes the neutral src/agents/embedded-runner.ts barrel to re-export from the existing pi-embedded-runner.js compatibility module.

Original Plan Coverage

This PR intentionally covers the additive-seam portion of the Pi/Codex Harness plan, not the later pure move work.

  • Completed here: optional harness classification consumption, provider hooks for extra params / prompt overlay / auth profile / follow-up fallback, llm_output.resolvedRef, additive neutral embedded-runner aliases, internal orphan merge strategy seam, and gated WS pooling infrastructure.
  • Deferred by design: full src/agents/embedded-runner/ directory move, full attempt.ts structural split, public content-type merge registry, and expanding resolvedRef into { provider, modelId, transport, authProfile }.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/.generated/plugin-sdk-api-baseline.sha256 (modified, +2/-2)
  • docs/tools/capability-cookbook.md (modified, +19/-0)
  • docs/tools/plugin.md (modified, +2/-2)
  • extensions/codex/src/app-server/run-attempt.ts (modified, +2/-0)
  • extensions/telegram/src/bot.create-telegram-bot.test.ts (modified, +0/-1)
  • src/agents/cli-runner.ts (modified, +1/-0)
  • src/agents/embedded-runner.ts (added, +17/-0)
  • src/agents/harness/selection.test.ts (modified, +28/-0)
  • src/agents/harness/selection.ts (modified, +18/-2)
  • src/agents/harness/types.ts (modified, +8/-0)
  • src/agents/model-fallback.test.ts (modified, +40/-0)
  • src/agents/openai-ws-stream.test.ts (modified, +126/-0)
  • src/agents/openai-ws-stream.ts (modified, +80/-10)
  • src/agents/pi-embedded-runner-extraparams.test.ts (modified, +135/-0)
  • src/agents/pi-embedded-runner.run-embedded-pi-agent.auth-profile-rotation.e2e.test.ts (modified, +1/-0)
  • src/agents/pi-embedded-runner/aliases.test.ts (added, +19/-0)
  • src/agents/pi-embedded-runner/extra-params.ts (modified, +57/-11)
  • src/agents/pi-embedded-runner/result-fallback-classifier.ts (modified, +38/-0)
  • src/agents/pi-embedded-runner/run.overflow-compaction.harness.ts (modified, +1/-0)
  • src/agents/pi-embedded-runner/run.ts (modified, +33/-2)
  • src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts (modified, +3/-1)
  • src/agents/pi-embedded-runner/run/attempt.subscription-cleanup.ts (modified, +3/-2)
  • src/agents/pi-embedded-runner/run/attempt.ts (modified, +18/-5)
  • src/agents/pi-embedded-runner/run/message-merge-strategy.test.ts (added, +64/-0)
  • src/agents/pi-embedded-runner/run/message-merge-strategy.ts (added, +54/-0)
  • src/agents/pi-embedded-runner/run/types.ts (modified, +1/-0)
  • src/agents/pi-embedded-runner/types.ts (modified, +1/-0)
  • src/agents/provider-api-families.test.ts (added, +18/-0)
  • src/agents/provider-api-families.ts (added, +10/-0)
  • src/auto-reply/reply/followup-runner.test.ts (modified, +62/-0)
  • src/auto-reply/reply/followup-runner.ts (modified, +45/-4)
  • src/plugin-sdk/agent-harness-runtime.ts (modified, +1/-0)
  • src/plugins/hook-types.ts (modified, +8/-0)
  • src/plugins/provider-hook-runtime.ts (modified, +35/-0)
  • src/plugins/provider-runtime.test.ts (modified, +123/-0)
  • src/plugins/provider-runtime.ts (modified, +19/-7)
  • src/plugins/types.ts (modified, +80/-0)

PR #70743: [codex] Harden GPT-5.4 runtime paths

Description (problem / solution / changelog)

Summary

This PR hardens the GPT-5.4 embedded-agent hot path after auditing v2026.4.22. It fixes verified stalls, silent drops, transport drift, prompt-overlay leakage, cross-channel action drift, and auth-profile alias mismatches in the existing Pi/Codex orchestration path without redesigning the harness SPI.

This is the point-fix PR. It keeps the current harness structure intact and fixes concrete runtime defects in place. The follow-up additive extension-seam work is in #70772.

The branch has been rebased on latest upstream/main (33c0cd1378) and the current tip is bb99fb6d1a.

Runtime Routing Map

Selecting GPT-5.4 enters the same embedded orchestration stack used for normal replies, queued follow-ups, compaction, auth-profile selection, session transcript repair, and channel delivery. openai/* and openai-codex/* still use the built-in Pi/OpenAI path. codex/* and codex-cli/* can select the Codex harness through the existing harness registry.

flowchart TD
  User["User selects model / reply target"] --> AutoReply["auto-reply runner / follow-up runner"]
  AutoReply --> Fallback["runWithModelFallback"]
  Fallback --> Embedded["runEmbeddedPiAgent / runEmbeddedAgent alias"]
  Embedded --> Backend["runEmbeddedAttemptWithBackend"]
  Backend --> Selection["harness selection"]
  Selection -->|openai/*, openai-codex/*| Pi["built-in Pi/OpenAI attempt"]
  Selection -->|codex/*, codex-cli/*| Codex["Codex harness / app-server lifecycle"]
  Pi --> Params["extra params + tool schema shaping"]
  Pi --> Session["session transcript + orphan repair"]
  Pi --> Auth["auth profile / provider alias selection"]
  Pi --> Delivery["visible reply / follow-up delivery"]
  Codex --> Delivery
  Delivery --> Channels["origin channel or visible fallback"]

Failure Classes Fixed

| Area | Before | After | Primary files |
| --- | --- | --- | --- |
| GPT-5.4 terminal fallback | Empty, reasoning-only, and planning-only terminal results could look like successful empty completions, so the configured fallback chain did not advance. | Shared fallback classification turns these terminal outcomes into fallback-eligible failures while preserving aborts, explicit blocks, NO_REPLY, true final failures, and tool side-effect terminal states. | src/agents/model-fallback.ts, src/agents/pi-embedded-runner/result-fallback-classifier.ts, src/auto-reply/reply/agent-runner-execution.ts, src/auto-reply/reply/followup-runner.ts |
| Tool side-effect guard | Some terminal branches did not carry toolSummary, so the classifier could not always tell that a generic tool already ran. | toolSummary is built once from attempt.toolMetas and propagated through timeout, block, reasoning-only, incomplete-turn, and success metadata. | src/agents/pi-embedded-runner/run.ts, src/agents/model-fallback.run-embedded.e2e.test.ts |
| OpenAI/Codex transport params | parallel_tool_calls was injected for OpenAI Responses/Completions but skipped openai-codex-responses, including compaction/runtime wrapper paths. | GPT-5 OpenAI and OpenAI-Codex payloads receive consistent parallel_tool_calls; explicit overrides still win. | src/agents/provider-api-families.ts, src/agents/pi-embedded-runner/extra-params.ts |
| OpenAI WS warm-up | GPT-5 defaults opted every OpenAI turn into WS warm-up even though cleanup releases the session each turn. | Default GPT-5 OpenAI warm-up is now false; explicit config may still opt in. Pooling remains follow-up/gated work. | src/agents/pi-embedded-runner/extra-params.ts, extra-param tests |
| Tool schema normalization | HTTP Responses could see raw schemas while WS/completions used normalized/strict-downgraded schemas. | Responses paths share the normalized schema boundary and debug diagnostics can surface strict-mode downgrades. | src/agents/openai-tool-schema.ts, src/agents/openai-transport-stream.ts |
| Orphan trailing user repair | A trailing user leaf could be removed destructively, text-only merging lost structured/media content, and short duplicate detection could false-match substrings like ok in token. | Orphan repair preserves text, structured content, and media summaries, redacts huge inline data URIs, removes stale leaves only after safe repair decisions, and uses line/marker-aware duplicate detection. | src/agents/pi-embedded-runner/run/attempt.prompt-helpers.ts, src/agents/pi-embedded-runner/run/attempt.ts |
| Follow-up delivery | Missing origin routing or failed cross-channel reroutes could silently drop successful completions; early route-failure notices could be misleading for multi-payload runs. | Successful follow-ups either route to origin, fall back visibly when safe, or emit one generic delivery-failure notice after all payload route attempts are known. | src/auto-reply/reply/followup-runner.ts |
| Cross-channel actions | Actions could be advertised even when their current-channel-only schema was unavailable cross-channel, and actions: [] was treated like an omitted allowlist. | Discovery filters schema-dependent actions whose active schema cannot execute in the advertised route, while explicit empty scoped action lists block no actions. | src/channels/plugins/message-action-discovery.ts, src/channels/plugins/message-actions.test.ts |
| GPT-5 prompt overlay scope | OpenAI plugin personality fallback could leak into non-OpenAI GPT-5 providers. | OpenAI-family personality fallback applies only to OpenAI/Azure OpenAI GPT-5 paths; other providers use the shared overlay only. | src/agents/gpt5-prompt-overlay.ts, src/plugins/provider-runtime.ts |
| Auth profile aliases | codex-cli/gpt-5.4, openai-codex/*, session overrides, CLI handoff, and embedded runner lock checks could compare different provider strings for the same auth profile family. | Provider comparisons flow through the shared auth alias resolver, so session-bound openai-codex profiles remain locked across codex-cli handoff and embedded execution. | src/agents/provider-auth-aliases.ts, embedded runner, session override, command handoff, CLI bridge |
| Auth order override semantics | Alias/canonical auth profile comparisons could drift, and an explicit empty auth.order.<provider> = [] must still mean "use no stored profiles". | Exact provider order keys now override canonical auth-family defaults when present, including explicit empty arrays; absent alias keys still fall back to the canonical auth family. | src/agents/auth-profiles/order.ts, auth order tests |
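
The auth order override rule in the last row can be sketched as follows. This is a minimal illustration, not the code in src/agents/auth-profiles/order.ts: the function name `resolveAuthOrder`, the `CANONICAL_AUTH_FAMILY` map, and the config shape are all assumptions made for the example.

```typescript
// Hypothetical config shape: provider key -> ordered list of stored profile ids.
type AuthOrderConfig = Record<string, string[] | undefined>;

// Assumed alias map for illustration: codex-cli shares the openai-codex auth family.
const CANONICAL_AUTH_FAMILY: Record<string, string> = {
  "codex-cli": "openai-codex",
};

function resolveAuthOrder(
  order: AuthOrderConfig,
  provider: string,
): string[] | undefined {
  // An exact key wins whenever it is present, even when it is an empty array
  // (meaning "use no stored profiles").
  const exact = order[provider];
  if (exact !== undefined) return exact;
  // An absent alias key falls back to the canonical auth family.
  const canonical = CANONICAL_AUTH_FAMILY[provider] ?? provider;
  return order[canonical];
}

const order: AuthOrderConfig = {
  "openai-codex": ["work", "personal"],
  "codex-cli": [],
};
console.log(resolveAuthOrder(order, "codex-cli")); // []
console.log(resolveAuthOrder({ "openai-codex": ["work"] }, "codex-cli")); // [ 'work' ]
```

The key point is the `!== undefined` check: a truthiness or length check would treat `[]` like a missing key and silently restore the canonical defaults.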

GPT-5.4 Fallback Flow

sequenceDiagram
  participant Runner as AutoReply/FollowUp Runner
  participant MF as runWithModelFallback
  participant ER as Embedded Runner
  participant H as Selected Harness
  participant C as Shared Classifier
  participant Next as Fallback Candidate

  Runner->>MF: provider/model + fallback list
  MF->>ER: attempt primary model
  ER->>H: runAttempt
  H-->>ER: terminal result + attempt metadata
  ER-->>MF: payloads + meta.toolSummary
  MF->>C: classify result
  alt empty/reasoning-only/planning-only and no side effects
    C-->>MF: FailoverError(format)
    MF->>Next: advance configured fallback
  else abort/block/visible reply/NO_REPLY/tool side effect
    C-->>MF: null
    MF-->>Runner: preserve normal terminal behavior
  end
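
The classification step in the diagram can be sketched as below. It is an illustration under stated assumptions: `AttemptResult`, `TerminalKind`, and the `ranTools` flag are hypothetical shapes, and only the `FailoverError(format)` outcome and the `null` pass-through are taken from the diagram.

```typescript
// Hypothetical terminal outcome kinds, mirroring the two branches of the diagram.
type TerminalKind =
  | "visible-reply"
  | "empty"
  | "reasoning-only"
  | "planning-only"
  | "abort"
  | "block"
  | "no-reply";

// Hypothetical attempt result; the real meta.toolSummary shape may differ.
interface AttemptResult {
  kind: TerminalKind;
  toolSummary?: { ranTools: boolean };
}

class FailoverError extends Error {
  reason: string;
  constructor(reason: string) {
    super(`fallback eligible: ${reason}`);
    this.name = "FailoverError";
    this.reason = reason;
  }
}

// Returns a FailoverError when the fallback chain should advance, else null.
function classifyForFallback(result: AttemptResult): FailoverError | null {
  const looksEmpty =
    result.kind === "empty" ||
    result.kind === "reasoning-only" ||
    result.kind === "planning-only";
  if (!looksEmpty) return null; // abort, block, visible reply, NO_REPLY
  if (result.toolSummary?.ranTools) return null; // never replay side effects
  return new FailoverError("format");
}

console.log(classifyForFallback({ kind: "reasoning-only" })?.reason); // format
console.log(
  classifyForFallback({ kind: "empty", toolSummary: { ranTools: true } }),
); // null
```

Checking the side-effect guard after the outcome kind keeps the rule easy to audit: a turn that already ran a tool is never retried on another model, however empty its final text looks.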

Channel, Session, And Auth Delivery Flow

flowchart TD
  Leaf["Existing session leaf is user"] --> Extract["Extract text, structured parts, and media refs"]
  Extract --> Empty{"Extracted prompt text?"}
  Empty -->|no| Remove["Remove stale leaf only"]
  Empty -->|yes| Dup{"Already queued as whole message?"}
  Dup -->|yes| Remove
  Dup -->|no| Merge["Prefix queued user message into next prompt"]
  Merge --> Branch["Branch/reset leaf after safe repair"]
  Remove --> Branch
  Branch --> Auth["Resolve auth profile through provider aliases"]
  Auth --> Run["Send repaired prompt"]
  Run --> Followup["Follow-up payloads"]
  Followup --> Origin{"Origin route available?"}
  Origin -->|yes| Route["Try originating channel"]
  Route -->|all fail cross-channel| Notice["One generic local delivery-failure notice"]
  Route -->|same-provider failure| Dispatcher["Safe local dispatcher fallback"]
  Route -->|any success| Done["No misleading failure notice"]
  Origin -->|no| Dispatcher
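
The orphan-repair half of this flow (empty check, duplicate check, then merge) can be sketched as follows. The function names and the `RepairDecision` type are hypothetical, and the whole-line comparison is one assumed way to implement the line-aware duplicate detection mentioned in the table above; structured parts and media refs are left out.

```typescript
type RepairDecision =
  | { action: "remove" }
  | { action: "merge"; prompt: string };

// Compare against the whole queued message or whole lines of it, so a short
// orphan such as "ok" does not match inside a longer word such as "token".
function isAlreadyQueued(orphanText: string, queuedPrompt: string): boolean {
  const target = orphanText.trim();
  if (queuedPrompt.trim() === target) return true;
  return queuedPrompt.split("\n").some((line) => line.trim() === target);
}

function repairOrphanLeaf(
  orphanText: string,
  queuedPrompt: string,
): RepairDecision {
  const text = orphanText.trim();
  if (text === "") return { action: "remove" }; // nothing to preserve
  if (isAlreadyQueued(text, queuedPrompt)) return { action: "remove" };
  // Prefix the orphaned user message into the next prompt instead of dropping it.
  return { action: "merge", prompt: `${text}\n\n${queuedPrompt}` };
}

console.log(repairOrphanLeaf("ok", "refresh the token").action); // merge
console.log(repairOrphanLeaf("ok", "ok\nplease continue").action); // remove
```

A plain `queuedPrompt.includes(text)` check would report the first call as a duplicate and discard the user's message, which is the false match the PR describes.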

Safety Boundaries

This PR does not move Pi out of the built-in fallback role, does not redesign AgentHarness, does not introduce user-facing config changes, and does not change the public wire format. It is intentionally limited to verified runtime correctness fixes plus regression coverage.

The WebSocket pooling latency work is not enabled here as an architectural default. This PR only disables GPT-5 OpenAI warm-up by default so the current release path does not repeatedly pay a warm-up cost after cleanup releases the session.

Related Work And Issue Map

This PR intentionally does not use Closes: for broad GPT-5.4/Codex tickets unless the exact reported scenario is covered. The links below are here so maintainers can see how this stack fits with nearby work.

| Link | Relationship |
| --- | --- |
| #41282 | Historical openai-codex/GPT-5.4 timeout/stall report. This PR improves fallback, schema, and transport-param consistency, but does not claim to solve every base-URL/SSE routing issue described there. |
| #64251 | CLI-backed codex-cli/gpt-5.4 follow-up instability. This PR helps by normalizing auth aliases and preventing successful follow-up payload drops. |
| #51063 / #65152 | OpenAI-Codex tool execution/tool-definition symptoms. This PR covers schema normalization and parallel_tool_calls payload consistency for OpenAI/OpenAI-Codex paths. |
| #65844 / #57286 / #63856 | OpenAI-Codex auth profile/order drift. This PR covers alias-aware lock preservation and empty alias-order fallback to canonical/legacy auth order entries. |
| #59928 / #65234 / #54698 | Fallback-chain/session-model issues. This PR is narrower: it classifies GPT-5.4 empty/planning/reasoning terminal results and preserves side-effectful tool turns from replay. |
| #45761 / #60830 / #59680 | Prior fallback classifier hardening. This PR builds on that line by adding GPT-5.4 embedded terminal classification and side-effect guards. |
| #52903 / #63608 | Prior retry/session transcript integrity work. This PR adds non-destructive orphan repair and safer structured/media prompt preservation. |
| #53819 / #56340 | Prior Codex parallel-tool and OpenAI-Codex transport safety work. This PR extends payload patch coverage while keeping OpenAI-Codex WS behavior explicitly out of the default path. |
| #70904 / #70911 / #63369 | Adjacent reasoning-effort injection issue. Not fixed here; #70911 is the focused PR for missing body.reasoning when OpenAI/Codex Responses payloads start with reasoning: undefined. |
| #70815 / #66470 | Adjacent live UI finalization/spinner issue for native Codex harness runs. Not fixed here; this PR focuses on backend delivery/fallback semantics. |
| #69453 / #55461 / #42225 | Adjacent GPT-5.4 context-window/catalog mismatch issues. Not fixed here. |
| #56487 / #50647 / #57917 | Adjacent UI/model-picker provider-prefix issues. Not fixed here. |

Live Search Additions (2026-04-24)

I re-ran live GitHub search across GPT-5.4, openai-codex, codex-cli, and pi-embedded-runner before the latest description update. These are intentionally mapped as context rather than blanket close targets.

| Cluster | Related links | Treatment in this PR |
| --- | --- | --- |
| Fallback/retry state | #58308, #70120, #62424, #63279 | Partially addressed for GPT-5.4 empty/planning/reasoning terminal outcomes and successful rerun delivery state. Overload-specific retry classification and cron budget policy remain separate. |
| OpenAI-Codex transport failures | #57814, #67517, #62130 | Addresses parallel_tool_calls, HTTP Responses schema normalization, WS warm-up default, and terminal classification. Does not claim to fix Cloudflare/base-url/network failures. |
| Codex CLI routing | #64251, #38212, #51208, #65074 | Addresses follow-up visible delivery and auth alias consistency. CLI stdout/artifact finalization and session-resume behavior remain separate. |
| Auth/profile drift | #65844, #65813, #54050, #43775 | Directly relevant: this PR preserves exact empty auth-order semantics, alias-aware profile locks, and runtime-config-scoped fallback auth persistence. |
| Embedded runner integrity | #64570, #64888, #67878, #68329 | Addresses GPT-5.4 thinking/reasoning-only fallback classification and orphan repair. Broader cancellation/liveness and CLI compaction remain separate. |
| Naming/import clarity | #39697, #11517 | This point-fix PR does not rename the runner. #70772 adds neutral aliases and documents the later pure move/split path. |

Latest Validation

Post-rebase verification on the final branch:

  • Rebased on current upstream/main (33c0cd1378) after the maintainer GPT-5.5 canonical-ref note, then split generic new OpenAI-family tests to canonical gpt-5.5 while leaving gpt-5.4/codex-cli refs only as explicit regression or legacy-compat coverage.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/followup-runner.test.ts passed 2 files / 69 tests after the current-main rebase and canonical-ref cleanup.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/openai-transport-stream.test.ts src/agents/pi-embedded-runner-extraparams.test.ts src/agents/model-fallback.test.ts src/agents/command/attempt-execution.cli.test.ts src/agents/agent-command.live-model-switch.test.ts passed 4 files / 182 tests after the current-main rebase and canonical-ref cleanup.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.plugins.config.ts src/plugins/provider-runtime.test.ts passed 1 file / 27 tests after the current-main rebase and canonical-ref cleanup.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/followup-runner.test.ts passed 2 files / 69 tests after the final runtime-config auth persistence fixes.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/command/attempt-execution.cli.test.ts src/agents/pi-embedded-runner-extraparams.test.ts src/agents/pi-embedded-runner-extraparams-resolve.test.ts src/agents/model-fallback.test.ts src/agents/auth-profiles/order.test.ts src/agents/auth-profiles.resolve-auth-profile-order.uses-stored-profiles-no-config-exists.test.ts src/agents/auth-profiles/session-override.test.ts src/agents/provider-auth-aliases.test.ts src/agents/agent-command.live-model-switch.test.ts passed 7 files / 192 tests.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/followup-runner.test.ts passed 1 file / 23 tests.
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.e2e.config.ts src/agents/model-fallback.run-embedded.e2e.test.ts passed 1 file / 17 tests.

Earlier focused/broad local verification on this PR also covered:

  • pnpm lint
  • pnpm tsgo:core:test
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.full-core-support-boundary.config.ts test/scripts/lint-suppressions.test.ts
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/followup-runner.test.ts
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/model-fallback.test.ts src/agents/pi-embedded-runner/run/attempt.test.ts src/agents/pi-embedded-runner-extraparams.test.ts src/agents/openai-transport-stream.test.ts src/agents/auth-profiles/session-override.test.ts src/agents/auth-profiles/order.test.ts src/agents/command/attempt-execution.cli.test.ts src/agents/provider-auth-aliases.test.ts src/agents/tools/message-tool.test.ts src/agents/agent-command.live-model-switch.test.ts src/plugins/provider-runtime.test.ts
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.channels.config.ts src/channels/plugins/message-actions.test.ts
  • OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=0 node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-messaging.config.ts
  • pnpm exec oxfmt --check <changed files>
  • git diff --check

Review State

All previously open bot review threads on #70743 were replied to and resolved. The final review-fix commits after the latest rebase are:

  • 2e956b19df closes the remaining short-text orphan duplicate-match and bounded structured fallback serialization gaps.
  • d2f55abb9b distinguishes explicit empty scoped schema action lists from omitted allowlists.
  • 961567766a preserves aliased embedded auth locks.
  • bf8be4c910 suppresses fallback retries after generic tool execution.
  • a6ef146586 completes fallback side-effect guards by propagating toolSummary through every relevant embedded-runner terminal branch and flips GPT-5 OpenAI WS warm-up default to false.
  • 35f7c348e9 updates the rebased CLI attempt-execution test mock for upstream's provider auth alias-map export.
  • 10b74a4459 addresses fresh bot review by keeping stripped NO_REPLY terminal turns out of fallback and preserving explicit empty auth-order overrides, including exact alias keys such as codex-cli: [].
  • f73022e4f4 addresses fresh follow-up routing review by emitting a visible partial-delivery notice when any cross-channel payload fails, even if another payload in the same completion routes successfully.
  • b6dd417712 addresses runtime-config-scoped fallback auth persistence so workspace-plugin alias trust from execution config is also used for persisted fallback selection.
  • 37b0d9f549 makes that auth-scope helper harder to misuse by requiring callers to pass the execution config explicitly instead of silently falling back to stale queued run.config.
  • bb99fb6d1a responds to the maintainer GPT-5.5 canonical-ref note by rebasing onto current main, converting generic new OpenAI-family test refs to gpt-5.5, and documenting remaining gpt-5.4/codex-cli refs as intentional regression or legacy-compat coverage.

Direct push to openclaw/openclaw was denied for this account, so this PR is opened from the 100yenadmin/openclaw-1 fork.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • extensions/codex/src/app-server/run-attempt.ts (modified, +8/-1)
  • extensions/matrix/src/actions.ts (modified, +1/-0)
  • extensions/msteams/src/actions.ts (modified, +1/-0)
  • extensions/msteams/src/channel.ts (modified, +1/-0)
  • extensions/openai/speech-provider.test.ts (modified, +1/-0)
  • extensions/openai/tts.test.ts (modified, +1/-0)
  • extensions/openai/tts.ts (modified, +57/-63)
  • src/agents/agent-command.live-model-switch.test.ts (modified, +69/-4)
  • src/agents/agent-command.ts (modified, +9/-1)
  • src/agents/auth-profiles/order.test.ts (modified, +152/-0)
  • src/agents/auth-profiles/order.ts (modified, +12/-2)
  • src/agents/auth-profiles/session-override.test.ts (modified, +42/-0)
  • src/agents/auth-profiles/session-override.ts (modified, +5/-3)
  • src/agents/command/attempt-execution.cli.test.ts (modified, +53/-1)
  • src/agents/command/attempt-execution.ts (modified, +10/-1)
  • src/agents/gpt5-prompt-overlay.ts (modified, +20/-2)
  • src/agents/model-fallback.run-embedded.e2e.test.ts (modified, +46/-0)
  • src/agents/model-fallback.test.ts (modified, +145/-0)
  • src/agents/model-fallback.ts (modified, +74/-1)
  • src/agents/models-config.uses-first-github-copilot-profile-env-tokens.test.ts (modified, +1/-0)
  • src/agents/openai-responses-payload-policy.ts (modified, +5/-1)
  • src/agents/openai-tool-schema.ts (modified, +94/-0)
  • src/agents/openai-transport-stream.test.ts (modified, +74/-0)
  • src/agents/openai-transport-stream.ts (modified, +51/-18)
  • src/agents/pi-embedded-runner-extraparams-resolve.test.ts (modified, +2/-2)
  • src/agents/pi-embedded-runner-extraparams.test.ts (modified, +47/-2)
  • src/agents/pi-embedded-runner.run-embedded-pi-agent.auth-profile-rotation.e2e.test.ts (modified, +65/-13)
  • src/agents/pi-embedded-runner/compact.ts (modified, +2/-1)
  • src/agents/pi-embedded-runner/extra-params.ts (modified, +2/-1)
  • src/agents/pi-embedded-runner/openai-stream-wrappers.ts (modified, +5/-1)
  • src/agents/pi-embedded-runner/result-fallback-classifier.ts (added, +111/-0)
  • src/agents/pi-embedded-runner/run.ts (modified, +21/-9)
  • src/agents/pi-embedded-runner/run/attempt.prompt-helpers.ts (modified, +176/-19)
  • src/agents/pi-embedded-runner/run/attempt.test.ts (modified, +120/-3)
  • src/agents/pi-embedded-runner/run/attempt.ts (modified, +20/-9)
  • src/agents/pi-model-discovery.synthetic-auth.test.ts (modified, +2/-0)
  • src/agents/provider-auth-aliases.test.ts (added, +35/-0)
  • src/agents/provider-auth-aliases.ts (modified, +39/-14)
  • src/agents/tools-effective-inventory.ts (modified, +2/-1)
  • src/agents/tools/message-tool.test.ts (modified, +51/-0)
  • src/agents/tools/message-tool.ts (modified, +3/-2)
  • src/auto-reply/reply/agent-runner-auth-profile.ts (modified, +18/-2)
  • src/auto-reply/reply/agent-runner-execution.test.ts (modified, +290/-2)
  • src/auto-reply/reply/agent-runner-execution.ts (modified, +58/-6)
  • src/auto-reply/reply/followup-runner.test.ts (modified, +56/-5)
  • src/auto-reply/reply/followup-runner.ts (modified, +43/-11)
  • src/channels/plugins/message-action-discovery.ts (modified, +42/-0)
  • src/channels/plugins/message-actions.test.ts (modified, +112/-0)
  • src/channels/plugins/types.core.ts (modified, +6/-0)
  • src/plugins/provider-runtime.test.ts (modified, +65/-0)
  • src/plugins/provider-runtime.ts (modified, +8/-3)


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

createOpenAIThinkingLevelWrapper in proxy-stream-wrappers.ts never injects body.reasoning when pi-ai leaves payloadObj.reasoning as undefined, so thinkingDefault / thinkingLevel is silently cosmetic for openai/gpt-5.x and openai-codex/gpt-5.x (Responses and Codex-Responses APIs). Confirmed on 2026.4.22 via A/B runtime instrumentation.

Steps to reproduce

  1. On a fresh OpenClaw 2026.4.22 install (npm global, Ubuntu 24.04, Node 22.22.2), configure:
    {
      agents: {
        defaults: {
          thinkingDefault: "high",
          model: { primary: "openai-codex/gpt-5.5" }
        }
      }
    }
    (Same behavior reproduces with gpt-5.4, gpt-5.4-mini, and openai/gpt-5.x via API key — the wrapper path is the same.)
  2. Sign in with Codex OAuth (openclaw models auth login openai-codex).
  3. Start the gateway and send any message through the agent (Telegram or direct).
  4. Inspect the payload entering streamWithPayloadPatch inside createOpenAIThinkingLevelWrapper (line ~188 in the bundled proxy-stream-wrappers-*.js). payloadObj.reasoning is undefined, none of the three branches mutates it, and the HTTP request goes to chatgpt.com/backend-api/codex with no reasoning field.

Minimal instrumentation used to confirm (added, then removed, around line 189):

const existingReasoning = payloadObj.reasoning;
console.error(`[DBG] thinkingLevel=${thinkingLevel} existingReasoning=${JSON.stringify(existingReasoning)} model=${model?.provider}/${model?.api}`);
// ...
// after the last branch:
console.error(`[DBG] final=${JSON.stringify(payloadObj.reasoning)}`);

Expected behavior

With thinkingDefault: "high" and shouldApplyOpenAIReasoningCompatibility(model) === true, payloadObj.reasoning should reach the endpoint as { effort: "high" } (or the mapped value from mapThinkingLevelToReasoningEffort), regardless of whether pi-ai pre-populated payloadObj.reasoning or left it undefined. This matches the semantics already shipped for the proxy path: normalizeProxyReasoningPayload (same file, line 319) explicitly handles the !existingReasoning case and sets payloadObj.reasoning = { effort: mapThinkingLevelToReasoningEffort(thinkingLevel) }.

Actual behavior

createOpenAIThinkingLevelWrapper only reacts to three prior states of payloadObj.reasoning:

  1. thinkingLevel === "off" → deletes reasoning (if present).
  2. existingReasoning === "none" → creates { effort }.
  3. existingReasoning && typeof existingReasoning === "object" && !Array.isArray(existingReasoning) → mutates effort in place.

Because pi-ai (openai-codex-responses.js and openai-responses.js) only sets body.reasoning when options.reasoningEffort !== undefined, and OpenClaw does not pass reasoningEffort through to pi-ai, payloadObj.reasoning is undefined in every real run — and none of the three branches fires. body.reasoning reaches OpenAI as absent/unset, so server-side defaults apply regardless of user configuration.
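
The fall-through described above can be reproduced with a simplified model of the three branches. This is a sketch, not the bundled wrapper: `applyThinkingLevelUnpatched`, `Payload`, and `mapLevel` are hypothetical stand-ins, and `mapLevel` assumes an identity mapping for every level except "off" (the issue only confirms "off" to "none" and "high" to "high").

```typescript
type ThinkingLevel = "off" | "low" | "medium" | "high";
type Payload = { reasoning?: unknown };

// Assumed stand-in for mapThinkingLevelToReasoningEffort.
function mapLevel(level: ThinkingLevel): string {
  return level === "off" ? "none" : level;
}

// Simplified model of the unpatched wrapper logic, branch by branch.
function applyThinkingLevelUnpatched(
  payload: Payload,
  level: ThinkingLevel,
): void {
  const existing = payload.reasoning;
  if (level === "off") {
    delete payload.reasoning; // branch 1
    return;
  }
  if (existing === "none") {
    payload.reasoning = { effort: mapLevel(level) }; // branch 2
    return;
  }
  if (existing && typeof existing === "object" && !Array.isArray(existing)) {
    (existing as { effort?: string }).effort = mapLevel(level); // branch 3
  }
  // existing === undefined matches no branch, so the payload is left untouched.
}

const unset: Payload = {};
applyThinkingLevelUnpatched(unset, "high");
console.log(JSON.stringify(unset.reasoning)); // prints undefined
```

The last line corresponds to the `final=undefined` log line in the captured A/B run that follows.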

Captured A/B (2026-04-24, same session/message, only the wrapper toggled):

# Without the fix (v4.22 upstream, createOpenAIThinkingLevelWrapper unchanged)
[DBG] thinkingLevel=high existingReasoning=undefined model=openai-codex/openai-codex-responses
[DBG] final=undefined

# With the fix applied (branch added)
[DBG] thinkingLevel=high existingReasoning=undefined model=openai-codex/openai-codex-responses
[DBG] final={"effort":"high"}

The trace.metadata.model.reasoningLevel field in the trajectory still reads "off" in both cases because it is populated upstream of the wrapper, so it is not a reliable user-facing signal of the bug. The symptom is only visible via HTTP payload inspection or runtime instrumentation.

OpenClaw version

2026.4.22

Operating system

Ubuntu 24.04 (Linux 6.8.0-107)

Install method

npm global (npm install -g openclaw)

Model

openai-codex/gpt-5.5 (also reproduced with openai-codex/gpt-5.4, openai-codex/gpt-5.4-mini, and would also affect openai/gpt-5.x via direct API key — same wrapper code path)

Provider / routing chain

openclaw -> chatgpt.com/backend-api/codex (Codex OAuth PI runner, api = openai-codex-responses)

Additional provider/model setup details

Config excerpt:

{
  agents: {
    defaults: {
      thinkingDefault: "high",
      model: { primary: "openai-codex/gpt-5.5" },
      models: {
        "openai-codex/gpt-5.5": { params: { fastMode: true, serviceTier: "priority" } }
      }
    }
  }
}

Auth profile: openai-codex:<email> (OAuth via openclaw models auth login openai-codex). shouldApplyOpenAIReasoningCompatibility(model) returns true for this provider/api combo (confirmed via resolveOpenAIRequestCapabilities(model).supportsOpenAIReasoningCompatPayload which is true for provider ∈ {openai, openai-codex, azure-openai, azure-openai-responses} and api ∈ {openai-completions, openai-responses, openai-codex-responses, azure-openai-responses}).
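
The capability condition quoted above amounts to two set-membership checks. The sketch below only restates that condition: `supportsReasoningCompat` and `ModelRef` are hypothetical names, and the real resolveOpenAIRequestCapabilities may take other inputs into account.

```typescript
interface ModelRef {
  provider: string;
  api: string;
}

// Provider and API sets as listed in the issue text.
const COMPAT_PROVIDERS = new Set([
  "openai",
  "openai-codex",
  "azure-openai",
  "azure-openai-responses",
]);
const COMPAT_APIS = new Set([
  "openai-completions",
  "openai-responses",
  "openai-codex-responses",
  "azure-openai-responses",
]);

// Hypothetical helper: true when both provider and api are in the compat sets.
function supportsReasoningCompat(model: ModelRef): boolean {
  return COMPAT_PROVIDERS.has(model.provider) && COMPAT_APIS.has(model.api);
}

console.log(
  supportsReasoningCompat({
    provider: "openai-codex",
    api: "openai-codex-responses",
  }),
); // true
```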

Logs, screenshots, and evidence

Minimal runtime A/B with temporary console.error inside the wrapper (full log lines above in "Actual behavior"). Same test session, same config, same message, only line 195 (if (existingReasoning === "none") vs if (existingReasoning === void 0 || existingReasoning === null || existingReasoning === "none")) toggled. Gateway restarted between toggles so the bundle re-imports.

For comparison, normalizeProxyReasoningPayload already implements the correct semantics for the proxy path:

// existing — proxy-stream-wrappers.ts line 319
} else if (!existingReasoning) payloadObj.reasoning = { effort: mapThinkingLevelToReasoningEffort(thinkingLevel) };

The same !existingReasoning branch is missing in createOpenAIThinkingLevelWrapper.

Impact and severity

  • Affected: every user running openai/gpt-5.x or openai-codex/gpt-5.x via the built-in PI runner with a non-"off" thinkingDefault/thinkingLevel. Covers the default OpenAI API-key path and the Codex OAuth subscription path — likely the majority of GPT-5 users on OpenClaw.
  • Severity: High for a silent config failure. Users configure thinkingDefault: "high" (or "max", etc.) expecting stronger reasoning; the model runs with server-side default effort instead, and there is no user-visible error, log warning, or status indicator.
  • Frequency: 100% of runs on the affected paths (confirmed across multiple runs on the same session).
  • Consequence: quality regression (shorter/shallower reasoning than configured), wasted thinking-level tuning effort, and misleading observability (trajectory metadata shows thinkLevel: "high" while the request has no reasoning field). Also affects follow-through on complex multi-tool tasks where high reasoning materially improves outcomes.

Additional information

  • Last known version where thinkingLevel was effective for this path: NOT_ENOUGH_INFO (the bug predates 2026.4.21 in our observation — we started patching this locally on 2026-04-23 after first catching it; haven't bisected older versions).
  • Proposed fix (three-line patch, same file, function createOpenAIThinkingLevelWrapper):
    - if (existingReasoning === "none") {
    + // Cover the common case where pi-ai leaves payloadObj.reasoning unset
    + // (options.reasoningEffort is undefined on the OpenClaw path, so pi-ai does not
    + // initialize body.reasoning before the wrapper runs).
    + if (existingReasoning === void 0 || existingReasoning === null || existingReasoning === "none") {
          payloadObj.reasoning = { effort: mapThinkingLevelToReasoningEffort(thinkingLevel) };
          return;
      }
    This aligns createOpenAIThinkingLevelWrapper with the semantics already shipped in normalizeProxyReasoningPayload.
  • Related surface: mapThinkingLevelToReasoningEffort("off") === "none", and "none" is already a string branch the wrapper handles, so adding undefined/null is a strict extension.
  • Affected bundled file on 2026.4.22 (npm global install): proxy-stream-wrappers-Cnx4hyoz.js. Function createOpenAIThinkingLevelWrapper starts at line 183.
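
The effect of the proposed condition can be checked in isolation with a simplified model of the wrapper logic. As before, this is a sketch under assumptions: `applyThinkingLevelPatched`, `Body`, and `toEffort` are hypothetical stand-ins, and `toEffort` assumes an identity mapping for every level except "off".

```typescript
type Level = "off" | "low" | "medium" | "high";
type Body = { reasoning?: unknown };

// Assumed stand-in for mapThinkingLevelToReasoningEffort.
function toEffort(level: Level): string {
  return level === "off" ? "none" : level;
}

// Simplified model of the wrapper logic with the widened condition applied.
function applyThinkingLevelPatched(body: Body, level: Level): void {
  const existing = body.reasoning;
  if (level === "off") {
    delete body.reasoning;
    return;
  }
  // Patched branch: unset (undefined/null) is treated the same as "none".
  if (existing === undefined || existing === null || existing === "none") {
    body.reasoning = { effort: toEffort(level) };
    return;
  }
  if (typeof existing === "object" && !Array.isArray(existing)) {
    (existing as { effort?: string }).effort = toEffort(level);
  }
}

for (const initial of [undefined, null, "none"]) {
  const body: Body = { reasoning: initial };
  applyThinkingLevelPatched(body, "high");
  console.log(JSON.stringify(body.reasoning)); // {"effort":"high"} each time
}
```

Pre-populated reasoning objects still go through the in-place branch, so other fields set alongside effort are preserved.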

extent analysis

TL;DR

The proposed fix involves modifying the createOpenAIThinkingLevelWrapper function to handle the case where payloadObj.reasoning is undefined by adding a check for void 0 or null in addition to the existing check for "none".

Guidance

  • The issue arises from createOpenAIThinkingLevelWrapper not handling the case where payloadObj.reasoning is undefined, which is the common case when using the OpenClaw path with openai/gpt-5.x or openai-codex/gpt-5.x.
  • To fix this, modify the if statement in createOpenAIThinkingLevelWrapper to check for void 0 or null in addition to "none", as shown in the proposed fix.
  • This change aligns the semantics of createOpenAIThinkingLevelWrapper with those of normalizeProxyReasoningPayload, which already handles the case where existingReasoning is undefined.
  • After applying the fix, verify that the reasoning field is correctly set in the payload by inspecting the HTTP request or using runtime instrumentation.

Example

The proposed fix can be applied by modifying the createOpenAIThinkingLevelWrapper function as follows:

- if (existingReasoning === "none") {
+ if (existingReasoning === void 0 || existingReasoning === null || existingReasoning === "none") {
        payloadObj.reasoning = { effort: mapThinkingLevelToReasoningEffort(thinkingLevel) };
        return;
    }

Notes

The fix assumes that the mapThinkingLevelToReasoningEffort function is correctly implemented and returns the expected value for the given thinkingLevel. Additionally, the fix only addresses the specific issue described and may not cover other related problems.

Recommendation

Apply the proposed workaround


