openclaw - ✅(Solved) Fix [Feature]: track remaining Codex harness parity and hook-policy follow-ups [3 pull requests, 1 participant]

GitHub stats
openclaw/openclaw#70478 · Fetched 2026-04-24 05:57:34
Comments: 0 · Participants: 1 · Timeline: 6 · Reactions: 0
Timeline (top): labeled ×4, assigned ×1, closed ×1

Track the remaining Codex app-server parity and policy follow-ups after the landed native hook stack.

PR fix notes

PR #70307: feat(codex): add tool hook parity

Description (problem / solution / changelog)

Summary

  • Problem: Codex app-server runs used the shared before_tool_call path, but missed the post-tool parity hooks that Pi exposes.
  • Why it matters: bundled tool integrations like token reducers and transcript write hooks diverge on Codex, so the same plugin behaves differently across harnesses.
  • What changed: add a bundled-plugin Codex app-server extension seam for async tool_result middleware, fire after_tool_call for Codex tool runs, and route mirrored Codex transcript writes through before_message_write.
  • What did NOT change (scope boundary): this does not add llm_input / llm_output / agent_end, and it does not tackle compaction or before_prompt_build yet.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #69946
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: the Codex harness executes tools and mirrors transcript writes through its own app-server bridge, so Pi-only post-tool and write hooks never ran there.
  • Missing detection / guardrail: we had coverage for shared tool execution and Codex transcript mirroring, but no seam-level tests asserting after_tool_call, before_message_write, or async tool-result reducers on the Codex path.
  • Contributing context (if known): tokenjuice exposed the drift because it relies on the new async extension seam opened for Pi and had no equivalent entry point in Codex.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/codex/src/app-server/dynamic-tools.test.ts, extensions/codex/src/app-server/transcript-mirror.test.ts, src/agents/codex-app-server.extensions.test.ts
  • Scenario the test should lock in: Codex tool calls run async tool_result reducers plus after_tool_call, and mirrored transcript writes respect before_message_write.
  • Why this is the smallest reliable guardrail: the gap lives in the Codex app-server bridge layer, not in isolated helper math.
  • Existing test that already covers this (if any): before_tool_call coverage already existed through shared tool execution.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

  • Bundled plugins can register Codex app-server tool_result middleware through the plugin SDK.
  • Codex tool runs now fire after_tool_call.
  • Codex transcript mirroring now respects before_message_write.

Diagram (if applicable)

Before:
[codex tool execute] -> [raw tool result] -> [serialize to Codex items]
[codex transcript mirror] -> [append message directly]

After:
[codex tool execute] -> [codex tool_result middleware] -> [after_tool_call] -> [serialize to Codex items]
[codex transcript mirror] -> [before_message_write] -> [append message]
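
The "After" tool path in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `ToolResult`, `ToolResultMiddleware`, and `AfterToolCallHook` types and the `runCodexToolPipeline` function are hypothetical names, and the containment behavior is assumed from the "thrown middleware containment" edge case listed in the PR.

```typescript
// Hypothetical shapes; the real OpenClaw types are not shown in the PR description.
type ToolResult = { toolName: string; output: string };
type ToolResultMiddleware = (result: ToolResult) => Promise<ToolResult>;
type AfterToolCallHook = (result: ToolResult) => void;

// Runs async tool_result middleware in order, then fires after_tool_call,
// and returns the result that would be serialized to Codex items.
async function runCodexToolPipeline(
  raw: ToolResult,
  middlewares: ToolResultMiddleware[],
  afterToolCall: AfterToolCallHook[],
): Promise<ToolResult> {
  let current = raw;
  for (const middleware of middlewares) {
    try {
      current = await middleware(current);
    } catch {
      // Assumed containment: a throwing middleware is skipped and the
      // previous result is kept, so one plugin cannot fail the tool run.
    }
  }
  for (const hook of afterToolCall) {
    try {
      hook(current);
    } catch {
      // Observers are treated as best-effort in this sketch.
    }
  }
  return current;
}
```

A token reducer would plug in as one middleware that shortens `output`; the ordering means `after_tool_call` observers see the reduced result rather than the raw one.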

Security Impact (required)

  • New permissions/capabilities? (Yes)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation: this adds a new bundled-plugin-only Codex app-server extension seam. Registration is gated to bundled plugins that explicitly declare contracts.embeddedExtensionFactories: ["codex-app-server"], matching the Pi embedded extension model instead of exposing a broad ungated registry.
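
The gating described above might look something like this in outline. Only the `contracts.embeddedExtensionFactories` field and the `"codex-app-server"` value come from the PR text; the `PluginManifest` shape, the `bundled` flag, and the `CodexExtensionRegistry` class are assumptions made for illustration.

```typescript
// Hypothetical manifest shape; only `contracts.embeddedExtensionFactories`
// and the "codex-app-server" value are taken from the PR description.
interface PluginManifest {
  id: string;
  bundled: boolean;
  contracts?: { embeddedExtensionFactories?: string[] };
}

type ExtensionFactory = () => unknown;

class CodexExtensionRegistry {
  private readonly factories = new Map<string, ExtensionFactory>();

  // Returns true only when the registration is accepted.
  register(manifest: PluginManifest, factory: ExtensionFactory): boolean {
    const declared = manifest.contracts?.embeddedExtensionFactories ?? [];
    if (!manifest.bundled) return false; // bundled-only seam
    if (!declared.includes("codex-app-server")) return false; // contract gate
    if (this.factories.has(manifest.id)) return false; // duplicate registration
    this.factories.set(manifest.id, factory);
    return true;
  }

  get size(): number {
    return this.factories.size;
  }
}
```

Rejecting by default keeps the seam narrow: a plugin has to be bundled and has to opt in through its manifest before its factory is ever stored.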

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 25 / pnpm workspace
  • Model/provider: Codex app-server harness
  • Integration/channel (if any): N/A
  • Relevant config (redacted): default local repo setup

Steps

  1. Register a bundled plugin Codex app-server extension factory and load the active plugin registry.
  2. Execute a Codex app-server tool call or mirror a Codex transcript write.
  3. Verify the extension runner / hook runner mutates the result or message before projection/persist.

Expected

  • Codex tool results can be rewritten through async middleware, after_tool_call fires, and mirrored transcript writes respect before_message_write.

Actual

  • Matches expected after this patch.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: ran focused Codex dynamic-tools, transcript-mirror, extension registration, and plugin-sdk contract lanes locally.
  • Edge cases checked: bundled-only registration, manifest contract gating, duplicate registration handling, thrown middleware containment, blocked before_message_write.
  • What you did not verify: full repo pnpm check / pnpm build, and a live end-to-end Codex server session.
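
The "blocked before_message_write" edge case can be pictured with a small sketch of the mirror path. The `Message` and `BeforeMessageWrite` types, the convention that a hook returns `null` to block a message, and the `mirrorTranscript` function are all hypothetical; the PR text only states that mirrored writes now pass through `before_message_write`.

```typescript
// Hypothetical shapes: a hook returns a (possibly rewritten) message,
// or null to block the write entirely.
type Message = { role: "user" | "assistant" | "tool"; text: string };
type BeforeMessageWrite = (message: Message) => Message | null;

// Routes each mirrored message through the hooks before appending it.
// Returns how many messages were actually appended.
function mirrorTranscript(
  incoming: Message[],
  transcript: Message[],
  beforeMessageWrite: BeforeMessageWrite[],
): number {
  let appended = 0;
  for (const original of incoming) {
    let candidate: Message | null = original;
    for (const hook of beforeMessageWrite) {
      if (candidate === null) break; // already blocked by an earlier hook
      candidate = hook(candidate);
    }
    if (candidate !== null) {
      transcript.push(candidate);
      appended += 1;
    }
  }
  return appended;
}
```

Returning the appended count is a design choice of this sketch: it would let a caller skip downstream notifications when every mirrored message was blocked.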

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Risks and Mitigations

  • Risk: another harness-specific hook seam could drift if future Codex lifecycle work lands inconsistently.
    • Mitigation: this keeps the seam narrow, adds targeted Codex harness tests, and leaves the remaining lifecycle work for follow-up PRs instead of overloading one patch.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/.generated/plugin-sdk-api-baseline.sha256 (modified, +2/-2)
  • docs/plugins/sdk-agent-harness.md (modified, +9/-0)
  • extensions/codex/src/app-server/dynamic-tools.test.ts (modified, +91/-1)
  • extensions/codex/src/app-server/dynamic-tools.ts (modified, +41/-1)
  • extensions/codex/src/app-server/run-attempt.ts (modified, +12/-1)
  • extensions/codex/src/app-server/transcript-mirror.test.ts (modified, +154/-60)
  • extensions/codex/src/app-server/transcript-mirror.ts (modified, +17/-1)
  • src/agents/codex-app-server.extensions.test.ts (added, +263/-0)
  • src/agents/harness/codex-app-server-extensions.ts (added, +53/-0)
  • src/agents/harness/hook-helpers.ts (added, +74/-0)
  • src/gateway/server-plugins.test.ts (modified, +1/-0)
  • src/gateway/test-helpers.plugin-registry.ts (modified, +1/-0)
  • src/plugin-sdk/agent-harness.ts (modified, +12/-0)
  • src/plugins/api-builder.ts (modified, +5/-0)
  • src/plugins/captured-registration.ts (modified, +7/-0)
  • src/plugins/codex-app-server-extension-factory.ts (added, +9/-0)
  • src/plugins/codex-app-server-extension-types.ts (added, +38/-0)
  • src/plugins/loader.ts (modified, +3/-0)
  • src/plugins/registry-empty.ts (modified, +1/-0)
  • src/plugins/registry-types.ts (modified, +10/-0)
  • src/plugins/registry.ts (modified, +68/-0)
  • src/plugins/status.test-helpers.ts (modified, +1/-0)
  • src/plugins/types.ts (modified, +3/-0)
  • src/test-utils/channel-plugins.ts (modified, +1/-0)
  • test/helpers/plugins/plugin-api.ts (modified, +1/-0)

PR #70312: feat(codex): add llm lifecycle hooks

Description (problem / solution / changelog)

Summary

  • Problem: native Codex app-server turns did not emit the shared llm_input, llm_output, or agent_end plugin hooks.
  • Why it matters: lifecycle automation and diagnostics that work on Pi runs silently miss Codex harness turns, so hook behavior drifts by runtime.
  • What changed: add shared harness helpers for those lifecycle hooks, fire llm_input before Codex turn/start, and fire llm_output plus agent_end after Codex turn completion using mirrored session history and the projected attempt result.
  • What did NOT change (scope boundary): this does not cover before_prompt_build, compaction hooks, or Codex tool-result middleware.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #69946
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: the Codex harness owns its own turn-start and turn-complete flow, but those lifecycle points never forwarded into the shared plugin hook runner.
  • Missing detection / guardrail: we had no Codex harness tests asserting LLM lifecycle hook parity, so the gap sat there until the parity audit.
  • Contributing context (if known): the same hook families already run on Pi, so native Codex runs looked healthy while still skipping automation that depended on those hooks.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/codex/src/app-server/run-attempt.test.ts
  • Scenario the test should lock in: a Codex run fires llm_input, llm_output, and agent_end with the expected payloads, including failure metadata.
  • Why this is the smallest reliable guardrail: the bug is in the Codex harness attempt lifecycle, not inside isolated hook-runner helpers.
  • Existing test that already covers this (if any): plugin-sdk-subpaths keeps the new helper exports honest.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

  • Native Codex app-server turns now emit llm_input, llm_output, and agent_end plugin hooks.
  • Codex llm_input uses the mirrored OpenClaw session transcript as its history snapshot.

Diagram (if applicable)

Before:
[codex turn/start] -> [native Codex turn] -> [result]
                 x llm_input
[result]         x llm_output / agent_end

After:
[codex turn/start] -> [llm_input] -> [native Codex turn]
[result] -> [llm_output] -> [agent_end]
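
A rough sketch of the "After" ordering, including the failure path mentioned in the test plan, is shown below. The payload fields, the `LifecycleHooks` type, and the `runCodexTurn` and `startNativeTurn` names are assumptions for illustration; only the hook names and their order come from the PR.

```typescript
// Hypothetical payload and hook shapes, for illustration only.
type Message = { role: string; text: string };
type LifecycleHooks = {
  llm_input?: (payload: { history: Message[]; prompt: string }) => void;
  llm_output?: (payload: { text: string }) => void;
  agent_end?: (payload: { success: boolean; error?: string }) => void;
};

// Fires llm_input before the native turn, then llm_output and agent_end after it.
// On failure, llm_output is skipped and agent_end carries the error message.
async function runCodexTurn(
  prompt: string,
  mirroredHistory: Message[],
  hooks: LifecycleHooks,
  startNativeTurn: (prompt: string) => Promise<string>,
): Promise<string | undefined> {
  // The history snapshot is a copy of the mirrored transcript.
  hooks.llm_input?.({ history: [...mirroredHistory], prompt });

  let text: string | undefined;
  let error: string | undefined;
  try {
    text = await startNativeTurn(prompt);
  } catch (err) {
    error = err instanceof Error ? err.message : String(err);
  }

  if (text !== undefined) hooks.llm_output?.({ text });
  hooks.agent_end?.(error === undefined ? { success: true } : { success: false, error });
  return text;
}
```

Capturing the error before firing hooks means `agent_end` runs exactly once per turn in this sketch, whether the turn succeeded or failed.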

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 25 / pnpm workspace
  • Model/provider: Codex app-server harness
  • Integration/channel (if any): N/A
  • Relevant config (redacted): default local repo setup

Steps

  1. Initialize the global hook runner with llm_input, llm_output, or agent_end handlers.
  2. Start a Codex app-server attempt and drive a turn to completion or failure in the test harness.
  3. Verify the handlers receive the projected Codex lifecycle payloads.

Expected

  • Codex turns trigger the same LLM lifecycle hooks that Pi already emits.

Actual

  • Matches expected after this patch.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: ran Codex run-attempt tests for success and failure hook payloads, plugin-sdk subpath contract tests, and plugin-sdk API baseline checks locally.
  • Edge cases checked: mirrored transcript history inclusion, failure-path agent_end, public helper export drift.
  • What you did not verify: full repo pnpm check / pnpm build, and a live Codex app-server session outside the test harness.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Risks and Mitigations

  • Risk: Codex hook payloads use the mirrored OpenClaw transcript, which can still differ slightly from the native Codex thread state.
    • Mitigation: that mirror is the same local transcript OpenClaw already persists and exposes elsewhere, and the remaining prompt-build/compaction parity work stays split into follow-up PRs.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/plugins/codex-harness.md (modified, +4/-0)
  • extensions/codex/src/app-server/run-attempt.test.ts (modified, +206/-0)
  • extensions/codex/src/app-server/run-attempt.ts (modified, +85/-14)
  • src/agents/harness/lifecycle-hook-helpers.ts (added, +73/-0)
  • src/plugin-sdk/agent-harness.ts (modified, +5/-0)

PR #70313: feat(codex): add prompt and compaction hooks

Description (problem / solution / changelog)

Summary

  • Problem: native Codex app-server runs still skipped before_prompt_build, before_compaction, and after_compaction even after the other Codex parity seams were added.
  • Why it matters: prompt shims and compaction-aware plugins can behave differently across Pi and Codex harnesses even when they only rely on shared hooks.
  • What changed: route Codex turn setup through before_prompt_build, let that hook rewrite developer instructions and the outgoing prompt text, and fire before_compaction / after_compaction when Codex reports native contextCompaction items.
  • What did NOT change (scope boundary): this does not add any new Codex-native hook format, and it does not try to fake opaque Codex compaction internals beyond surfacing the event and an honest compactedCount: -1 when the native backend does not expose a concrete count.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #69946
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: Codex thread start/resume and Codex-native compaction events lived entirely inside the app-server bridge, so the shared prompt-build and compaction hooks never saw them.
  • Missing detection / guardrail: no Codex harness tests asserted prompt hook rewrites or compaction hook delivery.
  • Contributing context (if known): Pi already owned those seams, so the parity gap only showed up when auditing hook-by-hook behavior across runtimes.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/codex/src/app-server/run-attempt.test.ts, extensions/codex/src/app-server/event-projector.test.ts
  • Scenario the test should lock in: Codex before_prompt_build rewrites the developer instructions and outgoing prompt, and Codex contextCompaction items trigger shared compaction hooks.
  • Why this is the smallest reliable guardrail: the bug is at the Codex harness boundary where prompt assembly and native compaction notifications are bridged into OpenClaw hooks.
  • Existing test that already covers this (if any): plugin-sdk-subpaths covers the new public helper exports.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

  • Native Codex app-server turns now respect before_prompt_build.
  • Native Codex compaction events now trigger before_compaction and after_compaction.

Diagram (if applicable)

Before:
[codex prompt/developer instructions] -> [turn/start]
                                   x before_prompt_build
[codex contextCompaction item]     x before_compaction / after_compaction

After:
[codex prompt/developer instructions] -> [before_prompt_build] -> [turn/start]
[codex contextCompaction item] -> [before_compaction] ... [after_compaction]
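
The prompt half of the "After" diagram could be sketched as a fold over hooks that may override the developer instructions, the prompt text, or neither. The `PromptBuild` and `BeforePromptBuild` types and the `applyBeforePromptBuild` function are hypothetical; the PR only states that the hook can rewrite both fields before turn/start.

```typescript
// Hypothetical shapes; a hook returns a partial override, or undefined to leave things as-is.
type PromptBuild = { developerInstructions: string; prompt: string };
type BeforePromptBuild = (current: PromptBuild) => Partial<PromptBuild> | undefined;

// Applies each hook in order; later hooks see the result of earlier rewrites.
function applyBeforePromptBuild(
  initial: PromptBuild,
  hooks: BeforePromptBuild[],
): PromptBuild {
  let current = initial;
  for (const hook of hooks) {
    const override = hook(current);
    if (override !== undefined) {
      current = { ...current, ...override };
    }
  }
  return current;
}
```

The returned value is what would be handed to turn/start in this sketch, so a hook that returns `undefined` is a pure observer.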

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 25 / pnpm workspace
  • Model/provider: Codex app-server harness
  • Integration/channel (if any): N/A
  • Relevant config (redacted): default local repo setup

Steps

  1. Register before_prompt_build or compaction hooks in the global hook runner.
  2. Start a Codex app-server attempt or feed contextCompaction notifications into the projector.
  3. Verify the hooks receive the bridged prompt/compaction payloads.

Expected

  • Codex runs expose the same shared prompt-build and compaction hook seams Pi already exposes.

Actual

  • Matches expected after this patch.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: ran Codex run-attempt and event-projector tests plus plugin-sdk subpath and API baseline checks locally.
  • Edge cases checked: mirrored history passed to before_prompt_build, developer instruction rewrites, native compaction event delivery, explicit compactedCount: -1 for opaque Codex compaction.
  • What you did not verify: full repo pnpm check / pnpm build, and a live Codex app-server session outside the test harness.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Risks and Mitigations

  • Risk: Codex compaction is still a native black box, so hook payloads cannot expose a real compacted-message delta.
    • Mitigation: the bridge is explicit about that and reports compactedCount: -1 instead of inventing fake precision.
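
One way to picture the `compactedCount: -1` convention is a small projector that bridges native compaction items into the shared hooks. The `CodexItem` shape, its `phase` field, and the `projectCompactionItem` function are assumptions for illustration; only `contextCompaction`, the two hook names, and the `-1` sentinel come from the PR text.

```typescript
// Hypothetical item shape; the `phase` field is an assumption about how
// start and completion of a native compaction are distinguished.
type CodexItem = {
  type: string;
  phase: "started" | "completed";
  compactedCount?: number;
};
type CompactionHooks = {
  before_compaction?: () => void;
  after_compaction?: (payload: { compactedCount: number }) => void;
};

// Sentinel used when the native backend does not expose a concrete count.
const UNKNOWN_COMPACTED_COUNT = -1;

// Returns true when the item was a compaction item and was bridged to a hook.
function projectCompactionItem(item: CodexItem, hooks: CompactionHooks): boolean {
  if (item.type !== "contextCompaction") return false;
  if (item.phase === "started") {
    hooks.before_compaction?.();
  } else {
    hooks.after_compaction?.({
      compactedCount: item.compactedCount ?? UNKNOWN_COMPACTED_COUNT,
    });
  }
  return true;
}
```

Plugins consuming `after_compaction` would then need to treat a negative count as "unknown" rather than as a real delta.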

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/plugins/codex-harness.md (modified, +4/-0)
  • extensions/codex/src/app-server/event-projector.test.ts (modified, +135/-23)
  • extensions/codex/src/app-server/event-projector.ts (modified, +46/-7)
  • extensions/codex/src/app-server/run-attempt.test.ts (modified, +77/-0)
  • extensions/codex/src/app-server/run-attempt.ts (modified, +37/-1)
  • extensions/codex/src/app-server/thread-lifecycle.ts (modified, +14/-6)
  • src/agents/harness/prompt-compaction-hook-helpers.ts (added, +156/-0)
  • src/plugin-sdk/agent-harness.ts (modified, +5/-0)
Original issue text

Summary

Track the remaining Codex app-server parity and policy follow-ups after the landed native hook stack.

Problem to solve

The main Codex parity stack is now landed:

That closed the big gaps for after_tool_call, Codex-native tool_result middleware, before_message_write, before_prompt_build, before_compaction, after_compaction, llm_input, llm_output, and agent_end.

What is still missing is one place to track the smaller remaining parity and policy decisions so they do not drift or get re-litigated across random PRs.

Proposed solution

Use this issue as the follow-up tracker for the remaining Codex harness work:

  1. Audit whether before_model_resolve needs native Codex parity or whether the current prompt/build path is the right stopping point.
  2. Decide explicitly whether exact tool_result_persist parity is a non-goal for Codex now that Codex has a native async post-tool-result seam.
  3. Decide whether the new Codex lifecycle/tool/transcript hook payloads need extra trust gating, redaction, or documentation, or whether the current plugin trust model is the intended answer.
  4. Evaluate whether any tiny cleanup remains around transcript-mirror behavior, such as avoiding downstream transcript-update emits when every mirrored message is blocked.
  5. Keep one hook-by-hook parity matrix current after follow-up decisions land.
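
As a starting point for item 5, the parity matrix could be kept as a small typed record. This is only a sketch: the `ParityStatus` labels, the `codexHookParity` name, and the helper are hypothetical, and the statuses simply restate what this issue and the three PRs say (nothing here records a decision on the open questions).

```typescript
// Hypothetical status labels for tracking Codex app-server hook parity.
type ParityStatus = "landed" | "shared-path" | "open-question";

const codexHookParity: Record<string, ParityStatus> = {
  before_tool_call: "shared-path", // already ran through shared tool execution
  after_tool_call: "landed",
  "tool_result middleware": "landed", // Codex-native async seam
  before_message_write: "landed",
  before_prompt_build: "landed",
  before_compaction: "landed",
  after_compaction: "landed",
  llm_input: "landed",
  llm_output: "landed",
  agent_end: "landed",
  before_model_resolve: "open-question", // item 1: audit pending
  tool_result_persist: "open-question", // item 2: possibly an explicit non-goal
};

// Lists hook names with the given status, sorted for stable output.
function hooksWithStatus(
  matrix: Record<string, ParityStatus>,
  status: ParityStatus,
): string[] {
  return Object.keys(matrix)
    .filter((name) => matrix[name] === status)
    .sort();
}
```

Calling `hooksWithStatus(codexHookParity, "open-question")` would then surface the hooks that still need a decision, which is the list this tracker exists to shrink.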

Alternatives considered

  • Keep handling parity follow-ups ad hoc in individual PRs.
    • weaker because the remaining questions are cross-cutting and easy to forget once the main PRs are merged.
  • Chase full literal Pi parity everywhere.
    • weaker because some surfaces, especially tool_result_persist, are the wrong abstraction for Codex and would create fake symmetry instead of clean host-native seams.

Impact

  • Affected: Codex harness users, plugin authors, and maintainers trying to reason about hook parity vs Pi.
  • Severity: medium. The big gaps are closed, but the remaining unanswered parts will create confusion and design drift if left implicit.
  • Frequency: intermittent but recurring whenever hook behavior is extended or security/policy questions come up.
  • Consequence: repeated re-audits, contradictory assumptions about parity, and a higher chance of host divergence creeping back in.

Evidence/examples

Additional information

This should stay scoped to Codex app-server harness behavior, not Codex CLI native hook bridging.

extent analysis

TL;DR

Use this issue as a tracker for remaining Codex harness parity and policy decisions to avoid drift and ensure consistency.

Guidance

  • Audit the need for native Codex parity in before_model_resolve and decide on the stopping point for prompt/build path.
  • Evaluate the need for exact tool_result_persist parity and consider the implications of Codex's native async post-tool-result seam.
  • Assess the need for extra trust gating, redaction, or documentation for new Codex lifecycle/tool/transcript hook payloads.
  • Review transcript-mirror behavior to avoid unnecessary downstream transcript-update emits.
  • Maintain a current hook-by-hook parity matrix after follow-up decisions are made.

Example

No code snippet is provided as the issue focuses on high-level design and policy decisions.

Notes

The solution should be scoped to Codex app-server harness behavior and not affect Codex CLI native hook bridging. The goal is to track and resolve remaining parity and policy questions to prevent design drift and ensure consistency.

Recommendation

Apply the proposed solution to use this issue as a tracker for remaining Codex harness parity and policy decisions, as it provides a centralized location for tracking and resolving outstanding questions, reducing the risk of confusion and design drift.
