openclaw - ✅(Solved) Fix [Codex×Pi parity Phase 3] Codex-plugin lifecycle harness [2 pull requests, 2 comments, 2 participants]

100yenadmin · 2026-05-10T08:21:07Z


GitHub stats: openclaw/openclaw#80174 (fetched 2026-05-11 03:18:05)

Author: 100yenadmin

Participants: 100yenadmin, clawsweeper[bot]

Timeline (top): cross-referenced ×3, commented ×2, mentioned ×1, subscribed ×1


Fix Action

Fixed

Fixed by PR: docs(qa-lab): runtime-parity gate design (Pi vs Codex harness) (https://github.com/openclaw/openclaw/pull/80179)
Fixed by PR: [qa-lab] Complete Codex vs Pi runtime parity harness phases 2-5 (https://github.com/openclaw/openclaw/pull/80323)

PR fix notes

PR #80179: docs(qa-lab): runtime-parity gate design (Pi vs Codex harness)

Repository: openclaw/openclaw
Author: 100yenadmin
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/80179

Description (problem / solution / changelog)

Summary

Adds extensions/qa-lab/transport-parity-gate.md, a design-only doc covering the Codex-vs-Pi runtime parity QA harness scoped in #80171.

The doc lifts forward the transport-parity-gate.md sketch from closed PR #78512 (which was originally tracking #78457) and expands it to include the surfaces the maintainer thread asked for:

- Runtime parity (pi vs codex for the same model+provider): the higher-value gate now that Codex is the default for OpenAI turns
- Per-tool fixture set, so "tool X breaks under codex" surfaces at tool granularity rather than at session level
- Codex-plugin lifecycle stress (cold install, version pinning, install racing the first turn, doctor migration safety)
- Auth-shape coverage (oauth-only, apikey-only, mixed-profiles) for the #78499 class
- Token-efficiency report: the side-by-side per-runtime cost table pash explicitly asked for
- JSONL session-replay harness for Eva's "loop 3 agents on real jsonl" ask

The doc is the shared artifact that the implementing agent works against, with @Eva-⚡🐑 and @pash reviewing; sub-issues #80172, #80173, #80174, #80175, and #80176 are the actual implementation work.

Why this is design-only

The original #78512 was closed because its it.fails reproduction test no longer encodes the right invariant against post-#79238 main. The design doc itself, however, is still load-bearing — it's the only place the matrix shape, drift classifier, capture format, and CI wiring intent are written down. Splitting it out as a design-only PR avoids re-litigating the closure on every implementation PR and gives reviewers something to react to before code lands.

Verification

- pnpm exec oxfmt --check --threads=1 extensions/qa-lab/transport-parity-gate.md: clean
- No code, runtime, workflow, or test changes: pure docs
- Markdown-only diff; refs all link to issues that exist (#80171–#80176, #74290, #79347, #78457, #78055, #78060, #78407, #78499, #79238, #74622)

Test plan

- [x] Format check passes (oxfmt)
- [x] All referenced issues exist
- [x] Design intent matches the maintainer thread (pash + Eva + ai-hpc, Yesterday)
- [ ] Maintainer review on matrix shape and per-cell capture format before Phase 1 (#80172) starts implementation

References

- RFC + tracking: #80171
- Sub-issues:
  - Phase 1 (Runtime axis): #80172
  - Phase 2 (Per-tool fixtures): #80173
  - Phase 3 (Codex-plugin lifecycle): #80174
  - Phase 4 (Token-efficiency report): #80175
  - Phase 5 (JSONL replay): #80176
- Sibling model-axis parity: #74290 (closed) → #79347 (in flight)
- Original transport-parity proposal: #78457
- Closed PR with the original draft of this doc: #78512

Changed files

extensions/qa-lab/transport-parity-gate.md (added, +148/-0)

PR #80323: [qa-lab] Complete Codex vs Pi runtime parity harness phases 2-5

Repository: openclaw/openclaw
Author: 100yenadmin
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/80323

Description (problem / solution / changelog)

Summary

Adds the Codex-vs-Pi runtime parity QA harness across extensions/qa-lab, including runtime-pair execution, first-hour/depth suite selectors, harness-prompt parity, token-efficiency reporting, tool-default fixtures, JSONL replay scaffolding, and release-check wiring.

This update also corrects the tool-defaults mock lane so the harness matches Codex app-server architecture:

- Codex-native workspace tools (read, write, edit, apply_patch, exec, process, update_plan) are no longer expected to appear as duplicate OpenClaw dynamic tools.
- OpenClaw integration tools (image_generate, sessions, web, etc.) remain dynamic-tool parity rows and are tracked separately from Codex-native behavior rows.
- Optional/profile/plugin-dependent tools stay report-only unless explicitly enabled.
- Mock-provider planned tool calls are captured as provider-plan diagnostics, not as runtime transcript tool evidence.
- Tool coverage reports now show bucket, expected layer, required/report-only status, product impact, QA impact, and action.
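The bucket rules above can be modeled as a small classifier plus a verdict function. This is a hypothetical sketch, not the code in tool-coverage-report.ts. `ToolBucket`, `ToolRow`, `classifyTool`, and `coverageVerdict` are illustrative names. The rule that only required rows can fail the verdict is an assumption. Tool names come from the lists above, and the OpenClaw set stays partial because the PR ends that list with "etc.".

```typescript
// Hypothetical bucket model; not the real tool-coverage-report.ts API.
type ToolBucket = "codex-native" | "openclaw-dynamic" | "optional";

interface ToolRow {
  tool: string;
  bucket: ToolBucket;
  required: boolean; // report-only rows (required === false) never fail the verdict
  // Set only from runtime transcript evidence. Mock-provider planned calls are
  // provider-plan diagnostics and must not flip this to true.
  observed: boolean;
}

// Codex-native workspace tools named in the PR description.
const CODEX_NATIVE = new Set(["read", "write", "edit", "apply_patch", "exec", "process", "update_plan"]);
// OpenClaw integration names as listed in the PR (partial; some may be tool families).
const OPENCLAW_DYNAMIC = new Set(["image_generate", "sessions", "web"]);

function classifyTool(
  tool: string,
  enabledOptional: ReadonlySet<string> = new Set<string>(),
): Omit<ToolRow, "observed"> {
  if (CODEX_NATIVE.has(tool)) return { tool, bucket: "codex-native", required: true };
  if (OPENCLAW_DYNAMIC.has(tool)) return { tool, bucket: "openclaw-dynamic", required: true };
  // Optional/profile/plugin-dependent tools are report-only unless explicitly enabled.
  return { tool, bucket: "optional", required: enabledOptional.has(tool) };
}

// The verdict fails only when a required tool lacks transcript evidence.
function coverageVerdict(rows: readonly ToolRow[]): "pass" | "fail" {
  return rows.some((r) => r.required && !r.observed) ? "fail" : "pass";
}
```

Tying `observed` to transcript evidence is the main point of the mock-provider correction. Without it, a planned call from the mock provider could make a missing tool look present.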

Why

OpenClaw needs a maintainer-runnable gate that compares the same scenario/model under Pi and Codex before Codex becomes the default runtime. The gate must surface real runtime drift without turning mock-provider limitations or intentional Codex-native tool ownership into production bug reports.

Verification

Passing targeted/current-scope checks:

- pnpm test extensions/qa-lab/src/runtime-tool-fixture.test.ts extensions/qa-lab/src/runtime-parity.test.ts extensions/qa-lab/src/tool-coverage-report.test.ts extensions/qa-lab/src/runtime-suite.test.ts extensions/qa-lab/src/suite.test.ts extensions/qa-lab/src/scenario-catalog.test.ts extensions/qa-lab/src/cli.runtime.test.ts extensions/qa-lab/src/cli.test.ts
- pnpm tsgo:extensions:test
- pnpm check:test-types
- git diff --check

Real Behavior Proof

Behavior or issue addressed: Corrects the runtime parity tool-defaults harness so Codex-native workspace tools are no longer falsely required as duplicate OpenClaw dynamic tools, while OpenClaw dynamic integration rows remain visible and tracked.
Real environment tested: Local OpenClaw checkout at /Volumes/LEXAR/repos/openclaw-1 on branch codex-vs-pi-runtime-parity-tools, running the real pnpm openclaw qa CLI against the embedded gateway and mock OpenAI provider after this patch.
Exact steps or command run after this patch:

OPENCLAW_BUILD_PRIVATE_QA=1 pnpm openclaw qa suite --repo-root . --provider-mode mock-openai --runtime-suite tool-defaults --runtime-pair pi,codex --output-dir .artifacts/qa-e2e/runtime-tools-correction
pnpm openclaw qa tool-coverage --repo-root . --summary .artifacts/qa-e2e/runtime-tools-correction/qa-suite-summary.json --runtime-pair pi,codex --output .artifacts/qa-e2e/runtime-tools-correction/qa-tool-coverage-report.md
OPENCLAW_BUILD_PRIVATE_QA=1 pnpm openclaw qa suite --repo-root . --provider-mode mock-openai --runtime-suite openclaw-dynamic-tools --runtime-pair pi,codex --output-dir .artifacts/qa-e2e/openclaw-dynamic-tools-correction
pnpm openclaw qa parity-report --repo-root . --runtime-axis --summary .artifacts/qa-e2e/runtime-tools-correction/qa-suite-summary.json --output-dir .artifacts/qa-e2e/runtime-tools-correction/parity --token-efficiency

Evidence after fix: Terminal output produced these real local artifacts: .artifacts/qa-e2e/runtime-tools-correction/qa-suite-summary.json, .artifacts/qa-e2e/runtime-tools-correction/qa-suite-report.md, .artifacts/qa-e2e/runtime-tools-correction/qa-tool-coverage-report.md, .artifacts/qa-e2e/openclaw-dynamic-tools-correction/qa-suite-summary.json, and .artifacts/qa-e2e/runtime-tools-correction/parity/qa-runtime-token-efficiency-report.md.
Observed result after fix: tool-defaults completed with 20 scenarios, 15 pass, 5 report-only skip, 0 fail. Tool coverage verdict was pass with 13 required tools, 8 Codex-native workspace tools, 5 OpenClaw dynamic integration tools, 7 optional/profile/plugin tools, and 0 failing tools. The focused openclaw-dynamic-tools suite completed with 5 report-only rows tracked under #80319. Token efficiency report verdict was pass with usage source mock-estimate.
What was not tested: Live frontier token-efficiency proof was not completed because local direct OpenAI auth is missing; optional scheduled/Testbox soak-100 proof was not completed; broad first-hour-20 remains red and is tracked in #80434.
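The reported counts are internally consistent, and that is worth checking mechanically whenever a summary is regenerated. Below is a minimal sketch that uses the numbers quoted above. The `SuiteCounts` and `CoverageCounts` shapes are hypothetical and are not the real qa-suite-summary.json schema. The sketch also assumes, from the numbers themselves, that the required set equals the Codex-native plus OpenClaw-dynamic buckets (8 + 5 = 13).

```typescript
// Hypothetical count shapes; not the real qa-suite-summary.json schema.
interface SuiteCounts { total: number; pass: number; reportOnlySkip: number; fail: number }
interface CoverageCounts { required: number; codexNative: number; openclawDynamic: number; optional: number; failing: number }

// Numbers copied from the "Observed result after fix" and first-hour-20 lines.
const suite: SuiteCounts = { total: 20, pass: 15, reportOnlySkip: 5, fail: 0 };
const coverage: CoverageCounts = { required: 13, codexNative: 8, openclawDynamic: 5, optional: 7, failing: 0 };
const firstHour: SuiteCounts = { total: 18, pass: 6, reportOnlySkip: 0, fail: 12 };

// Every scenario lands in exactly one outcome bucket.
function suiteIsConsistent(s: SuiteCounts): boolean {
  return s.pass + s.reportOnlySkip + s.fail === s.total;
}

// Assumption: required tools are exactly the Codex-native plus OpenClaw-dynamic buckets.
function coverageIsConsistent(c: CoverageCounts): boolean {
  return c.codexNative + c.openclawDynamic === c.required;
}

// Green only when no scenario failed and no required tool failed.
function verdict(s: SuiteCounts, c: CoverageCounts): "pass" | "fail" {
  return s.fail === 0 && c.failing === 0 ? "pass" : "fail";
}
```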

Known Broad/Latest Blockers

- The first first-hour-20 attempt crashed with a tsdown SIGSEGV before the suite started; the retry reached QA.
- OPENCLAW_BUILD_PRIVATE_QA=1 pnpm openclaw qa suite --repo-root . --provider-mode mock-openai --runtime-suite first-hour-20 --runtime-pair pi,codex --output-dir .artifacts/qa-e2e/first-hour-20-correction-retry is not green: 18 total, 6 pass, 12 fail; tracked in #80434.
- pnpm check fails on unrelated Discord lint: #80428.
- pnpm test fails on unrelated agents-core / ACPx / Mattermost shards: #80429, #80430, #80431, #67784.
- The live token-efficiency proof path renders artifacts, but local direct OpenAI auth is missing, so the attempted live run is not valid proof; tracked in #80175.
- Optional soak-100 exists but is not scheduled or wired into Testbox; tracked in #80433.

Linked Issues

Umbrella/spec: #80171

Phase issues: #80172, #80173, #80174, #80175, #80176

Harness correction issues: #80236, #80312, #80319, #80320; #80321 is closed as fixed by this PR branch.

Fresh broad-rerun follow-ups: #80428, #80429, #80430, #80431, #80433, #80434, #67784

Changed files

.github/workflows/openclaw-release-checks.yml (modified, +115/-0)
.github/workflows/qa-live-transports-convex.yml (modified, +77/-0)
apps/shared/OpenClawKit/Sources/OpenClawProtocol/GatewayModels.swift (modified, +4/-0)
extensions/codex/src/app-server/schema-normalization-runtime-contract.test.ts (modified, +9/-4)
extensions/lmstudio/src/models.test.ts (modified, +1/-1)
extensions/qa-lab/src/agentic-parity-report.test.ts (modified, +120/-0)
extensions/qa-lab/src/agentic-parity-report.ts (modified, +218/-0)
extensions/qa-lab/src/auth-profile-fixture.ts (added, +177/-0)
extensions/qa-lab/src/cli.runtime.test.ts (modified, +282/-0)
extensions/qa-lab/src/cli.runtime.ts (modified, +416/-3)
extensions/qa-lab/src/cli.ts (modified, +175/-7)
extensions/qa-lab/src/codex-plugin-fixture.ts (added, +282/-0)
extensions/qa-lab/src/codex-plugin-lifecycle.test.ts (added, +190/-0)
extensions/qa-lab/src/gateway-child.ts (modified, +7/-0)
extensions/qa-lab/src/harness-parity.test.ts (added, +144/-0)
extensions/qa-lab/src/harness-parity.ts (added, +415/-0)
extensions/qa-lab/src/jsonl-replay.test.ts (added, +169/-0)
extensions/qa-lab/src/jsonl-replay.ts (added, +270/-0)
extensions/qa-lab/src/multipass.runtime.test.ts (modified, +11/-0)
extensions/qa-lab/src/multipass.runtime.ts (modified, +6/-0)
extensions/qa-lab/src/providers/mock-openai/server.ts (modified, +74/-3)
extensions/qa-lab/src/runtime-parity.test.ts (added, +427/-0)
extensions/qa-lab/src/runtime-parity.ts (added, +1119/-0)
extensions/qa-lab/src/runtime-suite.test.ts (added, +75/-0)
extensions/qa-lab/src/runtime-suite.ts (added, +147/-0)
extensions/qa-lab/src/runtime-tool-fixture.test.ts (added, +156/-0)
extensions/qa-lab/src/runtime-tool-fixture.ts (added, +291/-0)
extensions/qa-lab/src/runtime-tool-metadata.ts (added, +142/-0)
extensions/qa-lab/src/scenario-catalog.test.ts (modified, +10/-0)
extensions/qa-lab/src/scenario-catalog.ts (modified, +4/-0)
extensions/qa-lab/src/scenario-flow-runner.ts (modified, +1/-1)
extensions/qa-lab/src/scenario-runtime-api.test.ts (modified, +1/-0)
extensions/qa-lab/src/scenario-runtime-api.ts (modified, +3/-0)
extensions/qa-lab/src/suite-runtime-flow.ts (modified, +13/-1)
extensions/qa-lab/src/suite-summary.ts (modified, +4/-1)
extensions/qa-lab/src/suite.summary-json.test.ts (modified, +53/-0)
extensions/qa-lab/src/suite.test.ts (modified, +100/-0)
extensions/qa-lab/src/suite.ts (modified, +449/-2)
extensions/qa-lab/src/token-efficiency-report.test.ts (added, +218/-0)
extensions/qa-lab/src/token-efficiency-report.ts (added, +379/-0)
extensions/qa-lab/src/tool-coverage-report.test.ts (added, +288/-0)
extensions/qa-lab/src/tool-coverage-report.ts (added, +285/-0)
extensions/qa-lab/transport-parity-gate.md (added, +66/-0)
extensions/qqbot/src/bridge/tools/remind.test.ts (modified, +1/-1)
extensions/qqbot/src/engine/gateway/outbound-dispatch.test.ts (modified, +1/-1)
extensions/slack/src/monitor/media.test.ts (modified, +3/-3)
extensions/tavily/src/tavily-tools.test.ts (modified, +3/-1)
qa/scenarios/agents/instruction-followthrough-repo-contract.md (modified, +1/-0)
qa/scenarios/agents/subagent-fanout-synthesis.md (modified, +1/-0)
qa/scenarios/agents/subagent-handoff.md (modified, +1/-0)
qa/scenarios/agents/subagent-stale-child-links.md (modified, +1/-0)
qa/scenarios/channels/channel-chat-baseline.md (modified, +1/-0)
qa/scenarios/config/config-restart-capability-flip.md (modified, +1/-0)
qa/scenarios/jsonl-replay/plan-mode-boundaries.jsonl (added, +8/-0)
qa/scenarios/jsonl-replay/recovery-partial-session.jsonl (added, +4/-0)
qa/scenarios/jsonl-replay/repo-triage-tool-loop.jsonl (added, +7/-0)
qa/scenarios/memory/memory-recall.md (modified, +1/-0)
qa/scenarios/memory/thread-memory-isolation.md (modified, +1/-0)
qa/scenarios/models/model-switch-tool-continuity.md (modified, +1/-0)
qa/scenarios/runtime/approval-turn-tool-followthrough.md (modified, +1/-0)
qa/scenarios/runtime/auth-profile-codex-mixed-profiles.md (added, +39/-0)
qa/scenarios/runtime/auth-profile-doctor-migration-safety.md (added, +44/-0)
qa/scenarios/runtime/codex-plugin-cold-install.md (added, +42/-0)
qa/scenarios/runtime/codex-plugin-install-race.md (added, +38/-0)
qa/scenarios/runtime/codex-plugin-pinned-new.md (added, +39/-0)
qa/scenarios/runtime/codex-plugin-pinned-old.md (added, +39/-0)
qa/scenarios/runtime/compaction-retry-mutating-tool.md (modified, +1/-0)
qa/scenarios/runtime/first-hour-20-turn.md (added, +68/-0)
qa/scenarios/runtime/soak-100-turn.md (added, +68/-0)
qa/scenarios/runtime/tools/apply-patch.md (added, +54/-0)
qa/scenarios/runtime/tools/bash.md (added, +55/-0)
qa/scenarios/runtime/tools/edit.md (added, +54/-0)
qa/scenarios/runtime/tools/exec.md (added, +54/-0)
qa/scenarios/runtime/tools/fs-list.md (added, +54/-0)
qa/scenarios/runtime/tools/fs-read.md (added, +54/-0)
qa/scenarios/runtime/tools/fs-write.md (added, +54/-0)
qa/scenarios/runtime/tools/grep.md (added, +54/-0)
qa/scenarios/runtime/tools/image-generate.md (added, +55/-0)
qa/scenarios/runtime/tools/memory-add.md (added, +54/-0)
qa/scenarios/runtime/tools/memory-recall.md (added, +54/-0)
qa/scenarios/runtime/tools/message-tool.md (added, +52/-0)
qa/scenarios/runtime/tools/session-status.md (added, +54/-0)
qa/scenarios/runtime/tools/sessions-spawn.md (added, +54/-0)
qa/scenarios/runtime/tools/skill-invocation.md (added, +54/-0)
qa/scenarios/runtime/tools/tavily-extract.md (added, +53/-0)
qa/scenarios/runtime/tools/tavily-search.md (added, +53/-0)
qa/scenarios/runtime/tools/tts.md (added, +54/-0)
qa/scenarios/runtime/tools/web-fetch.md (added, +54/-0)
qa/scenarios/runtime/tools/web-search.md (added, +54/-0)
qa/scenarios/workspace/source-docs-discovery-report.md (modified, +1/-0)
scripts/deadcode-unused-files.allowlist.mjs (modified, +2/-0)
src/agents/model-runtime-policy.test.ts (added, +91/-0)
src/agents/model-runtime-policy.ts (modified, +16/-0)


Tracking parent: #80171
Depends on: Phase 1 (#80172)

Goal

Stress the codex-as-plugin install / update / version-pinning lifecycle that pash flagged: "codex is a plugin like anything else, so it needs to be downloaded and installed before you can use it. obviously, considering openai use codex harness be default now, this is a source of stress for me, and I want to make sure all the edge cases are covered."

This phase codifies @ai-hpc's manual 4-cell doctor-migration verification plus the additional plugin-lifecycle cells the maintainer thread surfaced.

Scope

Six automated cells, run in mock-openai mode, each completing in under 60s. A live-mode variant is gated to scheduled runs.
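The scope statement implies two small pieces of runner logic: a fixed catalog of six cells, and a gate that keeps every run in mock mode unless live mode is explicitly requested on a scheduled run. Below is a minimal sketch under stated assumptions. `LifecycleCell`, `CELLS`, `selectCellMode`, and `withinBudget` are hypothetical names. Using `GITHUB_EVENT_NAME === "schedule"` as the "scheduled" signal is an assumption. OPENCLAW_LIVE_TEST=1 and the 60s budget come from this issue.

```typescript
// Hypothetical catalog of the Phase 3 cells; ids follow the numbering used under "Cells".
type CellMode = "mock-openai" | "live";
type Env = Readonly<Record<string, string | undefined>>;

interface LifecycleCell {
  id: number;
  name: string;
}

const CELLS: readonly LifecycleCell[] = [
  { id: 1, name: "cold install" },
  { id: 2, name: "oauth-only with mixed profiles" },
  { id: 3, name: "pinned-old codex plugin + new openclaw" },
  { id: 4, name: "pinned-new codex plugin + old openclaw" },
  { id: 5, name: "codex plugin install racing first agent turn" },
  { id: 6, name: "doctor migration safety" },
];

// Per-cell wall-clock budget from the scope statement (under 60s).
const PER_CELL_BUDGET_MS = 60_000;

// Assumption: GitHub Actions sets GITHUB_EVENT_NAME to "schedule" for cron-triggered runs,
// and that is used here as the "scheduled" signal. Live mode needs both signals.
function selectCellMode(env: Env): CellMode {
  const liveRequested = env["OPENCLAW_LIVE_TEST"] === "1";
  const scheduled = env["GITHUB_EVENT_NAME"] === "schedule";
  return liveRequested && scheduled ? "live" : "mock-openai";
}

function withinBudget(elapsedMs: number): boolean {
  return elapsedMs < PER_CELL_BUDGET_MS;
}
```

In a Node runner, `selectCellMode(process.env)` would be evaluated once per run. A PR or release run that only sets OPENCLAW_LIVE_TEST=1 still stays in mock mode.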

Cells

1. Cold install: clean home, no codex plugin, then openclaw doctor --fix from a config that needs codex. Assert: a clear remediation message, the install completes, the retry succeeds, and no spend ($) leaks onto the api-key path.
2. OAuth-only with mixed-profiles: both openai-codex:* and openai:* profiles in auth-profiles.json; assert that codex auth is picked, not the api-key path. This is the residual #78499 case (Codex app-server auth profile "openai:media-api" must belong to provider "openai-codex" or a supported alias).
3. Pinned-old codex plugin + new openclaw: codex plugin pinned to release N-1, openclaw on N; assert that the version mismatch is detected and reported with a clear remediation hint. This sets up the regression coverage pash asked for ("pinning a certain version of the codex harness with a version of openclaw, which is another potential source of bugs").
4. Pinned-new codex plugin + old openclaw: the same axis flipped (codex plugin ahead of openclaw).
5. Codex plugin install racing the first agent turn: concurrent install + agent run; assert that the ordering neither loses tokens nor produces a duplicate response. Uses deterministic ordering primitives, not timing-based assertions.
6. Doctor migration safety (@ai-hpc's 4-cell matrix): codify the four manual cells:
   - oauth-only host (no OPENAI_API_KEY) → openai-codex profile picked, codex harness used
   - mixed-profile (codex OAuth + raw openai api-key) with no pin → openai-codex still picked
   - mixed-profile + agents.defaults.agentRuntime.id="pi" pin → doctor strips pin, codex auto-routes
   - mixed-profile + per-agent agents.list[main].agentRuntime.id="pi" pin → same, doctor strips pin, codex auto-routes
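Cells 2 and 6 reduce to two rules: prefer an openai-codex:* profile over a raw openai:* one, and have doctor strip agentRuntime.id="pi" pins at both the defaults and per-agent level. Below is a sketch of those rules under loud assumptions. The `AuthProfile` and `OpenClawConfig` shapes, `pickCodexAuthProfile`, and `stripPiPins` are hypothetical, and the real auth-profiles.json and config schemas may differ. Supported provider aliases from #78499 are not modeled. The sketch strips pins unconditionally, and this issue does not say whether the real doctor also checks the profile shape first.

```typescript
// Hypothetical shapes; the real auth-profiles.json and config schemas may differ.
interface AuthProfile { id: string } // e.g. "openai-codex:default", "openai:media-api"
interface AgentRuntimePin { id: string }
interface AgentEntry { id: string; agentRuntime?: AgentRuntimePin }
interface OpenClawConfig {
  agents: { defaults?: { agentRuntime?: AgentRuntimePin }; list?: AgentEntry[] };
}

// "openai-codex:default" -> "openai-codex"; ids without a colon are returned whole.
function providerOf(profileId: string): string {
  const i = profileId.indexOf(":");
  return i === -1 ? profileId : profileId.slice(0, i);
}

// Cells 2 and 6: whenever an openai-codex:* profile exists, the codex harness uses it
// and never falls back to a raw openai:* api-key profile.
function pickCodexAuthProfile(profiles: readonly AuthProfile[]): AuthProfile | undefined {
  return profiles.find((p) => providerOf(p.id) === "openai-codex");
}

// Cell 6: strip "pi" runtime pins so codex auto-routes. Returns a new config (input untouched)
// plus the dotted paths that were removed, which a remediation message could list.
function stripPiPins(config: OpenClawConfig): { config: OpenClawConfig; stripped: string[] } {
  const stripped: string[] = [];
  const defaults = config.agents.defaults ? { ...config.agents.defaults } : undefined;
  if (defaults && defaults.agentRuntime?.id === "pi") {
    delete defaults.agentRuntime;
    stripped.push("agents.defaults.agentRuntime.id");
  }
  const list = config.agents.list?.map((agent): AgentEntry => {
    if (agent.agentRuntime?.id !== "pi") return agent;
    stripped.push(`agents.list[${agent.id}].agentRuntime.id`);
    const copy = { ...agent };
    delete copy.agentRuntime;
    return copy;
  });
  return { config: { agents: { ...config.agents, defaults, list } }, stripped };
}
```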

Concrete deliverables

Code

- New extensions/qa-lab/src/codex-plugin-fixture.ts, with these helpers (signatures only):

  export declare function seedCodexPluginAt(version: "missing" | "current" | "head" | string, agentDir: string): Promise<void>;
  export declare function snapshotCodexPluginState(agentDir: string): Promise<{ version?: string; installed: boolean }>;

- New extensions/qa-lab/src/codex-plugin-lifecycle.test.ts: one describe block per cell. Asserted error messages are string-matched so wording regressions are caught.
- New extensions/qa-lab/src/auth-profile-fixture.ts: helpers to seed auth-profiles.json to a known shape (oauth-only, apikey-only, mixed).
- Extend extensions/qa-lab/src/runtime-parity.ts: add a pluginState axis to per-cell capture so the cells above plug into the unified report.
- Extend .github/workflows/openclaw-release-checks.yml: add a qa_lab_codex_lifecycle_release_checks step that runs the six cells.
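The pluginState axis can reuse the `{ version?: string; installed: boolean }` shape that snapshotCodexPluginState returns. Each capture row then carries that snapshot, and the report groups rows by a stable key. Below is a hypothetical sketch. `CellCapture`, `pluginStateKey`, and `groupByPluginState` are illustrative names, not runtime-parity.ts exports, and the version strings in the usage are made up.

```typescript
// Same shape as the planned snapshotCodexPluginState return value.
interface CodexPluginState { version?: string; installed: boolean }

// Hypothetical per-cell capture row extended with the pluginState axis.
interface CellCapture {
  cell: number;
  runtime: "pi" | "codex";
  pluginState: CodexPluginState;
  outcome: "pass" | "fail";
}

// Stable key so rows with the same plugin state share one report column.
function pluginStateKey(state: CodexPluginState): string {
  if (!state.installed) return "missing";
  return `installed@${state.version ?? "unknown"}`;
}

function groupByPluginState(rows: readonly CellCapture[]): Map<string, CellCapture[]> {
  const groups = new Map<string, CellCapture[]>();
  for (const row of rows) {
    const key = pluginStateKey(row.pluginState);
    const bucket = groups.get(key) ?? [];
    bucket.push(row);
    groups.set(key, bucket);
  }
  return groups;
}
```

Keying on the snapshot rather than on the cell name means a cold-install row and a pinned-old row cannot be silently merged when they ran against different plugin states.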

Tests

- Each cell is its own test, deterministic, mock-mode by default.
- Live-mode variant gated to OPENCLAW_LIVE_TEST=1 and the scheduled cron, not on every release.
- Asserted error messages: when cell 3 reports a version mismatch, the assertion is on the literal string emitted (or a regex with high specificity) so any wording drift is caught.
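Cells 3 and 4 need a check that detects the mismatch in either direction and emits a message and remediation hint stable enough to assert literally. Below is a minimal sketch. `checkCodexPin` and `compareVersions` are hypothetical. The rule that any version difference counts as a mismatch is an assumption. The message and hint wording are placeholders, not OpenClaw's actual output. The comparison handles only dotted numeric versions and ignores prerelease tags.

```typescript
// Hypothetical pin check for cells 3 and 4; rule and wording are placeholders.
type PinCheck =
  | { ok: true }
  | { ok: false; direction: "plugin-older" | "plugin-newer"; message: string; remediation: string };

// Compares dotted numeric versions ("1.4.0" vs "1.5"); missing parts count as 0.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

function checkCodexPin(pluginVersion: string, openclawVersion: string): PinCheck {
  const cmp = compareVersions(pluginVersion, openclawVersion);
  if (cmp === 0) return { ok: true };
  const direction = cmp < 0 ? "plugin-older" : "plugin-newer";
  return {
    ok: false,
    direction,
    message: `codex plugin ${pluginVersion} does not match openclaw ${openclawVersion}`,
    remediation: `pin the codex plugin to ${openclawVersion}`,
  };
}
```

A cell 3 test would compare `message` and `remediation` with `===`, or with a tightly anchored regex such as `/^codex plugin \S+ does not match openclaw \S+$/`, so any wording change fails the test.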

Acceptance criteria

- All six cells are implemented and automated, run in mock-openai mode, and complete in under 60s each.
- Failure-mode error messages are asserted by string match so wording regressions are caught.
- The live-mode variant is gated to scheduled runs (OPENCLAW_LIVE_TEST=1), not run on every PR.
- Cell 5 (install race) uses deterministic ordering primitives: no setTimeout- or sleep-based assertions.
- @ai-hpc's manual 4-cell matrix is fully reproduced as automated cells.
- Each cell, when it fails, emits a remediation hint that the test also asserts, so the user-visible remediation doesn't drift.
- pnpm check:test-types and pnpm exec oxlint are clean.
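One way to meet the "no setTimeout / sleep" criterion for cell 5 is to model the install and the first turn as cooperative steps whose interleaving the test chooses explicitly. Every run of a given schedule is then identical. Below is a hypothetical sketch using generators as the ordering primitive. `RaceState`, `installTask`, `turnTask`, and `runSchedule` are illustrative names. Token accounting is not modeled, so "no duplicate response" stands in for both invariants.

```typescript
// Hypothetical model of cell 5: install and first turn as explicitly interleaved steps.
type Task = "install" | "turn";

interface RaceState {
  installed: boolean;
  responses: string[];
  events: string[];
}

function* installTask(s: RaceState): Generator<void, void, void> {
  s.events.push("install:start");
  yield; // the install is in flight; the turn may be stepped here
  s.installed = true;
  s.events.push("install:done");
}

function* turnTask(s: RaceState): Generator<void, void, void> {
  s.events.push("turn:start");
  while (!s.installed) {
    s.events.push("turn:waiting");
    yield; // park until the plugin is ready instead of failing or retrying
  }
  s.responses.push("assistant-reply");
  s.events.push("turn:done");
}

// Advance tasks in the given order, then drain both so every schedule terminates.
function runSchedule(order: readonly Task[]): RaceState {
  const state: RaceState = { installed: false, responses: [], events: [] };
  const tasks: Record<Task, Generator<void, void, void>> = {
    install: installTask(state),
    turn: turnTask(state),
  };
  const done: Record<Task, boolean> = { install: false, turn: false };
  const step = (t: Task): void => {
    if (!done[t]) done[t] = tasks[t].next().done === true;
  };
  order.forEach((t) => step(t));
  while (!done.install) step("install");
  while (!done.turn) step("turn");
  return state;
}
```

A test can enumerate several schedules, including the turn starting before the install, and assert exactly one response each time. The exact event log can also be asserted, because ordering depends only on the schedule and never on the clock.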

Out of scope

- Token efficiency (Phase 4).
- JSONL replay (Phase 5).
- Real-customer transcript ingestion.

References

- Tracking parent: #80171
- Phase 1: #80172
- @ai-hpc's manual matrix: maintainer thread (Yesterday)
- #78499: Codex auth profile selection (cell 2 covers this)
- #78407: original migration bug (cell 6 covers the fixed-on-main paths)
- #79238: most recent runtime-policy fix (the migration safety cells must hold against this surface)
