openclaw - ✅(Solved) Fix [QA harness] Mock approval followthrough emits undeclared read for Codex app-server lane [2 pull requests, 5 comments, 2 participants]

openclaw, 2026-05-10 11:33:06

GitHub stats

Issue: openclaw/openclaw#80236 (fetched 2026-05-11 03:17:11)
Author: 100yenadmin
Participants: 100yenadmin, clawsweeper[bot]
Timeline (top): cross-referenced ×6, commented ×5, renamed ×1

Error Message

Codex should execute the read and surface the same result shape as Pi for this scenario, or the runtime should fail with a clear structured tool error before final-answer synthesis. It should not synthesize a successful-looking answer from the tool result "unsupported call: read".

Root Cause

The Codex default-runtime flip needs tool-level parity against the existing Pi path. During the Phase 1 harness proof run, the new runtime-pair lane caught a deterministic drift in an existing agentic scenario: both runtimes planned the same read call, but Codex produced an unsupported-call result instead of the file contents Pi received.

This is exactly the class of regression the runtime-parity harness is meant to make visible before Codex becomes the default OpenAI runtime.

Fix Action

Fix / Workaround

The original issue overclaimed this as a P1 Codex runtime problem. A higher-confidence code-path audit shows the mock provider emits a provider-level read function call from prompt text even when the Codex app-server lane does not declare read as an OpenClaw dynamic tool. Codex intentionally owns workspace tools such as read/write/edit/exec/apply_patch natively rather than exposing them through the OpenClaw dynamic-tool bridge.

Codex dynamic tools intentionally exclude read, write, edit, apply_patch, exec, process, and update_plan.
The mock provider can still emit read based on prompt text.
Runtime parity previously preferred /debug/requests provider-plan snapshots over transcript-derived tool events.

PR fix notes

PR #80238: test(qa-lab): add Codex vs Pi runtime parity harness

Repository: openclaw/openclaw
Author: 100yenadmin
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/80238

Description (problem / solution / changelog)

Why

Codex is moving toward the default OpenAI runtime, but the existing release parity checks compare model behavior, not runtime behavior. That leaves a known blind spot: the same scenario and same model can pass under Pi while drifting under Codex at the tool layer.

This adds the Phase 1 runtime axis from #80172 so qa-lab can run each scenario once as pi and once as codex, capture per-runtime cells, and classify drift at the tool/result/structure/failure level instead of only reporting a session-level pass/fail.

Part of #80171. Closes #80172. Detected follow-up drift: #80236.

What Changed

Adds the private-QA-only OPENCLAW_QA_FORCE_RUNTIME=pi|codex override in resolveModelRuntimePolicy, gated by OPENCLAW_BUILD_PRIVATE_QA=1.
Adds extensions/qa-lab/src/runtime-parity.ts with the runtime cell shape, assistant-message usage capture, provider-side mock /debug/requests tool capture, and six-bucket drift classifier.
Adds qa suite --runtime-pair pi,codex and a runtime-axis report command, qa parity-report --runtime-axis --summary <path>.
Extends suite summaries and runtime parity Markdown reporting with per-runtime cells and aggregate drift counts.
Wires qa_lab_runtime_parity_release_checks into openclaw-release-checks.yml next to the existing model-axis parity lane.
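
As a rough illustration of the first item, an environment override gated on a private-QA build flag could look like the sketch below. Only the two variable names and the pi|codex values come from the PR description. The function name resolveRuntimeWithQaOverride, the Env type, and the fallback behavior are assumptions, and the real logic inside resolveModelRuntimePolicy may differ.

```typescript
// Hypothetical sketch; not the actual resolveModelRuntimePolicy implementation.
type Runtime = "pi" | "codex";
type Env = Record<string, string | undefined>;

// Type guard so an arbitrary string narrows to a known runtime.
function isRuntime(value: string | undefined): value is Runtime {
  return value === "pi" || value === "codex";
}

// Returns the forced runtime only when the private-QA build flag is set
// and the override holds a recognized value; otherwise keeps the default.
function resolveRuntimeWithQaOverride(env: Env, defaultRuntime: Runtime): Runtime {
  if (env["OPENCLAW_BUILD_PRIVATE_QA"] !== "1") {
    return defaultRuntime; // override is ignored outside private-QA builds
  }
  const forced = env["OPENCLAW_QA_FORCE_RUNTIME"];
  return isRuntime(forced) ? forced : defaultRuntime;
}
```

Passing the environment in as a plain object keeps the sketch testable without touching process.env.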

Real-Behavior Proof

The harness caught a real drift in approval-turn-tool-followthrough:

OPENCLAW_BUILD_PRIVATE_QA=1 \
OPENCLAW_ENABLE_PRIVATE_QA_CLI=1 \
OPENCLAW_QA_SUITE_PROGRESS=1 \
pnpm openclaw qa suite \
  --provider-mode mock-openai \
  --scenario approval-turn-tool-followthrough \
  --concurrency 1 \
  --runtime-pair pi,codex \
  --output-dir .artifacts/qa-e2e/runtime-parity-proof-approval-remap5

The suite exits nonzero because drift is present, but it writes the runtime summary. The follow-up report command:

OPENCLAW_BUILD_PRIVATE_QA=1 \
OPENCLAW_ENABLE_PRIVATE_QA_CLI=1 \
pnpm openclaw qa parity-report \
  --repo-root . \
  --runtime-axis \
  --summary .artifacts/qa-e2e/runtime-parity-proof-approval-remap5/qa-suite-summary.json \
  --output-dir .artifacts/qa-e2e/runtime-parity-proof-approval-remap5-report

Observed report excerpt:

| Tool-result-shape drift | 1 |

- Approval turn tool followthrough drift=tool-result-shape (tool result 1 differs (read)).

pi: pass (1 tool calls, 256 tokens)
codex: fail (1 tool calls, 176 tokens)

The captured cells show both runtimes planned read with the same args hash, while Codex returned unsupported call: read. I filed that runtime bug as #80236 instead of hiding it in the harness PR.
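
The comparison described above (same planned call, same args hash, different result) can be sketched as a small classifier over two runtime cells. This is an illustration only: the RuntimeCell and ToolCallRecord shapes are hypothetical, and of the bucket labels only tool-result-shape appears in the report excerpt. The actual six-bucket classifier in extensions/qa-lab/src/runtime-parity.ts may be structured differently.

```typescript
// Hypothetical shapes; the real cell definition may differ.
interface ToolCallRecord {
  name: string;
  argsHash: string;
  resultHash: string;
}

interface RuntimeCell {
  runtime: "pi" | "codex";
  status: "pass" | "fail";
  toolCalls: ToolCallRecord[];
}

// Only "tool-result-shape" is taken from the report; the rest are illustrative.
type Drift = "none" | "tool-call-shape" | "tool-result-shape" | "failure-mode";

function classifyDrift(a: RuntimeCell, b: RuntimeCell): Drift {
  if (a.toolCalls.length !== b.toolCalls.length) return "tool-call-shape";
  for (let i = 0; i < a.toolCalls.length; i++) {
    const left = a.toolCalls[i];
    const right = b.toolCalls[i];
    if (left === undefined || right === undefined) return "tool-call-shape";
    // Different tool or different arguments: the plans themselves diverged.
    if (left.name !== right.name || left.argsHash !== right.argsHash) {
      return "tool-call-shape";
    }
    // Same call, different result: the case reported for read.
    if (left.resultHash !== right.resultHash) return "tool-result-shape";
  }
  return a.status === b.status ? "none" : "failure-mode";
}

// Placeholder hashes, not values from the real run.
const piCell: RuntimeCell = {
  runtime: "pi",
  status: "pass",
  toolCalls: [{ name: "read", argsHash: "args-1", resultHash: "result-pi" }],
};
const codexCell: RuntimeCell = {
  runtime: "codex",
  status: "fail",
  toolCalls: [{ name: "read", argsHash: "args-1", resultHash: "result-codex" }],
};
const observedDrift = classifyDrift(piCell, codexCell);
```

Checking result hashes before the pass/fail status means a cell pair is attributed to the earliest point of divergence rather than to the downstream failure.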

Verification

pnpm exec vitest run --config test/vitest/vitest.extension-qa.config.ts \
  extensions/qa-lab/src/runtime-parity.test.ts \
  extensions/qa-lab/src/suite.summary-json.test.ts \
  extensions/qa-lab/src/agentic-parity-report.test.ts \
  extensions/qa-lab/src/cli.runtime.test.ts \
  extensions/qa-lab/src/multipass.runtime.test.ts \
  extensions/qa-lab/src/suite.test.ts

pnpm exec vitest run --config test/vitest/vitest.agents.config.ts \
  src/agents/model-runtime-policy.test.ts

pnpm tsgo:core
pnpm tsgo:core:test
pnpm tsgo:extensions:test
pnpm check:test-types
pnpm exec oxlint --type-aware --tsconfig config/tsconfig/oxlint.json --allow eslint/no-underscore-dangle \
  extensions/qa-lab/src/runtime-parity.ts \
  extensions/qa-lab/src/runtime-parity.test.ts \
  extensions/qa-lab/src/suite.ts \
  extensions/qa-lab/src/suite.test.ts \
  extensions/qa-lab/src/suite-summary.ts \
  extensions/qa-lab/src/suite.summary-json.test.ts \
  extensions/qa-lab/src/agentic-parity-report.ts \
  extensions/qa-lab/src/agentic-parity-report.test.ts \
  extensions/qa-lab/src/cli.ts \
  extensions/qa-lab/src/cli.runtime.ts \
  extensions/qa-lab/src/cli.runtime.test.ts \
  extensions/qa-lab/src/multipass.runtime.ts \
  extensions/qa-lab/src/multipass.runtime.test.ts \
  extensions/qa-lab/src/gateway-child.ts \
  src/agents/model-runtime-policy.ts \
  src/agents/model-runtime-policy.test.ts

Non-Goals

Does not add the Phase 2 per-tool fixture set yet.
Does not add Phase 3 Codex plugin lifecycle cells yet.
Does not add Phase 4 token-efficiency reporting beyond capturing per-cell assistant-message usage.
Does not add Phase 5 JSONL replay yet.

Changed files

.github/workflows/openclaw-release-checks.yml (modified, +73/-0)
extensions/qa-lab/src/agentic-parity-report.test.ts (modified, +108/-0)
extensions/qa-lab/src/agentic-parity-report.ts (modified, +205/-0)
extensions/qa-lab/src/cli.runtime.test.ts (modified, +109/-0)
extensions/qa-lab/src/cli.runtime.ts (modified, +62/-2)
extensions/qa-lab/src/cli.ts (modified, +17/-7)
extensions/qa-lab/src/gateway-child.ts (modified, +7/-0)
extensions/qa-lab/src/multipass.runtime.test.ts (modified, +11/-0)
extensions/qa-lab/src/multipass.runtime.ts (modified, +6/-0)
extensions/qa-lab/src/runtime-parity.test.ts (added, +313/-0)
extensions/qa-lab/src/runtime-parity.ts (added, +899/-0)
extensions/qa-lab/src/suite-summary.ts (modified, +3/-0)
extensions/qa-lab/src/suite.summary-json.test.ts (modified, +53/-0)
extensions/qa-lab/src/suite.test.ts (modified, +47/-0)
extensions/qa-lab/src/suite.ts (modified, +372/-0)
src/agents/model-runtime-policy.test.ts (added, +91/-0)
src/agents/model-runtime-policy.ts (modified, +16/-0)

PR #80323: [qa-lab] Complete Codex vs Pi runtime parity harness phases 2-5

Repository: openclaw/openclaw
Author: 100yenadmin
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/80323

Description (problem / solution / changelog)

Summary

Adds the Codex-vs-Pi runtime parity QA harness across extensions/qa-lab, including runtime-pair execution, first-hour/depth suite selectors, harness-prompt parity, token-efficiency reporting, tool-default fixtures, JSONL replay scaffolding, and release-check wiring.

This update also corrects the tool-defaults mock lane so the harness matches Codex app-server architecture:

Codex-native workspace tools (read, write, edit, apply_patch, exec, process, update_plan) are no longer expected to appear as duplicate OpenClaw dynamic tools.
OpenClaw integration tools (image_generate, sessions, web, etc.) remain dynamic-tool parity rows and are tracked separately from Codex-native behavior rows.
Optional/profile/plugin-dependent tools stay report-only unless explicitly enabled.
Mock provider planned tool calls are captured as provider-plan diagnostics, not as runtime transcript tool evidence.
Tool coverage reports now show bucket, expected layer, required/report-only status, product impact, QA impact, and action.
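
One way to picture the split described in this list is a lookup that assigns each tool a bucket and a required or report-only status. The sketch below is an assumption-laden illustration: the Codex-native names are taken from the list above, the dynamic entries reuse the examples given there (which may be categories rather than exact tool names), and the bucket labels, ToolCoverageRow shape, and classifyTool function are hypothetical.

```typescript
type ToolBucket =
  | "codex-native-workspace"
  | "openclaw-dynamic-integration"
  | "optional";

// Known tools and their buckets; anything not listed is treated as optional.
const KNOWN_TOOL_BUCKETS: Record<string, ToolBucket> = {
  read: "codex-native-workspace",
  write: "codex-native-workspace",
  edit: "codex-native-workspace",
  apply_patch: "codex-native-workspace",
  exec: "codex-native-workspace",
  process: "codex-native-workspace",
  update_plan: "codex-native-workspace",
  image_generate: "openclaw-dynamic-integration",
  sessions: "openclaw-dynamic-integration",
  web: "openclaw-dynamic-integration",
};

interface ToolCoverageRow {
  tool: string;
  bucket: ToolBucket;
  status: "required" | "report-only";
}

// Optional tools stay report-only unless the caller explicitly enables them.
function classifyTool(tool: string, explicitlyEnabled: string[] = []): ToolCoverageRow {
  const found = Object.prototype.hasOwnProperty.call(KNOWN_TOOL_BUCKETS, tool)
    ? KNOWN_TOOL_BUCKETS[tool]
    : undefined;
  const bucket: ToolBucket = found ?? "optional";
  const required = bucket !== "optional" || explicitlyEnabled.indexOf(tool) !== -1;
  return { tool, bucket, status: required ? "required" : "report-only" };
}
```

With this shape, a Codex-native tool such as read is still a required row, but it is checked against native behavior instead of being expected in the dynamic-tool list.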

Why

OpenClaw needs a maintainer-runnable gate that compares the same scenario/model under Pi and Codex before Codex becomes the default runtime. The gate must surface real runtime drift without turning mock-provider limitations or intentional Codex-native tool ownership into production bug reports.

Verification

Passing targeted/current-scope checks:

pnpm test extensions/qa-lab/src/runtime-tool-fixture.test.ts extensions/qa-lab/src/runtime-parity.test.ts extensions/qa-lab/src/tool-coverage-report.test.ts extensions/qa-lab/src/runtime-suite.test.ts extensions/qa-lab/src/suite.test.ts extensions/qa-lab/src/scenario-catalog.test.ts extensions/qa-lab/src/cli.runtime.test.ts extensions/qa-lab/src/cli.test.ts
pnpm tsgo:extensions:test
pnpm check:test-types
git diff --check

Real Behavior Proof

Behavior or issue addressed: Corrects the runtime parity tool-defaults harness so Codex-native workspace tools are no longer falsely required as duplicate OpenClaw dynamic tools, while OpenClaw dynamic integration rows remain visible and tracked.
Real environment tested: Local OpenClaw checkout at /Volumes/LEXAR/repos/openclaw-1 on branch codex-vs-pi-runtime-parity-tools, running the real pnpm openclaw qa CLI against the embedded gateway and mock OpenAI provider after this patch.
Exact steps or command run after this patch:

OPENCLAW_BUILD_PRIVATE_QA=1 pnpm openclaw qa suite --repo-root . --provider-mode mock-openai --runtime-suite tool-defaults --runtime-pair pi,codex --output-dir .artifacts/qa-e2e/runtime-tools-correction
pnpm openclaw qa tool-coverage --repo-root . --summary .artifacts/qa-e2e/runtime-tools-correction/qa-suite-summary.json --runtime-pair pi,codex --output .artifacts/qa-e2e/runtime-tools-correction/qa-tool-coverage-report.md
OPENCLAW_BUILD_PRIVATE_QA=1 pnpm openclaw qa suite --repo-root . --provider-mode mock-openai --runtime-suite openclaw-dynamic-tools --runtime-pair pi,codex --output-dir .artifacts/qa-e2e/openclaw-dynamic-tools-correction
pnpm openclaw qa parity-report --repo-root . --runtime-axis --summary .artifacts/qa-e2e/runtime-tools-correction/qa-suite-summary.json --output-dir .artifacts/qa-e2e/runtime-tools-correction/parity --token-efficiency

Evidence after fix: Terminal output produced these real local artifacts: .artifacts/qa-e2e/runtime-tools-correction/qa-suite-summary.json, .artifacts/qa-e2e/runtime-tools-correction/qa-suite-report.md, .artifacts/qa-e2e/runtime-tools-correction/qa-tool-coverage-report.md, .artifacts/qa-e2e/openclaw-dynamic-tools-correction/qa-suite-summary.json, and .artifacts/qa-e2e/runtime-tools-correction/parity/qa-runtime-token-efficiency-report.md.
Observed result after fix: tool-defaults completed with 20 scenarios, 15 pass, 5 report-only skip, 0 fail. Tool coverage verdict was pass with 13 required tools, 8 Codex-native workspace tools, 5 OpenClaw dynamic integration tools, 7 optional/profile/plugin tools, and 0 failing tools. The focused openclaw-dynamic-tools suite completed with 5 report-only rows tracked under #80319. Token efficiency report verdict was pass with usage source mock-estimate.
What was not tested: Live frontier token-efficiency proof was not completed because local direct OpenAI auth is missing; optional scheduled/Testbox soak-100 proof was not completed; broad first-hour-20 remains red and is tracked in #80434.

Known Broad/Latest Blockers

The first first-hour-20 attempt hit a pre-suite tsdown SIGSEGV; the retry reached QA.
OPENCLAW_BUILD_PRIVATE_QA=1 pnpm openclaw qa suite --repo-root . --provider-mode mock-openai --runtime-suite first-hour-20 --runtime-pair pi,codex --output-dir .artifacts/qa-e2e/first-hour-20-correction-retry is not green: 18 total, 6 pass, 12 fail; tracked in #80434.
pnpm check fails unrelated Discord lint: #80428.
pnpm test fails unrelated agents-core / ACPx / Mattermost shards: #80429, #80430, #80431, #67784.
Live token-efficiency proof path renders artifacts, but local direct OpenAI auth is missing so the attempted live run is not valid proof; tracked in #80175.
Optional soak-100 exists but is not scheduled/Testbox-wired; tracked in #80433.

Linked Issues

Umbrella/spec: #80171

Phase issues: #80172, #80173, #80174, #80175, #80176

Harness correction issues: #80236, #80312, #80319, #80320; #80321 is closed as fixed by this PR branch.

Fresh broad-rerun follow-ups: #80428, #80429, #80430, #80431, #80433, #80434, #67784

Changed files

.github/workflows/openclaw-release-checks.yml (modified, +115/-0)
.github/workflows/qa-live-transports-convex.yml (modified, +77/-0)
apps/shared/OpenClawKit/Sources/OpenClawProtocol/GatewayModels.swift (modified, +4/-0)
extensions/codex/src/app-server/schema-normalization-runtime-contract.test.ts (modified, +9/-4)
extensions/lmstudio/src/models.test.ts (modified, +1/-1)
extensions/qa-lab/src/agentic-parity-report.test.ts (modified, +120/-0)
extensions/qa-lab/src/agentic-parity-report.ts (modified, +218/-0)
extensions/qa-lab/src/auth-profile-fixture.ts (added, +177/-0)
extensions/qa-lab/src/cli.runtime.test.ts (modified, +282/-0)
extensions/qa-lab/src/cli.runtime.ts (modified, +416/-3)
extensions/qa-lab/src/cli.ts (modified, +175/-7)
extensions/qa-lab/src/codex-plugin-fixture.ts (added, +282/-0)
extensions/qa-lab/src/codex-plugin-lifecycle.test.ts (added, +190/-0)
extensions/qa-lab/src/gateway-child.ts (modified, +7/-0)
extensions/qa-lab/src/harness-parity.test.ts (added, +144/-0)
extensions/qa-lab/src/harness-parity.ts (added, +415/-0)
extensions/qa-lab/src/jsonl-replay.test.ts (added, +169/-0)
extensions/qa-lab/src/jsonl-replay.ts (added, +270/-0)
extensions/qa-lab/src/multipass.runtime.test.ts (modified, +11/-0)
extensions/qa-lab/src/multipass.runtime.ts (modified, +6/-0)
extensions/qa-lab/src/providers/mock-openai/server.ts (modified, +74/-3)
extensions/qa-lab/src/runtime-parity.test.ts (added, +427/-0)
extensions/qa-lab/src/runtime-parity.ts (added, +1119/-0)
extensions/qa-lab/src/runtime-suite.test.ts (added, +75/-0)
extensions/qa-lab/src/runtime-suite.ts (added, +147/-0)
extensions/qa-lab/src/runtime-tool-fixture.test.ts (added, +156/-0)
extensions/qa-lab/src/runtime-tool-fixture.ts (added, +291/-0)
extensions/qa-lab/src/runtime-tool-metadata.ts (added, +142/-0)
extensions/qa-lab/src/scenario-catalog.test.ts (modified, +10/-0)
extensions/qa-lab/src/scenario-catalog.ts (modified, +4/-0)
extensions/qa-lab/src/scenario-flow-runner.ts (modified, +1/-1)
extensions/qa-lab/src/scenario-runtime-api.test.ts (modified, +1/-0)
extensions/qa-lab/src/scenario-runtime-api.ts (modified, +3/-0)
extensions/qa-lab/src/suite-runtime-flow.ts (modified, +13/-1)
extensions/qa-lab/src/suite-summary.ts (modified, +4/-1)
extensions/qa-lab/src/suite.summary-json.test.ts (modified, +53/-0)
extensions/qa-lab/src/suite.test.ts (modified, +100/-0)
extensions/qa-lab/src/suite.ts (modified, +449/-2)
extensions/qa-lab/src/token-efficiency-report.test.ts (added, +218/-0)
extensions/qa-lab/src/token-efficiency-report.ts (added, +379/-0)
extensions/qa-lab/src/tool-coverage-report.test.ts (added, +288/-0)
extensions/qa-lab/src/tool-coverage-report.ts (added, +285/-0)
extensions/qa-lab/transport-parity-gate.md (added, +66/-0)
extensions/qqbot/src/bridge/tools/remind.test.ts (modified, +1/-1)
extensions/qqbot/src/engine/gateway/outbound-dispatch.test.ts (modified, +1/-1)
extensions/slack/src/monitor/media.test.ts (modified, +3/-3)
extensions/tavily/src/tavily-tools.test.ts (modified, +3/-1)
qa/scenarios/agents/instruction-followthrough-repo-contract.md (modified, +1/-0)
qa/scenarios/agents/subagent-fanout-synthesis.md (modified, +1/-0)
qa/scenarios/agents/subagent-handoff.md (modified, +1/-0)
qa/scenarios/agents/subagent-stale-child-links.md (modified, +1/-0)
qa/scenarios/channels/channel-chat-baseline.md (modified, +1/-0)
qa/scenarios/config/config-restart-capability-flip.md (modified, +1/-0)
qa/scenarios/jsonl-replay/plan-mode-boundaries.jsonl (added, +8/-0)
qa/scenarios/jsonl-replay/recovery-partial-session.jsonl (added, +4/-0)
qa/scenarios/jsonl-replay/repo-triage-tool-loop.jsonl (added, +7/-0)
qa/scenarios/memory/memory-recall.md (modified, +1/-0)
qa/scenarios/memory/thread-memory-isolation.md (modified, +1/-0)
qa/scenarios/models/model-switch-tool-continuity.md (modified, +1/-0)
qa/scenarios/runtime/approval-turn-tool-followthrough.md (modified, +1/-0)
qa/scenarios/runtime/auth-profile-codex-mixed-profiles.md (added, +39/-0)
qa/scenarios/runtime/auth-profile-doctor-migration-safety.md (added, +44/-0)
qa/scenarios/runtime/codex-plugin-cold-install.md (added, +42/-0)
qa/scenarios/runtime/codex-plugin-install-race.md (added, +38/-0)
qa/scenarios/runtime/codex-plugin-pinned-new.md (added, +39/-0)
qa/scenarios/runtime/codex-plugin-pinned-old.md (added, +39/-0)
qa/scenarios/runtime/compaction-retry-mutating-tool.md (modified, +1/-0)
qa/scenarios/runtime/first-hour-20-turn.md (added, +68/-0)
qa/scenarios/runtime/soak-100-turn.md (added, +68/-0)
qa/scenarios/runtime/tools/apply-patch.md (added, +54/-0)
qa/scenarios/runtime/tools/bash.md (added, +55/-0)
qa/scenarios/runtime/tools/edit.md (added, +54/-0)
qa/scenarios/runtime/tools/exec.md (added, +54/-0)
qa/scenarios/runtime/tools/fs-list.md (added, +54/-0)
qa/scenarios/runtime/tools/fs-read.md (added, +54/-0)
qa/scenarios/runtime/tools/fs-write.md (added, +54/-0)
qa/scenarios/runtime/tools/grep.md (added, +54/-0)
qa/scenarios/runtime/tools/image-generate.md (added, +55/-0)
qa/scenarios/runtime/tools/memory-add.md (added, +54/-0)
qa/scenarios/runtime/tools/memory-recall.md (added, +54/-0)
qa/scenarios/runtime/tools/message-tool.md (added, +52/-0)
qa/scenarios/runtime/tools/session-status.md (added, +54/-0)
qa/scenarios/runtime/tools/sessions-spawn.md (added, +54/-0)
qa/scenarios/runtime/tools/skill-invocation.md (added, +54/-0)
qa/scenarios/runtime/tools/tavily-extract.md (added, +53/-0)
qa/scenarios/runtime/tools/tavily-search.md (added, +53/-0)
qa/scenarios/runtime/tools/tts.md (added, +54/-0)
qa/scenarios/runtime/tools/web-fetch.md (added, +54/-0)
qa/scenarios/runtime/tools/web-search.md (added, +54/-0)
qa/scenarios/workspace/source-docs-discovery-report.md (modified, +1/-0)
scripts/deadcode-unused-files.allowlist.mjs (modified, +2/-0)
src/agents/model-runtime-policy.test.ts (added, +91/-0)
src/agents/model-runtime-policy.ts (modified, +16/-0)

Correction TLDR

Status: harness/mock-provider artifact, not a proven user-facing Codex app-server bug.

What actually breaks: the QA parity harness is comparing a malformed mock-provider plan against the Codex app-server lane. This is not enough evidence that real users lose approval-followthrough reads.

Impact if OpenClaw moved fully to Codex today: P4 until live/native proof says otherwise. The remaining risk is harness fidelity and malformed mock-provider robustness, not a demonstrated production approval-read regression.

Correct Fix

Gate mock read planning on declared/available tools, or model Codex-native read through the real Codex app-server native tool protocol.
Keep mock provider-plan diagnostics separate from runtime transcript/tool-call evidence.
Reopen/escalate as a product bug only if a live/native Codex run shows approved reads fail outside this mock contract.
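
The first option, gating mock read planning on declared tools, can be sketched roughly as follows. Everything here is hypothetical: the MockRequest and PlannedToolCall shapes, the wantsRead prompt heuristic, and the planToolCall function are stand-ins, not the code in extensions/qa-lab/src/providers/mock-openai/server.ts.

```typescript
// Hypothetical request shape: prompt text plus the tools the lane declared.
interface MockRequest {
  promptText: string;
  declaredTools: string[];
}

interface PlannedToolCall {
  name: string;
  args: { path: string };
}

// Illustrative heuristic: the prompt text asks for a read.
function wantsRead(promptText: string): boolean {
  return /\bread\b/i.test(promptText);
}

function planToolCall(request: MockRequest): PlannedToolCall | null {
  if (!wantsRead(request.promptText)) return null;
  // Gate: never emit a function call the lane did not declare.
  if (request.declaredTools.indexOf("read") === -1) return null;
  return { name: "read", args: { path: "QA_KICKOFF_TASK.md" } };
}

// A Pi-like lane declares read; a Codex app-server lane does not,
// because Codex owns workspace tools natively.
const piPlan = planToolCall({
  promptText: "Please read QA_KICKOFF_TASK.md",
  declaredTools: ["read", "write"],
});
const codexPlan = planToolCall({
  promptText: "Please read QA_KICKOFF_TASK.md",
  declaredTools: ["image_generate"],
});
```

Under this gate the mock no longer produces an undeclared read for the Codex lane, so an "unsupported call: read" result cannot come from the mock plan alone.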

Evidence From Re-audit

Codex dynamic tools intentionally exclude read, write, edit, apply_patch, exec, process, and update_plan.
The mock provider can still emit read based on prompt text.
Runtime parity previously preferred /debug/requests provider-plan snapshots over transcript-derived tool events.
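
The last point concerns which evidence feeds the comparison. A minimal sketch of keeping the two channels apart, assuming a hypothetical ToolEvidence record tagged with its source, might look like this. The type names and the splitEvidence function are illustrative and are not taken from the harness.

```typescript
type EvidenceSource = "transcript" | "provider-plan";

// Hypothetical record: one observed tool name plus where it was observed.
interface ToolEvidence {
  tool: string;
  source: EvidenceSource;
}

interface SplitEvidence {
  runtimeToolCalls: string[]; // what the runtime actually executed
  providerPlanDiagnostics: string[]; // what the mock provider planned
}

// Only transcript-derived events count as runtime tool evidence;
// provider-plan snapshots are kept, but as diagnostics.
function splitEvidence(records: ToolEvidence[]): SplitEvidence {
  return {
    runtimeToolCalls: records
      .filter((r) => r.source === "transcript")
      .map((r) => r.tool),
    providerPlanDiagnostics: records
      .filter((r) => r.source === "provider-plan")
      .map((r) => r.tool),
  };
}
```

With this separation, a read that appears only in a provider-plan snapshot shows up as a diagnostic and does not count as a runtime tool call when drift is classified.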

Superseded Original Report

Parent: #80171
Detected by: #80172 (Phase 1 runtime-parity harness work)

Why this matters

This is exactly the class of regression the runtime-parity harness is meant to make visible before Codex becomes the default OpenAI runtime.

Reproduction

From the Phase 1 branch:

OPENCLAW_BUILD_PRIVATE_QA=1 \
OPENCLAW_ENABLE_PRIVATE_QA_CLI=1 \
OPENCLAW_QA_SUITE_PROGRESS=1 \
pnpm openclaw qa suite \
  --provider-mode mock-openai \
  --scenario approval-turn-tool-followthrough \
  --concurrency 1 \
  --runtime-pair pi,codex \
  --output-dir .artifacts/qa-e2e/runtime-parity-proof-approval-remap5

OPENCLAW_BUILD_PRIVATE_QA=1 \
OPENCLAW_ENABLE_PRIVATE_QA_CLI=1 \
pnpm openclaw qa parity-report \
  --repo-root . \
  --runtime-axis \
  --summary .artifacts/qa-e2e/runtime-parity-proof-approval-remap5/qa-suite-summary.json \
  --output-dir .artifacts/qa-e2e/runtime-parity-proof-approval-remap5-report

Observed

Scenario: approval-turn-tool-followthrough
Drift class: tool-result-shape
Pi cell: plans read, args hash 462521a229a053d20c4c8121cecce65e885c7d2b0f94347c1d4922445a701263, receives the QA_KICKOFF_TASK.md mission text, and passes.
Codex cell: plans the same read with the same args hash, but the provider-side result hash differs and the final assistant text is "Protocol note: I reviewed the requested material. Evidence snippet: unsupported call: read."
Codex then times out downstream, but the actionable difference is the tool result shape.

Expected
