TLDR

PR #80323 has a beta.5 confidence proof run with zero unknowns in the defined mock/static matrix.

OpenClaw baseline: v2026.5.10-beta.5
PR head validated by workflow: 3336dec6419c9cc9a87dc7cfa6f48118ca2d838e
Remote proof run: https://github.com/electricsheephq/openclaw-local-test/actions/runs/25719383976
Strict confidence report: pass=true, zeroUnknowns=true
Current product-bug verdict: no confirmed Codex runner product bug from the mock proof lanes
Remaining proof gaps: live/OAuth Codex-native lanes, live token efficiency, and scheduled/Testbox soak-100

Why This Issue Exists

This is the maintainer-facing confidence tracker for PR #80323. It ties the beta.5 proof artifacts back to the original RFC (#80171), the corrected tool-defaults harness issue (#80319), and the remaining live/Testbox proof trackers (#80397, #80433).

The goal is not to claim every possible OpenClaw behavior is proven. The goal is stricter: every lane in the defined confidence manifest must either pass or have a classified, artifact-backed verdict. This run achieved that for the mock/static matrix.
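The rule in the paragraph above can be sketched as a small check. This is a minimal illustration, not the harness's actual implementation: the verdict names are taken from the counters in the confidence-report summary (`passed`, `blocked`, `unknown`, `failed`), while the `Lane` shape, the `artifact` field, and the choice to treat any `failed` lane as unacceptable are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lane:
    # Hypothetical record for one lane in the confidence manifest.
    name: str
    verdict: str                    # "passed" | "blocked" | "unknown" | "failed"
    artifact: Optional[str] = None  # hypothetical pointer to backing evidence


def strict_report(lanes):
    """Return (passed, zero_unknowns) for a list of lanes.

    A lane is acceptable if it passed, or if it is blocked AND carries
    an artifact that backs the classification. Unknown or failed lanes
    are never acceptable in this sketch.
    """
    zero_unknowns = all(lane.verdict != "unknown" for lane in lanes)
    acceptable = all(
        lane.verdict == "passed"
        or (lane.verdict == "blocked" and bool(lane.artifact))
        for lane in lanes
    )
    return (zero_unknowns and acceptable, zero_unknowns)
```

Under these assumptions, a blocked lane with no artifact keeps `zero_unknowns` true but still fails the strict check, which is the distinction between "classified" and "classified and artifact-backed".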

Evidence

Remote workflow:

QA Runtime Confidence Proof
run: 25719383976
repo: electricsheephq/openclaw-local-test
target_ref: codex-vs-pi-runtime-parity-tools
expected_sha: 3336dec6419c9cc9a87dc7cfa6f48118ca2d838e
run_soak: false
run_live: false

Remote job results:

Static and targeted QA unit proof: success
- pnpm check:test-types: success
- pnpm lint --threads=8: success
- targeted QA-lab/Codex dynamic-tools tests: success

Mock confidence proof bundle: success
- tool-defaults direct: success
- openclaw-dynamic-tools direct: success
- tool-defaults searchable: success
- first-hour-20 direct: success
- first-hour-20 token report: success
- fault-injection mock: success
- expanded JSONL replay: success
- confidence negative controls: success
- strict confidence report: success

Live confidence proof lanes: skipped by dispatch; classified as environment-blocked in the confidence report.

Downloaded artifact root used for inspection:

/Volumes/LEXAR/Codex/qa-runtime-confidence-artifacts-25719383976/qa-runtime-confidence-mock-3336dec6419c9cc9a87dc7cfa6f48118ca2d838e/

Machine-Readable Results

{
  "tool-defaults-direct": { "total": 20, "passed": 20, "skipped": 0, "failed": 0 },
  "openclaw-dynamic-tools-direct": { "total": 8, "passed": 8, "skipped": 0, "failed": 0 },
  "tool-defaults-searchable": { "total": 20, "passed": 15, "skipped": 5, "failed": 0 },
  "first-hour-20-direct": { "total": 18, "passed": 15, "skipped": 3, "failed": 0 },
  "fault-injection-mock": { "total": 5, "passed": 3, "skipped": 2, "failed": 0 },
  "jsonl-expanded": { "curatedTranscripts": 7, "turnsCompared": 15, "driftedTurns": 0 },
  "confidence-self-test": { "pass": true, "detectedCanaries": "7/7" },
  "confidence-report": {
    "pass": true,
    "zeroUnknowns": true,
    "lanes": 12,
    "passed": 8,
    "blocked": 4,
    "unknown": 0,
    "failed": 0
  }
}
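The counters above can be cross-checked mechanically. The following is a small sketch, assuming only the JSON shape shown above; `check_summary` is a hypothetical helper, not part of the QA-lab tooling. It verifies that each lane's `passed + skipped + failed` equals its `total`, that the report's lane verdicts add up to `lanes`, and that `zeroUnknowns` agrees with `unknown`.

```python
import json

SUMMARY = json.loads("""
{
  "tool-defaults-direct": { "total": 20, "passed": 20, "skipped": 0, "failed": 0 },
  "openclaw-dynamic-tools-direct": { "total": 8, "passed": 8, "skipped": 0, "failed": 0 },
  "tool-defaults-searchable": { "total": 20, "passed": 15, "skipped": 5, "failed": 0 },
  "first-hour-20-direct": { "total": 18, "passed": 15, "skipped": 3, "failed": 0 },
  "fault-injection-mock": { "total": 5, "passed": 3, "skipped": 2, "failed": 0 },
  "jsonl-expanded": { "curatedTranscripts": 7, "turnsCompared": 15, "driftedTurns": 0 },
  "confidence-self-test": { "pass": true, "detectedCanaries": "7/7" },
  "confidence-report": {
    "pass": true, "zeroUnknowns": true,
    "lanes": 12, "passed": 8, "blocked": 4, "unknown": 0, "failed": 0
  }
}
""")

ROW_KEYS = ("total", "passed", "skipped", "failed")


def check_summary(summary):
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    for name, row in summary.items():
        # Only rows that carry all four counters are arithmetic-checked.
        if all(key in row for key in ROW_KEYS):
            if row["passed"] + row["skipped"] + row["failed"] != row["total"]:
                problems.append(f"{name}: counters do not sum to total")
    report = summary["confidence-report"]
    counted = (report["passed"] + report["blocked"]
               + report["unknown"] + report["failed"])
    if counted != report["lanes"]:
        problems.append("confidence-report: verdicts do not sum to lanes")
    if report["zeroUnknowns"] != (report["unknown"] == 0):
        problems.append("confidence-report: zeroUnknowns disagrees with unknown")
    return problems


print(check_summary(SUMMARY))
```

For the numbers in this run, every row sums to its total and 8 passed plus 4 blocked accounts for all 12 lanes.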

Classification

Product Impact If OpenClaw Moved Fully To Codex Today

- **P4 for the old broad tool-defaults claims**: missing duplicate OpenClaw dynamic `read/write/edit/apply_patch/exec/update_plan` exposure is intentional Codex-native ownership, not a Codex product bug.
- **P1 proof gap for live Codex-native behavior**: approval/read/write/compaction rows still need native/live proof before being used as product evidence.
- **P3 proof gap for `soak-100`**: optional long-run coverage needs scheduled/Testbox artifacts, but it is not part of the default maintainer gate.

QA Impact

- **P0 resolved for deterministic mock CI gate**: `tool-defaults` direct, `openclaw-dynamic-tools` direct, and `first-hour-20` direct have 0 hard failures.
- **P1 still open for live proof**: mock token efficiency is labeled `mock-estimate`; real `live-usage` is tracked separately.
- **P2 searchable/deferred mock limitation**: searchable rows are report-only until the mock provider can model deferred Codex tool discovery honestly.
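The split between gating lanes and report-only lanes can be illustrated with a short sketch. It assumes the per-lane counter shape from the Machine-Readable Results; `GATING_LANES` and `gate_passes` are hypothetical names chosen for this example, and the real CI gate may apply additional conditions.

```python
# Lanes the text names as the deterministic mock CI gate.
GATING_LANES = (
    "tool-defaults-direct",
    "openclaw-dynamic-tools-direct",
    "first-hour-20-direct",
)


def gate_passes(results):
    """True when no gating lane reports a hard failure.

    Lanes outside GATING_LANES (for example the searchable lane)
    are treated as report-only and never affect the result.
    """
    return all(results[name]["failed"] == 0 for name in GATING_LANES)


results = {
    "tool-defaults-direct": {"total": 20, "passed": 20, "skipped": 0, "failed": 0},
    "openclaw-dynamic-tools-direct": {"total": 8, "passed": 8, "skipped": 0, "failed": 0},
    "first-hour-20-direct": {"total": 18, "passed": 15, "skipped": 3, "failed": 0},
    "tool-defaults-searchable": {"total": 20, "passed": 15, "skipped": 5, "failed": 0},
}
print(gate_passes(results))
```

Skipped rows do not count as hard failures here, which matches how `first-hour-20` direct is reported with 3 skipped rows and 0 failures.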

Important Boundaries

- Mock-only failures are not Codex runner product bugs unless reproduced through native/live Codex behavior or source-level proof independent of the mock provider.
- Codex-native workspace tools remain native-owned and must not be duplicated as OpenClaw dynamic tools in production.
- OpenClaw integration tools are still tested through the dynamic `openclaw` bridge and passed the direct mock lane.
- Token efficiency in this proof is `mock-estimate`, not live usage.
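The first boundary above reads as a triage rule, which can be sketched as follows. The function name and the returned labels are hypothetical and only illustrate the decision order; they are not taken from the confidence report.

```python
def classify_failure(seen_in_mock, reproduced_live, source_level_proof):
    """Triage a failing row according to the mock-only boundary.

    A failure only counts against the Codex runner when it is backed by
    native/live reproduction or by source-level proof that does not
    depend on the mock provider.
    """
    if reproduced_live or source_level_proof:
        return "product-bug-candidate"  # hypothetical label
    if seen_in_mock:
        return "mock-only"              # hypothetical label
    return "no-failure"                 # hypothetical label


print(classify_failure(seen_in_mock=True, reproduced_live=False,
                       source_level_proof=False))
```

Checking the live and source-level evidence first means a mock failure can never be promoted to a product bug on its own.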

Linked Work

RFC/tracking: #80171
PR: #80323
Tool-defaults correction: #80319
Live token/Testbox proof: #80397
Scheduled/Testbox soak: #80433
Token live zero-usage guard: #80411
Stale first-hour-20 tracker updated by this proof: #80434


openclaw - 💡(How to fix) Fix [QA-lab] Codex runtime parity beta.5 confidence proof for PR #80323
