openclaw: [Codex×Pi parity] 100-turn soak surfaces structural transcript drift

GitHub stats

openclaw/openclaw#80395 (fetched 2026-05-11 03:15:11)

  • Comments: 2
  • Participants: 2
  • Timeline events: 5 (cross-referenced ×3, commented ×2)
  • Reactions: 2


Parent: #80171
Related PR: #80323
Related plugin wrapper issue: #80365

Why this issue exists

The parity harness expansion now exposes an optional soak-100 lane for long-run, first-hour-style behavior. A local plugin-backed mock run completed successfully but showed structural transcript drift between the Pi and Codex runtime paths. This should be tracked separately from the required first-hour-20 maintainer gate: the 100-turn lane is intentionally optional/Testbox/scheduled, but the drift is still a real parity signal.

Command run

OPENCLAW_QA_SUITE_PROGRESS=1 OPENCLAW_BUILD_PRIVATE_QA=1 OPENCLAW_ENABLE_PRIVATE_QA_CLI=1 \
  node scripts/run-openclaw-parity.mjs first-hour \
    --openclaw-root /Volumes/LEXAR/repos/openclaw-1 \
    --provider-mode mock-openai \
    --concurrency 1 \
    --runtime-suite soak-100 \
    --output-dir artifacts/closure-soak-100-mock

After the wrapper profile expansion, the equivalent first-class plugin command is:

OPENCLAW_QA_SUITE_PROGRESS=1 OPENCLAW_BUILD_PRIVATE_QA=1 OPENCLAW_ENABLE_PRIVATE_QA_CLI=1 \
  node scripts/run-openclaw-parity.mjs soak-100 \
    --openclaw-root /Volumes/LEXAR/repos/openclaw-1 \
    --provider-mode mock-openai \
    --concurrency 1

Observed result

The upstream suite command completed successfully and generated runtime parity/token-efficiency artifacts. The report failed because the single soak row drifted structurally:

  • Scenario: runtime-soak-100-turn
  • Drift: structural
  • Details: transcript/final-text structure differs (204 lines vs 201)
  • Pi: pass, 0 tool calls, 8800 mock-estimate tokens
  • Codex: pass, 0 tool calls, 8800 mock-estimate tokens
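
To see where a line-count difference such as 204 versus 201 comes from, one option is to compare the two transcripts line by line. The following is a minimal sketch, not the harness's own classifier: `summarizeLineDrift` and its return shape are hypothetical, and it assumes each runtime's transcript is available as a plain string.

```javascript
// Hypothetical helper: not part of the OpenClaw parity harness.
// Splits two transcripts into lines and reports the line-count difference
// plus the first index at which the lines stop matching.
function summarizeLineDrift(piText, codexText) {
  const piLines = piText.split(/\r?\n/);
  const codexLines = codexText.split(/\r?\n/);
  const shared = Math.min(piLines.length, codexLines.length);

  // Find the first line (0-based) where the two transcripts differ.
  let firstDivergence = -1;
  for (let i = 0; i < shared; i += 1) {
    if (piLines[i] !== codexLines[i]) {
      firstDivergence = i;
      break;
    }
  }
  // If every shared line matches, divergence starts where the shorter one ends.
  if (firstDivergence === -1 && piLines.length !== codexLines.length) {
    firstDivergence = shared;
  }

  return {
    piLineCount: piLines.length,
    codexLineCount: codexLines.length,
    delta: piLines.length - codexLines.length,
    firstDivergence,
  };
}

// Toy input for illustration only, not real soak output.
const drift = summarizeLineDrift("turn 1\n\nturn 2\nturn 3", "turn 1\nturn 2\nturn 3");
console.log(drift);
```

Knowing the first diverging line narrows the search to a specific turn, which helps decide whether the extra lines are fixture scaffolding or runtime output.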

Artifacts from the local closure run:

  • /Volumes/LEXAR/repos/openclaw-1/.artifacts/openclaw-parity-harness/closure-soak-100-mock/runtime-suite/qa-suite-summary.json
  • /Volumes/LEXAR/repos/openclaw-1/.artifacts/openclaw-parity-harness/closure-soak-100-mock/runtime-report/qa-runtime-parity-report.md
  • /Volumes/LEXAR/repos/openclaw-1/.artifacts/openclaw-parity-harness/closure-soak-100-mock/runtime-report/qa-runtime-parity-summary.json
  • /Volumes/LEXAR/repos/openclaw-1/.artifacts/openclaw-parity-harness/closure-soak-100-mock/runtime-report/qa-runtime-token-efficiency-report.md

Expected follow-up

Decide whether the extra/missing transcript lines in the 100-turn mock run are acceptable text scaffolding or a real structural runtime divergence. If this is only synthetic fixture scaffolding, tighten the soak fixture/classifier so the optional lane does not create noisy long-run failures. If it reflects runtime behavior, keep this issue as the soak-specific parity bug until the runtimes converge.
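
One way to frame the scaffolding-versus-divergence decision is to normalize both transcripts before comparing them. The sketch below assumes, purely for illustration, that scaffolding means blank lines and trailing whitespace; the real soak fixture may use different scaffolding. `normalizeTranscript`, `classifyDrift`, and the returned labels are hypothetical, not the harness's classifier.

```javascript
// Hypothetical sketch: not the OpenClaw drift classifier.
// ASSUMPTION: "scaffolding" means blank lines and trailing whitespace only.
function normalizeTranscript(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trimEnd())
    .filter((line) => line.length > 0);
}

// Returns "none" for identical text, "scaffolding-only" when the transcripts
// match after normalization, and "structural" otherwise.
function classifyDrift(piText, codexText) {
  if (piText === codexText) return "none";
  const pi = normalizeTranscript(piText);
  const codex = normalizeTranscript(codexText);
  const same =
    pi.length === codex.length && pi.every((line, i) => line === codex[i]);
  return same ? "scaffolding-only" : "structural";
}

console.log(classifyDrift("a\n\nb\n", "a\nb")); // scaffolding-only
console.log(classifyDrift("a\nb\nc", "a\nb")); // structural
```

If the 100-turn rows classify as scaffolding-only under a normalization the maintainers agree on, tightening the fixture or classifier is the likely fix; a structural result points at runtime behavior.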
