openclaw - 💡(How to fix) Fix Runtime parity needs 20-turn and 100-turn soak coverage [1 comments, 2 participants]

GitHub stats
openclaw/openclaw#80338 · Fetched 2026-05-11 03:15:58
Comments: 1 · Participants: 2 · Timeline: 2 · Reactions: 2
Timeline (top): commented ×1, cross-referenced ×1

Part of #80171 and follow-up to #80323.

The Phase 1-5 harness can compare Pi and Codex cells, but current runtime-pair coverage is mostly limited to scenarios with only a few turns.

Audit snapshot from #80323:

  • Across the 97 parsed QA markdown scenarios, the maximum runAgentPrompt count in a scenario is 4.
  • Most scenarios are 0-1 direct agent prompts; only 10 scenarios have 2+ direct agent prompts.
  • The three curated JSONL replay fixtures currently cover 2, 2, and 3 user turns.
  • There is no 20-turn or 100-turn Codex-vs-Pi runtime soak lane.
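
A minimal sketch of how the depth numbers above could be recomputed, assuming each QA scenario is available as markdown text and that direct agent prompts appear as literal `runAgentPrompt` mentions. The function name `summarizePromptDepth` and the `ScenarioSource` shape are hypothetical; the audit in #80323 may have parsed scenarios differently.

```typescript
// Hypothetical input shape: one entry per parsed QA markdown scenario.
interface ScenarioSource {
  name: string;
  markdown: string;
}

interface PromptDepthSummary {
  maxPrompts: number;           // highest runAgentPrompt count in any scenario
  multiPromptScenarios: number; // scenarios with 2+ direct agent prompts
  counts: { name: string; prompts: number }[];
}

// Counts literal `runAgentPrompt` mentions per scenario (an assumption about
// how prompts are expressed in the scenario files).
function summarizePromptDepth(scenarios: ScenarioSource[]): PromptDepthSummary {
  const counts: { name: string; prompts: number }[] = [];
  let maxPrompts = 0;
  let multiPromptScenarios = 0;

  for (const scenario of scenarios) {
    const matches = scenario.markdown.match(/\brunAgentPrompt\b/g) || [];
    const prompts = matches.length;
    counts.push({ name: scenario.name, prompts });
    maxPrompts = Math.max(maxPrompts, prompts);
    if (prompts >= 2) {
      multiPromptScenarios += 1;
    }
  }

  return { maxPrompts, multiPromptScenarios, counts };
}
```

Running this across the scenario directory before and after adding a soak lane would show whether the maximum depth has actually moved past 4.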

Why this matters:

The known Codex-vs-Pi risk class is not only single-turn shape drift. First-hour usage can include repeated follow-ups, compaction pressure, tool-result replay, memory recall, subagent delivery, plugin availability after install/update, and auth/profile reuse. A short lane will catch immediate tool-call shape drift but can miss delayed runtime drift across accumulated session state.

Acceptance sketch:

  • Add a deterministic 20-turn runtime-pair scenario that mixes read-only tools, one safe write/edit, follow-up references, memory recall, and at least one subagent/tool-result continuation.
  • Add a slower 100-turn soak tier that can run outside the default PR gate but is available for release/testbox/scheduled validation.
  • Capture first drift turn and per-turn tool/usage deltas in the runtime parity report.
  • Keep the 100-turn lane optional or scheduled so release PR latency stays bounded.
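
The "first drift turn" and per-turn delta capture could be sketched roughly as follows. Everything here is an assumption for illustration: the `TurnRecord` shape, the `compareRuns` name, and the choice to treat only a tool-call sequence mismatch (or a missing turn) as drift, with token usage reported as a delta rather than as a failure. The real runtime parity report may use different fields.

```typescript
// Hypothetical per-turn record captured from one runtime (Pi or Codex).
interface TurnRecord {
  toolCalls: string[]; // tool names invoked during the turn, in order
  totalTokens: number; // usage reported for the turn
}

interface TurnDelta {
  turn: number;               // 1-based turn index
  toolCallsMatch: boolean;    // same tool names in the same order
  toolCallCountDelta: number; // Codex count minus Pi count
  tokenDelta: number;         // Codex tokens minus Pi tokens
}

interface ParityReport {
  firstDriftTurn: number | null; // null when no drift was observed
  deltas: TurnDelta[];
}

function compareRuns(pi: TurnRecord[], codex: TurnRecord[]): ParityReport {
  const turns = Math.max(pi.length, codex.length);
  const deltas: TurnDelta[] = [];
  let firstDriftTurn: number | null = null;

  for (let i = 0; i < turns; i++) {
    const p = i < pi.length ? pi[i] : undefined;
    const c = i < codex.length ? codex[i] : undefined;
    const pTools = p ? p.toolCalls : [];
    const cTools = c ? c.toolCalls : [];

    // A turn present in only one run counts as drift.
    const toolCallsMatch =
      p !== undefined &&
      c !== undefined &&
      pTools.length === cTools.length &&
      pTools.every((name, j) => name === cTools[j]);

    deltas.push({
      turn: i + 1,
      toolCallsMatch,
      toolCallCountDelta: cTools.length - pTools.length,
      tokenDelta: (c ? c.totalTokens : 0) - (p ? p.totalTokens : 0),
    });

    if (!toolCallsMatch && firstDriftTurn === null) {
      firstDriftTurn = i + 1;
    }
  }

  return { firstDriftTurn, deltas };
}
```

The same comparison would apply unchanged to a 20-turn or a 100-turn run; only the length of the input arrays differs, so the tier choice can stay a scheduling concern rather than a reporting one.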
