openclaw - ✅(Solved) Fix qa-lab: run the parity gate end-to-end in CI and publish artifacts [1 pull request, 1 participant]

GitHub stats
openclaw/openclaw#64878 · Fetched 2026-04-12 13:26:25
Comments: 0 · Participants: 1 · Timeline events: 3 · Reactions: 0
Timeline (top): cross-referenced ×3

PR fix notes

PR #64909: qa-lab: (GPT 5.4 Parity vs. Opus Agentic) stage mock auth profiles so the parity gate runs without real credentials

Description (problem / solution / changelog)

Summary

Makes the mock structural parity gate actually runnable without real provider credentials.

After the current rebase, this PR is intentionally narrow. The legacy runtime seam work that was in earlier drafts has been absorbed elsewhere; what this branch still owns is the auth-staging path that the mock parity gate needs in order to reach the mock providers at all.

Current scope:

  • stage placeholder auth profiles per provider / agent dir in mock-openai mode
  • apply the staged auth profile config once per provider after the profile files exist
  • default the gateway-child provider mode through the shared helper so omitted providerMode still stages mock auth correctly

Part of #64227.

Why this still matters

Without this branch, the mock parity lane can still fail before the mock server ever sees a request because the auth resolver refuses to route through the mock provider base URL until a matching auth profile exists.

This branch makes the structural gate self-contained:

  • openai and anthropic placeholder API-key profiles are staged for the agents the suite uses
  • the child config is patched to reference those profiles
  • callers that omit provider mode still get the mock-openai auth staging path through the default-provider-mode helper

The mock server does not validate the key contents. The placeholder is just enough to satisfy the auth resolver so the parity harness can actually hit the local mock server.
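
The staging described above could be sketched roughly as follows. This is a minimal illustration, not the code in gateway-child.ts: the function names, the profile shape, the auth-profiles/<provider>.json layout, and the placeholder key value are all assumptions made for the example. Only the two provider names, the mock-openai mode, and the idea of defaulting an omitted providerMode come from the PR description.

```typescript
// Hypothetical sketch: names, shapes, and paths are assumptions,
// not the real gateway-child.ts API.
const MOCK_PROVIDERS = ["openai", "anthropic"] as const;

interface StagedProfile {
  path: string;     // where the profile file would be written (assumed layout)
  provider: string; // provider the profile belongs to
  apiKey: string;   // placeholder only; the mock server does not validate it
}

// Assumed default: an omitted providerMode falls back to "mock-openai".
function resolveProviderMode(providerMode?: string): string {
  return providerMode ?? "mock-openai";
}

// Plans one placeholder profile per provider per agent dir.
// In any mode other than "mock-openai", nothing is staged.
function planMockAuthProfiles(
  agentDirs: string[],
  providerMode?: string,
): StagedProfile[] {
  if (resolveProviderMode(providerMode) !== "mock-openai") {
    return [];
  }
  const staged: StagedProfile[] = [];
  for (const dir of agentDirs) {
    for (const provider of MOCK_PROVIDERS) {
      staged.push({
        path: `${dir}/auth-profiles/${provider}.json`,
        provider,
        apiKey: "mock-placeholder-key",
      });
    }
  }
  return staged;
}
```

Keeping the planning step pure (it returns what would be written instead of touching the filesystem) makes the default-provider-mode path easy to cover in a unit test.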

What changed

extensions/qa-lab/src/gateway-child.ts

  • stages placeholder mock auth profiles for openai and anthropic
  • applies the auth-profile config once per provider after staging
  • resolves the effective provider mode through the shared helper before deciding whether mock auth staging is needed

extensions/qa-lab/src/gateway-child.test.ts

  • covers the staged mock-auth profile path
  • covers the provider / agent override path
  • covers the default-provider-mode path so omitted providerMode still stages mock auth correctly

Validation

Current head validation after rebase:

CI=1 pnpm exec vitest run extensions/qa-lab/src/gateway-child.test.ts

Result: 26/26 passing.

Program-level verification update:

  • this branch is part of what made the offline structural parity rerun possible without real keys
  • on the integrated patched stack, with this branch plus the qa-lab mock/provider follow-ups, the full offline 10-scenario parity rerun passed end-to-end

Non-goals

  • no runtime strict-agentic behavior changes
  • no scenario YAML changes
  • no parity-report scoring changes
  • no live-frontier credential path changes

Changed files

  • extensions/qa-lab/src/gateway-child.test.ts (modified, +87/-0)
  • extensions/qa-lab/src/gateway-child.ts (modified, +87/-2)


Problem

The parity gate has never been run end-to-end. There are no .artifacts/qa-e2e/qa-agentic-parity-* files anywhere in the repo, and no CI workflow triggers openclaw qa suite --parity-pack agentic. This means the entire "parity gate shows GPT-5.4 matches or beats Opus 4.6" claim (completion criterion 5 in #64227) rests on code that has never produced a real measurement.

Verification that this is a real gap:

find . -path '*/.artifacts/qa-e2e/qa-agentic-parity*'   # returns nothing
rg -l 'parity-pack|qa-agentic-parity' .github/workflows/  # returns nothing

Fix

Add a GitHub Actions workflow at .github/workflows/parity-gate.yml that runs on PRs touching extensions/qa-lab/**, qa/scenarios/**, or src/agents/**. The workflow should:

  1. Start the qa-lab mock server.
  2. Run openclaw qa suite --parity-pack agentic --model openai/gpt-5.4 --output-dir .artifacts/qa-e2e/gpt54.
  3. Run openclaw qa suite --parity-pack agentic --model anthropic/claude-opus-4-6 --output-dir .artifacts/qa-e2e/opus46.
  4. Run openclaw qa parity-report --candidate-summary .artifacts/qa-e2e/gpt54/qa-suite-summary.json --baseline-summary .artifacts/qa-e2e/opus46/qa-suite-summary.json --output-dir .artifacts/qa-e2e/parity.
  5. Upload qa-suite-summary.json × 2 and qa-agentic-parity-{report.md,summary.json} as build artifacts.
  6. Fail the build on pass: false from the parity-report exit code.

This turns criterion 5 from "unverified" to "continuously verified against the mock baseline." Note: a mock-only workflow is necessary but not sufficient — see the companion issue on making the mock differentiate providers. This one is about having any proof at all.
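
Step 6 boils down to mapping the report's pass flag to a process exit status. A minimal sketch, assuming the summary JSON exposes a top-level boolean pass field (the issue mentions pass: false but does not show the full schema); gateExitCode is a hypothetical helper, not part of the openclaw CLI:

```typescript
// Hypothetical helper: maps a parity summary to a CI exit code.
// Assumes a top-level boolean `pass`; the real schema of
// qa-agentic-parity-summary.json is not shown in the issue.
function gateExitCode(summaryJson: string): number {
  let summary: unknown;
  try {
    summary = JSON.parse(summaryJson);
  } catch {
    // Unreadable summary: fail closed rather than pass silently.
    return 2;
  }
  const pass = (summary as { pass?: unknown } | null)?.pass;
  // Only an explicit `true` passes; a missing or non-boolean flag fails.
  return pass === true ? 0 : 1;
}
```

Treating a missing or malformed flag as a failure keeps the gate from going green when the report step silently produces nothing.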

Acceptance

  • Workflow file exists and runs on PR check events for the relevant paths.
  • At least one successful CI run produces artifacts that can be downloaded from the Actions tab.
  • Workflow fails when the gate fails (verified with a deliberately broken scenario on a throwaway branch).
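
To reason about which pull requests should trigger the check, the path filter can be approximated as a prefix match. This sketch is an illustration only: touchesParityPaths is a hypothetical helper, and real GitHub Actions paths filters are glob patterns evaluated by GitHub, not by this function.

```typescript
// Directory prefixes taken from the issue's proposed path filter.
const PARITY_PATH_PREFIXES = [
  "extensions/qa-lab/",
  "qa/scenarios/",
  "src/agents/",
];

// Hypothetical helper: returns true when at least one changed file
// lives under one of the watched directories.
function touchesParityPaths(changedFiles: string[]): boolean {
  return changedFiles.some((file) =>
    PARITY_PATH_PREFIXES.some((prefix) => file.startsWith(prefix)),
  );
}
```

The trailing slash on each prefix matters: without it, a sibling directory such as extensions/qa-lab-extra/ would match as well.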

Part of

#64227 (wave 3 — parity proof). Companion to the mock-dispatch issue.

extent analysis

TL;DR

Create a GitHub Actions workflow to run the parity gate end-to-end, ensuring continuous verification of the "parity gate shows GPT-5.4 matches or beats Opus 4.6" claim.

Guidance

  • Add a new GitHub Actions workflow at .github/workflows/parity-gate.yml to run on PRs touching specific directories (extensions/qa-lab/**, qa/scenarios/**, or src/agents/**).
  • Implement the 6 steps outlined in the proposed fix, including running the qa suite with different models and generating a parity report.
  • Verify the workflow by checking for the existence of the workflow file, successful CI runs producing artifacts, and the workflow failing when the gate fails.
  • Ensure the workflow uploads required artifacts, such as qa-suite-summary.json and qa-agentic-parity-{report.md,summary.json}, for further analysis.

Example

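name: Parity Gate
on:
  pull_request:
    paths:
      - 'extensions/qa-lab/**'
      - 'qa/scenarios/**'
      - 'src/agents/**'
jobs:
  parity-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Install or build openclaw and start the qa-lab mock server here
      # (the exact commands are not given in the issue).
      - name: Run the qa suite for the candidate model
        run: openclaw qa suite --parity-pack agentic --model openai/gpt-5.4 --output-dir .artifacts/qa-e2e/gpt54
      - name: Run the qa suite for the baseline model
        run: openclaw qa suite --parity-pack agentic --model anthropic/claude-opus-4-6 --output-dir .artifacts/qa-e2e/opus46
      - name: Generate the parity report (a non-zero exit fails the job)
        run: openclaw qa parity-report --candidate-summary .artifacts/qa-e2e/gpt54/qa-suite-summary.json --baseline-summary .artifacts/qa-e2e/opus46/qa-suite-summary.json --output-dir .artifacts/qa-e2e/parity
      - name: Upload artifacts (artifact name is an arbitrary choice)
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: qa-parity-artifacts
          path: .artifacts/qa-e2e/
          include-hidden-files: true  # .artifacts is a hidden directory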

Notes

This fix gives the parity gate an end-to-end run in CI, so the claim is verified continuously rather than assumed. A mock-only workflow is necessary but not sufficient; a companion issue covers making the mock differentiate providers.

Recommendation

Create the GitHub Actions workflow as proposed. It is a fix rather than a workaround: it is concrete, and it re-verifies the claim on every PR that touches the relevant paths.
