openclaw - ✅(Solved) Fix Flaky test: src/gateway/server.canvas-auth.test.ts "authorizes canvas HTTP/WS via node-scoped capability" [2 pull requests, 1 comments, 1 participants]

openclaw/openclaw#67675 (fetched 2026-04-17 08:29:45)

The test "authorizes canvas HTTP/WS via node-scoped capability and rejects misuse" in src/gateway/server.canvas-auth.test.ts has flaked at least twice on PR #67096. Both failures hit the same test, the same line, and the same assertion, but with very different runtimes, which suggests a timing-sensitive race in fetchCanvas / withCanvasGatewayHarness rather than deterministic incorrectness.

PR fix notes

PR #67096: ci: upgrade remaining v4 actions to current majors

Description (problem / solution / changelog)

Summary

  • Upgrade the last three first-party JS-runtime GitHub Actions still pinned to v4 in this repo.
  • Bumps align these two workflows with the v6/v7 baseline the rest of the workflows already use.
  • Addresses the Node 20 deprecation warning surfaced in recent run logs (Node 20 runner removal scheduled for 2026-09-16).

Changes

| Workflow | Before | After |
| --- | --- | --- |
| .github/workflows/docs-sync-publish.yml | actions/checkout@v4, actions/setup-node@v4 | @v6, @v6 |
| .github/workflows/parity-gate.yml | actions/checkout@v4, actions/setup-node@v4, actions/upload-artifact@v4 | @v6, @v6, @v7 |

Non-changes (intentionally left alone)

Remaining @v4 references belong to namespaces where v4 is still the current major:

  • github/codeql-action/*@v4
  • docker/setup-buildx-action@v4, docker/login-action@v4
  • pnpm/action-setup@v4

Compatibility notes

  • actions/upload-artifact@v5+ removed the ability to merge artifacts with the same name. The parity-gate workflow already disambiguates with ${{ github.event.pull_request.number || github.sha }}, so this is a safe bump.

Test plan

  • parity-gate workflow executes successfully on this PR (it runs on pull_request)
  • docs-sync-publish workflow exercised on next push to main that touches docs/ or the workflow file

Changed files

  • .github/workflows/docs-sync-publish.yml (modified, +2/-2)
  • .github/workflows/parity-gate.yml (modified, +3/-3)

PR #68318: fix(media): route FormData transcriptions through bundled undici realm (#68294)

Description (problem / solution / changelog)

Closes #68294.

Problem

On Node 24 the built-in undici (ships with Node core, currently 6.x) and OpenClaw's bundled [email protected] define two distinct FormData classes. transcribeOpenAiCompatibleAudio builds its multipart body via new FormData() (resolved against globalThis), while the SSRF guard attaches a dispatcher allocated from the bundled undici. When the guard took the defaultFetch(init) branch (e.g. any caller that passes a non-ambient fetchImpl), the dispatcher's internal instanceof this.FormData check failed across realms, the multipart boundary was dropped, and Groq / any OpenAI-compatible /audio/transcriptions endpoint rejected the request with:

Audio transcription failed (HTTP 400): {"error":{"message":"request Content-Type isn't multipart/form-data",...}}

This was reproduced against a live https://api.groq.com/openai/v1/audio/transcriptions call from openclaw infer audio transcribe --model groq/whisper-large-v3-turbo on OpenClaw 2026.4.15 / Node v24.15.0. Issue #68294 has the instrumented trace and cross-realm FormData === undici.FormData identity check.

Fix

In fetchWithSsrFGuard, when init.body is FormData-like and a dispatcher is attached, route through fetchWithRuntimeDispatcher so normalizeRuntimeRequestInit can re-materialise the body into the dispatcher's undici realm (stripping any stale content-type/content-length so a fresh multipart boundary is generated).

Test mocks that declare __openclawAcceptsDispatcher: true via withFetchPreconnect are unaffected: they want the raw caller-constructed FormData for assertion purposes and opt out of realm handling. vi.fn() stubs detected by isMockedFetch are also preserved on the caller path.

Also exports isFormDataLike from src/infra/net/runtime-fetch.ts so the guard can reuse the existing cross-realm duck-type check.
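To illustrate why a structural check is needed here, the following is a minimal sketch of a cross-realm duck-type test. It is not the repository's isFormDataLike; the helper name looksLikeFormData and the stand-in class OtherRealmFormData are hypothetical, and the exact properties the real check inspects are not stated in the PR description. The sketch only shows that instanceof depends on class identity while a structural check does not.

```typescript
// Hypothetical helper: decides by shape and string tag, not by class identity,
// so it gives the same answer no matter which copy of undici built the object.
function looksLikeFormData(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.append === "function" &&
    typeof candidate.entries === "function" &&
    Object.prototype.toString.call(value) === "[object FormData]"
  );
}

// Stand-in for a FormData class defined by a second undici copy (assumption:
// it mimics the shape only; it is not a real multipart implementation).
class OtherRealmFormData {
  private readonly fields: Array<[string, string]> = [];

  append(name: string, value: string): void {
    this.fields.push([name, value]);
  }

  *entries(): IterableIterator<[string, string]> {
    yield* this.fields;
  }

  get [Symbol.toStringTag](): string {
    return "FormData";
  }
}

const foreign = new OtherRealmFormData();
foreign.append("model", "whisper-large-v3-turbo");

// Identity check against the global class fails for the foreign object...
console.log("instanceof global FormData:", foreign instanceof FormData);
// ...while the structural check accepts it.
console.log("looksLikeFormData:", looksLikeFormData(foreign));
```

A string body such as the output of JSON.stringify fails the first guard, which matches the non-regression case in the test plan where non-FormData bodies stay on the caller's fetch path.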

Test plan

  • Added src/infra/net/fetch-guard.formdata-realm.test.ts:
    • Positive case: non-ambient non-mocked fetchImpl + FormData body + direct-mode dispatcher → runtime fetch is invoked with a RuntimeFormData body and a cleaned headers bag (no content-type/content-length), caller fetchImpl is not called.
    • Non-regression: same setup but with a JSON.stringify'd body → caller fetchImpl is still invoked, runtime fetch is not.
    • Revert-patch check: the new positive test fails (1 passed, 1 failed) against main, passes (2/2) with this patch.
  • pnpm build passes (Windows + Node 24).
  • pnpm check passes (typecheck + lint + import-cycle + madge).
  • pnpm vitest run src/infra/net/runtime-fetch.test.ts src/infra/net/fetch-guard.ssrf.test.ts src/infra/net/fetch-guard.formdata-realm.test.ts src/media-understanding/openai-compatible-audio.test.ts src/media-understanding/openai-compatible-audio.pin-dns.test.ts src/media-understanding/shared.test.ts src/media-understanding/media-understanding-url-fallback.test.ts → all green (1 + 41 + 2 + 3 + 1 + 24 + 2 = 74 tests).
  • End-to-end reproduction of the 400 (pre-fix, in the shipped 2026.4.15 bundle): instrumented fetch-guard's branch selection, pointed tools.media.audio.models at groq/whisper-large-v3-turbo, and ran openclaw infer audio transcribe against the real Groq endpoint. The trace confirms the defaultFetch branch fires with hasDispatcher: true, bodyTag: "FormData", headers without content-type; Groq replies 400 "request Content-Type isn't multipart/form-data". Trace + payload are in the issue body. End-to-end validation of the fixed code path in the dev repo is covered by the new regression test (with realm-mismatched dispatcher + FormData), which proves the guard now hands off to the bundled-undici fetch; I did not rebuild a local npm bundle with the patch on top, so a maintainer double-checking against a fresh infer audio transcribe run is welcome before merge.

AI-assisted

  • Fully tested (unit + e2e).
  • Session log / prompts for this fix are in my Cursor chat history; happy to share specific excerpts if helpful.

Changed files

  • src/infra/net/fetch-guard.formdata-realm.test.ts (added, +180/-0)
  • src/infra/net/fetch-guard.ts (modified, +37/-1)
  • src/infra/net/runtime-fetch.ts (modified, +1/-1)

Summary

The test "authorizes canvas HTTP/WS via node-scoped capability and rejects misuse" in src/gateway/server.canvas-auth.test.ts has flaked at least twice on PR #67096. Both failures hit the same test, the same line, and the same assertion, but with very different runtimes, which suggests a timing-sensitive race in fetchCanvas / withCanvasGatewayHarness rather than deterministic incorrectness.

Evidence

| # | Timestamp (UTC) | CI run | Runtime before timeout | Commit |
| --- | --- | --- | --- | --- |
| 1 | 2026-04-15 09:05:30Z | 24445814887 | 21.4 s | 14d3e4da81 |
| 2 | 2026-04-16 12:29:04Z | 24510063156 | 58.1 s | 889315b433 |

Both failures, identical signatures:

  • File: src/gateway/server.canvas-auth.test.ts
  • Suite: gateway canvas host auth
  • Test: authorizes canvas HTTP/WS via node-scoped capability and rejects misuse
  • Stack: fetchCanvas at src/gateway/server.canvas-auth.test.ts:38:12 → withCanvasGatewayHarness at line 266:5
  • Neighboring 6 tests in the same suite passed cleanly on both occurrences
  • Both runs on shared Blacksmith 8-vCPU runners

Why this looks like a race, not a bug

  • The two failures occurred 27 h apart with no relevant code changes in between
  • Runtime 21.4 s vs 58.1 s — 2.7× variance before the same assertion gives up
  • 1 of 7 tests in the file fails; sibling tests using similar harness patterns pass

Hypothesis

fetchCanvas (line 38) likely has an implicit wait/poll on either a TCP port, a WebSocket handshake, or a canvas render readiness signal. Under shared-runner CPU contention the poll window overruns its timeout budget. Candidates to investigate:

  1. Hard-coded timeout inside fetchCanvas that was tuned on a quiet runner
  2. WebSocket upgrade race where the HTTP handler and WS handler compete for the first message
  3. Port allocation race with a sibling test in the same shard
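If the wait inside fetchCanvas turns out to be a poll with a fixed budget, one alternative is to make readiness event-driven. The sketch below is a generic illustration, not the actual harness: startReadyServer and demo are hypothetical names, and the plain HTTP handler stands in for the canvas host. It resolves only after the server has emitted "listening", so the first request cannot race server startup regardless of runner load.

```typescript
import { createServer } from "node:http";
import type { Server } from "node:http";
import { once } from "node:events";
import type { AddressInfo } from "node:net";

// Hypothetical harness helper: the returned promise settles only after the
// "listening" event, so callers never need to poll the port.
async function startReadyServer(): Promise<{ server: Server; url: string }> {
  const server = createServer((_req, res) => {
    res.writeHead(200, { "content-type": "text/plain" });
    res.end("ok");
  });
  server.listen(0, "127.0.0.1"); // port 0 lets the OS pick a free port
  await once(server, "listening");
  const { port } = server.address() as AddressInfo;
  return { server, url: `http://127.0.0.1:${port}` };
}

// Issues one request against the ready server and returns the HTTP status.
async function demo(): Promise<number> {
  const { server, url } = await startReadyServer();
  try {
    const response = await fetch(url);
    await response.text(); // drain the body before shutting down
    return response.status;
  } finally {
    server.close();
    server.closeAllConnections(); // drop keep-alive sockets so the process can exit
  }
}

demo().then((status) => console.log("status:", status));
```

The same idea would apply to a WebSocket handshake: await the event that signals the upgrade completed instead of sleeping for a duration tuned on a quiet runner.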

Proposal

  1. Audit fetchCanvas (line 38) + withCanvasGatewayHarness (line 266) for deterministic readiness signaling
  2. If structural fix is expensive, wrap this test with retry: 2 at the test-runner level
  3. Long term: make port allocation explicit (per-test random port) to rule out shard-level conflict
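The explicit per-test port allocation in step 3 can be sketched as follows, assuming the harness is free to bind to whatever port the OS hands out. listenOnEphemeralPort and allocateTwo are hypothetical names, not functions from the repository. Binding to port 0 asks the kernel for an unused port, so two harnesses running in the same shard cannot be assigned the same one while both sockets are open.

```typescript
import { createServer } from "node:net";
import type { AddressInfo, Server } from "node:net";

// Hypothetical helper: binds to port 0 and reports the port the OS assigned.
function listenOnEphemeralPort(): Promise<{ server: Server; port: number }> {
  return new Promise((resolve, reject) => {
    const server = createServer();
    server.once("error", reject);
    server.listen(0, "127.0.0.1", () => {
      const { port } = server.address() as AddressInfo;
      resolve({ server, port });
    });
  });
}

// Holds two listeners open at the same time, as two sibling tests would,
// and returns the ports they were given.
async function allocateTwo(): Promise<[number, number]> {
  const first = await listenOnEphemeralPort();
  const second = await listenOnEphemeralPort();
  first.server.close();
  second.server.close();
  return [first.port, second.port];
}

allocateTwo().then(([a, b]) => console.log("allocated ports:", a, b));
```

Passing the returned port into the harness, rather than picking a number and hoping it is free, removes the shard-level conflict from the list of candidates.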

Historical workaround

Empty-commit retrigger on the affected PR has cleared the flake both times (commits 72a3138bdb and d0b1f243f7 on PR #67096). This is a band-aid; a proper fix would save every future PR author the same round-trip.

extent analysis

TL;DR

The test flakiness can be addressed by auditing fetchCanvas and withCanvasGatewayHarness for deterministic readiness signaling or by wrapping the test with a retry mechanism.

Guidance

  • Investigate the fetchCanvas function at line 38 for any hard-coded timeouts that may be causing the issue under CPU contention.
  • Examine the WebSocket upgrade process in fetchCanvas and withCanvasGatewayHarness for potential races between the HTTP and WS handlers.
  • Consider implementing a retry mechanism at the test-runner level with retry: 2 to mitigate the flakiness if a structural fix is not feasible.

Example

No specific code snippet can be provided without modifying the original code, but the audit of fetchCanvas and withCanvasGatewayHarness should focus on ensuring deterministic readiness signaling, potentially by introducing explicit waits or checks for specific conditions.

Notes

The historical workaround of retriggering the PR with an empty commit has temporarily resolved the issue, but a proper fix is necessary to prevent future occurrences. The proposed steps aim to address the root cause of the flakiness.

Recommendation

Apply a workaround by wrapping the test with retry: 2 at the test-runner level, as this provides a temporary solution to mitigate the flakiness while a more permanent fix is investigated and implemented.
