openclaw - ✅(Solved) Fix Flaky test: src/gateway/server.canvas-auth.test.ts "authorizes canvas HTTP/WS via node-scoped capability" [2 pull requests, 1 comments, 1 participants]

openclaw2026-04-16 13:00:25

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#67675•Fetched 2026-04-17 08:29:45

View on GitHub

Comments

Participants

Timeline

Reactions

Author

jeffchen1981-fu

Participants

jeffchen1981-fu

Timeline (top)

commented ×1cross-referenced ×1

The test authorizes canvas HTTP/WS via node-scoped capability and rejects misuse in src/gateway/server.canvas-auth.test.ts has flaked on PR #67096 at least twice with the same test, same line, same assertion, but with very different runtimes — suggesting a timing-sensitive race in fetchCanvas / withCanvasGatewayHarness rather than deterministic incorrectness.

Root Cause

Fix Action

Fix / Workaround

Historical workaround

PR fix notes

PR #67096: ci: upgrade remaining v4 actions to current majors

Repository: openclaw/openclaw
Author: jeffchen1981-fu
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/67096

Description (problem / solution / changelog)

Summary

Upgrade the last three first-party JS-runtime GitHub Actions still pinned to v4 in this repo.
Bumps align these two workflows with the v6/v7 baseline the rest of the workflows already use.
Addresses the Node 20 deprecation warning surfaced in recent run logs (Node 20 runner removal scheduled for 2026-09-16).

Changes

Workflow	Before	After
.github/workflows/docs-sync-publish.yml	actions/checkout@v4, actions/setup-node@v4	@v6, @v6
.github/workflows/parity-gate.yml	actions/checkout@v4, actions/setup-node@v4, actions/upload-artifact@v4	@v6, @v6, @v7

Non-changes (intentionally left alone)

Remaining @v4 references belong to namespaces where v4 is still the current major:

github/codeql-action/*@v4
docker/setup-buildx-action@v4, docker/login-action@v4
pnpm/action-setup@v4

Compatibility notes

actions/upload-artifact@v5+ removed the ability to merge artifacts with the same name. The parity-gate workflow already disambiguates with ${{ github.event.pull_request.number || github.sha }}, so this is a safe bump.

Test plan

parity-gate workflow executes successfully on this PR (it runs on pull_request)
docs-sync-publish workflow exercised on next push to main that touches docs/ or the workflow file

Changed files

.github/workflows/docs-sync-publish.yml (modified, +2/-2)
.github/workflows/parity-gate.yml (modified, +3/-3)

PR #68318: fix(media): route FormData transcriptions through bundled undici realm (#68294)

Repository: openclaw/openclaw
Author: Jcxu97
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/68318

Description (problem / solution / changelog)

Closes #68294.

Problem

On Node 24 the built-in undici (ships with Node core, currently 6.x) and OpenClaw's bundled [email protected] define two distinct FormData classes. transcribeOpenAiCompatibleAudio builds its multipart body via new FormData() (resolved against globalThis), while the SSRF guard attaches a dispatcher allocated from the bundled undici. When the guard took the defaultFetch(init) branch (e.g. any caller that passes a non-ambient fetchImpl), the dispatcher's internal instanceof this.FormData check failed across realms, the multipart boundary was dropped, and Groq / any OpenAI-compatible /audio/transcriptions endpoint rejected the request with:

Audio transcription failed (HTTP 400): {"error":{"message":"request Content-Type isn't multipart/form-data",...}}

This was reproduced against a live https://api.groq.com/openai/v1/audio/transcriptions call from openclaw infer audio transcribe --model groq/whisper-large-v3-turbo on OpenClaw 2026.4.15 / Node v24.15.0. Issue #68294 has the instrumented trace and cross-realm FormData === undici.FormData identity check.

Fix

In fetchWithSsrFGuard, when init.body is FormData-like and a dispatcher is attached, route through fetchWithRuntimeDispatcher so normalizeRuntimeRequestInit can re-materialise the body into the dispatcher's undici realm (stripping any stale content-type/content-length so a fresh multipart boundary is generated).

Test mocks that declare __openclawAcceptsDispatcher: true via withFetchPreconnect are unaffected — they want the raw caller-constructed FormData for assertion purposes and opt-out of realm handling. vi.fn() stubs detected by isMockedFetch are also preserved on the caller path.

Also exports isFormDataLike from src/infra/net/runtime-fetch.ts so the guard can reuse the existing cross-realm duck-type check.

Test plan

Added src/infra/net/fetch-guard.formdata-realm.test.ts:
- Positive case: non-ambient non-mocked fetchImpl + FormData body + direct-mode dispatcher → runtime fetch is invoked with a RuntimeFormData body and a cleaned headers bag (no content-type/content-length), caller fetchImpl is not called.
- Non-regression: same setup but with a JSON.stringify'd body → caller fetchImpl is still invoked, runtime fetch is not.
- Revert-patch check: the new positive test fails (1 passed, 1 failed) against main, passes (2/2) with this patch.
pnpm build passes (Windows + Node 24).
pnpm check passes (typecheck + lint + import-cycle + madge).
pnpm vitest run src/infra/net/runtime-fetch.test.ts src/infra/net/fetch-guard.ssrf.test.ts src/infra/net/fetch-guard.formdata-realm.test.ts src/media-understanding/openai-compatible-audio.test.ts src/media-understanding/openai-compatible-audio.pin-dns.test.ts src/media-understanding/shared.test.ts src/media-understanding/media-understanding-url-fallback.test.ts → all green (1 + 41 + 2 + 3 + 1 + 24 + 2 = 74 tests).
End-to-end reproduction of the 400 (pre-fix, in the shipped 2026.4.15 bundle): instrumented fetch-guard's branch selection, pointed tools.media.audio.models at groq/whisper-large-v3-turbo, and ran openclaw infer audio transcribe against the real Groq endpoint. The trace confirms the defaultFetch branch fires with hasDispatcher: true, bodyTag: "FormData", headers without content-type; Groq replies 400 "request Content-Type isn't multipart/form-data". Trace + payload are in the issue body. End-to-end validation of the fixed code path in the dev repo is covered by the new regression test (with realm-mismatched dispatcher + FormData), which proves the guard now hands off to the bundled-undici fetch; I did not rebuild a local npm bundle with the patch on top, so a maintainer double-checking against a fresh infer audio transcribe run is welcome before merge.

AI-assisted

Fully tested (unit + e2e).
Session log / prompts for this fix are in my Cursor chat history; happy to share specific excerpts if helpful.

Changed files

src/infra/net/fetch-guard.formdata-realm.test.ts (added, +180/-0)
src/infra/net/fetch-guard.ts (modified, +37/-1)
src/infra/net/runtime-fetch.ts (modified, +1/-1)

RAW_BUFFERClick to expand / collapse

Summary

Evidence

#	Timestamp (UTC)	CI run	Runtime before timeout	Commit
1	2026-04-15 09:05:30Z	24445814887	21.4 s	`14d3e4da81`
2	2026-04-16 12:29:04Z	24510063156	58.1 s	`889315b433`

Both failures, identical signatures:

File: src/gateway/server.canvas-auth.test.ts
Suite: gateway canvas host auth
Test: authorizes canvas HTTP/WS via node-scoped capability and rejects misuse
Stack: fetchCanvas at src/gateway/server.canvas-auth.test.ts:38:12 → withCanvasGatewayHarness at line 266:5
Neighboring 6 tests in the same suite passed cleanly on both occurrences
Both runs on shared Blacksmith 8-vCPU runners

Why this looks like a race, not a bug

27 h apart with no relevant code changes between
Runtime 21.4 s vs 58.1 s — 2.7× variance before the same assertion gives up
1 of 7 tests in the file fails; sibling tests using similar harness patterns pass

Hypothesis

fetchCanvas (line 38) likely has an implicit wait/poll on either a TCP port, a WebSocket handshake, or a canvas render readiness signal. Under shared-runner CPU contention the poll window overruns its timeout budget. Candidates to investigate:

Hard-coded timeout inside fetchCanvas that was tuned on a quiet runner
WebSocket upgrade race where the HTTP handler and WS handler compete for the first message
Port allocation race with a sibling test in the same shard

Proposal

Audit fetchCanvas (line 38) + withCanvasGatewayHarness (line 266) for deterministic readiness signaling
If structural fix is expensive, wrap this test with retry: 2 at the test-runner level
Long term: make port allocation explicit (per-test random port) to rule out shard-level conflict

Historical workaround

Empty-commit retrigger on the affected PR has cleared the flake both times (commits 72a3138bdb and d0b1f243f7 on PR #67096). This is a band-aid; a proper fix would save every future PR author the same round-trip.

extent analysis

TL;DR

The test flakiness can be addressed by auditing fetchCanvas and withCanvasGatewayHarness for deterministic readiness signaling or by wrapping the test with a retry mechanism.

Guidance

Investigate the fetchCanvas function at line 38 for any hard-coded timeouts that may be causing the issue under CPU contention.
Examine the WebSocket upgrade process in fetchCanvas and withCanvasGatewayHarness for potential races between the HTTP and WS handlers.
Consider implementing a retry mechanism at the test-runner level with retry: 2 to mitigate the flakiness if a structural fix is not feasible.

Example

No specific code snippet can be provided without modifying the original code, but the audit of fetchCanvas and withCanvasGatewayHarness should focus on ensuring deterministic readiness signaling, potentially by introducing explicit waits or checks for specific conditions.

Notes

The historical workaround of retriggering the PR with an empty commit has temporarily resolved the issue, but a proper fix is necessary to prevent future occurrences. The proposed steps aim to address the root cause of the flakiness.

Recommendation

Apply a workaround by wrapping the test with retry: 2 at the test-runner level, as this provides a temporary solution to mitigate the flakiness while a more permanent fix is investigated and implemented.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#memory optimization #batch processing #GPU compatibility #latency issue #model loading

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

openclaw - ✅(Solved) Fix Flaky test: src/gateway/server.canvas-auth.test.ts "authorizes canvas HTTP/WS via node-scoped capability" [2 pull requests, 1 comments, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Historical workaround

PR fix notes

PR #67096: ci: upgrade remaining v4 actions to current majors

Description (problem / solution / changelog)

Summary

Changes

Non-changes (intentionally left alone)

Compatibility notes

Test plan

Changed files

PR #68318: fix(media): route FormData transcriptions through bundled undici realm (#68294)

Description (problem / solution / changelog)

Problem

Fix

Test plan

AI-assisted

Changed files

Summary

Evidence

Why this looks like a race, not a bug

Hypothesis

Proposal

Historical workaround

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING