openclaw - ✅ (Solved) Fix bug: PDF extraction fallback renderer omits standardFontDataUrl and emits PDF.js warnings in Node [1 pull request, 2 comments, 3 participants]

GitHub stats
openclaw/openclaw#51455 · Fetched 2026-04-08 01:11:01
Comments: 2 · Participants: 3 · Timeline: 4 · Reactions: 0
Timeline (top): commented ×2, cross-referenced ×1, referenced ×1

The PDF extraction-fallback path for non-native PDF analysis can emit repeated PDF.js warnings during page-image rendering in Node:

UnknownErrorException: Ensure that the standardFontDataUrl API parameter is provided.

The analysis can still succeed, but the fallback renderer is missing the standard-font configuration PDF.js expects.

Root Cause

The fallback renderer uses pdfjs-dist/legacy/build/pdf.mjs and calls getDocument({ data, disableWorker: true }) without standardFontDataUrl before later calling page.render(...), so PDF.js has no standard-font configuration when it renders pages.

Fix Action

Fix / Workaround

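Pass standardFontDataUrl, built with .pathname rather than .href, into the fallback renderer's getDocument(...) call. The working patch shape appears under Code Example.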

PR fix notes

PR #51465: fix(pdf): provide standardFontDataUrl for Node fallback rendering

Description (problem / solution / changelog)

Summary

  • pass standardFontDataUrl into the Node PDF extraction-fallback renderer
  • use the working filesystem-path form (.pathname) instead of file://...href
  • add focused regression coverage for the fallback render path

Closes #51455.

Problem

The non-native PDF extraction fallback could emit repeated PDF.js warnings during page-image rendering in Node:

UnknownErrorException: Ensure that the standardFontDataUrl API parameter is provided.

The analysis still succeeded, but the renderer was missing the standard-font configuration PDF.js expects.

Root cause

The fallback path called getDocument({ data, disableWorker: true }) without standardFontDataUrl before later calling page.render(...).

Fix

Provide:

const standardFontDataUrl = new URL(
  "../node_modules/pdfjs-dist/standard_fonts/",
  import.meta.url,
).pathname;

and pass it into getDocument(...).

Why .pathname

Local verification showed that .href changed the failure into font-load warnings in this Node runtime path, while .pathname eliminated the warning both locally and in a live post-restart check.
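
To make the difference between the two forms concrete, here is a minimal sketch of what each property yields for a file: URL. The module location is a hypothetical stand-in for import.meta.url and is not taken from the repository.

```typescript
// Hypothetical module location, standing in for import.meta.url.
const moduleUrl = "file:///srv/app/dist/pdf-extract.js";

// Resolve the fonts directory relative to that module, as the patch does.
const fontsUrl = new URL("../node_modules/pdfjs-dist/standard_fonts/", moduleUrl);

// .href keeps the file: scheme; .pathname is only the path component.
const asHref: string = fontsUrl.href;
const asPathname: string = fontsUrl.pathname;

console.log(asHref);     // file:///srv/app/node_modules/pdfjs-dist/standard_fonts/
console.log(asPathname); // /srv/app/node_modules/pdfjs-dist/standard_fonts/
```

Both values keep the trailing slash; only the scheme prefix differs, and that prefix is what the verification above found to matter in this Node runtime path.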

Testing

  • corepack pnpm exec vitest run --config vitest.config.ts src/media/pdf-extract.test.ts src/agents/tools/pdf-tool.test.ts
    • 52 passed

Notes

  • AI-assisted change
  • Narrow PR: PDF extraction-fallback only
  • No full repo pnpm test run

Changed files

  • src/media/pdf-extract.test.ts (added, +64/-0)
  • src/media/pdf-extract.ts (modified, +10/-1)
  • src/types/pdfjs-dist-legacy.d.ts (modified, +5/-1)

Code Example

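// The fallback renderer uses the legacy PDF.js build in Node.
import { getDocument } from "pdfjs-dist/legacy/build/pdf.mjs";

// Filesystem path to the fonts bundled with pdfjs-dist, resolved relative to this module.
// .pathname (not .href) is the form that removed the warning in the reported verification.
const standardFontDataUrl = new URL(
  "../node_modules/pdfjs-dist/standard_fonts/",
  import.meta.url
).pathname;

// `buffer` holds the raw PDF bytes supplied by the caller.
const pdf = await getDocument({
  data: new Uint8Array(buffer),
  disableWorker: true,
  standardFontDataUrl, // lets PDF.js find the standard fonts during page.render(...)
}).promise;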

Expected behavior

When the fallback path renders PDF pages to PNG images, it should provide PDF.js with access to standard font assets so rendering does not emit standardFontDataUrl warnings.

Actual behavior

When fallback image rendering is triggered, PDF.js logs warnings like:

UnknownErrorException: Ensure that the standardFontDataUrl API parameter is provided.

This appears once per rendered page in affected PDFs.

Scope

This is the PDF extraction-fallback renderer path, not the native Anthropic/Google PDF path, and not the separate Codex PDF instructions bug.

Local root-cause finding

The fallback renderer uses pdfjs-dist/legacy/build/pdf.mjs and calls getDocument({ data, disableWorker: true }) before later calling page.render(...), but does not provide standardFontDataUrl.

Local verification

I reproduced the warning locally on the fallback rendering path, then compared the baseline against two fix shapes:

  1. no standardFontDataUrl
    • reproduced the warning
  2. standardFontDataUrl via .href
    • changed the problem into font-load warnings in this Node runtime path
  3. standardFontDataUrl via .pathname
    • eliminated the warning in local verification and in a live post-restart check

Working patch shape:

const standardFontDataUrl = new URL(
  "../node_modules/pdfjs-dist/standard_fonts/",
  import.meta.url
).pathname;

const pdf = await getDocument({
  data: new Uint8Array(buffer),
  disableWorker: true,
  standardFontDataUrl,
}).promise;
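
One caveat worth checking is that .pathname stays percent-encoded, so an install path containing spaces would not match the directory on disk. The sketch below compares it with Node's fileURLToPath, which decodes the path. The helper name and the sample path are hypothetical, and the fileURLToPath form was not part of the reported verification, so treat it as an option to test rather than a confirmed fix.

```typescript
import { fileURLToPath } from "node:url";

// Hypothetical helper: returns both spellings of the same directory.
function fontDirCandidates(
  relativeDir: string,
  baseUrl: string,
): { pathname: string; filePath: string } {
  const url = new URL(relativeDir, baseUrl);
  return {
    pathname: url.pathname,       // percent-encoded (a space becomes %20)
    filePath: fileURLToPath(url), // decoded, platform-native path
  };
}
```

For install paths without special characters the two values are identical on POSIX systems, which is consistent with .pathname working in the report above.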

Impact

  • noisy gateway logs
  • can obscure real warnings/errors
  • non-fatal, but a real implementation gap

Suggested fix

Pass standardFontDataUrl in the extraction-fallback renderer and add regression coverage for a fallback-render case that needs standard font assets.
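
One way to keep such regression coverage cheap is to build the getDocument options in a small pure function that a test can inspect without rendering a PDF. The following is a sketch under that assumption: buildFallbackDocumentParams and FallbackDocumentParams are hypothetical names, and this is not the content of the PR's actual test file, which is not shown here.

```typescript
// Hypothetical shape of the options the fallback renderer passes to getDocument.
interface FallbackDocumentParams {
  data: Uint8Array;
  disableWorker: boolean;
  standardFontDataUrl: string;
}

// Hypothetical helper: centralizes the options so a test can assert on them.
function buildFallbackDocumentParams(
  buffer: Uint8Array,
  standardFontDataUrl: string,
): FallbackDocumentParams {
  if (standardFontDataUrl.length === 0) {
    // Fail loudly instead of silently reintroducing the warning.
    throw new Error("standardFontDataUrl must not be empty");
  }
  return {
    data: new Uint8Array(buffer), // copy of the input bytes
    disableWorker: true,
    standardFontDataUrl,
  };
}
```

A regression test can then assert that the returned object carries a non-empty standardFontDataUrl, independently of whether a given PDF happens to need the standard fonts.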

Extended analysis

Fix Plan

To fix the issue, we need to provide the standardFontDataUrl API parameter when calling getDocument. Here are the steps:

  • Update the getDocument call to include the standardFontDataUrl parameter.
  • Use the pathname property to construct the standardFontDataUrl.
  • Ensure that the standardFontDataUrl points to the correct location of the standard font assets.

Example code:

const standardFontDataUrl = new URL(
  "../node_modules/pdfjs-dist/standard_fonts/",
  import.meta.url
).pathname;

const pdf = await getDocument({
  data: new Uint8Array(buffer),
  disableWorker: true,
  standardFontDataUrl,
}).promise;
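
To check that the resolved value really points at font assets, a small startup or test-time probe can help. This is a minimal sketch using only Node's fs module; inspectFontDir and FontDirReport are hypothetical names. The trailing-slash check reflects the PDF.js documentation's note that standardFontDataUrl should include the trailing slash.

```typescript
import { readdirSync } from "node:fs";

interface FontDirReport {
  readable: boolean;         // directory exists and could be listed
  hasTrailingSlash: boolean; // PDF.js appends file names to this value
  entryCount: number;        // number of entries found (0 if unreadable)
}

// Hypothetical helper: inspects a candidate standardFontDataUrl path.
function inspectFontDir(fontDir: string): FontDirReport {
  let entryCount = -1; // -1 marks "could not be read"
  try {
    entryCount = readdirSync(fontDir).length;
  } catch {
    // Missing path or not a directory: leave entryCount at -1.
  }
  return {
    readable: entryCount >= 0,
    hasTrailingSlash: fontDir.endsWith("/"),
    entryCount: Math.max(entryCount, 0),
  };
}
```

A report with readable set to false or entryCount at 0 suggests the relative path no longer matches the installed pdfjs-dist layout, for example after bundling or moving the module.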

Verification

To verify that the fix worked:

  • Run the fallback rendering path and check for the absence of standardFontDataUrl warnings.
  • Verify that the PDF pages are rendered correctly and that the font assets are loaded without errors.
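
The first check can be automated by recording console output while the fallback path runs. The sketch below is one possible shape: startFontWarningCapture is a hypothetical helper, and it hooks both console.log and console.warn on the assumption that PDF.js reports its warnings through one of them.

```typescript
type ConsoleMethod = (...args: unknown[]) => void;

// Hypothetical helper: records console messages that mention standardFontDataUrl.
function startFontWarningCapture(): { messages: string[]; stop: () => void } {
  const messages: string[] = [];
  const originalLog: ConsoleMethod = console.log;
  const originalWarn: ConsoleMethod = console.warn;

  // Wrap a console method: record matching messages, then pass everything through.
  const record = (original: ConsoleMethod): ConsoleMethod =>
    (...args: unknown[]) => {
      const text = args.map(String).join(" ");
      if (text.includes("standardFontDataUrl")) {
        messages.push(text);
      }
      original.apply(console, args);
    };

  console.log = record(originalLog);
  console.warn = record(originalWarn);

  return {
    messages,
    // Restore the original console methods.
    stop: () => {
      console.log = originalLog;
      console.warn = originalWarn;
    },
  };
}
```

In a test, call startFontWarningCapture() before triggering the fallback render, call stop() afterwards, and assert that messages is empty.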

Extra Tips

  • Make sure to update the regression tests to cover the fallback rendering case that needs standard font assets.
  • Consider adding logging or monitoring to detect any future issues with font loading or rendering.
