openclaw - ✅(Solved) Fix [Bug]: PDF page image extraction silently fails when @napi-rs/canvas is not installed — no fallback to pdftoppm [2 pull requests, 2 comments, 3 participants]

GitHub stats: openclaw/openclaw#75358 (fetched 2026-05-01 05:34:44)
Comments: 2 · Participants: 3 · Timeline events: 10 · Reactions: 2
Timeline (top): commented ×2, cross-referenced ×2, labeled ×2, referenced ×2

When processing PDFs with vision models on headless Linux servers where @napi-rs/canvas is not installed, the document-extract plugin silently fails to render page images. Only text extraction works. The docs state: "Extraction fallback requires pdfjs-dist (and @napi-rs/canvas for image rendering)" — but there is no graceful degradation when canvas is absent.

Error Message

@napi-rs/canvas is NOT installed on the host:
$ node -e "require('@napi-rs/canvas')"
Error: Cannot find module '@napi-rs/canvas'
poppler-utils IS installed and functional:
$ which pdftoppm pdftotext pdfinfo
/usr/bin/pdftoppm
/usr/bin/pdftotext
/usr/bin/pdfinfo
Patching dist/extensions/document-extract/document-extractor.js to add a pdftoppm fallback resolves the issue (applied manually on 2026-05-01).

Root Cause

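When @napi-rs/canvas is not installed, loadCanvasModule() throws and the document-extract image-rendering path gives up instead of trying another renderer. The tool returns only the extracted text with an empty images: [] list; the only signal is the onImageExtractionError callback ("Optional dependency @napi-rs/canvas is required for PDF image extraction"), and no user-facing error is shown. The docs say image rendering requires @napi-rs/canvas, but the plugin has no graceful degradation when it is absent, even if poppler-utils is available on the host.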

Fix / Workaround


Regression: Since 2026.4.29, when OpenClaw moved local PDF extraction into the bundled document-extract plugin, this patch has to be re-applied after every upgrade: the file now ships in the distributed dist/ directory, which each upgrade overwrites.

Workaround: Manually patch ~/.npm-global/lib/node_modules/openclaw/dist/extensions/document-extract/document-extractor.js after each upgrade to add pdftoppm + pdftotext fallbacks when canvas is unavailable.

PR fix notes

PR #1: feat(document-extract): add pdftoppm fallback when @napi-rs/canvas is unavailable

Description (problem / solution / changelog)

Summary

When processing PDFs with vision models on headless Linux servers where @napi-rs/canvas is not installed, the document-extract plugin silently fails to render page images and returns only extracted text, with no fallback.

This change adds pdftoppm (from poppler-utils) as a fallback renderer when @napi-rs/canvas is unavailable.

Changes

  • Availability check: checks whether pdftoppm is present on the system
  • Single-page renderer: writes the PDF to a temp file, runs pdftoppm to render a single page as PNG, reads the result back, and cleans up the temp files
  • Fallback branching: when loading the canvas module throws and pdftoppm is available, the image rendering loop enters the pdftoppm path instead of returning an empty image list
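The fallback branching described in the changes list can be illustrated with a small, hypothetical sketch; it is not the code from the PR. `selectRenderer`, `loadCanvas`, `hasPdftoppm` and `onError` are invented names, and the two dependencies are injected as functions so the decision logic runs without either one being installed. The `onError` hook stands in for a callback like onImageExtractionError, so the "no renderer" case is reported rather than silently yielding no images.

```typescript
// Hypothetical sketch of the fallback decision; not the code from the PR.
type Renderer = "canvas" | "pdftoppm" | "none";

function selectRenderer(
  loadCanvas: () => unknown, // e.g. a loader that requires @napi-rs/canvas
  hasPdftoppm: () => boolean, // e.g. a PATH lookup for the pdftoppm binary
  onError: (message: string) => void = () => {},
): Renderer {
  try {
    loadCanvas(); // throws when @napi-rs/canvas cannot be loaded
    return "canvas";
  } catch (err) {
    if (hasPdftoppm()) {
      return "pdftoppm"; // render pages with poppler-utils instead
    }
    // No renderer available: surface the reason rather than failing silently.
    onError(err instanceof Error ? err.message : String(err));
    return "none";
  }
}

// Simulates the reporter's host: canvas missing, poppler-utils installed.
const missingCanvas = (): never => {
  throw new Error("Cannot find module '@napi-rs/canvas'");
};
console.log(selectRenderer(missingCanvas, () => true)); // prints "pdftoppm"
```

Injecting the loader and the availability check keeps the branching testable in isolation; in a real plugin they would wrap the actual module load and binary lookup.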

Fixes

Fixes #75358

Testing

  • Verified on a headless Ubuntu 24.04 LTS server with poppler-utils installed but no @napi-rs/canvas
  • pdftoppm rendering produces PNG images matching the expected scale and pixel budget
  • Temp files are cleaned up in all code paths (including error paths)

Checklist

  • Self-tested on target environment
  • No new dependencies introduced
  • CI checks pass
  • Commit message follows conventional commits format

Changed files

  • extensions/document-extract/document-extractor.ts (modified, +84/-11)

PR #75370: feat(document-extract): add pdftoppm fallback when @napi-rs/canvas is unavailable

Description (problem / solution / changelog)

Summary

When processing PDFs with vision models on headless Linux servers where @napi-rs/canvas is not installed, the document-extract plugin silently fails to render page images — returning only extracted text with no fallback.

This change adds pdftoppm (from poppler-utils) as a fallback renderer when @napi-rs/canvas is unavailable.

Changes

  • isPdftoppmAvailable(): checks if pdftoppm is present on the system
  • renderPageWithPdftoppm(): writes PDF to a private temp directory, runs pdftoppm to render a single page as PNG, reads the result back, and recursively cleans up the temp directory
  • Fallback branching: when loadCanvasModule() throws and pdftoppm is available, the image rendering loop enters the pdftoppm path instead of returning images: []
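A minimal sketch of the two helpers named in this PR, assuming a synchronous API that takes the PDF bytes, a 1-based page number and a DPI. These signatures are assumptions for illustration, not the ones in the PR. It uses `execFileSync` with an argument array rather than `execSync`, so the temp path is never parsed by a shell, and a `finally` block so the private temp directory is removed on success and on every error path.

```typescript
import { execFileSync } from "node:child_process";
import { accessSync, constants, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { delimiter, join } from "node:path";

// Returns true if an executable named "pdftoppm" exists in a PATH directory.
function isPdftoppmAvailable(): boolean {
  const dirs = (process.env.PATH ?? "").split(delimiter).filter((d) => d.length > 0);
  return dirs.some((dir) => {
    try {
      accessSync(join(dir, "pdftoppm"), constants.X_OK);
      return true;
    } catch {
      return false;
    }
  });
}

// Renders one 1-based page of `pdf` to PNG bytes at `dpi` using pdftoppm.
function renderPageWithPdftoppm(pdf: Uint8Array, pageNumber: number, dpi: number): Buffer {
  // mkdtempSync creates a fresh, uniquely named directory; on POSIX systems
  // mkdtemp(3) gives it mode 0o700, so other users cannot read the PDF.
  const dir = mkdtempSync(join(tmpdir(), "pdf-render-"));
  try {
    const input = join(dir, "input.pdf");
    const outPrefix = join(dir, "page");
    writeFileSync(input, pdf);
    const page = String(pageNumber);
    // -singlefile writes exactly "<outPrefix>.png" without a page-number suffix.
    execFileSync(
      "pdftoppm",
      ["-png", "-singlefile", "-f", page, "-l", page, "-r", String(dpi), input, outPrefix],
      { timeout: 30_000, stdio: "ignore" }, // kill pdftoppm after 30 s
    );
    return readFileSync(`${outPrefix}.png`);
  } finally {
    // Runs on success and on every error path, including timeouts.
    rmSync(dir, { recursive: true, force: true });
  }
}
```

A caller would check `isPdftoppmAvailable()` once and then call, for example, `renderPageWithPdftoppm(pdfBytes, 1, 144)` per page; `execFileSync` throws if pdftoppm exits non-zero or hits the timeout, so the caller can report the failure.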

Security

  • Uses mkdtemp(3) with 0o700 permissions instead of writing PDFs to the shared /tmp with default permissions
  • 30-second timeout on execSync to prevent indefinite blocking on malformed PDFs
  • DPI computed from actual pixel budget rather than forcing a minimum 72 DPI regardless of scale
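The last security point (deriving DPI from the pixel budget) can be made concrete with a small worked sketch. `dpiForPixelBudget` is a hypothetical helper, and the per-page budget in the examples is an assumed value; the idea is simply that PDF page sizes are in points (72 per inch), so the pixel count grows with the square of the DPI and can be solved for a cap.

```typescript
// PDF page sizes are given in points; 72 points = 1 inch.
const POINTS_PER_INCH = 72;

// Hypothetical helper: picks a pdftoppm -r value that honors both the
// requested scale and a per-page pixel budget.
function dpiForPixelBudget(
  widthPt: number,
  heightPt: number,
  scale: number,
  maxPixels: number,
): number {
  // DPI equivalent to rendering at `scale` (scale 1 = 72 DPI).
  const requested = POINTS_PER_INCH * scale;
  // At `dpi`, the page is (widthPt / 72 * dpi) x (heightPt / 72 * dpi) pixels.
  // Solving width * height <= maxPixels for dpi gives the cap below.
  const budgetCap = POINTS_PER_INCH * Math.sqrt(maxPixels / (widthPt * heightPt));
  // No 72 DPI floor: a small budget may legitimately yield a lower DPI.
  return Math.max(1, Math.floor(Math.min(requested, budgetCap)));
}

// US Letter is 612 x 792 pt.
console.log(dpiForPixelBudget(612, 792, 1.5, 612 * 792 * 4)); // 108 (scale is the limit)
console.log(dpiForPixelBudget(612, 792, 3, 612 * 792 * 4)); // 144 (budget is the limit)
console.log(dpiForPixelBudget(612, 792, 2, (612 * 792) / 4)); // 36 (below 72 DPI)
```

The third call shows why a forced 72 DPI minimum would be wrong: at 72 DPI that page would need four times the assumed budget.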

Fixes

Fixes #75358

Testing

  • Code review pass; no new dependencies introduced
  • Temp directory cleanup verified in all code paths (success, error, and exception paths)
  • Pixel budget tracking consistent between canvas and pdftoppm rendering paths

Checklist

  • Self-reviewed
  • No new dependencies introduced
  • Security issues addressed (temp permissions, timeout, pixel budget)
  • Commit message follows conventional commits format

Changed files

  • extensions/document-extract/document-extractor.ts (modified, +143/-35)

Original issue report

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No


Steps to reproduce

  • Install OpenClaw on a headless Linux server without @napi-rs/canvas.
  • Send a PDF with little or no text content (so that image rendering is triggered) to a vision model, e.g. Qwen-VL via Ollama.
  • Observe that the model receives only the extracted text (if any), not page images.
  • The onImageExtractionError callback fires with "Optional dependency @napi-rs/canvas is required for PDF image extraction".

Expected behavior

When @napi-rs/canvas is unavailable, the document-extract plugin should fall back to external tools like pdftoppm (from poppler-utils) to render PDF pages to PNG images. This is a common pattern on headless servers.

Actual behavior

Page image rendering is skipped entirely. The tool returns only extracted text, which may be insufficient for documents that rely on visual layout, diagrams, or scanned pages.

OpenClaw version

2026.4.29

Operating system

Ubuntu 24.04 LTS (headless server, no GUI)

Install method

npm

Model

ollama/kimi2.6-cloud

Provider / routing chain

openclaw -> Local Ollama -> Cloud Ollama -> Kimi2.6-cloud

Additional provider/model setup details

Ollama runs on a local Kubernetes cluster. The issue is not specific to Ollama: it affects any provider that triggers the document-extract fallback path, i.e., any provider other than the native-PDF providers such as Anthropic and Google. The PDF tool resolves to the session/default model when no native PDF provider is available.


Impact and severity

Affected users/systems/channels: Any OpenClaw deployment on headless Linux servers without @napi-rs/canvas installed, using vision models for PDF analysis via non-native PDF providers.

Severity: Medium. PDF text extraction still works. Vision model analysis of diagrams, scanned pages, or visually structured documents is silently degraded to text-only.

Frequency: Always on affected setups. The canvas module load throws on every PDF that triggers image rendering fallback.

Consequence: Users receive incomplete analysis when asking vision models to interpret PDFs containing figures, diagrams, or layouts that depend on visual rendering. No user-facing error is shown — the tool silently returns empty images: [].


Extent analysis

TL;DR

Install @napi-rs/canvas or apply a patch to use pdftoppm as a fallback for PDF image rendering on headless Linux servers.

Guidance

  • Verify that poppler-utils is installed and functional on the headless Linux server, as it provides the pdftoppm command used in the fallback.
  • Consider patching document-extractor.js to fall back to pdftoppm when @napi-rs/canvas is not installed, as the reporter did; the logs confirm that a manual patch resolves the issue.
  • If using npm to install OpenClaw, ensure that the @napi-rs/canvas package is included in the installation or installed separately.
  • Test the PDF image rendering with a vision model after applying the patch or installing @napi-rs/canvas to ensure that page images are correctly rendered.
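The checks in the first and third bullets can be scripted. The sketch below mirrors the issue's `node -e "require('@napi-rs/canvas')"` and `which pdftoppm pdftotext pdfinfo` commands; `canLoad`, `onPath` and `diagnosePdfRendering` are hypothetical names. By default it resolves modules from the current working directory, which may differ from where a global install loads them; for an install like the ~/.npm-global/lib/node_modules/openclaw path mentioned in the issue, passing that directory is likely closer to what OpenClaw sees (an assumption, not verified against OpenClaw's loader).

```typescript
import { accessSync, constants } from "node:fs";
import { createRequire } from "node:module";
import { delimiter, join, resolve } from "node:path";

// True if `moduleName` can be loaded from `fromDir`, like the issue's
// `node -e "require('@napi-rs/canvas')"` check.
function canLoad(moduleName: string, fromDir: string): boolean {
  // createRequire only needs an absolute path as a resolution base;
  // the file itself does not have to exist.
  const requireFrom = createRequire(resolve(fromDir, "noop.js"));
  try {
    requireFrom(moduleName);
    return true;
  } catch {
    return false;
  }
}

// True if an executable `name` exists in a PATH directory, like `which name`.
function onPath(name: string): boolean {
  return (process.env.PATH ?? "")
    .split(delimiter)
    .filter((dir) => dir.length > 0)
    .some((dir) => {
      try {
        accessSync(join(dir, name), constants.X_OK);
        return true;
      } catch {
        return false;
      }
    });
}

function diagnosePdfRendering(fromDir: string = process.cwd()) {
  return {
    canvas: canLoad("@napi-rs/canvas", fromDir),
    pdftoppm: onPath("pdftoppm"),
    pdftotext: onPath("pdftotext"),
    pdfinfo: onPath("pdfinfo"),
  };
}

console.log(diagnosePdfRendering());
```

On the reporter's host this would be expected to show `canvas: false` with the three poppler tools present, which is exactly the combination where a pdftoppm fallback helps.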

Example

The issue includes no patch code; it is resolved either by installing @napi-rs/canvas or by patching document-extractor.js to add a pdftoppm fallback.

Notes

The provided workaround of manually patching document-extractor.js after each upgrade may not be sustainable in the long term. A more permanent solution, such as installing @napi-rs/canvas or modifying the OpenClaw installation process to include the fallback, should be considered.

Recommendation

Apply a workaround by patching document-extractor.js to use pdftoppm as a fallback, as this provides a functional solution without requiring the installation of @napi-rs/canvas. However, installing @napi-rs/canvas may be a more straightforward solution if possible.
