openclaw - ✅(Solved) Fix [Bug]: PDF page image extraction silently fails when @napi-rs/canvas is not installed — no fallback to pdftoppm [2 pull requests, 2 comments, 3 participants]

openclaw, 2026-05-01 02:02:30


openclaw/openclaw#75358 (fetched 2026-05-01 05:34:44)


When processing PDFs with vision models on headless Linux servers where @napi-rs/canvas is not installed, the document-extract plugin silently fails to render page images. Only text extraction works. The docs state: "Extraction fallback requires pdfjs-dist (and @napi-rs/canvas for image rendering)" — but there is no graceful degradation when canvas is absent.

Error Message

@napi-rs/canvas is NOT installed on the host:
$ node -e "require('@napi-rs/canvas')"
Error: Cannot find module '@napi-rs/canvas'
poppler-utils IS installed and functional:
$ which pdftoppm pdftotext pdfinfo
/usr/bin/pdftoppm
/usr/bin/pdftotext
/usr/bin/pdfinfo
Patching dist/extensions/document-extract/document-extractor.js to add pdftoppm fallback resolves the issue (applied manually on 2026-05-01).

Fix / Workaround

Regression: This became a recurring patch requirement in 2026.4.29, when OpenClaw moved local PDF extraction into the bundled document-extract plugin. The patched file is now part of the distributed dist/ directory, which is overwritten on every upgrade.

Workaround: Manually patch ~/.npm-global/lib/node_modules/openclaw/dist/extensions/document-extract/document-extractor.js after each upgrade to add pdftoppm + pdftotext fallbacks when canvas is unavailable.

PR fix notes

PR #1: feat(document-extract): add pdftoppm fallback when @napi-rs/canvas is unavailable

Repository: mgustimz/openclaw
Author: mgustimz
State: closed | merged: False
Link: https://github.com/mgustimz/openclaw/pull/1

Description (problem / solution / changelog)

Summary

When processing PDFs with vision models on headless Linux servers where @napi-rs/canvas is not installed, the document-extract plugin silently fails to render page images, returning only extracted text with no fallback.

This change adds pdftoppm (from poppler-utils) as a fallback renderer when @napi-rs/canvas is unavailable.

Changes

Availability check: determines whether pdftoppm is present on the system
Page renderer: writes the PDF to a temp file, runs pdftoppm to render a single page as PNG, reads the result back, and cleans up temp files
Fallback branching: when loading the canvas module throws and pdftoppm is available, the image rendering loop enters the pdftoppm path instead of returning no images

Fixes

Fixes #75358

Testing

Verified on a headless Ubuntu 24.04 LTS server with poppler-utils installed but no @napi-rs/canvas
pdftoppm rendering produces PNG images matching the expected scale and pixel budget
Temp files are cleaned up in all code paths (including error paths)

Checklist

Self-tested on target environment
No new dependencies introduced
CI checks pass
Commit message follows conventional commits format

Changed files

extensions/document-extract/document-extractor.ts (modified, +84/-11)

PR #75370: feat(document-extract): add pdftoppm fallback when @napi-rs/canvas is unavailable

Repository: openclaw/openclaw
Author: mgustimz
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/75370

Description (problem / solution / changelog)

Summary

When processing PDFs with vision models on headless Linux servers where @napi-rs/canvas is not installed, the document-extract plugin silently fails to render page images — returning only extracted text with no fallback.

This change adds pdftoppm (from poppler-utils) as a fallback renderer when @napi-rs/canvas is unavailable.

Changes

isPdftoppmAvailable(): checks if pdftoppm is present on the system
renderPageWithPdftoppm(): writes PDF to a private temp directory, runs pdftoppm to render a single page as PNG, reads the result back, and recursively cleans up the temp directory
Fallback branching: when loadCanvasModule() throws and pdftoppm is available, the image rendering loop enters the pdftoppm path instead of returning images: []
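
The branching described above can be sketched roughly as follows. This is not the code from the PR: the name `pickRenderBackend`, the `RenderBackend` type, and the synchronous `loadCanvas` callback (standing in for `loadCanvasModule()`) are assumptions made for illustration, and the availability check uses `spawnSync` as one plausible way to detect the binary.

```typescript
import { spawnSync } from "node:child_process";

type RenderBackend = "canvas" | "pdftoppm" | "none";

// Hypothetical helper: true when the binary can be spawned at all.
// spawnSync sets `error` (for example ENOENT) only when the process could not
// be started or timed out, so a non-zero exit code still counts as "present".
function isPdftoppmAvailable(binary: string = "pdftoppm"): boolean {
  const result = spawnSync(binary, ["-v"], { stdio: "ignore", timeout: 5_000 });
  return result.error === undefined;
}

// `loadCanvas` stands in for the plugin's loadCanvasModule(); it is expected
// to throw when @napi-rs/canvas is not installed.
function pickRenderBackend(
  loadCanvas: () => unknown,
  pdftoppmAvailable: () => boolean = () => isPdftoppmAvailable(),
): RenderBackend {
  try {
    loadCanvas();
    return "canvas";
  } catch {
    // Canvas is missing: prefer pdftoppm, otherwise report that nothing can render.
    return pdftoppmAvailable() ? "pdftoppm" : "none";
  }
}
```

Injecting both checks as callbacks keeps the decision testable on machines that have neither canvas nor poppler-utils installed. A caller that receives "none" would still return images: [], but it can now log why.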

Security

Uses mkdtemp(3) with 0o700 permissions instead of writing PDFs to the shared /tmp with default permissions
30-second timeout on execSync to prevent indefinite blocking on malformed PDFs
DPI computed from actual pixel budget rather than forcing a minimum 72 DPI regardless of scale
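
A minimal sketch of the temp-directory and timeout handling listed above, under stated assumptions. The helper `withPrivateTempDir`, the file names `input.pdf` and `page`, and the function signature are hypothetical. The PR text mentions execSync; this sketch swaps in `execFileSync` so that no shell is involved and the paths need no quoting.

```typescript
import { execFileSync } from "node:child_process";
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Runs `fn` with a freshly created private directory and always removes it,
// whether `fn` returns or throws. mkdtemp creates the directory with mode 0o700.
function withPrivateTempDir<T>(fn: (dir: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), "pdf-render-"));
  try {
    return fn(dir);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

// Hypothetical renderer: renders one page (1-based) to PNG at the given DPI.
function renderPageWithPdftoppm(pdf: Buffer, page: number, dpi: number): Buffer {
  return withPrivateTempDir((dir) => {
    const input = join(dir, "input.pdf");
    const outputPrefix = join(dir, "page");
    writeFileSync(input, pdf, { mode: 0o600 });
    execFileSync(
      "pdftoppm",
      ["-png", "-r", String(dpi), "-f", String(page), "-l", String(page), "-singlefile", input, outputPrefix],
      { timeout: 30_000, stdio: "ignore" }, // stop waiting after 30 seconds
    );
    // With -singlefile, pdftoppm writes exactly <prefix>.png.
    const output = `${outputPrefix}.png`;
    if (!existsSync(output)) {
      throw new Error(`pdftoppm produced no output for page ${page}`);
    }
    return readFileSync(output);
  });
}
```

Keeping cleanup in a single `finally` block is what makes the "all code paths" claim easy to check: a timeout, a non-zero exit, and a missing output file all pass through the same removal step.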

Fixes

Fixes #75358

Testing

Code review pass; no new dependencies introduced
Temp directory cleanup verified in all code paths (success, error, and exception paths)
Pixel budget tracking consistent between canvas and pdftoppm rendering paths
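
One way the pixel-budget DPI calculation could look is sketched below. The formula is an assumption, not taken from the PR: it treats a scale of 1.0 as 72 DPI (PDF user space is 72 points per inch) and lowers the DPI when the requested scale would exceed the remaining pixel budget. The names `dpiForBudget` and `pixelsAtDpi` are hypothetical.

```typescript
const POINTS_PER_INCH = 72;

// Pixel count of a page of the given size (in PDF points) rendered at `dpi`.
function pixelsAtDpi(widthPt: number, heightPt: number, dpi: number): number {
  const widthPx = Math.ceil((widthPt * dpi) / POINTS_PER_INCH);
  const heightPx = Math.ceil((heightPt * dpi) / POINTS_PER_INCH);
  return widthPx * heightPx;
}

// Returns the DPI to pass to pdftoppm, or 0 when the budget is exhausted
// and the caller should skip the page.
function dpiForBudget(
  widthPt: number,
  heightPt: number,
  scale: number,
  remainingPixels: number,
): number {
  if (remainingPixels <= 0) return 0;
  const requestedDpi = POINTS_PER_INCH * scale;
  // Largest DPI whose rendered area still fits: (w*d/72)*(h*d/72) <= budget.
  const maxDpi = POINTS_PER_INCH * Math.sqrt(remainingPixels / (widthPt * heightPt));
  // No 72 DPI floor: a small budget may legitimately yield a low DPI.
  return Math.max(1, Math.floor(Math.min(requestedDpi, maxDpi)));
}
```

After each page is rendered, the caller would subtract `pixelsAtDpi(...)` (or the decoded image dimensions) from the remaining budget, mirroring whatever accounting the canvas path uses.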

Checklist

Self-reviewed
No new dependencies introduced
Security issues addressed (temp permissions, timeout, pixel budget)
Commit message follows conventional commits format

Changed files

extensions/document-extract/document-extractor.ts (modified, +143/-35)

Bug type

Behavior bug (incorrect output/state without crash)

Steps to reproduce

1. Install OpenClaw on a headless Linux server without @napi-rs/canvas.
2. Send a PDF with little or no text content to a vision model (e.g., Qwen-VL via Ollama) so that image rendering is triggered.
3. Observe that the model receives only extracted text (if any), not page images.
4. The onImageExtractionError callback fires with "Optional dependency @napi-rs/canvas is required for PDF image extraction".

Expected behavior

When @napi-rs/canvas is unavailable, the document-extract plugin should fall back to external tools like pdftoppm (from poppler-utils) to render PDF pages to PNG images. This is a common pattern on headless servers.

Actual behavior

Page image rendering is skipped entirely. The tool returns only extracted text, which may be insufficient for documents that rely on visual layout, diagrams, or scanned pages.

OpenClaw version

2026.4.29

Operating system

Ubuntu 24.04 LTS (headless server, no GUI)

Install method

npm

Model

ollama/kimi2.6-cloud

Provider / routing chain

openclaw -> Local Ollama -> Cloud Ollama -> Kimi2.6-cloud

Additional provider/model setup details

Ollama runs on a local Kubernetes cluster. The issue is not specific to Ollama — it affects any provider that triggers the document-extract fallback path (i.e., any non-native-PDF provider like Anthropic/Google). The PDF tool resolves to the session/default model when no native PDF provider is available.

Logs, screenshots, and evidence

@napi-rs/canvas is NOT installed on the host:
$ node -e "require('@napi-rs/canvas')"
Error: Cannot find module '@napi-rs/canvas'
poppler-utils IS installed and functional:
$ which pdftoppm pdftotext pdfinfo
/usr/bin/pdftoppm
/usr/bin/pdftotext
/usr/bin/pdfinfo
Patching dist/extensions/document-extract/document-extractor.js to add pdftoppm fallback resolves the issue (applied manually on 2026-05-01).

Impact and severity

Affected users/systems/channels: Any OpenClaw deployment on headless Linux servers without @napi-rs/canvas installed, using vision models for PDF analysis via non-native PDF providers.

Severity: Medium. PDF text extraction still works. Vision model analysis of diagrams, scanned pages, or visually structured documents is silently degraded to text-only.

Frequency: Always on affected setups. The canvas module load throws on every PDF that triggers image rendering fallback.

Consequence: Users receive incomplete analysis when asking vision models to interpret PDFs containing figures, diagrams, or layouts that depend on visual rendering. No user-facing error is shown — the tool silently returns empty images: [].

Additional information

extent analysis

TL;DR

Install @napi-rs/canvas or apply a patch to use pdftoppm as a fallback for PDF image rendering on headless Linux servers.

Guidance

Verify that poppler-utils is installed and functional on the headless Linux server, as it provides the pdftoppm command used in the fallback.
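Consider patching document-extractor.js to fall back to pdftoppm when @napi-rs/canvas is not installed. The issue author reports that a manual patch of this kind resolved the problem, but the file lives under dist/ and is overwritten on every upgrade.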
If using npm to install OpenClaw, ensure that the @napi-rs/canvas package is included in the installation or installed separately.
Test the PDF image rendering with a vision model after applying the patch or installing @napi-rs/canvas to ensure that page images are correctly rendered.
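
The checks in this list can be scripted. The sketch below mirrors the `node -e "require(...)"` and `which` commands from the issue's logs; the function names are hypothetical, and `openclawDir` is assumed to be the absolute path of the OpenClaw install (for example the ~/.npm-global/lib/node_modules/openclaw directory named in the workaround, expanded to an absolute path).

```typescript
import { accessSync, constants } from "node:fs";
import { createRequire } from "node:module";
import { delimiter, join } from "node:path";

// True when `name` resolves as a module from `fromDir` (an absolute path).
function canResolveModule(name: string, fromDir: string): boolean {
  try {
    createRequire(join(fromDir, "package.json")).resolve(name);
    return true;
  } catch {
    return false;
  }
}

// Rough equivalent of `which`: first executable entry named `name` on PATH.
// Does not distinguish files from directories; adequate for a quick check.
function findOnPath(name: string, pathVar: string = process.env.PATH ?? ""): string | null {
  for (const dir of pathVar.split(delimiter)) {
    if (dir === "") continue;
    const candidate = join(dir, name);
    try {
      accessSync(candidate, constants.X_OK);
      return candidate;
    } catch {
      // Not in this directory; keep looking.
    }
  }
  return null;
}

// Summarizes which rendering prerequisites are present on this host.
function diagnose(openclawDir: string) {
  return {
    canvas: canResolveModule("@napi-rs/canvas", openclawDir),
    pdftoppm: findOnPath("pdftoppm"),
    pdftotext: findOnPath("pdftotext"),
    pdfinfo: findOnPath("pdfinfo"),
  };
}
```

On the host described in the issue, `diagnose` would be expected to report `canvas: false` together with /usr/bin paths for the three poppler tools, which is the combination where the fallback matters.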

Notes

Manually patching document-extractor.js after each upgrade is not sustainable. A more permanent fix is to install @napi-rs/canvas or to ship the pdftoppm fallback in OpenClaw itself, as proposed in PR #75370.

Recommendation

If @napi-rs/canvas can be installed on the host, installing it is the most straightforward fix. Otherwise, patch document-extractor.js to use pdftoppm as a fallback, which restores page image rendering without the canvas dependency.
