openclaw - ✅(Solved) Fix [Feature Request] Add file type detection and preprocessing for binary documents [2 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#56876Fetched 2026-04-08 01:46:37
View on GitHub
Comments
0
Participants
1
Timeline
0
Reactions
1
Author
Participants

Fix Action

Fix / Workaround

Workaround (Current)

PR fix notes

PR #5094: feat(tools): binary file type detection in file_read

Description (problem / solution / changelog)

Summary

  • Adds a file type detection layer to file_read that classifies files before reading
  • Binary files (images, office docs, executables, archives) return a concise metadata summary instead of dumping lossy UTF-8 into context
  • Three-tier detection: extension lookup → magic bytes → null-byte/UTF-8 heuristic
  • PDFs try text extraction first, fall back to summary
  • Configurable via binary_file_detection (default true)

Inspired by openclaw#56876.

Changes

FileWhat
src/tools/file_detect.rsNew — FileType enum, detection logic, 41 unit tests
src/tools/file_read.rsIntegrate detection into execute(), 6 new tests
src/tools/mod.rsWire module and config
src/config/schema.rsFileReadToolConfig with binary_file_detection

Example output

[Binary file: application/vnd.openxmlformats-officedocument.wordprocessingml.document, 1.2 MB — use a document conversion tool to read]

Test plan

  • 41 unit tests for detection logic
  • 6 integration tests for file_read integration
  • cargo build succeeds
  • cargo fmt --all clean
  • CI gate

Changed files

  • src/config/mod.rs (modified, +25/-24)
  • src/config/schema.rs (modified, +29/-0)
  • src/onboard/wizard.rs (modified, +2/-0)
  • src/tools/file_detect.rs (added, +655/-0)
  • src/tools/file_read.rs (modified, +224/-25)
  • src/tools/mod.rs (modified, +5/-1)

PR #66613: fix(gateway): fail loud on all attachment parse failures (#48123)

Description (problem / solution / changelog)

Summary

  • Problem: parseMessageWithAttachments silently dropped attachments on four separate code paths — non-image MIME sniff, unknown MIME, text-only session receiving any attachment, and zero-byte payloads. Each path logged a warning and returned or continued, so chat.send / agent.request / node-event ingress responded with success while the attachment silently vanished. A data-loss bug masquerading as normal behavior, reported across #48123, #60226, #33320, #35406, #56876.
  • Why it matters: Every gateway RPC client and every channel adapter that forwarded an attachment could hit one of these paths. The agent never saw the content, the user/operator saw no error, and the log line was too buried to be operational signal.
  • What changed:
    1. New UnsupportedAttachmentError class (next to MediaOffloadError) with a reason discriminator: "non-image" | "unknown-mime" | "text-only-session" | "empty-payload".
    2. All four silent-drop sites in src/gateway/chat-attachments.ts now throw UnsupportedAttachmentError with the matching reason.
    3. chat.send and agent.request already wrap the parser in try/catch and map non-MediaOffloadError failures to INVALID_REQUEST, so the new error rides the existing error envelope without touching RPC contracts.
    4. server-node-events.ts agent.request catch aborts on parser failure (log.warn + return) instead of falling through to a text-only dispatch. That path is fire-and-forget and has no structured failure channel back to the spawning node, so any dispatch after a parse error would mask the attachment loss behind a normal agent response.
    5. Tests cover all four throw reasons, the node-event abort, and the text-only-no-attachment passthrough; two existing chat.directive-tags.test.ts tests that asserted the old silent-drop behavior are flipped to assert INVALID_REQUEST.
  • What did NOT change (scope boundary): No PDF / document support added. No protocol changes. No new RPC wire types. Image attachment path and MediaOffloadError handling unchanged. No downstream agent runner changes.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #48123
  • Related #60226, #33320, #35406, #56876
  • This PR fixes a bug or regression

Root Cause

  • Root cause (parser, four paths):
    1. Sniffed non-image (chat-attachments.ts:360-367): per-attachment sniff returned a non-image MIME → log.warn + continue. Result: the non-image attachment was dropped from the returned images array and the caller saw a success.
    2. Unknown MIME (chat-attachments.ts:367-372): sniff failed and the provided MIME was not image/*log.warn + continue. Same drop-and-continue.
    3. Text-only session (chat-attachments.ts:310-317): when supportsImages === false, the function returned early with empty results after warning. This dropped all attachments (including valid images) before any per-attachment processing. Anyone sending an image to a text-only model saw a silent empty success — exactly the same data-loss shape as #48123 but one layer above.
    4. Zero-byte payload (chat-attachments.ts:345-349): estimateBase64DecodedBytes <= 0log.warn + continue. Defensive branch for an input state that the current isValidBase64 + estimateBase64DecodedBytes implementations cannot actually reach, but the log + continue shape was wrong on principle and a future change to the estimator would have silently resurrected it.
  • Root cause (node-event path): server-node-events.ts:390-428 wraps the parser in a try/catch where the catch does log.warn + return. That was consistent with how the handler already treated MediaOffloadError, invalid base64, and oversized payloads — all pre-existing parser errors. The fall-through amendment attempted mid-PR widened this catch to non-image errors but, as the codex adversarial review correctly flagged, extended the silent-swallow set to cover MediaOffloadError and malformed attachment errors too, so the catch now served as a text-only-masking layer for infrastructure failures. Reverted to the pre-existing abort semantics.
  • Missing detection / guardrail: The existing chat-attachments.test.ts assertions literally asserted the broken drop-and-log shape (expect(parsed.images).toHaveLength(0) + expect(logs[0]).toMatch(/non-image/i)). chat.directive-tags.test.ts had two tests locking in the text-only silent drop as expected behavior. server-node-events.test.ts mocked the parser entirely and had no coverage of the parser-throw catch path.
  • Contributing context: The parser was originally written for an image-only gateway and accreted silent-drop branches as non-image uploads became more common upstream. Principle 4 of the in-repo attachment design notes ("errors loudly, never silently drop") was never wired into code. This PR is the minimum mechanical change to satisfy that principle across all four parser paths without preempting the larger design discussion about real PDF / document support.

Regression Test Plan

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/gateway/chat-attachments.test.ts, src/gateway/server-methods/chat.directive-tags.test.ts, src/gateway/server-node-events.test.ts
  • Scenario the tests lock in:
    • chat-attachments.test.tsexpectUnsupportedAttachment helper asserts both the UnsupportedAttachmentError class and the specific reason. Five cases: sniffed-non-image (non-image), unsniffable + non-image MIME (unknown-mime), mixed batch with one non-image (non-image), text-only session with any attachment (text-only-session), text-only session with no attachments (passthrough, no throw).
    • chat.directive-tags.test.ts — two tests previously asserting silent drop for text-only session models now assert respond(false, undefined, { code: INVALID_REQUEST }) and mockState.lastDispatchImages === undefined. Covers both the direct session entry case and the session-agent-model resolution case.
    • server-node-events.test.ts — new test: parseMessageWithAttachmentsMock.mockRejectedValueOnce(...)handleNodeEvent returns without dispatching the agent run, logGateway.warn fires with the expected message. Locks in the abort-on-parse-failure contract for the node-event ingress path.
  • Why this is the smallest reliable guardrail: Each test covers one silent-drop path and each path gets a direct assertion on behavior (throw + reason for the parser, error envelope for the RPC, abort + log for the node-event). No test asserts side-effects at distance; no test relies on transitive behavior across layers.
  • Existing test that already covers this: None — the existing tests asserted the bug. They had to be replaced or flipped.
  • If no new test is added, why not: empty-payload has no direct test because the upstream isValidBase64 + estimateBase64DecodedBytes contract makes the branch unreachable without mocking the estimator. The throw is kept defensively to preserve the "fail loud" contract if the estimator ever changes; the existing cases exercise the happy path for that estimator.

User-visible / Behavior Changes

  • Before: chat.send / agent.request with a non-image attachment returned a successful response; the attachment was silently lost; the agent saw empty content and either ignored it or hallucinated about the missing document. Sending an image to a text-only model had the same shape — empty success, agent never saw the image.
  • After: Both paths return an explicit INVALID_REQUEST error. The error message names the offending attachment label, the detected/provided MIME, the failure reason (non-image, unknown-mime, text-only-session, empty-payload), and a link to #48123. Clients already handling other parser errors (invalid base64, MediaOffloadError, oversized payloads) get this through the same envelope.
  • Node-event agent.request path: when the parser throws for any reason, the handler logs agent.request attachment parse failed: <reason> and aborts the request without dispatching the agent run. Matches the pre-existing abort semantics for MediaOffloadError, invalid base64, and oversized payloads on the same path, now extended to cover non-image and text-only-session cases too.
  • This is an intentional behavior change on the text-only session attachment path: users who were sending images to a text-only model will now see INVALID_REQUEST instead of a silent empty reply. That is the point of the fix — the previous behavior was a data-loss bug, not a feature.

Diagram

Before (any silent-drop path on RPC or node-event):
[client sends PDF / image-to-text-only / empty / unsniffable]
  -> parseMessageWithAttachments: log.warn + continue  (per-attachment loop)
     or log.warn + return                              (text-only early return)
  -> parsed.images = []
  -> chat.send / agent.request / handleNodeEvent proceeds with empty attachments
  -> success response delivered to client / receipt sent / agent dispatched text-only
  -> attachment silently lost, no client-side signal

After (gateway RPC path):
[client sends any silent-drop-triggering attachment via chat.send / agent.request]
  -> parseMessageWithAttachments: throw new UnsupportedAttachmentError(msg, reason)
  -> existing try/catch in server-methods/{chat,agent}.ts maps to
     errorShape(INVALID_REQUEST, String(err))
  -> respond(false, undefined, errorShape)
  -> client receives structured 4xx error with reason + pointer to #48123

After (node-event agent.request path):
[spawned node sends agent.request with attachment]
  -> parseMessageWithAttachments throws (any reason)
  -> catch at server-node-events.ts logs warn and returns
  -> no dispatch, no receipt, no agent run
  -> pre-existing abort semantics preserved for the newly-expanded throw set

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • No Yes answers, so no risk/mitigation is required.

Repro + Verification

Environment

  • OS: macOS (Darwin 25.4.0)
  • Runtime/container: Node 24.14, Vitest 4.1.2
  • Model/provider: N/A — change is provider-agnostic and verified via parser unit tests plus mocked RPC tests
  • Integration/channel (if any): RPC path (chat.send, agent.request) and node-event ingress (handleNodeEvent agent.request case)
  • Relevant config (redacted): default

Steps

  1. node scripts/test-projects.mjs src/gateway/chat-attachments.test.ts
  2. node scripts/test-projects.mjs src/gateway/chat-attachments.test.ts src/gateway/server-node-events.test.ts src/gateway/server-methods
  3. node scripts/run-tsgo.mjs

Expected

  • Step 1: 12 / 12 chat-attachments.test.ts tests pass, including the five flipped / added assertions (three originals upgraded to expectUnsupportedAttachment, two new for text-only session).
  • Step 2: 439 / 439 tests pass across gateway + server-methods + node-events — includes the three new tests (two text-only cases in parser tests, one parse-abort case in node-events) and the two flipped directive-tags assertions.
  • Step 3: No new type errors introduced by this PR.

Actual

  • Step 1: 12 / 12 passed.
  • Step 2: 439 / 439 passed.
  • Step 3: 5 pre-existing extensions/discord/src/monitor/gateway-plugin.ts errors (firstHeartbeatTimeout property missing) that are already on origin/main and unrelated to this change. Filtering those out: 0 new type errors.

Evidence

Attach at least one:

  • Failing test/log before + passing after

  • Existing chat-attachments.test.ts tests that asserted parsed.images.length === 0 + logs[0].toMatch(/non-image/i) are replaced with expectUnsupportedAttachment(..., reason, pattern).

  • chat.directive-tags.test.ts tests that asserted mockState.lastDispatchImages === undefined (plus nothing about the error response) now also assert respond(false, undefined, { code: INVALID_REQUEST }).

  • server-node-events.test.ts gains coverage for the previously-untested parser catch path.

Human Verification (required)

  • Verified scenarios:
    • Non-image sniff → UnsupportedAttachmentError("non-image")
    • Unsniffable + non-image provided MIME → UnsupportedAttachmentError("unknown-mime")
    • Mixed batch (PNG + PDF) → parse fails at first non-image; client must retry with only the PNG
    • Text-only session + any attachment → UnsupportedAttachmentError("text-only-session")
    • Text-only session + no attachments → passthrough unchanged
    • Image-only happy path → unchanged (existing tests still pass)
    • Node-event parser throw → abort with warn log, no agent dispatch
  • Edge cases checked:
    • server-methods/chat.ts:1911 and server-methods/agent.ts:395 already wrap the parser in try/catch with respond(false, errorShape(err instanceof MediaOffloadError ? UNAVAILABLE : INVALID_REQUEST, String(err))). UnsupportedAttachmentError is not a MediaOffloadError, so it naturally maps to INVALID_REQUEST without RPC contract changes.
    • server-node-events.ts:416 catch was briefly amended to fall through on parser error (round 2) and narrowed in a proposed round 3; both were walked back in this round. Reverting to the pre-existing log.warn + return is the simpler and more honest fix — it matches how the handler has always treated other parser errors, and avoids adding a dedicated error-class branch that only exists to mask scope creep. Addressed by the amended commit 0ab71ca50c.
    • chat.directive-tags.test.ts used expectBroadcast: false (which waits for dedupe) for the two text-only tests. Since the new error path returns before setting dedupe, the flipped tests use waitFor: "none" to avoid hanging on the assertion, matching the existing rejects oversized chat.send session keys before dispatch pattern at chat.directive-tags.test.ts:589.
  • What I did not verify:
    • End-to-end RPC integration against a real gateway with a third-party desktop / CLI client. Rationale: the touched seams (parser throw, RPC catch mapping, node-event catch) are covered by direct unit tests at each layer, and the RPC error envelope is identical to the pre-existing contract for other parser errors.
    • Actual PDF / document support. This PR intentionally stops silent drops without changing what the gateway accepts; a follow-up PR can add application/pdf to SUPPORTED_OFFLOAD_MIMES plus a read_attachment tool via src/media/pdf-extract.ts.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Codex review history:

  • Round-1 P1 on chat-attachments.ts:364 (node-event catch swallowed everything after the new throw widened its trigger set): acknowledged and addressed by the current node-event abort + the matching test.
  • Round-2 P1 on server-node-events.ts:418 (fall-through widened silent-drop to MediaOffloadError / base64 / oversize): acknowledged; the fall-through has been reverted and the abort semantics restored. No single-use error class was added in service of this point.
  • Round-3 adversarial review P1 on server-node-events.ts:416-470 (same concern, reframed): addressed by the same revert. The handler aborts on all parser errors and the new test locks this in.
  • Round-3 adversarial review P1 on chat-attachments.ts:307-316 (text-only early return bypassed the non-image rejection): addressed by introducing UnsupportedAttachmentError with reason: "text-only-session" and throwing before the return.

Compatibility / Migration

  • Backward compatible? Yes for every correct client. The previous behavior was a data-loss bug: chat.send returned success while the attachment was silently discarded. No correct client can rely on "upload a PDF / send image to text-only model, get a successful empty response". This PR makes the failure mode observable via the existing 4xx error envelope.
  • Config/env changes? No.
  • Migration needed? No. Clients that already handle INVALID_REQUEST for other parser errors (invalid base64, oversized payloads) receive these new cases through the same channel.
  • Follow-up (not in this PR): add application/pdf to SUPPORTED_OFFLOAD_MIMES, introduce a read_attachment tool that resolves media://inbound/<id> via the existing src/media/pdf-extract.ts, and consider surfacing structured parser failures on the node-event path via a new back-channel. Each is an independent concern that deserves its own PR.

Risks and Mitigations

  • Risk: Webchat / Telegram / Slack / Matrix / iMessage / Discord users who were sending images to text-only models now get INVALID_REQUEST instead of an empty reply. This is the intentional behavior change the PR is named for.
    • Mitigation: The error message names the reason and points at #48123, so clients / adapters have a clean integration point. Channel adapters that want to silently absorb these errors can catch them and log or translate to a user-facing "this model can't see images" message. The pre-existing silent drop was hiding a configuration mismatch (text-only model selected, images attached) from operators.
  • Risk: Mixed-batch RPC clients sending PNG + PDF in one call previously had the PNG processed and the PDF dropped; now the entire call fails and the client must resubmit with only the PNG.
    • Mitigation: Documented in the mixed-batch test and the error message. Batch-level failure is symmetric with existing parser errors — invalid base64 or oversized payloads already fail the entire batch today. Clients that already handle those errors do not need new code.
  • Risk: The node-event agent.request path aborts cleanly on parse failure but still has no structured back-channel to the spawning node. A mixed-batch node-event request with a non-image attachment now drops the text content too (same as pre-PR for other parser errors).
    • Mitigation: This is a strict preservation of pre-existing abort semantics on that path for all parser errors (MediaOffloadError, invalid base64, oversized). Any user of the node-event ingress with failing attachments has always seen this shape of failure; extending the trigger set to non-image is acceptable interim state. A proper failure back-channel on the node-event path is a separate concern.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/gateway/chat-attachments.test.ts (modified, +108/-38)
  • src/gateway/chat-attachments.ts (modified, +52/-11)
  • src/gateway/server-methods/chat.directive-tags.test.ts (modified, +26/-4)
  • src/gateway/server-node-events.test.ts (modified, +49/-0)
  • src/gateway/server-node-events.ts (modified, +6/-0)

Code Example

User: read /path/to/document.docx
System: [Binary content: \x50\x4B\x03\x04... thousands of hex bytes ...]
Model: Attempts to process meaningless binary data

---

# Detect file type by extension or magic number
if file_type in ['.docx', '.xlsx', '.pdf', '.pptx']:
    # Extract text using appropriate libraries
    content = extract_text(file_path)
elif is_binary(file_content):
    content = f"[Binary file: {filename}, size: {size} bytes]"
else:
    content = read_text(file_path)

---

if file_size > MAX_CONTEXT_SIZE:
    content = summarize_or_chunk(file_path)
elif is_binary(file_path):
    content = "[Binary file - use appropriate tool]"

---

file_handlers:
  .docx: word-docx  # Use skill
  .xlsx: excel-xlsx
  .pdf: pdf-extract
  default: skip_binary
RAW_BUFFERClick to expand / collapse

Problem Description

When reading binary files like .docx, .xlsx, .pdf through the read tool, the raw binary content is passed directly to the model as context. This causes several issues:

  1. Token Waste: Binary data (e.g., "\x50\x4B\x03\x04...") consumes valuable context window tokens without providing meaningful information
  2. Model Confusion: The model attempts to "understand" binary sequences which have no semantic meaning
  3. Context Truncation: For large files (>1MB), binary content may occupy the entire context window, leaving no room for actual user queries
  4. Performance Impact: Processing unnecessary binary data slows down response generation

Current Behavior

User: read /path/to/document.docx
System: [Binary content: \x50\x4B\x03\x04... thousands of hex bytes ...]
Model: Attempts to process meaningless binary data

Proposed Solution

Add a preprocessing layer before file content enters the context:

Option A: File Type Detection + Text Extraction

# Detect file type by extension or magic number
if file_type in ['.docx', '.xlsx', '.pdf', '.pptx']:
    # Extract text using appropriate libraries
    content = extract_text(file_path)
elif is_binary(file_content):
    content = f"[Binary file: {filename}, size: {size} bytes]"
else:
    content = read_text(file_path)

Libraries to consider:

  • .docx: python-docx
  • .xlsx: openpyxl
  • .pdf: PyPDF2 or pdfplumber
  • .pptx: python-pptx

Option B: Size-based Filtering

if file_size > MAX_CONTEXT_SIZE:
    content = summarize_or_chunk(file_path)
elif is_binary(file_path):
    content = "[Binary file - use appropriate tool]"

Option C: User-configurable Handler

Allow users to configure file type handlers in settings:

file_handlers:
  .docx: word-docx  # Use skill
  .xlsx: excel-xlsx
  .pdf: pdf-extract
  default: skip_binary

Benefits

  1. Efficient Token Usage: Only meaningful text enters the context
  2. Better Model Performance: Model focuses on relevant content
  3. Cleaner UX: Users don't see binary garbage in context
  4. Extensibility: Easy to add support for new file types

Workaround (Current)

Users currently need to manually delete binary content from session files or explicitly tell the model to ignore it.

Additional Context

This issue was discovered when attempting to read Word document templates for training material generation. The binary content filled the context window and required manual cleanup.


Would you like me to submit a PR for this feature? I can implement Option A as a preprocessing middleware.

extent analysis

Fix Plan

To address the issue of binary files causing problems when read through the read tool, we will implement a preprocessing layer. We will use Option A: File Type Detection + Text Extraction.

Step-by-Step Solution:

  1. Install required libraries:
    • python-docx for .docx files
    • openpyxl for .xlsx files
    • PyPDF2 or pdfplumber for .pdf files
    • python-pptx for .pptx files
  2. Implement file type detection and text extraction:

from docx import Document from openpyxl import load_workbook from PyPDF2 import PdfReader from pptx import Presentation

def extract_text(file_path): file_type = file_path.split('.')[-1] if file_type == 'docx': doc = Document(file_path) text = '\n'.join([paragraph.text for paragraph in doc.paragraphs]) elif file_type == 'xlsx': wb = load_workbook(file_path) text = '\n'.join([cell.value for sheet in wb.worksheets for row in sheet.rows for cell in row]) elif file_type == 'pdf': pdf_reader = PdfReader(file_path) text = '\n'.join([page.extract_text() for page in pdf_reader.pages]) elif file_type == 'pptx': presentation = Presentation(file_path) text = '\n'.join([run.text for slide in presentation.slides for shape in slide.shapes if hasattr(shape, "text") for run in shape.text.frame.paragraphs]) else: # Handle other file types or raise an error pass return text

3. **Integrate the text extraction function into the read tool**:
```python
def read_file(file_path):
 if file_path.endswith(('.docx', '.xlsx', '.pdf', '.pptx')):
     content = extract_text(file_path)
 elif is_binary(file_path):
     content = f"[Binary file: {file_path.split('/')[-1]}, size: {os.path.getsize(file_path)} bytes]"
 else:
     content = read_text(file_path)
 return content

Verification

To verify that the fix worked, you can test the read tool with different file types, including binary files. The output should be the extracted text for supported file types and a message indicating the file is binary for unsupported file types.

Extra Tips

  • Make sure to handle exceptions and errors properly when working with files and external libraries.
  • Consider adding support for more file types and improving the text extraction functionality.
  • You can also implement Option B: Size-based Filtering or Option C: User-configurable Handler as additional features to further improve the read tool.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING