dify - ✅(Solved) Fix web scraper sometime not work [1 pull requests, 1 participants]

GitHub stats
langgenius/dify#35449 · Fetched 2026-04-22 08:04:10
Comments: 0
Participants: 1
Timeline: 4
Reactions: 1
Timeline (top): labeled ×2, closed ×1, cross-referenced ×1

Fix Action

Fixed

PR fix notes

PR #35450: fix: webscaper sometime not work

Description (problem / solution / changelog)

[!IMPORTANT]

  1. Make sure you have read our contribution guidelines
  2. Ensure there is an associated issue and you have been assigned to it
  3. Use the correct syntax to link this PR: Fixes #<issue number>.

Summary

Fixes https://github.com/langgenius/dify/issues/35449

ReadabiliPy tries to use Node.js to parse the HTML (the traceback below shows its `have_node()` check running `npm install`), and that step fails with this error. For more details, see https://github.com/alan-turing-institute/ReadabiliPy

Reproducing the error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Users/hejl/projects/dify/api/.venv/lib/python3.12/site-packages/readabilipy/simple_json.py", line 41, in simple_json_from_html_string
    if use_readability and not have_node():
                               ^^^^^^^^^^^
  File "/Users/hejl/projects/dify/api/.venv/lib/python3.12/site-packages/readabilipy/simple_json.py", line 36, in have_node
    run_npm_install()
  File "/Users/hejl/projects/dify/api/.venv/lib/python3.12/site-packages/readabilipy/utils.py", line 60, in run_npm_install
    cp = subprocess.run(["npm", "install"], check=True)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hejl/.local/share/uv/python/cpython-3.12.12-macos-aarch64-none/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['npm', 'install']' returned non-zero exit status 1.
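
The last frame of the traceback is a `subprocess.run(..., check=True)` call. As a minimal sketch of that failure mode, the snippet below uses a stand-in command (the current Python interpreter exiting with status 1) rather than `npm install`; the helper name `run_checked` and the two command constants are hypothetical and exist only for this illustration.

```python
import subprocess
import sys

# Stand-in commands (assumptions for illustration, not the real npm call):
# one exits with status 1, the other with status 0.
FAILING_CMD = [sys.executable, "-c", "import sys; sys.exit(1)"]
OK_CMD = [sys.executable, "-c", "pass"]


def run_checked(cmd):
    """Return True if cmd exits with status 0, False otherwise.

    With check=True, subprocess.run raises CalledProcessError on any
    non-zero exit status, which is the exception seen in the traceback.
    """
    try:
        subprocess.run(cmd, check=True, capture_output=True)
        return True
    except subprocess.CalledProcessError as exc:
        print(f"command {exc.cmd!r} exited with status {exc.returncode}")
        return False
    except FileNotFoundError:
        # The executable itself is missing from PATH.
        return False


if __name__ == "__main__":
    print(run_checked(FAILING_CMD))
    print(run_checked(OK_CMD))
```

A caller that does not catch `CalledProcessError` sees the exception propagate, which matches the uncaught error in the traceback above.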

Screenshots

Before | After
... | ...

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran make lint && make type-check (backend) and cd web && pnpm exec vp staged (frontend) to appease the lint gods

Changed files

  • api/core/tools/utils/web_reader_tool.py (modified, +1/-1)
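
The one-line change itself is not reproduced on this page. As a rough sketch of one defensive pattern for this kind of failure (not necessarily what the PR does), a caller could try a Node.js-backed extractor first and fall back to a pure-Python one when the subprocess step fails. The names `extract_with_fallback`, `primary`, and `fallback` are hypothetical.

```python
import subprocess
from typing import Callable

# An extractor takes an HTML string and returns a dict of article fields.
Extractor = Callable[[str], dict]


def extract_with_fallback(html: str, primary: Extractor, fallback: Extractor) -> dict:
    """Try the primary extractor; use the fallback if its subprocess step fails."""
    try:
        return primary(html)
    except (subprocess.CalledProcessError, OSError):
        # CalledProcessError: e.g. `npm install` returned a non-zero status.
        # OSError (including FileNotFoundError): e.g. the executable is missing.
        return fallback(html)


# With ReadabiliPy installed, the two extractors might be wired like this
# (left commented out so the sketch runs without third-party packages):
#
# from readabilipy import simple_json_from_html_string
# primary = lambda h: simple_json_from_html_string(h, use_readability=True)
# fallback = lambda h: simple_json_from_html_string(h, use_readability=False)
```

Passing the extractors in as arguments keeps the wrapper independent of any particular library and makes the fallback path easy to exercise with stubs.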
Original issue (langgenius/dify#35449)

Self Checks

  • I have read the Contributing Guide and Language Policy.
  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report, otherwise it will be closed.
  • [Chinese & Non-English Users] Please submit in English; otherwise the issue will be closed :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

1.13.3

Cloud or Self Hosted

Cloud

Steps to reproduce

[Screenshot: https://github.com/user-attachments/assets/f257f64c-9138-4104-8335-74464808d179]

The built-in web scraper tool consistently fails on the cloud platform.

✔️ Expected Behavior

Users can use the default built-in tool.

❌ Actual Behavior

No response

extent analysis

TL;DR

The built-in web scraper tool was reported as not working on the cloud platform in Dify 1.13.3. According to PR #35450, the cause is ReadabiliPy trying to use Node.js to parse the HTML, with its `npm install` step returning a non-zero exit status.

Guidance

  • Verify that the web scraper tool works as expected in self-hosted environments to isolate if the issue is cloud-specific.
  • Check the Dify documentation and release notes for version 1.13.3 to see if there are any known issues or updates related to the web scraper tool.
  • Test the web scraper tool with different inputs or configurations to see if the issue is consistent or if there are specific scenarios where it fails.
  • Consider reaching out to the Dify support or community forums for further assistance, as the issue might be specific to the cloud platform or require more detailed troubleshooting.
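
To help isolate whether a failure is environment-specific, one quick check is whether the Node.js toolchain is reachable at all. The following is a minimal sketch using only the standard library; it is a simplified stand-in, not ReadabiliPy's own `have_node()` logic, and the function name `toolchain_status` is hypothetical.

```python
import shutil


def toolchain_status(tools=("node", "npm")):
    """Map each tool name to its resolved path on PATH, or None if absent."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in toolchain_status().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

Running this in both the failing and a working environment shows whether they differ in Node.js availability. A tool being present on PATH does not guarantee that `npm install` will succeed there.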

Example

No specific code example can be provided without more details on the web scraper tool's implementation or the exact error messages received.

Notes

The issue report itself does not include error messages or details about the web scraper tool's configuration, which would be needed for a more accurate diagnosis from the report alone. It describes the failure as occurring on the cloud platform, which might point to an environment or compatibility problem rather than a straightforward bug in the Dify version.

Recommendation

Apply workaround: Given the lack of detailed error messages or configuration details, and without a clear indication of a fixed version, the best course of action is to apply workarounds such as testing different configurations or seeking community support for potential temporary fixes until a more definitive solution can be found.
