dify - ✅(Solved) Fix [Refactor/Chore] Centralize remote file retrieval for signed file URLs [2 pull requests, 1 participant]

GitHub stats
langgenius/dify#36397 · Fetched 2026-05-20 04:00:13
Comments: 0 · Participants: 1 · Timeline: 4 · Reactions: 1
Timeline (top): cross-referenced ×3, assigned ×1

Root Cause

Several backend paths fetch URL-backed inputs as remote files through the generic outbound HTTP client. This makes first-party Dify signed file URLs follow the same network path as arbitrary external URLs, even when those URLs point back to files already stored by the current Dify deployment.

The issue proposes introducing a single remote-file retrieval boundary that:

  • handles remote file GET/HEAD operations in one module
  • recognizes valid Dify-signed file URLs for upload, tool, and datasource files
  • loads first-party files through database records and configured storage
  • delegates ordinary external URLs to the existing SSRF-protected HTTP client
  • documents when callers should use the remote-file fetcher versus the generic SSRF proxy
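
The proposed boundary can be sketched as a small dispatcher. This is a minimal illustration, not the code from the PR: the class name, the `/files/<kind>/<file_id>` path shape, and the injected `verify`, `load_from_storage`, and `http_get` callables are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical path segments for the three first-party file kinds.
FIRST_PARTY_KINDS = {"upload", "tool", "datasource"}


@dataclass
class FetchedFile:
    content: bytes
    source: str  # "storage" for first-party files, "http" for external URLs


class RemoteFileFetcher:
    """Hypothetical single entry point for remote-file GET requests."""

    def __init__(
        self,
        verify: Callable[[str, str, dict], bool],
        load_from_storage: Callable[[str, str], bytes],
        http_get: Callable[[str], bytes],
    ) -> None:
        self._verify = verify                        # signature check (assumed)
        self._load_from_storage = load_from_storage  # DB record + storage read (assumed)
        self._http_get = http_get                    # SSRF-protected client (assumed)

    def _resolve_first_party(self, url: str) -> Optional[tuple[str, str]]:
        parsed = urlparse(url)
        parts = [p for p in parsed.path.split("/") if p]
        # Assumed path shape: /files/<kind>/<file_id>
        if len(parts) != 3 or parts[0] != "files" or parts[1] not in FIRST_PARTY_KINDS:
            return None
        kind, file_id = parts[1], parts[2]
        query = {key: values[0] for key, values in parse_qs(parsed.query).items()}
        if not self._verify(kind, file_id, query):
            return None
        return kind, file_id

    def get(self, url: str) -> FetchedFile:
        resolved = self._resolve_first_party(url)
        if resolved is not None:
            kind, file_id = resolved
            return FetchedFile(self._load_from_storage(kind, file_id), "storage")
        # Anything not recognized as a valid signed URL is treated as external.
        return FetchedFile(self._http_get(url), "http")
```

In this sketch a URL with a missing or invalid signature is never trusted as first-party; it falls through to the external path, so it still passes through whatever SSRF protection the injected HTTP client applies.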

Fix Action

Fixed

PR fix notes

PR #36399: fix(api): centralize remote file retrieval

Description (problem / solution / changelog)

Summary

Fixes #36397.

This PR centralizes backend remote-file fetching behind core.file.remote_fetcher. Remote-file GET/HEAD calls now detect valid Dify-signed URLs for upload, tool, and datasource files and read them from database records and configured storage, while ordinary external file URLs continue through the existing SSRF-protected HTTP client.

It also routes existing remote-file call sites through the new boundary, removes unused ToolFileManager.verify_file() and DatasourceFileManager.verify_file() helpers, and documents when callers should use remote_fetcher versus ssrf_proxy.
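
The "detect valid Dify-signed URLs" step implies a signature check before any storage read. The sketch below shows one common way to do that with an HMAC over the file identity plus a timestamp and nonce. The message layout, the 300-second validity window, and the function names are assumptions for illustration and are not taken from core.file.remote_fetcher.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

MAX_AGE_SECONDS = 300  # assumed validity window, not a documented Dify value


def sign(secret: bytes, kind: str, file_id: str, timestamp: str, nonce: str) -> str:
    """Return a URL-safe HMAC-SHA256 signature (hypothetical message layout)."""
    message = f"{kind}|{file_id}|{timestamp}|{nonce}".encode()
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode()


def is_valid(
    secret: bytes,
    kind: str,
    file_id: str,
    timestamp: str,
    nonce: str,
    signature: str,
    now: Optional[float] = None,
) -> bool:
    """Check the signature in constant time, then check that it has not expired."""
    expected = sign(secret, kind, file_id, timestamp, nonce)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return False
    try:
        issued = int(timestamp)
    except ValueError:
        return False
    current = time.time() if now is None else now
    return current - issued <= MAX_AGE_SECONDS
```

Keeping this check in one place is the point of the refactor described above: call sites no longer need their own copies of the verification logic.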

Screenshots

N/A. Backend-only change.

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran make lint && make type-check (backend) and cd web && pnpm exec vp staged (frontend) to appease the lint gods

Changed files

  • api/controllers/console/remote_files.py (modified, +6/-6)
  • api/controllers/web/remote_files.py (modified, +6/-6)
  • api/core/app/workflow/file_runtime.py (modified, +2/-2)
  • api/core/datasource/datasource_file_manager.py (modified, +2/-22)
  • api/core/file/__init__.py (added, +5/-0)
  • api/core/file/remote_fetcher.py (added, +371/-0)
  • api/core/helper/download.py (modified, +2/-2)
  • api/core/helper/ssrf_proxy.py (modified, +10/-2)
  • api/core/rag/extractor/extract_processor.py (modified, +2/-2)
  • api/core/rag/extractor/word_extractor.py (modified, +4/-4)
  • api/core/rag/index_processor/index_processor_base.py (modified, +2/-2)
  • api/core/tools/tool_file_manager.py (modified, +2/-22)
  • api/core/tools/utils/web_reader_tool.py (modified, +4/-4)
  • api/core/workflow/node_factory.py (modified, +5/-3)
  • api/factories/file_factory/remote.py (modified, +2/-2)
  • api/services/app_dsl_service.py (modified, +2/-2)
  • api/services/rag_pipeline/rag_pipeline_dsl_service.py (modified, +2/-2)
  • api/tests/test_containers_integration_tests/services/test_app_dsl_service.py (modified, +5/-5)
  • api/tests/unit_tests/controllers/console/test_remote_files.py (modified, +29/-17)
  • api/tests/unit_tests/controllers/web/test_remote_files.py (modified, +5/-5)
  • api/tests/unit_tests/core/app/workflow/test_file_runtime.py (modified, +5/-1)
  • api/tests/unit_tests/core/datasource/test_datasource_file_manager.py (modified, +4/-35)
  • api/tests/unit_tests/core/file/test_remote_fetcher.py (added, +605/-0)
  • api/tests/unit_tests/core/helper/test_download.py (modified, +4/-4)
  • api/tests/unit_tests/core/rag/extractor/test_extract_processor.py (modified, +1/-1)
  • api/tests/unit_tests/core/rag/extractor/test_word_extractor.py (modified, +5/-5)
  • api/tests/unit_tests/core/rag/indexing/test_index_processor_base.py (modified, +7/-7)
  • api/tests/unit_tests/core/tools/test_tool_file_manager.py (modified, +6/-35)
  • api/tests/unit_tests/core/tools/utils/test_web_reader_tool.py (modified, +12/-12)
  • api/tests/unit_tests/core/workflow/test_node_factory.py (modified, +4/-3)
  • api/tests/unit_tests/core/workflow/test_workflow_entry.py (modified, +2/-2)
  • api/tests/unit_tests/factories/test_build_from_mapping.py (modified, +1/-1)
  • api/tests/unit_tests/factories/test_file_factory.py (modified, +1/-1)
  • api/tests/unit_tests/services/rag_pipeline/test_rag_pipeline_dsl_service.py (modified, +6/-3)

PR #36332: fix(docker): harden default SSRF proxy egress

Description (problem / solution / changelog)

Summary

Closes #36400.

  • Harden the Docker Compose SSRF proxy defaults by denying loopback, private, CGN, link-local, ULA, multicast, reserved, and cloud metadata target networks before general egress is allowed.
  • Stop relying on the Ubuntu Squid image's packaged conf.d access rules so allow localnet cannot bypass Dify's policy.
  • Preserve the sandbox reverse proxy path and add explicit SSRF_PROXY_ALLOW_PRIVATE_IPS / SSRF_PROXY_ALLOW_PRIVATE_DOMAINS escape hatches for trusted private-network deployments.

Note: this PR should remain draft until #36397 is resolved, so Dify-owned signed file URLs can be handled through internal file retrieval instead of the SSRF proxy.
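
The deny-before-allow policy in the summary above can be illustrated outside Squid with Python's standard `ipaddress` module. This is only a sketch of the classification idea: the network list is an assumption based on the categories named in the summary, not a copy of the rules in squid.conf.template, and the escape-hatch settings are not modeled.

```python
import ipaddress

# Assumed deny list covering the categories named in the PR summary.
DENIED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8", "::1/128",                          # loopback
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",   # private
        "100.64.0.0/10",                                   # carrier-grade NAT (CGN)
        "169.254.0.0/16", "fe80::/10",                     # link-local, incl. 169.254.169.254 metadata
        "fc00::/7",                                        # unique local addresses (ULA)
        "224.0.0.0/4", "ff00::/8",                         # multicast
        "240.0.0.0/4",                                     # reserved
    )
]


def is_denied(address: str) -> bool:
    """Return True if the target address falls in any denied network."""
    ip = ipaddress.ip_address(address)
    # Membership tests across IP versions are simply False, so one list works.
    return any(ip in network for network in DENIED_NETWORKS)


def allow_egress(address: str) -> bool:
    """Deny rules are evaluated first; only then is general egress allowed."""
    return not is_denied(address)
```

With this ordering, a first-party file URL that resolves to a private or loopback address would be blocked by the proxy, which is why the note above ties this PR to the internal file retrieval work in #36397.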

Screenshots

N/A

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've updated the documentation accordingly.

Changed files

  • docker/.env.example (modified, +1/-0)
  • docker/docker-compose-template.yaml (modified, +2/-0)
  • docker/docker-compose.middleware.yaml (modified, +2/-0)
  • docker/docker-compose.yaml (modified, +2/-0)
  • docker/envs/infrastructure/ssrf-proxy.env.example (modified, +2/-0)
  • docker/envs/middleware.env.example (modified, +3/-1)
  • docker/ssrf_proxy/docker-entrypoint.sh (modified, +28/-0)
  • docker/ssrf_proxy/squid.conf.template (modified, +31/-16)
  • docker/ssrf_proxy/test_ssrf_proxy_config.sh (added, +143/-0)

Raw issue body

Self Checks

  • I have read the Contributing Guide and Language Policy.
  • This is only for refactors or chores; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report, otherwise it will be closed.
  • Please do not modify this template :) and fill in all the required fields.

Motivation

Self-hosted deployments commonly configure file URLs that point back to the Dify service or to local network addresses. Treating those first-party file URLs as ordinary outbound HTTP can make remote-file workflows depend on network loopback behavior and duplicate signed-file validation logic across call sites.

Centralizing the behavior keeps remote file handling consistent while preserving the existing SSRF-protected path for true external HTTP requests.

Additional Context

Related historical reports include #30589 and #30780, but this proposal focuses on the backend remote-file retrieval boundary rather than a single workflow surface.
