dify: Attachment-only hybrid retrieval is reranked as TEXT_QUERY instead of IMAGE_QUERY

Root Cause

In RetrievalService._retrieve (api/core/rag/datasource/retrieval_service.py), the HYBRID_SEARCH branch derives the rerank query_type like this:

query = query or attachment_id
if not query:
    return
all_documents_item = data_post_processor.invoke(
    ...,
    query_type=QueryType.TEXT_QUERY if query else QueryType.IMAGE_QUERY,
)

The ternary is evaluated after query = query or attachment_id. For an attachment-only retrieval the caller passes query=None and attachment_id=<upload_file_id>, so by the time the ternary runs, query holds the (truthy) upload-file id. The if not query: return guard also guarantees that query is truthy at that point. The IMAGE_QUERY branch is therefore unreachable, and attachment-only hybrid retrieval is always reranked as TEXT_QUERY.
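The unreachable branch can be reproduced in isolation. The following is a minimal sketch, not the Dify source: QueryType here is a local stand-in enum with assumed member values, and buggy_query_type is a hypothetical function that only mirrors the order of operations in the snippet above.

```python
from enum import Enum


class QueryType(Enum):
    # Local stand-in for Dify's QueryType; the member values are assumed.
    TEXT_QUERY = "text_query"
    IMAGE_QUERY = "image_query"


def buggy_query_type(query, attachment_id):
    """Hypothetical mirror of the HYBRID_SEARCH branch's order of operations."""
    query = query or attachment_id  # the attachment id replaces the empty query
    if not query:
        return None  # neither a text query nor an attachment was supplied
    # query is always truthy here, so IMAGE_QUERY can never be selected.
    return QueryType.TEXT_QUERY if query else QueryType.IMAGE_QUERY


text_only = buggy_query_type("what is RAG?", None)
attachment_only = buggy_query_type(None, "upload-file-id")

# Both calls resolve to TEXT_QUERY, including the attachment-only one.
assert text_only is QueryType.TEXT_QUERY
assert attachment_only is QueryType.TEXT_QUERY
```

No combination of arguments makes this function return IMAGE_QUERY, which matches the behaviour described in the report.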

Self-checks: searched existing issues, reproduced on latest main.

Impact

In RerankModelRunner.fetch_multimodal_rerank (core/rag/rerank/rerank_model.py), IMAGE_QUERY loads the upload file from storage and base64-encodes the image as the rerank query, whereas TEXT_QUERY passes the query string to the reranker as literal text. With this bug, an image/attachment-driven hybrid retrieval using a vision rerank model sends the raw upload-file-id string to the reranker as text instead of the image, producing incorrect rerank ordering.
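To make the difference concrete, here is a rough sketch of the two behaviours described above. It is not the code in rerank_model.py: build_rerank_query, the payload keys, and the in-memory fake_storage are all hypothetical, and QueryType is a local stand-in enum with assumed values.

```python
import base64
from enum import Enum
from typing import Callable


class QueryType(Enum):
    # Local stand-in for Dify's QueryType; the member values are assumed.
    TEXT_QUERY = "text_query"
    IMAGE_QUERY = "image_query"


def build_rerank_query(
    query: str, query_type: QueryType, load_file: Callable[[str], bytes]
) -> dict:
    """Hypothetical helper: shape the rerank query according to its type."""
    if query_type is QueryType.IMAGE_QUERY:
        # Treat the query as an upload-file id and send the encoded image.
        image_bytes = load_file(query)
        encoded = base64.b64encode(image_bytes).decode("ascii")
        return {"content_type": "image", "content": encoded}
    # Otherwise the string is passed through as literal text.
    return {"content_type": "text", "content": query}


# Fake storage keyed by upload-file id (placeholder bytes, not a real image).
fake_storage = {"upload-file-id": b"fake-image-bytes"}

as_text = build_rerank_query("upload-file-id", QueryType.TEXT_QUERY, fake_storage.__getitem__)
as_image = build_rerank_query("upload-file-id", QueryType.IMAGE_QUERY, fake_storage.__getitem__)

# With TEXT_QUERY the reranker only ever sees the id string itself.
assert as_text["content"] == "upload-file-id"
# With IMAGE_QUERY it receives the stored bytes, base64-encoded.
assert base64.b64decode(as_image["content"]) == b"fake-image-bytes"
```

Under these assumptions, the bug corresponds to always taking the text path, so the reranker scores documents against an opaque id rather than the image content.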

Sibling reference

The non-hybrid embedding path in the same file already distinguishes the two cases correctly: it passes query_type=QueryType.TEXT_QUERY for the text future and query_type=QueryType.IMAGE_QUERY for the attachment future.

Fix

Compute query_type from the original query before it is overwritten with attachment_id. A PR with a regression test follows.
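One way the fix described above could look is sketched here. This is an illustration under stated assumptions, not the actual patch: resolve_rerank_query is a hypothetical helper, QueryType is a local stand-in enum, and the real change would live inside the HYBRID_SEARCH branch of RetrievalService._retrieve.

```python
from enum import Enum
from typing import Optional, Tuple


class QueryType(Enum):
    # Local stand-in for Dify's QueryType; the member values are assumed.
    TEXT_QUERY = "text_query"
    IMAGE_QUERY = "image_query"


def resolve_rerank_query(
    query: Optional[str], attachment_id: Optional[str]
) -> Optional[Tuple[str, QueryType]]:
    """Hypothetical helper: decide the query type before the fallback."""
    # Derive the type from the ORIGINAL query, before it can be replaced.
    query_type = QueryType.TEXT_QUERY if query else QueryType.IMAGE_QUERY
    effective_query = query or attachment_id
    if not effective_query:
        return None  # nothing to retrieve with
    return effective_query, query_type


# A text query keeps TEXT_QUERY; an attachment-only call now gets IMAGE_QUERY.
assert resolve_rerank_query("what is RAG?", None) == ("what is RAG?", QueryType.TEXT_QUERY)
assert resolve_rerank_query(None, "upload-file-id") == ("upload-file-id", QueryType.IMAGE_QUERY)
assert resolve_rerank_query(None, None) is None
```

In this sketch a call that supplies both a text query and an attachment id is still treated as TEXT_QUERY, since the text takes precedence in the fallback; whether that matches the intended behaviour is for the PR to confirm.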
