dify - ✅(Solved) Fix metadata_filtering_conditions in retrieval_model is silently ignored for /v1/datasets/{dataset_id}/hit-testing and /v1/datasets/{dataset_id}/retrieve API endpoints [1 pull requests, 2 comments, 2 participants]

GitHub stats
langgenius/dify#35666 · Fetched 2026-04-30 06:45:55
Comments: 2 · Participants: 2 · Timeline: 4 · Reactions: 1
Timeline (top): commented ×2, cross-referenced ×1, labeled ×1

Fix Action

Fixed

PR fix notes

PR #35700: fix(api): preserve dataset metadata filters

Description (problem / solution / changelog)

Summary

Fixes #35666.

The Service API hit-testing/retrieve endpoints already pass retrieval_model into HitTestingService.retrieve, and that service already knows how to apply metadata_filtering_conditions when present. However, the request schema model for retrieval_model did not include metadata_filtering_conditions, so Pydantic validation/model dumping dropped the field before it reached the retrieval layer.

This adds metadata_filtering_conditions to the shared RetrievalModel schema and adds a service API regression test proving the filter survives request parsing and is forwarded to HitTestingService.retrieve.
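
The failure mode described above can be illustrated with a minimal, stdlib-only sketch. It is not Dify's code: `parse`, `RetrievalModelBefore`, and `RetrievalModelAfter` are hypothetical names, and the dataclass-based parser merely stands in for a schema layer that keeps declared fields and silently ignores everything else.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Optional


def parse(schema: type, payload: dict[str, Any]):
    """Keep only the keys the schema declares; silently ignore the rest."""
    declared = {f.name for f in fields(schema)}
    return schema(**{k: v for k, v in payload.items() if k in declared})


@dataclass
class RetrievalModelBefore:
    # Hypothetical pre-fix schema: metadata_filtering_conditions is not declared.
    search_method: str
    reranking_enable: bool
    top_k: int
    score_threshold_enabled: bool


@dataclass
class RetrievalModelAfter(RetrievalModelBefore):
    # Hypothetical post-fix schema: the filter is a declared, optional field.
    metadata_filtering_conditions: Optional[dict[str, Any]] = None


payload = {
    "search_method": "semantic_search",
    "reranking_enable": False,
    "top_k": 4,
    "score_threshold_enabled": False,
    "metadata_filtering_conditions": {
        "logical_operator": "and",
        "conditions": [
            {"name": "category", "comparison_operator": "is", "value": "finance"}
        ],
    },
}

# Round-trip the payload through each schema, as validation plus dumping would.
before = asdict(parse(RetrievalModelBefore, payload))
after = asdict(parse(RetrievalModelAfter, payload))

print("metadata_filtering_conditions" in before)  # filter lost before retrieval
print("metadata_filtering_conditions" in after)   # filter survives parsing
```

Declaring the field as optional keeps requests without a filter valid while letting requests with a filter pass it through to the retrieval layer.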

Why this matters

Requests like this should filter by metadata:

{
  "query": "some query",
  "retrieval_model": {
    "search_method": "semantic_search",
    "reranking_enable": false,
    "top_k": 4,
    "score_threshold_enabled": false,
    "metadata_filtering_conditions": {
      "logical_operator": "and",
      "conditions": [
        {"name": "category", "comparison_operator": "is", "value": "finance"}
      ]
    }
  }
}

Before this change, the metadata filter was silently removed during payload validation.

Validation

  • python3 -m py_compile api/services/entities/knowledge_entities/knowledge_entities.py api/tests/unit_tests/controllers/service_api/dataset/test_hit_testing.py
  • git diff --check

I also attempted the focused pytest command:

  • UV_CACHE_DIR=/tmp/uv-cache uv run --frozen pytest tests/unit_tests/controllers/service_api/dataset/test_hit_testing.py -q

but the local runner could not complete dependency setup because this workspace ran out of disk while uv was downloading/building Dify's full API dependency graph.

Changed files

  • api/services/entities/knowledge_entities/knowledge_entities.py (modified, +2/-0)
  • api/tests/unit_tests/controllers/service_api/dataset/test_hit_testing.py (modified, +56/-0)


Self Checks

  • I have read the Contributing Guide and Language Policy.
  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report, otherwise it will be closed.
  • [Chinese & non-English users] Please submit in English, otherwise the issue will be closed :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

1.13.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Description

When calling the Service API endpoints for dataset retrieval/hit-testing and passing metadata_filtering_conditions inside retrieval_model, the metadata filter is silently ignored and all chunks are returned without any filtering applied.

This worked in v1.8.0 but is broken in 1.13.3.

Affected Endpoints

  • POST /v1/datasets/<dataset_id>/hit-testing
  • POST /v1/datasets/<dataset_id>/retrieve

Steps to Reproduce

  1. Create a dataset with documents that have metadata fields set.
  2. Call the hit-testing endpoint with a metadata_filtering_conditions in retrieval_model:
{
  "query": "some query",
  "retrieval_model": {
    "search_method": "semantic_search",
    "reranking_enable": false,
    "top_k": 4,
    "score_threshold_enabled": false,
    "metadata_filtering_conditions": {
      "logical_operator": "and",
      "conditions": [
        {
          "name": "category",
          "comparison_operator": "is",
          "value": "finance"
        }
      ]
    }
  }
}
  3. Observe that results are returned without the metadata filter applied: documents that do not match the condition are still returned.
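
As a rough sketch, the request from step 2 could be assembled in Python as follows. The base URL, dataset ID, and API key are placeholders, the helper name `build_hit_testing_request` is hypothetical, and the `Authorization: Bearer` header is an assumption about how the Service API authenticates. The request is only built here, not sent.

```python
import json
import urllib.request


def build_hit_testing_request(base_url, dataset_id, api_key, query, conditions):
    """Build (but do not send) a hit-testing request carrying a metadata filter."""
    body = {
        "query": query,
        "retrieval_model": {
            "search_method": "semantic_search",
            "reranking_enable": False,
            "top_k": 4,
            "score_threshold_enabled": False,
            "metadata_filtering_conditions": conditions,
        },
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/datasets/{dataset_id}/hit-testing",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_hit_testing_request(
    base_url="http://localhost",   # placeholder host
    dataset_id="dataset-123",      # placeholder dataset ID
    api_key="YOUR_API_KEY",        # placeholder key
    query="some query",
    conditions={
        "logical_operator": "and",
        "conditions": [
            {"name": "category", "comparison_operator": "is", "value": "finance"}
        ],
    },
)
print(request.get_method(), request.full_url)
```

Sending it against a real instance would be a matter of calling `urllib.request.urlopen(request)` and inspecting the returned records.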

✔️ Expected Behavior

Only chunks/documents matching the metadata_filtering_conditions should be returned.

❌ Actual Behavior

The metadata_filtering_conditions is silently dropped and has no effect. All chunks matching the query are returned regardless of metadata values.
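
To check a response against the expected behavior, or to post-filter results on the client while the server ignores the filter, a small evaluator along these lines may help. It is a sketch under assumptions: the function name `matches` is hypothetical, only the `is` operator from the example request is handled, `or` is assumed to be the other valid logical operator, and the sample chunks are made-up stand-ins rather than the real response shape.

```python
from typing import Any


def matches(metadata: dict[str, Any], filtering: dict[str, Any]) -> bool:
    """Return True if a chunk's metadata satisfies the filtering conditions."""
    results = []
    for cond in filtering["conditions"]:
        if cond["comparison_operator"] != "is":
            # Only the operator shown in the issue is supported in this sketch.
            raise ValueError(f"unsupported operator: {cond['comparison_operator']}")
        results.append(metadata.get(cond["name"]) == cond["value"])

    operator = filtering["logical_operator"]
    if operator == "and":
        return all(results)
    if operator == "or":  # assumed counterpart to "and"
        return any(results)
    raise ValueError(f"unsupported logical operator: {operator}")


filtering = {
    "logical_operator": "and",
    "conditions": [
        {"name": "category", "comparison_operator": "is", "value": "finance"}
    ],
}

# Made-up chunks; a real response would need to be mapped into this shape.
chunks = [
    {"content": "Q3 revenue summary", "metadata": {"category": "finance"}},
    {"content": "Onboarding checklist", "metadata": {"category": "hr"}},
    {"content": "Untagged note", "metadata": {}},
]

kept = [c for c in chunks if matches(c["metadata"], filtering)]
print([c["content"] for c in kept])
```

If every returned chunk passes `matches`, the server applied the filter; if some fail, the filter was dropped as described in this issue.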

extent analysis

TL;DR

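The request schema for retrieval_model did not declare metadata_filtering_conditions, so the field was dropped during Pydantic validation before it reached the retrieval layer. PR #35700 fixes this by adding the field to the shared RetrievalModel schema.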

Guidance

  • Review the changes made to the metadata filtering logic between versions v1.8.0 and 1.13.3 to identify the cause of the regression.
  • Verify that the metadata_filtering_conditions are being correctly parsed and applied in the retrieval model for the affected endpoints.
  • Check the documentation for any changes in the expected format or usage of metadata_filtering_conditions in the retrieval_model.
  • Test the filtering with different logical operators and conditions to see if the issue is specific to certain scenarios.

Example

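No code snippet is provided here. According to the PR fix notes, the change is a two-line addition to api/services/entities/knowledge_entities/knowledge_entities.py plus a regression test in api/tests/unit_tests/controllers/service_api/dataset/test_hit_testing.py.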

Notes

The report comes from a Self Hosted (Docker) deployment, and the issue does not confirm whether Cloud environments are affected. The root cause identified in PR #35700 lies in the shared request schema rather than in deployment configuration, so the problem is unlikely to be specific to Docker.

Recommendation

Apply the fix from PR #35700, or upgrade to a release that includes it. Until then, a possible workaround is to downgrade to v1.8.0, where the reporter states the filter still worked.
