crewai - ✅(Solved) Fix feat(valkey): metadata_filter fields not indexed in memory_index FT schema [1 pull requests, 1 participants]

GitHub stats
crewAIInc/crewAI#5794 · Fetched 2026-05-14 03:34:10
Comments: 0 · Participants: 1 · Timeline: 3 · Reactions: 0
Timeline (top): cross-referenced ×1, mentioned ×1, subscribed ×1

The metadata_filter clauses constructed in ValkeyStorage._vector_search() reference arbitrary metadata key fields (e.g. @key:value), but the memory_index FT schema only defines the following fields:

  • VectorField("embedding")
  • TagField("scope")
  • TagField("categories")
  • NumericField("created_at")
  • NumericField("importance")

As a result, metadata_filter cannot actually restrict FT.SEARCH results and will either error or silently return wrong results.

Affected file: lib/crewai/src/crewai/memory/storage/valkey_storage.py (around lines 513–519 and 1237–1245)
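
To make the mismatch concrete, here is a minimal sketch in plain Python. The indexed field names come from the schema listed above; `build_legacy_filter_clauses` and `unindexed_fields` are hypothetical helpers written for illustration, not functions from valkey_storage.py, and the tag-style clause shape `@key:{value}` is an assumption about how the filter was rendered.

```python
import re

# Fields the memory_index FT schema defines, per the list above.
INDEXED_FIELDS = {"embedding", "scope", "categories", "created_at", "importance"}


def build_legacy_filter_clauses(metadata_filter: dict) -> list[str]:
    """Hypothetical stand-in for the old behaviour: one clause per metadata key."""
    return [f"@{key}:{{{value}}}" for key, value in metadata_filter.items()]


def unindexed_fields(clauses: list[str]) -> set[str]:
    """Return field names referenced by the clauses that the schema does not define."""
    referenced = {m.group(1) for c in clauses if (m := re.match(r"@(\w+):", c))}
    return referenced - INDEXED_FIELDS


clauses = build_legacy_filter_clauses({"project": "alpha", "priority": 3})
print(clauses)                    # clauses that would be appended to the FT query
print(unindexed_fields(clauses))  # every metadata key is missing from the schema
```

Any user-defined key ends up in the second set, which is why such a clause has nothing in the index to match against.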

Fix Action

Fixed

PR fix notes

PR #5797: fix(valkey): post-filter metadata outside FT.SEARCH (#5794)

Description (problem / solution / changelog)

Summary

Fixes #5794. The memory_index FT schema only materializes embedding, scope, categories, created_at, and importance. ValkeyStorage._vector_search was emitting metadata_filter clauses as @{key}:{value} against that index, but metadata keys are user-defined and not part of the schema, so the FT.SEARCH server would either error out or silently return results that the metadata predicate failed to narrow.

This PR implements Option B from the issue: apply metadata_filter as a Python post-filter outside FT.SEARCH. Scope and category predicates remain pushed down into FT.SEARCH because they are valid index fields. The chosen approach avoids changing the index schema and works for arbitrary, dynamic metadata keys.

Key changes in lib/crewai/src/crewai/memory/storage/valkey_storage.py:

  • _vector_search no longer emits @<key>:{value} clauses for metadata_filter keys.
  • When metadata_filter is supplied, KNN is overfetched (limit * 10, capped at 1000) so the post-filter still returns the caller-requested number of hits in the common case. The final result is truncated back to limit.
  • New _matches_metadata_filter helper performs string-coerced equality so callers can pass numeric / boolean / string filter values interchangeably.
  • An empty metadata_filter={} is normalized to None (no overfetch, no post-filter work).
  • Docstrings on _vector_search, asearch, and search updated to reflect the new contract.
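
The changes listed above can be sketched roughly as follows. This is not the PR's code: the two constants use the names and values quoted in the PR description, but `fetch_size`, `matches_metadata_filter`, `post_filter`, and the `(metadata, score)` tuple shape are hypothetical, and the `max(limit, ...)` guard for limits above the cap is an assumption of this sketch.

```python
from __future__ import annotations

from typing import Any

# Names and values as quoted in the PR description.
_METADATA_POSTFILTER_OVERFETCH = 10
_METADATA_POSTFILTER_MAX_FETCH = 1000


def fetch_size(limit: int, metadata_filter: dict[str, Any] | None) -> int:
    """Number of KNN candidates to request from FT.SEARCH."""
    if not metadata_filter:  # None or {} means no overfetch
        return limit
    capped = min(limit * _METADATA_POSTFILTER_OVERFETCH, _METADATA_POSTFILTER_MAX_FETCH)
    return max(limit, capped)  # assumption: never fetch fewer than the caller asked for


def matches_metadata_filter(record_metadata: dict[str, Any], metadata_filter: dict[str, Any]) -> bool:
    """AND across keys; a missing key or a mismatched value drops the record."""
    for key, expected in metadata_filter.items():
        if key not in record_metadata:
            return False
        if str(record_metadata[key]) != str(expected):
            return False
    return True


def post_filter(candidates: list[tuple[dict[str, Any], float]],
                metadata_filter: dict[str, Any] | None,
                limit: int) -> list[tuple[dict[str, Any], float]]:
    """Filter candidates (already sorted by descending score), then truncate to limit."""
    if not metadata_filter:
        return candidates[:limit]
    kept = [c for c in candidates if matches_metadata_filter(c[0], metadata_filter)]
    return kept[:limit]


candidates = [
    ({"project": "alpha", "priority": 3}, 0.95),
    ({"project": "beta"}, 0.90),
    ({"priority": 3}, 0.85),
    ({"project": "alpha", "priority": "3"}, 0.80),
]
print(fetch_size(5, {"project": "alpha"}))
print(post_filter(candidates, {"project": "alpha", "priority": 3}, limit=5))
```

Because the filter runs over an already ranked list, the surviving records keep their score order and the final slice only shortens the list.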

Tests in lib/crewai/tests/memory/storage/test_valkey_storage_search.py:

  • New TestValkeyStorageMetadataPostFilter class — regression tests for #5794 covering: predicates are not pushed into the FT query, missing-key records are dropped, mismatched-value records are dropped, multi-key AND logic, numeric values, empty filter dict, overfetch preserves caller limit, truncation, and scope/categories pushdown is unaffected.
  • New TestValkeyStorageMatchesMetadataFilter class — unit tests for the new helper.
  • Three pre-existing tests that asserted the broken contract (metadata predicates appearing in the FT.SEARCH query) are updated to assert the new contract.

Dependency note

This PR builds on the unmerged #5703 (the PR that introduces valkey_storage.py). Both files changed here exist only in #5703, so the diff against main also includes the parent PR's contents. We recommend merging #5703 first and then rebasing this PR onto main; at that point the diff against main will collapse to only this fix.

Verification

Locally on python3.13 + uv sync --all-extras --dev:

uv run pytest lib/crewai/tests/memory/storage/ -q
# 199 passed, 202 warnings in 16.06s

uv run pytest lib/crewai/tests/memory/storage/test_valkey_storage_search.py -vv
# 41 passed in 25.77s

uv run ruff check lib/crewai/src/crewai/memory/storage/valkey_storage.py lib/crewai/tests/memory/storage/test_valkey_storage_search.py
# All checks passed!

uv run mypy lib/crewai/src/crewai/memory/storage/valkey_storage.py
# Success: no issues found in 1 source file

Review & Testing Checklist for Human

  • Confirm Option B (Python post-filter) is the preferred approach. The issue lists two remediation options, and the alternative (Option A: materialize metadata into the FT schema) requires defining a known key set upfront. If Option A is preferred, this PR's approach should be reverted.
  • Sanity-check the overfetch parameters: _METADATA_POSTFILTER_OVERFETCH = 10 and _METADATA_POSTFILTER_MAX_FETCH = 1000. These trade off correctness (more candidates → more likely to satisfy limit) vs. latency. For workloads where most records match the metadata predicate the defaults are likely fine; for highly selective predicates the overfetch may still not return limit results — that's an inherent post-filter limitation.
  • Run an end-to-end test against a real Valkey instance (Docker valkey/valkey-bundle:latest) to confirm the post-filter behaves correctly when records are returned from a live FT.SEARCH, not just mocks.
  • Confirm this PR should target main and is gated on #5703 merging first.
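
The overfetch trade-off in the checklist can be illustrated with a small deterministic simulation. Everything here is hypothetical: `hits_after_post_filter` is not part of the PR, and "every Nth record matches" is only a stand-in for predicate selectivity. The defaults 10 and 1000 are the values quoted above.

```python
def hits_after_post_filter(total: int, match_every: int, limit: int,
                           overfetch: int = 10, max_fetch: int = 1000) -> int:
    """Count how many hits survive the post-filter for a given selectivity.

    Records are ranked 0..total-1 by descending score, and record i satisfies
    the metadata predicate when i % match_every == 0.
    """
    fetch = min(limit * overfetch, max_fetch)
    candidates = range(min(fetch, total))
    matching = [i for i in candidates if i % match_every == 0]
    return len(matching[:limit])


# Half of the records match: the caller gets the full limit.
print(hits_after_post_filter(total=10_000, match_every=2, limit=5))
# One record in a hundred matches: only one of the 50 fetched candidates survives.
print(hits_after_post_filter(total=10_000, match_every=100, limit=5))
# A large limit hits the 1000-candidate cap before enough matches are seen.
print(hits_after_post_filter(total=10_000, match_every=100, limit=200))
```

In the last two cases more matching records exist in the store than are returned, which is the post-filter limitation the checklist item describes.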

Notes

  • The post-filter uses str(record_metadata[key]) != str(expected) so callers can pass numeric or boolean filter values without first coercing them to strings. This mirrors how the legacy code stringified values via f"@{key}:{{{str(value)}}}".
  • Truncation to limit happens after both the scope boundary check and the metadata post-filter, so records that pass all filters are returned in descending-score order.
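
A short sketch of what string-coerced equality does and does not treat as equal. `coerced_equal` is a hypothetical one-line helper that mirrors the `str(...)` comparison quoted in the first note; it is not the PR's `_matches_metadata_filter`.

```python
def coerced_equal(stored: object, expected: object) -> bool:
    """Compare two values by their str() forms."""
    return str(stored) == str(expected)


print(coerced_equal(3, "3"))        # True: int and str forms agree
print(coerced_equal(True, "True"))  # True: bool matches its Python string form
print(coerced_equal(True, "true"))  # False: the comparison is case-sensitive
print(coerced_equal(1, 1.0))        # False: "1" and "1.0" differ
```

The last two lines suggest that callers should pass filter values in the same form the records were stored in, since the comparison is on the string form rather than on numeric or boolean value.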

Link to Devin session: https://app.devin.ai/sessions/7edf70b38f1f4d9597b636d3fd0a31e5

Summary by CodeRabbit

  • New Features

    • Added Valkey cache backend support as an alternative to Redis for caching and memory storage.
    • New Valkey cache and storage implementations for improved performance and flexibility.
  • Improvements

    • Enhanced embedding validation and serialization in memory system.
    • Improved timeout handling for memory drainage operations.
    • Better cache configuration utilities.
  • Tests

    • Added comprehensive test coverage for Valkey implementations and cache functionality.

Changed files

  • lib/crewai-files/src/crewai_files/cache/upload_cache.py (modified, +163/-100)
  • lib/crewai/pyproject.toml (modified, +3/-0)
  • lib/crewai/src/crewai/a2a/utils/agent_card.py (modified, +20/-2)
  • lib/crewai/src/crewai/a2a/utils/task.py (modified, +94/-52)
  • lib/crewai/src/crewai/memory/encoding_flow.py (modified, +22/-1)
  • lib/crewai/src/crewai/memory/storage/valkey_cache.py (added, +198/-0)
  • lib/crewai/src/crewai/memory/storage/valkey_storage.py (added, +1967/-0)
  • lib/crewai/src/crewai/memory/types.py (modified, +65/-2)
  • lib/crewai/src/crewai/memory/unified_memory.py (modified, +64/-3)
  • lib/crewai/src/crewai/utilities/cache_config.py (added, +78/-0)
  • lib/crewai/tests/memory/storage/test_valkey_cache.py (added, +511/-0)
  • lib/crewai/tests/memory/storage/test_valkey_storage.py (added, +3074/-0)
  • lib/crewai/tests/memory/storage/test_valkey_storage_errors.py (added, +267/-0)
  • lib/crewai/tests/memory/storage/test_valkey_storage_scope.py (added, +1110/-0)
  • lib/crewai/tests/memory/storage/test_valkey_storage_search.py (added, +1426/-0)
  • lib/crewai/tests/memory/test_embedding_safety.py (added, +115/-0)
  • lib/crewai/tests/utilities/test_cache_config.py (added, +125/-0)
  • pyproject.toml (modified, +2/-0)
  • uv.lock (modified, +44/-2)
Remediation options

Option A — Materialize metadata into the FT index: Add the necessary metadata fields to the memory_index schema (e.g. as TagField or NumericField depending on type) and update the schema construction accordingly. This requires defining a known set of metadata keys upfront or dynamically managing the index schema.

Option B — Post-filter outside FT.SEARCH: Remove metadata clauses from the FT query in _vector_search() and instead apply metadata_filter as a post-filter over the FT.SEARCH results before returning final hits. This is simpler but may return more documents from Valkey than strictly necessary.
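
For comparison, Option A could look roughly like the sketch below, which derives extra index fields from a known key set. It deliberately avoids any client library: `FieldSpec`, `metadata_field_specs`, and the type-to-kind mapping are hypothetical, and the real schema construction in valkey_storage.py may differ.

```python
from __future__ import annotations

from dataclasses import dataclass

# Field names already present in the memory_index schema.
RESERVED_FIELDS = {"embedding", "scope", "categories", "created_at", "importance"}


@dataclass(frozen=True)
class FieldSpec:
    name: str
    kind: str  # "TAG" or "NUMERIC"


def metadata_field_specs(known_keys: dict[str, type]) -> list[FieldSpec]:
    """Map each known metadata key to an index field kind based on its Python type."""
    specs = []
    for name, py_type in sorted(known_keys.items()):
        if name in RESERVED_FIELDS:
            raise ValueError(f"metadata key {name!r} collides with an existing index field")
        # bool is a subclass of int in Python, so exclude it from the numeric branch.
        numeric = issubclass(py_type, (int, float)) and not issubclass(py_type, bool)
        specs.append(FieldSpec(name, "NUMERIC" if numeric else "TAG"))
    return specs


print(metadata_field_specs({"project": str, "priority": int, "archived": bool}))
```

The function needs the full key set and types before the index is created, which is the upfront requirement Option A describes and the reason arbitrary, dynamic keys are harder to support this way.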

Context

Raised during review of PR #5703 (ValkeyStorage vector memory backend, part 4/4). Deferred to a follow-up PR so maintainers can decide the preferred approach, given the broader schema implications.
