LlamaIndex - ✅ (Solved) [Bug]: S3 Vector Store: Filterable metadata must have at most 2048 bytes [1 pull request, 6 comments, 2 participants]

GitHub stats
run-llama/llama_index#21062 (fetched 2026-04-08 00:57:21)
Comments: 6 · Participants: 2 · Timeline events: 15 · Reactions: 0
Timeline (top): commented ×5, mentioned ×3, subscribed ×3, labeled ×2

Error Message

botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the PutVectors operation: Invalid record for key '0d0cf068-94a4-4349-9ffc-96b166bd30dc': Filterable metadata must have at most 2048 bytes

PR fix notes

PR #21279: fix(s3-vector-store): warn when _node_content exceeds S3 filterable metadata limit

Description (problem / solution / changelog)

Fixes #21062

Problem

When users create an S3 Vectors index manually (e.g., via the AWS console) and use S3VectorStore to insert nodes, they receive:

ValidationException: Filterable metadata must have at most 2048 bytes

The root cause is that node_to_metadata_dict() always adds a _node_content field containing the serialized JSON of the entire node, which is typically much larger than 2048 bytes. AWS S3 Vectors enforces this limit only on filterable metadata fields, so the field must be configured as non-filterable in the index.

create_index_from_bucket() already handles this correctly by adding _node_content and _node_type to metadataConfiguration.nonFilterableMetadataKeys. However, users creating their index by other means had no guidance on this requirement.
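For readers who create the index themselves, the configuration described above can be sketched as follows. This is a minimal sketch, not the code used by create_index_from_bucket(): it assumes the boto3 "s3vectors" client and its create_index parameter names, and the helper names (build_create_index_request, create_index), bucket name, index name, and dimension are hypothetical placeholders.

```python
# Keys that S3VectorStore writes alongside user metadata (named in the PR description).
NON_FILTERABLE_KEYS = ["_node_content", "_node_type"]


def build_create_index_request(bucket_name, index_name, dimension,
                               distance_metric="cosine", data_type="float32"):
    """Build keyword arguments for an S3 Vectors CreateIndex call (assumed parameter names)."""
    return {
        "vectorBucketName": bucket_name,
        "indexName": index_name,
        "dataType": data_type,
        "dimension": dimension,
        "distanceMetric": distance_metric,
        # Mark the LlamaIndex bookkeeping fields as non-filterable so they are
        # not counted against the 2048-byte filterable metadata limit.
        "metadataConfiguration": {
            "nonFilterableMetadataKeys": list(NON_FILTERABLE_KEYS),
        },
    }


def create_index(bucket_name, index_name, dimension):
    """Send the request. Needs boto3 with S3 Vectors support and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    client = boto3.client("s3vectors")
    request = build_create_index_request(bucket_name, index_name, dimension)
    return client.create_index(**request)
```

Keeping the request builder separate from the network call makes it easy to inspect the payload before anything is sent to AWS.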

Solution

  • Add a clear note to the class docstring explaining that _node_content and _node_type must be configured as non-filterable keys, with a recommendation to use create_index_from_bucket().
  • In add(), emit a logger.warning (once per call) when _node_content exceeds 2048 bytes, pointing users to the correct fix before the AWS API call fails.
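The warning behaviour described in the second bullet could look roughly like the sketch below. It is an illustration only and not the code merged in PR #21279: the function name, the constant name, and the message text are hypothetical, and it operates on plain metadata dictionaries rather than on the store's internal objects.

```python
import logging

logger = logging.getLogger(__name__)

FILTERABLE_METADATA_LIMIT_BYTES = 2048  # limit quoted in the ValidationException


def warn_if_node_content_too_large(metadata_dicts):
    """Log at most one warning if any record's _node_content is over the limit.

    Returns True when a warning was logged, False otherwise.
    """
    for metadata in metadata_dicts:
        content = metadata.get("_node_content", "")
        size = len(str(content).encode("utf-8"))
        if size > FILTERABLE_METADATA_LIMIT_BYTES:
            logger.warning(
                "_node_content is %d bytes, above the %d-byte filterable metadata "
                "limit. Configure _node_content and _node_type as non-filterable "
                "keys on the index, or create it with create_index_from_bucket().",
                size,
                FILTERABLE_METADATA_LIMIT_BYTES,
            )
            return True  # warn once per call, not once per record
    return False
```

Returning after the first oversized record keeps the log readable when a whole batch is affected.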

Testing

Existing tests continue to pass (they use create_index_from_bucket(), which already configures the index correctly). The new warning path is triggered only when _node_content exceeds 2048 bytes in the metadata and the index is not configured with non-filterable keys.

Changed files

  • llama-index-integrations/vector_stores/llama-index-vector-stores-s3/llama_index/vector_stores/s3/base.py (modified, +31/-0)


Bug Description

Despite manually filtering out the metadata (and confirming the metadata size), I am still getting the error:

botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the PutVectors operation: Invalid record for key '0d0cf068-94a4-4349-9ffc-96b166bd30dc': Filterable metadata must have at most 2048 bytes

Version

0.14.18

Steps to Reproduce

Pipeline and filtering process:

...

nodes = pipeline.run(documents=documents)

# strip the chunks of metadata that is not relevant (s3 has metadata limits), preserve just file name
for node in nodes:
    node.metadata = {k: v for k, v in node.metadata.items() if k in ["file_name"]}

for i, node in enumerate(nodes):
    meta_size = sys.getsizeof(json.dumps(node.metadata))

# initiate the vector store
vector_store = S3VectorStore(
    bucket_name_or_arn=S3_VECTOR_BUCKET,
    index_name_or_arn=S3_VECTOR_INDEX,
    data_type="float32",
    distance_metric="cosine",
)

...

Relevant Logs/Tracebacks

extent analysis

Fix Plan

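The 2048-byte limit applies to the filterable metadata of each record, and that includes the _node_content field that S3VectorStore adds, not only the keys in node.metadata. Trimming node.metadata alone therefore does not resolve the error. The fix is to make sure the index treats _node_content and _node_type as non-filterable metadata keys, and to keep the remaining filterable metadata within 2048 bytes.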

Step-by-Step Solution

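  • Create the index with create_index_from_bucket(), or, if the index is created by other means, list _node_content and _node_type under metadataConfiguration.nonFilterableMetadataKeys when the index is created.
  • Check the size of the remaining user metadata before calling PutVectors. Measure the UTF-8 byte length of the serialized metadata; sys.getsizeof reports the size of the Python string object, including interpreter overhead, which is not the figure S3 Vectors checks:
import json

for node in nodes:
    # Keep only the keys that are needed for filtering
    node.metadata = {k: v for k, v in node.metadata.items() if k in ["file_name"]}
    meta_size = len(json.dumps(node.metadata).encode("utf-8"))
    if meta_size > 2048:
        # User metadata alone exceeds the limit: shorten or drop values,
        # for example by truncating the file name
        node.metadata["file_name"] = node.metadata["file_name"][:200]
  • Alternatively, store bulky metadata outside the vector index, for example in a separate database, and keep only a compact reference in node.metadata.
  • Verify that node.metadata contains only the keys you need to filter on before calling PutVectors.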
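To see why checking node.metadata alone can be misleading, the sketch below compares the byte size of the user metadata with the size of a record that also carries _node_content. The stored_metadata dictionary is a hypothetical stand-in for what the store sends (the real value comes from node_to_metadata_dict()), and the file name and node text are made-up sample values.

```python
import json
import sys


def utf8_size(value):
    """Bytes the JSON-serialized value occupies when encoded as UTF-8."""
    return len(json.dumps(value).encode("utf-8"))


user_metadata = {"file_name": "report.pdf"}

# Hypothetical stand-in for the record metadata: user metadata plus the
# serialized node under _node_content and the node class under _node_type.
node_text = "lorem ipsum " * 300
stored_metadata = {
    **user_metadata,
    "_node_content": json.dumps({"text": node_text, "metadata": user_metadata}),
    "_node_type": "TextNode",
}

print("user metadata bytes:", utf8_size(user_metadata))
print("stored metadata bytes:", utf8_size(stored_metadata))
# sys.getsizeof measures the Python str object, not the encoded payload.
print("sys.getsizeof of the JSON string:", sys.getsizeof(json.dumps(user_metadata)))
```

In this sample the user metadata is far below 2048 bytes while the record including _node_content is well above it, which matches the behaviour reported in the issue.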

Verification

  • Check the byte size of the serialized metadata after applying the fix: print(len(json.dumps(node.metadata).encode("utf-8")))
  • Verify that the PutVectors operation succeeds without raising a ValidationException.
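It can also help to confirm how an existing index is configured before inserting anything. The sketch below assumes the boto3 "s3vectors" client exposes get_index and that its response nests the configuration under index.metadataConfiguration.nonFilterableMetadataKeys; both the response shape and the helper names are assumptions to check against the AWS documentation.

```python
REQUIRED_NON_FILTERABLE = {"_node_content", "_node_type"}


def missing_non_filterable_keys(get_index_response):
    """Return the required keys that the index does not list as non-filterable."""
    index = get_index_response.get("index", {})
    config = index.get("metadataConfiguration") or {}
    configured = set(config.get("nonFilterableMetadataKeys") or [])
    return sorted(REQUIRED_NON_FILTERABLE - configured)


def check_index(bucket_name, index_name):
    """Fetch the index description and report missing keys. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the pure check above runs without boto3

    client = boto3.client("s3vectors")
    response = client.get_index(vectorBucketName=bucket_name, indexName=index_name)
    return missing_non_filterable_keys(response)
```

An empty list from check_index suggests the index is set up the way create_index_from_bucket() would have configured it; any key in the result points at the cause of the ValidationException.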

Extra Tips

  • PR #21279 adds this kind of check to S3VectorStore: add() logs a warning when _node_content exceeds 2048 bytes, before the AWS API call fails.
  • Log metadata sizes during ingestion to detect oversized records before they cause errors.
