llamaIndex - ✅(Solved) Fix [Bug]: `refresh_ref_docs()` / `arefresh_ref_docs()` drop kwargs after the first document in a batch [2 pull requests, 2 comments, 3 participants]

GitHub stats
run-llama/llama_index#21518 (fetched 2026-05-01 05:33:16)
Comments: 2
Participants: 3
Timeline events: 6
Reactions: 0
Timeline (top): commented ×2, cross-referenced ×2, labeled ×2

Root Cause

insert_kwargs and update_kwargs passed to refresh_ref_docs() are silently dropped after the first matching document because the method calls .pop() on the shared update_kwargs dict inside the document loop. So, the first inserted or updated document receives the expected kwargs, but subsequent documents in the same batch receive {} without any error.
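The mechanism can be reproduced without the library. The following is a minimal sketch, assuming the loop has the shape the report describes; `refresh_buggy` is a hypothetical stand-in written for illustration, not the library's code.

```python
from typing import Any, Dict, List


def refresh_buggy(docs: List[str], **update_kwargs: Any) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for the loop described above (not library code)."""
    received = []
    for _doc in docs:
        # .pop() removes "insert_kwargs" from the shared dict on the first pass,
        # so every later pass falls back to the {} default.
        received.append(update_kwargs.pop("insert_kwargs", {}))
    return received


received = refresh_buggy(["doc 0", "doc 1", "doc 2"], insert_kwargs={"my_flag": True})
print(received)  # [{'my_flag': True}, {}, {}]
```

No exception is raised because `.pop()` with a default never fails on a missing key, which is why the loss is silent.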

Fix Action

Fixed

PR fix notes

PR #21519: fix(core): preserve refresh_ref_docs kwargs across batch documents

Description (problem / solution / changelog)

Description

Replaced .pop() with .get() in the refresh_ref_docs() / arefresh_ref_docs() document loops so insert_kwargs and update_kwargs are forwarded to every matching document in the batch, not just the first.
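As a rough illustration of why this change is sufficient, the sketch below uses a hypothetical stand-in function, `refresh_fixed`, rather than the actual code in base.py. It assumes the kwargs are read inside the loop, as the PR description suggests.

```python
from typing import Any, Dict, List


def refresh_fixed(docs: List[str], **update_kwargs: Any) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for the patched loop (not library code)."""
    received = []
    for _doc in docs:
        # .get() reads the value and leaves the key in place for later passes.
        received.append(update_kwargs.get("insert_kwargs", {}))
    return received


fixed = refresh_fixed(["doc 0", "doc 1", "doc 2"], insert_kwargs={"my_flag": True})
print(fixed)  # [{'my_flag': True}, {'my_flag': True}, {'my_flag': True}]
```

Every pass hands back the same dict object, so this relies on downstream code not mutating the kwargs it receives.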

Fixes #21518

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

Changed files

  • llama-index-core/llama_index/core/indices/base.py (modified, +4/-4)
  • llama-index-core/tests/indices/vector_store/test_simple.py (modified, +33/-2)
  • llama-index-core/tests/indices/vector_store/test_simple_async.py (modified, +35/-2)

PR #21522: fix(indices): preserve insert_kwargs and update_kwargs across all documents in refresh_ref_docs batch

Description (problem / solution / changelog)

Fixes #21518

Problem

refresh_ref_docs() and arefresh_ref_docs() called .pop() on the shared update_kwargs dict inside the document loop. The first document received the expected kwargs, but all subsequent documents silently got {}.

Fix

Extract insert_kwargs and update_kwargs once before the loop using .get():

_insert_kwargs = update_kwargs.get("insert_kwargs", {})
_update_kwargs = update_kwargs.get("update_kwargs", {})

Then reference those variables inside the loop instead of calling .pop() on each iteration.

Changes

Only llama-index-core/llama_index/core/indices/base.py modified.
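The extract-once approach from this PR can be sketched with both the insert and update branches. Everything here is an assumption for illustration: `refresh_hoisted`, the `stored_texts` dict standing in for a document store, and the plain text comparison standing in for hash comparison are not taken from the library.

```python
from typing import Any, Dict, List, Tuple

Call = Tuple[str, str, Dict[str, Any]]


def refresh_hoisted(
    docs: Dict[str, str], stored_texts: Dict[str, str], **update_kwargs: Any
) -> List[Call]:
    """Hypothetical sketch: read both kwargs once, then reuse them per document."""
    _insert_kwargs = update_kwargs.get("insert_kwargs", {})
    _update_kwargs = update_kwargs.get("update_kwargs", {})
    calls: List[Call] = []
    for doc_id, text in docs.items():
        existing = stored_texts.get(doc_id)
        if existing is None:
            # Unknown document: would be inserted with the insert kwargs.
            calls.append(("insert", doc_id, _insert_kwargs))
        elif existing != text:
            # Changed document: would be updated with the update kwargs.
            calls.append(("update", doc_id, _update_kwargs))
        # Unchanged documents are skipped.
    return calls


calls = refresh_hoisted(
    {"a": "new a", "b": "new b", "c": "new c", "d": "same d"},
    {"b": "old b", "c": "old c", "d": "same d"},
    insert_kwargs={"my_flag": True},
    update_kwargs={"other_flag": 1},
)
for call in calls:
    print(call)
```

Hoisting the lookups keeps the loop body free of dict access on update_kwargs altogether, so a later edit cannot reintroduce the mutation by accident.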

Changed files

  • llama-index-core/llama_index/core/indices/base.py (modified, +8/-4)

Bug Description

insert_kwargs and update_kwargs passed to refresh_ref_docs() are silently dropped after the first matching document because the method calls .pop() on the shared update_kwargs dict inside the document loop. So, the first inserted or updated document receives the expected kwargs, but subsequent documents in the same batch receive {} without any error.

Version

0.14.21

Steps to Reproduce

from typing import Any, List

from llama_index.core import Document, VectorStoreIndex
from llama_index.core.schema import BaseNode, TransformComponent


class RecordKwargs(TransformComponent):
    def __call__(self, nodes: List[BaseNode], **kwargs: Any) -> List[BaseNode]:
        print(f"transform received: {kwargs}")
        return nodes


docs = [Document(text=f"doc {i}") for i in range(3)]
index = VectorStoreIndex([], transformations=[RecordKwargs()])

print("refresh_ref_docs with insert_kwargs={'my_flag': True}:")
index.refresh_ref_docs(docs, insert_kwargs={"my_flag": True})

Relevant Logs/Tracebacks

refresh_ref_docs with insert_kwargs={'my_flag': True}:
transform received: {'my_flag': True}
transform received: {}
transform received: {}
[True, True, True]

extent analysis

TL;DR

Avoid using .pop() on the shared update_kwargs dict inside the document loop in refresh_ref_docs() to prevent silently dropping insert_kwargs and update_kwargs after the first matching document.

Guidance

  • Identify the line of code where .pop() is called on update_kwargs and refactor to avoid modifying the shared dictionary.
  • Consider creating a copy of update_kwargs for each document iteration to prevent unintended side effects.
  • Review the refresh_ref_docs() method to ensure it handles insert_kwargs and update_kwargs correctly for all documents in a batch.
  • Verify the fix by running the provided example code and checking the output of transform received to ensure it prints the expected kwargs for all documents.

Example

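# Work on a per-iteration copy so .pop() never mutates the shared dict.
# Sample values below are illustrative only.
update_kwargs = {"insert_kwargs": {"my_flag": True}}
documents = ["doc 0", "doc 1", "doc 2"]
for doc in documents:
    update_kwargs_copy = update_kwargs.copy()  # shallow copy is enough for .pop()
    insert_kwargs = update_kwargs_copy.pop("insert_kwargs", {})
    print(doc, insert_kwargs)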

Notes

The reproduction script and logs point to .pop() on the shared update_kwargs dict as the cause. According to its description, PR #21519 addresses this by replacing .pop() with .get() in the refresh_ref_docs() and arefresh_ref_docs() document loops, so copying the dict on every iteration is an alternative rather than a requirement.

Recommendation

Apply the fix: read insert_kwargs and update_kwargs with .get(), or work on a per-iteration copy, so that the shared update_kwargs dict is never mutated inside the document loop. Every inserted or updated document in the batch then receives the same kwargs instead of {} after the first match.
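Where patching the method is not an option, a caller-side workaround may be to pass one document per call, since Python builds a fresh `**update_kwargs` dict for every call. The sketch below demonstrates only that language behavior with a hypothetical stand-in, `refresh_buggy`. It assumes the real method takes its options as `**update_kwargs`, and it has not been checked against a real index.

```python
from typing import Any, Dict, List


def refresh_buggy(docs: List[str], **update_kwargs: Any) -> List[Dict[str, Any]]:
    """Hypothetical stand-in with the .pop() bug (not library code)."""
    return [update_kwargs.pop("insert_kwargs", {}) for _doc in docs]


docs = ["doc 0", "doc 1", "doc 2"]

# One batched call: the kwargs survive only for the first document.
batched = refresh_buggy(docs, insert_kwargs={"my_flag": True})

# One call per document: each call gets its own update_kwargs dict.
per_doc = [refresh_buggy([doc], insert_kwargs={"my_flag": True})[0] for doc in docs]

print(batched)
print(per_doc)
```

The trade-off is one call per document instead of one per batch, which may matter for large batches.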
