LlamaIndex - ✅ (Solved) Fix [Bug]: `DocumentSummaryIndex.delete_nodes()` crashes on invalid node IDs instead of skipping them [4 pull requests, 3 comments, 3 participants]

GitHub stats
run-llama/llama_index#21066 (fetched 2026-04-08 00:58:19)
Comments: 3
Participants: 3
Timeline: 22
Reactions: 0
Timeline (top): cross-referenced ×7, referenced ×4, commented ×3, mentioned ×3

Error Message

Traceback (most recent call last):

Root Cause

delete_nodes() tries to filter out invalid node IDs by removing them from the input list while iterating over it. Because mutating a list during iteration can skip elements, some invalid IDs may remain in the list. Those IDs are then passed to _index_struct.delete_nodes(), which performs a raw dict lookup and raises KeyError.

Fix Action

Fixed

PR fix notes

PR #21067: fix(core): prevent KeyError in DocumentSummaryIndex.delete_nodes when invalid node ID is provided

Description (problem / solution / changelog)

Description

Fixes a bug where delete_nodes() called list.remove() during iteration, causing the iterator to skip consecutive invalid IDs and pass them raw to the internal data struct, which raised KeyError. Invalid IDs are now accumulated into a separate list instead.

Fixes: #21066

With the fix, the call logs a warning for each invalid ID, for example:

WARNING:llama_index.core.indices.document_summary.base:node_id does_not_exist_1 not found, will not be deleted.
WARNING:llama_index.core.indices.document_summary.base:node_id does_not_exist_2 not found, will not be deleted.
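
The "accumulate into a separate list" approach described above can be sketched in plain Python. This is a minimal illustration, not the code from the PR: the function name `split_node_ids`, the logger name, and the sample IDs are all hypothetical, and only the warning text follows the log lines shown above.

```python
import logging

# Hypothetical logger name; the real module logs under
# llama_index.core.indices.document_summary.base.
logger = logging.getLogger("delete_nodes_sketch")


def split_node_ids(node_ids, known_ids):
    """Return (valid, invalid) lists without mutating node_ids."""
    valid, invalid = [], []
    for node_id in node_ids:
        if node_id in known_ids:
            valid.append(node_id)
        else:
            # Same wording as the warnings quoted above
            logger.warning("node_id %s not found, will not be deleted.", node_id)
            invalid.append(node_id)
    return valid, invalid


logging.basicConfig(level=logging.WARNING)
requested = ["node_1", "does_not_exist_1", "does_not_exist_2"]
valid, invalid = split_node_ids(requested, {"node_1"})
```

Because nothing is removed from `requested` while it is being iterated, every element is visited exactly once, and the caller's list is left unchanged.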

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods

Changed files

  • llama-index-core/llama_index/core/indices/document_summary/base.py (modified, +5/-3)
  • llama-index-core/tests/indices/document_summary/test_index.py (modified, +29/-0)

PR #21068: fix: DocumentSummaryIndex.delete_nodes() crashes on invalid node IDs

Description (problem / solution / changelog)

Summary

DocumentSummaryIndex.delete_nodes() was mutating the input list while iterating over it to filter out invalid node IDs, causing some invalid IDs to be skipped. Those IDs were then passed to _index_struct.delete_nodes(), which raised KeyError on the raw dict lookup.

Changes

  • Fixed list mutation during iteration in delete_nodes()
  • Invalid node IDs are now properly filtered before being passed to _index_struct.delete_nodes()

Testing

  • Created and ran a test script that reproduces the issue
  • Verified the fix handles multiple invalid IDs correctly
  • Edge cases tested: empty list, all invalid IDs, all valid IDs, single invalid ID, single valid ID
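
The edge cases listed above could be exercised with a small table-driven check. This is a sketch under assumptions: `filter_valid` is a hypothetical stand-in for the filtering step, not the PR's test script, and the IDs are made up.

```python
def filter_valid(node_ids, known_ids):
    """Hypothetical stand-in for the filtering step: keep only known IDs."""
    return [node_id for node_id in node_ids if node_id in known_ids]


known = {"a", "b"}

# Each case maps a label to (input IDs, expected IDs after filtering)
cases = {
    "empty list": ([], []),
    "all invalid IDs": (["x", "y"], []),
    "all valid IDs": (["a", "b"], ["a", "b"]),
    "single invalid ID": (["x"], []),
    "single valid ID": (["a"], ["a"]),
}

results = {name: filter_valid(ids, known) for name, (ids, _) in cases.items()}
for name, (_, expected) in cases.items():
    assert results[name] == expected, name
```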

Fixes #21066

Changed files

  • llama-index-core/llama_index/core/indices/document_summary/base.py (modified, +5/-3)

PR #21072: fix: avoid list mutation during iteration in DocumentSummaryIndex.delete_nodes()

Description (problem / solution / changelog)

Summary

  • Fix DocumentSummaryIndex.delete_nodes() crashing with KeyError when given invalid node IDs
  • Replace in-place list.remove() during iteration with a filtered list approach

Root Cause

The delete_nodes() method mutated node_ids with remove() while iterating over it - a classic Python antipattern. When an element is removed during iteration, the iterator skips the next element, allowing invalid node IDs to pass through to code that raises KeyError.
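
The skipping behaviour can be observed directly in plain Python, independent of llama_index. In this sketch the `known_ids` set and the `visited` list are illustrative names only; `visited` records which elements the loop actually sees.

```python
known_ids = {"valid_id"}
node_ids = ["valid_id", "does_not_exist_1", "does_not_exist_2"]

visited = []
for node_id in node_ids:
    visited.append(node_id)
    if node_id not in known_ids:
        # Removing shifts later elements left by one position,
        # while the loop's internal index still advances by one.
        node_ids.remove(node_id)

print(visited)   # elements the loop actually looked at
print(node_ids)  # what is left in the list afterwards
```

The loop never visits `does_not_exist_2`: after `does_not_exist_1` is removed, the list has only two elements and the iterator's next position is already past the end, so `does_not_exist_2` stays in `node_ids`.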

Reproduction

from llama_index.core.indices.document_summary import DocumentSummaryIndex

# `index` is assumed to be an existing DocumentSummaryIndex that contains a node "valid_id"
index.delete_nodes(["valid_id", "does_not_exist_1", "does_not_exist_2"])
# Before fix: does_not_exist_2 slips through -> KeyError
# After fix: both invalid IDs are filtered out with warnings

Changes

  • llama-index-core/llama_index/core/indices/document_summary/base.py: Build a filtered valid_node_ids list instead of mutating node_ids during iteration

Fixes #21066

Changed files

  • llama-index-core/llama_index/core/indices/document_summary/base.py (modified, +4/-1)

PR #21077: fix(core): avoid mutating list during iteration in delete_nodes

Description (problem / solution / changelog)

Fixes #21066

DocumentSummaryIndex.delete_nodes() calls node_ids.remove() inside a for loop over node_ids, which skips elements and can raise KeyError. Replaced with a filtered list approach.

Also converts dict_keys to set for O(1) lookups.

Test plan

  • Added test_delete_nodes_with_invalid_ids regression test
  • uv run -- pytest tests/indices/document_summary/ -v passes

Changed files

  • llama-index-core/llama_index/core/indices/document_summary/base.py (modified, +5/-3)
  • llama-index-core/tests/indices/document_summary/test_index.py (modified, +13/-0)

Code Example

Bug Description

delete_nodes() tries to filter out invalid node IDs by removing them from the input list while iterating over it. Because mutating a list during iteration can skip elements, some invalid IDs may remain in the list. Those IDs are then passed to _index_struct.delete_nodes(), which performs a raw dict lookup and raises KeyError.

Version

0.14.18

Steps to Reproduce

from llama_index.core import Document, Settings
from llama_index.core.indices.document_summary import DocumentSummaryIndex
from llama_index.core.llms.mock import MockLLM
from llama_index.core.embeddings.mock_embed_model import MockEmbedding

Settings.llm = MockLLM()
Settings.embed_model = MockEmbedding(embed_dim=8)

index = DocumentSummaryIndex.from_documents([Document(text="Hello world")])
index.delete_nodes(["does_not_exist_1", "does_not_exist_2"])

Relevant Logs/Tracebacks

Traceback (most recent call last):
  File "D:\code\llama_index\repro_delete_nodes.py", line 10, in <module>
    index.delete_nodes(["does_not_exist_1", "does_not_exist_2"])
  File "D:\code\llama_index\llama-index-core\llama_index\core\indices\document_summary\base.py", line 265, in delete_nodes
    self._index_struct.delete_nodes(node_ids)
  File "D:\code\llama_index\llama-index-core\llama_index\core\data_structs\document_summary.py", line 67, in delete_nodes
    summary_id = self.node_id_to_summary_id[node_id]
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
KeyError: 'does_not_exist_2'
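
The same failure path can be reproduced without llama_index installed. The sketch below uses a hypothetical `FakeIndexStruct` and `buggy_delete_nodes` that model only the two behaviours named in the bug description (in-place removal during iteration, then a raw dict lookup); they are not the library's classes.

```python
class FakeIndexStruct:
    """Hypothetical stand-in for the index data struct; not llama_index code."""

    def __init__(self):
        self.node_id_to_summary_id = {"node_1": "summary_1"}

    def delete_nodes(self, node_ids):
        for node_id in node_ids:
            # Raw dict lookup, as in the last traceback frame
            summary_id = self.node_id_to_summary_id[node_id]
            del self.node_id_to_summary_id[node_id]


def buggy_delete_nodes(index_struct, node_ids):
    for node_id in node_ids:
        if node_id not in index_struct.node_id_to_summary_id:
            node_ids.remove(node_id)  # mutation during iteration
    index_struct.delete_nodes(node_ids)


raised = None
try:
    buggy_delete_nodes(FakeIndexStruct(), ["does_not_exist_1", "does_not_exist_2"])
except KeyError as exc:
    raised = exc.args[0]

print(raised)
```

Only the first invalid ID is removed by the loop; the second is never examined, reaches the raw lookup, and becomes the key reported in the `KeyError`.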

extent analysis

Fix Plan

To fix the issue, avoid mutating the list while iterating over it. Build a new list that contains only the valid node IDs and pass that list on. A list comprehension never modifies its input, so no defensive copy is needed. This simplified version shows only the filtering step and omits the warning that the PRs above log for each skipped ID.

Code Changes

def delete_nodes(self, node_ids):
    # Mapping of every node ID the index struct currently knows about
    known_ids = self._index_struct.node_id_to_summary_id

    # Build a new list instead of calling node_ids.remove() inside a loop
    valid_node_ids = [node_id for node_id in node_ids if node_id in known_ids]

    # Pass only the IDs that exist to _index_struct.delete_nodes()
    self._index_struct.delete_nodes(valid_node_ids)

Alternatively, snapshot the known node IDs into a set before filtering. Membership tests on the dict itself are already O(1) on average, so the set mainly makes the lookup explicit rather than faster:

def delete_nodes(self, node_ids):
    # Snapshot the node IDs currently known to the index struct
    known_node_ids = set(self._index_struct.node_id_to_summary_id)

    # Keep only the requested IDs that the index struct knows about
    node_ids_to_delete = [node_id for node_id in node_ids if node_id in known_node_ids]

    # Pass the filtered list to _index_struct.delete_nodes()
    self._index_struct.delete_nodes(node_ids_to_delete)

Verification

To verify the fix, run the provided reproduction code again:

index.delete_nodes(["does_not_exist_1", "does_not_exist_2"])

The code should no longer raise a KeyError. You can also add test cases to ensure that the delete_nodes method correctly filters out invalid node IDs.
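
A regression test along those lines might look like the following sketch. It runs against a hypothetical `FakeIndex` rather than the real `DocumentSummaryIndex`, so the class, logger name, and node IDs are assumptions; only the warning wording is taken from the logs quoted in PR #21067.

```python
import logging
import unittest

logger = logging.getLogger("delete_nodes_sketch")  # hypothetical logger name


class FakeIndex:
    """Hypothetical stand-in; models only the node-ID mapping."""

    def __init__(self, node_ids):
        self.node_id_to_summary_id = {node_id: "summary" for node_id in node_ids}

    def delete_nodes(self, node_ids):
        valid = []
        for node_id in node_ids:
            if node_id in self.node_id_to_summary_id:
                valid.append(node_id)
            else:
                logger.warning("node_id %s not found, will not be deleted.", node_id)
        for node_id in valid:
            del self.node_id_to_summary_id[node_id]


class DeleteNodesTest(unittest.TestCase):
    def test_invalid_ids_are_skipped(self):
        index = FakeIndex(["node_1"])
        with self.assertLogs(logger, level="WARNING") as captured:
            # Must not raise KeyError, even with consecutive invalid IDs
            index.delete_nodes(["does_not_exist_1", "does_not_exist_2", "node_1"])
        self.assertEqual(len(captured.records), 2)
        self.assertEqual(index.node_id_to_summary_id, {})


suite = unittest.defaultTestLoader.loadTestsFromTestCase(DeleteNodesTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Placing two invalid IDs next to each other is the important detail, since adjacent invalid entries are exactly what the original loop skipped.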
