LlamaIndex - ✅(Solved) Fix [Bug]: `legacy_json_to_doc` drops persisted `doc_id` and generates a new UUID [2 pull requests, 1 comment, 1 participant]

GitHub stats

run-llama/llama_index#20749 (fetched 2026-04-08 00:31:12)

Comments: 1 · Participants: 1 · Timeline events: 6 · Reactions: 0

Timeline (top): cross-referenced ×2, labeled ×2, closed ×1, referenced ×1

Error Message

AssertionError Traceback (most recent call last)

Fix Action

Fixed

PR fix notes

PR #20750: fix(core): preserve doc_id in legacy_json_to_doc

Description (problem / solution / changelog)

Description

This PR fixes a legacy deserialization bug in legacy_json_to_doc where the persisted doc_id from legacy payloads was not preserved and a new UUID was generated instead.

Fixes #20749

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

Changed files

  • llama-index-core/llama_index/core/storage/docstore/utils.py (modified, +4/-4)
  • llama-index-core/tests/storage/docstore/test_legacy_json_to_doc.py (added, +41/-0)

PR #20754: fix: preserve doc_id in legacy_json_to_doc

Description (problem / solution / changelog)

Description

Fixes #20749

legacy_json_to_doc was passing id=id_ to the node constructors, but the Pydantic field is named id_, not id. This caused the persisted doc_id to be silently dropped and a new UUID to be generated, breaking backward compatibility for users restoring or migrating persisted docstores.
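The failure mode is easy to miss because nothing raises. The following is a minimal, stdlib-only sketch of the mechanism, assuming the node models ignore unknown keyword arguments (Pydantic's default for extra fields, and consistent with the silent drop described above) and fill `id_` from a UUID default factory. `SketchNode` is a hypothetical stand-in, not the llama_index class.

```python
import uuid
from typing import Any


class SketchNode:
    """Hypothetical stand-in for a node model, not the real llama_index class."""

    # Declared field names; a real Pydantic model derives these from annotations.
    _fields = ("id_", "text")

    def __init__(self, **kwargs: Any) -> None:
        # Mimic ignoring extra fields: keep only declared names and silently
        # discard everything else, including a misspelled "id".
        known = {k: v for k, v in kwargs.items() if k in self._fields}
        # Mimic Field(default_factory=...): generate a UUID when id_ is absent.
        self.id_: str = known["id_"] if "id_" in known else str(uuid.uuid4())
        self.text: str = known.get("text", "")


persisted_id = "doc-123"

buggy = SketchNode(text="hello", id=persisted_id)   # wrong keyword: "id"
fixed = SketchNode(text="hello", id_=persisted_id)  # field name: "id_"

print(buggy.id_ == persisted_id)  # False: a fresh UUID replaced the persisted id
print(fixed.id_ == persisted_id)  # True
```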

Changes

Changed id=id_ to id_=id_ in all four constructor calls within legacy_json_to_doc (Document, TextNode, ImageNode, IndexNode).
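Because the same wrong keyword was repeated in four near-identical calls, one way to structure such a loader is to build the shared constructor arguments once, so the id keyword is spelled in a single place. This is a hypothetical sketch, not the llama_index source: the dataclasses, the `NODE_TYPES` table, `load_legacy`, and the plain `"type"`/`"data"` keys (placeholders for the real `TYPE_KEY`/`DATA_KEY` constants) are all assumptions for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Type


def _new_id() -> str:
    return str(uuid.uuid4())


@dataclass
class SketchTextNode:
    """Hypothetical stand-in for a node class with an `id_` field."""
    text: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)
    id_: str = field(default_factory=_new_id)


@dataclass
class SketchDocument(SketchTextNode):
    """Hypothetical stand-in for Document."""


# Hypothetical type tags mapped to constructors.
NODE_TYPES: Dict[str, Type[SketchTextNode]] = {
    "Document": SketchDocument,
    "TextNode": SketchTextNode,
}


def load_legacy(doc_dict: Dict[str, Any]) -> SketchTextNode:
    """Hypothetical legacy loader; "type"/"data" stand in for TYPE_KEY/DATA_KEY."""
    data = doc_dict["data"]
    cls = NODE_TYPES.get(doc_dict["type"])
    if cls is None:
        raise ValueError(f"Unknown doc type: {doc_dict['type']}")

    # Shared arguments are built once for every node type.
    kwargs: Dict[str, Any] = {
        "text": data.get("text", ""),
        "metadata": data.get("extra_info") or {},
    }
    persisted_id: Optional[str] = data.get("doc_id")
    if persisted_id is not None:
        kwargs["id_"] = persisted_id  # must match the field name exactly
    return cls(**kwargs)


print(load_legacy({"type": "Document", "data": {"text": "hello", "doc_id": "doc-123"}}).id_)
```

A side effect of using plain dataclasses here is that an unknown keyword such as `id` raises `TypeError` instead of being ignored, so this class of typo fails loudly rather than silently producing a new id.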

Reproduction

```python
from llama_index.core.constants import DATA_KEY, TYPE_KEY
from llama_index.core.schema import Document
from llama_index.core.storage.docstore.utils import legacy_json_to_doc

doc_dict = {
    TYPE_KEY: Document.get_type(),
    DATA_KEY: {
        "text": "hello",
        "extra_info": {},
        "doc_id": "doc-123",
        "relationships": {},
    },
}

loaded = legacy_json_to_doc(doc_dict)
assert loaded.id_ == "doc-123"  # Previously failed: got a new UUID
```

This PR was created with the assistance of Claude Opus 4.6 by Anthropic. Happy to make any adjustments! Reviewed and submitted by a human.

Changed files

  • llama-index-core/llama_index/core/storage/docstore/utils.py (modified, +4/-4)


Bug Description

When legacy node JSON is loaded via legacy_json_to_doc, the doc_id stored in the legacy payload is not preserved and a new UUID is generated instead. This breaks backward compatibility for users restoring or migrating persisted stores that rely on stable node IDs (e.g., docstore lookups and relationship resolution).
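To make the impact concrete, here is a stdlib-only sketch of why a regenerated id breaks relationship resolution. It assumes a docstore keyed by node id and a child node that persisted a reference to its parent's id; `reload_node_id` and `resolve_parent` are hypothetical helpers, not llama_index APIs.

```python
import uuid
from typing import Dict, Optional


def reload_node_id(persisted_id: str, preserve_id: bool) -> str:
    """Hypothetical: the id a node ends up with after a legacy reload."""
    return persisted_id if preserve_id else str(uuid.uuid4())


def resolve_parent(docstore: Dict[str, str], source_id: str) -> Optional[str]:
    """Follow a child's stored reference to its parent in the docstore."""
    return docstore.get(source_id)


# A child chunk persisted a reference to its parent document's id.
parent_id = "doc-123"
child_source_id = parent_id

for preserve_id in (False, True):
    # Re-insert the reloaded parent into a fresh docstore keyed by node id.
    docstore = {reload_node_id(parent_id, preserve_id): "hello"}
    print(preserve_id, resolve_parent(docstore, child_source_id))
# False None
# True hello
```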

Version

llama-index-core 0.14.15

Steps to Reproduce

```python
from llama_index.core.constants import DATA_KEY, TYPE_KEY
from llama_index.core.schema import Document
from llama_index.core.storage.docstore.utils import legacy_json_to_doc

doc_dict = {
    TYPE_KEY: Document.get_type(),
    DATA_KEY: {
        "text": "hello",
        "extra_info": {},
        "doc_id": "doc-123",
        "relationships": {},
    },
}

loaded = legacy_json_to_doc(doc_dict)

print("expected:", "doc-123")
print("actual:  ", loaded.id_)

assert loaded.id_ == "doc-123", "BUG: legacy loader lost persisted doc_id"
```

Relevant Logs/Tracebacks

```
expected: doc-123
actual:   b980379e-b733-49dd-8cfc-4e5bfed7fca5
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/tmp/ipython-input-767892816.py in <cell line: 0>()
     18 print("actual:  ", loaded.id_)
     19 
---> 20 assert loaded.id_ == "doc-123", "BUG: legacy loader lost persisted doc_id"

AssertionError: BUG: legacy loader lost persisted doc_id
```


Fix Plan

Fix Name

Preserve Legacy doc_id in legacy_json_to_doc

Steps to Fix

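  1. In legacy_json_to_doc, pass the persisted doc_id to each node constructor under the actual field name, `id_=id_`, instead of `id=id_`:

```python
# In llama_index/core/storage/docstore/utils.py, inside legacy_json_to_doc,
# where id_ holds the "doc_id" from the legacy payload (other args elided):
#   before: Document(..., id=id_, ...)   # "id" is not a field; value is dropped
#   after:  Document(..., id_=id_, ...)  # matches the field name; value is kept
# Same change for the TextNode, ImageNode, and IndexNode calls.
```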

  2. No change to the `Document` class is needed: the node classes already define the `id_` field, and both fix PRs modify only `llama-index-core/llama_index/core/storage/docstore/utils.py` (PR #20750 also adds a test).
  3. Add a regression test:

```python
from llama_index.core.constants import DATA_KEY, TYPE_KEY
from llama_index.core.schema import Document
from llama_index.core.storage.docstore.utils import legacy_json_to_doc


def test_legacy_json_to_doc_preserves_doc_id() -> None:
    doc_dict = {
        TYPE_KEY: Document.get_type(),
        DATA_KEY: {
            "text": "hello",
            "extra_info": {},
            "doc_id": "doc-123",
            "relationships": {},
        },
    }

    loaded = legacy_json_to_doc(doc_dict)

    # The persisted legacy doc_id must survive deserialization.
    assert loaded.id_ == "doc-123", "legacy loader lost persisted doc_id"
```

### Verification

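Run the regression test (for example with pytest) and confirm it passes. Without the fix, the same assertion fails with the AssertionError shown in the traceback above.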
