llamaIndex - ✅(Solved) Fix [Bug]: `legacy_json_to_doc` drops persisted `doc_id` and generates a new UUID [2 pull requests, 1 comments, 1 participants]

llamaIndex · 2026-02-19 14:35:16

GitHub stats

run-llama/llama_index#20749 • Fetched 2026-04-08 00:31:12

Author

gautamvarmadatla

Participants

gautamvarmadatla

Timeline (top)

cross-referenced ×2, labeled ×2, closed ×1, referenced ×1

Error Message

AssertionError Traceback (most recent call last)

Fix Action

Fixed

Fixed by PR (merged): fix(core): preserve doc_id in legacy_json_to_doc (https://github.com/run-llama/llama_index/pull/20750)
Alternative PR (closed without merging): fix: preserve doc_id in legacy_json_to_doc (https://github.com/run-llama/llama_index/pull/20754)

PR fix notes

PR #20750: fix(core): preserve doc_id in legacy_json_to_doc

Repository: run-llama/llama_index
Author: gautamvarmadatla
State: closed | merged: True
Link: https://github.com/run-llama/llama_index/pull/20750

Description (problem / solution / changelog)

Description

This PR fixes a legacy deserialization bug in legacy_json_to_doc where the persisted doc_id from legacy payloads was not preserved and a new UUID was generated instead.

Fixes #20749

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

Type of Change

Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

I added new unit tests to cover this change
I believe this change is already covered by existing unit tests

Suggested Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added Google Colab support for the newly added notebooks.
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I ran uv run make format; uv run make lint to appease the lint gods

Changed files

llama-index-core/llama_index/core/storage/docstore/utils.py (modified, +4/-4)
llama-index-core/tests/storage/docstore/test_legacy_json_to_doc.py (added, +41/-0)

PR #20754: fix: preserve doc_id in legacy_json_to_doc

Repository: run-llama/llama_index
Author: danielalanbates
State: closed | merged: False
Link: https://github.com/run-llama/llama_index/pull/20754

Description (problem / solution / changelog)

Description

Fixes #20749

legacy_json_to_doc was passing id=id_ to the node constructors, but the Pydantic field is named id_, not id. This caused the persisted doc_id to be silently dropped and a new UUID to be generated, breaking backward compatibility for users restoring or migrating persisted docstores.

Changes

Changed id=id_ to id_=id_ in all four constructor calls within legacy_json_to_doc (Document, TextNode, ImageNode, IndexNode).
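
To illustrate why the misnamed keyword failed silently instead of raising an error, here is a minimal stdlib-only sketch. `SketchNode` is a hypothetical stand-in, not the llama_index class: it assumes a constructor that ignores unknown keyword arguments (as Pydantic models do by default for extra fields) and that fills `id_` from a UUID factory when no value is supplied.

```python
import uuid


class SketchNode:
    """Hypothetical stand-in for a model that ignores unknown keyword arguments."""

    def __init__(self, text="", id_=None, **ignored_extras):
        # Unknown keywords (such as a misspelled `id`) land in `ignored_extras`
        # and are discarded, so no error is raised.
        self.text = text
        # When `id_` is not supplied, fall back to a freshly generated UUID.
        self.id_ = id_ if id_ is not None else str(uuid.uuid4())


persisted_id = "doc-123"

# Buggy call shape: `id` is not the field name, so the persisted ID is dropped.
buggy = SketchNode(text="hello", id=persisted_id)

# Fixed call shape: `id_` matches the field name, so the persisted ID survives.
fixed = SketchNode(text="hello", id_=persisted_id)

print(buggy.id_ == persisted_id)  # False
print(fixed.id_ == persisted_id)  # True
```

Because nothing fails at construction time, the bug only surfaces later, when something looks the node up by its persisted ID.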

Reproduction

from llama_index.core.constants import DATA_KEY, TYPE_KEY
from llama_index.core.schema import Document
from llama_index.core.storage.docstore.utils import legacy_json_to_doc

doc_dict = {
    TYPE_KEY: Document.get_type(),
    DATA_KEY: {
        "text": "hello",
        "extra_info": {},
        "doc_id": "doc-123",
        "relationships": {},
    },
}

loaded = legacy_json_to_doc(doc_dict)
assert loaded.id_ == "doc-123"  # Previously failed: got a new UUID

This PR was created with the assistance of Claude Opus 4.6 by Anthropic. Happy to make any adjustments! Reviewed and submitted by a human.

Changed files

llama-index-core/llama_index/core/storage/docstore/utils.py (modified, +4/-4)

Bug Description

When loading legacy node JSON via legacy_json_to_doc, the doc_id stored in the legacy payload is not preserved; a new UUID is generated instead. This breaks backward compatibility for users restoring or migrating persisted stores that depend on stable node IDs (for example, docstore lookups and relationship resolution).
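
As a rough illustration of the impact, the sketch below uses a toy in-memory store rather than the llama_index docstore. The record layout, `reload_store`, and `dangling_references` are all hypothetical names chosen for this example; the point is only that lookups and relationships keyed on persisted IDs stop resolving once the IDs are regenerated.

```python
import uuid

# A toy persisted store: node IDs are the lookup keys, and relationships
# refer to other nodes by those same IDs.
persisted = {
    "doc-123": {"text": "parent", "child_ids": ["node-1"]},
    "node-1": {"text": "child", "child_ids": []},
}


def reload_store(store, preserve_ids):
    """Rebuild the store, either keeping persisted IDs or minting new UUIDs."""
    reloaded = {}
    for old_id, record in store.items():
        new_id = old_id if preserve_ids else str(uuid.uuid4())
        reloaded[new_id] = dict(record)
    return reloaded


def dangling_references(store):
    """Return every referenced child ID that is no longer a key in the store."""
    return [
        child_id
        for record in store.values()
        for child_id in record["child_ids"]
        if child_id not in store
    ]


good = reload_store(persisted, preserve_ids=True)
bad = reload_store(persisted, preserve_ids=False)

print("doc-123" in good, dangling_references(good))  # True []
print("doc-123" in bad, dangling_references(bad))    # False ['node-1']
```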

Version

llama-index-core 0.14.15

Steps to Reproduce

from llama_index.core.constants import DATA_KEY, TYPE_KEY
from llama_index.core.schema import Document
from llama_index.core.storage.docstore.utils import legacy_json_to_doc

doc_dict = {
    TYPE_KEY: Document.get_type(),
    DATA_KEY: {
        "text": "hello",
        "extra_info": {},
        "doc_id": "doc-123",
        "relationships": {},
    },
}

loaded = legacy_json_to_doc(doc_dict)

print("expected:", "doc-123")
print("actual:  ", loaded.id_)

assert loaded.id_ == "doc-123", "BUG: legacy loader lost persisted doc_id"

Relevant Logs/Tracebacks

expected: doc-123
actual:   b980379e-b733-49dd-8cfc-4e5bfed7fca5
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/tmp/ipython-input-767892816.py in <cell line: 0>()
     18 print("actual:  ", loaded.id_)
     19 
---> 20 assert loaded.id_ == "doc-123", "BUG: legacy loader lost persisted doc_id"

AssertionError: BUG: legacy loader lost persisted doc_id

extent analysis

Fix Plan

Fix Name

Preserve Legacy doc_id in legacy_json_to_doc

Steps to Fix

1. Update legacy_json_to_doc so the persisted ID reaches the node constructors. The function passed id=id_, but the Pydantic field is named id_, so the value was silently dropped and a new UUID was generated. Change id=id_ to id_=id_ in all four constructor calls (Document, TextNode, ImageNode, IndexNode).

2. Leave the Document class unchanged. Both PRs modify only the constructor calls in llama-index-core/llama_index/core/storage/docstore/utils.py; no change to llama_index.core.schema is part of the fix.

3. Add a regression test that asserts the persisted doc_id survives loading:

from llama_index.core.constants import DATA_KEY, TYPE_KEY
from llama_index.core.schema import Document
from llama_index.core.storage.docstore.utils import legacy_json_to_doc

# Legacy payload with an explicit, persisted doc_id.
doc_dict = {
    TYPE_KEY: Document.get_type(),
    DATA_KEY: {
        "text": "hello",
        "extra_info": {},
        "doc_id": "doc-123",
        "relationships": {},
    },
}

loaded = legacy_json_to_doc(doc_dict)

print("expected:", "doc-123")
print("actual:  ", loaded.id_)

# The loaded document must keep the persisted ID instead of a new UUID.
assert loaded.id_ == "doc-123", "legacy loader must preserve persisted doc_id"

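Verification

Run the regression test added in PR #20750 (llama-index-core/tests/storage/docstore/test_legacy_json_to_doc.py) and confirm that the loaded document's id_ equals the persisted doc_id.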
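
Beyond the single-document test, a migration check could compare every persisted ID against the ID of the object the loader returns. The sketch below is an assumption-laden illustration: `find_id_mismatches` is a hypothetical helper, and the `"data"` key and both loaders are stand-ins so the block runs without llama_index installed. In a real check one would presumably pass `legacy_json_to_doc` as the loader and `DATA_KEY` as `data_key`.

```python
import uuid
from types import SimpleNamespace


def find_id_mismatches(payloads, loader, data_key="data"):
    """Return (persisted_id, loaded_id) pairs where the loader changed the ID."""
    mismatches = []
    for payload in payloads:
        persisted_id = payload[data_key]["doc_id"]
        loaded = loader(payload)
        if loaded.id_ != persisted_id:
            mismatches.append((persisted_id, loaded.id_))
    return mismatches


def buggy_loader(payload):
    # Stand-in for the pre-fix behavior: the persisted ID is ignored.
    return SimpleNamespace(id_=str(uuid.uuid4()))


def fixed_loader(payload):
    # Stand-in for the post-fix behavior: the persisted ID is carried over.
    return SimpleNamespace(id_=payload["data"]["doc_id"])


payloads = [{"data": {"text": "hello", "doc_id": "doc-123"}}]

print(len(find_id_mismatches(payloads, buggy_loader)))  # 1
print(find_id_mismatches(payloads, fixed_loader))       # []
```

An empty result means every persisted ID round-tripped; any returned pair identifies a payload whose ID was replaced.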
