langchain - ✅(Solved) Fix `XMLOutputParser._root_to_dict` drops child elements when parent has text [2 pull requests, 2 comments, 3 participants]

GitHub stats
langchain-ai/langchain#36744 · Fetched 2026-04-17 08:23:11
Comments: 2 · Participants: 3 · Timeline events: 12 · Reactions: 0
Timeline (top): mentioned ×3, subscribed ×3, commented ×2, cross-referenced ×2

`_root_to_dict` in `langchain_core/output_parsers/xml.py` checks `if root.text` and returns early with `{root.tag: root.text}`. When an element has both text and children (mixed content), this early return discards all child elements.

The `if root.text` branch should only fire when there are zero children. When both text and children exist, text should be included alongside the children rather than replacing them.
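
The early return can be reproduced without LangChain. The sketch below is a simplified stand-in, not the library's code: `early_return_to_dict` is a hypothetical name, and it only mirrors the `if root.text` check described above on a standard-library `ElementTree` element.

```python
import xml.etree.ElementTree as ET


def early_return_to_dict(root: ET.Element) -> dict:
    """Hypothetical stand-in mirroring the early return described in the issue."""
    if root.text:
        # Returns as soon as text is present, so children are never visited
        return {root.tag: root.text}
    return {root.tag: [early_return_to_dict(child) for child in root]}


root = ET.fromstring("<result>Summary<detail>info</detail></result>")
child_count = len(root)            # the <detail> child exists in the tree
parsed = early_return_to_dict(root)
print(child_count, parsed)         # 1 {'result': 'Summary'}
```

The tree holds one child element, yet the returned dictionary contains only the text, which matches the silent drop reported here.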

Error Message

No error raised. The child element is silently dropped.

PR fix notes

PR #36842: fix(core): fix XML mixed content parsing in XMLOutputParser (preserve children when text is present)

Description (problem / solution / changelog)

Fixes #36744

This PR fixes a bug in _root_to_dict where elements containing both text and child nodes (mixed content) were incorrectly parsed. The previous implementation returned early when root.text was non-empty, causing all child elements to be discarded.

Changed files

  • libs/core/langchain_core/output_parsers/xml.py (modified, +1/-1)

PR #36843: fix(core): fix XML mixed content parsing in XMLOutputParser (preserve children when text is present)

Description (problem / solution / changelog)

Fixes #36744

This PR fixes a bug in _root_to_dict where elements containing both text and child nodes (mixed content) were incorrectly parsed. The previous implementation returned early when root.text was non-empty, causing all child elements to be discarded.

Changed files

  • libs/core/langchain_core/output_parsers/xml.py (modified, +9/-5)
  • libs/core/tests/unit_tests/output_parsers/test_xml_parser.py (modified, +13/-0)

Checked other resources

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package (Required)

  • langchain-core

Reproduction Steps / Example Code (Python)

from langchain_core.output_parsers import XMLOutputParser

parser = XMLOutputParser()

# Element has both text and children
xml_str = "<result>Summary<detail>info</detail></result>"
parsed = parser.parse(f"```xml\n{xml_str}\n```")
print(parsed)
# Actual:   {'result': 'Summary'}
# Expected: {'result': [{'_text': 'Summary'}, {'detail': 'info'}]}
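
The repro depends on where mixed content is stored in the parsed tree. Assuming the parser hands `_root_to_dict` an `ElementTree`-style element (as the `root.text` and `root.tag` attributes suggest), the standard library alone shows the layout. The trailing text in this input is added for illustration and is not part of the original repro.

```python
import xml.etree.ElementTree as ET

# Same shape as the repro, plus trailing text after the child (illustrative)
root = ET.fromstring("<result>Summary<detail>info</detail> trailing</result>")

leading_text = root.text                    # text before the first child
child_tags = [child.tag for child in root]  # child elements, in document order
child_text = root[0].text                   # text inside <detail>
tail_text = root[0].tail                    # text after </detail>, stored on the child

print(leading_text, child_tags, child_text, repr(tail_text))
```

Text before the first child lives on the parent's `text`, while text after a child lives on that child's `tail`, so a check on `root.text` alone says nothing about whether children are present.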

System Info

System Information
------------------
> OS:  Windows
> OS Version:  10.0.26200
> Python Version:  3.13.7

Package Information
-------------------
> langchain_core: 1.3.0a1
> langsmith: 0.7.13

extent analysis

TL;DR

The issue can be fixed by modifying the `_root_to_dict` function in `langchain_core/output_parsers/xml.py` to handle elements with both text and children.

Guidance

  • The current implementation of `_root_to_dict` returns early when `root.text` is present, discarding child elements. This logic should be adjusted to only return early when there are no children.
  • To verify the fix, run the provided reproduction steps and check that the output includes both the text and child elements.
  • The modified function should check for the presence of children before deciding whether to return the text or a dictionary with text and children.
  • Consider adding a test case to cover this scenario and prevent similar issues in the future.

Example

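import xml.etree.ElementTree as ET


def _root_to_dict(root: ET.Element) -> dict:
    # Ignore whitespace-only text such as indentation between tags
    text = root.text if root.text and root.text.strip() else None
    if len(root) == 0:
        # Leaf element: return its text, or an empty dict when there is none
        return {root.tag: text if text is not None else {}}
    # Element with children: keep the text under '_text' alongside them
    result = {}
    if text is not None:
        result['_text'] = text
    for child in root:
        # Group children by tag; each entry is the child's own {tag: value} dict
        result.setdefault(child.tag, []).append(_root_to_dict(child))
    return {root.tag: result}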

Notes

The provided example code snippet assumes that the desired output format is a dictionary with the text as a value under the _text key and the child elements as values under their respective tags.
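
For comparison, the issue's expected output uses a list of single-key dictionaries rather than one dictionary keyed by tag. A minimal sketch of that shape follows. `mixed_root_to_dict` is a hypothetical helper written for illustration, not the merged fix, and it assumes that whitespace-only text should be ignored and that tail text after a child can be left out.

```python
import xml.etree.ElementTree as ET


def mixed_root_to_dict(root: ET.Element) -> dict:
    """Hypothetical helper producing the list shape from the issue's expected output."""
    if len(root) == 0:
        # Leaf element: plain {tag: text}
        return {root.tag: root.text}
    items = []
    if root.text and root.text.strip():
        # Leading text becomes its own '_text' entry ahead of the children
        items.append({"_text": root.text.strip()})
    for child in root:
        items.append(mixed_root_to_dict(child))
    return {root.tag: items}


mixed = mixed_root_to_dict(
    ET.fromstring("<result>Summary<detail>info</detail></result>")
)
children_only = mixed_root_to_dict(
    ET.fromstring("<result><detail>info</detail></result>")
)
text_only = mixed_root_to_dict(ET.fromstring("<result>Summary</result>"))

# The three cases a regression test would likely cover
assert mixed == {"result": [{"_text": "Summary"}, {"detail": "info"}]}
assert children_only == {"result": [{"detail": "info"}]}
assert text_only == {"result": "Summary"}
```

The three assertions cover mixed content, children only, and text only, which are the cases a unit test for this bug would need. Which output shape the merged PR adopts is not stated on this page.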

Recommendation

Apply workaround: The issue can be resolved by modifying the _root_to_dict function as described above, allowing elements with both text and children to be parsed correctly.
