langchain - ✅(Solved) Fix ExperimentalMarkdownSyntaxTextSplitter silently discards content from unclosed code blocks [5 pull requests, 8 comments, 7 participants]

GitHub stats
langchain-ai/langchain#36186 (fetched 2026-04-08 01:21:39)
Comments: 8
Participants: 7
Timeline events: 24
Reactions: 0
Timeline (top): commented ×8, cross-referenced ×4, mentioned ×3, referenced ×3

In ExperimentalMarkdownSyntaxTextSplitter._resolve_code_chunk(), when a markdown code block is never closed (missing closing fence like ``` or ~~~), the method returns an empty string "", silently discarding all accumulated content from the unclosed block.

Expected behavior: The splitter should preserve the content from unclosed code blocks rather than silently dropping it. Malformed markdown is common in real-world data, and silent data loss makes debugging very difficult.

Root cause: In langchain_text_splitters/markdown.py, the _resolve_code_chunk method accumulates content in chunk but returns "" if the closing fence is never found. The fix is to return the accumulated chunk instead.

Error Message

Error Message and Stack Trace (if applicable)

No error is raised. The content from the unclosed code block (a ``` block without a closing ```) is silently discarded. The chunks output contains only "Some text before code."; all content inside and after the unclosed code block is lost.


Fix Action

Fix / Workaround

  • In libs/text-splitters/langchain_text_splitters/markdown.py, change the final return "" in _resolve_code_chunk() to return chunk, so content accumulated from an unclosed block is kept.
  • Closed code blocks are unaffected: the method still returns as soon as it finds a closing fence.
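
Until a release containing the fix is installed, one possible workaround is to close any dangling fence before the text reaches the splitter. The sketch below is an assumption-laden illustration, not part of LangChain: close_unclosed_fences and its fence pattern are hypothetical, and the splitter's own fence matching may differ in detail.

```python
import re

# Assumed fence pattern: up to 3 spaces of indent, then 3+ backticks or tildes.
_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def close_unclosed_fences(markdown: str) -> str:
    """Append a closing fence if the text ends inside a fenced code block."""
    open_fence = None  # fence string of the block currently open, if any
    for line in markdown.splitlines():
        match = _FENCE.match(line)
        if not match:
            continue
        fence = match.group(1)
        if open_fence is None:
            # First fence seen while outside a block: it opens one.
            open_fence = fence
        elif (
            fence[0] == open_fence[0]
            and len(fence) >= len(open_fence)
            and line.strip() == fence
        ):
            # Same fence character, at least as long, nothing else on the line.
            open_fence = None
    if open_fence is None:
        return markdown
    separator = "" if markdown.endswith("\n") else "\n"
    return f"{markdown}{separator}{open_fence}\n"
```

Calling splitter.split_text(close_unclosed_fences(text)) would then hand the splitter a terminated block. Note that any text following the unclosed fence ends up inside the code block, which is also how an unterminated fence is normally interpreted.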

PR fix notes

PR #36185: fix(text-splitters): preserve content from unclosed markdown code blocks

Description (problem / solution / changelog)

In ExperimentalMarkdownSyntaxTextSplitter._resolve_code_chunk(), when a markdown code block is never closed (missing closing fence), the method returned an empty string "", silently discarding all accumulated content from the unclosed block. This fix returns the accumulated chunk content instead, ensuring no content is silently lost when processing malformed markdown.

Fixes #36186

Changed files

  • libs/text-splitters/langchain_text_splitters/markdown.py (modified, +3/-1)

PR #36266: fix(text-splitters): preserve content of unclosed markdown code blocks

Description (problem / solution / changelog)

Summary

  • _resolve_code_chunk in ExperimentalMarkdownSyntaxTextSplitter returned "" when no closing fence (``` or ~~~) was found, silently discarding all content inside the unclosed block.
  • Fix: return chunk instead of "" so accumulated content is preserved.

Test

Added test_experimental_markdown_syntax_text_splitter_unclosed_code_block to verify unclosed code blocks retain their content.

Fixes #36186

Changed files

  • libs/text-splitters/langchain_text_splitters/markdown.py (modified, +1/-1)
  • libs/text-splitters/tests/unit_tests/test_text_splitters.py (modified, +19/-0)

PR #36330: fix(text-splitters): preserve content from unclosed code blocks in ExperimentalMarkdownSyntaxTextSplitter

Description (problem / solution / changelog)

Fixes #36186.

Problem

ExperimentalMarkdownSyntaxTextSplitter._resolve_code_chunk() scans forward for a closing fence. If the input ends without one, it returns "", silently discarding all content accumulated since the opening fence:

def _resolve_code_chunk(self, current_line: str, raw_lines: list[str]) -> str:
    chunk = current_line
    while raw_lines:
        raw_line = raw_lines.pop(0)
        chunk += raw_line
        if self._match_code(raw_line):
            return chunk
    return ""   # ← all content lost here

Any Markdown document that ends inside a code block (unterminated fence, common in READMEs and LLM-generated text) will have that entire section dropped from the split output.

Fix

Return the accumulated chunk instead of "". If the opening fence was the last line, chunk is empty anyway, so there is no behaviour change for the empty case. For non-empty unclosed blocks, content is now preserved.
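
The difference between the two return values can be seen in isolation with a standalone copy of the loop. This is only a sketch: FENCE is an assumed stand-in for the splitter's private _match_code(), and the keep_unclosed flag is a hypothetical switch added here to compare the old and new behaviour side by side.

```python
import re

# Assumed stand-in for the splitter's private _match_code(); the real pattern
# in langchain_text_splitters/markdown.py may differ.
FENCE = re.compile(r"^(`{3}|~{3})")


def resolve_code_chunk(
    current_line: str, raw_lines: list[str], *, keep_unclosed: bool
) -> str:
    """Accumulate lines until a closing fence, mirroring the loop quoted above."""
    chunk = current_line
    while raw_lines:
        raw_line = raw_lines.pop(0)
        chunk += raw_line
        if FENCE.match(raw_line):
            return chunk
    # Loop exhausted without a closing fence.
    return chunk if keep_unclosed else ""


opening = "`" * 3 + "python\n"  # opening fence line
body = ["def hello():\n", '    print("world")\n']

before = resolve_code_chunk(opening, list(body), keep_unclosed=False)
after = resolve_code_chunk(opening, list(body), keep_unclosed=True)
```

Here before is the empty string, while after holds the opening fence followed by both body lines.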

Testing

Added test_experimental_markdown_syntax_text_splitter_unclosed_code_block to tests/unit_tests/test_text_splitters.py. It splits a Markdown document whose code block has no closing fence and asserts the code body appears in the output.

Changed files

  • libs/text-splitters/langchain_text_splitters/markdown.py (modified, +1/-1)
  • libs/text-splitters/tests/unit_tests/test_text_splitters.py (modified, +16/-0)

PR #36501: fix(text-splitters): preserve content from unclosed markdown code blocks

Description (problem / solution / changelog)

Summary

  • Fixes a silent data loss bug in ExperimentalMarkdownSyntaxTextSplitter
  • When a markdown code block has no closing fence (```), _resolve_code_chunk() was returning "", discarding all accumulated content
  • Fix: return chunk instead of "" when the loop exhausts without finding a closing fence

Root cause (libs/text-splitters/langchain_text_splitters/markdown.py line 449):

# Before (buggy)
return ""

# After (fixed)
return chunk

Malformed/unclosed code blocks are common in real-world data (e.g., AI-generated markdown), making silent content loss difficult to debug.

Test plan

  • Added test_experimental_markdown_syntax_text_splitter_unclosed_code_block — verifies content from an unclosed code block is preserved, not silently dropped
  • Existing ExperimentalMarkdownSyntaxTextSplitter tests cover the happy path (closed blocks still work correctly)

Notes

This PR was developed with AI assistance (Claude Code). The fix and tests were reviewed and validated manually.

Fixes #36186

Changed files

  • libs/text-splitters/langchain_text_splitters/markdown.py (modified, +1/-1)
  • libs/text-splitters/tests/unit_tests/test_text_splitters.py (modified, +33/-0)

PR #36664: fix(text-splitters): preserve content from unclosed markdown code blocks

Description (problem / solution / changelog)

Summary

  • _resolve_code_chunk() returned empty string when a code block was never closed, silently discarding all content within it
  • Changed return "" to return chunk so nothing is lost

Fixes #36186

Test plan

  • Added test_experimental_markdown_syntax_text_splitter_unclosed_code_block
  • Existing markdown splitter tests still pass

AI assistance (Claude Code) was used. All changes reviewed and validated by the submitting human.

🤖 Generated with Claude Code

Changed files

  • libs/text-splitters/langchain_text_splitters/markdown.py (modified, +1/-1)
  • libs/text-splitters/tests/unit_tests/test_text_splitters.py (modified, +13/-0)

Code Example

from langchain_text_splitters import ExperimentalMarkdownSyntaxTextSplitter

splitter = ExperimentalMarkdownSyntaxTextSplitter()

# Markdown with an unclosed code block
text = """# Header

Some text before code.

```python
def hello():
    print("world")

More text after the unclosed code block.
"""

chunks = splitter.split_text(text)
for chunk in chunks:
    print(repr(chunk.page_content))
    print(chunk.metadata)
    print("---")


Checked other resources

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package (Required)

  • langchain-text-splitters

Related Issues / PRs

  • #36185


System Info

langchain-text-splitters: 0.3.8 langchain-core: 0.3.61 Python: 3.12 OS: Linux

extent analysis

Fix Plan

To fix the issue of silently discarding content from unclosed code blocks in ExperimentalMarkdownSyntaxTextSplitter, we need to modify the _resolve_code_chunk method. Here are the steps:

  • Open libs/text-splitters/langchain_text_splitters/markdown.py.
  • Locate the _resolve_code_chunk method.
  • Modify the method to return the accumulated chunk instead of an empty string when the closing fence is not found.

Example code:

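def _resolve_code_chunk(self, current_line: str, raw_lines: list[str]) -> str:
    chunk = current_line
    while raw_lines:
        raw_line = raw_lines.pop(0)
        chunk += raw_line
        if self._match_code(raw_line):
            return chunk  # closing fence found
    # No closing fence: return the accumulated chunk instead of ""
    return chunk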

Alternatively, you can raise an exception or log a warning to indicate that the markdown is malformed.
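
The logging alternative could look roughly like the following standalone sketch. It is not taken from any of the PRs: the function name, the _FENCE pattern, and the log message are all assumptions made for illustration, and the loop is a copy of the one in _resolve_code_chunk rather than the library code itself.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Assumed fence pattern standing in for the splitter's _match_code().
_FENCE = re.compile(r"^(`{3}|~{3})")


def resolve_code_chunk_with_warning(current_line: str, raw_lines: list[str]) -> str:
    """Return the code chunk; warn, but keep the content, if it never closes."""
    chunk = current_line
    while raw_lines:
        raw_line = raw_lines.pop(0)
        chunk += raw_line
        if _FENCE.match(raw_line):
            return chunk
    # Malformed input: surface it in the logs instead of failing silently.
    logger.warning(
        "Unclosed code fence; keeping %d characters of content", len(chunk)
    )
    return chunk
```

Logging keeps the pipeline running on malformed input, whereas raising would force callers to handle the error; which is preferable depends on how strictly the input is expected to be valid Markdown.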

Verification

To verify that the fix worked, run the provided example code:

from langchain_text_splitters import ExperimentalMarkdownSyntaxTextSplitter

splitter = ExperimentalMarkdownSyntaxTextSplitter()

text = """# Header

Some text before code.

```python
def hello():
    print("world")

More text after the unclosed code block.
"""

chunks = splitter.split_text(text)
for chunk in chunks:
    print(repr(chunk.page_content))
    print(chunk.metadata)
    print("---")

The output should now include the content from the unclosed code block.
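
Rather than inspecting the printed output by eye, the check can be automated with a small helper that reports source lines missing from every chunk. This is a sketch under stated assumptions: missing_lines is a hypothetical helper, not a LangChain API, and it compares plain strings only.

```python
def missing_lines(source: str, page_contents: list[str]) -> list[str]:
    """Return the non-blank source lines that appear in none of the chunks."""
    combined = "\n".join(page_contents)
    return [
        line
        for line in source.splitlines()
        if line.strip() and line.strip() not in combined
    ]
```

With the example above, missing_lines(text, [chunk.page_content for chunk in chunks]) should no longer list the code lines once the fix is in place. Heading lines may still be reported, since the splitter can move them into chunk.metadata rather than page_content.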

Extra Tips

  • Consider adding a test case to ensure that the fix works correctly.
  • If you're using a version control system, commit the changes with a descriptive message, such as "Fix: Preserve content from unclosed code blocks in ExperimentalMarkdownSyntaxTextSplitter".
