llamaIndex - ✅(Solved) Fix [Bug]: documentation override for classes inheriting from MetadataAwareTextSplitter [1 pull requests, 6 comments, 3 participants]

GitHub stats
run-llama/llama_index#20706 (fetched 2026-04-08 00:31:20)
Comments: 6
Participants: 3
Timeline: 17
Reactions: 0
Assignees: none listed
Timeline (top): commented ×6, referenced ×3, cross-referenced ×2, labeled ×2

Fix Action

Fix / Workaround

I noticed that the documentation for the __init__ method is being overridden for classes inheriting from MetadataAwareTextSplitter. I think this can be attributed to DispatcherSpanMixin (not confirmed, but I wanted to share my working theory).

I ran into this issue and patched it in #20622 by manually setting the docs outside the class definition.

PR fix notes

PR #20622: feat: add chonkie integration

Description (problem / solution / changelog)

Description

This adds a Chonkie integration to the list of LlamaIndex integrations. How to use:

from llama_index.ingestion.chonkie import Chunker

text = "Some long document text to split."  # sample input, not part of the original snippet
chunker = Chunker(chunker_type="semantic")
out_list = chunker.split_text(text)

Or you can use it in an ingestion pipeline (following the example from the llama-index docs):

from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline
from llama_index.ingestion.chonkie import Chunker

pipeline = IngestionPipeline(
    transformations=[
        Chunker(chunker_type="recursive", chunk_size=512),
    ]
)
nodes = pipeline.run(documents=[Document.example()])

It also works when building an index (with reference to the llama-index docs):

from llama_index.core import VectorStoreIndex, Document, Settings
from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter
from llama_index.ingestion.chonkie import Chunker

documents = [
    Document(text="text", metadata={"author": "LlamaIndex"}),
    Document(text="text", metadata={"author": "John Doe"}),
]
# use Chunker for chunking
chunker = Chunker(chunker_type="recursive", chunk_size=512)

index = VectorStoreIndex.from_documents(documents, transformations=[chunker])

Fixes # (issue) None

follow ups

  • I went back and forth on whether to put this under ingestion or node-parser, and I settled on ingestion because I might add other pipeline-related features from our library in the future. Let me know if I should switch this to node parser or keep it here.
  • I have tried as much as possible to follow the same schema as the other chunkers in the current library, so in theory it should integrate seamlessly with the core functionality of LlamaIndex.
  • I would appreciate any input on where the docs should go.

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods

Changed files

  • docs/src/content/docs/framework/module_guides/loading/node_parsers/modules.md (modified, +23/-0)
  • llama-index-integrations/node_parser/llama-index-node-parser-chonkie/.gitignore (added, +207/-0)
  • llama-index-integrations/node_parser/llama-index-node-parser-chonkie/README.md (added, +73/-0)
  • llama-index-integrations/node_parser/llama-index-node-parser-chonkie/llama_index/node_parser/chonkie/__init__.py (added, +7/-0)
  • llama-index-integrations/node_parser/llama-index-node-parser-chonkie/llama_index/node_parser/chonkie/chunkers.py (added, +160/-0)
  • llama-index-integrations/node_parser/llama-index-node-parser-chonkie/pyproject.toml (added, +72/-0)
  • llama-index-integrations/node_parser/llama-index-node-parser-chonkie/tests/__init__.py (added, +1/-0)
  • llama-index-integrations/node_parser/llama-index-node-parser-chonkie/tests/test_chunkers.py (added, +406/-0)

Code Example

check comment  and colab notebook attached above

Bug Description

following up on https://github.com/run-llama/llama_index/pull/20622#discussion_r2764697454

I noticed that the documentation for the __init__ method is being overridden for classes inheriting from MetadataAwareTextSplitter. I think this can be attributed to DispatcherSpanMixin (not confirmed, but I wanted to share my working theory).

I ran into this issue and patched it in #20622 by manually setting the docs outside the class definition.

TL;DR: the output of help(cls.__init__) is overridden for these subclasses.
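
The symptom can be checked without reading help() output by hand, using inspect.getdoc. The sketch below does not import LlamaIndex: DocClobberingBase, MyChunker, and init_doc are hypothetical names, and the base class only simulates one way a docstring can get lost (a subclass's __init__ is replaced by a wrapper that does not copy metadata). Whether DispatcherSpanMixin behaves this way is not confirmed in this issue.

```python
import inspect


class DocClobberingBase:
    """Hypothetical stand-in for a base class whose machinery replaces __init__."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        original_init = cls.__dict__.get("__init__")
        if original_init is None:
            return  # subclass did not define its own __init__

        def wrapped_init(self, *args, **kw):
            """Generic wrapper docstring."""
            original_init(self, *args, **kw)

        # No functools.wraps here, so the subclass's own docstring is hidden.
        cls.__init__ = wrapped_init


class MyChunker(DocClobberingBase):
    def __init__(self, chunk_size=512):
        """Create a chunker with the given chunk_size."""
        self.chunk_size = chunk_size


def init_doc(cls):
    """Return the docstring that help(cls.__init__) would display."""
    return inspect.getdoc(cls.__init__)


print(init_doc(MyChunker))  # prints: Generic wrapper docstring.
```

Running the same init_doc check against a real subclass of MetadataAwareTextSplitter would show whether its __init__ docstring is the one written in the class body.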

Version

editable

Steps to Reproduce

For the sake of debugging, I created a separate branch and a minimal Colab notebook to help with this: https://colab.research.google.com/drive/1fIKG6hK09ykuqUY5bhGWK2d90TgsVb3S?usp=sharing

Relevant Logs/Tracebacks

check comment  and colab notebook attached above

extent analysis

Fix Plan

Override Documentation for Inherited Classes

To fix the issue where the documentation for the __init__ method is being overridden for classes inheriting from MetadataAwareTextSplitter, we need to manually set the documentation outside the class definition.

Steps:

  1. Check whether the class inherits from MetadataAwareTextSplitter:

if issubclass(class_to_check, MetadataAwareTextSplitter):
    ...  # proceed with the fix

  2. Manually set the documentation for the __init__ method:

class_to_check.__init__.__doc__ = "Custom documentation for the __init__ method"

  3. Apply the fix right after the class definition:

class CustomClass(MetadataAwareTextSplitter):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # custom initialization code

CustomClass.__init__.__doc__ = "Custom documentation for the __init__ method"

Example Use Case:

from llama_index.core.node_parser.interface import MetadataAwareTextSplitter

class CustomClass(MetadataAwareTextSplitter):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # custom initialization code

CustomClass.__init__.__doc__ = "Custom documentation for the __init__ method"

print(CustomClass.__init__.__doc__)  # Output: Custom documentation for the __init__ method
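
To avoid repeating the assignment by hand for every class, the same workaround can be wrapped in a small class decorator. This is a minimal sketch, not code from PR #20622: set_init_doc, Base, and CustomSplitter are hypothetical names, and Base only stands in for MetadataAwareTextSplitter so the snippet runs without LlamaIndex installed.

```python
import inspect


def set_init_doc(doc):
    """Class decorator that assigns `doc` to the class's own __init__."""

    def decorator(cls):
        # Only touch an __init__ defined on this class. Patching an inherited
        # one would change the docstring for the parent and all its subclasses.
        if "__init__" not in vars(cls):
            raise TypeError(f"{cls.__name__} does not define its own __init__")
        cls.__init__.__doc__ = doc
        return cls

    return decorator


class Base:  # hypothetical stand-in for MetadataAwareTextSplitter
    pass


@set_init_doc("Create a splitter. Args: chunk_size (int): maximum chunk size.")
class CustomSplitter(Base):
    def __init__(self, chunk_size=512):
        self.chunk_size = chunk_size


print(inspect.getdoc(CustomSplitter.__init__))
```

The decorator runs after the class object has been created, so it has the same effect as the manual assignment outside the class body. The explicit check guards against silently rewriting a parent's docstring when the subclass has no __init__ of its own.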
