llamaIndex - ✅(Solved) Fix [Bug]: tier="agentic" produces inconsistent/unpredictable parse latency [1 pull request, 1 comment, 2 participants]

GitHub stats
run-llama/llama_index#20845 (fetched 2026-04-08 00:30:42)
Comments: 1 · Participants: 2 · Timeline: 6 · Reactions: 0
Timeline (top): labeled ×2, closed ×1, commented ×1, cross-referenced ×1

Root Cause

The problem specifically affects production pipelines that parse multiple documents in parallel (using asyncio): the inconsistent timing of tier="agentic" makes SLA guarantees and timeout tuning impossible, and the parse times are not acceptable to users.

Fix Action

Fixed

PR fix notes

PR #20947: docs(llama-parse): add timeout guidance for agentic tier in production

Description (problem / solution / changelog)

Summary

When using tier="agentic", parse latency can be variable. Document asyncio.wait_for workaround for production pipelines.

Fix

Add README section showing how to wrap async parse calls with asyncio.wait_for for SLA/timeout tuning.

Fixes #20845
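A minimal, runnable sketch of the asyncio.wait_for workaround this PR documents (not the README text itself). It assumes the parse step is an awaitable coroutine; fake_parse is a hypothetical stand-in for a real call (for LlamaParse, something like parser.aload_data(path)), and with_timeout is an illustrative helper name, not part of llama-parse.

```python
import asyncio
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")


async def with_timeout(awaitable: Awaitable[T], timeout_s: float) -> Optional[T]:
    """Await `awaitable`, returning None if it does not finish within timeout_s seconds."""
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the inner awaitable at this point
        return None


async def fake_parse(doc: str, delay_s: float) -> str:
    # Hypothetical stand-in for a real async parse call; sleeps to simulate latency
    await asyncio.sleep(delay_s)
    return f"parsed {doc}"


async def demo() -> list:
    # Two documents parsed concurrently: one finishes in time, one exceeds its budget
    return await asyncio.gather(
        with_timeout(fake_parse("fast.pdf", 0.01), timeout_s=0.5),
        with_timeout(fake_parse("slow.pdf", 2.0), timeout_s=0.1),
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['parsed fast.pdf', None]
```

Returning None on timeout (rather than re-raising) lets a batch keep its other results; a pipeline that must distinguish "timed out" from "empty result" may prefer to re-raise or return a sentinel.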

Changed files

  • llama-index-integrations/readers/llama-index-readers-llama-parse/README.md (modified, +24/-0)

Bug Description

When using tier="agentic" to parse PDF documents, execution time is both slow and non-deterministic: the same document parsed multiple times returns wildly different durations with no clear pattern. After switching to parse_mode="parse_page_with_llm", parsing became consistently fast and predictable, albeit with a tradeoff.

The problem specifically affects production pipelines that parse multiple documents in parallel (using asyncio): the inconsistent timing of tier="agentic" makes SLA guarantees and timeout tuning impossible, and the parse times are not acceptable to users.

Version

llama-parse==0.6.92

Steps to Reproduce

  1. Initialize LlamaParse with tier="agentic" and parse the same multi-page PDF several times:

from llama_parse import LlamaParse

parser = LlamaParse(
    api_key="<your_key>",
    result_type="markdown",
    num_workers=8,
    verbose=True,
    tier="agentic",  # <-- the problematic config
    version="latest",
)

  2. Replace tier="agentic" with parse_mode="parse_page_with_llm" and repeat:

parser = LlamaParse(
    api_key="<your_key>",
    result_type="markdown",
    num_workers=8,
    verbose=True,
    parse_mode="parse_page_with_llm",  # <-- replacement
    version="latest",
)
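To quantify "wildly different durations" rather than eyeballing them, the repeated runs in the steps above can be timed and summarized. A sketch, assuming an async parse call: fake_parse is a hypothetical stand-in, and for LlamaParse you might pass lambda: parser.aload_data("sample.pdf") (file name assumed). time_runs and summarize are illustrative names.

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable, Dict, List


async def time_runs(make_call: Callable[[], Awaitable[object]], runs: int) -> List[float]:
    """Await make_call() `runs` times in sequence; return wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        await make_call()  # a fresh coroutine each run; a coroutine can only be awaited once
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Spread of the observed durations; a large stdev relative to the mean signals unpredictability."""
    return {
        "min": min(durations),
        "max": max(durations),
        "mean": statistics.mean(durations),
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }


async def fake_parse() -> str:
    # Hypothetical stand-in for a real parse of the same PDF
    await asyncio.sleep(0.01)
    return "ok"


if __name__ == "__main__":
    print(summarize(asyncio.run(time_runs(fake_parse, runs=5))))
```

Runs are sequential on purpose, so each duration reflects the parse itself rather than contention with other concurrent parses; running the harness once per configuration (tier="agentic" vs parse_mode="parse_page_with_llm") gives a like-for-like comparison.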

Relevant Logs/Tracebacks

Extended analysis

Fix Plan

1. Upgrade to a newer version of llama-parse

Version 0.6.92 may be outdated; upgrade to the latest release to pick up subsequent bug fixes and performance improvements.
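Before and after upgrading, it helps to confirm which llama-parse version is actually installed in the environment running the pipeline. A standard-library-only sketch; installed_version, version_tuple and is_older_than are hypothetical helper names, and the numeric comparison is deliberately simplified (packaging.version.Version is the more robust choice if the packaging library is available).

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(v: str) -> Tuple[int, ...]:
    # Keep only the leading numeric release segment, e.g. "0.6.92rc1" -> (0, 6, 92)
    match = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def is_older_than(package: str, minimum: str) -> bool:
    """True only if the package is installed and its release segment is below `minimum`."""
    current = installed_version(package)
    return current is not None and version_tuple(current) < version_tuple(minimum)


if __name__ == "__main__":
    print(installed_version("llama-parse"))  # None if llama-parse is not installed
```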

2. Switch to parse_mode="parse_page_with_llm"

Replace tier="agentic" with parse_mode="parse_page_with_llm" in your LlamaParse initialization:

from llama_parse import LlamaParse

parser = LlamaParse(
    api_key="<your_key>",
    result_type="markdown",
    num_workers=8,
    verbose=True,
    parse_mode="parse_page_with_llm",  # replaces tier="agentic"
    version="latest",
)

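3. Use asyncio with a timeout

To make SLA guarantees and timeout tuning possible, wrap each async parse call in asyncio.wait_for so that a slow document is cancelled instead of stalling the whole batch: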

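import asyncio

from llama_parse import LlamaParse

PARSE_TIMEOUT_S = 30  # per-document budget in seconds; tune to your SLA

async def parse_document(parser, document):
    try:
        # wait_for cancels the local await once the budget is exceeded;
        # the server-side parse job may keep running
        return await asyncio.wait_for(parser.aload_data(document), timeout=PARSE_TIMEOUT_S)
    except asyncio.TimeoutError:
        print(f"Timeout exceeded for {document}")
        return None

async def main():
    parser = LlamaParse(
        api_key="<your_key>",
        result_type="markdown",
        num_workers=8,
        verbose=True,
        parse_mode="parse_page_with_llm",
        version="latest",
    )
    documents = [...]  # replace with the file paths to parse
    tasks = [parse_document(parser, document) for document in documents]
    # return_exceptions=True keeps one failed document from aborting the batch
    results = await asyncio.gather(*tasks, return_exceptions=True)
    for document, result in zip(documents, results):
        if isinstance(result, Exception):
            print(f"Error parsing {document}: {result}")

asyncio.run(main())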

Verification

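  • Monitor per-document parse times across repeated runs and confirm they are consistently fast and predictable.
  • Verify that documents exceeding the timeout are reported as timeouts without blocking the rest of the batch, so the SLA limit actually holds.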
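The verification checks above can be made concrete as latency percentiles plus timeout and SLA-hit rates. A sketch, assuming you have collected per-document durations in seconds, with None marking a document that hit the asyncio.wait_for timeout; latency_report and sla_s are hypothetical names.

```python
import statistics
from typing import Dict, List, Optional


def latency_report(durations: List[Optional[float]], sla_s: float) -> Dict[str, float]:
    """Summarize parse durations; None entries count as timeouts (and as SLA misses)."""
    completed = [d for d in durations if d is not None]
    if len(completed) < 2:
        raise ValueError("need at least two completed parses to estimate percentiles")
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%: index 9 is p50, index 18 is p95.
    # method="inclusive" keeps percentiles within the observed min/max.
    cuts = statistics.quantiles(completed, n=20, method="inclusive")
    within_sla = sum(1 for d in completed if d <= sla_s)
    return {
        "p50": cuts[9],
        "p95": cuts[18],
        "timeout_rate": (len(durations) - len(completed)) / len(durations),
        "within_sla_rate": within_sla / len(durations),
    }


if __name__ == "__main__":
    print(latency_report([1.0, 2.0, 3.0, None], sla_s=2.5))
```

A p95 far above the p50 is the numeric signature of the unpredictable latency described in this issue; comparing reports for the two parser configurations shows whether switching modes actually narrowed the gap.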

