LlamaIndex - ✅ (Solved) Fix [Bug]: @retry_decorator on async generator _conversion_stream_with_retry is silently inert [1 pull request, 1 comment, 1 participant]

GitHub stats
run-llama/llama_index#21346 · Fetched 2026-04-09 07:51:04

Error Message

Retries never happen for streaming calls, even when the error is retryable (e.g., ThrottlingException, ServiceUnavailableException). In the reproduction, the ValueError raised mid-stream reaches the caller's except block after a single attempt. No traceback is produced: the bug is silent.

Fix Action

Fixed

PR fix notes

PR #21351: fix(integrations): apply Tenacity only to initial converse_stream call

Description (problem / solution / changelog)

Description

The fix moves the retry logic into a separate async function that only wraps the initial converse_stream() call, ensuring Tenacity can properly retry connection/setup failures. The outer async generator then simply yields events from the established stream without being decorated, avoiding ineffective retries during iteration.

Fixes #21346

Changed files

  • llama-index-integrations/llms/llama-index-llms-bedrock-converse/llama_index/llms/bedrock_converse/utils.py (modified, +19/-12)
  • llama-index-integrations/llms/llama-index-llms-bedrock-converse/pyproject.toml (modified, +1/-1)
  • uv.lock (modified, +6/-22)


Bug Description

In llama_index/llms/bedrock_converse/utils.py, the function _conversion_stream_with_retry is decorated with tenacity's @retry_decorator, but since the function contains yield, it is an async generator function. Tenacity's retry cannot intercept exceptions that occur during iteration of the returned async generator — it only wraps the initial function call, which for a generator merely creates the generator object without executing any code.

This means retries never happen for streaming calls, even when the error is retryable (e.g., ThrottlingException, ServiceUnavailableException).

# utils.py lines 875-884
@retry_decorator
async def _conversion_stream_with_retry(**kwargs: Any) -> Any:
    async with session.client(
        "bedrock-runtime",
        config=config,
        **_boto_client_kwargs,
    ) as client:
        response = await client.converse_stream(**kwargs)
        async for event in response["stream"]:
            yield event  # <-- yield makes this an async generator; retry never fires

The comment at line 861 acknowledges something is off ("Returning the generator directly from converse_stream doesn't work... This is a bit of a hack"), but the current approach still doesn't achieve retry.

Contrast with the sync version in the same file — _converse_with_retry (line 767) is a regular function that returns the generator object directly from client.converse_stream(). Tenacity wraps the call to client.converse_stream(), so retry works correctly for connection-time errors. But the async version tries to iterate inside the generator, making tenacity inert.
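The distinction described above can be checked with the standard library alone. This is a minimal sketch, assuming (as far as I understand Tenacity's dispatch, which the issue does not spell out) that the decorator picks its async retry path only when the wrapped function is a coroutine function. The names fetch_stream and iterate_stream are hypothetical.

```python
import inspect


async def fetch_stream():
    """Coroutine function: the body runs when the call is awaited."""
    return [1, 2, 3]


async def iterate_stream():
    """Async generator function: the body runs only during iteration."""
    yield 1


# A decorator that asks "is this a coroutine function?" sees the two
# definitions differently.
print(inspect.iscoroutinefunction(fetch_stream))    # True
print(inspect.iscoroutinefunction(iterate_stream))  # False
print(inspect.isasyncgenfunction(iterate_stream))   # True

# Calling the async generator function runs none of its body. It only
# builds the generator object, so a wrapper around the call has no
# exception to catch.
gen = iterate_stream()
print(inspect.isasyncgen(gen))  # True
```

Any exception raised inside iterate_stream would surface later, in the caller's async for loop, well after the decorated call has returned.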

Version

llama-index-llms-bedrock-converse==0.14.5

Steps to Reproduce

import asyncio
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(min=1, max=4),
    reraise=True,
)
async def gen_with_retry():
    print("gen_with_retry called")  # only prints once
    yield 1
    raise ValueError("mid-stream error — should retry but won't")

async def main():
    gen = gen_with_retry()
    print(f"type: {type(gen)}")  # <class 'async_generator'>
    try:
        async for item in gen:
            print(f"got: {item}")
    except ValueError as e:
        print(f"caught (no retry happened): {e}")

asyncio.run(main())

Output:

type: <class 'async_generator'>
gen_with_retry called
got: 1
caught (no retry happened): mid-stream error — should retry but won't

Relevant Logs/Tracebacks

No traceback — the bug is silent. Retries simply never trigger for streaming calls.

Suggested Fix

Since retrying a partially-consumed stream is not meaningful (you can't replay already-yielded chunks), the retry should only apply to the initial converse_stream() call. One approach:

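async def _conversion_stream_with_retry(**kwargs: Any) -> Any:
    # Keep the client open for the whole stream; leaving the `async with`
    # block before iteration would close it underneath the response.
    async with session.client(
        "bedrock-runtime", config=config, **_boto_client_kwargs
    ) as client:

        @retry_decorator
        async def _connect(**kw: Any) -> Any:
            # A plain coroutine, so the retry wraps the awaited call itself.
            return await client.converse_stream(**kw)

        response = await _connect(**kwargs)
        async for event in response["stream"]:
            yield event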
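The idea of retrying only the connection step can be tried in isolation, without AWS access. The sketch below is not the library's code: simple_retry is a hypothetical stand-in for Tenacity's decorator, and FakeClient is a hypothetical client whose first two converse_stream calls fail.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable


def simple_retry(attempts: int) -> Callable:
    """Hypothetical stand-in for Tenacity's retry, kept minimal."""
    def decorator(fn: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(1, attempts + 1):
                try:
                    return await fn(*args, **kwargs)
                except ConnectionError:
                    if attempt == attempts:
                        raise
        return wrapper
    return decorator


class FakeClient:
    """Hypothetical client whose first two converse_stream calls fail."""
    def __init__(self) -> None:
        self.calls = 0

    async def converse_stream(self, **kwargs: Any) -> dict:
        self.calls += 1
        if self.calls < 3:
            raise ConnectionError("throttled")

        async def events() -> AsyncIterator[str]:
            for chunk in ("a", "b", "c"):
                yield chunk

        return {"stream": events()}


async def stream_with_retry(client: FakeClient, **kwargs: Any) -> AsyncIterator[str]:
    @simple_retry(attempts=3)
    async def connect() -> dict:
        # Only this awaited call is retried.
        return await client.converse_stream(**kwargs)

    response = await connect()
    # Iteration sits outside the retry, so yielded chunks are never replayed.
    async for event in response["stream"]:
        yield event


async def collect() -> tuple:
    client = FakeClient()
    chunks = [chunk async for chunk in stream_with_retry(client)]
    return chunks, client.calls


chunks, calls = asyncio.run(collect())
print(chunks, calls)  # ['a', 'b', 'c'] 3
```

Because connect is a plain coroutine, the retry wrapper actually awaits the failing call and can catch its exception, which is the property the decorated async generator lacked.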

Extent Analysis

TL;DR

  • The fix restructures _conversion_stream_with_retry so that tenacity's retry wraps only the initial converse_stream() call, not the iteration that follows.

Guidance

  • Identify the initial call to converse_stream() within the _conversion_stream_with_retry function as the point where retries should be applied.
  • Create a nested function _connect that wraps the converse_stream() call and applies the @retry_decorator to it.
  • Ensure the retry logic only applies to the initial connection attempt, not to the iteration of the async generator.
  • Verify the fix by testing the _conversion_stream_with_retry function with a scenario that triggers a retryable exception during the initial converse_stream() call.
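The verification step in the guidance above could look roughly like the following. It is a sketch under stated assumptions: FlakyClient is a hypothetical test double, and stream_under_test is a stand-in for the fixed function that uses a hand-written retry loop in place of the real decorator.

```python
import asyncio
from typing import Any, AsyncIterator, List, Optional, Tuple


class FlakyClient:
    """Hypothetical test double for the Bedrock client."""

    def __init__(self, connect_failures: int, fail_mid_stream: bool) -> None:
        self.connect_failures = connect_failures
        self.fail_mid_stream = fail_mid_stream
        self.connect_calls = 0

    async def converse_stream(self, **kwargs: Any) -> dict:
        self.connect_calls += 1
        if self.connect_calls <= self.connect_failures:
            raise ConnectionError("simulated throttling")
        return {"stream": self._events()}

    async def _events(self) -> AsyncIterator[str]:
        yield "first"
        if self.fail_mid_stream:
            raise RuntimeError("simulated mid-stream failure")
        yield "second"


async def stream_under_test(client: FlakyClient, attempts: int = 3) -> AsyncIterator[str]:
    # Stand-in for the fixed function: retry the initial call only.
    for attempt in range(1, attempts + 1):
        try:
            response = await client.converse_stream()
            break
        except ConnectionError:
            if attempt == attempts:
                raise
    async for event in response["stream"]:
        yield event


async def run_case(client: FlakyClient) -> Tuple[List[str], Optional[str]]:
    received: List[str] = []
    try:
        async for event in stream_under_test(client):
            received.append(event)
    except RuntimeError as exc:
        return received, str(exc)
    return received, None


# Case 1: two connection failures are retried, then the stream completes.
flaky = FlakyClient(connect_failures=2, fail_mid_stream=False)
assert asyncio.run(run_case(flaky)) == (["first", "second"], None)
assert flaky.connect_calls == 3

# Case 2: a mid-stream failure propagates without reconnecting.
broken = FlakyClient(connect_failures=0, fail_mid_stream=True)
assert asyncio.run(run_case(broken)) == (["first"], "simulated mid-stream failure")
assert broken.connect_calls == 1
```

Counting connect_calls is what separates the two behaviours: three calls show the connection retry firing, and a single call in the second case shows that iteration errors are left alone.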

Notes

  • The suggested fix in the issue already outlines the approach.
  • The key insight is that a partially consumed stream cannot be replayed, so retries belong on the initial connection attempt and not on iteration of the async generator.

Recommendation

  • Modify _conversion_stream_with_retry as described so that retries trigger for connection-time errors in streaming calls; errors raised mid-stream still propagate to the caller.
