LlamaIndex - ✅(Solved) Fix [Bug]: @retry_decorator on async generator _conversion_stream_with_retry is silently inert [1 pull request, 1 comment, 1 participant]

zxzinn · 2026-04-09T02:46:43Z

# PR #21351: fix(integrations): apply Tenacity only to initial converse_stream call

- Repository: run-llama/llama_index
- Author: jasiecky
- State: open | merged: False
- Link: https://github.com/run-llama/llama_index/pull/21351

## Description (problem / solution / changelog)

# Description

The fix moves the retry logic into a separate async function that only wraps the initial converse_stream() call, ensuring Tenacity can properly retry connection/setup failures. The outer async generator then simply yields events from the established stream without being decorated, avoiding ineffective retries during iteration.

Fixes #21346

## New Package?

Did I fill in the `tool.llamahub` section in the `pyproject.toml` and provide a detailed README.md for my new integration or package?

- [ ] Yes
- [x] No

## Version Bump?

Did I bump the version in the `pyproject.toml` file of the package I am updating? (Except for the `llama-index-core` package)

- [x] Yes
- [ ] No

## Type of Change

Please delete options that are not relevant.

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update

## How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

- [ ] I added new unit tests to cover this change
- [x] I believe this change is already covered by existing unit tests

## Suggested Checklist:

- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] I have added Google Colab support for the newly added notebooks.
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
- [x] I ran `uv run make format; uv run make lint` to appease the lint gods

## Changed files

- `llama-index-integrations/llms/llama-index-llms-bedrock-converse/llama_index/llms/bedrock_converse/utils.py` (modified, +19/-12)
- `llama-index-integrations/llms/llama-index-llms-bedrock-converse/pyproject.toml` (modified, +1/-1)
- `uv.lock` (modified, +6/-22)

## Fixed

- Fixed by PR: fix(integrations): apply Tenacity only to initial converse_stream call (https://github.com/run-llama/llama_index/pull/21351)

run-llama/llama_index#21346•Fetched 2026-04-09 07:51:04

Bug Description

In llama_index/llms/bedrock_converse/utils.py, the function _conversion_stream_with_retry is decorated with tenacity's @retry_decorator, but since the function contains yield, it is an async generator function. Tenacity's retry cannot intercept exceptions that occur during iteration of the returned async generator — it only wraps the initial function call, which for a generator merely creates the generator object without executing any code.

This means retries never happen for streaming calls, even when the error is retryable (e.g., ThrottlingException, ServiceUnavailableException).
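
The mechanism can be seen without tenacity or Bedrock. The sketch below uses a hypothetical stand-in decorator, `naive_retry` (an assumption for illustration, not tenacity's implementation), that wraps the call in a try/except loop. Because calling an async generator function only builds the generator object, the wrapper returns before any of the body runs, and the later error goes straight to the caller.

```python
import asyncio
import functools
import inspect

calls = []  # records what actually ran, in order


def naive_retry(fn):
    """Hypothetical stand-in for a retry decorator: retries the *call* up to 3 times."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        for attempt in range(3):
            try:
                calls.append(f"call attempt {attempt + 1}")
                # For an async generator function this only creates the generator.
                return fn(*args, **kwargs)
            except ValueError:
                continue
        raise RuntimeError("gave up")
    return wrapper


@naive_retry
async def stream():
    calls.append("body started")
    yield 1
    raise ValueError("mid-stream error")


async def main():
    gen = stream()
    # The body has not started yet; only the wrapper's bookkeeping is recorded.
    assert inspect.isasyncgen(gen)
    assert calls == ["call attempt 1"]
    items = []
    try:
        async for item in gen:
            items.append(item)
    except ValueError:
        calls.append("error reached caller")
    return items


items = asyncio.run(main())
print(items, calls)
```

The wrapper is entered exactly once, and the `ValueError` surfaces in the consuming loop, outside the decorator's reach.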

# utils.py lines 875-884
@retry_decorator
async def _conversion_stream_with_retry(**kwargs: Any) -> Any:
    async with session.client(
        "bedrock-runtime",
        config=config,
        **_boto_client_kwargs,
    ) as client:
        response = await client.converse_stream(**kwargs)
        async for event in response["stream"]:
            yield event  # <-- yield makes this an async generator; retry never fires

The comment at line 861 acknowledges something is off ("Returning the generator directly from converse_stream doesn't work... This is a bit of a hack"), but the current approach still doesn't achieve retry.

Contrast with the sync version in the same file — _converse_with_retry (line 767) is a regular function that returns the generator object directly from client.converse_stream(). Tenacity wraps the call to client.converse_stream(), so retry works correctly for connection-time errors. But the async version tries to iterate inside the generator, making tenacity inert.
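
For comparison, here is a minimal sketch of the sync pattern described above. All names (`Throttled`, `retry_call`, `fake_converse_stream`) are hypothetical stand-ins rather than the library's code. The decorated function is a regular function, so the failing call executes inside the wrapper and can be retried.

```python
import functools


class Throttled(Exception):
    """Hypothetical stand-in for a retryable error such as throttling."""


def retry_call(attempts):
    """Stand-in retry decorator (not tenacity): re-invokes fn when it raises Throttled."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Throttled:
                    if attempt == attempts:
                        raise
        return wrapper
    return decorate


connect_attempts = 0


def fake_converse_stream():
    """Fails twice at call time, then returns an iterator of events."""
    global connect_attempts
    connect_attempts += 1
    if connect_attempts < 3:
        raise Throttled("connection-time failure")
    return iter(["event-1", "event-2"])


@retry_call(attempts=3)
def converse_with_retry():
    # Regular function: the call below runs inside the wrapper's try block.
    return fake_converse_stream()


events = list(converse_with_retry())
print(connect_attempts, events)
```

Errors raised later, while the caller consumes the returned iterator, would still fall outside the retry, which matches the "connection-time errors" scope noted above.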

Version

llama-index-llms-bedrock-converse==0.14.5

Steps to Reproduce

import asyncio
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(min=1, max=4),
    reraise=True,
)
async def gen_with_retry():
    print("gen_with_retry called")  # only prints once
    yield 1
    raise ValueError("mid-stream error — should retry but won't")

async def main():
    gen = gen_with_retry()
    print(f"type: {type(gen)}")  # <class 'async_generator'>
    try:
        async for item in gen:
            print(f"got: {item}")
    except ValueError as e:
        print(f"caught (no retry happened): {e}")

asyncio.run(main())

Output:

type: <class 'async_generator'>
gen_with_retry called
got: 1
caught (no retry happened): mid-stream error — should retry but won't

Relevant Logs/Tracebacks

No traceback — the bug is silent. Retries simply never trigger for streaming calls.
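
Because nothing fails visibly, one possible safeguard is to reject the combination when the decorator is applied. The sketch below is an illustration only: `reject_async_generators` and `passthrough` are hypothetical helpers, not part of tenacity or llama_index. It relies on the standard library's `inspect.isasyncgenfunction` to detect the problem up front.

```python
import inspect


def reject_async_generators(decorator):
    """Wrap a decorator so that applying it to an async generator function fails loudly."""
    def guarded(fn):
        if inspect.isasyncgenfunction(fn):
            raise TypeError(
                f"{fn.__name__} is an async generator function; "
                "a call-level retry decorator would never see its errors"
            )
        return decorator(fn)
    return guarded


def passthrough(fn):
    """Hypothetical placeholder for a real retry decorator."""
    return fn


safe_retry = reject_async_generators(passthrough)


@safe_retry
async def connect():
    # A plain coroutine function is accepted.
    return "connected"


try:
    @safe_retry
    async def stream():
        yield 1
    guard_error = None
except TypeError as exc:
    guard_error = str(exc)

print(guard_error)
```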

Suggested Fix

Since retrying a partially-consumed stream is not meaningful (you can't replay already-yielded chunks), the retry should only apply to the initial converse_stream() call. One approach:

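async def _conversion_stream_with_retry(**kwargs: Any) -> Any:
    # Keep the client open for the whole iteration: returning from inside
    # `async with` would exit the context before the stream is read.
    async with session.client("bedrock-runtime", config=config, **_boto_client_kwargs) as client:

        @retry_decorator
        async def _connect() -> Any:
            # Only the initial converse_stream() call is retried.
            return await client.converse_stream(**kwargs)

        response = await _connect()
        async for event in response["stream"]:
            yield event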
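
To make the idea concrete without AWS access, here is a runnable sketch against a fake client. Everything in it is an assumption for illustration: `FakeClient`, `Throttled`, and the hand-rolled `async_retry` stand in for the real client, the service error, and `retry_decorator`. The fake stream also asserts that the client is still open while events are read, which is why the `async with` block spans the whole iteration.

```python
import asyncio
import functools


class Throttled(Exception):
    """Hypothetical stand-in for a retryable error."""


def async_retry(attempts):
    """Stand-in for retry_decorator (not tenacity): retries an awaited coroutine function."""
    def decorate(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return await fn(*args, **kwargs)
                except Throttled:
                    if attempt == attempts:
                        raise
        return wrapper
    return decorate


class FakeClient:
    """Hypothetical client: the first two converse_stream calls are throttled."""

    def __init__(self):
        self.connects = 0
        self.open = False

    async def __aenter__(self):
        self.open = True
        return self

    async def __aexit__(self, *exc_info):
        self.open = False

    async def converse_stream(self):
        self.connects += 1
        if self.connects < 3:
            raise Throttled("throttled at connect time")
        return {"stream": self._events()}

    async def _events(self):
        for name in ("start", "delta", "stop"):
            assert self.open, "stream read after the client was closed"
            yield name


client = FakeClient()


async def stream_with_retry():
    async with client as c:  # stays open for the whole iteration

        @async_retry(attempts=3)
        async def _connect():
            return await c.converse_stream()

        response = await _connect()  # only this await is retried
        async for event in response["stream"]:
            yield event


async def main():
    return [event async for event in stream_with_retry()]


events = asyncio.run(main())
print(client.connects, events)
```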

extent analysis

TL;DR

The most likely fix involves modifying the _conversion_stream_with_retry function to apply tenacity's retry mechanism only to the initial converse_stream() call.

Guidance

Identify the initial call to converse_stream() within the _conversion_stream_with_retry function as the point where retries should be applied.
Create a nested function _connect that wraps the converse_stream() call and applies the @retry_decorator to it.
Ensure the retry logic only applies to the initial connection attempt, not to the iteration of the async generator.
Verify the fix by testing the _conversion_stream_with_retry function with a scenario that triggers a retryable exception during the initial converse_stream() call.
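
The complementary check, that a failure during iteration is not retried, can be sketched as follows. The names here (`Throttled`, `fake_converse_stream`, `connect_with_retry`) are hypothetical, and the retry loop is a hand-rolled stand-in for `retry_decorator`. The point is that the connection step runs exactly once and the mid-stream error reaches the caller after the chunks already delivered.

```python
import asyncio


class Throttled(Exception):
    """Hypothetical stand-in for a retryable error."""


connect_calls = 0


async def fake_converse_stream():
    """Connects successfully, but the stream fails after one chunk."""
    global connect_calls
    connect_calls += 1

    async def events():
        yield "chunk-1"
        raise Throttled("mid-stream failure")

    return {"stream": events()}


async def connect_with_retry(attempts=3):
    """Retry only the connection step (hand-rolled stand-in for retry_decorator)."""
    for attempt in range(1, attempts + 1):
        try:
            return await fake_converse_stream()
        except Throttled:
            if attempt == attempts:
                raise


async def stream():
    response = await connect_with_retry()
    async for event in response["stream"]:
        yield event


async def check():
    seen = []
    try:
        async for event in stream():
            seen.append(event)
    except Throttled as exc:
        return seen, str(exc)
    return seen, None


seen, error = asyncio.run(check())
print(connect_calls, seen, error)
```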

Example

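async def _conversion_stream_with_retry(**kwargs: Any) -> Any:
    # Keep the client open for the whole iteration: returning from inside
    # `async with` would exit the context before the stream is read.
    async with session.client("bedrock-runtime", config=config, **_boto_client_kwargs) as client:

        @retry_decorator
        async def _connect() -> Any:
            # Only the initial converse_stream() call is retried.
            return await client.converse_stream(**kwargs)

        response = await _connect()
        async for event in response["stream"]:
            yield event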

Notes

The provided example code in the issue already suggests a correct approach to fixing the problem.
The key insight is recognizing that retries should only apply to the initial connection attempt, not to the iteration of the async generator.

Recommendation

Apply the suggested fix by modifying the _conversion_stream_with_retry function as described, to ensure that retries are correctly triggered for connection-time errors in streaming calls.
