llamaIndex - ✅(Solved) Fix [Bug]: RouterQueryEngine._aquery blocks the event loop when multiple engines are selected [1 pull requests, 5 comments, 3 participants]

gautamvarmadatla · 2026-02-24T23:08:30Z

# PR #20795: fix(core): replace blocking `run_async_tasks` with `asyncio.gather`

- Repository: run-llama/llama_index
- Author: gautamvarmadatla
- State: closed | merged: True
- Link: https://github.com/run-llama/llama_index/pull/20795

## Description (problem / solution / changelog)

I replaced `run_async_tasks(tasks)` with `await asyncio.gather(*tasks)` in the async fan-out paths for `RouterQueryEngine._aquery` and `ToolRetrieverRouterQueryEngine._aquery`, so multi-engine routing no longer performs a synchronous blocking wait inside an async method. I also added a couple of regression tests for both classes to confirm the event loop isn’t blocked.

Fixes #20794

## New Package?

Did I fill in the `tool.llamahub` section in the `pyproject.toml` and provide a detailed README.md for my new integration or package?

- [ ] Yes
- [X] No

## Version Bump?

Did I bump the version in the `pyproject.toml` file of the package I am updating? (Except for the `llama-index-core` package)

- [ ] Yes
- [X] No

## Type of Change

Please delete options that are not relevant.

- [X] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update

## How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

- [X] I added new unit tests to cover this change
- [ ] I believe this change is already covered by existing unit tests

## Suggested Checklist:

- [X] I have performed a self-review of my own code
- [X] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] I have added Google Colab support for the newly added notebooks.
- [X] My changes generate no new warnings
- [X] I have added tests that prove my fix is effective or that my feature works
- [X] New and existing unit tests pass locally with my changes
- [X] I ran `uv run make format; uv run make lint` to appease the lint gods

## Changed files

- `llama-index-core/llama_index/core/query_engine/router_query_engine.py` (modified, +3/-3)
- `llama-index-core/tests/query_engine/test_router_query_engine.py` (added, +114/-0)

## Fixed

- Fixed by PR: fix(core): replace blocking `run_async_tasks` with `asyncio.gather` (https://github.com/run-llama/llama_index/pull/20795)


GitHub stats: run-llama/llama_index#20794 (fetched 2026-04-08 00:30:53)

Timeline (top): commented ×5, labeled ×2, closed ×1, cross-referenced ×1


Bug Description

When the router picks more than one engine/tool, the async _aquery() path calls run_async_tasks() to run them. That helper goes through asyncio_run(), and if there’s already a running event loop it spins up a new thread to run the work and then waits on future.result() on the current thread. Since _aquery() is running on the event-loop thread, that wait blocks the loop until all sub-queries finish, so everything else on the loop (other requests, timers, background tasks) gets stuck.

Kind of similar to #17349, #14515, etc.
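
The blocking mechanism can be reproduced without LlamaIndex. The sketch below is a minimal, self-contained approximation: `blocking_fan_out` is a hypothetical stand-in for the thread-plus-`future.result()` behaviour described above, not the library's actual helper, and all other names are illustrative.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def sub_query(delay: float) -> str:
    """Stand-in for one engine's aquery(): pure async I/O wait."""
    await asyncio.sleep(delay)
    return "ok"


def blocking_fan_out(delays):
    """Hypothetical blocking helper: run the work on another thread's
    event loop, then wait for it synchronously on the calling thread."""

    async def _gather():
        return await asyncio.gather(*(sub_query(d) for d in delays))

    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(asyncio.run, _gather())
        return future.result()  # the calling thread is stuck here


async def blocking_path(delays):
    # Async in name only: never yields control back to the event loop.
    return blocking_fan_out(delays)


async def gather_path(delays):
    # Suspends at the await, so other tasks on the loop can run.
    return await asyncio.gather(*(sub_query(d) for d in delays))


async def background_ran_during(fan_out) -> bool:
    """Report whether a short background task finished while fan_out ran."""
    ran = False

    async def bg():
        nonlocal ran
        await asyncio.sleep(0.01)
        ran = True

    task = asyncio.create_task(bg())
    await fan_out([0.05, 0.05])
    observed = ran
    await task  # let the background task finish before returning
    return observed


if __name__ == "__main__":
    print("blocking path:", asyncio.run(background_ran_during(blocking_path)))
    print("gather path:  ", asyncio.run(background_ran_during(gather_path)))
```

With the blocking path the background task has not run by the time the call returns; with the gather path it has.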

Version

0.14.15

Steps to Reproduce

import asyncio
from unittest.mock import MagicMock

from llama_index.core.base.base_selector import BaseSelector, SelectorResult, SingleSelection
from llama_index.core.base.response.schema import Response
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.tools.types import ToolMetadata

class MultiSelector(BaseSelector):
    def _get_prompts(self): return {}
    def _update_prompts(self, p): pass
    def _get_prompt_modules(self): return {}
    def _select(self, choices, query):
        return SelectorResult(selections=[
            SingleSelection(index=0, reason=""),
            SingleSelection(index=1, reason=""),
        ])
    async def _aselect(self, choices, query): return self._select(choices, query)

async def fake_query(_):
    await asyncio.sleep(0.05)         
    return Response(response="ok")

def tool(name):
    e, t = MagicMock(), MagicMock()
    e.aquery = fake_query
    t.query_engine = e
    t.metadata = ToolMetadata(name=name, description=name)
    return t

class Summarizer:
    async def aget_response(self, *_): return "combined"

async def repro():
    router = RouterQueryEngine(
        selector=MultiSelector(),
        query_engine_tools=[tool("a"), tool("b")],
        llm=MagicMock(),
        summarizer=Summarizer(),
    )

    ran = False
    async def bg():
        nonlocal ran
        await asyncio.sleep(0.01)
        ran = True

    asyncio.create_task(bg())
    await router.aquery("test")
    print("background ran during aquery:", ran)  # buggy: False, fixed: True

asyncio.run(repro())  # in a notebook or asyncio REPL, use: await repro()

Relevant Logs/Tracebacks

extent analysis

Fix: Make `run_async_tasks` fully async (no thread‑blocking)

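The event loop stalls because run_async_tasks() falls back to asyncio_run(), which starts a new thread and then waits on future.result() from the event-loop thread.
The merged PR #20795 fixes this at the call site: it replaces run_async_tasks(tasks) with await asyncio.gather(*tasks) in RouterQueryEngine._aquery and ToolRetrieverRouterQueryEngine._aquery. The steps below sketch an alternative approach: replacing the helper itself with a pure-async implementation that uses asyncio.gather (or asyncio.wait) and never calls future.result().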
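
As a rough illustration of the `asyncio.gather` fan-out pattern, here is a minimal sketch. `sub_query` and `fan_out` are hypothetical names invented for this example; they do not come from the library.

```python
import asyncio


async def sub_query(name: str, delay: float) -> str:
    """Hypothetical sub-query: sleeps to mimic I/O, then returns a label."""
    await asyncio.sleep(delay)
    return f"response from {name}"


async def fan_out() -> list:
    # Coroutine objects are created up front, then awaited together.
    tasks = [sub_query("a", 0.03), sub_query("b", 0.01)]
    # "b" finishes first, but gather returns results in input order.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(fan_out()))
```

Because `gather` preserves input order, the responses can be matched back to the selected engines by position.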

1. Update the helper (in llama-index-core this lives in `llama_index/core/async_utils.py`)

import asyncio
from typing import Any, Awaitable, Callable, List

# Old helper (sync): blocks the caller until every task has finished.
# The signature shown here is illustrative and may differ from the library's.
def run_async_tasks(tasks: List[Callable[[], Awaitable[Any]]]) -> List[Any]:
    ...

# New async version (replaces the definition above)
async def run_async_tasks(tasks: List[Callable[[], Awaitable[Any]]]) -> List[Any]:
    """
    Execute a list of async callables concurrently without blocking the
    current event loop. Each element in *tasks* is a zero-arg coroutine
    factory (e.g. lambda: tool.aquery(query)).
    """
    # Build the coroutine objects
    coros = [task() for task in tasks]

    # Run them concurrently; results come back in input order and the
    # first exception raised by any task propagates to the caller
    results = await asyncio.gather(*coros)
    return results
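
How exceptions propagate depends on `gather`'s `return_exceptions` flag. The following sketch contrasts the two modes; `ok`, `boom`, `strict` and `tolerant` are names made up for the illustration.

```python
import asyncio


async def ok() -> str:
    await asyncio.sleep(0.01)
    return "ok"


async def boom() -> str:
    await asyncio.sleep(0.005)
    raise ValueError("sub-query failed")


async def strict() -> str:
    # Default mode: the first exception is raised to the awaiting caller.
    try:
        await asyncio.gather(ok(), boom())
    except ValueError as exc:
        return f"raised: {exc}"
    return "no error"


async def tolerant() -> list:
    # return_exceptions=True: exceptions are returned in place of results,
    # so one failing sub-query does not hide the others' responses.
    return await asyncio.gather(ok(), boom(), return_exceptions=True)


if __name__ == "__main__":
    print(asyncio.run(strict()))
    print(asyncio.run(tolerant()))
```

Which mode suits a router is a design choice: the default fails fast, while `return_exceptions=True` lets the caller decide whether partial results are still worth summarising.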

2. Call it with `await` from `_aquery`

class RouterQueryEngine:
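    # Simplified sketch: `_select_tools` and the summarizer call below are
    # placeholders, not the library's actual internals.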

    async def _aquery(self, query: str) -> Response:
        # Build a list of async callables for the selected tools
        selected = self._select_tools(query)          # returns list of tools
        task_fns = [lambda t=t: t.query_engine.aquery(query) for t in selected]

        # <-- NEW: await the async helper
        sub_responses = await run_async_tasks(task_fns)

        # Continue with summarisation, etc.
        combined = await self.summarizer.aget_response(sub_responses)
        return Response(response=combined)
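
The `t=t` default argument in the lambda above matters. Python closures bind variables late, so without it every factory would see the last tool in the list. A small sketch of the difference, using a made-up `fake_aquery` coroutine in place of a real engine:

```python
import asyncio


async def fake_aquery(tool_name: str) -> str:
    """Hypothetical engine call that just echoes which tool it ran for."""
    await asyncio.sleep(0)
    return tool_name


async def run_factories(factories) -> list:
    # Call each zero-arg factory to get a coroutine, then await them together.
    return await asyncio.gather(*(factory() for factory in factories))


tools = ["a", "b", "c"]

# Late binding: each lambda looks up `t` when called, after the loop ended.
late_bound = [lambda: fake_aquery(t) for t in tools]

# Early binding: the default argument captures the current value of `t`.
early_bound = [lambda t=t: fake_aquery(t) for t in tools]

if __name__ == "__main__":
    print(asyncio.run(run_factories(late_bound)))
    print(asyncio.run(run_factories(early_bound)))
```

The late-bound list queries the last tool three times, while the early-bound list queries each tool once.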

3. Remove the old `asyncio_run`/thread shim (if still imported)

# delete or comment out
# from llama_index.core.async_utils import asyncio_run

4. (Optional) Back‑compat shim for callers that still expect the sync API

If some external code still imports the sync version, keep a thin wrapper:

def run_async_tasks_sync(tasks):
    """Legacy wrapper: runs the async helper in a fresh event loop."""
    # Sync callers only: asyncio.run() raises RuntimeError inside a running loop.
    return asyncio.run(run_async_tasks(tasks))
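
A sync wrapper is only safe when no event loop is running in the calling thread. One way to make that explicit is to check first and fail loudly. This is a sketch under that assumption; `run_sync` is a hypothetical helper, not part of the library.

```python
import asyncio


def run_sync(coro):
    """Run a coroutine to completion from synchronous code.

    Refuses to run when called from inside a running event loop, where a
    synchronous wait would block that loop.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: safe to start a fresh one.
        return asyncio.run(coro)
    coro.close()  # avoid a "coroutine was never awaited" warning
    raise RuntimeError(
        "run_sync() called from a running event loop; use 'await' instead"
    )


async def answer() -> int:
    await asyncio.sleep(0)
    return 42


async def misuse() -> str:
    # Calling the sync wrapper from async code is rejected, not blocked on.
    try:
        run_sync(answer())
    except RuntimeError:
        return "rejected"
    return "allowed"


if __name__ == "__main__":
    print(run_sync(answer()))
    print(asyncio.run(misuse()))
```

Raising instead of silently spawning a thread keeps the misuse visible at the call site, which is the failure mode this issue is about.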

5. Verify the fix

async def repro():
    router = RouterQueryEngine(
        selector=MultiSelector(),
        query_engine_tools=[tool("a"), tool("b")],
        llm=MagicMock(),
        summarizer=Summarizer(),
    )

    ran = False
    async def bg():
        nonlocal ran
        await asyncio.sleep(0.01)
        ran = True

    asyncio.create_task(bg())
    await router.aquery("test")
    print("background ran during aquery:", ran)   # → True

Run the script; the background task should set ran = True, confirming the event loop stayed responsive.

Extra Tips

Never block the event-loop thread with future.result(). Calling run_until_complete on a loop that is already running does not help either: it raises RuntimeError.
Use asyncio.gather (or asyncio.wait) for concurrent async work.
Add a unit test that spawns a background task while router.aquery runs; assert the background task completes.
If you need a sync entry‑point, expose a separate run_router_query_sync that creates its own loop outside any existing one.
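
The unit-test tip can be generalised into a small responsiveness probe. The sketch below counts how often a ticker task gets to run while some awaitable is in flight; `ticks_during`, `cooperative` and `blocking` are illustrative names, and `router.aquery` would take the place of the awaitable in a real test.

```python
import asyncio
import time


async def ticks_during(make_awaitable) -> int:
    """Count ticker iterations that ran while make_awaitable() was awaited."""
    ticks = 0

    async def ticker():
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.005)

    task = asyncio.create_task(ticker())
    await make_awaitable()
    observed = ticks

    # Stop the ticker and wait for the cancellation to land.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return observed


async def cooperative():
    await asyncio.sleep(0.05)  # yields to the loop while waiting


async def blocking():
    time.sleep(0.05)  # holds the event-loop thread the whole time


if __name__ == "__main__":
    print("cooperative ticks:", asyncio.run(ticks_during(cooperative)))
    print("blocking ticks:   ", asyncio.run(ticks_during(blocking)))
```

A regression test can then assert that the tick count is at least 1 for the call under test; a count of 0 means the loop never got control back.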

That’s all – swapping the blocking helper for the async gather version restores proper concurrency.
