[Bug]: [llama-index-core] async_acquire() in TokenBucketRateLimiter and SlidingWindowRateLimiter blocks the asyncio event loop (threading.Lock used in async context)

GitHub issue: run-llama/llama_index#21603 (fetched 2026-05-11 03:13:11; 2 comments, 1 participant)

Root Cause

The heartbeat freezes because threading.Lock inside async_acquire() blocks the event loop thread during lock contention, preventing all other coroutines from running.
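
To see the mechanism in isolation, here is a minimal stdlib-only sketch. It does not use LlamaIndex at all: max_heartbeat_gap, hold_lock, blocking_case, and cooperative_case are hypothetical names, and the helper thread exists only to guarantee that the threading.Lock is contended. It compares the largest heartbeat gap when a coroutine waits on a held threading.Lock with the gap when it waits on a held asyncio.Lock.

```python
import asyncio
import threading
import time


async def max_heartbeat_gap(ticks: int = 20, interval: float = 0.02) -> float:
    """Return the largest gap (seconds) between consecutive wake-ups."""
    loop = asyncio.get_running_loop()
    prev = loop.time()
    worst = 0.0
    for _ in range(ticks):
        await asyncio.sleep(interval)
        now = loop.time()
        worst = max(worst, now - prev)
        prev = now
    return worst


def hold_lock(lock: threading.Lock, held: threading.Event, seconds: float) -> None:
    """Runs in a helper thread: take the lock, signal, keep it for `seconds`."""
    with lock:
        held.set()
        time.sleep(seconds)


async def blocking_case(hold: float = 0.3) -> float:
    """A coroutine waits on a contended threading.Lock."""
    lock = threading.Lock()
    held = threading.Event()
    worker = threading.Thread(target=hold_lock, args=(lock, held, hold))
    worker.start()
    held.wait()  # make sure the helper thread owns the lock before measuring

    async def waiter() -> None:
        await asyncio.sleep(0.05)  # let the heartbeat tick a few times first
        with lock:  # blocks the whole event loop thread until the helper releases
            pass

    gap, _ = await asyncio.gather(max_heartbeat_gap(), waiter())
    worker.join()
    return gap


async def cooperative_case(hold: float = 0.3) -> float:
    """A coroutine waits on a contended asyncio.Lock."""
    lock = asyncio.Lock()

    async def holder() -> None:
        async with lock:
            await asyncio.sleep(hold)

    async def waiter() -> None:
        await asyncio.sleep(0.05)
        async with lock:  # suspends only this coroutine; the loop keeps running
            pass

    gap, _, _ = await asyncio.gather(max_heartbeat_gap(), holder(), waiter())
    return gap


if __name__ == "__main__":
    print(f"threading.Lock: max gap {asyncio.run(blocking_case()):.3f}s")
    print(f"asyncio.Lock:   max gap {asyncio.run(cooperative_case()):.3f}s")
```

The first number should land near the time the helper thread held the lock, and the second near the heartbeat interval, though exact values depend on the machine and scheduler.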

Code Example

# pip install llama-index-core
# python repro.py

import asyncio

from llama_index.core.rate_limiter import TokenBucketRateLimiter

# Heartbeat gaps (in seconds) that exceeded the 200 ms threshold.
BLOCKED: list[float] = []

async def heartbeat():
    """Wake every 50 ms for about 3 s. Any gap above 200 ms means the event loop was blocked."""
    loop = asyncio.get_running_loop()
    prev = loop.time()
    for _ in range(60):
        await asyncio.sleep(0.05)
        now = loop.time()
        gap = now - prev
        if gap > 0.2:
            BLOCKED.append(gap)
            print(f"  Event loop blocked for {gap:.3f}s!")
        prev = now

async def spam(rl, n):
    for _ in range(n):
        await rl.async_acquire()

async def main():
    # High RPM so the limiter itself never throttles; only lock contention is under test.
    rl = TokenBucketRateLimiter(requests_per_minute=100_000)
    await asyncio.gather(
        heartbeat(),
        *[spam(rl, 10) for _ in range(50)],  # 50 concurrent tasks x 10 calls = 500 async_acquire calls
    )
    if BLOCKED:
        print(f"FAIL — event loop blocked {len(BLOCKED)} times. Max: {max(BLOCKED):.3f}s")
    else:
        print("PASS — event loop was never blocked.")

asyncio.run(main())

Bug Description

Both TokenBucketRateLimiter.async_acquire() and SlidingWindowRateLimiter.async_acquire() take a threading.Lock inside an async method. threading.Lock.acquire() is a blocking call: when a coroutine running on the asyncio event loop thread has to wait for the lock, the whole thread stalls, so no other coroutine can run until the lock is released. Under high concurrency this causes latency spikes and throughput collapse in any async LLM or embedding pipeline that uses a rate_limiter.
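
One way to confirm this kind of stall independently of the repro script is asyncio's debug mode, which logs a warning on the "asyncio" logger whenever a single step of a task or callback runs longer than loop.slow_callback_duration (0.1 s by default). The sketch below is an illustration and not part of the issue: blocking_step uses time.sleep as a stand-in for a blocking lock wait, and CollectHandler and slow_callback_warnings are hypothetical helper names.

```python
import asyncio
import logging
import time


class CollectHandler(logging.Handler):
    """Hypothetical helper: keeps formatted log messages in a list."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


async def blocking_step(seconds: float) -> None:
    # Stand-in for any call that blocks the event loop thread,
    # such as waiting on a contended threading.Lock.
    time.sleep(seconds)


def slow_callback_warnings(seconds: float = 0.25) -> list[str]:
    """Run blocking_step under asyncio debug mode and return the slow-step warnings."""
    handler = CollectHandler()
    logger = logging.getLogger("asyncio")
    logger.addHandler(handler)
    try:
        asyncio.run(blocking_step(seconds), debug=True)
    finally:
        logger.removeHandler(handler)
    # Debug mode reports slow steps as "Executing <...> took N seconds".
    return [m for m in handler.messages if "took" in m]


if __name__ == "__main__":
    for message in slow_callback_warnings():
        print(message)
```

Running an affected pipeline with debug=True (or with the PYTHONASYNCIODEBUG=1 environment variable) should surface similar warnings if a step really does block for longer than the threshold.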

Version

All versions that include the rate_limiter module (llama-index-core >= 0.12.x)

Steps to Reproduce

Save the script from the Code Example section above as repro.py, then run:

pip install llama-index-core
python repro.py

Expected Behavior

PASS — event loop was never blocked.

The heartbeat coroutine should wake every ~50 ms without interruption while the 500 async_acquire() calls (50 concurrent tasks, 10 calls each) are in flight. The async path should use an asyncio.Lock, which yields cooperatively while waiting, rather than a threading.Lock, which blocks the thread.
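
As a rough illustration of what a cooperative async path could look like, here is a minimal token bucket guarded by an asyncio.Lock. This is a sketch under stated assumptions, not the LlamaIndex implementation or the eventual fix: the AsyncTokenBucket class, its burst parameter, and the demo function are hypothetical, and it only covers callers that share one event loop.

```python
import asyncio
import time


class AsyncTokenBucket:
    """Hypothetical async-only token bucket; NOT the LlamaIndex implementation."""

    def __init__(self, requests_per_minute: float, burst: int = 1) -> None:
        self._rate = requests_per_minute / 60.0  # tokens refilled per second
        self._capacity = float(burst)            # maximum number of stored tokens
        self._tokens = float(burst)
        self._last_refill = time.monotonic()
        # Assumes Python 3.10+, where asyncio.Lock is not bound to a loop at creation.
        self._lock = asyncio.Lock()

    async def async_acquire(self) -> None:
        """Wait for one token without blocking the event loop thread."""
        async with self._lock:  # contended waiters are suspended, not blocked
            while True:
                now = time.monotonic()
                elapsed = now - self._last_refill
                self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
                self._last_refill = now
                if self._tokens >= 1.0:
                    self._tokens -= 1.0
                    return
                # Sleep until the next token is due; asyncio.sleep yields to the loop.
                await asyncio.sleep((1.0 - self._tokens) / self._rate)


async def demo() -> float:
    """Acquire 5 tokens at 100 tokens/s and return the elapsed time in seconds."""
    bucket = AsyncTokenBucket(requests_per_minute=6000, burst=1)
    start = time.monotonic()
    await asyncio.gather(*(bucket.async_acquire() for _ in range(5)))
    return time.monotonic() - start


if __name__ == "__main__":
    print(f"5 acquisitions took {asyncio.run(demo()):.3f}s")
```

Holding the asyncio.Lock across the sleep serializes waiters while leaving the loop free for unrelated coroutines. An asyncio.Lock gives no protection against callers on other threads, so a limiter that also exposes a synchronous acquire() would need to handle that path separately.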


Actual Behavior

  Event loop blocked for 0.341s!
  Event loop blocked for 0.289s!
  ...
FAIL — event loop blocked 4 times. Max: 0.341s


Environment

  • OS: (your OS, e.g. Ubuntu 22.04 / Windows 11 / macOS 14)
  • Python version: (e.g. 3.11.9)
  • llama-index-core version: (run: pip show llama-index-core | grep Version)

Relevant Logs/Tracebacks
