llamaIndex - 💡(How to fix) Fix [Bug]: [llama-index-core] async_acquire() in TokenBucketRateLimiter and SlidingWindowRateLimiter blocks the asyncio event loop (threading.Lock used in async context) [2 comments, 1 participant]

llamaIndex · 2026-05-10 14:56:20

GitHub issue: run-llama/llama_index#21603 (fetched 2026-05-11 03:13:11)

Author: Indhar01

Participants: Indhar01

Timeline: commented ×2, labeled ×2

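Root Cause

async_acquire() guards the limiter's shared state with a threading.Lock. When that lock is contended, threading.Lock.acquire() blocks the very thread that runs the event loop. Until the lock is released, no other coroutine can run, including the heartbeat in the reproduction script.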
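The issue does not include a patch. The sketch below illustrates the direction it suggests, which is an asyncio.Lock on the async path. It is a hypothetical standalone token bucket. The class name AsyncTokenBucket, its attributes, and the refill arithmetic are illustrative assumptions, not the llama-index-core implementation.

```python
import asyncio
import time


class AsyncTokenBucket:
    """Hypothetical token bucket whose async path waits on an asyncio.Lock.

    Illustrative only: this is not the llama-index-core class or its API.
    """

    def __init__(self, requests_per_minute: float) -> None:
        self.capacity = float(requests_per_minute)  # maximum burst size
        self.rate = requests_per_minute / 60.0  # tokens added per second
        self.tokens = self.capacity
        self.updated = time.monotonic()
        self._lock = asyncio.Lock()  # contended waiters yield to the event loop

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    async def async_acquire(self) -> None:
        async with self._lock:
            self._refill()
            while self.tokens < 1:
                # Awaiting suspends only this coroutine; other tasks keep
                # running while the bucket refills.
                await asyncio.sleep((1 - self.tokens) / self.rate)
                self._refill()
            self.tokens -= 1


async def main() -> None:
    # Create the limiter inside the running loop so its lock belongs to it.
    limiter = AsyncTokenBucket(requests_per_minute=100_000)
    await asyncio.gather(*(limiter.async_acquire() for _ in range(500)))
    print(f"tokens left after 500 acquires: {limiter.tokens:.1f}")


if __name__ == "__main__":
    asyncio.run(main())
```

This design sleeps while holding the asyncio.Lock. Waiting callers then queue on the lock and are admitted in the order they arrived, and none of them stalls the loop. A real fix would presumably keep a threading.Lock for the synchronous acquire() path and use the asyncio.Lock only in async_acquire().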


Bug Description

Both TokenBucketRateLimiter.async_acquire() and SlidingWindowRateLimiter.async_acquire() use a threading.Lock inside async methods. threading.Lock.acquire() is a blocking call that suspends the calling OS thread. A coroutine on the asyncio event loop thread may call it while the lock is held. In that case the whole thread waits, and every other coroutine on that loop is frozen until the lock is released. Under high concurrency this causes latency spikes and throughput collapse in any async LLM or embedding pipeline that uses a rate_limiter.
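The mechanism can be shown in isolation with a small standard-library sketch that does not involve llama-index. It assumes the contention comes from a second OS thread that holds a threading.Lock for 0.3 s. All helper names are hypothetical. A coroutine that calls lock.acquire() directly stalls a 10 ms heartbeat for the whole wait. A coroutine that hands the wait to asyncio.to_thread does not. The to_thread variant is shown only for contrast; it is not the fix the issue proposes.

```python
import asyncio
import threading
import time
from collections.abc import Awaitable, Callable


def hold(lock: threading.Lock, seconds: float) -> None:
    """Runs in a separate OS thread and keeps the lock busy."""
    with lock:
        time.sleep(seconds)


async def heartbeat(stop: asyncio.Event, gaps: list[float]) -> None:
    """Records how long each nominal 10 ms tick actually took."""
    loop = asyncio.get_running_loop()
    prev = loop.time()
    while not stop.is_set():
        await asyncio.sleep(0.01)
        now = loop.time()
        gaps.append(now - prev)
        prev = now


async def blocking_acquire(lock: threading.Lock) -> None:
    lock.acquire()  # blocks the event loop thread until the holder releases
    lock.release()


async def offloaded_acquire(lock: threading.Lock) -> None:
    await asyncio.to_thread(lock.acquire)  # waits in a worker thread instead
    lock.release()  # a threading.Lock may be released from any thread


async def max_gap(acquire: Callable[[threading.Lock], Awaitable[None]]) -> float:
    lock = threading.Lock()
    holder = threading.Thread(target=hold, args=(lock, 0.3))
    holder.start()
    stop = asyncio.Event()
    gaps: list[float] = []
    beat = asyncio.create_task(heartbeat(stop, gaps))
    await asyncio.sleep(0.05)  # holder takes the lock, heartbeat starts ticking
    await acquire(lock)
    await asyncio.sleep(0.05)  # let the heartbeat record the gap, if any
    stop.set()
    await beat
    holder.join()
    return max(gaps)


if __name__ == "__main__":
    print(f"blocking acquire:  max gap {asyncio.run(max_gap(blocking_acquire)):.3f}s")
    print(f"offloaded acquire: max gap {asyncio.run(max_gap(offloaded_acquire)):.3f}s")
```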

Version

All versions that include the rate_limiter module (llama-index-core >= 0.12.x)

Steps to Reproduce

# pip install llama-index-core
# python repro.py

import asyncio

from llama_index.core.rate_limiter import TokenBucketRateLimiter

BLOCKED: list[float] = []  # heartbeat gaps that exceeded the threshold


async def heartbeat():
    """Ticks every 50 ms; a gap over 200 ms means the event loop was blocked."""
    loop = asyncio.get_running_loop()
    prev = loop.time()
    for _ in range(60):
        await asyncio.sleep(0.05)
        now = loop.time()
        gap = now - prev
        if gap > 0.2:
            BLOCKED.append(gap)
            print(f"  Event loop blocked for {gap:.3f}s!")
        prev = now


async def spam(rl, n):
    """Calls async_acquire() n times in sequence."""
    for _ in range(n):
        await rl.async_acquire()


async def main():
    # A very high RPM means the limiter never actually throttles;
    # the script only exercises lock contention.
    rl = TokenBucketRateLimiter(requests_per_minute=100_000)
    await asyncio.gather(
        heartbeat(),
        *[spam(rl, 10) for _ in range(50)],  # 50 tasks x 10 calls = 500 async_acquire() calls
    )
    if BLOCKED:
        print(f"FAIL — event loop blocked {len(BLOCKED)} times. Max: {max(BLOCKED):.3f}s")
    else:
        print("PASS — event loop was never blocked.")


if __name__ == "__main__":
    asyncio.run(main())

Expected Behavior

PASS — event loop was never blocked.

The heartbeat coroutine should wake roughly every 50 ms, without interruption, while the 500 concurrent async_acquire() calls run. With an asyncio.Lock in the async path, waiting coroutines would yield cooperatively to the event loop instead of blocking the thread.
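The same change would apply to the sliding-window limiter. The sketch below assumes a deque of request timestamps and a hypothetical AsyncSlidingWindow class; again, this is not the llama-index-core code. When the window is full, the caller awaits until the oldest request expires instead of blocking the thread.

```python
import asyncio
import time
from collections import deque


class AsyncSlidingWindow:
    """Hypothetical limiter: at most max_requests calls in any window_s seconds."""

    def __init__(self, max_requests: int, window_s: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self.stamps: deque = deque()  # monotonic times of admitted requests
        self._lock = asyncio.Lock()

    async def async_acquire(self) -> None:
        async with self._lock:
            while True:
                now = time.monotonic()
                # Forget requests that have slid out of the window.
                while self.stamps and now - self.stamps[0] >= self.window_s:
                    self.stamps.popleft()
                if len(self.stamps) < self.max_requests:
                    self.stamps.append(now)
                    return
                # Window is full: await (not block) until the oldest request expires.
                await asyncio.sleep(self.window_s - (now - self.stamps[0]))


async def main() -> None:
    limiter = AsyncSlidingWindow(max_requests=3, window_s=0.2)
    start = time.monotonic()
    await asyncio.gather(*(limiter.async_acquire() for _ in range(4)))
    print(f"4 acquires with a 3-per-0.2s window took {time.monotonic() - start:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```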

Actual Behavior

  Event loop blocked for 0.341s!
  Event loop blocked for 0.289s!
  ...
FAIL — event loop blocked 4 times. Max: 0.341s

The heartbeat freezes because threading.Lock inside async_acquire() blocks the event loop thread during lock contention, preventing all other coroutines from running.

Environment

OS: (your OS, e.g. Ubuntu 22.04 / Windows 11 / macOS 14)
Python version: (e.g. 3.11.9)
llama-index-core version: (run: pip show llama-index-core | grep Version)

Relevant Logs/Tracebacks
