vllm - 💡(How to fix) Fix [Performance]: KV prefetching not spawned on time can cause unnecessary GPU idle [2 comments, 2 participants]

vllm2026-05-06 05:23:51

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

vllm-project/vllm#41784•Fetched 2026-05-07 03:32:57

View on GitHub

Comments

Participants

Timeline

Reactions

Author

jiangxiaosheng

Participants

chfeng-cs

jiangxiaosheng

Timeline (top)

commented ×2labeled ×1

Code Example

if request.num_computed_tokens == 0:
    # Get locally-cached tokens.
    new_computed_blocks, num_new_local_computed_tokens = (
        self.kv_cache_manager.get_computed_blocks(request)
    )

    # Get externally-cached tokens if using a KVConnector.
    if self.connector is not None:
        ext_tokens, load_kv_async = (
            self.connector.get_num_new_matched_tokens(
                request, num_new_local_computed_tokens
            )
        )

---

(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/core/sched/scheduler.py:570] schedule step 3748 running queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/core/sched/scheduler.py:572] running queue: id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f num_computed_tokens=32146 num_tokens=44166
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/core/sched/scheduler.py:580] schedule step 3748 waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c num_computed_tokens=0 num_tokens=69849
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 num_computed_tokens=0 num_tokens=25058
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d num_computed_tokens=0 num_tokens=72925
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 num_computed_tokens=0 num_tokens=30049
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 num_computed_tokens=0 num_tokens=11912
(Worker_TP3 pid=1908388) DEBUG 05-04 05:25:02 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=6144, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(Worker_TP3 pid=1908388) DEBUG 05-04 05:25:02 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/engine/core.py:457] scheduled batch [3748]: chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f prefix_computed=32146 -> 6144 tokens
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/engine/core.py:464] scheduled batch [3748] total: 6144 tokens
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:02,991] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 6144 out of total 6144 tokens. size: 0.1875 GB, cost 16.8434 ms, throughput: 11.1320 GB/s; offload_time: 15.9973 ms, put_time: 0.8122 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP2 pid=1908387) DEBUG 05-04 05:25:02 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=6144, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(Worker_TP2 pid=1908387) DEBUG 05-04 05:25:02 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:02,999] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 6144 out of total 6144 tokens. size: 0.1875 GB, cost 24.4035 ms, throughput: 7.6833 GB/s; offload_time: 20.9256 ms, put_time: 3.4275 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,000] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 6144 out of total 6144 tokens. size: 0.1875 GB, cost 26.3925 ms, throughput: 7.1043 GB/s; offload_time: 22.0000 ms, put_time: 4.3512 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,001] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 6144 out of total 6144 tokens. size: 0.1875 GB, cost 26.5598 ms, throughput: 7.0595 GB/s; offload_time: 23.4608 ms, put_time: 3.0463 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:528] batch [3747] wall-clock: submit=1777872302.436745 finish=1777872303.005471 (time.time epoch s)
(Worker_TP0 pid=1908385) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=6144, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(Worker_TP0 pid=1908385) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1178] Step time: 0.30 s
(Worker_TP1 pid=1908386) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=6144, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(Worker_TP1 pid=1908386) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:570] schedule step 3749 running queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:572] running queue: id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f num_computed_tokens=38290 num_tokens=44166
(Worker_TP3 pid=1908388) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=6144, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:580] schedule step 3749 waiting queue:
(Worker_TP3 pid=1908388) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c num_computed_tokens=0 num_tokens=69849
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 num_computed_tokens=0 num_tokens=25058
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d num_computed_tokens=0 num_tokens=72925
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 num_computed_tokens=0 num_tokens=30049
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 num_computed_tokens=0 num_tokens=11912
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,010] LMCache INFO:[0m Lookup chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c: Backend LocalCPUBackend hit 2 chunks out of 272, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,010] LMCache INFO:[0m Lookup chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c: Backend LocalCPUBackend hit 2 chunks out of 272, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,010] LMCache INFO:[0m Lookup chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c: Backend LocalCPUBackend hit 2 chunks out of 272, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,011] LMCache INFO:[0m Lookup chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c: Backend LocalDiskBackend hit 216 chunks out of 270, 55296 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,011] LMCache INFO:[0m Lookup chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c: Backend LocalDiskBackend hit 216 chunks out of 270, 55296 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,011] LMCache INFO:[0m Lookup chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c: Backend LocalCPUBackend hit 2 chunks out of 272, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,012] LMCache INFO:[0m Lookup chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c: Backend LocalDiskBackend hit 216 chunks out of 270, 55296 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,012] LMCache INFO:[0m Lookup chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c: Backend LocalDiskBackend hit 216 chunks out of 270, 55296 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:663] prefetch started for request chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c at step 3749
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,025] LMCache INFO:[0m Lookup chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768: Backend LocalCPUBackend hit 2 chunks out of 97, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,026] LMCache INFO:[0m Lookup chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768: Backend LocalCPUBackend hit 2 chunks out of 97, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,028] LMCache INFO:[0m Lookup chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768: Backend LocalCPUBackend hit 2 chunks out of 97, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(APIServer pid=1906905) INFO:     127.0.0.1:59952 - "GET /metrics HTTP/1.1" 200 OK
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,029] LMCache INFO:[0m Lookup chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768: Backend LocalCPUBackend hit 2 chunks out of 97, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:663] prefetch started for request chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 at step 3749
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,033] LMCache INFO:[0m Lookup chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d: Backend LocalCPUBackend hit 2 chunks out of 284, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,034] LMCache INFO:[0m Lookup chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d: Backend LocalCPUBackend hit 2 chunks out of 284, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,034] LMCache INFO:[0m Lookup chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d: Backend LocalCPUBackend hit 2 chunks out of 284, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,034] LMCache INFO:[0m Lookup chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d: Backend LocalDiskBackend hit 174 chunks out of 282, 44544 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,034] LMCache INFO:[0m Lookup chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d: Backend LocalDiskBackend hit 174 chunks out of 282, 44544 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,034] LMCache INFO:[0m Lookup chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d: Backend LocalDiskBackend hit 174 chunks out of 282, 44544 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,035] LMCache INFO:[0m Lookup chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d: Backend LocalCPUBackend hit 2 chunks out of 284, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,035] LMCache INFO:[0m Lookup chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d: Backend LocalDiskBackend hit 174 chunks out of 282, 44544 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:663] prefetch started for request chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d at step 3749
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,043] LMCache INFO:[0m Lookup chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529: Backend LocalCPUBackend hit 2 chunks out of 117, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,043] LMCache INFO:[0m Lookup chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529: Backend LocalCPUBackend hit 2 chunks out of 117, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,043] LMCache INFO:[0m Lookup chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529: Backend LocalCPUBackend hit 2 chunks out of 117, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,043] LMCache INFO:[0m Lookup chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529: Backend LocalCPUBackend hit 2 chunks out of 117, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,043] LMCache INFO:[0m Lookup chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529: Backend LocalDiskBackend hit 88 chunks out of 115, 22528 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,043] LMCache INFO:[0m Lookup chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529: Backend LocalDiskBackend hit 88 chunks out of 115, 22528 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,044] LMCache INFO:[0m Lookup chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529: Backend LocalDiskBackend hit 88 chunks out of 115, 22528 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,044] LMCache INFO:[0m Lookup chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529: Backend LocalDiskBackend hit 88 chunks out of 115, 22528 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:663] prefetch started for request chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 at step 3749
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,053] LMCache INFO:[0m Lookup chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5: Backend LocalCPUBackend hit 2 chunks out of 46, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,054] LMCache INFO:[0m Lookup chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5: Backend LocalCPUBackend hit 2 chunks out of 46, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,054] LMCache INFO:[0m Lookup chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5: Backend LocalCPUBackend hit 2 chunks out of 46, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,054] LMCache INFO:[0m Lookup chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5: Backend LocalCPUBackend hit 2 chunks out of 46, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:663] prefetch started for request chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 at step 3749
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:457] scheduled batch [3749]: chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f prefix_computed=38290 -> 5876 tokens
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:464] scheduled batch [3749] total: 5876 tokens
(APIServer pid=1906905) INFO:     127.0.0.1:48392 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,346] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 6144 out of total 6144 tokens. size: 0.1875 GB, cost 29.7027 ms, throughput: 6.3126 GB/s; offload_time: 27.8534 ms, put_time: 1.8027 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,347] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 6144 out of total 6144 tokens. size: 0.1875 GB, cost 31.6688 ms, throughput: 5.9206 GB/s; offload_time: 29.0484 ms, put_time: 2.5788 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,351] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 6144 out of total 6144 tokens. size: 0.1875 GB, cost 35.5409 ms, throughput: 5.2756 GB/s; offload_time: 33.2369 ms, put_time: 2.2685 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP2 pid=1908387) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=5876, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(Worker_TP2 pid=1908387) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,354] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 6144 out of total 6144 tokens. size: 0.1875 GB, cost 37.7730 ms, throughput: 4.9639 GB/s; offload_time: 33.5455 ms, put_time: 4.1764 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP1 pid=1908386) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=5876, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(Worker_TP1 pid=1908386) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(Worker_TP0 pid=1908385) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=5876, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(Worker_TP0 pid=1908385) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:528] batch [3748] wall-clock: submit=1777872302.707510 finish=1777872303.359737 (time.time epoch s)
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1178] Step time: 0.35 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:570] schedule step 3750 running queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:572] running queue: id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f num_computed_tokens=44166 num_tokens=44166
(Worker_TP3 pid=1908388) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=5876, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:580] schedule step 3750 waiting queue:
(Worker_TP3 pid=1908388) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad num_computed_tokens=0 num_tokens=60332
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:590] schedule step 3750 skipped waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c num_computed_tokens=0 num_tokens=69849
(APIServer pid=1906905) INFO:     127.0.0.1:59952 - "GET /metrics HTTP/1.1" 200 OK
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 num_computed_tokens=0 num_tokens=25058
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d num_computed_tokens=0 num_tokens=72925
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 num_computed_tokens=0 num_tokens=30049
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 num_computed_tokens=0 num_tokens=11912
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,414] LMCache INFO:[0m Lookup chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad: Backend LocalCPUBackend hit 2 chunks out of 235, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,414] LMCache INFO:[0m Lookup chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad: Backend LocalCPUBackend hit 2 chunks out of 235, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,414] LMCache INFO:[0m Lookup chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad: Backend LocalCPUBackend hit 2 chunks out of 235, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,414] LMCache INFO:[0m Lookup chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad: Backend LocalCPUBackend hit 2 chunks out of 235, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,414] LMCache INFO:[0m Lookup chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad: Backend LocalDiskBackend hit 154 chunks out of 233, 39424 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,414] LMCache INFO:[0m Lookup chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad: Backend LocalDiskBackend hit 154 chunks out of 233, 39424 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,415] LMCache INFO:[0m Lookup chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad: Backend LocalDiskBackend hit 154 chunks out of 233, 39424 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,415] LMCache INFO:[0m Lookup chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad: Backend LocalDiskBackend hit 154 chunks out of 233, 39424 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:663] prefetch started for request chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad at step 3750
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:464] scheduled batch [3750] total: 0 tokens
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,635] LMCache INFO:[0m Responding to scheduler for lookup id chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c with retrieved length 55808 [3m(storage_manager.py:631:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,635] LMCache INFO:[0m Responding to scheduler for lookup id chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 with retrieved length 512 [3m(storage_manager.py:631:lmcache.v1.storage_backend.storage_manager)[0m
(APIServer pid=1906905) INFO:     127.0.0.1:48398 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=1906905) INFO:     127.0.0.1:59952 - "GET /metrics HTTP/1.1" 200 OK
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,697] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 5888 out of total 5888 tokens. size: 0.1797 GB, cost 32.5115 ms, throughput: 5.5269 GB/s; offload_time: 30.0672 ms, put_time: 2.4073 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,699] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 5888 out of total 5888 tokens. size: 0.1797 GB, cost 33.5852 ms, throughput: 5.3502 GB/s; offload_time: 30.3126 ms, put_time: 3.2248 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,705] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 5888 out of total 5888 tokens. size: 0.1797 GB, cost 39.6157 ms, throughput: 4.5358 GB/s; offload_time: 37.7069 ms, put_time: 1.8760 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,704] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 5888 out of total 5888 tokens. size: 0.1797 GB, cost 39.4948 ms, throughput: 4.5497 GB/s; offload_time: 38.3431 ms, put_time: 1.1023 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:528] batch [3749] wall-clock: submit=1777872303.065827 finish=1777872303.709876 (time.time epoch s)
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:1726] [Stats tracking] Request chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f first_token_time: 1777872303.7101877 (sched_to_first_token: 1.7978887557983398), ttft: 9.484845161437988
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1242] GPU idle (KV connector stall): 349.99 ms, blocked_requests=6, cumulative_stall=32.774 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1178] Step time: 0.35 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:580] schedule step 3751 waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618 num_computed_tokens=0 num_tokens=27888
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:590] schedule step 3751 skipped waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c num_computed_tokens=0 num_tokens=69849
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 num_computed_tokens=0 num_tokens=25058
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d num_computed_tokens=0 num_tokens=72925
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 num_computed_tokens=0 num_tokens=30049
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 num_computed_tokens=0 num_tokens=11912
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad num_computed_tokens=0 num_tokens=60332
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,775] LMCache INFO:[0m Lookup chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618: Backend LocalCPUBackend hit 50 chunks out of 108, 12800 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,775] LMCache INFO:[0m Lookup chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618: Backend LocalCPUBackend hit 50 chunks out of 108, 12800 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,775] LMCache INFO:[0m Lookup chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618: Backend LocalCPUBackend hit 50 chunks out of 108, 12800 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:663] prefetch started for request chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618 at step 3751
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,786] LMCache INFO:[0m Responding to scheduler for lookup id chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c with retrieved length 55808 [3m(storage_manager.py:631:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:464] scheduled batch [3751] total: 0 tokens
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,787] LMCache INFO:[0m Responding to scheduler for lookup id chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 with retrieved length 512 [3m(storage_manager.py:631:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:528] batch [3750] wall-clock: submit=1777872303.424342 finish=1777872303.787366 (time.time epoch s)
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1242] GPU idle (KV connector stall): 75.41 ms, blocked_requests=7, cumulative_stall=32.849 s
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,788] LMCache INFO:[0m Lookup chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618: Backend LocalCPUBackend hit 50 chunks out of 108, 12800 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1178] Step time: 0.08 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:590] schedule step 3752 skipped waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c num_computed_tokens=0 num_tokens=69849
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 num_computed_tokens=0 num_tokens=25058
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d num_computed_tokens=0 num_tokens=72925
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 num_computed_tokens=0 num_tokens=30049
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 num_computed_tokens=0 num_tokens=11912
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad num_computed_tokens=0 num_tokens=60332
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618 num_computed_tokens=0 num_tokens=27888
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:464] scheduled batch [3752] total: 0 tokens
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:528] batch [3751] wall-clock: submit=1777872303.787231 finish=1777872303.861009 (time.time epoch s)
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1242] GPU idle (KV connector stall): 72.25 ms, blocked_requests=7, cumulative_stall=32.921 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1178] Step time: 0.07 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:590] schedule step 3753 skipped waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c num_computed_tokens=0 num_tokens=69849
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 num_computed_tokens=0 num_tokens=25058
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d num_computed_tokens=0 num_tokens=72925
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 num_computed_tokens=0 num_tokens=30049
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 num_computed_tokens=0 num_tokens=11912
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad num_computed_tokens=0 num_tokens=60332
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618 num_computed_tokens=0 num_tokens=27888
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:464] scheduled batch [3753] total: 0 tokens
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,934] LMCache INFO:[0m Responding to scheduler for lookup id chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c with retrieved length 55808 [3m(storage_manager.py:631:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:528] batch [3752] wall-clock: submit=1777872303.860902 finish=1777872303.934533 (time.time epoch s)
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,934] LMCache INFO:[0m Responding to scheduler for lookup id chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 with retrieved length 512 [3m(storage_manager.py:631:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1242] GPU idle (KV connector stall): 72.07 ms, blocked_requests=7, cumulative_stall=32.993 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1178] Step time: 0.07 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:590] schedule step 3754 skipped waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c num_computed_tokens=0 num_tokens=69849
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 num_computed_tokens=0 num_tokens=25058
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d num_computed_tokens=0 num_tokens=72925
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 num_computed_tokens=0 num_tokens=30049
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 num_computed_tokens=0 num_tokens=11912
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad num_computed_tokens=0 num_tokens=60332
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618 num_computed_tokens=0 num_tokens=27888
(APIServer pid=1906905) INFO:     127.0.0.1:48412 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,961] LMCache INFO:[0m Responding to scheduler for lookup id chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c with retrieved length 55808 [3m(storage_manager.py:631:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,961] LMCache INFO:[0m Responding to scheduler for lookup id chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 with retrieved length 512 [3m(storage_manager.py:631:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/engine/core.py:464] scheduled batch [3754] total: 0 tokens
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/engine/core.py:528] batch [3753] wall-clock: submit=1777872303.934446 finish=1777872304.007967 (time.time epoch s)
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/engine/core.py:1242] GPU idle (KV connector stall): 71.99 ms, blocked_requests=7, cumulative_stall=33.065 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/engine/core.py:1178] Step time: 0.07 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:580] schedule step 3755 waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-67d2cd65-eeb5-4861-a8e2-72d3f38b30ae-a860ee77 num_computed_tokens=0 num_tokens=15227
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:590] schedule step 3755 skipped waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c num_computed_tokens=0 num_tokens=69849
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 num_computed_tokens=0 num_tokens=25058
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d num_computed_tokens=0 num_tokens=72925
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 num_computed_tokens=0 num_tokens=30049
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 num_computed_tokens=0 num_tokens=11912
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad num_computed_tokens=0 num_tokens=60332
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618 num_computed_tokens=0 num_tokens=27888
(EngineCore pid=1907951) [32;20m[2026-05-04 05:25:04,010] LMCache INFO:[0m Reqid: chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c, Total tokens 69849, Inference Engine computed tokens: 512, LMCache hit tokens: 55808, need to load: 55296 [3m(vllm_v1_adapter.py:1327:lmcache.integration.vllm.vllm_v1_adapter)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:678] prefetch completed for request chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c at step 3755
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:876] [Stats tracking] Request chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c queue_to_scheduled: 6.6983842849731445, prefetch_interval: 0.9912011623382568, scheduled_time: 1777872304.0123525, prefetch_steps: 6 (start=3749, end=3755), prompt_tokens: 69849, num_computed_tokens: 55808, num_tokens: 69849
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/engine/core.py:457] scheduled batch [3755]: chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c prefix_computed=55808 -> 6144 tokens
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/engine/core.py:464] scheduled batch [3755] total: 6144 tokens
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/engine/core.py:528] batch [3754] wall-clock: submit=1777872304.007883 finish=1777872304.015702 (time.time epoch s)
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/engine/core.py:1178] Step time: 0.01 s

---

The output of `python collect_env.py`

RAW_BUFFERClick to expand / collapse

Proposal to improve performance

When using vLLM with LMCache, KV prefetching is spawned in the below snippet:

if request.num_computed_tokens == 0:
    # Get locally-cached tokens.
    new_computed_blocks, num_new_local_computed_tokens = (
        self.kv_cache_manager.get_computed_blocks(request)
    )

    # Get externally-cached tokens if using a KVConnector.
    if self.connector is not None:
        ext_tokens, load_kv_async = (
            self.connector.get_num_new_matched_tokens(
                request, num_new_local_computed_tokens
            )
        )

vLLM's scheduler, when trying to form a batch, polls the running queue first, then skipped_waiting and waiting queues. If there are sufficient tokens in the running queue, the other two queues won't even be checked, meaning the prefetch of newly joined requests in the wait queue are not scheduled on time.

Under heavy workloads where KV caches are read from disk, this can cause unnecessary GPU idle due to KV waiting, as illustrated by the below log (I added some loggings by myself):

(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/core/sched/scheduler.py:570] schedule step 3748 running queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/core/sched/scheduler.py:572] running queue: id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f num_computed_tokens=32146 num_tokens=44166
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/core/sched/scheduler.py:580] schedule step 3748 waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c num_computed_tokens=0 num_tokens=69849
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 num_computed_tokens=0 num_tokens=25058
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d num_computed_tokens=0 num_tokens=72925
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 num_computed_tokens=0 num_tokens=30049
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 num_computed_tokens=0 num_tokens=11912
(Worker_TP3 pid=1908388) DEBUG 05-04 05:25:02 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=6144, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(Worker_TP3 pid=1908388) DEBUG 05-04 05:25:02 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/engine/core.py:457] scheduled batch [3748]: chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f prefix_computed=32146 -> 6144 tokens
(EngineCore pid=1907951) DEBUG 05-04 05:25:02 [v1/engine/core.py:464] scheduled batch [3748] total: 6144 tokens
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:02,991] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 6144 out of total 6144 tokens. size: 0.1875 GB, cost 16.8434 ms, throughput: 11.1320 GB/s; offload_time: 15.9973 ms, put_time: 0.8122 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP2 pid=1908387) DEBUG 05-04 05:25:02 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=6144, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(Worker_TP2 pid=1908387) DEBUG 05-04 05:25:02 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:02,999] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 6144 out of total 6144 tokens. size: 0.1875 GB, cost 24.4035 ms, throughput: 7.6833 GB/s; offload_time: 20.9256 ms, put_time: 3.4275 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,000] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 6144 out of total 6144 tokens. size: 0.1875 GB, cost 26.3925 ms, throughput: 7.1043 GB/s; offload_time: 22.0000 ms, put_time: 4.3512 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,001] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 6144 out of total 6144 tokens. size: 0.1875 GB, cost 26.5598 ms, throughput: 7.0595 GB/s; offload_time: 23.4608 ms, put_time: 3.0463 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:528] batch [3747] wall-clock: submit=1777872302.436745 finish=1777872303.005471 (time.time epoch s)
(Worker_TP0 pid=1908385) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=6144, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(Worker_TP0 pid=1908385) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1178] Step time: 0.30 s
(Worker_TP1 pid=1908386) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=6144, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(Worker_TP1 pid=1908386) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:570] schedule step 3749 running queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:572] running queue: id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f num_computed_tokens=38290 num_tokens=44166
(Worker_TP3 pid=1908388) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=6144, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:580] schedule step 3749 waiting queue:
(Worker_TP3 pid=1908388) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c num_computed_tokens=0 num_tokens=69849
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 num_computed_tokens=0 num_tokens=25058
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d num_computed_tokens=0 num_tokens=72925
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 num_computed_tokens=0 num_tokens=30049
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 num_computed_tokens=0 num_tokens=11912
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,010] LMCache INFO:[0m Lookup chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c: Backend LocalCPUBackend hit 2 chunks out of 272, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,010] LMCache INFO:[0m Lookup chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c: Backend LocalCPUBackend hit 2 chunks out of 272, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,010] LMCache INFO:[0m Lookup chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c: Backend LocalCPUBackend hit 2 chunks out of 272, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,011] LMCache INFO:[0m Lookup chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c: Backend LocalDiskBackend hit 216 chunks out of 270, 55296 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,011] LMCache INFO:[0m Lookup chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c: Backend LocalDiskBackend hit 216 chunks out of 270, 55296 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,011] LMCache INFO:[0m Lookup chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c: Backend LocalCPUBackend hit 2 chunks out of 272, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,012] LMCache INFO:[0m Lookup chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c: Backend LocalDiskBackend hit 216 chunks out of 270, 55296 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,012] LMCache INFO:[0m Lookup chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c: Backend LocalDiskBackend hit 216 chunks out of 270, 55296 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:663] prefetch started for request chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c at step 3749
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,025] LMCache INFO:[0m Lookup chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768: Backend LocalCPUBackend hit 2 chunks out of 97, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,026] LMCache INFO:[0m Lookup chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768: Backend LocalCPUBackend hit 2 chunks out of 97, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,028] LMCache INFO:[0m Lookup chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768: Backend LocalCPUBackend hit 2 chunks out of 97, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(APIServer pid=1906905) INFO:     127.0.0.1:59952 - "GET /metrics HTTP/1.1" 200 OK
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,029] LMCache INFO:[0m Lookup chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768: Backend LocalCPUBackend hit 2 chunks out of 97, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:663] prefetch started for request chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 at step 3749
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,033] LMCache INFO:[0m Lookup chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d: Backend LocalCPUBackend hit 2 chunks out of 284, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,034] LMCache INFO:[0m Lookup chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d: Backend LocalCPUBackend hit 2 chunks out of 284, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,034] LMCache INFO:[0m Lookup chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d: Backend LocalCPUBackend hit 2 chunks out of 284, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,034] LMCache INFO:[0m Lookup chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d: Backend LocalDiskBackend hit 174 chunks out of 282, 44544 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,034] LMCache INFO:[0m Lookup chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d: Backend LocalDiskBackend hit 174 chunks out of 282, 44544 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,034] LMCache INFO:[0m Lookup chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d: Backend LocalDiskBackend hit 174 chunks out of 282, 44544 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,035] LMCache INFO:[0m Lookup chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d: Backend LocalCPUBackend hit 2 chunks out of 284, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,035] LMCache INFO:[0m Lookup chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d: Backend LocalDiskBackend hit 174 chunks out of 282, 44544 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:663] prefetch started for request chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d at step 3749
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,043] LMCache INFO:[0m Lookup chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529: Backend LocalCPUBackend hit 2 chunks out of 117, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,043] LMCache INFO:[0m Lookup chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529: Backend LocalCPUBackend hit 2 chunks out of 117, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,043] LMCache INFO:[0m Lookup chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529: Backend LocalCPUBackend hit 2 chunks out of 117, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,043] LMCache INFO:[0m Lookup chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529: Backend LocalCPUBackend hit 2 chunks out of 117, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,043] LMCache INFO:[0m Lookup chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529: Backend LocalDiskBackend hit 88 chunks out of 115, 22528 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,043] LMCache INFO:[0m Lookup chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529: Backend LocalDiskBackend hit 88 chunks out of 115, 22528 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,044] LMCache INFO:[0m Lookup chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529: Backend LocalDiskBackend hit 88 chunks out of 115, 22528 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,044] LMCache INFO:[0m Lookup chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529: Backend LocalDiskBackend hit 88 chunks out of 115, 22528 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:663] prefetch started for request chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 at step 3749
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,053] LMCache INFO:[0m Lookup chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5: Backend LocalCPUBackend hit 2 chunks out of 46, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,054] LMCache INFO:[0m Lookup chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5: Backend LocalCPUBackend hit 2 chunks out of 46, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,054] LMCache INFO:[0m Lookup chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5: Backend LocalCPUBackend hit 2 chunks out of 46, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,054] LMCache INFO:[0m Lookup chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5: Backend LocalCPUBackend hit 2 chunks out of 46, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:663] prefetch started for request chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 at step 3749
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:457] scheduled batch [3749]: chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f prefix_computed=38290 -> 5876 tokens
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:464] scheduled batch [3749] total: 5876 tokens
(APIServer pid=1906905) INFO:     127.0.0.1:48392 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,346] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 6144 out of total 6144 tokens. size: 0.1875 GB, cost 29.7027 ms, throughput: 6.3126 GB/s; offload_time: 27.8534 ms, put_time: 1.8027 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,347] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 6144 out of total 6144 tokens. size: 0.1875 GB, cost 31.6688 ms, throughput: 5.9206 GB/s; offload_time: 29.0484 ms, put_time: 2.5788 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,351] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 6144 out of total 6144 tokens. size: 0.1875 GB, cost 35.5409 ms, throughput: 5.2756 GB/s; offload_time: 33.2369 ms, put_time: 2.2685 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP2 pid=1908387) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=5876, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(Worker_TP2 pid=1908387) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,354] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 6144 out of total 6144 tokens. size: 0.1875 GB, cost 37.7730 ms, throughput: 4.9639 GB/s; offload_time: 33.5455 ms, put_time: 4.1764 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP1 pid=1908386) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=5876, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(Worker_TP1 pid=1908386) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(Worker_TP0 pid=1908385) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=5876, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(Worker_TP0 pid=1908385) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:528] batch [3748] wall-clock: submit=1777872302.707510 finish=1777872303.359737 (time.time epoch s)
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1178] Step time: 0.35 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:570] schedule step 3750 running queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:572] running queue: id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f num_computed_tokens=44166 num_tokens=44166
(Worker_TP3 pid=1908388) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3888] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=5876, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:580] schedule step 3750 waiting queue:
(Worker_TP3 pid=1908388) DEBUG 05-04 05:25:03 [v1/worker/gpu_model_runner.py:3909] ubatch_slices: None, ubatch_slices_padded: None
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad num_computed_tokens=0 num_tokens=60332
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:590] schedule step 3750 skipped waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c num_computed_tokens=0 num_tokens=69849
(APIServer pid=1906905) INFO:     127.0.0.1:59952 - "GET /metrics HTTP/1.1" 200 OK
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 num_computed_tokens=0 num_tokens=25058
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d num_computed_tokens=0 num_tokens=72925
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 num_computed_tokens=0 num_tokens=30049
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 num_computed_tokens=0 num_tokens=11912
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,414] LMCache INFO:[0m Lookup chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad: Backend LocalCPUBackend hit 2 chunks out of 235, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,414] LMCache INFO:[0m Lookup chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad: Backend LocalCPUBackend hit 2 chunks out of 235, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,414] LMCache INFO:[0m Lookup chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad: Backend LocalCPUBackend hit 2 chunks out of 235, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,414] LMCache INFO:[0m Lookup chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad: Backend LocalCPUBackend hit 2 chunks out of 235, 512 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,414] LMCache INFO:[0m Lookup chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad: Backend LocalDiskBackend hit 154 chunks out of 233, 39424 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,414] LMCache INFO:[0m Lookup chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad: Backend LocalDiskBackend hit 154 chunks out of 233, 39424 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,415] LMCache INFO:[0m Lookup chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad: Backend LocalDiskBackend hit 154 chunks out of 233, 39424 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,415] LMCache INFO:[0m Lookup chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad: Backend LocalDiskBackend hit 154 chunks out of 233, 39424 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:663] prefetch started for request chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad at step 3750
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:464] scheduled batch [3750] total: 0 tokens
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,635] LMCache INFO:[0m Responding to scheduler for lookup id chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c with retrieved length 55808 [3m(storage_manager.py:631:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,635] LMCache INFO:[0m Responding to scheduler for lookup id chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 with retrieved length 512 [3m(storage_manager.py:631:lmcache.v1.storage_backend.storage_manager)[0m
(APIServer pid=1906905) INFO:     127.0.0.1:48398 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=1906905) INFO:     127.0.0.1:59952 - "GET /metrics HTTP/1.1" 200 OK
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,697] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 5888 out of total 5888 tokens. size: 0.1797 GB, cost 32.5115 ms, throughput: 5.5269 GB/s; offload_time: 30.0672 ms, put_time: 2.4073 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,699] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 5888 out of total 5888 tokens. size: 0.1797 GB, cost 33.5852 ms, throughput: 5.3502 GB/s; offload_time: 30.3126 ms, put_time: 3.2248 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,705] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 5888 out of total 5888 tokens. size: 0.1797 GB, cost 39.6157 ms, throughput: 4.5358 GB/s; offload_time: 37.7069 ms, put_time: 1.8760 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,704] LMCache INFO:[0m [req_id=chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f] Stored 5888 out of total 5888 tokens. size: 0.1797 GB, cost 39.4948 ms, throughput: 4.5497 GB/s; offload_time: 38.3431 ms, put_time: 1.1023 ms [3m(cache_engine.py:552:lmcache.v1.cache_engine)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:528] batch [3749] wall-clock: submit=1777872303.065827 finish=1777872303.709876 (time.time epoch s)
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:1726] [Stats tracking] Request chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f first_token_time: 1777872303.7101877 (sched_to_first_token: 1.7978887557983398), ttft: 9.484845161437988
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1242] GPU idle (KV connector stall): 349.99 ms, blocked_requests=6, cumulative_stall=32.774 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1178] Step time: 0.35 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:580] schedule step 3751 waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618 num_computed_tokens=0 num_tokens=27888
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:590] schedule step 3751 skipped waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c num_computed_tokens=0 num_tokens=69849
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 num_computed_tokens=0 num_tokens=25058
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d num_computed_tokens=0 num_tokens=72925
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 num_computed_tokens=0 num_tokens=30049
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 num_computed_tokens=0 num_tokens=11912
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad num_computed_tokens=0 num_tokens=60332
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,775] LMCache INFO:[0m Lookup chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618: Backend LocalCPUBackend hit 50 chunks out of 108, 12800 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP1 pid=1908386) [32;20m[2026-05-04 05:25:03,775] LMCache INFO:[0m Lookup chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618: Backend LocalCPUBackend hit 50 chunks out of 108, 12800 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,775] LMCache INFO:[0m Lookup chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618: Backend LocalCPUBackend hit 50 chunks out of 108, 12800 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:663] prefetch started for request chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618 at step 3751
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,786] LMCache INFO:[0m Responding to scheduler for lookup id chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c with retrieved length 55808 [3m(storage_manager.py:631:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:464] scheduled batch [3751] total: 0 tokens
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,787] LMCache INFO:[0m Responding to scheduler for lookup id chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 with retrieved length 512 [3m(storage_manager.py:631:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:528] batch [3750] wall-clock: submit=1777872303.424342 finish=1777872303.787366 (time.time epoch s)
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1242] GPU idle (KV connector stall): 75.41 ms, blocked_requests=7, cumulative_stall=32.849 s
(Worker_TP2 pid=1908387) [32;20m[2026-05-04 05:25:03,788] LMCache INFO:[0m Lookup chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618: Backend LocalCPUBackend hit 50 chunks out of 108, 12800 tokens [3m(storage_manager.py:699:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1178] Step time: 0.08 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:590] schedule step 3752 skipped waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c num_computed_tokens=0 num_tokens=69849
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 num_computed_tokens=0 num_tokens=25058
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d num_computed_tokens=0 num_tokens=72925
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 num_computed_tokens=0 num_tokens=30049
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 num_computed_tokens=0 num_tokens=11912
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad num_computed_tokens=0 num_tokens=60332
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618 num_computed_tokens=0 num_tokens=27888
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:464] scheduled batch [3752] total: 0 tokens
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:528] batch [3751] wall-clock: submit=1777872303.787231 finish=1777872303.861009 (time.time epoch s)
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1242] GPU idle (KV connector stall): 72.25 ms, blocked_requests=7, cumulative_stall=32.921 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1178] Step time: 0.07 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:590] schedule step 3753 skipped waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c num_computed_tokens=0 num_tokens=69849
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 num_computed_tokens=0 num_tokens=25058
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d num_computed_tokens=0 num_tokens=72925
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 num_computed_tokens=0 num_tokens=30049
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 num_computed_tokens=0 num_tokens=11912
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad num_computed_tokens=0 num_tokens=60332
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618 num_computed_tokens=0 num_tokens=27888
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:464] scheduled batch [3753] total: 0 tokens
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,934] LMCache INFO:[0m Responding to scheduler for lookup id chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c with retrieved length 55808 [3m(storage_manager.py:631:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:528] batch [3752] wall-clock: submit=1777872303.860902 finish=1777872303.934533 (time.time epoch s)
(Worker_TP0 pid=1908385) [32;20m[2026-05-04 05:25:03,934] LMCache INFO:[0m Responding to scheduler for lookup id chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 with retrieved length 512 [3m(storage_manager.py:631:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1242] GPU idle (KV connector stall): 72.07 ms, blocked_requests=7, cumulative_stall=32.993 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/engine/core.py:1178] Step time: 0.07 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:590] schedule step 3754 skipped waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c num_computed_tokens=0 num_tokens=69849
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 num_computed_tokens=0 num_tokens=25058
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d num_computed_tokens=0 num_tokens=72925
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 num_computed_tokens=0 num_tokens=30049
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 num_computed_tokens=0 num_tokens=11912
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad num_computed_tokens=0 num_tokens=60332
(EngineCore pid=1907951) DEBUG 05-04 05:25:03 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618 num_computed_tokens=0 num_tokens=27888
(APIServer pid=1906905) INFO:     127.0.0.1:48412 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,961] LMCache INFO:[0m Responding to scheduler for lookup id chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c with retrieved length 55808 [3m(storage_manager.py:631:lmcache.v1.storage_backend.storage_manager)[0m
(Worker_TP3 pid=1908388) [32;20m[2026-05-04 05:25:03,961] LMCache INFO:[0m Responding to scheduler for lookup id chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 with retrieved length 512 [3m(storage_manager.py:631:lmcache.v1.storage_backend.storage_manager)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/engine/core.py:464] scheduled batch [3754] total: 0 tokens
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/engine/core.py:528] batch [3753] wall-clock: submit=1777872303.934446 finish=1777872304.007967 (time.time epoch s)
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/engine/core.py:1242] GPU idle (KV connector stall): 71.99 ms, blocked_requests=7, cumulative_stall=33.065 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/engine/core.py:1178] Step time: 0.07 s
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:580] schedule step 3755 waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:582] waiting queue: id=chatcmpl-67d2cd65-eeb5-4861-a8e2-72d3f38b30ae-a860ee77 num_computed_tokens=0 num_tokens=15227
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:590] schedule step 3755 skipped waiting queue:
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c num_computed_tokens=0 num_tokens=69849
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-77758764-541a-458f-b0cd-7a498d7d9725-b422e768 num_computed_tokens=0 num_tokens=25058
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-929288cf-c28c-4824-82f1-097071a6b5fb-bcde072d num_computed_tokens=0 num_tokens=72925
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-e77e7c90-0f4b-48a8-8ac4-a3402751f4ad-87986529 num_computed_tokens=0 num_tokens=30049
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-a1e31d7a-a80c-4137-b839-30a7a9dc453b-a6db98f5 num_computed_tokens=0 num_tokens=11912
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-f8a358bb-ebd9-4ef4-8713-3c72813deab0-b902b6ad num_computed_tokens=0 num_tokens=60332
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:592] skipped waiting queue: id=chatcmpl-c2a835e7-7303-4bd1-87ae-dc9f3d926f47-a84ac618 num_computed_tokens=0 num_tokens=27888
(EngineCore pid=1907951) [32;20m[2026-05-04 05:25:04,010] LMCache INFO:[0m Reqid: chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c, Total tokens 69849, Inference Engine computed tokens: 512, LMCache hit tokens: 55808, need to load: 55296 [3m(vllm_v1_adapter.py:1327:lmcache.integration.vllm.vllm_v1_adapter)[0m
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:678] prefetch completed for request chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c at step 3755
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/core/sched/scheduler.py:876] [Stats tracking] Request chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c queue_to_scheduled: 6.6983842849731445, prefetch_interval: 0.9912011623382568, scheduled_time: 1777872304.0123525, prefetch_steps: 6 (start=3749, end=3755), prompt_tokens: 69849, num_computed_tokens: 55808, num_tokens: 69849
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/engine/core.py:457] scheduled batch [3755]: chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c prefix_computed=55808 -> 6144 tokens
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/engine/core.py:464] scheduled batch [3755] total: 6144 tokens
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/engine/core.py:528] batch [3754] wall-clock: submit=1777872304.007883 finish=1777872304.015702 (time.time epoch s)
(EngineCore pid=1907951) DEBUG 05-04 05:25:04 [v1/engine/core.py:1178] Step time: 0.01 s

As we can see from the log, in the schedule step 3748 where request chatcmpl-6049f7b3-678b-4f32-a131-b5b984488089-a4ff032f is still far from completion, none of the 5 requests in the wait queue starts their KV prefetching. Until step 3749 where the scheduler cannot fill up the batch from the running queue, it turns to poll the waiting queue, and until then the prefetching of those 5 requests can be kicked off. But, since the first request chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c in the waiting queue has substantial KV cache hit from disk, it takes a while to read them into LMCache's host memory, and vLLM has to sit idle until its KV cache is landed, which can be avoided if the prefetch is scheduled earlier on time. In my log, the scheduler sits idle from step 3749 when it starts KV prefetching for requests in the waiting queue, to step 3755 when the first request chatcmpl-37b24bca-3a70-4216-927f-08512cc4c5c0-b424a35c finishes its KV prefetching. Hundreds of milliseconds of GPU time get wasted.

This GPU idle time can even cascade when the system is loaded, i.e. when queue builds up, as every subsequent request will have their latency increased by the accumulated idle time, as is shown by my log.

I think this is a free performance gain if we can start the prefetching earlier. For example, SGLang kicks off the KV prefetching as soon as a request joins the wait queue. I don't know why vLLM handles prefetching as is, maybe there is a reason for that which I do not know, but I'd like to see the developers' thoughts on this matter. Thanks!

Report of performance regression

No response

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

The output of `python collect_env.py`

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #GPU compatibility #latency issue #model loading #dependency error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

vllm - 💡(How to fix) Fix [Performance]: KV prefetching not spawned on time can cause unnecessary GPU idle [2 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Code Example

Proposal to improve performance

Report of performance regression

Misc discussion on performance

Your current environment (if you think it is necessary)

Before submitting a new issue...

Still need to ship something?

TRENDING

vllm - 💡(How to fix) Fix [Performance]: KV prefetching not spawned on time can cause unnecessary GPU idle [2 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Code Example

Proposal to improve performance

Report of performance regression

Misc discussion on performance

Your current environment (if you think it is necessary)

Before submitting a new issue...

Still need to ship something?

RELATED_DISCOVERY

TRENDING