vllm

1635 issues found

[Bug]: Accuracy drops ~20% when `--enable-prefix-caching` is used together with MTP speculative decoding (Qwen3.6 35B-A3B)

5/25/2026

[Usage]: How to run Qwen3.5 models on V100 given the conflicting requirements of transformers version and vLLM architecture support

5/25/2026

[Bug]: --kv-cache-dtype nvfp4 crashes at first request on SM120 instead of failing fast at init

5/25/2026

[Usage]: Intel Xeon Prefill Decode Disaggregation

5/25/2026

[Bug]: FP8 block-quant loader rejects artifacts using 'weight_scale' rather than 'weight_scale_inv' naming

5/25/2026

[Bug]: TurboQuant crashes on T4/Turing (SM75) — FlashAttention capability not checked

5/25/2026

[Bug]: Prefix caching fails for incremental multimodal requests on Mamba-Attention hybrid models (Qwen3.5)

5/25/2026

[Bug]: AssertFail

5/25/2026

[Bug]: DeepSeek-V4-Pro DP+EP with deepep_low_latency fails during startup: expected scalar type Long but found Int

5/25/2026

[Bug]: Qwen3-VL-2B-Instruct Geo3K accuracy score lower than SGLang with deterministic sampling

5/25/2026

[Bug]: MiniMax M2.5 TP8EP8 gfx950 with AITER causes memory access fault

5/25/2026

[Bug]: AssertionError at kv_cache_utils.py:1042 — dense draft model + hybrid-attention main (DeltaNet+SWA) fails in unify_kv_cache_spec_page_size

5/26/2026

[Bug][DSV4][dynamo] nvidia/ops/attention.py wo_a access is dynamo-unsafe; forces --enforce-eager on Option-Y MTP artifacts (~10× decode slowdown on SM 12.0)

5/24/2026

[Bug][DSV4] compressor / indexer.weights_proj / indexer.wq_b hardcoded with quant_config=None; breaks load of artifacts that calibrate these attention sub-modules

5/24/2026

[Bug]: --hash-block-size 0 silently passes validation, crashes resolve_kv_cache_block_sizes with ZeroDivisionError

5/24/2026

NIXL KV transfer crash with asymmetric TP (prefill TP=4, decode TP=1) — vLLM 0.1.dev1+g2b51d23f6, NIXL 0.8.0

5/24/2026

[Bug]: vllm-openai nightly Docker image still fails due to missing pytest during EngineCore startup

5/24/2026

[Bug]: --max-model-len 0 silently accepted; engine starts cleanly, requests scheduled with negative num_new_tokens

5/24/2026

[CI/Perf] Invalid JSON in serving benchmark config

5/24/2026

[Quantization] Humming MoE import uses a non-existent RoutedExperts submodule

5/24/2026