1635 issues found
[Bug]: Accuracy drops ~20% when `--enable-prefix-caching` is used together with MTP speculative decoding (Qwen3.6 35B-A3B)
5/25/2026
[Usage]: How to run Qwen3.5 models on V100 given the conflicting requirements of transformers version and vLLM architecture support
5/25/2026
[Bug]: --kv-cache-dtype nvfp4 crashes at first request on SM120 instead of failing fast at init
5/25/2026
[Usage]: Intel Xeon Prefill Decode Disaggregation
5/25/2026
[Bug] FP8 block-quant loader rejects artifacts using 'weight_scale' rather than 'weight_scale_inv' naming
5/25/2026
[Bug]: TurboQuant crashes on T4/Turing (SM75) — FlashAttention capability not checked
5/25/2026
[Bug]: Prefix caching fails for incremental multimodal requests on Mamba-Attention hybrid models (Qwen3.5)
5/25/2026
[Bug]: AssertFail
5/25/2026
[Bug]: DeepSeek-V4-Pro DP+EP with deepep_low_latency fails during startup: expected scalar type Long but found Int
5/25/2026
[Bug]: Qwen3-VL-2B-Instruct Geo3K accuracy score lower than SGLang with deterministic sampling
5/25/2026
[Bug]: MiniMax M2.5 TP8EP8 gfx950 with AITER causes memory access fault
5/25/2026
[Bug]: AssertionError at kv_cache_utils.py:1042 — dense draft model + hybrid-attention main (DeltaNet+SWA) fails in unify_kv_cache_spec_page_size
5/26/2026
[Bug][DSV4][dynamo] nvidia/ops/attention.py wo_a access is dynamo-unsafe; forces --enforce-eager on Option-Y MTP artifacts (~10× decode slowdown on SM 12.0)
5/24/2026
[Bug][DSV4] compressor / indexer.weights_proj / indexer.wq_b hardcoded with quant_config=None; breaks load of artifacts that calibrate these attention sub-modules
5/24/2026
[Bug]: --hash-block-size 0 silently passes validation, crashes resolve_kv_cache_block_sizes with ZeroDivisionError
5/24/2026
NIXL KV transfer crash with asymmetric TP (prefill TP=4, decode TP=1) — vLLM 0.1.dev1+g2b51d23f6, NIXL 0.8.0
5/24/2026
[Bug]: vllm-openai nightly Docker image still fails due to missing pytest during EngineCore startup
5/24/2026
[Bug]: --max-model-len 0 silently accepted; engine starts cleanly, requests scheduled with negative num_new_tokens
5/24/2026
[CI/Perf] Invalid JSON in serving benchmark config
5/24/2026
[Quantization] Humming MoE import uses a non-existent RoutedExperts submodule
5/24/2026