vllm - 💡(How to fix) Fix [Bug]: vLLM only prints access logs, not performance statistics logs (v0.1.dev15830+g8d599d76a with deepseek-V4-flash) [2 comments, 2 participants]

vllm-project/vllm#41081, fetched 2026-04-29 06:12:30

Your current environment

Environment Information: • vLLM version: 0.1.dev15830+g8d599d76a

• Model: deepseek-V4-flash

• CUDA version: cu129

• GPU: H800

🐛 Describe the bug

Description:
When I run the vLLM API server with version 0.1.dev15830+g8d599d76a and the deepseek-V4-flash model, the log output contains only basic access logs such as:

INFO: 127.0.0.1:32768 0 "POST v1/completions HTTP/1.1" 200 OK

But I'm not seeing the periodic performance statistics logs that include throughput, GPU KV cache usage, etc., for example:

13:57:45 Avg prompt throughput: 2598.6 tokens/s, Avg generation throughput: 1684.2 tokens/s, Running: 2 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.3%, Prefix cache hit rate: 96.9%
13:57:55 Avg prompt throughput: 83.6 tokens/s, Avg generation throughput: 1.8 tokens/s, Running: 11 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.3%, Prefix cache hit rate: 96.9%
13:58:05 Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 11 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.3%, Prefix cache hit rate: 96.9%
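To check a captured log file (for example the `nohup` output) for these records, a small parser can help. The following is a minimal sketch that assumes the field names and order shown in the sample lines above; the function names are hypothetical, and the pattern may need adjusting if another vLLM build formats the line differently.

```python
import re

# Pattern for the stats line format quoted in the issue.
# Assumption: field names and their order match the sample output above.
STATS_RE = re.compile(
    r"Avg prompt throughput: (?P<prompt_tps>[\d.]+) tokens/s, "
    r"Avg generation throughput: (?P<gen_tps>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs, "
    r"Waiting: (?P<waiting>\d+) reqs, "
    r"GPU KV cache usage: (?P<kv_usage_pct>[\d.]+)%, "
    r"Prefix cache hit rate: (?P<prefix_hit_pct>[\d.]+)%"
)


def parse_stats_lines(text):
    """Return one dict per performance-statistics record found in text."""
    records = []
    for match in STATS_RE.finditer(text):
        records.append({
            "prompt_tps": float(match["prompt_tps"]),
            "gen_tps": float(match["gen_tps"]),
            "running": int(match["running"]),
            "waiting": int(match["waiting"]),
            "kv_usage_pct": float(match["kv_usage_pct"]),
            "prefix_hit_pct": float(match["prefix_hit_pct"]),
        })
    return records
```

An empty result for a log that does contain access lines confirms the symptom described here: requests are served, but no stats records are written.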

Start command:

VLLM_ENGINE_READY_TIMEOUT_S=1200 nohup vllm serve /mnt/algorithms/DeepSeek-V4-Flash \
  --port 8000 \
  --served-model-name DEEPSEEK-V4-284B-FLASH-TEST \
  --trust-remote-code \
  --kv-cache-dtype fp8 \
  --block-size 256 \
  --enable-expert-parallel \
  --data-parallel-size 8 \
  --compilation-config '{"cudagraph_mode":"FULL_AND_PIECEWISE","custom_ops":["all"]}' \
  --tokenizer-mode deepseek_v4 \
  --tool-call-parser deepseek_v4 \
  --enable-auto-tool-choice \
  --reasoning-parser deepseek_v4 &

Steps to Reproduce:

  1. Start vLLM API server with deepseek-V4-flash model
  2. Send inference requests to the server
  3. Check the console/log output

Expected Behavior: Periodic performance statistics logs should appear every few seconds showing throughput, GPU KV cache usage, request counts, etc.

Actual Behavior: Only basic HTTP access logs are printed, no performance statistics.

Additional Context: I've tried setting environment variables like VLLM_LOGGING_LEVEL=INFO and VLLM_LOG_STATS_INTERVAL=1, but still only get access logs. Are there specific configurations needed for this version or with the deepseek-V4-flash model to enable performance statistics logging?

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

TL;DR

Check the logging configuration and environment variables to ensure performance statistics logging is enabled.

Guidance

  • Verify that the VLLM_LOG_STATS_INTERVAL environment variable is set correctly and the value is reasonable for the expected log frequency.
  • Check the VLLM_LOGGING_LEVEL environment variable to ensure it is set to a level that includes performance statistics, such as DEBUG or INFO.
  • Review the documentation for the vllm command and the deepseek-V4-flash model to ensure that performance statistics logging is supported and enabled by default.
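  • Confirm that --disable-log-stats is not being passed, either directly or through a wrapper script or config file. Periodic statistics logging is on by default in vllm serve, and there does not appear to be a separate --log-stats flag that turns it on.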
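The checks above can be sketched as a small audit function. This is an illustration only: `audit_stats_logging` is a hypothetical helper, not part of vLLM. It assumes that `--disable-log-stats` is the switch that turns periodic stats off and that `VLLM_LOGGING_LEVEL` and `VLLM_LOG_STATS_INTERVAL` behave as described in the issue.

```python
import shlex


def audit_stats_logging(argv, env):
    """Return findings that could explain missing stats lines (hypothetical helper).

    argv is the launch command as a list of tokens; env is a dict of
    environment variables as seen by the server process.
    """
    findings = []

    # Assumption: this flag disables the periodic stats logger.
    if "--disable-log-stats" in argv:
        findings.append("--disable-log-stats is set")

    # Stats lines are assumed to be emitted at INFO level.
    level = env.get("VLLM_LOGGING_LEVEL", "INFO").upper()
    if level not in ("DEBUG", "INFO"):
        findings.append(f"VLLM_LOGGING_LEVEL={level} filters out INFO messages")

    # The interval should parse as a positive number of seconds.
    raw = env.get("VLLM_LOG_STATS_INTERVAL")
    if raw is not None:
        try:
            interval = float(raw)
        except ValueError:
            findings.append(f"VLLM_LOG_STATS_INTERVAL={raw!r} is not a number")
        else:
            if interval <= 0:
                findings.append("VLLM_LOG_STATS_INTERVAL must be positive")

    return findings


def audit_command(command, env):
    """Convenience wrapper that tokenizes a shell command string first."""
    return audit_stats_logging(shlex.split(command), env)
```

For the launch command in this issue with `VLLM_LOGGING_LEVEL=INFO` and `VLLM_LOG_STATS_INTERVAL=1`, the function returns an empty list, which suggests the cause lies outside these three settings.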

Notes

The issue may be related to the specific version of the vllm API server or the deepseek-V4-flash model, so checking the documentation and release notes for any known issues or configuration changes is recommended.

Recommendation

Apply workaround: Check and adjust the logging configuration and environment variables to enable performance statistics logging, as the issue is likely related to the logging settings rather than a bug in the vllm API server or the model.
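If the settings look correct and the log lines are still missing, a cross-check is to read the server's Prometheus endpoint and see whether the engine is collecting statistics at all. The sketch below assumes the server exposes `/metrics` on the serving port and that counters such as `vllm:generation_tokens_total` are present; metric names can differ between vLLM versions, so treat them as assumptions. The helper names are hypothetical.

```python
import urllib.request


def fetch_metrics(base_url="http://127.0.0.1:8000"):
    """Download the Prometheus text exposition from the running server."""
    with urllib.request.urlopen(base_url + "/metrics", timeout=10) as resp:
        return resp.read().decode("utf-8")


def sum_metric(metrics_text, name):
    """Sum a metric over all label sets; return None if it is absent.

    Assumes each sample line is "<name>{labels} <value>" with no trailing
    timestamp. Summing matters with --data-parallel-size > 1, where each
    engine may report its own series.
    """
    total = 0.0
    found = False
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        metric, _, value = line.rpartition(" ")
        if metric.split("{", 1)[0] == name:
            total += float(value)
            found = True
    return total if found else None


def tokens_per_second(before, after, seconds):
    """Average rate between two counter readings taken seconds apart."""
    return (after - before) / seconds
```

Taking two readings of the generation-token counter a few seconds apart while requests are in flight and passing them to `tokens_per_second` gives a rough throughput figure. A counter that increases while the log stays silent points at the stats logger rather than at stats collection.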
