vllm - 💡(How to fix) Fix [Bug]: vLLM only prints access logs, not performance statistics logs (v0.1.dev15830+g8d599d76a with deepseek-V4-flash) [2 comments, 2 participants]

vllm · 2026-04-28 02:48:19

GitHub stats

vllm-project/vllm#41081 • Fetched 2026-04-29 06:12:30

Author

QiyaoHuang

Participants

njhill

QiyaoHuang

Timeline (top)

commented ×2, mentioned ×2, subscribed ×2, labeled ×1

Your current environment

Environment Information:

• vLLM version: 0.1.dev15830+g8d599d76a

• Model: deepseek-V4-flash

• CUDA version: cu129

• GPU: H800

🐛 Describe the bug

Description:
When I run the vLLM API server with version 0.1.dev15830+g8d599d76a and the deepseek-V4-flash model, the logs show only basic access logs such as:

INFO: 127.0.0.1:32768 0 "POST v1/completions HTTP/1.1" 200 OK

But I'm not seeing the periodic performance statistics logs that include throughput, GPU KV cache usage, etc., for example:

13:57:45 Avg prompt throughput: 2598.6 tokens/s, Avg generation throughput: 1684.2 tokens/s, Running: 2 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.3%, Prefix cache hit rate: 96.9%
13:57:55 Avg prompt throughput: 83.6 tokens/s, Avg generation throughput: 1.8 tokens/s, Running: 11 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.3%, Prefix cache hit rate: 96.9%
13:58:05 Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 11 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.3%, Prefix cache hit rate: 96.9%
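To check a captured log file for these entries without reading it by eye, a small parser can help. The following is a minimal sketch, assuming the statistics lines look exactly like the samples quoted above; the function name `parse_stats` and the dictionary keys are illustrative choices, not part of vLLM, and other vLLM versions may format the line differently.

```python
import re

# Pattern modeled on the sample lines quoted in this report (an assumption:
# other vLLM versions may word or order these fields differently).
STATS_RE = re.compile(
    r"Avg prompt throughput: (?P<prompt_tps>[\d.]+) tokens/s, "
    r"Avg generation throughput: (?P<gen_tps>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs, "
    r"Waiting: (?P<waiting>\d+) reqs, "
    r"GPU KV cache usage: (?P<kv_cache_pct>[\d.]+)%, "
    r"Prefix cache hit rate: (?P<prefix_hit_pct>[\d.]+)%"
)


def parse_stats(text):
    """Return one dict per statistics entry found anywhere in `text`.

    Uses finditer, so it works whether entries are on separate lines
    or run together on a single line.
    """
    entries = []
    for match in STATS_RE.finditer(text):
        fields = match.groupdict()
        entries.append({
            "prompt_tps": float(fields["prompt_tps"]),
            "gen_tps": float(fields["gen_tps"]),
            "running": int(fields["running"]),
            "waiting": int(fields["waiting"]),
            "kv_cache_pct": float(fields["kv_cache_pct"]),
            "prefix_hit_pct": float(fields["prefix_hit_pct"]),
        })
    return entries
```

Calling `parse_stats(open("nohup.out").read())` on the server's output file (the file name is whatever `nohup` wrote to in your setup) returns an empty list when only access logs are present, which is the symptom described here.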

Start command:

VLLM_ENGINE_READY_TIMEOUT_S=1200 nohup vllm serve /mnt/algorithms/DeepSeek-V4-Flash \
  --port 8000 \
  --served-model-name DEEPSEEK-V4-284B-FLASH-TEST \
  --trust-remote-code \
  --kv-cache-dtype fp8 \
  --block-size 256 \
  --enable-expert-parallel \
  --data-parallel-size 8 \
  --compilation-config '{"cudagraph_mode":"FULL_AND_PIECEWISE","custom_ops":["all"]}' \
  --tokenizer-mode deepseek_v4 \
  --tool-call-parser deepseek_v4 \
  --enable-auto-tool-choice \
  --reasoning-parser deepseek_v4 &

Steps to Reproduce:

1. Start the vLLM API server with the deepseek-V4-flash model.
2. Send inference requests to the server.
3. Check the console/log output.

Expected Behavior: Periodic performance statistics logs should appear every few seconds showing throughput, GPU KV cache usage, request counts, etc.

Actual Behavior: Only basic HTTP access logs are printed, no performance statistics.

Additional Context: I've tried setting environment variables like VLLM_LOGGING_LEVEL=INFO and VLLM_LOG_STATS_INTERVAL=1, but still only get access logs. Are there specific configurations needed for this version or with the deepseek-V4-flash model to enable performance statistics logging?
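Before digging further, it can be worth ruling out the launch settings mechanically. The sketch below is a hedged illustration, not a vLLM utility: `check_launch` is a hypothetical helper, and it assumes that `--disable-log-stats` is the option that turns periodic statistics off and that the statistics lines are emitted at INFO level. Confirm both against `vllm serve --help` and the documentation for your build.

```python
import shlex

# Logging levels above INFO, which would hide INFO-level messages.
SUPPRESSING_LEVELS = {"WARNING", "ERROR", "CRITICAL"}


def check_launch(command, env):
    """Return a list of findings that could explain missing stats logs.

    `command` is the shell command used to start the server and `env`
    is a dict of the environment variables the server process sees.
    """
    findings = []
    args = shlex.split(command)

    # Assumption: this flag disables periodic statistics logging.
    if "--disable-log-stats" in args:
        findings.append("--disable-log-stats is present on the command line")

    level = env.get("VLLM_LOGGING_LEVEL", "INFO").upper()
    if level in SUPPRESSING_LEVELS:
        findings.append(f"VLLM_LOGGING_LEVEL={level} hides INFO-level messages")

    interval = env.get("VLLM_LOG_STATS_INTERVAL")
    if interval is not None:
        try:
            if float(interval) <= 0:
                findings.append("VLLM_LOG_STATS_INTERVAL is not positive")
        except ValueError:
            findings.append("VLLM_LOG_STATS_INTERVAL is not a number")

    return findings
```

For the command and variables given in this report the function returns an empty list, which suggests the cause lies elsewhere than in these three settings.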

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

TL;DR

Check the logging configuration and environment variables to ensure performance statistics logging is enabled.

Guidance

Verify that VLLM_LOG_STATS_INTERVAL is set in the environment of the server process itself and holds a positive number of seconds. The reporter already tried VLLM_LOG_STATS_INTERVAL=1 without effect, so this setting alone does not explain the missing logs.
Check that VLLM_LOGGING_LEVEL is INFO or DEBUG; a higher level such as WARNING would hide INFO-level output. The reporter already tried INFO.
Review the vLLM documentation and release notes for this development build to confirm whether periodic statistics logging is supported for the configuration in the start command, which uses --data-parallel-size 8 and --enable-expert-parallel.
Confirm that --disable-log-stats is not being passed to vllm serve; it is absent from the start command above. Statistics logging is on by default, and we are not aware of a --log-stats flag that would need to be added.
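As a cross-check that does not depend on console logging, the server's Prometheus endpoint can be polled for the same kind of numbers. The sketch below assumes the server exposes metrics at `/metrics` on port 8000 (the port from the start command) and that vLLM metric names begin with `vllm:`; the helper names `vllm_metrics` and `fetch_metrics` are illustrative, and the exact metric names vary between versions, so none are hard-coded.

```python
from urllib.request import urlopen


def vllm_metrics(text, prefix="vllm:"):
    """Parse Prometheus text exposition into {series: value}.

    Keeps only series whose name starts with `prefix`. Assumes samples
    carry no trailing timestamp, so the last field is the value.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, HELP/TYPE comments, and non-vLLM series.
        if not line or line.startswith("#") or not line.startswith(prefix):
            continue
        series, _, value = line.rpartition(" ")
        try:
            metrics[series] = float(value)
        except ValueError:
            continue  # Ignore lines whose last field is not numeric.
    return metrics


def fetch_metrics(url="http://localhost:8000/metrics"):
    """Fetch and parse metrics from a running server (URL is an assumption)."""
    with urlopen(url, timeout=5) as response:
        return vllm_metrics(response.read().decode("utf-8"))
```

If `fetch_metrics()` returns a populated dictionary while the console stays silent, the engine is collecting statistics and only the periodic log output is missing, which narrows the problem to the logging path.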

Notes

The issue may be related to the specific version of the vllm API server or the deepseek-V4-flash model, so checking the documentation and release notes for any known issues or configuration changes is recommended.

Recommendation

Apply workaround: Check and adjust the logging configuration and environment variables to enable performance statistics logging, as the issue is likely related to the logging settings rather than a bug in the vllm API server or the model.
