vllm - 💡(How to fix) Fix [Bug]: VLLM running qwen3.6 for image inference occasionally reports 500 Internal Server Error [4 comments, 2 participants]

Error Message

What causes this 500 error, and how can I output the detailed reason for it? The error log is as follows:

(APIServer pid=6051) INFO: 10.10.7.200:43364 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=6051) INFO: 10.10.7.200:43364 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=6051) INFO: 10.10.7.200:43364 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=6051) INFO: 10.10.7.200:43364 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=6051) INFO: 10.10.7.200:60696 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=6051) INFO: 10.10.7.200:60696 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error

Your current environment

VLLM version: 0.16.0rc2.dev496+g4a9c07a0a
System environment: Ubuntu 22.04.5 LTS
Graphics card type: H200 series

🐛 Describe the bug

Service startup command:

VLLM_LOGGING_LEVEL=DEBUG vllm serve /sharedFile/jiuding/models/models/Qwen/Qwen3___6-35B-A3B \
  --port 6008 \
  --tensor-parallel-size 8 \
  --max-model-len 262144 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --enable-log-requests \
  --trust-remote-code \
  --served-model-name JinQuan2604

The error log is as follows:

(APIServer pid=6051) DEBUG 04-25 02:26:25 [v1/metrics/loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 98.6%
(APIServer pid=6051) DEBUG 04-25 02:26:35 [v1/metrics/loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 98.6%
(APIServer pid=6051) INFO: 10.10.7.200:43364 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=6051) INFO: 10.10.7.200:43364 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=6051) DEBUG 04-25 02:26:45 [v1/metrics/loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 98.6%
(APIServer pid=6051) INFO: 10.10.7.200:43364 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=6051) DEBUG 04-25 02:26:55 [v1/metrics/loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 98.6%
(APIServer pid=6051) INFO: 10.10.7.200:43364 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=6051) DEBUG 04-25 02:27:05 [v1/metrics/loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 98.6%
(APIServer pid=6051) DEBUG 04-25 02:27:15 [v1/metrics/loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 98.6%
(APIServer pid=6051) DEBUG 04-25 02:27:25 [v1/metrics/loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 98.6%
(APIServer pid=6051) INFO: 10.10.7.200:60696 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=6051) DEBUG 04-25 02:27:35 [v1/metrics/loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 98.6%
(APIServer pid=6051) INFO: 10.10.7.200:60696 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=6051) DEBUG 04-25 02:27:45 [v1/metrics/loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 98.6%
(APIServer pid=6051) DEBUG 04-25 02:27:55 [v1/metrics/loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 98.6%

extent analysis

TL;DR

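The log excerpt shows only access-log lines for the failed requests and periodic metrics lines, with no traceback or error message, so the root cause of the 500 Internal Server Error cannot be determined from it. The server was already started with VLLM_LOGGING_LEVEL=DEBUG, and that alone did not surface the underlying exception.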

Guidance

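Check the complete vLLM server output around each failed request for ERROR lines or Python tracebacks; the excerpt in the issue contains only metrics and access-log lines.
VLLM_LOGGING_LEVEL=DEBUG is already set in the startup command, and the DEBUG metrics lines in the log show that it is taking effect.
TRACE is not a standard Python logging level, so raising the level beyond DEBUG is unlikely to help. Instead, inspect the HTTP response body returned with the 500 status on the client side, which may contain a more specific error message.
Review the startup command and the payloads of the failing requests, especially the image inputs, since the error occurs only occasionally during image inference, and compare them with requests that succeed.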
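
One way to act on the guidance above is to make the client keep the body of a failed response instead of discarding it. The following is a minimal sketch using only the Python standard library. The helper names `summarize_error_body` and `post_chat` are hypothetical, and the JSON shapes it looks for (an "error" object with a "message" field, or a top-level "message") are an assumption about what the server might return, not something confirmed by the issue.

```python
import json
import urllib.error
import urllib.request


def summarize_error_body(status: int, body: bytes) -> str:
    """Turn an HTTP error status and raw body into a one-line summary."""
    text = body.decode("utf-8", errors="replace").strip()
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        # Not JSON, for example a plain "Internal Server Error" page.
        return f"HTTP {status}: {text or '<empty body>'}"
    # Assumption: JSON error bodies carry the message either under
    # "error" -> "message" or under a top-level "message" key.
    if isinstance(payload, dict):
        error = payload.get("error", payload)
        if isinstance(error, dict) and "message" in error:
            return f"HTTP {status}: {error['message']}"
    return f"HTTP {status}: {text}"


def post_chat(url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST a chat completion request; on failure, raise with the server's error text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read())
    except urllib.error.HTTPError as exc:
        # HTTPError is file-like, so read() returns the response body.
        raise RuntimeError(summarize_error_body(exc.code, exc.read())) from exc
```

With the startup command from the issue, the URL would be the server's /v1/chat/completions endpoint on port 6008 and the "model" field would be JinQuan2604. Logging the summary together with the request payload for each failure makes it easier to see whether particular images trigger the error.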

Example

The issue does not include a failing request or a traceback, so there is not enough information to build an example that reproduces the error.

Notes

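The provided log excerpt contains only periodic metrics lines and access-log lines for the failed requests; none of them states why the requests failed. The metrics lines report Running: 0 reqs and Waiting: 0 reqs throughout, which suggests, but does not prove, that the failing requests never reached the engine and were rejected earlier in the API server. Additional logging or debugging is needed to identify the root cause.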
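
To check a longer server log for anything other than these two kinds of lines, a small triage script can help. The sketch below assumes the log has one entry per line and uses patterns derived only from the excerpt in the issue; the function name `triage` is hypothetical.

```python
import re
from collections import Counter

# Patterns are based on the two line shapes visible in the issue's log excerpt.
ACCESS_500 = re.compile(
    r'INFO:\s+(\S+) - "POST (\S+) HTTP/[\d.]+" 500 Internal Server Error'
)
METRICS = re.compile(r"\[v1/metrics/loggers\.py:\d+\]")


def triage(lines):
    """Count 500 responses per client address and collect every line that is
    neither an access-log 500 entry nor a periodic metrics line."""
    failures = Counter()
    other = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = ACCESS_500.search(line)
        if match:
            failures[match.group(1)] += 1
        elif not METRICS.search(line):
            # Candidate for a traceback, warning, or other diagnostic output.
            other.append(line)
    return failures, other
```

Passing an open log file to `triage` returns the per-client failure counts and the remaining lines. An empty remainder would confirm that the captured log carries no traceback, in which case the server's stderr or the client-side response body is the next place to look.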

Recommendation

Apply workaround: DEBUG logging is already enabled, so rather than raising the log level further, capture the HTTP response body of a failing request on the client side and search the complete server log for a traceback near the time of the failure. Either may reveal a more detailed error message and help identify the root cause.
