vllm - 💡 (How to fix) fix(metrics): Prometheus counter crash on negative prompt tokens with external KV transfer [1 participant]

vllm · 2026-04-02 19:26:26

GitHub stats

vllm-project/vllm#38839 · Fetched 2026-04-08 02:34:36

Author: dmvevents

Participants: dmvevents

Error Message

ValueError: Negative prompt token count (raised at vllm/v1/metrics/loggers.py:1316)

Root Cause

When the KV cache arrives externally via NIXL, the prompt tokens are not counted in the decode worker's local metrics. The logger computes prompt_tokens = total_tokens - completion_tokens, and this difference goes negative because the decode worker never processed the prompt tokens: they were handled by the remote prefill worker.
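The failure mode can be reproduced in isolation. The sketch below is a hypothetical model, not vLLM's actual logger: `RequestTokens`, `derived_prompt_tokens`, and `MonotonicCounter` are illustrative names. The counter mimics `prometheus_client.Counter.inc()`, which raises `ValueError` for negative amounts, and the token values are made up so that the locally observed total is smaller than the completion count, the condition described in the Root Cause above.

```python
from dataclasses import dataclass


class MonotonicCounter:
    """Hypothetical stand-in for a Prometheus counter.

    Like prometheus_client.Counter.inc(), it rejects negative increments.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError(
                f"{self.name}: counters can only be incremented by non-negative amounts"
            )
        self.value += amount


@dataclass
class RequestTokens:
    # Hypothetical per-request bookkeeping as seen by a single worker.
    total_tokens: int       # tokens this worker accounted for locally
    completion_tokens: int  # tokens generated for the request


def derived_prompt_tokens(req: RequestTokens) -> int:
    # The unclamped derivation: prompt = total - completion.
    return req.total_tokens - req.completion_tokens


prompt_counter = MonotonicCounter("prompt_tokens_total")

# A worker that processed the prompt itself: the difference is positive.
prompt_counter.inc(derived_prompt_tokens(RequestTokens(total_tokens=612, completion_tokens=100)))

# P/D decode worker: the prompt was processed remotely, so the local total
# can fall below the completion count and the difference goes negative.
decode_side = RequestTokens(total_tokens=96, completion_tokens=100)
try:
    prompt_counter.inc(derived_prompt_tokens(decode_side))
except ValueError as exc:
    print("crash:", exc)
```

In the real engine nothing catches this exception, which is why the decode engine goes down instead of printing a message.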

Original issue

Bug

When using NixlConnector for disaggregated prefill/decode, the Prometheus metrics logger crashes with a negative prompt token count.

Error: vllm/v1/metrics/loggers.py:1316 — ValueError: Negative prompt token count

Environment

vLLM 0.18.0 (also reproduced on 0.16.0)
NixlConnector with kv_buffer_device=cpu
Decode worker receiving KV from remote prefill worker via NIXL LIBFABRIC over EFA RDMA
Cross-accelerator P/D: Trainium prefill → H100 decode
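For reference, the connector settings listed above could be passed to `vllm serve` as the JSON value of the `--kv-transfer-config` flag. This is a sketch: the `kv_role` value `kv_both` is an assumption (the issue does not state the role), and any LIBFABRIC, EFA, or Dynamo-specific settings used in the report are omitted because the issue does not list them.

```python
import json
import shlex

# Connector settings from the Environment list; kv_role is an assumption.
kv_transfer_config = {
    "kv_connector": "NixlConnector",
    "kv_role": "kv_both",
    "kv_buffer_device": "cpu",
}

# Render the value for `vllm serve ... --kv-transfer-config '<json>'`.
flag_value = json.dumps(kv_transfer_config, separators=(",", ":"))
print("--kv-transfer-config", shlex.quote(flag_value))
```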

Reproduction

1. Deploy disaggregated P/D with the Dynamo frontend and NixlConnector.
2. Send 8+ concurrent requests to the frontend.
3. The decode engine crashes after processing ~4 requests.
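Sending the concurrent requests can be scripted with the standard library. In this sketch the URL, the model name, and the payload are placeholders, assuming the frontend exposes an OpenAI-compatible `/v1/completions` route; the `send` parameter is injectable so the fan-out logic can be dry-run without a server.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Placeholders: the route and model name are assumptions, adjust to your deployment.
FRONTEND_URL = "http://localhost:8000/v1/completions"
MODEL = "your-model-name"


def post_completion(prompt: str) -> int:
    """POST one completion request and return the HTTP status code."""
    body = json.dumps({"model": MODEL, "prompt": prompt, "max_tokens": 64}).encode()
    req = urllib.request.Request(
        FRONTEND_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return resp.status


def fire_concurrently(n: int, send: Callable[[str], int] = post_completion) -> list[int]:
    # Submit all n requests at once so they overlap on the decode worker.
    prompts = [f"Request {i}: summarize disaggregated serving." for i in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(send, prompts))
```

Against a running deployment, `fire_concurrently(8)` issues eight overlapping requests; if the bug is present, some of them fail once the decode engine dies.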

Suggested Fix

Clamp the prompt token count with max(0, ...) in the metrics logger, or track externally received prompt tokens separately via the KV connector metadata. The negative value is a bookkeeping artifact of disaggregated serving, not a genuine error.
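The two options differ in what the metric ends up meaning. Clamping keeps the engine alive but records zero prompt tokens on the decode side, while separate tracking keeps both numbers. The sketch below illustrates the second option with hypothetical names: `num_external_prompt_tokens` stands in for whatever count the KV connector metadata would supply and is not an actual vLLM field, and plain dict entries stand in for two Prometheus counters.

```python
from dataclasses import dataclass


@dataclass
class PromptTokenStats:
    # Hypothetical split of a request's prompt tokens by where they were computed.
    num_prompt_tokens: int           # full prompt length of the request
    num_external_prompt_tokens: int  # tokens whose KV arrived via the connector

    @property
    def local_prompt_tokens(self) -> int:
        # Still clamp, so a bookkeeping mismatch can never produce a negative value.
        return max(0, self.num_prompt_tokens - self.num_external_prompt_tokens)


def record(stats: PromptTokenStats, counters: dict[str, int]) -> None:
    # Two separate monotonic totals: locally computed vs. externally received.
    counters["prompt_tokens_local"] += stats.local_prompt_tokens
    counters["prompt_tokens_external"] += stats.num_external_prompt_tokens


counters = {"prompt_tokens_local": 0, "prompt_tokens_external": 0}
record(PromptTokenStats(num_prompt_tokens=512, num_external_prompt_tokens=0), counters)    # local prefill
record(PromptTokenStats(num_prompt_tokens=512, num_external_prompt_tokens=512), counters)  # KV via NIXL
print(counters)
```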

Impact

Kills the decode engine under sustained concurrent load with disaggregated inference. This is a production blocker for any deployment using NixlConnector with Prometheus metrics enabled.

extent analysis

TL;DR

Clamp the prompt token count to a non-negative value in the metrics logger to prevent crashes.

Guidance

Locate the code at vllm/v1/metrics/loggers.py:1316 that raises the ValueError and clamp the prompt token count with max(0, ...) so it can never be negative.
Consider tracking externally received prompt tokens separately via the KV connector metadata, so the metrics stay accurate instead of merely non-negative.
After applying the fix, verify that the decode engine no longer crashes under sustained concurrent load in a disaggregated deployment.
Test the modified metrics logger with a range of token counts, including zero and differences that would otherwise be negative.
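The last guidance item can start from a small table-driven test. It targets a hypothetical `clamp_prompt_tokens` helper that applies the same `max(0, total_tokens - completion_tokens)` expression as the Example section; a real test would exercise the patched logger in vllm/v1/metrics/loggers.py instead.

```python
import unittest


def clamp_prompt_tokens(total_tokens: int, completion_tokens: int) -> int:
    # Hypothetical helper mirroring the suggested fix.
    return max(0, total_tokens - completion_tokens)


class ClampPromptTokensTest(unittest.TestCase):
    CASES = [
        # (total_tokens, completion_tokens, expected_prompt_tokens)
        (612, 100, 512),  # prompt processed locally
        (100, 100, 0),    # difference is exactly zero
        (96, 100, 0),     # P/D decode worker: difference would be negative
        (0, 0, 0),        # empty request
    ]

    def test_table(self) -> None:
        for total, completion, expected in self.CASES:
            with self.subTest(total=total, completion=completion):
                self.assertEqual(clamp_prompt_tokens(total, completion), expected)

    def test_never_negative(self) -> None:
        for total in range(50):
            for completion in range(50):
                self.assertGreaterEqual(clamp_prompt_tokens(total, completion), 0)
```

Saved to a file, it runs with `python -m unittest <file>`.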

Example

def clamped_prompt_tokens(total_tokens: int, completion_tokens: int) -> int:
    # On a P/D decode worker, total - completion can go negative; clamp it to 0.
    return max(0, total_tokens - completion_tokens)

Notes

This fix assumes that the negative prompt token count is a bookkeeping artifact and not an actual error. If the negative count indicates a genuine issue, further investigation may be necessary.
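If a negative value might sometimes point to a genuine bug, a clamp that also records when it fired keeps that signal visible. A minimal sketch using the standard `logging` module; `clamp_with_signal` and the `clamped_events` tally are hypothetical, and in a deployment the tally would more naturally be its own Prometheus counter.

```python
import logging

logger = logging.getLogger("metrics.prompt_tokens")
clamped_events = {"count": 0}  # hypothetical tally of clamped requests


def clamp_with_signal(request_id: str, total_tokens: int, completion_tokens: int) -> int:
    raw = total_tokens - completion_tokens
    if raw < 0:
        # Keep the engine alive, but leave a trace for later investigation.
        clamped_events["count"] += 1
        logger.warning(
            "request %s: negative prompt token count %d clamped to 0", request_id, raw
        )
        return 0
    return raw


logging.basicConfig(level=logging.WARNING)
print(clamp_with_signal("req-1", 612, 100))  # 512
print(clamp_with_signal("req-2", 96, 100))   # 0, and a warning is logged
print(clamped_events["count"])               # 1
```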

Recommendation

Apply the workaround by clamping the prompt token count to a non-negative value, as this is a production blocker for deployments using NixlConnector with Prometheus metrics enabled.
