vllm: fix(metrics): Prometheus counter crash on negative prompt tokens with external KV transfer

vllm-project/vllm#38839, fetched 2026-04-08 02:34:36
Bug

When using NixlConnector for disaggregated prefill/decode, the Prometheus metrics logger crashes with a negative prompt token count.

Error: vllm/v1/metrics/loggers.py:1316 — ValueError: Negative prompt token count

Environment

  • vLLM 0.18.0 (also reproduced on 0.16.0)
  • NixlConnector with kv_buffer_device=cpu
  • Decode worker receiving KV from remote prefill worker via NIXL LIBFABRIC over EFA RDMA
  • Cross-accelerator P/D: Trainium prefill → H100 decode

Reproduction

  1. Deploy disaggregated P/D with Dynamo frontend and NixlConnector
  2. Send 8+ concurrent requests to the frontend
  3. Decode engine crashes after processing ~4 requests
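The concurrent load in step 2 could be generated with something like the sketch below. It is not from the issue: the URL, the model name, and the helper names (`build_payload`, `post_completion`, `fire_concurrent`) are assumptions, and it presumes the frontend accepts OpenAI-style completion requests.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Assumptions: adjust both to match your own deployment.
FRONTEND_URL = "http://localhost:8000/v1/completions"
MODEL_NAME = "my-model"  # placeholder, not a real model id


def build_payload(i: int) -> dict:
    """Build one small completion request."""
    return {
        "model": MODEL_NAME,
        "prompt": f"Request {i}: say hello.",
        "max_tokens": 32,
    }


def post_completion(payload: dict) -> int:
    """POST one request and return the HTTP status code."""
    req = urllib.request.Request(
        FRONTEND_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.status


def fire_concurrent(n: int = 8,
                    send: Callable[[dict], int] = post_completion) -> list:
    """Send n requests at the same time and collect their results."""
    payloads = [build_payload(i) for i in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(send, payloads))
```

Calling `fire_concurrent(8)` against a live deployment sends the eight requests in parallel; the `send` parameter exists only so the driver can be exercised without a network.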

Root Cause

When KV cache arrives externally via NIXL, the prompt tokens are not counted in the decode worker's local metrics. The logger computes prompt_tokens = total_tokens - completion_tokens, which goes negative because the decode worker never saw the prompt tokens — they were processed on the remote prefill worker.
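To make the arithmetic concrete, here is a minimal sketch of how the subtraction can turn into a crash. `MonotonicCounter` is a hypothetical stand-in for a Prometheus-style counter (it is not vLLM's or prometheus_client's class), and the token numbers are illustrative, not taken from the issue.

```python
class MonotonicCounter:
    """Stand-in for a Prometheus-style counter: it can only go up."""

    def __init__(self) -> None:
        self.value = 0

    def inc(self, amount: int) -> None:
        # Counters reject negative increments instead of decreasing.
        if amount < 0:
            raise ValueError("counter increments must be non-negative")
        self.value += amount


def derived_prompt_tokens(total_tokens: int, completion_tokens: int) -> int:
    # The subtraction described in the root cause above.
    return total_tokens - completion_tokens


# Hypothetical decode-worker bookkeeping: the prompt was prefilled remotely,
# so the locally tracked total under-counts relative to the generated tokens.
counter = MonotonicCounter()
delta = derived_prompt_tokens(total_tokens=96, completion_tokens=128)

try:
    counter.inc(delta)
    crashed = False
except ValueError:
    crashed = True  # an unhandled error here would take the logger down
```

With these numbers `delta` is negative, so the increment is rejected and the counter is left unchanged.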

Suggested Fix

Clamp prompt token count to max(0, ...) in the metrics logger, or track externally-received prompt tokens separately via the KV connector metadata. The negative value is a bookkeeping artifact of disaggregated serving, not an actual error.
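The second option, tracking externally received prompt tokens separately, might look roughly like the sketch below. `PromptTokenStats` and its fields are hypothetical names, and the sketch assumes the decode worker can learn from the KV connector metadata how many prompt tokens arrived via transfer; the issue does not confirm that such a field exists.

```python
from dataclasses import dataclass


@dataclass
class PromptTokenStats:
    """Hypothetical per-worker tally of where prompt tokens were processed."""

    local_prompt_tokens: int = 0      # prefilled on this worker
    external_prompt_tokens: int = 0   # KV received from a remote prefill worker

    def record(self, prompt_len: int, external_len: int) -> None:
        # Guard both inputs so neither tally can ever receive a negative delta.
        prompt_len = max(0, prompt_len)
        external = min(max(0, external_len), prompt_len)
        self.external_prompt_tokens += external
        self.local_prompt_tokens += prompt_len - external


stats = PromptTokenStats()
stats.record(prompt_len=100, external_len=100)  # prompt fully prefilled remotely
stats.record(prompt_len=50, external_len=0)     # prompt prefilled locally
stats.record(prompt_len=100, external_len=120)  # inconsistent input, capped at 100
```

Compared with a bare clamp, this keeps the remote prompt tokens visible in metrics instead of silently reporting them as zero.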

Impact

Kills the decode engine under sustained concurrent load with disaggregated inference. This is a production blocker for any deployment using NixlConnector with Prometheus metrics enabled.

extent analysis

TL;DR

Clamp the prompt token count to a non-negative value in the metrics logger to prevent crashes.

Guidance

  • In vllm/v1/metrics/loggers.py:1316, where the ValueError is raised, clamp the prompt token count to max(0, ...) so it can never be negative.
  • Consider tracking externally-received prompt tokens separately via the KV connector metadata to improve accuracy in metrics logging.
  • Verify that the decode engine no longer crashes under sustained concurrent load with disaggregated inference after applying the fix.
  • Test the modified metrics logger with both locally prefilled requests and requests whose KV cache arrives externally, and confirm that neither produces a negative count.
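The testing step in the guidance above could start from a small table of scenarios such as this one. It is only a sketch: `clamp_prompt_tokens` is a hypothetical helper standing in for the patched logger code, and the token counts are made up for illustration.

```python
def clamp_prompt_tokens(total_tokens: int, completion_tokens: int) -> int:
    """Hypothetical helper: derive prompt tokens, never below zero."""
    return max(0, total_tokens - completion_tokens)


# (name, total_tokens, completion_tokens, expected_prompt_tokens)
SCENARIOS = [
    ("local prefill", 160, 32, 128),
    ("prompt fully external", 32, 32, 0),
    ("total under-counts", 96, 128, 0),
    ("empty request", 0, 0, 0),
]


def run_scenarios() -> list:
    """Return a description of every scenario whose result is unexpected."""
    failures = []
    for name, total, completion, expected in SCENARIOS:
        got = clamp_prompt_tokens(total, completion)
        if got != expected:
            failures.append(f"{name}: expected {expected}, got {got}")
    return failures
```

An empty list from `run_scenarios()` means every case produced a non-negative count that a counter would accept.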

Example

# Example of clamping prompt token count to non-negative value
prompt_tokens = max(0, total_tokens - completion_tokens)

Notes

This fix assumes that the negative prompt token count is a bookkeeping artifact and not an actual error. If the negative count indicates a genuine issue, further investigation may be necessary.

Recommendation

Apply the workaround by clamping the prompt token count to a non-negative value, as this is a production blocker for deployments using NixlConnector with Prometheus metrics enabled.
