vllm - ✅(Solved) Fix [Docs] Document NIXL KV connector metrics aggregation semantics [1 pull requests, 3 comments, 3 participants]

GitHub stats
vllm-project/vllm#41230 (fetched 2026-04-30 06:19:26)
Comments: 3 · Participants: 3 · Timeline events: 12 · Reactions: 1
Timeline (top): commented ×3, mentioned ×3, subscribed ×3, assigned ×1

The NIXL KV connector logs transfer metrics periodically:

KV Transfer metrics: Num successful transfers=4, Avg xfer time (ms)=1.381, P90 xfer time (ms)=2.601, Avg post time (ms)=0.672, P90 post time (ms)=0.801, Avg MB per transfer=2.25, Throughput (MB/s)=1629.549, Avg number of descriptors=72.0

Currently there is no documentation explaining what these metrics represent, especially in the context of multi-rank (TP > 1) deployments. This has already caused confusion among users.
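
For readers who want to inspect these values programmatically, the log line can be split into name/value pairs. The following is a minimal sketch, assuming the line keeps the exact `name=value, name=value` layout shown above; `parse_kv_transfer_metrics` is a hypothetical helper written for this page, not part of vLLM.

```python
SAMPLE = (
    "KV Transfer metrics: Num successful transfers=4, "
    "Avg xfer time (ms)=1.381, P90 xfer time (ms)=2.601, "
    "Avg post time (ms)=0.672, P90 post time (ms)=0.801, "
    "Avg MB per transfer=2.25, Throughput (MB/s)=1629.549, "
    "Avg number of descriptors=72.0"
)


def parse_kv_transfer_metrics(line: str) -> dict[str, float]:
    """Hypothetical helper: map each metric name in the log line to a float."""
    _, sep, body = line.partition("KV Transfer metrics:")
    if not sep:
        raise ValueError("not a KV Transfer metrics line")
    metrics = {}
    for item in body.split(", "):
        # Split on the last '=' so names containing '(' or '/' stay intact.
        name, _, value = item.rpartition("=")
        metrics[name.strip()] = float(value)
    return metrics


parsed = parse_kv_transfer_metrics(SAMPLE)
print(parsed["Throughput (MB/s)"])
```

The parsed values are still the rank-aggregated numbers; parsing does not change how they should be interpreted.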

Root Cause

All metrics are aggregated across all TP ranks before summary stats are computed. This is unintuitive because users may expect metrics to reflect per-engine totals or aggregate system throughput.

Fix Action

Fixed

PR fix notes

PR #41259: docs: clarify NIXL KV transfer metrics aggregation

Description (problem / solution / changelog)

Purpose

Fixes #41230.

This PR documents the aggregation semantics for NIXL KV connector transfer metrics. In TP > 1 deployments, NIXL transfer observations are recorded per rank and then aggregated before summary stats are computed. The updated docs and docstrings clarify that:

  • Num successful transfers is the total count across rank-level transfers.
  • Averages and P90 values are computed over the combined rank-level observation pool.
  • Avg MB per transfer is per rank-level transfer, not per engine-level KV operation.
  • Throughput (MB/s) is total MB divided by summed rank-level transfer time, not aggregate system throughput over wall-clock time.
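
The throughput point can be made concrete with a small worked example. This is a sketch under stated assumptions: the four transfers and their sizes echo the sample log line, but treating them as four concurrent rank-level transfers (and the wall-clock comparison itself) is an illustration, not something the log line reports.

```python
# Illustrative input: 4 rank-level transfers, each 2.25 MB in 1.381 ms.
# ASSUMPTION: the four transfers ran concurrently, one per TP rank.
transfers = [(2.25, 1.381)] * 4  # (megabytes, duration in ms)

total_mb = sum(mb for mb, _ in transfers)
summed_time_s = sum(ms for _, ms in transfers) / 1000.0

# Logged-style value: total MB divided by the SUM of rank-level times.
logged_style_throughput = total_mb / summed_time_s

# Wall-clock view a user might expect, valid only under the
# concurrency assumption above: total MB over the longest transfer.
wall_clock_s = max(ms for _, ms in transfers) / 1000.0
wall_clock_throughput = total_mb / wall_clock_s

print(round(logged_style_throughput, 1))
print(round(wall_clock_throughput, 1))
```

With these inputs the logged-style value lands near the 1629.549 MB/s in the sample line (the small gap comes from the rounded average time), while the hypothetical wall-clock figure is four times larger. The logged number is therefore closer to a per-rank rate than to a system total.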

I checked for duplicate open PRs using GitHub search and did not find an existing PR addressing #41230.

AI assistance was used to draft and apply this documentation update. I reviewed the changed files.

Test Plan

pre-commit run ruff-check --files vllm/distributed/kv_transfer/kv_connector/v1/metrics.py vllm/distributed/kv_transfer/kv_connector/v1/nixl/stats.py
pre-commit run markdownlint-cli2 --files docs/usage/metrics.md

## Changed files

- `docs/usage/metrics.md` (modified, +24/-0)
- `vllm/distributed/kv_transfer/kv_connector/v1/metrics.py` (modified, +25/-0)
- `vllm/distributed/kv_transfer/kv_connector/v1/nixl/stats.py` (modified, +25/-2)

Current behavior

All metrics are aggregated across all TP ranks before summary stats are computed:

  1. Each TP rank independently records per-transfer telemetry (transfer_duration, post_duration, bytes_transferred, num_descriptors) via NixlKVConnectorStats.record_transfer() in stats.py.
  2. Stats from all ranks are concatenated via aggregate() (list.extend()).
  3. reduce() computes averages, percentiles, and throughput over the combined pool of observations from all ranks.
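
The three steps above can be sketched as follows. This is a simplified illustration, not the vLLM implementation: `SketchStats` is a hypothetical class, and the units (seconds for durations, MB as 2**20 bytes) and the returned key names are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SketchStats:
    """Hypothetical stand-in for one rank's transfer telemetry."""

    transfer_duration: list = field(default_factory=list)  # seconds (assumed)
    post_duration: list = field(default_factory=list)      # seconds (assumed)
    bytes_transferred: list = field(default_factory=list)
    num_descriptors: list = field(default_factory=list)

    def record_transfer(self, xfer_s, post_s, nbytes, ndesc):
        # Step 1: each rank appends one observation per transfer.
        self.transfer_duration.append(xfer_s)
        self.post_duration.append(post_s)
        self.bytes_transferred.append(nbytes)
        self.num_descriptors.append(ndesc)

    def aggregate(self, other):
        # Step 2: concatenate another rank's observations via list.extend().
        self.transfer_duration.extend(other.transfer_duration)
        self.post_duration.extend(other.post_duration)
        self.bytes_transferred.extend(other.bytes_transferred)
        self.num_descriptors.extend(other.num_descriptors)
        return self

    def reduce(self):
        # Step 3: summarize the combined pool, not any single rank.
        n = len(self.transfer_duration)
        if n == 0:
            return {"num_transfers": 0}
        total_mb = sum(self.bytes_transferred) / 2**20  # MB = 2**20 bytes (assumed)
        total_time_s = sum(self.transfer_duration)
        return {
            "num_transfers": n,
            "avg_xfer_ms": total_time_s / n * 1e3,
            "avg_mb_per_transfer": total_mb / n,
            "throughput_mb_s": total_mb / total_time_s,
            "avg_descriptors": sum(self.num_descriptors) / n,
        }


rank0, rank1 = SketchStats(), SketchStats()
for _ in range(2):
    rank0.record_transfer(0.001, 0.0005, 2 * 2**20, 64)
    rank1.record_transfer(0.003, 0.0005, 2 * 2**20, 80)

summary = rank0.aggregate(rank1).reduce()
print(summary)
```

In this toy run, two ranks with two transfers each produce a count of four, and every average is taken over those four pooled observations.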

This means:

  • "Num successful transfers" is the total count across all ranks, not per-rank.
  • "Avg MB per transfer" is the average over all individual rank-level transfers, not the total bytes moved for a single KV cache transfer operation.
  • "Throughput (MB/s)" is total_MB_all_ranks / total_time_all_ranks — effectively an average per-rank throughput, not the aggregate system throughput.
  • Percentiles (P90) are computed over the combined distribution of all ranks' transfer times.
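
The percentile point is easy to miss, so here is a small sketch of how a pooled P90 can differ from a per-rank P90. The timings are invented for illustration, and `nearest_rank_percentile` is a hypothetical helper using the nearest-rank method; the actual code may use a different percentile definition.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Invented transfer times in ms; rank1 has one slow outlier.
per_rank_ms = {
    "rank0": [1.0, 1.1, 1.2, 1.3, 1.4],
    "rank1": [1.0, 1.1, 1.2, 1.3, 5.0],
}

# Pool every rank's observations, as the aggregated metrics do.
pooled = [t for times in per_rank_ms.values() for t in times]

num_transfers = len(pooled)
pooled_p90 = nearest_rank_percentile(pooled, 90)
per_rank_p90 = {
    rank: nearest_rank_percentile(times, 90)
    for rank, times in per_rank_ms.items()
}

print(num_transfers, pooled_p90, per_rank_p90)
```

Here the pooled P90 stays low even though one rank's own P90 is dominated by its slow transfer, so a healthy-looking pooled P90 does not rule out a single slow rank.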

This is unintuitive because users may expect metrics to reflect per-engine totals or aggregate system throughput.

What needs to be documented

  1. Docstrings in stats.py: Add clear documentation to NixlKVConnectorStats explaining that stats are aggregated across all TP ranks and what each metric represents in that context.
  2. Inline comments in reduce(): Clarify the semantics of throughput and averages — that they are per-rank averages over the combined observation pool.
  3. Docstrings in metrics.py: Document the observe() → aggregate() → reduce() → log() pipeline and the fact that stats arrive pre-aggregated across workers.
  4. (Optional) Docs page: Add a section to the disaggregated serving documentation explaining how to interpret the KV Transfer metrics log line.

Relevant files

  • vllm/distributed/kv_transfer/kv_connector/v1/nixl/stats.py: NixlKVConnectorStats (recording, aggregation, reduction)
  • vllm/distributed/kv_transfer/kv_connector/v1/metrics.py: KVConnectorLogging (observe/log pipeline), KVConnectorStats (base class)

Context

See related discussion: metrics are aggregated across ranks rather than reported per-rank or per-engine. This is a deliberate design choice (fire-and-forget from workers), but it needs to be clearly documented so users can correctly interpret the numbers.

extent analysis

TL;DR

To address the confusion around KV Transfer metrics, documentation should be added to explain that metrics are aggregated across all TP ranks.

Guidance

  • Add docstrings to NixlKVConnectorStats in stats.py to clarify the aggregation of metrics across TP ranks.
  • Include inline comments in reduce() to explain the semantics of throughput and averages.
  • Document the observe() → aggregate() → reduce() → log() pipeline in metrics.py to provide context on how stats are collected and reported.
  • Consider adding a section to the documentation explaining how to interpret the KV Transfer metrics log line.

Example

No code snippet is provided as the issue focuses on documentation rather than code changes.

Notes

The current implementation is a deliberate design choice, but clear documentation is necessary to avoid user confusion.

Recommendation

Apply workaround: Add documentation to explain the aggregation of metrics across TP ranks, as this will help users correctly interpret the numbers without requiring changes to the existing implementation.
