vllm - ✅(Solved) Fix [RFC]: Logprobs/Logits Semantics and Determinism Across the vLLM Ecosystem [3 pull requests, 1 participants]

GitHub stats
vllm-project/vllm#42259 (fetched 2026-05-11 03:13:38)
Comments: 0 · Participants: 1 · Timeline events: 2 (cross-referenced ×2) · Reactions: 0

PR fix notes

PR #36539: Fix prompt_logprobs to respect logprobs_mode

Description (problem / solution / changelog)

Summary

  • make prompt-side score computation mode-aware in V1 GPU model runner
  • use logits for *_logits modes and log-softmax for *_logprobs modes
  • add regression test ensuring prompt values differ between raw_logits and raw_logprobs
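
The mode-aware selection described above can be sketched in plain Python. This is a minimal illustration, not the code in mode_utils.py: the function names `log_softmax` and `prompt_scores` are hypothetical, and the suffix check assumes mode names follow the `*_logits` / `*_logprobs` pattern used in this summary.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum_exp for x in logits]


def prompt_scores(logits, logprobs_mode):
    """Return prompt-side scores for one position (hypothetical helper).

    *_logits modes pass the logits through unchanged;
    *_logprobs modes apply log-softmax.
    """
    if logprobs_mode.endswith("_logits"):
        return list(logits)
    if logprobs_mode.endswith("_logprobs"):
        return log_softmax(logits)
    raise ValueError(f"unrecognized logprobs_mode: {logprobs_mode!r}")
```

With this shape, the same position yields different values under raw_logits and raw_logprobs, which is the property the regression test checks.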

Why

Issue #35832 reports that prompt_logprobs ignores logprobs_mode. I could reproduce on both v0.16.0 and v0.17.0 before this patch.

Local verification

  • reproduced before fix using local script (scripts/repro_vllm_35832.py)
  • after the patch, prompt values differ by mode (logits vs logprobs): for example, the same token scored 7.78125 in logits mode and -6.6573 in logprobs mode

Notes

  • This patch fixes the mode-selection bug for prompt-side returned values.
  • Option 2 workaround (temporary): until merged/released, users can avoid relying on prompt_logprobs mode semantics and compute prompt scoring externally when exact raw-vs-processed interpretation is required.
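
As one possible reading of the external-scoring workaround, the sketch below computes prompt log-probabilities from logits the caller has obtained by some other means. The function name `external_prompt_logprobs` and the input layout (one logits vector per predicted position) are assumptions made for illustration. Returning `None` for the first token, which has no preceding context, is also a choice of this sketch.

```python
import math


def external_prompt_logprobs(step_logits, prompt_token_ids):
    """Score each prompt token under the distribution that predicted it.

    Assumption: step_logits[i] is the logits vector used to predict
    prompt_token_ids[i + 1]. The first token gets None.
    """
    scores = [None]
    for logits, next_id in zip(step_logits, prompt_token_ids[1:]):
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Log-probability of the token that actually appears next.
        scores.append(logits[next_id] - log_sum_exp)
    return scores
```

Because the caller controls whether log-softmax is applied, the raw-vs-processed interpretation is explicit rather than dependent on the server-side mode.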

Fixes #35832

Changed files

  • tests/v1/sample/test_sampling_params_e2e.py (modified, +21/-0)
  • tests/v1/worker/gpu/test_mode_utils.py (added, +36/-0)
  • vllm/v1/worker/gpu/sample/mode_utils.py (added, +17/-0)
  • vllm/v1/worker/gpu_model_runner.py (modified, +9/-3)

PR #42245: [Bugfix] Fix prompt_logprobs non-determinism with prefix caching (issue #42019)

Description (problem / solution / changelog)

Summary

Fixes #42019: prompt_logprobs values differ depending on request order when prefix caching is enabled.

Root cause: LogprobsTensors.empty_cpu() allocates tensors with torch.empty (uninitialized memory). When a prefix cache hit covers N tokens, positions [0:N] are never written by the current request — they retain stale values from a previous request's computation. This makes prompt_logprobs non-deterministic with respect to request ordering.

Fix: Replace torch.empty / torch.empty_like with torch.zeros / torch.zeros_like in LogprobsTensors.empty_cpu(). Unwritten positions are now always zero, making results order-independent.
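
The effect can be simulated without PyTorch. In the sketch below, which is an illustration and not vLLM code, `stale_memory` stands in for whatever a previous request left in the allocation, and `run_request` is a hypothetical name. Only the uncached positions are written, as in the root cause described above.

```python
def run_request(stale_memory, new_values, num_cached_tokens, zero_init):
    """Simulate filling a prompt-logprobs buffer after a prefix cache hit.

    zero_init=False mimics an uninitialized allocation: the buffer starts
    with whatever a previous request left behind.
    zero_init=True mimics a zero-filled allocation.
    """
    size = num_cached_tokens + len(new_values)
    buffer = [0.0] * size if zero_init else list(stale_memory[:size])
    # Positions [0:num_cached_tokens] are never written by this request.
    buffer[num_cached_tokens:size] = new_values
    return buffer


# Same request, different leftovers from earlier requests.
after_request_a = run_request([-1.5, -2.5, -3.5, -4.5], [-0.1, -0.2], 2, zero_init=False)
after_request_b = run_request([-7.0, -8.0, -9.0, -10.0], [-0.1, -0.2], 2, zero_init=False)
zeroed_a = run_request([-1.5, -2.5, -3.5, -4.5], [-0.1, -0.2], 2, zero_init=True)
zeroed_b = run_request([-7.0, -8.0, -9.0, -10.0], [-0.1, -0.2], 2, zero_init=True)
```

Here `after_request_a` and `after_request_b` disagree on the cached positions, while `zeroed_a` and `zeroed_b` are identical regardless of what ran before.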

This is distinct from #41411, which fixed a different bug (chunked prefill skipping the last prompt token). The torch.empty uninitialized-memory issue remains in main after that merge.

Changes

  • vllm/v1/outputs.py: LogprobsTensors.empty_cpu(), a 3-line change (empty → zeros)
  • tests/v1/test_prompt_logprobs_prefix_cache.py: regression test that submits the same prompts in two different orders and asserts prompt_logprobs are bit-identical, for both enable_prefix_caching=True and False
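
The shape of such an order-independence check can be sketched generically. This is not the test added in the PR: `order_mismatches` is a hypothetical helper, and `score_batch` is an assumed callable that takes a list of unique prompts and returns one list of scores per prompt, in the same order.

```python
def order_mismatches(score_batch, prompts):
    """Score prompts in forward and reversed order; return prompts whose
    scores differ between the two runs (empty list means order-independent).

    Assumption: prompts are unique, and score_batch(prompts) returns a list
    of per-prompt score lists aligned with its input.
    """
    forward = dict(zip(prompts, score_batch(list(prompts))))
    reversed_prompts = list(reversed(prompts))
    backward = dict(zip(reversed_prompts, score_batch(reversed_prompts)))
    # Exact equality mirrors the bit-identical requirement.
    return [p for p in prompts if forward[p] != backward[p]]
```

A real test would run this once with prefix caching enabled and once with it disabled, and expect an empty result in both cases.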

Test Plan

pytest tests/v1/test_prompt_logprobs_prefix_cache.py -v

Note: Local environment constraints prevented running the test (precompiled .so mismatch). The test is included for CI and reviewer verification.

AI Assistance

This PR was developed with AI assistance (Claude). All changed lines have been reviewed by the human submitter.

Changed files

  • tests/v1/test_prompt_logprobs_prefix_cache.py (added, +75/-0)
  • vllm/v1/outputs.py (modified, +3/-3)

PR #36746: [Bugfix] Deduplicate sampled token in top_logprobs output

Description (problem / solution / changelog)

Purpose

Fixes #36660

When the sampled token is already among the top-k logprobs (common for greedy or near-greedy decoding), compute_topk_logprobs() returns the same token twice - once as the sampled token (column 0) and once in the top-k list. This causes duplicate entries in the API response's top_logprobs field.

Root Cause

In vllm/v1/worker/gpu/sample/logprob.py, lines 103-106:

logprob_token_ids = sampled_token_ids.unsqueeze(-1)  # [batch, 1]
if num_logprobs > 0:
    topk_indices = torch.topk(logits, num_logprobs, dim=-1).indices
    logprob_token_ids = torch.cat((logprob_token_ids, topk_indices), dim=1)

The sampled token is prepended, then topk tokens are appended without checking for overlap.

Fix

  • Request num_logprobs + 1 from torch.topk (one extra to account for potential duplicate)
  • Identify the first occurrence of the sampled token in the top-k results
  • Remove it using a stable sort that preserves the original ranking order
  • Slice to exactly num_logprobs entries

This handles both cases correctly:

  • Sampled token IS in top-k: removed from top-k, leaving exactly num_logprobs unique additional entries
  • Sampled token is NOT in top-k: no removal needed, the extra entry is simply sliced off
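
The four steps can be illustrated on plain Python lists. This is a simplified sketch of the idea, not the tensor implementation in logprob.py, which operates on batches with topk, gather, and sort. The function name `topk_with_sampled_first` is hypothetical.

```python
def topk_with_sampled_first(logits, sampled_token_id, num_logprobs):
    """Return token ids: the sampled token first, then up to num_logprobs
    other tokens in descending-logit order, with no duplicate of the
    sampled token.
    """
    # Python's sort is stable, so ties keep their original index order.
    ranked = sorted(range(len(logits)), key=lambda i: -logits[i])
    # Take one extra candidate in case the sampled token is among them.
    candidates = ranked[: num_logprobs + 1]
    # Drop the sampled token if present; ranking order is preserved.
    deduped = [tok for tok in candidates if tok != sampled_token_id]
    # Slice back down to exactly num_logprobs additional entries.
    return [sampled_token_id] + deduped[:num_logprobs]
```

For logits `[0.1, 3.0, 2.0, 1.0]` and `num_logprobs=2`, a sampled token of 1 (the top token) gives `[1, 2, 3]`, and a sampled token of 0 (outside the top-k) gives `[0, 1, 2]`.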

Test Plan

  • Verified with ruff check and ruff format (passes)
  • The fix uses only standard PyTorch operations (topk, gather, sort) with no new dependencies
  • Existing logprob tests in tests/v1/sample/test_logprobs.py cover the end-to-end behavior
  • Manual verification: with top_logprobs=5 and greedy sampling, the sampled token should appear only once in the response

Changed files

  • vllm/v1/worker/gpu/sample/logprob.py (modified, +21/-1)

Issue body

Motivation

Several open bugs and in-flight fixes across vLLM, vLLM-Ascend, and VERL point to the same underlying problem: the semantics and determinism of returned logits/logprobs are not specified tightly enough.

This shows up as:

  • prompt-side values ignoring logprobs_mode
  • request-order-dependent prompt_logprobs under prefix caching
  • duplicated sampled tokens in top_logprobs
  • backend-specific mode propagation gaps
  • rollout/trainer logprob mismatch in VERL

Problem Statement

The same logical request can produce different answers depending on:

  • whether prompt-side or decode-side probabilities are returned
  • raw vs processed logprobs_mode
  • prefix caching being enabled
  • streaming delta behavior
  • sampled token overlap with top-k/top-p output
  • backend implementation details in core vLLM, vLLM-Ascend, or VERL integration paths

That creates user-visible wrong values, non-deterministic scoring, and training signal mismatch.

Proposed Change

Define a shared contract for:

  • logits
  • logprobs
  • prompt_logprobs
  • top_logprobs

The contract should cover:

  • raw vs processed semantics
  • deterministic behavior with prefix caching
  • streaming delta behavior
  • de-duplication rules for sampled token vs top-k list
  • backend parity requirements across core vLLM, vLLM-Ascend, and VERL-facing rollout paths

Rollout Plan

  1. Finish and land core semantic fixes and regression tests in vLLM.
  2. Propagate the same contract into vLLM-Ascend sampler/backend paths.
  3. Align VERL rollout/trainer behavior with the same raw/processed definitions.
  4. Keep this issue as the single working tracker.

Todo List (last 3 months)

Acceptance Criteria

  • The meaning of returned logits/logprobs is documented and consistent.
  • Prompt-side values are mode-aware.
  • Prefix-cached results are deterministic.
  • top_logprobs never duplicates the sampled token.
  • Ascend and VERL either follow the same contract or document explicit deviations.
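
Some of these criteria can be checked mechanically on a single output position. The validator below is a hedged sketch. `check_position` is a hypothetical name, the `(token_id, value)` entry layout is an assumption, and the sign check relies only on the fact that log-probabilities are never positive.

```python
def check_position(sampled_token_id, entries, logprobs_mode):
    """Return a list of contract violations for one output position.

    Assumption: entries is a list of (token_id, value) pairs with the
    sampled token expected first, followed by the top-k tokens.
    """
    problems = []
    token_ids = [token_id for token_id, _ in entries]
    if not token_ids or token_ids[0] != sampled_token_id:
        problems.append("sampled token is not the first entry")
    if len(set(token_ids)) != len(token_ids):
        problems.append("duplicate token id in entries")
    if logprobs_mode.endswith("_logprobs") and any(v > 0.0 for _, v in entries):
        problems.append("positive value returned in a logprobs mode")
    return problems
```

An empty result means the position passed these particular checks. Determinism and backend parity still need cross-run comparisons that a single-position check cannot express.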

Feedback Period

At least one week.
