PR fix notes

PR #36539: Fix prompt_logprobs to respect logprobs_mode

Repository: vllm-project/vllm
Author: fede-kamel
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/36539

Description (problem / solution / changelog)

Summary

make prompt-side score computation mode-aware in V1 GPU model runner
use logits for *_logits modes and log-softmax for *_logprobs modes
add regression test ensuring prompt values differ between raw_logits and raw_logprobs
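
The mode split described above can be sketched in plain Python. This is an illustration only, not the contents of the PR's mode_utils.py: the names `log_softmax` and `select_prompt_scores` are hypothetical, and the real code presumably works on GPU tensors rather than lists. Only the two mode names mentioned in this PR are exercised.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one position's vocabulary logits."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum_exp for x in logits]


def select_prompt_scores(logits, logprobs_mode):
    """Pick prompt-side values from the mode's suffix (hypothetical helper)."""
    if logprobs_mode.endswith("_logits"):
        return list(logits)         # e.g. "raw_logits": pass logits through
    if logprobs_mode.endswith("_logprobs"):
        return log_softmax(logits)  # e.g. "raw_logprobs": normalize
    raise ValueError(f"unsupported logprobs_mode: {logprobs_mode!r}")


logits = [2.0, 1.0, 0.0]
as_logits = select_prompt_scores(logits, "raw_logits")
as_logprobs = select_prompt_scores(logits, "raw_logprobs")
```

For the same position the two modes return different numbers: the logits come back unchanged, while the log-probabilities are all non-positive and exponentiate to values that sum to 1. That difference is what the regression test checks for.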

Why

Issue #35832 reports that prompt_logprobs ignores logprobs_mode. I could reproduce on both v0.16.0 and v0.17.0 before this patch.

Local verification

reproduced before fix using local script (scripts/repro_vllm_35832.py)
after the patch, prompt values differ by mode (logits vs logprobs): for example, the same token scored 7.78125 in logits mode and -6.6573 in logprobs mode

Notes

This patch fixes the mode-selection bug for prompt-side returned values.
Option 2 workaround (temporary): until this fix is merged and released, users who need an exact raw-vs-processed interpretation can compute prompt scores externally instead of relying on prompt_logprobs to honor logprobs_mode.

Fixes #35832

Changed files

tests/v1/sample/test_sampling_params_e2e.py (modified, +21/-0)
tests/v1/worker/gpu/test_mode_utils.py (added, +36/-0)
vllm/v1/worker/gpu/sample/mode_utils.py (added, +17/-0)
vllm/v1/worker/gpu_model_runner.py (modified, +9/-3)

PR #42245: [Bugfix] Fix prompt_logprobs non-determinism with prefix caching (issue #42019)

Repository: vllm-project/vllm
Author: factnn
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/42245

Description (problem / solution / changelog)

Summary

Fixes #42019: prompt_logprobs values differ depending on request order when prefix caching is enabled.

Root cause: LogprobsTensors.empty_cpu() allocates tensors with torch.empty (uninitialized memory). When a prefix cache hit covers N tokens, positions [0:N] are never written by the current request — they retain stale values from a previous request's computation. This makes prompt_logprobs non-deterministic with respect to request ordering.

Fix: Replace torch.empty / torch.empty_like with torch.zeros / torch.zeros_like in LogprobsTensors.empty_cpu(). Unwritten positions are now always zero, making results order-independent.

This is distinct from #41411, which fixed a different bug (chunked prefill skipping the last prompt token). The torch.empty uninitialized-memory issue remains in main after that merge.

Changes

vllm/v1/outputs.py: LogprobsTensors.empty_cpu() — 3-line change, empty → zeros
tests/v1/test_prompt_logprobs_prefix_cache.py: regression test that submits the same prompts in two different orders and asserts prompt_logprobs are bit-identical, for both enable_prefix_caching=True and False
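
The shape of such an order-independence check can be sketched without vLLM. The code below is not the PR's test: `find_order_dependent_prompts` and both scorer functions are hypothetical stand-ins, where a real test would call the engine with prompt_logprobs enabled. Prompts are assumed to be unique strings.

```python
def find_order_dependent_prompts(score_prompts, prompts):
    """Score prompts forward and reversed; return prompts whose scores differ.

    score_prompts is a hypothetical callable: list[str] -> list[list[float]],
    returning one list of per-token values per prompt, in submission order.
    """
    forward = dict(zip(prompts, score_prompts(prompts)))
    backward_order = list(reversed(prompts))
    backward = dict(zip(backward_order, score_prompts(backward_order)))
    # Exact equality rather than a tolerance: the goal is identical results.
    return [p for p in prompts if forward[p] != backward[p]]


def stable_scorer(batch):
    # Stand-in engine whose output depends only on the prompt text.
    return [[-float(len(word)) for word in p.split()] for p in batch]


def leaky_scorer(batch):
    # Stand-in engine whose first value leaks the request's batch position.
    return [[float(i)] + [-1.0] * len(p.split()) for i, p in enumerate(batch)]


prompts = ["shared prefix one", "shared prefix two", "other"]
stable_mismatches = find_order_dependent_prompts(stable_scorer, prompts)
leaky_mismatches = find_order_dependent_prompts(leaky_scorer, prompts)
```

The stable scorer yields no mismatches. The leaky scorer flags the first and last prompts, whose positions change when the order is reversed; the middle prompt keeps its position and passes, which is a reminder that a single pair of orderings may not expose every order dependence.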

Test Plan

pytest tests/v1/test_prompt_logprobs_prefix_cache.py -v

Note: Local environment constraints prevented running the test (precompiled .so mismatch). The test is included for CI and reviewer verification.

AI Assistance

This PR was developed with AI assistance (Claude). All changed lines have been reviewed by the human submitter.

Changed files

tests/v1/test_prompt_logprobs_prefix_cache.py (added, +75/-0)
vllm/v1/outputs.py (modified, +3/-3)

PR #36746: [Bugfix] Deduplicate sampled token in top_logprobs output

Repository: vllm-project/vllm
Author: mvanhorn
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/36746

Description (problem / solution / changelog)

Purpose

Fixes #36660

When the sampled token is already among the top-k logprobs (common for greedy or near-greedy decoding), compute_topk_logprobs() returns the same token twice: once as the sampled token (column 0) and once in the top-k list. This causes duplicate entries in the API response's top_logprobs field.

Root Cause

In vllm/v1/worker/gpu/sample/logprob.py, lines 103-106:

logprob_token_ids = sampled_token_ids.unsqueeze(-1)  # [batch, 1]
if num_logprobs > 0:
    topk_indices = torch.topk(logits, num_logprobs, dim=-1).indices
    logprob_token_ids = torch.cat((logprob_token_ids, topk_indices), dim=1)

The sampled token is prepended, then topk tokens are appended without checking for overlap.

Fix

Request num_logprobs + 1 from torch.topk (one extra to account for potential duplicate)
Identify the first occurrence of the sampled token in the top-k results
Remove it using a stable sort that preserves the original ranking order
Slice to exactly num_logprobs entries

This handles both cases correctly:

Sampled token IS in top-k: removed from top-k, leaving exactly num_logprobs unique additional entries
Sampled token is NOT in top-k: no removal needed, the extra entry is simply sliced off
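
The four steps and both cases can be traced on a single row in plain Python. This sketch is not the PR's code, which uses batched tensor operations (topk, gather, sort); the function name `top_logprob_token_ids` is hypothetical, and a list comprehension stands in for the stable-sort removal.

```python
def top_logprob_token_ids(logits, sampled_token_id, num_logprobs):
    """Return the sampled id followed by the top other ids by descending logit."""
    # Step 1: rank token ids and take one extra candidate in case the
    # sampled token is among them.
    ranked = sorted(range(len(logits)), key=lambda tok: logits[tok], reverse=True)
    candidates = ranked[: num_logprobs + 1]
    # Steps 2-3: drop the sampled token while preserving ranking order.
    others = [tok for tok in candidates if tok != sampled_token_id]
    # Step 4: keep exactly num_logprobs additional entries.
    return [sampled_token_id] + others[:num_logprobs]


logits = [0.1, 3.0, 2.0, 1.0]
in_topk = top_logprob_token_ids(logits, sampled_token_id=1, num_logprobs=2)
not_in_topk = top_logprob_token_ids(logits, sampled_token_id=0, num_logprobs=2)
```

With token 1 sampled (the highest logit), the result is [1, 2, 3]: the duplicate is removed and the extra candidate fills the gap. With token 0 sampled (outside the top three), the result is [0, 1, 2]: nothing is removed and the extra candidate is sliced off.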

Test Plan

Verified with ruff check and ruff format (passes)
The fix uses only standard PyTorch operations (topk, gather, sort) with no new dependencies
Existing logprob tests in tests/v1/sample/test_logprobs.py cover the end-to-end behavior
Manual verification: with top_logprobs=5 and greedy sampling, the sampled token should appear only once in the response

Changed files

vllm/v1/worker/gpu/sample/logprob.py (modified, +21/-1)

Motivation

Several open bugs and in-flight fixes across vLLM, vLLM-Ascend, and VERL point to the same underlying problem: the semantics and determinism of returned logits/logprobs are not specified tightly enough.

This shows up as:

prompt-side values ignoring logprobs_mode

request-order-dependent prompt_logprobs under prefix caching

duplicated sampled tokens in top_logprobs

backend-specific mode propagation gaps

rollout/trainer logprob mismatch in VERL

Problem Statement

The same logical request can produce different answers depending on:

whether prompt-side or decode-side probabilities are returned

raw vs processed logprobs_mode

prefix caching being enabled

streaming delta behavior

sampled token overlap with top-k/top-p output

backend implementation details in core vLLM, vLLM-Ascend, or VERL integration paths

That creates user-visible wrong values, non-deterministic scoring, and training signal mismatch.

Proposed Change

Define a shared contract for:

logits

logprobs

prompt_logprobs

top_logprobs

The contract should cover:

raw vs processed semantics

deterministic behavior with prefix caching

streaming delta behavior

de-duplication rules for sampled token vs top-k list

backend parity requirements across core vLLM, vLLM-Ascend, and VERL-facing rollout paths

Source issue (vllm): [RFC]: Logprobs/Logits Semantics and Determinism Across the vLLM Ecosystem (3 pull requests, 1 participant)
