vllm - ✅(Solved) Fix [Bug]: Duplicate token with different logprob when requesting top [1 pull request, 1 comment, 2 participants]

GitHub stats
vllm-project/vllm#36660 (fetched 2026-04-08 00:35:34)
Comments: 1 · Participants: 2 · Timeline: 5 · Reactions: 0
Timeline (top): referenced ×2, commented ×1, cross-referenced ×1, labeled ×1

Fix Action

Fixed

PR fix notes

PR #36746: [Bugfix] Deduplicate sampled token in top_logprobs output

Description (problem / solution / changelog)

Purpose

Fixes #36660

When the sampled token is already among the top-k logprobs (common for greedy or near-greedy decoding), compute_topk_logprobs() returns the same token twice: once as the sampled token (column 0) and once in the top-k list. This causes duplicate entries in the top_logprobs field of the API response.

Root Cause

In vllm/v1/worker/gpu/sample/logprob.py, lines 103-106:

logprob_token_ids = sampled_token_ids.unsqueeze(-1)  # [batch, 1]
if num_logprobs > 0:
    topk_indices = torch.topk(logits, num_logprobs, dim=-1).indices
    logprob_token_ids = torch.cat((logprob_token_ids, topk_indices), dim=1)

The sampled token is prepended, then the top-k tokens are appended without checking for overlap.
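To make the overlap concrete, here is a list-based sketch of the pre-fix logic for a single batch row. It is not the vLLM code: the helper names topk_ids and logprob_token_ids_unfixed are hypothetical, and plain Python lists stand in for the torch.topk and torch.cat calls quoted above. Under greedy decoding the sampled token is the argmax, so it is also the first top-k entry and shows up twice.

```python
def topk_ids(logits, k):
    """Indices of the k largest logits, highest first (list analogue of torch.topk(...).indices)."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def logprob_token_ids_unfixed(logits, sampled_id, num_logprobs):
    """Hypothetical pre-fix behavior for one row: sampled token, then top-k, no overlap check."""
    ids = [sampled_id]  # analogue of sampled_token_ids.unsqueeze(-1)
    if num_logprobs > 0:
        ids += topk_ids(logits, num_logprobs)  # analogue of torch.cat(..., dim=1)
    return ids


logits = [0.1, 3.0, 1.5, 2.2, -0.4, 0.9]
sampled = max(range(len(logits)), key=lambda i: logits[i])  # greedy: argmax is token 1
print(logprob_token_ids_unfixed(logits, sampled, 5))  # [1, 1, 3, 2, 5, 0]
```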

Fix

  • Request num_logprobs + 1 from torch.topk (one extra to account for a potential duplicate)
  • Identify the first occurrence of the sampled token in the top-k results
  • Remove it using a stable sort that preserves the original ranking order
  • Slice to exactly num_logprobs entries

This handles both cases correctly:

  • Sampled token IS in top-k: removed from top-k, leaving exactly num_logprobs unique additional entries
  • Sampled token is NOT in top-k: no removal needed, the extra entry is simply sliced off
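The four steps can be sketched on plain lists for a single row. This is an illustration under stated assumptions, not the PR's code: the helper names are hypothetical, and Python's built-in sorted() (which is stable) stands in for the tensor sort. In tensor form the same idea presumably maps onto torch.topk, a stable torch.sort, and torch.gather, the operations the PR lists.

```python
def topk_ids(logits, k):
    """Indices of the k largest logits, highest first (list analogue of torch.topk(...).indices)."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def logprob_token_ids_dedup(logits, sampled_id, num_logprobs):
    """Hypothetical fixed behavior: sampled token first, then num_logprobs tokens that never repeat it."""
    ids = [sampled_id]
    if num_logprobs > 0:
        # Ask for one extra candidate in case the sampled token is among them.
        candidates = topk_ids(logits, num_logprobs + 1)
        # Stable sort keyed on "is this the sampled token?": False sorts before True,
        # so the matching entry (top-k ids are distinct, so at most one) moves to the
        # end while the other candidates keep their rank order.
        candidates = sorted(candidates, key=lambda tok: tok == sampled_id)
        # Keep exactly num_logprobs: this drops the duplicate if there was one,
        # otherwise the extra (num_logprobs + 1)-th candidate.
        ids += candidates[:num_logprobs]
    return ids


logits = [0.1, 3.0, 1.5, 2.2, -0.4, 0.9]
print(logprob_token_ids_dedup(logits, 1, 5))  # greedy: [1, 3, 2, 5, 0, 4]
print(logprob_token_ids_dedup(logits, 4, 3))  # sampled token not in top-k: [4, 1, 3, 2]
```

Moving the match to the end instead of deleting it keeps every row the same length, which is what makes the sort-then-slice approach convenient for batched tensors.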

Test Plan

  • Verified with ruff check and ruff format (passes)
  • The fix uses only standard PyTorch operations (topk, gather, sort) with no new dependencies
  • Existing logprob tests in tests/v1/sample/test_logprobs.py cover the end-to-end behavior
  • Manual verification: with top_logprobs=5 and greedy sampling, the sampled token should appear only once in the response
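For the manual verification step, a small client-side check can scan one parsed top_logprobs list for repeated entries. It assumes the response has already been decoded into a dict shaped like the example in the issue; the helper name duplicate_tokens is hypothetical. Byte sequences are used as a proxy for token identity because the chat completion response describes each entry by its text and bytes rather than by token ID.

```python
from collections import Counter


def duplicate_tokens(top_logprobs):
    """Return the tokens whose byte sequence appears more than once in a top_logprobs list."""
    # Key on the exact bytes; fall back to the UTF-8 encoded token text if "bytes" is null.
    keys = [tuple(entry.get("bytes") or entry["token"].encode("utf-8")) for entry in top_logprobs]
    counts = Counter(keys)
    return [bytes(key).decode("utf-8", errors="replace") for key, n in counts.items() if n > 1]


# top_logprobs copied from the example response in the issue.
issue_example = [
    {"token": "none", "logprob": -0.0012816318776458502, "bytes": [110, 111, 110, 101]},
    {"token": "helper", "logprob": -7.25128173828125, "bytes": [104, 101, 108, 112, 101, 114]},
    {"token": "explanation", "logprob": -7.75128173828125, "bytes": [101, 120, 112, 108, 97, 110, 97, 116, 105, 111, 110]},
    {"token": "None", "logprob": -9.87628173828125, "bytes": [78, 111, 110, 101]},
    {"token": "none", "logprob": -10.43878173828125, "bytes": [110, 111, 110, 101]},
]

print(duplicate_tokens(issue_example))  # ['none']
```

Against a server that includes the fix, the same check on a greedy top_logprobs=5 response should return an empty list.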

Changed files

  • vllm/v1/worker/gpu/sample/logprob.py (modified, +21/-1)

Your current environment

I am using vLLM 0.14 within the AWS LMI container: 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.36.0-lmi19.0.0-cu128

🐛 Describe the bug

When requesting top_logprobs = 5 using the OpenAI chat completion schema (logprobs: True, max_tokens: 1, max_completion_tokens: 1), I get duplicate tokens with different log probabilities.

Example: output["choices"][0]["logprobs"]["content"][0]["top_logprobs"] -> [{'token': 'none', 'logprob': -0.0012816318776458502, 'bytes': [110, 111, 110, 101]}, {'token': 'helper', 'logprob': -7.25128173828125, 'bytes': [104, 101, 108, 112, 101, 114]}, {'token': 'explanation', 'logprob': -7.75128173828125, 'bytes': [101, 120, 112, 108, 97, 110, 97, 116, 105, 111, 110]}, {'token': 'None', 'logprob': -9.87628173828125, 'bytes': [78, 111, 110, 101]}, {'token': 'none', 'logprob': -10.43878173828125, 'bytes': [110, 111, 110, 101]}]

Notice that the first and last tokens in the list are identical in text and byte sequence, but have different log probabilities. Why does this happen?


extent analysis

Fix Plan

As a client-side workaround (the root cause is addressed server-side by PR #36746 above), the duplicated entries can be collapsed after the response is received.

Step-by-Step Solution

  1. Token Normalization: Treat entries with the same byte sequence as the same token, so identical tokens are not reported as separate entries.
  2. Log Probability Selection: For each merged token, keep the highest log probability among its duplicates.

Example Code

import json


def normalize_tokens(top_logprobs):
    """Collapse top_logprobs entries that decode to the same bytes.

    Entries are keyed on their exact byte sequence rather than a lowercased
    string, so distinct tokens such as 'None' and 'none' stay separate.
    For each duplicated token, the entry with the highest log probability
    is kept, and first-seen order is preserved.
    """
    best = {}  # byte-sequence key -> entry with the highest logprob seen so far
    for token_info in top_logprobs:
        # Fall back to the UTF-8 encoding of the token text if "bytes" is missing or null.
        key = tuple(token_info.get('bytes') or token_info['token'].encode('utf-8'))
        if key not in best or token_info['logprob'] > best[key]['logprob']:
            # Replacing the value keeps the key's original position in the dict.
            best[key] = token_info

    return list(best.values())

# Example usage:
output = {
    "choices": [{
        "logprobs": {
            "content": [{
                "top_logprobs": [
                    {'token': 'none', 'logprob': -0.0012816318776458502, 'bytes': [110, 111, 110, 101]},
                    {'token': 'helper', 'logprob': -7.25128173828125, 'bytes': [104, 101, 108, 112, 101, 114]},
                    {'token': 'explanation', 'logprob': -7.75128173828125, 'bytes': [101, 120, 112, 108, 97, 110, 97, 116, 105, 111, 110]},
                    {'token': 'None', 'logprob': -9.87628173828125, 'bytes': [78, 111, 110, 101]},
                    {'token': 'none', 'logprob': -10.43878173828125, 'bytes': [110, 111, 110, 101]}
                ]
            }]
        }
    }]
}

normalized_top_logprobs = normalize_tokens(output["choices"][0]["logprobs"]["content"][0]["top_logprobs"])
print(json.dumps(normalized_top_logprobs, indent=2))
