vllm - ✅ (Solved) [RFC]: Add logit_scale to PoolerConfig for Affine Score Calibration (Platt Scaling) [1 pull request, 2 comments, 2 participants]

vllm-project/vllm#39434 · Fetched 2026-04-10 03:40:38

Fix Action

Fixed

PR fix notes

PR #39435: feat: add logit_scale to PoolerConfig for affine score calibration

Description (problem / solution / changelog)

Purpose

Add logit_scale to PoolerConfig alongside existing logit_bias, enabling affine score calibration (Platt scaling) in the pooler: activation(scale * (logit - bias)).

This allows reranker and classification models to produce calibrated probability scores via --pooler-config without custom model code or client-side postprocessing.

RFC: #39434

Changes

  • vllm/config/pooler.py — add logit_scale: float | None = None field
  • vllm/model_executor/layers/pooler/seqwise/heads.py — apply scale in ClassifierPoolerHead.forward()
  • vllm/model_executor/layers/pooler/seqwise/poolers.py — pass scale from config
  • vllm/model_executor/layers/pooler/tokwise/heads.py — apply scale in TokenClassifierPoolerHead.forward_chunk()
  • vllm/model_executor/layers/pooler/tokwise/poolers.py — pass scale from config

5 files changed, 17 lines added. Zero behavior change for existing models (default is None).

How it works

# In ClassifierPoolerHead.forward() / TokenClassifierPoolerHead.forward_chunk():
if self.logit_bias is not None:
    logits -= self.logit_bias      # existing
if self.logit_scale is not None:
    logits *= self.logit_scale     # new
if self.activation is not None:
    logits = self.activation(logits)  # existing
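
The order of operations above can be sketched outside vLLM on a single scalar logit. This is a minimal illustration, not the vLLM implementation: the `calibrate` function and its keyword arguments are hypothetical names, and sigmoid is assumed as the activation.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def calibrate(logit, logit_bias=None, logit_scale=None, use_activation=True):
    """Hypothetical scalar version of the pooler head steps shown above."""
    if logit_bias is not None:
        logit -= logit_bias        # step 1: subtract the bias
    if logit_scale is not None:
        logit *= logit_scale       # step 2: multiply by the scale
    if use_activation:
        logit = sigmoid(logit)     # step 3: apply the activation
    return logit


raw = 0.5
print(calibrate(raw))                                   # no calibration
print(calibrate(raw, logit_bias=0.3))                   # bias only
print(calibrate(raw, logit_bias=0.3, logit_scale=5.0))  # bias and scale
```

A logit equal to the bias always maps to 0.5 under sigmoid, whatever the scale; the scale only controls how steeply scores move away from 0.5 on either side.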

Usage

# No calibration (current behavior, unchanged)
--pooler-config '{"use_activation": true}'

# Bias only (current behavior, unchanged)
--pooler-config '{"use_activation": true, "logit_bias": 0.3}'

# Full Platt scaling (new)
--pooler-config '{"use_activation": true, "logit_bias": 0.3, "logit_scale": 5.0}'

# Temperature scaling (new, special case)
--pooler-config '{"use_activation": true, "logit_scale": 2.0}'
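
When the flag is generated by a launch script, building the JSON programmatically avoids quoting mistakes. The sketch below assumes only the three keys shown in the usage examples; `pooler_config_arg` is a hypothetical helper, not part of vLLM.

```python
import json
import shlex


def pooler_config_arg(use_activation=True, logit_bias=None, logit_scale=None):
    """Hypothetical helper: build a shell-safe --pooler-config argument."""
    config = {"use_activation": use_activation}
    # Omit unset fields so the server falls back to its defaults (None).
    if logit_bias is not None:
        config["logit_bias"] = logit_bias
    if logit_scale is not None:
        config["logit_scale"] = logit_scale
    return "--pooler-config " + shlex.quote(json.dumps(config))


print(pooler_config_arg(logit_bias=0.3, logit_scale=5.0))
```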

Use cases

  1. Cross-encoder rerankers — Map raw logits to calibrated [0,1] probabilities without retraining
  2. Temperature scaling — Calibrate overconfident classifiers (logit_scale = 1/T)
  3. Score normalization across model versions — Normalize different model versions to a consistent score range
  4. Domain adaptation — Calibrate a model for a new domain by fitting two parameters on a small validation set

Test plan

  • Ruff format + check passes
  • mypy (Python 3.10) passes
  • Tested with a cross-encoder reranker model (score calibration produces the expected scores)

# Test with any cross-encoder
python -m vllm.entrypoints.openai.api_server \
    --model cross-encoder/ms-marco-MiniLM-L-6-v2 \
    --runner pooling \
    --pooler-config '{"use_activation": true, "logit_bias": 0.0, "logit_scale": 1.0}'

Not a duplicate: No existing PR adds logit_scale. This completes the affine calibration pair started by the existing logit_bias field.

AI-assisted work: AI assistance was used for implementation. All code reviewed and tested by human submitter.

Cc: @noooop @DarkLight1337

Changed files

  • vllm/config/pooler.py (modified, +7/-0)
  • vllm/model_executor/layers/pooler/seqwise/heads.py (modified, +4/-0)
  • vllm/model_executor/layers/pooler/seqwise/poolers.py (modified, +1/-0)
  • vllm/model_executor/layers/pooler/tokwise/heads.py (modified, +4/-0)
  • vllm/model_executor/layers/pooler/tokwise/poolers.py (modified, +1/-0)


Motivation.

vLLM's PoolerConfig already supports logit_bias for classification models, enabling a bias offset on raw logits before activation. However, there is no corresponding scale parameter, which means vLLM cannot express the standard affine calibration transform:

calibrated_score = activation(scale * (logit - bias))

This transform is known as Platt scaling (Platt, 1999), one of the most widely used methods for turning classifier outputs into calibrated probabilities. It fits two parameters (A and B) such that:

P(y=1|x) = sigmoid(A * f(x) + B)

where f(x) is the raw model output. For A ≠ 0 this is equivalent to sigmoid(A * (f(x) - (-B/A))), i.e. logit_scale = A and logit_bias = -B/A.

Reference: John Platt. "Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods." Advances in Large Margin Classifiers, 1999.
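
The mapping between Platt's (A, B) and the scale/bias form can be checked numerically. This is a small sketch: `platt_to_scale_bias` is a hypothetical helper name, and the values of A and B are made up for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def platt_to_scale_bias(a: float, b: float) -> tuple[float, float]:
    """Convert sigmoid(a * f + b) into sigmoid(scale * (f - bias))."""
    if a == 0:
        raise ValueError("A must be non-zero to express B as a bias")
    return a, -b / a


a, b = 5.0, -1.5  # illustrative Platt parameters, not fitted to real data
logit_scale, logit_bias = platt_to_scale_bias(a, b)

for fx in (-1.0, 0.0, 0.3, 2.0):
    platt = sigmoid(a * fx + b)
    affine = sigmoid(logit_scale * (fx - logit_bias))
    assert math.isclose(platt, affine, rel_tol=1e-9)

print(logit_scale, logit_bias)
```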

Platt scaling is not limited to SVMs — it is routinely applied to neural network classifiers, cross-encoders, and reranking models to map raw logits into interpretable probability scores. Related techniques include:

  • Temperature scaling (Guo et al., 2017) — a special case where only scale is learned (bias=0)
  • SentenceTransformers cross-encoder activation functions — vLLM already supports loading these via sbert_ce_default_activation_function, but they are limited to parameterless functions like torch.nn.modules.activation.Sigmoid

Proposed Change.

Current State

vLLM's pooler heads (ClassifierPoolerHead, TokenClassifierPoolerHead) apply:

if self.logit_bias is not None:
    logits -= self.logit_bias          # ← bias exists
# no scale                              # ← scale is missing
if self.activation is not None:
    logits = self.activation(logits)    # ← sigmoid/softmax exists

This supports sigmoid(logit - bias) but NOT sigmoid(scale * (logit - bias)).

Proposal

Add logit_scale: float | None = None to PoolerConfig, passed through to classifier pooler heads. When set, logits are scaled after bias subtraction and before activation:

if self.logit_bias is not None:
    logits -= self.logit_bias
if self.logit_scale is not None:
    logits *= self.logit_scale
if self.activation is not None:
    logits = self.activation(logits)

Default: None (no scaling) — fully backward compatible.

Usage

# No calibration (current behavior, unchanged)
--pooler-config '{"use_activation": true}'

# Bias only (current behavior, unchanged)
--pooler-config '{"use_activation": true, "logit_bias": 0.3}'

# Full Platt scaling (new)
--pooler-config '{"use_activation": true, "logit_bias": 0.3, "logit_scale": 5.0}'

# Temperature scaling (new, special case with bias=0)
--pooler-config '{"use_activation": true, "logit_scale": 2.0}'

Use Cases

1. Cross-encoder rerankers with calibrated scores

Reranking models often produce raw logits that are not calibrated to a [0, 1] probability range. Platt scaling is the standard post-hoc calibration method — two parameters learned on a validation set without retraining the model. With logit_scale, models can ship calibration parameters in their config and produce calibrated scores out of the box.

2. Temperature scaling for confidence calibration

Modern neural networks are often overconfident (Guo et al., 2017). Temperature scaling divides logits by a learned temperature T, which is equivalent to setting logit_scale = 1/T with no bias. Guo et al. found it to be a simple and effective calibration method for classification models.
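
To make the 1/T relationship concrete, here is a minimal sketch with softmax over made-up logits. `temperature_to_logit_scale` is a hypothetical helper, and T = 2.0 is an illustrative value rather than a fitted one.

```python
import math


def softmax(logits):
    """Softmax with max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_to_logit_scale(temperature: float) -> float:
    """Dividing logits by T is the same as multiplying by 1/T."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return 1.0 / temperature


logits = [4.0, 1.0, 0.5]                 # illustrative raw class logits
scale = temperature_to_logit_scale(2.0)  # T = 2.0 gives logit_scale = 0.5

uncalibrated = softmax(logits)
calibrated = softmax([scale * x for x in logits])

# T > 1 flattens the distribution, lowering the top-class probability.
print(max(uncalibrated), max(calibrated))
```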

3. Score normalization across model versions

When deploying multiple model versions (e.g. v3 and v4 of a reranker), their raw score distributions differ. Affine calibration allows normalizing scores to a consistent range without retraining, enabling seamless model swaps in production.

4. Domain adaptation without fine-tuning

A cross-encoder trained on one domain can be calibrated for a different domain by fitting logit_scale and logit_bias on a small validation set. This is cheaper than fine-tuning and preserves the model weights.
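
Fitting the two parameters on a validation set could look like the sketch below. It uses plain gradient descent on the log loss rather than the optimizer from Platt's paper, the validation logits and labels are synthetic, and `fit_platt` is a hypothetical name. The fitted (A, B) is then converted to the scale/bias convention used by the config.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def log_loss(a, b, logits, labels):
    """Mean binary cross-entropy of sigmoid(a * x + b)."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(logits, labels):
        p = min(max(sigmoid(a * x + b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def fit_platt(logits, labels, lr=0.1, steps=5000):
    """Fit A and B by gradient descent, starting from the identity map."""
    a, b = 1.0, 0.0
    n = len(logits)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(logits, labels):
            err = sigmoid(a * x + b) - y
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Synthetic validation set: raw model logits and binary relevance labels.
val_logits = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
val_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

a, b = fit_platt(val_logits, val_labels)
logit_scale, logit_bias = a, -b / a  # convert to scale * (logit - bias)
print(f"logit_scale={logit_scale:.3f}, logit_bias={logit_bias:.3f}")
```

The classes in the validation set need to overlap; on perfectly separable data the unregularized fit pushes the scale toward infinity.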

Scope

  • 5 files changed, ~17 lines added
  • vllm/config/pooler.py — add logit_scale field
  • vllm/model_executor/layers/pooler/seqwise/heads.py — apply scale in ClassifierPoolerHead
  • vllm/model_executor/layers/pooler/seqwise/poolers.py — pass scale from config
  • vllm/model_executor/layers/pooler/tokwise/heads.py — apply scale in TokenClassifierPoolerHead
  • vllm/model_executor/layers/pooler/tokwise/poolers.py — pass scale from config
  • Zero behavior change for existing models (default is None)
  • No new dependencies

Feedback Period.

No response

CC List.

@noooop @DarkLight1337 @hustxiayang

Any Other Things.

References

  • Platt, J. (1999). "Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods." Advances in Large Margin Classifiers.
  • Guo, C. et al. (2017). "On Calibration of Modern Neural Networks." ICML 2017.
  • vLLM PoolerConfig.logit_bias — existing precedent for per-model logit adjustment.


extent analysis

TL;DR

Add a logit_scale parameter to PoolerConfig to support Platt scaling for calibrated classifier outputs.

Guidance

  • Modify PoolerConfig to include a logit_scale field, allowing users to specify a scaling factor for logits.
  • Update ClassifierPoolerHead and TokenClassifierPoolerHead to apply the scaling factor after bias subtraction and before activation.
  • Ensure backward compatibility by setting the default logit_scale value to None.
  • Test the new functionality with various use cases, such as cross-encoder rerankers, temperature scaling, and score normalization.

Example

if self.logit_bias is not None:
    logits -= self.logit_bias
if self.logit_scale is not None:
    logits *= self.logit_scale
if self.activation is not None:
    logits = self.activation(logits)

Notes

The proposed change requires updating five files and adding approximately 17 lines of code. The new logit_scale parameter will enable users to apply Platt scaling to their models, improving the calibration of classifier outputs.

Recommendation

Apply the proposed change by adding the logit_scale parameter to PoolerConfig and updating the relevant pooler heads. This gives a flexible way to calibrate classifier outputs without significant modifications to the existing codebase.
