vllm - ✅ (Solved) Fix RFC: Add logit_scale to PoolerConfig for Affine Score Calibration (Platt Scaling) [1 pull request, 1 participant]

GitHub stats: vllm-project/vllm#39433 (fetched 2026-04-10 03:40:40)
Comments: 0 · Participants: 1 · Timeline: 3 (closed ×1, cross-referenced ×1, referenced ×1) · Reactions: 0

Fix Action

Fixed

PR fix notes

PR #39435: feat: add logit_scale to PoolerConfig for affine score calibration

Description (problem / solution / changelog)

Purpose

Add logit_scale to PoolerConfig alongside the existing logit_bias, enabling affine score calibration (Platt scaling) in the pooler: activation(scale * (logit - bias)).

This allows reranker and classification models to produce calibrated probability scores via --pooler-config without custom model code or client-side postprocessing.

RFC: #39434

Changes

  • vllm/config/pooler.py — add logit_scale: float | None = None field
  • vllm/model_executor/layers/pooler/seqwise/heads.py — apply scale in ClassifierPoolerHead.forward()
  • vllm/model_executor/layers/pooler/seqwise/poolers.py — pass scale from config
  • vllm/model_executor/layers/pooler/tokwise/heads.py — apply scale in TokenClassifierPoolerHead.forward_chunk()
  • vllm/model_executor/layers/pooler/tokwise/poolers.py — pass scale from config

5 files changed, 17 lines added. Zero behavior change for existing models (default is None).

How it works

# In ClassifierPoolerHead.forward() / TokenClassifierPoolerHead.forward_chunk():
if self.logit_bias is not None:
    logits -= self.logit_bias      # existing
if self.logit_scale is not None:
    logits *= self.logit_scale     # new
if self.activation is not None:
    logits = self.activation(logits)  # existing
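To make the ordering concrete, here is a minimal pure-Python sketch of the bias, then scale, then activation sequence. It is not vLLM's code: the real heads operate on torch tensors in place, and the helper names `apply_head` and `sigmoid` are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def apply_head(
    logits: Sequence[float],
    logit_bias: Optional[float] = None,
    logit_scale: Optional[float] = None,
    activation: Optional[Callable[[float], float]] = sigmoid,
) -> list[float]:
    """Hypothetical mirror of the head logic: subtract bias, multiply by scale, activate."""
    out = list(logits)
    if logit_bias is not None:
        out = [x - logit_bias for x in out]   # existing step
    if logit_scale is not None:
        out = [x * logit_scale for x in out]  # new step
    if activation is not None:
        out = [activation(x) for x in out]    # existing step
    return out


print(apply_head([0.3, 1.3], logit_bias=0.3, logit_scale=5.0))  # approximately [0.5, 0.9933]
```

Because the scale is applied after the bias, logit_bias stays in raw-logit units, and a logit equal to logit_bias always maps to the sigmoid midpoint of 0.5 regardless of logit_scale.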

Usage

# No calibration (current behavior, unchanged)
--pooler-config '{"use_activation": true}'

# Bias only (current behavior, unchanged)
--pooler-config '{"use_activation": true, "logit_bias": 0.3}'

# Full Platt scaling (new)
--pooler-config '{"use_activation": true, "logit_bias": 0.3, "logit_scale": 5.0}'

# Temperature scaling (new, special case)
--pooler-config '{"use_activation": true, "logit_scale": 2.0}'
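If the calibration parameters come out of a script, the flag value can be generated rather than typed by hand. The sketch below uses a hypothetical helper, `pooler_config_arg`, that emits only the fields that are set; with Python's default `json.dumps` separators it reproduces the strings shown above.

```python
import json
from typing import Optional


def pooler_config_arg(
    logit_bias: Optional[float] = None,
    logit_scale: Optional[float] = None,
    use_activation: bool = True,
) -> str:
    """Hypothetical helper: build the --pooler-config JSON, omitting unset fields."""
    cfg: dict = {"use_activation": use_activation}
    if logit_bias is not None:
        cfg["logit_bias"] = logit_bias
    if logit_scale is not None:
        cfg["logit_scale"] = logit_scale
    # json.dumps writes lowercase `true` and uses ", " / ": " separators by default
    return json.dumps(cfg)


print(pooler_config_arg(logit_bias=0.3, logit_scale=5.0))
# {"use_activation": true, "logit_bias": 0.3, "logit_scale": 5.0}
```

When assembling a shell command from Python, wrapping the result in `shlex.quote` keeps the JSON together as a single argument.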

Use cases

  1. Cross-encoder rerankers — Map raw logits to calibrated [0,1] probabilities without retraining
  2. Temperature scaling — Calibrate overconfident classifiers (logit_scale = 1/T)
  3. Score normalization across model versions — Normalize different model versions to a consistent score range
  4. Domain adaptation — Calibrate a model for a new domain by fitting two parameters on a small validation set
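For the reranking use case, note that a positive logit_scale makes the whole transform strictly increasing in the logit, so calibration changes the score values but not the order of the results. A small sketch with made-up logits (the names `calibrated` and `ranking` are illustrative, not part of vLLM):

```python
import math


def calibrated(logit: float, bias: float = 0.0, scale: float = 1.0) -> float:
    # sigmoid(scale * (logit - bias)); strictly increasing in logit when scale > 0
    return 1.0 / (1.0 + math.exp(-scale * (logit - bias)))


def ranking(scores: list[float]) -> list[int]:
    # Candidate indices ordered from highest to lowest score
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


raw_logits = [2.1, -0.7, 4.3, 0.0]  # made-up reranker logits for four candidates
cal = [calibrated(x, bias=0.3, scale=5.0) for x in raw_logits]
print(ranking(raw_logits))  # [2, 0, 3, 1]
print(ranking(cal))         # [2, 0, 3, 1]
```

A negative scale would reverse the order. In floating point, very large scaled logits can also saturate the sigmoid to exactly 1.0 and turn distinct scores into ties.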

Test plan

  • Ruff format + check passes
  • mypy (Python 3.10) passes
  • Tested with cross-encoder reranker model (score calibration produces correct calibrated scores)
# Test with any cross-encoder
python -m vllm.entrypoints.openai.api_server \
    --model cross-encoder/ms-marco-MiniLM-L-6-v2 \
    --runner pooling \
    --pooler-config '{"use_activation": true, "logit_bias": 0.0, "logit_scale": 1.0}'

Not a duplicate: No existing PR adds logit_scale. This completes the affine calibration pair started by the existing logit_bias field.

AI-assisted work: AI assistance was used for implementation. All code reviewed and tested by human submitter.

Cc: @noooop @DarkLight1337

Changed files

  • vllm/config/pooler.py (modified, +7/-0)
  • vllm/model_executor/layers/pooler/seqwise/heads.py (modified, +4/-0)
  • vllm/model_executor/layers/pooler/seqwise/poolers.py (modified, +1/-0)
  • vllm/model_executor/layers/pooler/tokwise/heads.py (modified, +4/-0)
  • vllm/model_executor/layers/pooler/tokwise/poolers.py (modified, +1/-0)


Motivation

vLLM's PoolerConfig already supports logit_bias for classification models, enabling a bias offset on raw logits before activation. However, there is no corresponding scale parameter, which means vLLM cannot express the standard affine calibration transform:

calibrated_score = activation(scale * (logit - bias))

This transform is known as Platt scaling (Platt, 1999), one of the most widely used methods for turning classifier outputs into well-calibrated probabilities. It fits two parameters, A and B, such that:

P(y=1|x) = sigmoid(A * f(x) + B)

where f(x) is the raw model output. Because A * f(x) + B = A * (f(x) - (-B/A)), this is the same transform with logit_scale = A and logit_bias = -B/A (for A ≠ 0).

Reference: John Platt. "Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods." Advances in Large Margin Classifiers, 1999.
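The mapping between Platt's (A, B) and the two pooler fields can be checked numerically. This is a minimal sketch assuming A and B were fitted for the sigmoid(A * f(x) + B) form used here; Platt's paper writes the model as 1 / (1 + exp(A f + B)), so parameters taken from that convention need both signs flipped first. The function name `platt_to_pooler` is hypothetical.

```python
import math


def platt_to_pooler(A: float, B: float) -> tuple[float, float]:
    """Hypothetical helper: map sigmoid(A * f + B) to sigmoid(scale * (f - bias))."""
    if A == 0:
        # With A == 0 the score no longer depends on f, and -B/A is undefined
        raise ValueError("A must be non-zero")
    return A, -B / A  # (logit_scale, logit_bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


A, B = 5.0, -1.5  # illustrative parameters, not fitted to any real model
logit_scale, logit_bias = platt_to_pooler(A, B)
print(logit_scale, logit_bias)  # 5.0 0.3

# Both parameterizations give the same calibrated score for any raw logit f
for f in (-2.0, 0.0, 0.3, 1.7):
    assert math.isclose(sigmoid(A * f + B), sigmoid(logit_scale * (f - logit_bias)))
```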

Platt scaling is not limited to SVMs — it is routinely applied to neural network classifiers, cross-encoders, and reranking models to map raw logits into interpretable probability scores. Related techniques include:

  • Temperature scaling (Guo et al., 2017) — a special case where only scale is learned (bias=0)
  • SentenceTransformers cross-encoder activation functions — vLLM already supports loading these via sbert_ce_default_activation_function, but they are limited to parameterless functions like torch.nn.modules.activation.Sigmoid

Current State

vLLM's pooler heads (ClassifierPoolerHead, TokenClassifierPoolerHead) apply:

if self.logit_bias is not None:
    logits -= self.logit_bias          # ← bias exists
# no scale                              # ← scale is missing
if self.activation is not None:
    logits = self.activation(logits)    # ← sigmoid/softmax exists

This supports sigmoid(logit - bias) but NOT sigmoid(scale * (logit - bias)).

Proposal

Add logit_scale: float | None = None to PoolerConfig, passed through to classifier pooler heads. When set, logits are scaled after bias subtraction and before activation:

if self.logit_bias is not None:
    logits -= self.logit_bias
if self.logit_scale is not None:
    logits *= self.logit_scale
if self.activation is not None:
    logits = self.activation(logits)

Default: None (no scaling) — fully backward compatible.
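Backward compatibility follows from the None default: a config that never mentions logit_scale parses to None, and the head skips the multiply. The dataclass below is a hypothetical stand-in for PoolerConfig limited to the fields discussed here; vLLM's real class has more fields and its own parsing.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PoolerConfigSketch:
    """Hypothetical stand-in for vLLM's PoolerConfig (only the fields discussed here)."""
    use_activation: Optional[bool] = None
    logit_bias: Optional[float] = None
    logit_scale: Optional[float] = None  # new field; None means "skip the multiply"

    @classmethod
    def from_json(cls, text: str) -> "PoolerConfigSketch":
        # Omitted keys keep their None defaults; unknown keys raise TypeError
        return cls(**json.loads(text))


old = PoolerConfigSketch.from_json('{"use_activation": true, "logit_bias": 0.3}')
new = PoolerConfigSketch.from_json(
    '{"use_activation": true, "logit_bias": 0.3, "logit_scale": 5.0}'
)
print(old.logit_scale, new.logit_scale)  # None 5.0
```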

Usage

# No calibration (current behavior, unchanged)
--pooler-config '{"use_activation": true}'

# Bias only (current behavior, unchanged)
--pooler-config '{"use_activation": true, "logit_bias": 0.3}'

# Full Platt scaling (new)
--pooler-config '{"use_activation": true, "logit_bias": 0.3, "logit_scale": 5.0}'

# Temperature scaling (new, special case with bias=0)
--pooler-config '{"use_activation": true, "logit_scale": 2.0}'

Use Cases

1. Cross-encoder rerankers with calibrated scores

Reranking models often produce raw logits whose sigmoid outputs are not calibrated probabilities. Platt scaling is a standard post-hoc calibration method: two parameters are learned on a validation set, without retraining the model. With logit_scale, models can ship calibration parameters in their config and produce calibrated scores out of the box.
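Fitting the two parameters amounts to one-feature logistic regression on (raw logit, label) pairs. The sketch below uses plain gradient descent on mean log loss over a made-up validation set; it omits Platt's smoothed target values, and a library fitter would be the more robust choice in practice. The names `fit_platt` and `mean_log_loss` are hypothetical. The toy logits are mirrored around 1.0 with mirrored labels, so the fitted logit_bias should come out close to 1.0.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_loss(A: float, B: float, logits: list[float], labels: list[int]) -> float:
    # Mean binary cross-entropy of sigmoid(A * f + B) against 0/1 labels
    total = 0.0
    for f, y in zip(logits, labels):
        p = sigmoid(A * f + B)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def fit_platt(logits: list[float], labels: list[int],
              lr: float = 0.5, steps: int = 2000) -> tuple[float, float]:
    # Gradient descent starting from the identity (A=1, B=0).
    # d(loss)/dz = p - y, with dz/dA = f and dz/dB = 1.
    A, B = 1.0, 0.0
    n = len(logits)
    for _ in range(steps):
        grad_A = grad_B = 0.0
        for f, y in zip(logits, labels):
            err = sigmoid(A * f + B) - y
            grad_A += err * f
            grad_B += err
        A -= lr * grad_A / n
        B -= lr * grad_B / n
    return A, B


# Made-up validation set: raw reranker logits and 0/1 relevance labels
val_logits = [-3.0, -2.0, -1.0, 0.0, 2.0, 3.0, 4.0, 5.0]
val_labels = [0, 0, 1, 0, 1, 0, 1, 1]

A, B = fit_platt(val_logits, val_labels)
logit_scale, logit_bias = A, -B / A  # sigmoid(A*f + B) == sigmoid(A * (f - (-B/A)))
print(f"logit_scale={logit_scale:.3f}, logit_bias={logit_bias:.3f}")
print("log loss before:", mean_log_loss(1.0, 0.0, val_logits, val_labels))
print("log loss after: ", mean_log_loss(A, B, val_logits, val_labels))
```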

2. Temperature scaling for confidence calibration

Modern neural networks are often overconfident (Guo et al., 2017). Temperature scaling divides logits by a learned temperature T, which is equivalent to logit_scale = 1/T. Guo et al. found this one-parameter method to be simple and surprisingly effective at calibrating classification models.
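For a softmax classifier, temperature scaling with logit_scale = 1/T leaves the predicted class unchanged and only softens (T > 1) or sharpens (T < 1) the probabilities. A minimal sketch with illustrative logits; `temperature_scale` is a hypothetical helper.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_scale(logits: list[float], T: float) -> list[float]:
    """Hypothetical helper: softmax(logits / T), written as a multiply by logit_scale = 1 / T."""
    if T <= 0:
        raise ValueError("temperature must be positive")
    logit_scale = 1.0 / T
    return softmax([x * logit_scale for x in logits])


logits = [3.0, 1.0, 0.2]             # illustrative 3-class logits
p1 = softmax(logits)                 # uncalibrated
pT = temperature_scale(logits, 2.0)  # T = 2, i.e. logit_scale = 0.5
print(max(p1), max(pT))              # top-class confidence drops; argmax is unchanged
```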

3. Score normalization across model versions

When deploying multiple model versions (e.g. v3 and v4 of a reranker), their raw score distributions differ. Affine calibration allows normalizing scores to a consistent range without retraining, enabling seamless model swaps in production.

4. Domain adaptation without fine-tuning

A cross-encoder trained on one domain can be calibrated for a different domain by fitting logit_scale and logit_bias on a small validation set. This is cheaper than fine-tuning and preserves the model weights.

Scope

  • 5 files changed, ~17 lines added
  • vllm/config/pooler.py — add logit_scale field
  • vllm/model_executor/layers/pooler/seqwise/heads.py — apply scale in ClassifierPoolerHead
  • vllm/model_executor/layers/pooler/seqwise/poolers.py — pass scale from config
  • vllm/model_executor/layers/pooler/tokwise/heads.py — apply scale in TokenClassifierPoolerHead
  • vllm/model_executor/layers/pooler/tokwise/poolers.py — pass scale from config
  • Zero behavior change for existing models (default is None)
  • No new dependencies

References

  • Platt, J. (1999). "Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods." Advances in Large Margin Classifiers.
  • Guo, C. et al. (2017). "On Calibration of Modern Neural Networks." ICML 2017.
  • vLLM PoolerConfig.logit_bias — existing precedent for per-model logit adjustment.

Extent analysis

TL;DR

Add a logit_scale parameter to PoolerConfig to support Platt scaling for calibration of classifier outputs.

Guidance

  • To implement Platt scaling, add logit_scale: float | None = None to PoolerConfig and pass it to classifier pooler heads.
  • Update the classifier pooler heads to apply the scale after bias subtraction and before activation, using the formula logits *= self.logit_scale.
  • Use the --pooler-config flag to set logit_scale and logit_bias for calibration, for example: --pooler-config '{"use_activation": true, "logit_bias": 0.3, "logit_scale": 5.0}'.
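  • Verify the fix by comparing model outputs across settings: logit_bias 0.0 with logit_scale 1.0 should reproduce the uncalibrated scores, and, for a sigmoid activation, a larger logit_scale should push scores further from 0.5 without changing their order.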
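A few properties make useful checks when comparing served scores against a reference. The sketch below states them against a pure-Python reference function, the hypothetical `score`; a real test would compare against scores returned by the server for the same inputs, which is not shown here.

```python
import math
from typing import Optional


def score(logit: float, bias: Optional[float] = None, scale: Optional[float] = None) -> float:
    # Hypothetical reference: sigmoid(scale * (logit - bias)), skipping unset parameters
    z = logit
    if bias is not None:
        z -= bias
    if scale is not None:
        z *= scale
    return 1.0 / (1.0 + math.exp(-z))


probe_logits = [-2.0, -0.5, 0.0, 0.5, 2.0]  # arbitrary probe values

# 1. Identity settings (bias 0.0, scale 1.0) reproduce the uncalibrated scores
assert all(math.isclose(score(x, 0.0, 1.0), score(x)) for x in probe_logits)

# 2. A scale above 1 pushes every score at least as far from 0.5
assert all(abs(score(x, scale=3.0) - 0.5) >= abs(score(x) - 0.5) for x in probe_logits)

# 3. A logit equal to the bias scores exactly 0.5, whatever the scale
assert score(0.3, bias=0.3, scale=5.0) == 0.5
print("all calibration checks passed")
```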


Notes

The proposed solution is fully backward compatible, with a default logit_scale of None. The fix requires changes to 5 files, adding approximately 17 lines of code.

Recommendation

Apply the fix from PR #39435, which adds the logit_scale parameter to PoolerConfig and updates the classifier pooler heads to support Platt scaling. This enables calibration of classifier outputs without changing the model weights.
