vllm - ✅(Solved) Fix RFC: Add logit_scale to PoolerConfig for Affine Score Calibration (Platt Scaling) [1 pull requests, 1 participants]

jefp · 2026-04-09T16:55:11Z


GitHub stats

Issue: vllm-project/vllm#39433 (fetched 2026-04-10 03:40:40)

Author: jefp

Participants: jefp

Timeline: closed ×1, cross-referenced ×1, referenced ×1

Fix Action

Fixed

Fixed by PR: feat: add logit_scale to PoolerConfig for affine score calibration (https://github.com/vllm-project/vllm/pull/39435)

PR fix notes

PR #39435: feat: add logit_scale to PoolerConfig for affine score calibration

Repository: vllm-project/vllm
Author: jefp
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/39435

Description (problem / solution / changelog)

Purpose

Add logit_scale to PoolerConfig alongside the existing logit_bias, enabling affine score calibration (Platt scaling) in the pooler: activation(scale * (logit - bias)).

This allows reranker and classification models to produce calibrated probability scores via --pooler-config without custom model code or client-side postprocessing.

RFC: #39434

Changes

vllm/config/pooler.py — add logit_scale: float | None = None field
vllm/model_executor/layers/pooler/seqwise/heads.py — apply scale in ClassifierPoolerHead.forward()
vllm/model_executor/layers/pooler/seqwise/poolers.py — pass scale from config
vllm/model_executor/layers/pooler/tokwise/heads.py — apply scale in TokenClassifierPoolerHead.forward_chunk()
vllm/model_executor/layers/pooler/tokwise/poolers.py — pass scale from config

5 files changed, 17 lines added. Zero behavior change for existing models (default is None).

How it works

# In ClassifierPoolerHead.forward() / TokenClassifierPoolerHead.forward_chunk():
if self.logit_bias is not None:
    logits -= self.logit_bias      # existing
if self.logit_scale is not None:
    logits *= self.logit_scale     # new
if self.activation is not None:
    logits = self.activation(logits)  # existing
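The ordering above (bias, then scale, then activation) can be checked outside vLLM with a small pure-Python sketch. `calibrate` is a hypothetical helper, not vLLM code: it works on a plain list of floats instead of a tensor and takes an elementwise activation as a callable, so a softmax head is not modeled here.

```python
import math
from typing import Callable, Optional, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def calibrate(
    logits: Sequence[float],
    logit_bias: Optional[float] = None,
    logit_scale: Optional[float] = None,
    activation: Optional[Callable[[float], float]] = None,
) -> list[float]:
    """Hypothetical stand-in for the pooler head: bias, then scale, then activation."""
    out = list(logits)
    if logit_bias is not None:
        out = [x - logit_bias for x in out]   # existing step
    if logit_scale is not None:
        out = [x * logit_scale for x in out]  # step added by the PR
    if activation is not None:
        out = [activation(x) for x in out]    # elementwise activation (sigmoid below)
    return out


raw = [-1.0, 0.3, 2.0]
# With both fields left at None the logits pass through unchanged.
print(calibrate(raw))
# Full affine calibration: sigmoid(5.0 * (logit - 0.3)).
print(calibrate(raw, logit_bias=0.3, logit_scale=5.0, activation=sigmoid))
```

Because the bias is subtracted before scaling, a raw logit equal to logit_bias always lands on the activation's midpoint (0.5 for a sigmoid), whatever logit_scale is.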

Usage

# No calibration (current behavior, unchanged)
--pooler-config '{"use_activation": true}'

# Bias only (current behavior, unchanged)
--pooler-config '{"use_activation": true, "logit_bias": 0.3}'

# Full Platt scaling (new)
--pooler-config '{"use_activation": true, "logit_bias": 0.3, "logit_scale": 5.0}'

# Temperature scaling (new, special case)
--pooler-config '{"use_activation": true, "logit_scale": 2.0}'
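To read the four configs above, assume a single-logit head whose activation is a sigmoid (a multi-class head would use softmax instead). Each config then maps a raw logit x to sigmoid(logit_scale * (x - logit_bias)), where a missing field acts as a no-op. The sketch below is plain Python, not vLLM code. It prints the `--pooler-config` string for each case and the score each config would assign to a few raw logits.

```python
import json
import math

CONFIGS = {
    "no calibration": {"use_activation": True},
    "bias only": {"use_activation": True, "logit_bias": 0.3},
    "full Platt scaling": {"use_activation": True, "logit_bias": 0.3, "logit_scale": 5.0},
    "temperature scaling": {"use_activation": True, "logit_scale": 2.0},
}


def score(raw_logit: float, cfg: dict) -> float:
    """Score a single-logit head would return, assuming a sigmoid activation."""
    bias = cfg.get("logit_bias", 0.0)    # an absent bias behaves like 0
    scale = cfg.get("logit_scale", 1.0)  # an absent scale behaves like 1
    return 1.0 / (1.0 + math.exp(-scale * (raw_logit - bias)))


for name, cfg in CONFIGS.items():
    # json.dumps emits lowercase `true`, matching the CLI examples above.
    flag = f"--pooler-config '{json.dumps(cfg)}'"
    scores = [round(score(x, cfg), 3) for x in (-1.0, 0.3, 1.0)]
    print(f"{name:20s} {flag}  ->  {scores}")
```

logit_bias moves the point that scores 0.5, and logit_scale controls how steeply scores change around that point.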

Use cases

Cross-encoder rerankers — Map raw logits to calibrated [0,1] probabilities without retraining
Temperature scaling — Calibrate overconfident classifiers (logit_scale = 1/T)
Score normalization across model versions — Normalize different model versions to a consistent score range
Domain adaptation — Calibrate a model for a new domain by fitting two parameters on a small validation set

Test plan

Ruff format + check passes
mypy 3.10 passes
Tested with a cross-encoder reranker model (score calibration produces correct calibrated scores)

# Test with any cross-encoder
python -m vllm.entrypoints.openai.api_server \
    --model cross-encoder/ms-marco-MiniLM-L-6-v2 \
    --runner pooling \
    --pooler-config '{"use_activation": true, "logit_bias": 0.0, "logit_scale": 1.0}'

Not a duplicate: No existing PR adds logit_scale. This completes the affine calibration pair started by the existing logit_bias field.

AI-assisted work: AI assistance was used for implementation. All code reviewed and tested by human submitter.

Cc: @noooop @DarkLight1337

Changed files

vllm/config/pooler.py (modified, +7/-0)
vllm/model_executor/layers/pooler/seqwise/heads.py (modified, +4/-0)
vllm/model_executor/layers/pooler/seqwise/poolers.py (modified, +1/-0)
vllm/model_executor/layers/pooler/tokwise/heads.py (modified, +4/-0)
vllm/model_executor/layers/pooler/tokwise/poolers.py (modified, +1/-0)


Motivation

vLLM's PoolerConfig already supports logit_bias for classification models, enabling a bias offset on raw logits before activation. However, there is no corresponding scale parameter, which means vLLM cannot express the standard affine calibration transform:

calibrated_score = activation(scale * (logit - bias))

This transform is known as Platt scaling (Platt, 1999), a widely used method for mapping classifier outputs to well-calibrated probabilities. It fits two parameters (A and B) such that:

P(y=1|x) = sigmoid(A * f(x) + B)

where f(x) is the raw model output. This is equivalent to sigmoid(A * (f(x) - (-B/A))), i.e. logit_scale = A and logit_bias = -B/A (for A ≠ 0).
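The equivalence between Platt's (A, B) and the pooler's scale and bias can be checked numerically. `platt_to_pooler` is a hypothetical helper, and the numbers are illustrative rather than taken from a real fit. Note that Platt's paper writes P = 1 / (1 + exp(A·f + B)), which has the opposite sign convention from the sigmoid(A·f + B) form used here. Parameters taken from that form need both A and B negated before conversion.

```python
import math


def platt_to_pooler(a: float, b: float) -> tuple[float, float]:
    """Map Platt parameters of sigmoid(a*f + b) to (logit_scale, logit_bias).

    sigmoid(a*f + b) == sigmoid(a * (f - (-b / a))), so scale = a and bias = -b / a.
    """
    if a == 0:
        # With a == 0 the score ignores f entirely and -b / a is undefined.
        raise ValueError("Platt slope a must be non-zero")
    return a, -b / a


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


a, b = 2.0, -1.0                     # illustrative Platt parameters
scale, bias = platt_to_pooler(a, b)
for f in (-2.0, 0.0, 0.5, 3.0):
    # Both parameterisations give the same calibrated score.
    assert math.isclose(sigmoid(a * f + b), sigmoid(scale * (f - bias)))
print(scale, bias)
```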

Reference: John Platt. "Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods." Advances in Large Margin Classifiers, 1999.

Platt scaling is not limited to SVMs — it is routinely applied to neural network classifiers, cross-encoders, and reranking models to map raw logits into interpretable probability scores. Related techniques include:

Temperature scaling (Guo et al., 2017) — a special case where only scale is learned (bias=0)
SentenceTransformers cross-encoder activation functions — vLLM already supports loading these via sbert_ce_default_activation_function, but they are limited to parameterless functions like torch.nn.modules.activation.Sigmoid

Current State

vLLM's pooler heads (ClassifierPoolerHead, TokenClassifierPoolerHead) apply:

if self.logit_bias is not None:
    logits -= self.logit_bias          # ← bias exists
# no scale                              # ← scale is missing
if self.activation is not None:
    logits = self.activation(logits)    # ← sigmoid/softmax exists

This supports sigmoid(logit - bias) but NOT sigmoid(scale * (logit - bias)).

Proposal

Add logit_scale: float | None = None to PoolerConfig, passed through to classifier pooler heads. When set, logits are scaled after bias subtraction and before activation:

if self.logit_bias is not None:
    logits -= self.logit_bias
if self.logit_scale is not None:
    logits *= self.logit_scale
if self.activation is not None:
    logits = self.activation(logits)

Default: None (no scaling) — fully backward compatible.

Usage

# No calibration (current behavior, unchanged)
--pooler-config '{"use_activation": true}'

# Bias only (current behavior, unchanged)
--pooler-config '{"use_activation": true, "logit_bias": 0.3}'

# Full Platt scaling (new)
--pooler-config '{"use_activation": true, "logit_bias": 0.3, "logit_scale": 5.0}'

# Temperature scaling (new, special case with bias=0)
--pooler-config '{"use_activation": true, "logit_scale": 2.0}'

Use Cases

1. Cross-encoder rerankers with calibrated scores

Reranking models often produce raw logits that, even after a sigmoid, are not calibrated probabilities. Platt scaling is a standard post-hoc calibration method: two parameters are learned on a validation set without retraining the model. With logit_scale, models can ship calibration parameters in their config and produce calibrated scores out of the box.
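Fitting the two parameters is ordinary one-feature logistic regression on the raw logits. The sketch below uses plain gradient descent on log loss. It is not part of the PR and omits Platt's regularised targets, so a library logistic-regression fit would normally be used in practice. `fit_platt` and `log_loss` are hypothetical helpers, and the validation data are made up for illustration.

```python
import math


def fit_platt(logits, labels, lr=0.1, steps=5000):
    """Fit sigmoid(a*f + b) to 0/1 labels by gradient descent on mean log loss."""
    a, b = 1.0, 0.0  # start from the identity mapping sigmoid(f)
    n = len(logits)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for f, y in zip(logits, labels):
            p = 1.0 / (1.0 + math.exp(-(a * f + b)))
            grad_a += (p - y) * f  # d(log loss)/da
            grad_b += p - y        # d(log loss)/db
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def log_loss(logits, labels, a, b):
    eps = 1e-12  # guards log(0)
    total = 0.0
    for f, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-(a * f + b)))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)


# Hypothetical validation set: raw reranker logits and 0/1 relevance labels.
val_logits = [-3.0, -2.0, -1.5, -0.5, 0.0, 0.5, 1.0, 2.0, 2.5, 4.0]
val_labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

a, b = fit_platt(val_logits, val_labels)
print("log loss before:", round(log_loss(val_logits, val_labels, 1.0, 0.0), 4))
print("log loss after: ", round(log_loss(val_logits, val_labels, a, b), 4))
# Translate into pooler-config fields: logit_scale = a, logit_bias = -b / a (a != 0).
print({"use_activation": True, "logit_scale": round(a, 4), "logit_bias": round(-b / a, 4)})
```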

2. Temperature scaling for confidence calibration

Modern neural networks are often overconfident (Guo et al., 2017). Temperature scaling divides logits by a learned temperature T, which is equivalent to logit_scale = 1/T. Guo et al. found this single-parameter method surprisingly effective for calibrating classification models.
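For a multi-class softmax head, the sketch below shows what logit_scale = 1/T does to the probabilities. It only applies a given T. Guo et al. fit T on a validation set, and that step is not shown. Two consequences follow directly from the math: the argmax never changes for T > 0, and subtracting a single scalar logit_bias from every logit has no effect under softmax. Also, logit_scale = 2.0 corresponds to T = 0.5, which sharpens scores, so an overconfident model would typically need logit_scale < 1. `temperature_scale` is a hypothetical helper.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_scale(logits, temperature):
    """softmax(logit_scale * logits) with logit_scale = 1 / T."""
    logit_scale = 1.0 / temperature
    return softmax([logit_scale * x for x in logits])


logits = [4.0, 1.0, 0.5]
for t in (0.5, 1.0, 2.0):
    probs = temperature_scale(logits, t)
    print(f"T={t}: logit_scale={1 / t}, probs={[round(p, 3) for p in probs]}")
```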

3. Score normalization across model versions

When deploying multiple model versions (e.g. v3 and v4 of a reranker), their raw score distributions differ. Affine calibration can map each version's scores onto a consistent range without retraining, so versions can be swapped in production without retuning downstream score thresholds.
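One possible choice of per-version parameters, sketched below, is to standardise each version's raw logits on a shared reference set. With logit_scale = 1/std and logit_bias = mean, the expression scale * (logit - bias) becomes a z-score. This is an assumption for illustration and not something the RFC prescribes. Positive scales preserve ranking within each version. With use_activation enabled, a sigmoid is then applied to the z-scores, which makes the versions comparable but does not make them calibrated probabilities. `zscore_params` and the score lists are hypothetical.

```python
import statistics


def zscore_params(reference_logits):
    """Return (logit_scale, logit_bias) so that scale * (logit - bias) is a z-score.

    Assumes the reference logits are not all identical (std > 0).
    """
    mean = statistics.fmean(reference_logits)
    std = statistics.pstdev(reference_logits)
    return 1.0 / std, mean


# Hypothetical raw scores of two reranker versions on the same reference queries.
v3_logits = [-2.0, -1.0, 0.0, 1.0, 2.0]
v4_logits = [1.0, 3.0, 5.0, 7.0, 9.0]

for name, xs in (("v3", v3_logits), ("v4", v4_logits)):
    scale, bias = zscore_params(xs)
    normalized = [scale * (x - bias) for x in xs]
    print(name, {"logit_scale": round(scale, 4), "logit_bias": bias}, [round(z, 3) for z in normalized])
```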

4. Domain adaptation without fine-tuning

A cross-encoder trained on one domain can be calibrated for a different domain by fitting logit_scale and logit_bias on a small validation set. This is cheaper than fine-tuning and preserves the model weights.

Scope

5 files changed, ~17 lines added
vllm/config/pooler.py — add logit_scale field
vllm/model_executor/layers/pooler/seqwise/heads.py — apply scale in ClassifierPoolerHead
vllm/model_executor/layers/pooler/seqwise/poolers.py — pass scale from config
vllm/model_executor/layers/pooler/tokwise/heads.py — apply scale in TokenClassifierPoolerHead
vllm/model_executor/layers/pooler/tokwise/poolers.py — pass scale from config
Zero behavior change for existing models (default is None)
No new dependencies

References

Platt, J. (1999). "Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods." Advances in Large Margin Classifiers.
Guo, C. et al. (2017). "On Calibration of Modern Neural Networks." ICML 2017.
vLLM PoolerConfig.logit_bias — existing precedent for per-model logit adjustment.

extent analysis

TL;DR

Add a logit_scale parameter to PoolerConfig to support Platt scaling for calibration of classifier outputs.

Guidance

To implement Platt scaling, add logit_scale: float | None = None to PoolerConfig and pass it to classifier pooler heads.
Update the classifier pooler heads to multiply by the scale after bias subtraction and before activation (logits *= self.logit_scale).
Use the --pooler-config flag to set logit_scale and logit_bias for calibration, for example: --pooler-config '{"use_activation": true, "logit_bias": 0.3, "logit_scale": 5.0}'.
Verify the fix by comparing the calibrated scores against activation(logit_scale * (raw_logit - logit_bias)) computed offline from the uncalibrated outputs, for several logit_scale and logit_bias values.
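One way to run that comparison, assuming a single-logit head with a sigmoid activation, is to collect scores from two runs: one with '{"use_activation": true}' and one with the calibrated config. Inverting the sigmoid recovers each raw logit, from which the expected calibrated score follows. The helpers below are hypothetical. The tolerance is a guess meant to absorb reduced-precision inference, and saturated scores of exactly 0.0 or 1.0 cannot be inverted.

```python
import math


def logit(p: float) -> float:
    """Inverse sigmoid: recovers the raw logit from an uncalibrated score in (0, 1)."""
    return math.log(p / (1.0 - p))


def expected_calibrated(p_uncalibrated: float, scale: float, bias: float) -> float:
    """sigmoid(scale * (raw_logit - bias)), computed from an uncalibrated score."""
    raw = logit(p_uncalibrated)
    return 1.0 / (1.0 + math.exp(-scale * (raw - bias)))


def check(uncalibrated, calibrated, scale, bias, tol=1e-3):
    """True if every calibrated score matches the value predicted from its uncalibrated twin."""
    return all(
        math.isclose(expected_calibrated(p, scale, bias), q, abs_tol=tol)
        for p, q in zip(uncalibrated, calibrated)
    )
```

With logit_bias 0.0 and logit_scale 1.0, as in the test-plan command, `check` should report that the two runs agree. Any other setting should reproduce sigmoid(scale * (logit - bias)) to within the tolerance.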

Example

if self.logit_bias is not None:
    logits -= self.logit_bias
if self.logit_scale is not None:
    logits *= self.logit_scale
if self.activation is not None:
    logits = self.activation(logits)

Notes

The proposed solution is fully backward compatible, with a default logit_scale of None. The fix requires changes to 5 files, adding approximately 17 lines of code.

Recommendation

Apply the fix by adding the logit_scale parameter to PoolerConfig and updating the classifier pooler heads to support Platt scaling. This enables calibrated classifier outputs without changing the model weights or writing custom model code.
