vllm - ✅(Solved) Fix [Bug]: GPTBigCode scale_attn_weights config flag is ignored in vLLM [1 pull request, 1 comment, 2 participants]

vllm · 2026-03-10 01:01:20


vllm-project/vllm#36565 • Fetched 2026-04-08 00:36:17


Author

Qi-Zhan

Participants

haosdent

Qi-Zhan


Fix Action

Fixed

Fixed by PR: [Bugfix] Respect scale_attn_weights config flag in GPTBigCode (https://github.com/vllm-project/vllm/pull/36637)

PR fix notes

PR #36637: [Bugfix] Respect scale_attn_weights config flag in GPTBigCode

Repository: vllm-project/vllm
Author: haosdent
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/36637

Description (problem / solution / changelog)

Purpose

Fix #36565.

The GPTBigCode implementation in vLLM hardcodes the attention scale as 1/sqrt(head_dim), ignoring the scale_attn_weights configuration flag. In HuggingFace Transformers, when scale_attn_weights=False, the scale is set to 1.0 (no scaling). This causes models relying on scale_attn_weights=False to produce different logits compared to Transformers.

This PR:

Makes GPTBigCodeAttention respect config.scale_attn_weights to match HuggingFace behavior.
Adds assertions for scale_attn_by_inverse_layer_idx and reorder_and_upcast_attn (unsupported in vLLM), consistent with what the GPT-2 model already does.
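
The two changes listed above can be sketched outside vLLM as a small helper. This is only an illustration under stated assumptions: `resolve_attention_scale` is a hypothetical name, the config is a stand-in `SimpleNamespace` rather than the real GPTBigCode config class, and the actual PR edits `GPTBigCodeAttention` in place rather than adding a function like this.

```python
from types import SimpleNamespace


def resolve_attention_scale(config, head_dim: int) -> float:
    """Return the attention scale implied by a GPTBigCode-style config.

    Hypothetical helper for illustration; not part of vLLM.
    """
    # The PR description lists these options as unsupported in vLLM,
    # so the sketch rejects them instead of silently ignoring them.
    assert not getattr(config, "scale_attn_by_inverse_layer_idx", False)
    assert not getattr(config, "reorder_and_upcast_attn", False)
    # Scale by 1/sqrt(head_dim) only when the flag is set; otherwise use 1.0.
    return head_dim ** -0.5 if config.scale_attn_weights else 1.0


scaled = SimpleNamespace(scale_attn_weights=True)
unscaled = SimpleNamespace(scale_attn_weights=False)

print(resolve_attention_scale(scaled, 64))    # 0.125
print(resolve_attention_scale(unscaled, 64))  # 1.0
```

With `head_dim=64` the scaled case gives exactly 1/8, which makes the difference from the unscaled case easy to spot.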

Test Plan

Test Result

Changed files

vllm/model_executor/models/gpt_bigcode.py (modified, +5/-1)

Code Example

self.scale_attn_weights = config.scale_attn_weights
self.scaling = self.head_dim**-0.5 if config.scale_attn_weights else 1.0


Your current environment

None

🐛 Describe the bug

Hi, following up on the discussion in #35402.

I noticed that the current implementation of GPTBigCode in vLLM appears to ignore the scale_attn_weights configuration option.

In vllm/model_executor/models/gpt_bigcode.py, the attention scaling is always applied as 1 / sqrt(head_dim):

https://github.com/vllm-project/vllm/blob/e5ff140216272c529261b02b6fd13fc480713735/vllm/model_executor/models/gpt_bigcode.py#L75

However, in the HuggingFace Transformers implementation, this scaling depends on the scale_attn_weights flag in the config:

        self.scale_attn_weights = config.scale_attn_weights
        self.scaling = self.head_dim**-0.5 if config.scale_attn_weights else 1.0

As a result, models that rely on scale_attn_weights=False may produce different logits compared with Transformers.
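
To see why the scale changes the output, here is a minimal standard-library sketch that computes attention probabilities for one query with and without the 1/sqrt(head_dim) factor. The vectors are made-up toy values and `attention_probs` is a hypothetical helper, not code from vLLM or Transformers.

```python
import math


def softmax(xs):
    # Subtract the maximum first for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_probs(query, keys, scale):
    # Dot product of the query with each key, multiplied by the scale.
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)


head_dim = 4
query = [1.0, 2.0, 0.5, -1.0]
keys = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0, 0.0],
]

probs_scaled = attention_probs(query, keys, head_dim ** -0.5)  # flag enabled
probs_unscaled = attention_probs(query, keys, 1.0)             # flag disabled

print(probs_scaled)
print(probs_unscaled)
```

The unscaled scores are larger in magnitude, so the softmax is more sharply peaked on the best-matching key. A model trained with scale_attn_weights=False expects that sharper distribution, and applying the extra factor shifts every attention output and therefore the final logits.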

It might be worth checking whether the GPTBigCode implementation in vLLM should respect the scale_attn_weights config to maintain behavior parity with Transformers.

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

To fix the issue, we need to modify the GPTBigCode implementation in vllm/model_executor/models/gpt_bigcode.py to respect the scale_attn_weights configuration option.

Steps:

Update the __init__ method to read scale_attn_weights from the model config
Modify the attention scaling calculation to depend on the scale_attn_weights flag

Example Code:

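import torch

# Illustrative sketch only; the real vLLM class is structured differently.
class GPTBigCode:
    def __init__(self, config):
        self.head_dim = config.hidden_size // config.num_attention_heads
        # Scale by 1/sqrt(head_dim) only when the config asks for it.
        self.scaling = self.head_dim**-0.5 if config.scale_attn_weights else 1.0

    def attention(self, query, key, value):
        # Scaled dot-product scores, softmax over keys, then weight the values.
        attention_scores = torch.matmul(query, key.transpose(-1, -2)) * self.scaling
        attention_probs = torch.softmax(attention_scores, dim=-1)
        return torch.matmul(attention_probs, value)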

Verification

To verify the fix, compare the logits produced by the updated GPTBigCode implementation with those produced by the HuggingFace Transformers implementation for models that rely on scale_attn_weights=False.
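
The comparison step could look roughly like the sketch below. It assumes the logits from both implementations have already been collected as flat lists of floats for the same prompt and position. The helper names, the tolerance of 1e-3, and the numbers are all placeholders chosen for illustration, not measured outputs.

```python
def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two logit vectors."""
    if len(reference) != len(candidate):
        raise ValueError("logit vectors must have the same length")
    return max(abs(r - c) for r, c in zip(reference, candidate))


def logits_match(reference, candidate, atol=1e-3):
    # atol is an assumed tolerance; pick one suited to the dtype in use.
    return max_abs_diff(reference, candidate) <= atol


# Placeholder values standing in for Transformers and vLLM logits.
reference = [0.10, -1.25, 3.40]
close = [0.1001, -1.2499, 3.4002]
far = [0.60, -0.75, 2.10]

print(logits_match(reference, close))  # True
print(logits_match(reference, far))    # False
```

Running the check on a checkpoint whose config sets scale_attn_weights=False is what exercises the changed code path; a checkpoint with the flag enabled would pass both before and after the fix.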

Extra Tips

Make sure to update the documentation to reflect the changed behavior.
Consider adding tests to ensure that the scale_attn_weights configuration option is correctly applied.
