vllm - ✅(Solved) Fix [Bug]: `_CONFIG_REGISTRY` types get wrong config class since v0.19 [1 pull requests, 1 participants]

GitHub stats
vllm-project/vllm#39532 · Fetched 2026-04-11 06:12:58
Comments: 0
Participants: 1
Timeline: 4
Reactions: 0
Timeline (top): referenced ×2, cross-referenced ×1, labeled ×1

Root Cause

Note: This is not a transformers/AutoConfig bug. AutoConfig.from_pretrained() resolving the class from the on-disk model_type is working as designed. The bug is in vLLM's HFConfigParser.parse(), which switched from calling config_class.from_pretrained() (correct) to AutoConfig.from_pretrained() (broken when on-disk model_type differs from the registered type).

In HFConfigParser.parse() (vllm/transformers_utils/config.py), when model_type is in _CONFIG_REGISTRY (but not in _SPECULATIVE_DECODING_CONFIGS), the code registers the custom class with AutoConfig.register() and then delegates to AutoConfig.from_pretrained(). But AutoConfig.from_pretrained() resolves the config class from the model_type field in the checkpoint file on disk, not from the registration. So when the file says "mixtral" but the registry has "mymodel", you get a MixtralConfig instead of MyModelConfig.

The hf_overrides dict is applied after config creation (config.update(hf_overrides_kw)), which updates config.model_type to "mymodel" but doesn't change the Python class — it's still MixtralConfig with all custom attributes missing.
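
The effect of applying the override after construction can be reproduced without vLLM or transformers. The sketch below is only an illustration: ToyBaseConfig, ToyMixtralConfig and ToyMyModelConfig are invented stand-ins, and the toy `update` assumes the attribute-by-attribute behaviour the issue describes for config.update(hf_overrides_kw).

```python
class ToyBaseConfig:
    """Hypothetical stand-in for a config base class (not the transformers API)."""
    model_type = "base"

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

    def update(self, overrides):
        # Sets attributes one by one; it never swaps the object's class.
        for key, value in overrides.items():
            setattr(self, key, value)


class ToyMixtralConfig(ToyBaseConfig):
    model_type = "mixtral"


class ToyMyModelConfig(ToyBaseConfig):
    model_type = "mymodel"

    def __init__(self, custom_attr=42, **kwargs):
        super().__init__(**kwargs)
        self.custom_attr = custom_attr


# The class was already chosen from the on-disk model_type ("mixtral").
config = ToyMixtralConfig(hidden_size=128)
# The override arrives afterwards and only rewrites an attribute.
config.update({"model_type": "mymodel"})

print(type(config).__name__)           # ToyMixtralConfig
print(config.model_type)               # mymodel
print(hasattr(config, "custom_attr"))  # False
```

ToyMyModelConfig.__init__ never runs, so nothing it would have set exists on the object, which matches the symptom reported above.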

Fix Action

Fix / Workaround

The workaround is to manually patch missing attributes onto the wrong config class in the model's __init__, which is fragile and error-prone.
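
For readers who need the workaround in the meantime, the following is a minimal sketch of what "patching missing attributes" could look like. The helper name backfill_missing_attrs and the CUSTOM_DEFAULTS table are hypothetical, and SimpleNamespace stands in for the wrongly typed config object that reaches the model's __init__.

```python
from types import SimpleNamespace

# Hypothetical defaults that the plugin's custom config class would normally set.
CUSTOM_DEFAULTS = {"custom_attr": 42}


def backfill_missing_attrs(config, defaults):
    """Set each default on `config` only if the attribute is absent."""
    patched = []
    for name, value in defaults.items():
        if not hasattr(config, name):
            setattr(config, name, value)
            patched.append(name)
    return patched


# Stand-in for the wrongly typed config object handed to the model.
config = SimpleNamespace(model_type="mymodel", hidden_size=128)
patched = backfill_missing_attrs(config, CUSTOM_DEFAULTS)

print(patched)             # ['custom_attr']
print(config.custom_attr)  # 42
```

The fragility is visible even here: the defaults now live in two places (the config class and the patch table), and any isinstance check against the custom config class still fails because the object's type is unchanged.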

PR fix notes

PR #39554: [Bugfix] Fix _CONFIG_REGISTRY types getting wrong config class when on-disk model_type differs

Description (problem / solution / changelog)

Purpose

Fix a regression introduced in v0.19 (commit f73bcb1, PR #38247) where
_CONFIG_REGISTRY types get the wrong config class when the checkpoint's
on-disk model_type differs from the plugin-registered type.

PR #38247 changed HFConfigParser.parse() to route _CONFIG_REGISTRY types
through AutoConfig.from_pretrained() instead of
config_class.from_pretrained(). The intent was to ensure
AutoConfig.register() is called so tokenizer loading picks up the custom
config class. However, AutoConfig.from_pretrained() reads model_type from the checkpoint's config.json on disk — not from the overridden value — so it returns the wrong class when they differ.

This affects any vLLM plugin that:

  1. Registers a custom config class in _CONFIG_REGISTRY
  2. Uses hf_overrides to set model_type to the registered type
  3. Has a checkpoint whose on-disk model_type differs (e.g., training teams
    use base architectures like DeepSeek V3 or Mixtral, and hf_overrides is the intended mechanism to apply custom config without modifying checkpoints)

The fix registers the custom config class under both the overridden model_type and the on-disk model_type, so AutoConfig.from_pretrained() returns the correct class regardless of what the checkpoint says. This keeps the AutoConfig.from_pretrained() path intact (preserving the intent of PR #38247) while fixing the mismatch.
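
The idea of registering under both names can be sketched with a toy registry. This is not the PR's code: TOY_AUTO_REGISTRY, toy_auto_from_pretrained and toy_parse are hypothetical stand-ins, and the toy resolver assumes (as the description above states) that the class is looked up by the on-disk model_type.

```python
import json
import os
import tempfile

# Hypothetical stand-in for AutoConfig's model_type -> class mapping.
TOY_AUTO_REGISTRY = {}


class ToyMixtralConfig:
    model_type = "mixtral"

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


class ToyMyModelConfig(ToyMixtralConfig):
    model_type = "mymodel"

    def __init__(self, custom_attr=42, **kwargs):
        super().__init__(**kwargs)
        self.custom_attr = custom_attr


def toy_auto_from_pretrained(path):
    """Resolve the class from the on-disk model_type only."""
    with open(os.path.join(path, "config.json")) as f:
        config_dict = json.load(f)
    config_class = TOY_AUTO_REGISTRY[config_dict.pop("model_type")]
    return config_class(**config_dict)


def toy_parse(path, overridden_type, custom_class):
    """Register under both names, then resolve through the registry."""
    with open(os.path.join(path, "config.json")) as f:
        on_disk_type = json.load(f)["model_type"]
    TOY_AUTO_REGISTRY[overridden_type] = custom_class
    TOY_AUTO_REGISTRY[on_disk_type] = custom_class  # the extra registration
    config = toy_auto_from_pretrained(path)
    config.model_type = overridden_type  # override applied after creation
    return config


TOY_AUTO_REGISTRY["mixtral"] = ToyMixtralConfig
with tempfile.TemporaryDirectory() as tmpdir:
    with open(os.path.join(tmpdir, "config.json"), "w") as f:
        json.dump({"model_type": "mixtral", "hidden_size": 128}, f)
    config = toy_parse(tmpdir, "mymodel", ToyMyModelConfig)

print(type(config).__name__, config.model_type, config.custom_attr)
# ToyMyModelConfig mymodel 42
```

In this toy version the entry for the on-disk type is overwritten for the whole process, so a later load of a plain "mixtral" checkpoint would also get the custom class. How the actual change scopes the second registration is not shown in the description above.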

Fixes #39532

Test Plan

  1. Unit test (tests/transformers_utils/test_hf_overrides_model_type.py): Registers a custom config class, creates a checkpoint with a different on-disk model_type, and verifies parse() returns the correct class.

  2. Compatibility with PR #38247's original intent: Tested public HF models that have entries in _CONFIG_REGISTRY to verify config class resolution,
    AutoConfig registration, and tokenizer loading still work:

    • THUDM/chatglm3-6b (chatglm → ChatGLMConfig) ✅
    • nvidia/Nemotron-Mini-4B-Instruct (nemotron → NemotronConfig) ✅
  3. End-to-end inference with a custom plugin model (on-disk model_type: deepseek_v3, plugin-registered as a custom type via hf_overrides):

    • Verified HFConfigParser.parse() returns the plugin's custom config
      class (not DeepseekV3Config)
    • Verified engine config class is correct throughout the pipeline
    • Verified model loads and generates coherent output on 8x H200
  4. Pre-commit: All checks pass (ruff check, ruff format, typos, mypy,
    etc.)

Test Result

Unit test:
PASSED
tests/transformers_utils/test_hf_overrides_model_type.py::test_config_registry_with_hf_overrides

Existing test suite (33 passed, 1 unrelated failure — gated Llama model HF
auth):

33 passed, 1 failed in 283.47s

Compatibility (PR #38247 models):

=== Testing THUDM/chatglm3-6b (model_type=chatglm) ===
config class: ChatGLMConfig [PASS]
AutoConfig class: ChatGLMConfig [PASS]
tokenizer: ChatGLMTokenizer [PASS]

=== Testing nvidia/Nemotron-Mini-4B-Instruct (model_type=nemotron) ===
config class: NemotronConfig [PASS]
AutoConfig class: NemotronConfig [PASS]
tokenizer: PreTrainedTokenizerFast [PASS]

End-to-end:
[PASS] Config class from parser: CustomConfig (not DeepseekV3Config)
[PASS] All model-specific attributes present natively
[PASS] Engine config class correct



Changed files

  • tests/transformers_utils/test_hf_overrides_model_type.py (added, +62/-0)
  • vllm/transformers_utils/config.py (modified, +8/-0)


Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
Collecting environment information...
==============================
        System Info
==============================
OS                           : Ubuntu 24.04.4 LTS (x86_64)
GCC version                  : (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version                : Could not collect
CMake version                : Could not collect
Libc version                 : glibc-2.39

==============================
       PyTorch Info
==============================
PyTorch version              : 2.10.0+cu128
Is debug build               : False
CUDA used to build PyTorch   : 12.8
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.3 (main, Mar  3 2026, 12:15:18) [GCC 13.3.0] (64-bit runtime)
Python platform              : Linux-6.1.163-186.299.amzn2023.x86_64-x86_64-with-glibc2.39

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 13.0.88
CUDA_MODULE_LOADING set to   :
GPU models and configuration :
GPU 0: NVIDIA L4
GPU 1: NVIDIA L4
GPU 2: NVIDIA L4
GPU 3: NVIDIA L4
GPU 4: NVIDIA L4
GPU 5: NVIDIA L4
GPU 6: NVIDIA L4
GPU 7: NVIDIA L4

Nvidia driver version        : 580.126.09
cuDNN version                : Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.13.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.13.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.13.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.13.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.13.0
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.13.0
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.13.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.13.0
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           48 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  192
Vendor ID:                               AuthenticAMD
Model name:                              AMD EPYC 7R13 Processor

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.6.6
[pip3] numpy==2.2.6
[pip3] torch==2.10.0
[pip3] torchvision==0.25.0
[pip3] transformers==4.57.6
[pip3] triton==3.6.0

==============================
         vLLM Info
==============================
vLLM Version                 : 0.19.0
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
</details>

🐛 Describe the bug

When a vLLM plugin registers a custom config class via AutoConfig.register() and the user passes hf_overrides={"model_type": "<custom>"}, the resulting config object is the wrong class. The model_type from hf_overrides is used for vLLM's internal _CONFIG_REGISTRY lookup, but AutoConfig.from_pretrained() independently reads model_type from the checkpoint's config.json file on disk and uses that to determine the Python class.

This means:

  • If the checkpoint says model_type: "mixtral" but the plugin registers model_type: "mymodel", AutoConfig.from_pretrained() returns MixtralConfig, not MyModelConfig
  • hf_overrides are applied after config creation (config.update(hf_overrides_kw)), so config.model_type gets updated to "mymodel" but the Python class remains MixtralConfig
  • All custom attributes defined in MyModelConfig.__init__ are missing from the config object

Root cause

Note: This is not a transformers/AutoConfig bug. AutoConfig.from_pretrained() resolving the class from the on-disk model_type is working as designed. The bug is in vLLM's HFConfigParser.parse(), which switched from calling config_class.from_pretrained() (correct) to AutoConfig.from_pretrained() (broken when on-disk model_type differs from the registered type).

In HFConfigParser.parse() (vllm/transformers_utils/config.py), when model_type is in _CONFIG_REGISTRY (but not in _SPECULATIVE_DECODING_CONFIGS), the code registers the custom class with AutoConfig.register() and then delegates to AutoConfig.from_pretrained(). But AutoConfig.from_pretrained() resolves the config class from the model_type field in the checkpoint file on disk, not from the registration. So when the file says "mixtral" but the registry has "mymodel", you get a MixtralConfig instead of MyModelConfig.

The hf_overrides dict is applied after config creation (config.update(hf_overrides_kw)), which updates config.model_type to "mymodel" but doesn't change the Python class — it's still MixtralConfig with all custom attributes missing.

Minimal reproduction

This reproduces the bug through vLLM's get_config() — the actual broken code path:

import json, os, tempfile
from transformers import PretrainedConfig
from vllm.transformers_utils.config import _CONFIG_REGISTRY, get_config

class MyCustomConfig(PretrainedConfig):
    model_type = "mymodel"
    def __init__(self, custom_attr=42, **kw):
        super().__init__(**kw)
        self.custom_attr = custom_attr

# Register (same as what a vLLM plugin does)
_CONFIG_REGISTRY["mymodel"] = MyCustomConfig

with tempfile.TemporaryDirectory() as tmpdir:
    # Checkpoint has model_type="mixtral" on disk
    cfg = {"model_type": "mixtral", "hidden_size": 128, "num_hidden_layers": 2,
           "num_attention_heads": 4, "num_key_value_heads": 4,
           "intermediate_size": 256, "num_local_experts": 4, "num_experts_per_tok": 2}
    with open(os.path.join(tmpdir, "config.json"), "w") as f:
        json.dump(cfg, f)

    config = get_config(tmpdir, trust_remote_code=False,
                        hf_overrides_kw={"model_type": "mymodel"})

    print(f"class:      {type(config).__name__}")          # MixtralConfig — WRONG
    print(f"model_type: {config.model_type}")               # mymodel — correct
    print(f"has custom_attr: {hasattr(config, 'custom_attr')}")  # False — MISSING

Output

class:      MixtralConfig  <-- WRONG! Should be MyCustomConfig
model_type: mymodel        <-- correct (set by hf_overrides after creation)
has custom_attr: False     <-- MISSING!

Expected behavior

get_config() should return a MyCustomConfig instance with custom_attr=42. This worked correctly in vLLM ≤ 0.18.

This is a regression from v0.18

In vLLM ≤ 0.18, HFConfigParser.parse() used config_class.from_pretrained() directly for all types in _CONFIG_REGISTRY:

# v0.18 — correct behavior
if model_type in _CONFIG_REGISTRY:
    config_class = _CONFIG_REGISTRY[model_type]
    config = config_class.from_pretrained(model, ...)   # ← returns the right class
else:
    config = AutoConfig.from_pretrained(model, ...)

Commit f73bcb1c5 ("Various Transformers v5 config fixes", PR #38247) changed this. It introduced _SPECULATIVE_DECODING_CONFIGS and only kept the direct config_class.from_pretrained() path for speculative decoding types (eagle, speculators). All other _CONFIG_REGISTRY types were moved to a new path that calls AutoConfig.register() followed by AutoConfig.from_pretrained():

# v0.19.0 — broken behavior
if model_type in _SPECULATIVE_DECODING_CONFIGS:
    config_class = _CONFIG_REGISTRY[model_type]
    config = config_class.from_pretrained(model, ...)   # ← still correct for these two
else:
    if model_type in _CONFIG_REGISTRY:
        config_class = _CONFIG_REGISTRY[model_type]
        AutoConfig.register(model_type, config_class, exist_ok=True)  # registers...
    config = AutoConfig.from_pretrained(model, ...)  # ← but this reads model_type from disk!

The assumption was that AutoConfig.register() + AutoConfig.from_pretrained() would use the registered class. But from_pretrained() reads model_type from the checkpoint file on disk and resolves the class from that, ignoring the registration when the file's model_type differs.
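
Why the registration has no effect here can be shown with a dictionary lookup. The snippet is a toy model only: `registry`, `register` and `resolve` are hypothetical names, and class names are kept as plain strings to make the lookup key obvious.

```python
# Hypothetical stand-in for the AutoConfig mapping; class names are placeholders.
registry = {"mixtral": "MixtralConfig"}


def register(model_type, config_class_name):
    registry[model_type] = config_class_name


def resolve(on_disk_config):
    # The lookup key comes from the checkpoint's own config and nothing else.
    return registry[on_disk_config["model_type"]]


register("mymodel", "MyCustomConfig")  # the plugin's registration succeeds
on_disk = {"model_type": "mixtral"}    # but config.json still says "mixtral"

print(resolve(on_disk))                    # MixtralConfig
print(resolve({"model_type": "mymodel"}))  # MyCustomConfig
```

The "mymodel" entry is only reached when the file itself says "mymodel", which is exactly the case a plugin relying on hf_overrides does not have.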

The intent behind the change (from the PR description) was to ensure the custom config class is used consistently elsewhere in vLLM — specifically for tokenizer loading, which also calls AutoConfig.from_pretrained() and wasn't picking up classes from _CONFIG_REGISTRY. The AutoConfig.register() call achieves that goal, but routing the primary config loading through AutoConfig.from_pretrained() instead of config_class.from_pretrained() broke the case where the checkpoint's on-disk model_type differs from the registered type.

Impact

This affects any vLLM plugin that:

  1. Defines a custom model architecture with its own config class
  2. Has checkpoints that use a different model_type on disk (e.g., derived from an existing architecture like Mixtral)
  3. Relies on hf_overrides to switch to the custom config at load time

In practice, it's common for the on-disk model_type to differ from the plugin's registered type. Checkpoints are often produced by upstream training teams using a base architecture (e.g., Mixtral), and the inference plugin adds custom config attributes (MLA parameters, MoE routing config, etc.) on top. Coordinating a model_type change across all checkpoint producers is impractical — hf_overrides exists precisely to handle this mismatch at load time without modifying checkpoints.

The workaround is to manually patch missing attributes onto the wrong config class in the model's __init__, which is fragile and error-prone.

Suggested fix

Restore the v0.18 behavior for all _CONFIG_REGISTRY types: use config_class.from_pretrained() directly instead of AutoConfig.from_pretrained(). Keep the AutoConfig.register() call so that other parts of vLLM (tokenizers, etc.) still benefit from the registration. This is a one-line condition change — promote the _CONFIG_REGISTRY check to an elif at the same level as the speculative decoding path.
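
The branch structure this suggestion describes could look roughly like the toy below. It is a sketch under stated assumptions, not vLLM code: toy_parse, the Toy* classes and the dictionary-based registries are all hypothetical, and the real method loads from a model path rather than a dict.

```python
class ToyMixtralConfig:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


class ToyMyModelConfig(ToyMixtralConfig):
    def __init__(self, custom_attr=42, **kwargs):
        super().__init__(**kwargs)
        self.custom_attr = custom_attr


def toy_parse(model_type, on_disk, config_registry, speculative_types, auto_registry):
    """Pick the loading path from the (possibly overridden) model_type."""
    if model_type in speculative_types:
        return config_registry[model_type](**on_disk)
    elif model_type in config_registry:
        config_class = config_registry[model_type]
        auto_registry[model_type] = config_class  # keep the registration for other callers
        return config_class(**on_disk)            # but load through the class directly
    else:
        # Fallback: resolve from the on-disk model_type.
        return auto_registry[on_disk["model_type"]](**on_disk)


on_disk = {"model_type": "mixtral", "hidden_size": 128}
auto_registry = {"mixtral": ToyMixtralConfig}
config_registry = {"mymodel": ToyMyModelConfig}

custom = toy_parse("mymodel", on_disk, config_registry, set(), auto_registry)
plain = toy_parse("mixtral", on_disk, config_registry, set(), auto_registry)

print(type(custom).__name__, custom.custom_attr)  # ToyMyModelConfig 42
print(type(plain).__name__)                       # ToyMixtralConfig
print("mymodel" in auto_registry)                 # True
```

With the registry check promoted to its own branch, the class is chosen from the overridden model_type, while unregistered types still fall through to resolution by the on-disk value.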

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

TL;DR

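The issue's suggested fix is to restore the v0.18 behavior for all _CONFIG_REGISTRY types by calling config_class.from_pretrained() directly instead of AutoConfig.from_pretrained(). PR #39554 takes a different route: it keeps the AutoConfig.from_pretrained() path and registers the custom class under both the overridden and the on-disk model_type.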

Guidance

  • Identify the problematic code change introduced in commit f73bcb1c5 and revert it to use config_class.from_pretrained() for all _CONFIG_REGISTRY types.
  • Keep the AutoConfig.register() call to ensure other parts of vLLM still benefit from the registration.
  • Verify the fix by checking if get_config() returns the correct MyCustomConfig instance with custom_attr=42.
  • Test the fix with different model_type values in the checkpoint file and hf_overrides to ensure it works as expected.
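
The verification steps above can be wrapped in a small checker. This is a hypothetical helper (check_config is not part of vLLM); in real use one would pass it the object returned by get_config() together with the plugin's config class, while the demo below uses two toy classes so it runs on its own.

```python
def check_config(config, expected_class, expected_attrs):
    """Return a list of human-readable problems; an empty list means the config looks right."""
    problems = []
    if type(config) is not expected_class:
        problems.append(
            f"class is {type(config).__name__}, expected {expected_class.__name__}"
        )
    for name, value in expected_attrs.items():
        if not hasattr(config, name):
            problems.append(f"missing attribute {name!r}")
        elif getattr(config, name) != value:
            problems.append(f"{name!r} is {getattr(config, name)!r}, expected {value!r}")
    return problems


class Good:
    def __init__(self):
        self.custom_attr = 42


class Bad:
    pass


ok = check_config(Good(), Good, {"custom_attr": 42})
broken = check_config(Bad(), Good, {"custom_attr": 42})

print(ok)      # []
print(broken)  # ["class is Bad, expected Good", "missing attribute 'custom_attr'"]
```

Running the same check for several combinations of on-disk and overridden model_type covers the last bullet above.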

Example

# v0.18 — correct behavior
if model_type in _CONFIG_REGISTRY:
    config_class = _CONFIG_REGISTRY[model_type]
    config = config_class.from_pretrained(model, ...)   # ← returns the right class
else:
    config = AutoConfig.from_pretrained(model, ...)

Notes

  • This fix assumes that the config_class.from_pretrained() method is correctly implemented for all custom config classes in _CONFIG_REGISTRY.
  • The fix may not work if there are other issues with the config_class.from_pretrained() method or the AutoConfig.register() call.

Recommendation

Apply the suggested fix by restoring the v0.18 behavior for all _CONFIG_REGISTRY types, as it is a straightforward and targeted solution to the identified problem.


