vllm - ✅(Solved) Fix Upgrade to Transformers v5 [7 pull requests, 1 participants]

vllm · 2026-03-27 18:19:00

GitHub stats

vllm-project/vllm#38379 • Fetched 2026-04-08 01:41:46

Author

hmellor

Participants

hmellor

Assignees

hmellor

Timeline (top)

cross-referenced ×20, sub_issue_added ×12, subscribed ×8, assigned ×1

PR fix notes

PR #38461: Fixed issues

Repository: vllm-project/vllm
Author: rpathade
State: closed | merged: False
Link: https://github.com/vllm-project/vllm/pull/38461

Description (problem / solution / changelog)

Purpose

Closes #38389

Fixes two test failures for IsaacForConditionalGeneration that are sub-issues of #38379.

Fix 1 — AttributeError: 'str' object has no attribute '__args__'
(tests/models/test_initialization.py::test_can_initialize_large_subset[IsaacForConditionalGeneration])

from __future__ import annotations in vllm/transformers_utils/configs/isaac.py defers all annotations to strings at runtime, breaking any introspection of __args__ on int | None / str | None type annotations. Removing it restores
runtime type objects — valid without the future import on Python 3.10+, which vLLM already requires.

Fix 2 — ImportError: cannot import name 'SlidingWindowCache' from 'transformers.cache_utils' (tests/models/multimodal/generation/test_common.py Isaac HF-runner tests)

The Isaac HF reference model (invoked via patch_hf_runner) imports SlidingWindowCache from transformers.cache_utils, which was removed in transformers v5. A guard is added to detect availability at import time and skip the Isaac HF-runner tests when the symbol is absent. The vLLM inference path for Isaac is unaffected.
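An import-time availability guard of the kind described here could look like the sketch below. The helper name `has_symbol` and the flag name are assumptions for illustration; the PR's actual guard in `test_common.py` is not shown on this page.

```python
import importlib


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `symbol`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module (or one of its dependencies) is not installed.
        return False
    return hasattr(module, symbol)


# Evaluated once, when the test module is imported.
HAS_SLIDING_WINDOW_CACHE = has_symbol("transformers.cache_utils", "SlidingWindowCache")

# In a pytest module the flag could then drive a skip marker, for example:
#   @pytest.mark.skipif(
#       not HAS_SLIDING_WINDOW_CACHE,
#       reason="SlidingWindowCache is not available in this transformers version",
#   )
```

Checking for the symbol rather than comparing version numbers keeps the tests running on any release that still ships the class.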

This PR does not duplicate any existing open PR (checked via gh pr list search for isaac and #38379). AI assistance (Claude) was used; all changed lines reviewed by the human submitter.

Test Plan

# Install vLLM and transformers from source (v5)
git clone https://github.com/huggingface/transformers.git
cd vllm
VLLM_USE_PRECOMPILED=1 uv pip install -e .
uv pip install -e ../transformers

# Fix 1 — initialization test
pytest "tests/models/test_initialization.py::test_can_initialize_large_subset[IsaacForConditionalGeneration]" -v

# Fix 2 — multimodal generation tests (expect SKIP on transformers v5)
pytest tests/models/multimodal/generation/test_common.py -k "isaac" -v

                                                                                  
Test Result
                                                                                  
| Test | Before | After |
| --- | --- | --- |
| `test_can_initialize_large_subset[IsaacForConditionalGeneration]` | `AttributeError: 'str' object has no attribute '__args__'` | PASSED |
| `test_single_image_models[isaac-*]` | `ImportError: cannot import name 'SlidingWindowCache'` | SKIPPED (transformers v5) / PASSED (older transformers) |
| `test_multi_image_models[isaac-*]` | `ImportError: cannot import name 'SlidingWindowCache'` | SKIPPED (transformers v5) / PASSED (older transformers) |

## Changed files

- `tests/models/multimodal/generation/test_common.py` (modified, +15/-0)
- `vllm/transformers_utils/configs/isaac.py` (modified, +0/-1)


---

# PR #38747: [Transformers v5] Fix Ernie4_5_VLMoeForConditionalGeneration rope_theta config

- Repository: vllm-project/vllm
- Author: mateenali66
- State: closed | merged: False
- Link: https://github.com/vllm-project/vllm/pull/38747

## Description (problem / solution / changelog)

## Summary

Fixes #38735 (part of #38379)

Transformers v5 runs `validate_rope` during `PretrainedConfig.__init__()` via `__class_validators__`. Some upstream configs (e.g. Ernie-4.5) assign `rope_theta` after `super().__init__()`, so the validator fires before the value exists and raises a `KeyError`.

Two fixes in `vllm/transformers_utils/config.py`:

1. **Remove premature auto-validation**: `_disable_rope_auto_validation()` strips the `validate_rope` class validator at module load for Transformers >= 5. vLLM already calls `validate_rope()` explicitly in `patch_rope_parameters()`, so no validation coverage is lost.

2. **Propagate rope_theta after standardize**: `standardize_rope_params()` uses `setdefault` which won't overwrite a `None` written by the `rope_scaling` property setter during `__init__`. After standardize, force-set `rope_parameters["rope_theta"]` from `config.rope_theta` when needed.
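The `setdefault` behaviour behind the second fix is plain dictionary semantics and can be shown without Transformers. This is a minimal sketch; the function name and the sample values are illustrative assumptions, not the code in `vllm/transformers_utils/config.py`.

```python
def standardize_then_propagate(rope_parameters, rope_theta):
    """Show why setdefault alone is not enough when the key holds None."""
    params = dict(rope_parameters)

    # setdefault only writes when the key is absent; an existing None is kept.
    params.setdefault("rope_theta", rope_theta)
    after_setdefault = params["rope_theta"]

    # Force-set the value when the stored entry is still None.
    if params["rope_theta"] is None and rope_theta is not None:
        params["rope_theta"] = rope_theta

    return after_setdefault, params["rope_theta"]


after_setdefault, final_value = standardize_then_propagate(
    {"rope_type": "default", "rope_theta": None}, 500000
)
```

Here `after_setdefault` is still `None`, while `final_value` carries the configured value.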

## Test Plan

```bash
pytest tests/models/test_initialization.py::test_can_initialize_large_subset[Ernie4_5_VLMoeForConditionalGeneration] -v
```

Changed files

vllm/transformers_utils/config.py (modified, +48/-0)

PR #38748: [Transformers v5] Fix NemotronParse image_size tuple unpack

Repository: vllm-project/vllm
Author: mateenali66
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/38748

Description (problem / solution / changelog)

Fixes #38740 (part of #38379)

Transformers v5 changed image_size to a scalar int for this model config. vLLM was unpacking it as a (height, width) tuple in three places, which caused a ValueError on init.

Fixed by normalizing image_size to a 2-tuple wherever it's used, the same approach interns1_vit.py already takes.
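A normalization step of this kind can be sketched as follows. The helper name `to_2tuple` is an assumption for illustration; the PR's actual code in `nemotron_parse.py` is not reproduced on this page.

```python
def to_2tuple(image_size):
    """Accept either a scalar size or a (height, width) pair."""
    if isinstance(image_size, int):
        # Scalar form: treat the image as square.
        return (image_size, image_size)
    height, width = image_size  # raises ValueError if it is not a pair
    return (int(height), int(width))


square = to_2tuple(1024)
rectangular = to_2tuple([2048, 1648])
```

Calling the helper at each use site means the rest of the model code can keep unpacking `height, width` regardless of which form the config provides.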

Test:

pytest tests/models/multimodal/generation/test_nemotron_parse.py::test_models[5-bfloat16-nvidia/NVIDIA-Nemotron-Parse-v1.1]

Changed files

vllm/model_executor/models/nemotron_parse.py (modified, +20/-3)

PR #39173: Fix RoPE init for Ernie4.5 on Transformers v5 — suppress premature va…

Repository: vllm-project/vllm
Author: jacob-lou
State: closed | merged: False
Link: https://github.com/vllm-project/vllm/pull/39173

Description (problem / solution / changelog)

…lidate_rope

Remove automatic Transformers v5 validate_rope auto-validator during config construction.
Propagate rope_theta into config.rope_parameters after standardize_rope_params().

Title: compat: Fix Ernie4.5 RoPE initialization under Transformers v5
Fixes https://github.com/vllm-project/vllm/issues/38735

Summary

Remove automatic Transformers v5 validate_rope auto-validator during config construction.
Propagate rope_theta into config.rope_parameters after standardize_rope_params() so configs that set rope_theta after super().__init__() pass validation.
Fixes initialization failure for Ernie4_5_VLMoeForConditionalGeneration under Transformers v5 (see issue #38735, parent task #38379).

Purpose

Transformers v5 runs RoPE validation during PretrainedConfig.__init__() via __class_validators__. Some upstream model configs (e.g. Ernie-4.5) assign rope_theta only after super().__init__(), which caused a KeyError: Missing required keys in `rope_parameters` ... {'rope_theta'} during model initialization. vLLM already performs RoPE patching and validation in patch_rope_parameters(); this PR prevents the premature auto-validation and guarantees rope_theta is present when validation runs.
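The ordering problem can be illustrated with a toy model. Everything in this sketch is hypothetical: `BaseConfig`, its `class_validators` list, and `LateThetaConfig` are stand-ins, not the Transformers or vLLM classes, and the real validator registry may be structured differently.

```python
class BaseConfig:
    # Hypothetical stand-in for a class-level validator registry.
    class_validators = []

    def __init__(self, **kwargs):
        self.rope_parameters = dict(kwargs.get("rope_parameters") or {})
        # Validators run at the end of the base initializer.
        for validator in type(self).class_validators:
            validator(self)


def validate_rope(config):
    missing = {"rope_theta"} - set(config.rope_parameters)
    if missing:
        raise KeyError(f"Missing required keys in `rope_parameters`: {missing}")


class LateThetaConfig(BaseConfig):
    def __init__(self, rope_theta=500000, **kwargs):
        super().__init__(**kwargs)    # validation fires here...
        self.rope_theta = rope_theta  # ...before this value exists


# With the auto-validator registered, construction fails.
BaseConfig.class_validators.append(validate_rope)
try:
    LateThetaConfig()
    premature_error = None
except KeyError as exc:
    premature_error = exc

# Workaround in the spirit of the PR: drop the auto-validator, propagate the
# value after construction, then validate explicitly.
BaseConfig.class_validators.remove(validate_rope)
config = LateThetaConfig()
config.rope_parameters["rope_theta"] = config.rope_theta
validate_rope(config)  # passes now
```

Validation still happens, only later, once the subclass has finished assigning its attributes.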

Changes

config.py
- Add _disable_rope_auto_validation() to remove the v5 RoPE auto-validator from PretrainedConfig.__class_validators__ (applies only for Transformers >= 5).
- In patch_rope_parameters(), after config.standardize_rope_params() propagate config.rope_theta into config.rope_parameters['rope_theta'] when needed, before config.validate_rope().

Test Plan

Run the focused checks locally, then run CI full tests (recommended on Linux):

Quick version check

.venv/bin/python -c "import transformers; print(transformers.__version__)"
# Expect: transformers >= 5.0.0 (tested with 5.5.0)

Quick config load (verifies rope_theta propagation)

.venv/bin/python -c "from vllm.transformers_utils.config import get_config; c=get_config('baidu/ERNIE-4.5-VL-28B-A3B-PT', trust_remote_code=True); tc=c.get_text_config(); print(getattr(tc,'rope_theta',None), tc.rope_parameters.get('rope_theta'))"
# Expect output: 500000 500000  (or other configured value present in both places)

Targeted unit test (macOS: avoid fork issues)

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
.venv/bin/python -m pytest 'tests/models/test_initialization.py::test_can_initialize_large_subset[Ernie4_5_VLMoeForConditionalGeneration]' -q

Full test suite (run on CI / Linux recommended)

.venv/bin/python -m pytest tests -q

Test Results (local)

get_config(...) now loads successfully and prints:

rope_theta attr: 500000
rope_parameters[rope_theta]: 500000

AutoConfig.from_pretrained(...) (tokenizer path) also succeeds.
pre-commit hooks passed after ruff formatting.

Note: On macOS running the full test suite locally sometimes produces unrelated fork / Objective‑C initialization issues (SIGSEGV). These are platform-specific; please run full regression on Linux CI.

Risk & Rollout

Risk: We mutate Transformers’ __class_validators__ to remove only the validate_rope validator. This intentionally defers automatic validation to vLLM's explicit validation in patch_rope_parameters(). Validation coverage is preserved and the change is narrowly scoped (only affects RoPE auto-validator for Transformers >= 5).
Rollout: Safe to merge. Recommend running full tests on CI (Ubuntu) before merging to main.

Documentation & Release Notes

No public API change; no docs required for users. If you maintain a release-notes doc, add a short entry: Fix: Ensure Ernie4.5 initializes under Transformers v5 by deferring RoPE auto-validation and propagating rope_theta (issue #38735).

Checklist (please confirm)

The purpose of the PR is documented (links to issues: #38735, parent #38379).
Test plan provided (commands for targeted and full tests).
Test results recorded for the critical verification steps.
(Optional) Documentation updates — none required for user-facing docs in this change.
pre-commit / linters passed locally.

Changed files

PR_BODY.md (added, +49/-0)
csrc/cpu/cpu_attn_vec.hpp (modified, +1/-1)
csrc/cpu/cpu_attn_vec16.hpp (modified, +1/-1)
csrc/cpu/micro_gemm/cpu_micro_gemm_vec.hpp (modified, +1/-1)
docs/design/moe_kernel_features.md (modified, +2/-2)
tests/evals/gpt_oss/configs/gpt-oss-20b-rocm-baseline.yaml (modified, +1/-1)
tests/evals/gpt_oss/configs/gpt-oss-20b-rocm-quark-mxfp4-bf16-aiter.yaml (modified, +2/-2)
tests/evals/gpt_oss/configs/gpt-oss-20b-rocm-quark-mxfp4-bf16-triton.yaml (modified, +1/-1)
tests/evals/gpt_oss/configs/gpt-oss-20b-rocm-quark-mxfp4-fp8-triton.yaml (modified, +1/-1)
tests/kernels/core/test_layernorm.py (modified, +29/-1)
tests/kernels/moe/test_gpt_oss_triton_kernels.py (modified, +12/-57)
tests/kernels/quantization/test_mxfp4_triton_ep.py (modified, +18/-41)
tests/models/quantization/test_nvfp4.py (modified, +15/-4)
tests/quantization/test_compressed_tensors.py (modified, +1/-4)
tests/v1/ec_connector/integration/run_epd_correctness_test.sh (modified, +3/-2)
tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh (modified, +5/-2)
tests/v1/kv_connector/nixl_integration/run_edge_case_test.sh (modified, +3/-2)
tests/v1/kv_connector/nixl_integration/run_tpu_disagg_accuracy_test.sh (modified, +3/-2)
tests/v1/kv_connector/nixl_integration/run_tpu_edge_case_test.sh (modified, +2/-1)
tests/v1/kv_connector/nixl_integration/run_xpu_disagg_accuracy_test.sh (modified, +2/-1)
tests/v1/kv_connector/nixl_integration/spec_decode_acceptance_test.sh (modified, +3/-1)
vllm/compilation/passes/fusion/allreduce_rms_fusion.py (modified, +21/-3)
vllm/compilation/passes/fusion/rms_quant_fusion.py (modified, +29/-1)
vllm/config/attention.py (modified, +1/-1)
vllm/env_override.py (modified, +1/-1)
vllm/envs.py (modified, +5/-0)
vllm/ir/ops/layernorm.py (modified, +2/-3)
vllm/kernels/aiter_ops.py (modified, +4/-6)
vllm/kernels/vllm_c.py (modified, +5/-2)
vllm/kernels/xpu_ops.py (modified, +3/-1)
vllm/model_executor/layers/fused_moe/gpt_oss_triton_kernels_moe.py (modified, +49/-113)
vllm/model_executor/layers/layernorm.py (modified, +13/-58)
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py (removed, +0/-2541)
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe/__init__.py (added, +10/-0)
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe/compressed_tensors_moe.py (added, +175/-0)
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe/compressed_tensors_moe_w4a4_mxfp4.py (added, +168/-0)
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe/compressed_tensors_moe_w4a4_nvfp4.py (added, +306/-0)
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe/compressed_tensors_moe_w4a8_fp8.py (added, +343/-0)
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe/compressed_tensors_moe_w4a8_int8.py (added, +349/-0)
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe/compressed_tensors_moe_w8a8_fp8.py (added, +414/-0)
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe/compressed_tensors_moe_w8a8_int8.py (added, +161/-0)
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe/compressed_tensors_moe_wna16.py (added, +267/-0)
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe/compressed_tensors_moe_wna16_marlin.py (added, +575/-0)
vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a4_nvfp4.py (modified, +23/-0)
vllm/model_executor/layers/quantization/modelopt.py (modified, +19/-0)
vllm/model_executor/layers/quantization/utils/marlin_utils_fp4.py (modified, +1/-1)
vllm/model_executor/layers/quantization/utils/nvfp4_emulation_utils.py (modified, +22/-9)
vllm/model_executor/layers/quantization/utils/nvfp4_utils.py (modified, +99/-40)
vllm/model_executor/models/config.py (modified, +26/-7)
vllm/model_executor/models/falcon_h1.py (modified, +63/-59)
vllm/model_executor/models/nano_nemotron_vl.py (modified, +52/-10)
vllm/platforms/rocm.py (modified, +1/-0)
vllm/transformers_utils/config.py (modified, +49/-0)
vllm/utils/import_utils.py (modified, +5/-0)
vllm/v1/attention/backends/rocm_aiter_fa.py (modified, +0/-2)
vllm/v1/worker/gpu/async_utils.py (modified, +2/-4)
vllm/v1/worker/gpu/model_runner.py (modified, +10/-4)

PR #39180: Fix RoPE init for Ernie4.5 on Transformers v5

Repository: vllm-project/vllm
Author: jacob-lou
State: closed | merged: False
Link: https://github.com/vllm-project/vllm/pull/39180

Description (problem / solution / changelog)

Title: compat: Fix Ernie4.5 RoPE initialization under Transformers v5
Fixes https://github.com/vllm-project/vllm/issues/38735

Summary

Remove automatic Transformers v5 validate_rope auto-validator during config construction.
Propagate rope_theta into config.rope_parameters after standardize_rope_params() so configs that set rope_theta after super().__init__() pass validation.
Fixes initialization failure for Ernie4_5_VLMoeForConditionalGeneration under Transformers v5 (see issue #38735, parent task #38379).

Changes

config.py
- Add _disable_rope_auto_validation() to remove the v5 RoPE auto-validator from PretrainedConfig.__class_validators__ (applies only for Transformers >= 5).
- In patch_rope_parameters(), after config.standardize_rope_params() propagate config.rope_theta into config.rope_parameters['rope_theta'] when needed, before config.validate_rope().

Test Plan

Run the focused checks locally, then run CI full tests (recommended on Linux):

Quick version check

.venv/bin/python -c "import transformers; print(transformers.__version__)"
# Expect: transformers >= 5.0.0 (tested with 5.5.0)

Quick config load (verifies rope_theta propagation)

.venv/bin/python -c "from vllm.transformers_utils.config import get_config; c=get_config('baidu/ERNIE-4.5-VL-28B-A3B-PT', trust_remote_code=True); tc=c.get_text_config(); print(getattr(tc,'rope_theta',None), tc.rope_parameters.get('rope_theta'))"
# Expect output: 500000 500000  (or other configured value present in both places)

Targeted unit test (macOS: avoid fork issues)

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
.venv/bin/python -m pytest 'tests/models/test_initialization.py::test_can_initialize_large_subset[Ernie4_5_VLMoeForConditionalGeneration]' -q

Full test suite (run on CI / Linux recommended)

.venv/bin/python -m pytest tests -q

Test Results (local)

get_config(...) now loads successfully and prints:

rope_theta attr: 500000
rope_parameters[rope_theta]: 500000

AutoConfig.from_pretrained(...) (tokenizer path) also succeeds.
pre-commit hooks passed after ruff formatting.

Note: On macOS running the full test suite locally sometimes produces unrelated fork / Objective‑C initialization issues (SIGSEGV). These are platform-specific; please run full regression on Linux CI.

Risk & Rollout

Risk: We mutate Transformers’ __class_validators__ to remove only the validate_rope validator. This intentionally defers automatic validation to vLLM's explicit validation in patch_rope_parameters(). Validation coverage is preserved and the change is narrowly scoped (only affects RoPE auto-validator for Transformers >= 5).
Rollout: Safe to merge. Recommend running full tests on CI (Ubuntu) before merging to main.

Documentation & Release Notes

No public API change; no docs required for users. If you maintain a release-notes doc, add a short entry: Fix: Ensure Ernie4.5 initializes under Transformers v5 by deferring RoPE auto-validation and propagating rope_theta (issue #38735).

Checklist (please confirm)

The purpose of the PR is documented (links to issues: #38735, parent #38379).
Test plan provided (commands for targeted and full tests).
Test results recorded for the critical verification steps.
(Optional) Documentation updates — none required for user-facing docs in this change.
pre-commit / linters passed locally.

Changed files

PR_BODY.md (added, +49/-0)
vllm/transformers_utils/config.py (modified, +49/-0)

PR #45326: feat[vLLM × v5]: Add vLLM compatibility for audio models

Repository: huggingface/transformers
Author: harshaljanjani
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45326

Description (problem / solution / changelog)

What does this PR do?

→ This PR introduces compat fixes across several audio models to ensure they can be loaded and used by a companion vLLM PR. These changes are deliberate and are blocking this vLLM PR which adds audio backend compatibility to vLLM. Once this PR is merged, the other PR will be marked ready for review!

→ Outlining the design choices of one PR without context from the other didn't make much sense to me, so I wrote a doc that outlines both sets of changes together and explains their deliberate nature, amongst other valuable things!

→ The v5 tracker doesn’t mention the audio backend, but it is certainly a significant gap that needs to be addressed. After this is merged, I'll open an issue tracker for the Transformers audio backend work in vLLM so the efforts can stay organized.

Please refer to the document for the reasoning behind these changes in context with the vLLM PR! Document: v5 x vLLM Audio Backend Support Document

@vasqu @ArthurZucker

Code Agent Policy

I confirm that this is not a pure code agent PR.

Before submitting

Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.

Changed files

src/transformers/models/audioflamingo3/modeling_audioflamingo3.py (modified, +1/-0)
src/transformers/models/audioflamingo3/modular_audioflamingo3.py (modified, +1/-0)
src/transformers/models/auto/modeling_auto.py (modified, +1/-0)
src/transformers/models/glmasr/modeling_glmasr.py (modified, +1/-0)
src/transformers/models/glmasr/modular_glmasr.py (modified, +2/-0)
src/transformers/models/granite_speech/modeling_granite_speech.py (modified, +2/-0)
src/transformers/models/musicflamingo/modeling_musicflamingo.py (modified, +1/-0)
src/transformers/models/vibevoice_acoustic_tokenizer/feature_extraction_vibevoice_acoustic_tokenizer.py (modified, +1/-0)
src/transformers/models/vibevoice_asr/modeling_vibevoice_asr.py (modified, +1/-0)
src/transformers/models/vibevoice_asr/modular_vibevoice_asr.py (modified, +2/-0)
tests/models/granite_speech/test_modeling_granite_speech.py (modified, +6/-0)

PR #39330: feat[vLLM × v5]: Add audio support for the Transformers backend

Repository: vllm-project/vllm
Author: harshaljanjani
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/39330

Description (problem / solution / changelog)

What does this PR do?

→ This PR adds support for v5 Transformers audio encoder models in the vLLM Transformers backend. These changes are deliberate and are blocked by this Transformers PR which adds prerequisite compatibility to the supported models for vLLM. Once that PR is merged, this PR will be marked ready for review!

→ Outlining the design choices of one PR without context from the other didn't make much sense to me, so I wrote a doc that outlines both sets of changes together and explains their deliberate nature, amongst other valuable things!

→ The v5 tracker doesn’t mention the audio backend, but it is certainly a significant gap that needs to be addressed. After this is merged, I'll open an issue tracker for the Transformers audio backend work in vLLM so the efforts can stay organized.

Please refer to the document for the reasoning behind these changes in context with the Transformers PR! Document: v5 x vLLM Audio Backend Support Document

Performance Metrics (Env mentioned in the document)

Reference Audio Transcript: “MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL”

| Model | Output Text | Latency (E2E) | Throughput | Tokens |
| --- | --- | --- | --- | --- |
| GLM-ASR-Nano-2512 | "Mister Quilter is the apostle of the middle classes, and we are glad to welcome his gospel." | 856.3 ms | 26.9 tok/s | 23 |
| Audio-Flamingo-3-HF | "The content of the input audio is 'mister quilter is the apostle of the middle classes and we are glad to welcome his gospel'." | 1779.6 ms | 16.9 tok/s | 30 |
| VibeVoice-ASR-HF | `[{"Start":0,"End":5.0,"Speaker":0,"Content":"Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel."}]` | 2577.9 ms | 17.1 tok/s | 44 |
| Granite-Speech-3.3-2B | "Mister Quilterter is the apostle of the middle classes, and we are glad to welcome his gospel. In written format: Mister Quilterter is the apostle of the middle classes, and we are glad to welcome his gospel." | 3024.9 ms | 19.5 tok/s | 59 |
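The throughput figures above appear consistent with dividing the token count by the end-to-end latency. The PR does not state how throughput was measured, so the sketch below is only an arithmetic cross-check of the reported numbers under that assumption.

```python
# (latency in ms, generated tokens) as reported in the table
rows = {
    "GLM-ASR-Nano-2512": (856.3, 23),
    "Audio-Flamingo-3-HF": (1779.6, 30),
    "VibeVoice-ASR-HF": (2577.9, 44),
    "Granite-Speech-3.3-2B": (3024.9, 59),
}


def tokens_per_second(latency_ms, tokens):
    """Tokens divided by end-to-end latency, converted from ms to seconds."""
    return tokens / (latency_ms / 1000.0)


throughput = {
    model: round(tokens_per_second(latency_ms, tokens), 1)
    for model, (latency_ms, tokens) in rows.items()
}
```

Under this assumption the recomputed values match the reported column to one decimal place.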

Related Issues:

→ Current v5 tracker: https://github.com/vllm-project/vllm/issues/38379
→ https://github.com/vllm-project/vllm/issues/38902
→ Solved out of the box with this PR: https://github.com/vllm-project/vllm/issues/32823
→ Documented vLLM engine issue mentioned in the document: https://github.com/vllm-project/vllm/issues/17676

@vasqu (Transformers) @DarkLight1337 @hmellor (vLLM)

Code Agent Policy

I confirm that this is not a pure code agent PR.

Before submitting

Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.

PR Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command (document).
The test results, such as pasting the results comparison before and after, or e2e results (document).
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Changed files

tests/models/multimodal/processing/test_transformers_audio.py (added, +166/-0)
vllm/model_executor/models/transformers/causal.py (modified, +20/-0)
vllm/model_executor/models/transformers/multimodal.py (modified, +304/-115)

What is this issue?

This issue serves as a living tracker for the current issues preventing us from upgrading vLLM to Transformers v5.

We will use sub-issues to track individual failures and PRs should be made against these sub-issues.

The solutions to these issues may need to be applied to either:

- Transformers, in the form of:
  - Adding missing backward compatibility (usually for custom code models)
  - General bug fixes/improvements to new features of v5
- vLLM, in the form of:
  - Forward compatibility with how something is now done in v5
  - Edge case handling for issues that v4 ignored (such as config validation)

Sometimes, the issue is simply with the model checkpoint itself, for example if:

- It contains a malformed config.json that cannot be used to instantiate the newly input-validated PreTrainedConfig class
- Its custom code* uses deprecated/removed APIs

In these situations, the best solution will likely be to skip these tests in vLLM and open a PR to Transformers to contribute this model. This will be faster and more sustainable than waiting for the model vendor to fix their custom model code; sometimes they never do.

Contributing the new model should be done using the new Modular Transformers so that the implementation is easy to maintain and will remain maintained by the Transformers team.

*particularly in the parts of the model implementation that vLLM tries to directly reuse, such as config/tokenizer/multimodal processor

Sub-issue template

This is a sub-issue forming part of the work in https://github.com/vllm-project/vllm/issues/38379, please read the description of this issue before beginning to work on this one.

## Which test is failing?

```console
$ pytest 
...
expected output
```

## How to configure my environment?

It's very important that you install both vLLM and Transformers from source so that your test results reflect the current state of both libraries.

```console
# Or your fork
git clone https://github.com/huggingface/transformers.git
git clone https://github.com/vllm-project/vllm.git

cd vllm
VLLM_USE_PRECOMPILED=1 uv pip install -e .
uv pip install -e ../transformers
```
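After installing, it can help to confirm which versions the environment actually resolves. The sketch below uses only the standard library; the helper names are illustrative assumptions and are not part of the tracker's instructions.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Extract the leading integer from a version such as '5.0.0.dev0'."""
    return int(version.split(".")[0])


for name in ("vllm", "transformers"):
    print(name, installed_version(name))
```

A Transformers major version of 5 or higher suggests the source checkout is the one being imported rather than an older wheel left in the environment.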

extent analysis

Fix Plan

To address the issues preventing the upgrade to Transformers v5, we will focus on the following steps:

Update custom code models to ensure backward compatibility with v5
Implement forward compatibility in vLLM for changes in v5
Handle edge cases ignored by v4, such as config validation
Contribute new models to Transformers using Modular Transformers

Example Code Changes

For custom code models, update deprecated/removed APIs:

# Before (v4)
from transformers import AutoConfig, AutoTokenizer

# After (v5)
# PreTrainedConfig is exported from the top-level package, not transformers.models
# (older v4 releases spell it PretrainedConfig).
from transformers import AutoConfig, AutoTokenizer, PreTrainedConfig

# Update config validation
config = AutoConfig.from_pretrained("model_name")
if not isinstance(config, PreTrainedConfig):
    raise ValueError("Invalid config")

For vLLM, add forward compatibility and edge case handling:

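from transformers import AutoConfig

# Before (v4)
def load_model(model_name):
    # Load model without validation
    return AutoConfig.from_pretrained(model_name)

# After (v5)
def load_model(model_name):
    try:
        # v5 validates the config while it is being constructed
        config = AutoConfig.from_pretrained(model_name)
    except Exception as e:
        # Handle edge cases and errors, then re-raise so callers still see the failure
        print(f"Error loading model: {e}")
        raise
    return config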

Contributing New Models

Use Modular Transformers to contribute new models:

# Create a new model using Modular Transformers.
# A modular file subclasses an existing model implementation, not an Auto class.
# "MyModel" is a placeholder name, and Llama is only an example parent.
from transformers.models.llama.modeling_llama import LlamaForCausalLM

class MyModelForCausalLM(LlamaForCausalLM):
    pass

# Contribute the new model to Transformers

Verification

Verify the fixes by running tests and checking for compatibility issues:

$ pytest

Ensure that all tests pass and the model can be loaded and used without errors.
