vllm - ✅(Solved) Fix [Transformers v5] InternVL2 [2 pull requests, 4 comments, 4 participants]

GitHub stats
vllm-project/vllm#38425 · Fetched 2026-04-08 01:41:29
Comments: 4
Participants: 4
Timeline: 16
Reactions: 0
Timeline (top): commented ×4, mentioned ×3, subscribed ×3, cross-referenced ×2

Error Message

$ pytest models/multimodal/generation/test_common.py::test_single_image_models[intern_vl-test_case25]
...
RuntimeError: Tensor.item() cannot be called on meta tensors

PR fix notes

# PR #45097: Add old InternVL2-1B/2B support to the InternVL conversion script #45092

Description (problem / solution / changelog)

What does this PR do?

This PR extends the InternVL conversion script to support the old OpenGVLab/InternVL2-1B and OpenGVLab/InternVL2-2B checkpoints. These checkpoints currently rely on remote code and are problematic for downstream users on Transformers v5. Instead of instantiating the original remote-code models, the converter now reads the original config and weights directly and emits HF-native InternVLForConditionalGeneration checkpoints.

Fixes https://github.com/huggingface/transformers/issues/45092 (see also vllm-project/vllm#38425)

Before this change, on the vLLM main branch, the following run is broken:

pytest tests/models/multimodal/generation/test_common.py \
  -k 'intern_vl2-hf-local and test_multi_image_models' -vv

With this branch, the same selection passes:

VLLM_TEST_INTERNVL2_HF_MODEL=/tmp/InternVL2-1B-hf /root/venv/bin/python -m pytest \
  tests/models/multimodal/generation/test_common.py \
  -k 'intern_vl2-hf-local and test_multi_image_models' -vv

Validation

Ran the following after activating the virtual environment (source ~/venv/bin/activate):

python src/transformers/models/internvl/convert_internvl_weights_to_hf.py \
  --input_dir OpenGVLab/InternVL2-1B \
  --output_dir /tmp/InternVL2-1B-hf

python src/transformers/models/internvl/convert_internvl_weights_to_hf.py \
  --input_dir OpenGVLab/InternVL2-2B \
  --output_dir /tmp/InternVL2-2B-hf


## Changed files

- `src/transformers/conversion_mapping.py` (modified, +37/-0)
- `src/transformers/core_model_loading.py` (modified, +92/-0)
- `src/transformers/models/auto/configuration_auto.py` (modified, +8/-0)
- `src/transformers/models/auto/image_processing_auto.py` (modified, +81/-13)
- `src/transformers/models/auto/modeling_auto.py` (modified, +2/-0)
- `src/transformers/models/auto/processing_auto.py` (modified, +1/-0)
- `src/transformers/models/auto/video_processing_auto.py` (modified, +70/-6)
- `src/transformers/models/internvl/configuration_internvl.py` (modified, +148/-0)
- `src/transformers/models/internvl/processing_internvl.py` (modified, +27/-11)
- `tests/models/auto/test_configuration_auto.py` (modified, +83/-0)
- `tests/models/auto/test_image_processing_auto.py` (modified, +45/-0)
- `tests/models/auto/test_processor_auto.py` (modified, +88/-0)
- `tests/models/auto/test_video_processing_auto.py` (modified, +20/-0)
- `tests/utils/test_core_model_loading.py` (modified, +69/-0)


---

# PR #39974: [Tests][Transformers v5] Skip InternVL2 HF-runner tests incompatible with meta device init

- Repository: vllm-project/vllm
- Author: Spectual
- State: open | merged: False
- Link: https://github.com/vllm-project/vllm/pull/39974

## Description (problem / solution / changelog)

## Summary

- Adds `pytest.mark.skipif` marks to the `intern_vl` and `intern_vl-diff-patches` `VLMTestInfo` entries in `tests/models/multimodal/generation/test_common.py`
- These tests are skipped when `transformers >= 5.0.0` is installed, since `OpenGVLab/InternVL2-1B` and `OpenGVLab/InternVL2-2B` custom code calls `Tensor.item()` during model initialization
- This is incompatible with Transformers v5's meta device initialization pattern, causing `RuntimeError: Tensor.item() cannot be called on meta tensors`
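
The version gate described above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the helper names `major_version` and `transformers_is_v5_or_newer` are hypothetical, and the `pytest.mark.skipif` usage is shown only as a comment so that the snippet runs without pytest installed.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version such as '5.0.0.dev0'."""
    return int(version_string.split(".", 1)[0])


def transformers_is_v5_or_newer() -> bool:
    """True when the installed transformers has major version >= 5."""
    try:
        return major_version(version("transformers")) >= 5
    except PackageNotFoundError:
        # transformers is not installed, so there is nothing to gate on.
        return False


# Hypothetical usage inside a test module (requires pytest):
#   skip_on_v5 = pytest.mark.skipif(
#       transformers_is_v5_or_newer(),
#       reason="InternVL2 remote code calls Tensor.item() during init",
#   )
```

Evaluating the condition once at import time keeps the skip decision in one place, so it can be removed in a single edit once the model works with Transformers v5.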

Fixes #38425

## Why this is not a duplicate

No existing open PRs address #38425. Checked with:

gh pr list --repo vllm-project/vllm --state open --search "38425 in:body"


## Root cause

Transformers v5 initializes models on the meta device first, then loads weights. The InternVL2 custom code (HuggingFace Hub, `trust_remote_code=True`) calls `Tensor.item()` during `__init__`, which is not permitted on meta tensors. This only affects the HF reference runner used for comparison — vLLM's own `InternVLChatModel` is unaffected.
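
The failure mode can be reproduced in isolation with a few lines of PyTorch. This is a sketch under an assumption: the `torch.linspace` call merely stands in for whatever tensor the custom code creates during `__init__`, which the description above does not show.

```python
import torch

# Tensors created inside a meta-device context have a shape and dtype
# but no underlying data.
with torch.device("meta"):
    schedule = torch.linspace(0.0, 0.1, steps=4)

print(schedule.is_meta)

# Reading a Python scalar out of a meta tensor is not possible.
try:
    schedule[0].item()
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
```

Because the tensor holds no data, there is no value for `item()` to return, which is why the error appears as soon as construction code tries to read one.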

## Long-term fix

The proper solution is to upstream InternVL2 to Transformers using Modular Transformers (reusing the Qwen2 text backbone), as described in the issue. This PR provides the necessary short-term workaround.

## Test plan

- [x] `ruff check tests/models/multimodal/generation/test_common.py` — passes
- [ ] `pytest tests/models/multimodal/generation/test_common.py -k "intern_vl-"` — requires GPU; skipped locally

**Note:** These tests only need to be verified as *skipped* with `transformers>=5.0.0` and *passing* with `transformers<5.0.0`.

## AI assistance

This PR was developed with Claude Code assistance. All changed lines were reviewed by the submitter.

## Changed files

- `tests/models/multimodal/generation/test_common.py` (modified, +18/-0)


This is a sub-issue of https://github.com/vllm-project/vllm/issues/38379. Please read the description of that parent issue before beginning to work on this one.

Which test is failing?

Transformers v5 creates the model on the meta device first, then loads the weights, similarly to what vLLM does. The issue here is that the custom model code in the checkpoint tries to use real tensors as part of model structure construction.

Since the problem lies in the HF reference generation, it cannot be fixed in vLLM, other than by skipping the tests until the model works with Transformers v5. The proper solution is to upstream this architecture to Transformers. That shouldn't be too hard with Modular Transformers, because the text backbone is Qwen2 and can be reused.

$ pytest models/multimodal/generation/test_common.py::test_single_image_models[intern_vl-test_case25]
...
RuntimeError: Tensor.item() cannot be called on meta tensors

How to configure my environment?

It's very important that you install both vLLM and Transformers from source so that your test results reflect the current state of both libraries.

# Or your fork
git clone https://github.com/huggingface/transformers.git
git clone https://github.com/vllm-project/vllm.git

cd vllm
VLLM_USE_PRECOMPILED=1 uv pip install -e .
uv pip install -e ../transformers
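
After installing, a quick check of what is actually importable in the environment can save a confusing test run. The snippet below is a small sketch using only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print what the current environment resolves for both libraries.
for name in ("vllm", "transformers"):
    print(f"{name}: {installed_version(name)}")
```

A source install may report a development-style version string, which is one hint that the editable checkout is the one being picked up.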

extent analysis

Fix Plan

To fix the issue with Tensor.item() not being callable on meta tensors, the custom model code shipped with the checkpoint must stop reading tensor values while the model is being constructed. A meta tensor carries only shape and dtype and holds no data, so there is nothing for item() to return.

Here are the steps:

  • Find each place where the model code calls item() on a tensor it creates during __init__.
  • Create that tensor explicitly on a real device, or compute the value without a tensor. Moving an existing meta tensor with to('cpu') does not help, because there is no data to copy.

Example code:

import torch

# Assumption: the tensor is built by a factory call such as torch.linspace;
# the arguments here are illustrative and not taken from the InternVL2 code.
# An explicit device='cpu' yields a real tensor even inside a
# torch.device('meta') context, so item() has data to read.
schedule = torch.linspace(0.0, 0.1, steps=4, device='cpu')
values = [x.item() for x in schedule]

Calling torch.Tensor.detach() before item() is not an alternative:

value = tensor.detach().item()

This still raises the same RuntimeError on a meta tensor, because detach() only removes the tensor from the computation graph and does not give it data.
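
Another option is to avoid tensors entirely for values that are only needed as Python numbers at construction time. The sketch below assumes the values form an evenly spaced schedule, which is an assumption since the offending code is not shown here; `linear_schedule` is a hypothetical helper, not part of PyTorch or Transformers.

```python
def linear_schedule(start: float, end: float, steps: int) -> list:
    """Evenly spaced floats from start to end inclusive, computed without tensors."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if steps == 1:
        return [start]
    step = (end - start) / (steps - 1)
    return [start + i * step for i in range(steps)]


# Plain floats are unaffected by the device on which the model is being built.
rates = linear_schedule(0.0, 0.3, 4)
print(rates)
```

Since no tensor is created, this works the same way whether the model is built on the meta device, on CPU, or on a GPU.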

Verification

To verify that the fix worked, run the test again:

pytest models/multimodal/generation/test_common.py::test_single_image_models[intern_vl-test_case25]

If the test passes without raising a RuntimeError, the fix was successful.

Extra Tips

  • Make sure to test the model with different input types and devices to ensure that the fix works in all scenarios.
  • Consider adding a check for meta tensors in other parts of the code where item() is called to prevent similar issues in the future.
