vllm - 💡(How to fix) Fix [Usage]: Qwen3-VL inference on video complains of lack of metadata [3 comments, 3 participants]

GitHub stats
vllm-project/vllm#38811 · Fetched 2026-04-08 02:34:44
Comments: 3 · Participants: 3 · Timeline: 6 · Reactions: 0
Timeline (top): commented ×3, labeled ×1, mentioned ×1, subscribed ×1

Error Message

  File "/home/jarvis/anaconda3/envs/Qwen3-VL/lib/python3.12/site-packages/vllm/multimodal/parse.py", line 511, in _parse_video_data
    raise ValueError(
ValueError: Video metadata is required but not found in mm input. Please check your video input in `multi_modal_data`

Your current environment

The output of `python collect_env.py`

==============================
        System Info
==============================
OS                           : Ubuntu 24.04.4 LTS (x86_64)
GCC version                  : (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version                : Could not collect
CMake version                : version 3.27.6
Libc version                 : glibc-2.39

==============================
       PyTorch Info
==============================
PyTorch version              : 2.9.0+cu128
Is debug build               : False
CUDA used to build PyTorch   : 12.8
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.13 | packaged by Anaconda, Inc. | (main, Mar 19 2026, 20:20:58) [GCC 14.3.0] (64-bit runtime)
Python platform              : Linux-6.8.0-107-generic-x86_64-with-glibc2.39

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 12.1.66
CUDA_MODULE_LOADING set to   : 
GPU models and configuration : GPU 0: NVIDIA GeForce RTX 3080 Laptop GPU
Nvidia driver version        : 580.126.09
cuDNN version                : Probably one of the following:
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.8.2.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.2.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.2.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.2.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.2.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.2.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.2.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn.so.8.7.0
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.7.0
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.7.0
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.7.0
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.7.0
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.7.0
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.7.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn.so.8.9.5
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.5
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.5
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.5
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.5
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.5
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.5
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn.so.8
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn.so.9
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_adv.so.9
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_cnn.so.9
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_engines_precompiled.so.9
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_engines_runtime_compiled.so.9
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_graph.so.9
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_heuristic.so.9
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_ops.so.9
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           39 bits physical, 48 bits virtual
Byte Order:                              Little Endian
Vendor ID:                               GenuineIntel

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.5.3
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.8.4.1
[pip3] nvidia-cuda-cupti-cu12==12.8.90
[pip3] nvidia-cuda-nvrtc-cu12==12.8.93
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.21.0
[pip3] nvidia-cufft-cu12==11.3.3.83
[pip3] nvidia-cufile-cu12==1.13.1.3
[pip3] nvidia-curand-cu12==10.3.9.90
[pip3] nvidia-cusolver-cu12==11.7.3.90
[pip3] nvidia-cusparse-cu12==12.5.8.93
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.4.2
[pip3] nvidia-cutlass-dsl-libs-base==4.4.2
[pip3] nvidia-ml-py==13.595.45
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.8.93
[pip3] nvidia-nvshmem-cu12==3.3.20
[pip3] nvidia-nvtx-cu12==12.8.90
[pip3] pyzmq==27.1.0
[pip3] torch==2.9.0
[pip3] torchaudio==2.9.0
[pip3] torchvision==0.24.0
[pip3] transformers==4.57.6
[pip3] triton==3.5.0
[conda] flashinfer-python                    0.5.3                     pypi_0           pypi
[conda] numpy                                2.2.6                     pypi_0           pypi
[conda] nvidia-cublas-cu12                   12.8.4.1                  pypi_0           pypi
[conda] nvidia-cuda-cupti-cu12               12.8.90                   pypi_0           pypi
[conda] nvidia-cuda-nvrtc-cu12               12.8.93                   pypi_0           pypi
[conda] nvidia-cuda-runtime-cu12             12.8.90                   pypi_0           pypi
[conda] nvidia-cudnn-cu12                    9.10.2.21                 pypi_0           pypi
[conda] nvidia-cudnn-frontend                1.21.0                    pypi_0           pypi
[conda] nvidia-cufft-cu12                    11.3.3.83                 pypi_0           pypi
[conda] nvidia-cufile-cu12                   1.13.1.3                  pypi_0           pypi
[conda] nvidia-curand-cu12                   10.3.9.90                 pypi_0           pypi
[conda] nvidia-cusolver-cu12                 11.7.3.90                 pypi_0           pypi
[conda] nvidia-cusparse-cu12                 12.5.8.93                 pypi_0           pypi
[conda] nvidia-cusparselt-cu12               0.7.1                     pypi_0           pypi
[conda] nvidia-cutlass-dsl                   4.4.2                     pypi_0           pypi
[conda] nvidia-cutlass-dsl-libs-base         4.4.2                     pypi_0           pypi
[conda] nvidia-ml-py                         13.595.45                 pypi_0           pypi
[conda] nvidia-nccl-cu12                     2.27.5                    pypi_0           pypi
[conda] nvidia-nvjitlink-cu12                12.8.93                   pypi_0           pypi
[conda] nvidia-nvshmem-cu12                  3.3.20                    pypi_0           pypi
[conda] nvidia-nvtx-cu12                     12.8.90                   pypi_0           pypi
[conda] pyzmq                                27.1.0                    pypi_0           pypi
[conda] torch                                2.9.0                     pypi_0           pypi
[conda] torchaudio                           2.9.0                     pypi_0           pypi
[conda] torchvision                          0.24.0                    pypi_0           pypi
[conda] transformers                         4.57.6                    pypi_0           pypi
[conda] triton                               3.5.0                     pypi_0           pypi

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.11.2.dev453+g653591d5e (git sha: 653591d5e)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  	GPU0	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	0-15	0		N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

==============================
     Environment Variables
==============================
LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:/usr/local/cuda-12.1/extras/CUPTI/lib64:/usr/local/cuda/lib64:
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1

How would you like to use vllm

I want to run inference of Qwen3-VL on a demo video.

I am running the following code:


from transformers import AutoProcessor
from vllm import LLM, SamplingParams
from qwen_vl_utils import process_vision_info

model_path = "Qwen/Qwen3-VL-2B-Instruct-FP8" 
video_path = "https://content.pexels.com/videos/free-videos.mp4"

llm = LLM(
    model=model_path,
    gpu_memory_utilization=0.9, 
    max_model_len=40000, # to fit in 16Gb VRAM
    enforce_eager=True,
    limit_mm_per_prompt={"video": 1},
)

sampling_params = SamplingParams(max_tokens=1024)

video_messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant.",
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Please describe this video. Indicate how many different scenes it has."},
            {
                "type": "video",
                "video": video_path,
                "total_pixels": 256 * 28 * 28, # 20480 * 28 * 28,
                "min_pixels": 16 * 28 * 28,
            },
        ]
    },
]

messages = video_messages
processor = AutoProcessor.from_pretrained(model_path)
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

image_inputs, video_inputs = process_vision_info(messages)
mm_data = {}
if video_inputs is not None:
    mm_data["video"] = video_inputs

llm_inputs = {
    "prompt": prompt,
    "multi_modal_data": mm_data,
}

outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
for o in outputs:
    generated_text = o.outputs[0].text
    print(generated_text)

I am getting the following error:


  File "/home/jarvis/anaconda3/envs/Qwen3-VL/lib/python3.12/site-packages/vllm/multimodal/parse.py", line 511, in _parse_video_data
    raise ValueError(
ValueError: Video metadata is required but not found in mm input. Please check your video input in `multi_modal_data`

This error seems to occur in both the latest release and the nightly build.

vllm/model_executor/models/qwen3_vl.py expects metadata when it unpacks each video item (line 970):

video_array, metadata = item

But MultiModalDataParser._get_video_with_metadata(video) receives a bare numpy array, so there is no metadata to unpack.

I'm not sure to what extent this is an incompatibility between qwen-vl-utils and vLLM, or purely a vLLM issue.

extent analysis

TL;DR

vLLM raises this error because each video item in multi_modal_data reaches the parser as a bare numpy array, while the Qwen3-VL model code expects a (video_array, metadata) pair.

Guidance

  1. Check what process_vision_info returns: confirm whether each entry in video_inputs is a bare array or an array paired with its metadata.
  2. Verify the video input format: vLLM's Qwen3-VL code unpacks each video item as video_array, metadata = item, so a bare array fails the check in _parse_video_data.
  3. Update multi_modal_data: pass video items that include their metadata in the dictionary given to llm.generate.
  4. Consult the qwen-vl-utils documentation for the installed version to see whether process_vision_info can return video metadata and how to request it.
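
The first check in the list above can be done with a few lines of plain Python. This is only a sketch: describe_video_inputs is a hypothetical helper, not part of vLLM or qwen-vl-utils, and it assumes that a video item carrying metadata is a two-element tuple, as the unpacking in the issue suggests.

```python
def describe_video_inputs(video_inputs):
    """Summarize each video item: bare frames, or frames paired with metadata.

    Hypothetical diagnostic helper; it only inspects Python types.
    """
    lines = []
    for index, item in enumerate(video_inputs or []):
        if isinstance(item, tuple) and len(item) == 2:
            frames, metadata = item
            lines.append(
                f"video[{index}]: {type(frames).__name__} "
                f"with {type(metadata).__name__} metadata"
            )
        else:
            lines.append(
                f"video[{index}]: bare {type(item).__name__}, no metadata"
            )
    return lines


# Placeholder data: a nested list stands in for the decoded frame array.
frames = [[0, 0, 0], [1, 1, 1]]
print(describe_video_inputs([frames]))
print(describe_video_inputs([(frames, {"fps": 2.0})]))
```

Calling it on the video_inputs returned by process_vision_info, just before building mm_data, shows which of the two shapes the script is actually passing to vLLM.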

Example

The issue concerns the interaction between qwen-vl-utils and vLLM, and the exact solution depends on the implementation details of the installed versions of both libraries.

Notes

The error is raised in _parse_video_data (vllm/multimodal/parse.py) when a video item arrives without metadata. In the reported script, the video inputs returned by process_vision_info(messages) are placed in multi_modal_data unchanged, and according to the report the parser then receives a numpy array with no metadata attached.

Recommendation

As a workaround, make sure each video entry in multi_modal_data carries its metadata. This may involve changing how process_vision_info is called, or attaching the required metadata to the video input before it is passed to llm.generate.
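
One possible shape of that workaround is sketched here. It assumes a qwen-vl-utils release whose process_vision_info accepts the image_patch_size, return_video_kwargs and return_video_metadata keyword arguments (check the installed version before relying on this), and that the returned video kwargs can be forwarded under the mm_processor_kwargs key of the request. The helper name prepare_inputs_for_vllm and the vision_info_fn parameter are illustrative and not part of either library.

```python
def prepare_inputs_for_vllm(messages, processor, vision_info_fn=None):
    """Build one vLLM request dict whose video items carry their metadata."""
    if vision_info_fn is None:
        # Imported lazily so the sketch can be exercised without the package.
        from qwen_vl_utils import process_vision_info as vision_info_fn

    prompt = processor.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )

    # Assumption: the installed qwen-vl-utils supports these keyword arguments
    # and returns three values when return_video_kwargs=True.
    image_inputs, video_inputs, video_kwargs = vision_info_fn(
        messages,
        image_patch_size=processor.image_processor.patch_size,
        return_video_kwargs=True,
        return_video_metadata=True,
    )

    mm_data = {}
    if image_inputs is not None:
        mm_data["image"] = image_inputs
    if video_inputs is not None:
        # Expected shape under the assumption above: (frames, metadata) pairs.
        mm_data["video"] = video_inputs

    return {
        "prompt": prompt,
        "multi_modal_data": mm_data,
        "mm_processor_kwargs": video_kwargs,
    }
```

In the reported script, llm_inputs = prepare_inputs_for_vllm(messages, processor) would replace the direct process_vision_info(messages) call and the hand-built llm_inputs dictionary; the llm.generate call stays as it is.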
