vllm - ✅(Solved) Fix [Bug]: DeepSeek-OCR v1 crashes with TensorSchema mismatch when images_crop is empty (small images ≤640px) [1 pull requests, 1 participants]

GitHub stats
vllm-project/vllm#36669 (fetched 2026-04-08 00:35:28)
Comments: 0 · Participants: 1 · Timeline events: 6 · Reactions: 0
Timeline (top): project_v2_item_status_changed ×2, added_to_project_v2 ×1, closed ×1, cross-referenced ×1

Error Message

ValueError: images_crop dim[2] expected 1024, got 640. Expected shape: ('bnp', 3, 1024, 1024), but got torch.Size([0, 3, 640, 640])

Root Cause

Root cause: In _parse_and_validate_image_input (deepseek_ocr.py#L455), when images_crop.numel() == 0 (no crops needed for small images), the code sets image_size = base_size = 1024. But the empty tensor's shape is still (0, 3, 640, 640) — the image_size dimension carries 640 from the Gundam processor preset. TensorSchema.validate() then sees the mismatch: expected 1024, got 640.
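The mismatch can be reproduced without a GPU using a simplified model of the check. This is a sketch under stated assumptions: `check_dims` and `buggy_image_size` are hypothetical helpers, not vLLM's `TensorSchema` API. Shapes are plain tuples standing in for `torch.Size`, and `math.prod(shape)` plays the role of `numel()`.

```python
import math


def check_dims(name, shape, expected, bindings):
    """Hypothetical, simplified shape check (not vLLM's TensorSchema).

    `expected` holds ints, symbols resolved through `bindings`, or free
    symbols such as the batch dim "bnp", which accept any size.
    """
    resolved = tuple(bindings.get(d, d) if isinstance(d, str) else d for d in expected)
    if len(resolved) != len(shape):
        raise ValueError(f"{name} expected rank {len(resolved)}, got {len(shape)}")
    for i, (want, got) in enumerate(zip(resolved, shape)):
        # Only concrete (int) sizes are compared; free symbols are skipped
        if isinstance(want, int) and want != got:
            raise ValueError(
                f"{name} dim[{i}] expected {want}, got {got}. "
                f"Expected shape: {resolved}, but got {tuple(shape)}"
            )


def buggy_image_size(crop_shape, base_size):
    # Pre-fix guard: an empty crop batch (numel() == 0) falls back to base_size
    if crop_shape is not None and math.prod(crop_shape) > 0:
        return crop_shape[-1]
    return base_size


# Small image under a Gundam-style preset: zero crops, trailing dims still 640
empty_crops = (0, 3, 640, 640)
image_size = buggy_image_size(empty_crops, base_size=1024)  # falls back to 1024
try:
    check_dims("images_crop", empty_crops,
               ("bnp", 3, "image_size", "image_size"),
               {"image_size": image_size})
except ValueError as err:
    print(err)
```

Running it prints `images_crop dim[2] expected 1024, got 640. Expected shape: ('bnp', 3, 1024, 1024), but got (0, 3, 640, 640)`, which mirrors the reported message apart from the `torch.Size` formatting.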

Fix Action

Fixed

PR fix notes

PR #36670: [Bugfix][Model] Fix DeepSeek-OCR TensorSchema crash on empty images_crop

Description (problem / solution / changelog)

Summary

Fixes a crash in DeepseekOCRForCausalLM (deepseek-ai/DeepSeek-OCR) when processing images that do not require cropping (≤ 640×640 pixels).

Fixes #36669

Root Cause

In _parse_and_validate_image_input, when images_crop is an empty tensor (no crops needed for small images), the code checked images_crop.numel() > 0 and fell back to image_size = base_size = 1024. But the empty tensor's shape is (0, 3, 640, 640) — dimension 640 comes from the Gundam processor preset. TensorSchema validation then fails: expected 1024, got 640.

This kills the V1 engine on the first small image, making all subsequent requests fail with EngineDeadError.

Fix

Remove the numel() > 0 guard. tensor.shape[-1] works correctly on zero-element tensors and returns the actual dimension size (640).

-        if images_crop is not None and images_crop.numel() > 0:
+        if images_crop is not None:
             image_size = images_crop.shape[-1]
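A minimal sketch of the patched selection, again with tuples standing in for tensor shapes (`fixed_image_size` is a hypothetical name, not a vLLM function). Real PyTorch behaves the same way: `torch.empty(0, 3, 640, 640).shape[-1]` is 640 even though `numel()` is 0.

```python
def fixed_image_size(crop_shape, base_size):
    # Post-fix logic: trust the crop tensor's trailing dim even when it holds
    # zero crops; fall back to base_size only when there is no crop tensor.
    if crop_shape is not None:
        return crop_shape[-1]
    return base_size


cases = {
    "empty crops, Gundam preset": (0, 3, 640, 640),
    "populated crops, Gundam preset": (4, 3, 640, 640),
    "no crop tensor": None,
}
for label, shape in cases.items():
    size = fixed_image_size(shape, base_size=1024)
    # The bound size now always agrees with the tensor's trailing dims
    consistent = shape is None or shape[-2:] == (size, size)
    print(f"{label}: image_size={size}, consistent={consistent}")
```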

Testing

  • Added 4 regression tests in tests/multimodal/test_deepseek_ocr_empty_crop_unit.py:
    • Empty images_crop with Gundam preset (the crashing case)
    • Populated images_crop with Gundam preset (existing happy path)
    • Empty images_crop with Base preset (image_size == base_size)
    • Deliberately mismatched binding still raises ValueError
  • Verified with real PDFs containing mixed small/large page images
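The four regression cases can be mirrored as plain assertions against a simplified model. This is not the PR's actual test file, whose contents are not reproduced here. The helper names are hypothetical, and the preset sizes are assumptions for illustration: Gundam uses 640-pixel crops with `base_size` 1024 (matching the error message), Base uses 1024 for both, and an empty crop tensor is assumed to carry the preset's `image_size` as its trailing dims.

```python
# Assumed preset sizes (illustrative only)
GUNDAM = {"base_size": 1024, "image_size": 640}
BASE = {"base_size": 1024, "image_size": 1024}


def image_size_for(crop_shape, base_size):
    # Fixed logic: read the trailing dim even when there are zero crops
    return crop_shape[-1] if crop_shape is not None else base_size


def validate(crop_shape, image_size):
    # Hypothetical stand-in for the schema check on images_crop
    if tuple(crop_shape[1:]) != (3, image_size, image_size):
        raise ValueError(f"expected (bnp, 3, {image_size}, {image_size}), got {crop_shape}")


def crops(n, preset):
    # Shape of a crop batch with n crops for the given preset
    return (n, 3, preset["image_size"], preset["image_size"])


def test_empty_crop_gundam():
    shape = crops(0, GUNDAM)
    validate(shape, image_size_for(shape, GUNDAM["base_size"]))


def test_populated_crop_gundam():
    shape = crops(4, GUNDAM)
    validate(shape, image_size_for(shape, GUNDAM["base_size"]))


def test_empty_crop_base():
    shape = crops(0, BASE)
    assert image_size_for(shape, BASE["base_size"]) == BASE["base_size"]
    validate(shape, BASE["base_size"])


def test_mismatched_binding_still_raises():
    shape = crops(0, GUNDAM)
    try:
        validate(shape, 1024)  # deliberately wrong binding
    except ValueError:
        return
    raise AssertionError("mismatched binding was accepted")


for test in (test_empty_crop_gundam, test_populated_crop_gundam,
             test_empty_crop_base, test_mismatched_binding_still_raises):
    test()
print("all 4 checks passed")
```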

Related

  • deepseek_ocr2.py avoids this by not binding image_size in resolve_bindings
  • TODO(Isotr0py) at processors/deepseek_ocr.py:24 about exposing presets via mm_kwargs (broader fix, separate PR)

cc @Isotr0py @DarkLight1337

Changed files

  • tests/models/multimodal/processing/test_deepseek_ocr.py (added, +134/-0)
  • vllm/model_executor/models/deepseek_ocr.py (modified, +1/-4)


Your current environment

  • vLLM version: 0.17.0 (also current main)
  • GPU: NVIDIA A100
  • Python: 3.12
  • CUDA: 13.1

Model

deepseek-ai/DeepSeek-OCR (DeepseekOCRForCausalLM)

🐛 Describe the bug

DeepseekOCRForCausalLM crashes with a fatal EngineDeadError when processing images that do not require cropping (images ≤ 640×640 pixels). The V1 engine dies on the first such request and all subsequent requests fail.

Error:

ValueError: images_crop dim[2] expected 1024, got 640.
Expected shape: ('bnp', 3, 1024, 1024), but got torch.Size([0, 3, 640, 640])

Root cause: In _parse_and_validate_image_input (deepseek_ocr.py#L455), when images_crop.numel() == 0 (no crops needed for small images), the code sets image_size = base_size = 1024. But the empty tensor's shape is still (0, 3, 640, 640) — the image_size dimension carries 640 from the Gundam processor preset. TensorSchema.validate() then sees the mismatch: expected 1024, got 640.

The fix is trivial — remove the numel() > 0 guard since shape[-1] is valid on zero-element tensors:

# Before (buggy):
if images_crop is not None and images_crop.numel() > 0:
    image_size = images_crop.shape[-1]
else:
    image_size = base_size

# After (fixed):
if images_crop is not None:
    image_size = images_crop.shape[-1]
else:
    image_size = base_size

Note: deepseek_ocr2.py avoids this entirely by not binding image_size in resolve_bindings (only binds base_size).
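To illustrate why leaving `image_size` unbound sidesteps the problem, here is a simplified model in which a symbol absent from the bindings is inferred from the first dimension it labels and then only checked for consistency. `check_dims_infer` is hypothetical, and whether deepseek_ocr2.py's schema validation behaves exactly this way is an assumption.

```python
def check_dims_infer(name, shape, expected, bindings):
    """Simplified model (not vLLM's TensorSchema): symbols listed in
    `bindings` must match exactly; any other symbol is inferred from the
    first dim it labels and must agree wherever it repeats."""
    if len(expected) != len(shape):
        raise ValueError(f"{name} expected rank {len(expected)}, got {len(shape)}")
    sizes = dict(bindings)
    for i, (want, got) in enumerate(zip(expected, shape)):
        # setdefault records an unbound symbol's size the first time it is seen
        size = sizes.setdefault(want, got) if isinstance(want, str) else want
        if size != got:
            raise ValueError(f"{name} dim[{i}] expected {size}, got {got}")
    return sizes


schema = ("bnp", 3, "image_size", "image_size")
empty_crops = (0, 3, 640, 640)

# Binding image_size to base_size (the pre-fix fallback) rejects the empty tensor
try:
    check_dims_infer("images_crop", empty_crops, schema, {"image_size": 1024})
except ValueError as err:
    print("bound:", err)

# Leaving image_size unbound lets the tensor's own trailing dims supply it
print("unbound:", check_dims_infer("images_crop", empty_crops, schema, {}))
```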

Why only some images crash

  • Large images (> 640×640): crops are created → numel() > 0 → image_size = 640 → schema validates OK ✓
  • Small images (≤ 640×640): no crops → numel() == 0 → image_size = 1024 → mismatch with (0, 3, 640, 640) → crash

Once the V1 engine encounters a single crashing request, EngineCore raises EngineDeadError and all subsequent requests fail.

How to reproduce

from vllm import LLM, SamplingParams
from PIL import Image

small_image = Image.new("RGB", (400, 300), color="white")

llm = LLM(
    model="deepseek-ai/DeepSeek-OCR",
    hf_overrides={"architectures": ["DeepseekOCRForCausalLM"]},
    dtype="bfloat16",
    max_model_len=4096,
)

# This will crash with EngineDeadError
output = llm.generate(
    [{"prompt": "<image>\nDescribe this image.",
      "multi_modal_data": {"image": small_image}}],
    SamplingParams(temperature=0.0, max_tokens=100),
)

Before submitting a new issue...

  • I have searched for similar issues
  • I have verified the bug on the latest vLLM main branch

extent analysis

Fix Plan

To fix the EngineDeadError crash when processing small images, update the _parse_and_validate_image_input method in deepseek_ocr.py as follows:

  • Remove the numel() > 0 guard from the images_crop check.
  • Read the image size from images_crop.shape[-1], which is valid even for zero-element tensors.
# Updated code
if images_crop is not None:
    image_size = images_crop.shape[-1]
else:
    image_size = base_size

Verification

To verify the fix, run the provided reproduction code with the updated deepseek_ocr.py file:

from vllm import LLM, SamplingParams
from PIL import Image

small_image = Image.new("RGB", (400, 300), color="white")

llm = LLM(
    model="deepseek-ai/DeepSeek-OCR",
    hf_overrides={"architectures": ["DeepseekOCRForCausalLM"]},
    dtype="bfloat16",
    max_model_len=4096,
)

output = llm.generate(
    [{"prompt": "<image>\nDescribe this image.",
      "multi_modal_data": {"image": small_image}}],
    SamplingParams(temperature=0.0, max_tokens=100),
)

If the fix is successful, the script completes without an EngineDeadError and llm.generate returns a RequestOutput for the small image.

Extra Tips

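  • Make sure the deepseek_ocr.py that vLLM actually imports contains the corrected code, or use a vLLM build that includes PR #36670.
  • EngineDeadError leaves the engine unusable, so restart the vLLM process (or server) after a crash; subsequent requests will not recover on their own.
  • Consider adding error handling or logging around image-input validation to make similar shape mismatches easier to diagnose.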
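One way to act on the logging tip is to record the crop shape next to the chosen size before validation, so a mismatch appears in the logs with context. This is a hedged sketch using the standard `logging` module; `log_crop_binding` and the logger name are hypothetical and not part of vLLM.

```python
import logging

logger = logging.getLogger("deepseek_ocr_debug")  # hypothetical logger name


def log_crop_binding(crop_shape, image_size):
    """Log the crop tensor shape next to the bound image_size and return
    True when the trailing dims agree with the binding."""
    trailing = None if crop_shape is None else tuple(crop_shape[-2:])
    ok = trailing is None or trailing == (image_size, image_size)
    # Consistent bindings are logged quietly; mismatches are raised to WARNING
    level = logging.DEBUG if ok else logging.WARNING
    logger.log(level, "images_crop shape=%s image_size=%d consistent=%s",
               crop_shape, image_size, ok)
    return ok


logging.basicConfig(level=logging.DEBUG)
log_crop_binding((0, 3, 640, 640), 1024)  # WARNING: the pre-fix mismatch
log_crop_binding((0, 3, 640, 640), 640)   # DEBUG: consistent after the fix
```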
