[Usage]: Support Gemma 4 E4B, 31B, 26B-A4B, and assistant variants (MTP) on TPU v6e 1x1 with vLLM

vllm · 2026-05-20 19:24:26




Your current environment

I tried to serve google/gemma-4-E4B-it on Google Cloud TPU v6e (tpu-v6e-slice, topology 1x1, accelerator count 1) using vLLM.

Image used:

vllm/vllm-tpu:gemma4

The model starts loading on TPU, but fails during engine initialization with:

AttributeError: 'Tensor' object has no attribute '_elem'

Relevant stack trace:

Precompile input_embeddings_merger --> {'num_tokens': 16}
...
File "/workspace/tpu_inference/tpu_inference/models/vllm/vllm_model_wrapper.py", line 384, in embed_input_ids_func
  output_from_torch = torch.func.functional_call(
...
File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/gemma4_mm.py", line 933, in embed_input_ids
  self.per_layer_embeddings[: per_layer_inputs.shape[0]].copy_(
...
File "/usr/local/lib/python3.12/site-packages/torchax/ops/jaten.py", line 107, in _aten_copy
  x._elem = y._elem.astype(x._elem.dtype)
                           ^^^^^^^
AttributeError: 'Tensor' object has no attribute '_elem'
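
For readers unfamiliar with this failure mode, here is a minimal sketch of the attribute-access pattern on the last line of the trace. It does not use torchax or vLLM: `Wrapped`, `Plain`, `copy_like_trace`, and `missing_elem` are hypothetical stand-ins, and the only thing taken from the trace is that the copy reads `_elem` on both operands. Under that assumption, the error would occur whenever one operand is a plain tensor that was never wrapped.

```python
class Wrapped:
    """Hypothetical stand-in for a wrapper tensor that keeps its payload in `_elem`."""

    def __init__(self, elem):
        self._elem = elem


class Plain:
    """Hypothetical stand-in for an unwrapped tensor: it has no `_elem` attribute."""

    def __init__(self, data):
        self.data = data


def copy_like_trace(x, y):
    # Same access order as the traced line: read y._elem, then x._elem, then assign.
    src = y._elem
    dst_type = type(x._elem)
    x._elem = dst_type(src)


def missing_elem(**operands):
    """Return the names of operands that lack `_elem` (a debugging helper)."""
    return [name for name, t in operands.items() if not hasattr(t, "_elem")]


# Both operands wrapped: the copy succeeds and converts to the destination's type.
dst = Wrapped([0, 0])
copy_like_trace(dst, Wrapped((3, 4)))
print(dst._elem)

# Destination left unwrapped: the same AttributeError shape as in the trace.
try:
    copy_like_trace(Plain([0, 0]), Wrapped((3, 4)))
except AttributeError as exc:
    print(exc)

print(missing_elem(x=Plain([0, 0]), y=Wrapped((3, 4))))
```

A check like `missing_elem` placed just before the failing `copy_` call could help confirm which operand reaches the op unwrapped; which one it is in the real run is not established here.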

I also tried limiting multimodal inputs for a text-only workload:

--limit-mm-per-prompt '{"image": 0, "audio": 0}'

but the model still did not come up successfully.
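
For reproducibility, this is a minimal sketch of how the launch command with that flag can be assembled so the JSON value is quoted correctly for the shell. It assumes the server is started with the `vllm serve` entrypoint, which may differ from how the `vllm/vllm-tpu:gemma4` image is actually launched; the model name and the flag value are the ones given above. It only builds and prints the command and is not a fix for the error.

```python
import json
import shlex

# Model and multimodal limits exactly as described in this report.
MODEL = "google/gemma-4-E4B-it"
mm_limits = {"image": 0, "audio": 0}

# Assumption: the server is launched through the `vllm serve` entrypoint.
argv = [
    "vllm",
    "serve",
    MODEL,
    "--limit-mm-per-prompt",
    json.dumps(mm_limits),  # serialized once so the flag receives valid JSON
]

# shlex.join quotes the JSON argument so it survives shell parsing intact.
command = shlex.join(argv)
print(command)
```

Passing `argv` directly to `subprocess.run` would avoid shell quoting altogether; the joined string is mainly useful for pasting into a container command or a bug report.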

Could you confirm the current support status for Gemma 4 on vLLM TPU, specifically:

google/gemma-4-E4B-it
google/gemma-4-31B-it
google/gemma-4-26B-A4B-it
google/gemma-4-E4B-it-assistant
google/gemma-4-31B-it-assistant
google/gemma-4-26B-A4B-it-assistant

We are also interested in using the assistant variants for speculative decoding / MTP-style serving. Is this supported on TPU with vLLM today? If so, what TPU topology, image tag, and vLLM arguments are recommended?
