vllm - ✅(Solved) Fix [New Model]: Translategemma Support [2 pull requests, 3 comments, 3 participants]

GitHub stats
vllm-project/vllm#41540 · Fetched 2026-05-04 04:58:58
Comments: 3 · Participants: 3 · Timeline events: 10 · Reactions: 0
Timeline (top): commented ×3, mentioned ×3, subscribed ×3, cross-referenced ×1

PR fix notes

PR #41599: [Model] Support TranslateGemma-12b-it

Description (problem / solution / changelog)

Purpose

Closes https://github.com/vllm-project/vllm/issues/41540. Related to https://github.com/vllm-project/vllm/issues/32446

This PR is copied over from https://github.com/vllm-project/vllm/pull/32819, with comments addressed.

  • Allow per-layer nested RoPE configs to be loaded by transformers v5.
  • Allow extra fields to be passed in via chat entrypoints. Use ChatCompletionContentPartParam Union type to register known fields.
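The second bullet can be illustrated with a minimal sketch of passing extra fields through a text content part. This is not the code in vllm/entrypoints/chat_utils.py; the helper names and the `KNOWN_TEXT_FIELDS` set are hypothetical. It only shows the idea that fields such as `source_lang_code` and `target_lang_code` survive parsing so the chat template can read them.

```python
from typing import Any

# Assumption: the fields a plain text part is "known" to carry.
KNOWN_TEXT_FIELDS = {"type", "text"}


def split_text_part(part: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate known fields from extras without dropping the extras."""
    known = {k: v for k, v in part.items() if k in KNOWN_TEXT_FIELDS}
    extra = {k: v for k, v in part.items() if k not in KNOWN_TEXT_FIELDS}
    return known, extra


def render_text_part(part: dict[str, Any]) -> dict[str, Any]:
    """Rebuild the part for the chat template, keeping extras next to the text."""
    known, extra = split_text_part(part)
    return {**known, **extra}


part = {
    "type": "text",
    "text": "The quick brown fox jumps over the lazy dog.",
    "source_lang_code": "en",
    "target_lang_code": "es",
}
known, extra = split_text_part(part)
print(extra)  # the language codes, which a stricter parser would discard
```

A parser that keeps only `type` and `text` would hand the template a part with no language codes, which is presumably why the PR preserves the extras.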

(Hey @adityapuranik99, what is your email? I have added you as co-author in https://github.com/vllm-project/vllm/pull/41599/commits/9ff6b41e98d6a4cdaf7a8dd902eb53b2a24531d7 but it is not showing.)

Test Plan

Run the actual model end to end, asking it to translate "The quick brown fox jumps over the lazy dog." from English to Spanish.

<details>
# Install vLLM editable with our changes
VLLM_USE_PRECOMPILED=1 uv pip install --python .venv/bin/python -e . --torch-backend=auto

# Download model (gated; license must be accepted on huggingface.co)
HF_TOKEN=<token> HF_HUB_ENABLE_HF_TRANSFER=1 \
  .venv/bin/hf download google/translategemma-12b-it

# Serve
VLLM_USE_FLASHINFER_SAMPLER=0 HF_TOKEN=<token> \
  .venv/bin/vllm serve google/translategemma-12b-it \
    --max-model-len 8192 \
    --port 8000 \
    --chat-template-content-format openai

# Translation request
curl -sS http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/translategemma-12b-it",
    "messages": [{
      "role": "user",
      "content": [{
        "type": "text",
        "text": "The quick brown fox jumps over the lazy dog.",
        "source_lang_code": "en",
        "target_lang_code": "es"
      }]
    }],
    "max_tokens": 128,
    "temperature": 0.0
  }'
</details>
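The same translation request can be sent from Python. The sketch below uses only the standard library and assumes the server from the test plan is listening on 127.0.0.1:8000. The function names `build_payload` and `translate` are illustrative, not part of vLLM.

```python
import json
import urllib.request


def build_payload(text: str, source_lang_code: str, target_lang_code: str,
                  model: str = "google/translategemma-12b-it") -> dict:
    """Build the chat-completions body used in the test plan."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [{
                "type": "text",
                "text": text,
                # Extra fields carried on the text part, as in the curl request.
                "source_lang_code": source_lang_code,
                "target_lang_code": target_lang_code,
            }],
        }],
        "max_tokens": 128,
        "temperature": 0.0,
    }


def translate(payload: dict, base_url: str = "http://127.0.0.1:8000") -> str:
    """POST the payload and return the assistant message text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_payload("The quick brown fox jumps over the lazy dog.", "en", "es")
# With the server running: print(translate(payload))
```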

Also run the unit tests:

.venv/bin/python -m pytest \
  tests/entrypoints/test_chat_utils.py::test_extra_fields_preserved_in_text_parts \
  tests/entrypoints/test_chat_utils.py::test_extra_fields_preserved_in_image_parts \
  tests/test_config.py::test_nested_rope_parameters \
  -v

Test Result

3 unit tests passed in 14.92s.

<details>
============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-9.0.3, pluggy-1.6.0 -- /root/vllm/.venv/bin/python
cachedir: .pytest_cache
rootdir: /root/vllm
configfile: pyproject.toml
plugins: anyio-4.13.0, asyncio-1.3.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collecting ... collected 3 items

tests/entrypoints/test_chat_utils.py::test_extra_fields_preserved_in_text_parts PASSED [ 33%]
tests/entrypoints/test_chat_utils.py::test_extra_fields_preserved_in_image_parts PASSED [ 66%]
tests/test_config.py::test_nested_rope_parameters PASSED                 [100%]

=============================== warnings summary ===============================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

.venv/lib/python3.12/site-packages/torch/jit/_script.py:365: 14 warnings
  /root/vllm/.venv/lib/python3.12/site-packages/torch/jit/_script.py:365: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

<frozen importlib._bootstrap_external>:1297
  <frozen importlib._bootstrap_external>:1297: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.

<frozen importlib._bootstrap_external>:1297
  <frozen importlib._bootstrap_external>:1297: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================= 3 passed, 18 warnings in 14.92s ========================
</details>

Output of the English to Spanish translation request listed above: El rápido zorro marrón salta sobre el perro perezoso.

<details>
{
  "id": "chatcmpl-b7c34f4f69aa6028",
  "object": "chat.completion",
  "created": 1777861533,
  "model": "google/translategemma-12b-it",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "El rápido zorro marrón salta sobre el perro perezoso.\n",
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning": null
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": 106,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": "vllm-0.1.dev16282+gbade09474-77ae533f",
  "usage": {
    "prompt_tokens": 85,
    "total_tokens": 101,
    "completion_tokens": 16,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}
</details>
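A short sketch of reading the response above: it pulls out the translated text and checks that the token counts in `usage` add up. The dictionary is a trimmed copy of the JSON shown, and the helper names are illustrative.

```python
# Trimmed copy of the response shown above (only the fields used here).
response = {
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "El rápido zorro marrón salta sobre el perro perezoso.\n",
        },
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 85, "total_tokens": 101, "completion_tokens": 16},
}


def extract_translation(resp: dict) -> str:
    """Return the first choice's text without the trailing newline."""
    return resp["choices"][0]["message"]["content"].strip()


def usage_is_consistent(usage: dict) -> bool:
    """Prompt plus completion tokens should equal the reported total."""
    return usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]


print(extract_translation(response))
print(usage_is_consistent(response["usage"]))
```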

Other translation requests:

=== fr -> en ===
input:  Bonjour, comment allez-vous aujourdhui?
output: Hello, how are you today?

=== en -> ja ===
input:  Where is the train station?
output: 駅はどこですか?

=== cs -> de ===
input:  V nejhorším případě i k prasknutí čočky.
output: Im schlimmsten Fall kann es sogar zum Riss der Linse kommen.

<details> <summary> Essential Elements of an Effective PR Description Checklist </summary>
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
</details>

Changed files

  • tests/entrypoints/test_chat_utils.py (modified, +87/-0)
  • tests/test_config.py (modified, +31/-0)
  • vllm/entrypoints/chat_utils.py (modified, +39/-3)
  • vllm/transformers_utils/config.py (modified, +5/-2)
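The change to vllm/transformers_utils/config.py concerns per-layer nested RoPE configs. The sketch below shows one way to tell a nested config from a flat one and pick the entry for a layer. It is not the PR's code. The layer-type keys and numeric values are assumptions chosen for illustration, and the helper names are hypothetical.

```python
from typing import Any


def is_nested_rope(rope_parameters: dict[str, Any]) -> bool:
    """True when every value is itself a dict, i.e. one RoPE config per layer type."""
    return bool(rope_parameters) and all(
        isinstance(value, dict) for value in rope_parameters.values()
    )


def rope_for_layer(rope_parameters: dict[str, Any], layer_type: str) -> dict[str, Any]:
    """Return the RoPE config that applies to the given layer type."""
    if is_nested_rope(rope_parameters):
        return rope_parameters[layer_type]
    # A flat config applies to every layer.
    return rope_parameters


# Illustrative values only; not read from the model's actual config.
nested = {
    "full_attention": {"rope_type": "linear", "factor": 8.0, "rope_theta": 1_000_000.0},
    "sliding_attention": {"rope_type": "default", "rope_theta": 10_000.0},
}
flat = {"rope_type": "default", "rope_theta": 10_000.0}

print(rope_for_layer(nested, "sliding_attention"))
print(rope_for_layer(flat, "full_attention"))
```

A loader that expects only the flat shape would look for `rope_type` at the top level and fail on the nested form, which matches the kind of problem the first bullet of the PR description describes.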
Raw issue body

The model to consider.

https://huggingface.co/google/translategemma-12b-it

The closest model vllm already supports.

Support for the TranslateGemma 12B model in vLLM inference, plus COMET support for evaluations.

What's your difficulty of supporting the model you want?

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

TL;DR

The issue asks for vLLM inference support for google/translategemma-12b-it, plus COMET support for evaluations. PR #41599 closes it by letting transformers v5 load per-layer nested RoPE configs and by passing extra content-part fields through the chat entrypoints.

Guidance

  • Serve the model with --chat-template-content-format openai and set source_lang_code and target_lang_code on each text content part, as in the PR's test plan.
  • The model is gated: accept the license on huggingface.co and provide an HF_TOKEN before downloading.
  • The PR description does not mention COMET evaluation support, so that part of the request may still be open.

Notes

The issue body gives no specific error or difficulty (the field reads "No response"), so the technical details here come from the PR description rather than from the issue.

Recommendation

Use a vLLM build that includes the changes from PR #41599 rather than a workaround.
