PR fix notes

PR #41599: [Model] Support TranslateGemma-12b-it

Repository: vllm-project/vllm
Author: zhangj1an
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/41599

Description (problem / solution / changelog)

Purpose

Closes https://github.com/vllm-project/vllm/issues/41540. Related to https://github.com/vllm-project/vllm/issues/32446

This PR carries over the changes from https://github.com/vllm-project/vllm/pull/32819, with the review comments addressed.

Allow per-layer nested RoPE configs to be loaded by transformers v5.
Allow extra fields to be passed in via chat entrypoints. Use ChatCompletionContentPartParam Union type to register known fields.
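
To illustrate the second point, here is a minimal sketch of what "preserving extra fields" on a text content part could look like. It is not the code in vllm/entrypoints/chat_utils.py: the names KNOWN_TEXT_KEYS, split_text_part and normalize_text_part are hypothetical and exist only for this example. The field names source_lang_code and target_lang_code come from the request in the test plan.

```python
from typing import Any

# Hypothetical set of keys a parser already understands for a text part.
KNOWN_TEXT_KEYS = {"type", "text"}


def split_text_part(part: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate the known keys of a text content part from any extra fields."""
    known = {k: v for k, v in part.items() if k in KNOWN_TEXT_KEYS}
    extra = {k: v for k, v in part.items() if k not in KNOWN_TEXT_KEYS}
    return known, extra


def normalize_text_part(part: dict[str, Any]) -> dict[str, Any]:
    """Rebuild the part with known keys first, then the untouched extras."""
    known, extra = split_text_part(part)
    return {**known, **extra}


part = {
    "type": "text",
    "text": "The quick brown fox jumps over the lazy dog.",
    "source_lang_code": "en",
    "target_lang_code": "es",
}
normalized = normalize_text_part(part)
```

The point of the sketch is only that the extra keys survive normalization, so a chat template can still read them when rendering the prompt.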

(Hey @adityapuranik99, what is your email? I added you as a co-author in https://github.com/vllm-project/vllm/pull/41599/commits/9ff6b41e98d6a4cdaf7a8dd902eb53b2a24531d7, but it is not showing.)

Test Plan

Run the actual model end to end, asking it to translate "The quick brown fox jumps over the lazy dog." from English to Spanish.

# Install vLLM editable with our changes
VLLM_USE_PRECOMPILED=1 uv pip install --python .venv/bin/python -e . --torch-backend=auto

# Download model (gated; license must be accepted on huggingface.co)
HF_TOKEN=<token> HF_HUB_ENABLE_HF_TRANSFER=1 \
  .venv/bin/hf download google/translategemma-12b-it

# Serve
VLLM_USE_FLASHINFER_SAMPLER=0 HF_TOKEN=<token> \
  .venv/bin/vllm serve google/translategemma-12b-it \
    --max-model-len 8192 \
    --port 8000 \
    --chat-template-content-format openai

# Translation request
curl -sS http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/translategemma-12b-it",
    "messages": [{
      "role": "user",
      "content": [{
        "type": "text",
        "text": "The quick brown fox jumps over the lazy dog.",
        "source_lang_code": "en",
        "target_lang_code": "es"
      }]
    }],
    "max_tokens": 128,
    "temperature": 0.0
  }'
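
The same request can be issued from Python. The sketch below uses only the standard library and mirrors the curl payload above; the helper names build_payload and translate are hypothetical, and the URL and model name assume the server started with the serve command above.

```python
import json
import urllib.request

# Endpoint and model name are taken from the commands above.
URL = "http://127.0.0.1:8000/v1/chat/completions"
MODEL = "google/translategemma-12b-it"


def build_payload(text: str, source: str, target: str) -> dict:
    """Build a chat completion body with the language codes on the text part."""
    return {
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": [{
                "type": "text",
                "text": text,
                "source_lang_code": source,
                "target_lang_code": target,
            }],
        }],
        "max_tokens": 128,
        "temperature": 0.0,
    }


def translate(text: str, source: str, target: str, url: str = URL) -> str:
    """POST the payload and return the assistant message content."""
    data = json.dumps(build_payload(text, source, target)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, translate("The quick brown fox jumps over the lazy dog.", "en", "es") sends the same body as the curl command.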


Also run the unit tests:

.venv/bin/python -m pytest \
  tests/entrypoints/test_chat_utils.py::test_extra_fields_preserved_in_text_parts \
  tests/entrypoints/test_chat_utils.py::test_extra_fields_preserved_in_image_parts \
  tests/test_config.py::test_nested_rope_parameters \
  -v
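
For context on the third test, the sketch below shows one way to tell a flat RoPE config from a per-layer nested one. It is an illustration under assumptions, not the change made in vllm/transformers_utils/config.py: the helper names are hypothetical, and the layer-type keys and numeric values are illustrative rather than read from the model's config.

```python
from typing import Any, Iterator, Optional


def is_nested_rope(rope_parameters: dict[str, Any]) -> bool:
    """Return True when every value is itself a dict (one config per layer type)."""
    return bool(rope_parameters) and all(
        isinstance(value, dict) for value in rope_parameters.values()
    )


def iter_rope_configs(
    rope_parameters: dict[str, Any],
) -> Iterator[tuple[Optional[str], dict[str, Any]]]:
    """Yield (layer_type, config) pairs; a flat config gets layer_type None."""
    if is_nested_rope(rope_parameters):
        yield from rope_parameters.items()
    else:
        yield None, rope_parameters


# Illustrative values only; assumed shape of a per-layer nested config.
nested = {
    "full_attention": {"rope_type": "linear", "factor": 8.0, "rope_theta": 1_000_000.0},
    "sliding_attention": {"rope_type": "default", "rope_theta": 10_000.0},
}
# A flat config, as used by models with a single RoPE setting.
flat = {"rope_type": "default", "rope_theta": 10_000.0}
```

Code that validates or patches RoPE settings can then loop over iter_rope_configs and treat both shapes uniformly.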

Test Result

3 unit tests passed in 14.92s.

============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-9.0.3, pluggy-1.6.0 -- /root/vllm/.venv/bin/python
cachedir: .pytest_cache
rootdir: /root/vllm
configfile: pyproject.toml
plugins: anyio-4.13.0, asyncio-1.3.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collecting ... collected 3 items

tests/entrypoints/test_chat_utils.py::test_extra_fields_preserved_in_text_parts PASSED [ 33%]
tests/entrypoints/test_chat_utils.py::test_extra_fields_preserved_in_image_parts PASSED [ 66%]
tests/test_config.py::test_nested_rope_parameters PASSED                 [100%]

=============================== warnings summary ===============================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

.venv/lib/python3.12/site-packages/torch/jit/_script.py:365: 14 warnings
  /root/vllm/.venv/lib/python3.12/site-packages/torch/jit/_script.py:365: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

<frozen importlib._bootstrap_external>:1297
  <frozen importlib._bootstrap_external>:1297: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.

<frozen importlib._bootstrap_external>:1297
  <frozen importlib._bootstrap_external>:1297: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================= 3 passed, 18 warnings in 14.92s ========================


Output of the English to Spanish translation request listed above: El rápido zorro marrón salta sobre el perro perezoso.

{
  "id": "chatcmpl-b7c34f4f69aa6028",
  "object": "chat.completion",
  "created": 1777861533,
  "model": "google/translategemma-12b-it",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "El rápido zorro marrón salta sobre el perro perezoso.\n",
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning": null
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": 106,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": "vllm-0.1.dev16282+gbade09474-77ae533f",
  "usage": {
    "prompt_tokens": 85,
    "total_tokens": 101,
    "completion_tokens": 16,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}


Other translation requests:

=== fr -> en ===
input:  Bonjour, comment allez-vous aujourdhui?
output: Hello, how are you today?

=== en -> ja ===
input:  Where is the train station?
output: 駅はどこですか？

=== cs -> de ===
input:  V nejhorším případě i k prasknutí čočky.
output: Im schlimmsten Fall kann es sogar zum Riss der Linse kommen.


Changed files

tests/entrypoints/test_chat_utils.py (modified, +87/-0)
tests/test_config.py (modified, +31/-0)
vllm/entrypoints/chat_utils.py (modified, +39/-3)
vllm/transformers_utils/config.py (modified, +5/-2)

extent analysis

TL;DR

The issue can be addressed by exploring the existing support for the TranslateGemma 12B model in vLLM inference and COMET evaluations.

Guidance

Review the vLLM documentation to understand the current support and limitations for the TranslateGemma 12B model.
Check the chatbot at the bottom right corner of the documentation page for answers to frequently asked questions related to model support.
Investigate the differences between the desired model and the closest supported model to identify potential workarounds or modifications.

Notes

The provided information lacks specific details about the difficulty or error encountered, making it challenging to provide a precise solution.

Recommendation

Apply workaround: use the existing support for the TranslateGemma 12B model in vLLM inference and COMET evaluations as a temporary solution until more comprehensive support is available.

vllm - ✅(Solved) Fix [New Model]: Translategemma Support [2 pull requests, 3 comments, 3 participants]
