transformers: [v5] Issues with tied weights on translation models in v5 [2 comments, 2 participants]

GitHub stats
huggingface/transformers#45005 · Fetched 2026-04-08 01:30:52
Comments: 2 · Participants: 2 · Timeline events: 10 · Reactions: 0
Timeline (top): subscribed ×3, commented ×2, mentioned ×2, renamed ×2


System Info

Not working:

  • transformers version: 5.3.0.dev0
  • Platform: Linux-6.8.0-101-generic-x86_64-with-glibc2.39
  • Python version: 3.14.2
  • Huggingface_hub version: 1.8.0
  • Safetensors version: 0.7.0
  • Accelerate version: not installed
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.11.0+cu130 (CUDA)
  • Using distributed or parallel set-up in script?: no
  • Using GPU in script?: yes
  • GPU type: NVIDIA RTX 2000 Ada Generation Laptop GPU

Working:

  • transformers version: 4.57.6
  • Platform: Linux-6.8.0-101-generic-x86_64-with-glibc2.39
  • Python version: 3.14.2
  • Huggingface_hub version: 0.36.2
  • Safetensors version: 0.7.0
  • Accelerate version: not installed
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.11.0+cu130 (CUDA)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: no
  • Using GPU in script?: yes
  • GPU type: NVIDIA RTX 2000 Ada Generation Laptop GPU

Note: also reproduced on other systems and GPUs. CPU-only processing gives the same result.

Who can help?

@ArthurZucker @Cyrilvallez

I suspect this could be related to some of what was going on in #44466.

I'm seeing some models fail when loaded with v5, although they work fine with 4.57.6. My gut feeling is that something is going wrong with the tied weights or with how the weights are loaded, but I'm not familiar enough with the code to track it down. For some models, translating with transformers v5 produces nonsensical output on both GPU and CPU, while everything works in 4.57.6. Setting tie_word_embeddings=False breaks the working models and doesn't change the output of the already broken model.

In the output below, you can see 5.3.0 raises many warnings, but most of those seem to be resolved in the latest commit.
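The warnings in the output state that both sides of a tied pair are "present in the checkpoints". A minimal sketch of that condition in plain Python follows. The function name `find_duplicated_tied_keys` and the example key list are hypothetical, and the tie mapping is copied from the warning text in this report, not from the transformers source.

```python
from typing import Dict, Iterable, List, Tuple

# Tied pairs as named in the warnings in this report (target -> source).
# Assumption: copied from the log text, not from the transformers source.
TIED_PAIRS: Dict[str, str] = {
    "lm_head.weight": "model.shared.weight",
    "model.encoder.embed_tokens.weight": "model.shared.weight",
    "model.decoder.embed_tokens.weight": "model.shared.weight",
}


def find_duplicated_tied_keys(
    checkpoint_keys: Iterable[str],
    tied_pairs: Dict[str, str] = TIED_PAIRS,
) -> List[Tuple[str, str]]:
    """Return (source, target) pairs where both tensors are stored in the checkpoint."""
    keys = set(checkpoint_keys)
    return sorted(
        (source, target)
        for target, source in tied_pairs.items()
        if source in keys and target in keys
    )


# Hypothetical key list for illustration; a real list would come from the checkpoint file.
example_keys = [
    "model.shared.weight",
    "lm_head.weight",
    "model.encoder.layers.0.fc1.weight",
]
duplicated = find_duplicated_tied_keys(example_keys)
print(duplicated)
```

For a safetensors checkpoint, the key list could be read with `safetensors.safe_open(path, framework="pt").keys()`. Whether the two stored tensors also differ in value, as the newer warning text says, would need a tensor comparison such as `torch.equal`.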

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Code to reproduce:


import torch
from transformers import MarianMTModel, MarianTokenizer

# Each entry names a Marian checkpoint, a test input, and the expected translation.
translations = [
    {
        "model": "Helsinki-NLP/opus-mt-fr-en",
        "input": "Bonjour",
        "expected": "Hello",
    },
    {
        "model": "Helsinki-NLP/opus-mt-es-en",
        "input": "Hola",
        "expected": "Hello",
    },
    {
        "model": "Helsinki-NLP/opus-mt-tc-big-cat_oci_spa-en",
        "input": "Hola",
        "expected": "Hello",
    },
]


def translate():
    # Print the environment so results can be compared across setups.
    print(f"Torch version: {torch.__version__}")
    print(f"CUDA Detected: {torch.version.cuda}")
    print(f"CUDA Available: {torch.cuda.is_available()}")

    # Use the first GPU when available, otherwise fall back to CPU.
    device = "cuda:0" if torch.cuda.is_available() else "cpu"

    for item in translations:
        model_name = item["model"]
        language_input = item["input"]
        expected_output = item["expected"]

        tokenizer = MarianTokenizer.from_pretrained(model_name)
        model = MarianMTModel.from_pretrained(model_name)

        # Tokenize the input and move both the tensors and the model to the device.
        inputs = tokenizer(language_input, return_tensors="pt", padding=True).to(device)
        model = model.to(device)
        outputs = model.generate(**inputs)
        output = tokenizer.decode(outputs[0], skip_special_tokens=True)

        print(f"Using model {model_name}")
        print(f"Input '{language_input}'")
        print(f"Expected '{expected_output}'")
        print(f"Translated to '{output}'\n")


translate()

Results with 4.57.6:

Torch version: 2.11.0+cu130
CUDA Detected: 13.0
CUDA Available: True
Using model Helsinki-NLP/opus-mt-fr-en
Input 'Bonjour'
Expected 'Hello'
Translated to 'Hello.'

Using model Helsinki-NLP/opus-mt-es-en
Input 'Hola'
Expected 'Hello'
Translated to 'Hello.'

Using model Helsinki-NLP/opus-mt-tc-big-cat_oci_spa-en
Input 'Hola'
Expected 'Hello'
Translated to 'Hello there.'

Results with 5.3.0:

Torch version: 2.11.0+cu130
CUDA Detected: 13.0
CUDA Available: True
Loading weights: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 256/256 [00:00<00:00, 10362.90it/s]
Using model Helsinki-NLP/opus-mt-fr-en
Input 'Bonjour'
Expected 'Hello'
Translated to 'Hello.'

tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 44.0/44.0 [00:00<00:00, 319kB/s]
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
source.spm: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 826k/826k [00:00<00:00, 7.20MB/s]
target.spm: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 802k/802k [00:00<00:00, 46.0MB/s]
vocab.json: 1.59MB [00:00, 134MB/s]
config.json: 1.44kB [00:00, 7.95MB/s]
pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 312M/312M [00:10<00:00, 30.0MB/s]
Loading weights: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 258/258 [00:00<00:00, 33026.02it/s]
model.safetensors:   0%|                                                                                                                                             | 0.00/312M [00:00<?, ?B/s]
The tied weights mapping and config for this model specifies to tie model.shared.weight to model.decoder.embed_tokens.weight, but both are present in the checkpoints, so we will NOT tie them. You should update the config with `tie_word_embeddings=False` to silence this warning
The tied weights mapping and config for this model specifies to tie model.shared.weight to model.encoder.embed_tokens.weight, but both are present in the checkpoints, so we will NOT tie them. You should update the config with `tie_word_embeddings=False` to silence this warning
generation_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 293/293 [00:00<00:00, 2.68MB/s]
Using model Helsinki-NLP/opus-mt-es-en
Input 'Hola'
Expected 'Hello'
Translated to 'Hello.'

Loading weights: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 257/257 [00:00<00:00, 20577.98it/s]
model.safetensors:   0%|                                                                                                                                             | 0.00/312M [00:02<?, ?B/s]
The tied weights mapping and config for this model specifies to tie model.shared.weight to lm_head.weight, but both are present in the checkpoints, so we will NOT tie them. You should update the config with `tie_word_embeddings=False` to silence this warning
The tied weights mapping and config for this model specifies to tie model.shared.weight to model.decoder.embed_tokens.weight, but both are present in the checkpoints, so we will NOT tie them. You should update the config with `tie_word_embeddings=False` to silence this warning
The tied weights mapping and config for this model specifies to tie model.shared.weight to model.encoder.embed_tokens.weight, but both are present in the checkpoints, so we will NOT tie them. You should update the config with `tie_word_embeddings=False` to silence this warning
model.safetensors:   0%|                                                                                                                                             | 0.00/312M [00:04<?, ?B/s]
Using model Helsinki-NLP/opus-mt-tc-big-cat_oci_spa-en
Input 'Hola'
Expected 'Hello'
Translated to '[                                          Saul Peruvian Woo- Nu Buy- Lac- fate  (  (--- (- (- (  --  - ( -- ( --- (  (  (  ( --------- formulations ( formulations ( formulations ( ( ( (  (  (  (  (  (  (  (  (  (  (  ( Johannesburg  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  ( (  ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( (  ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( (  ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( (  ( ( ( ( ( ( ( ( ( ( ( (  ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( (  ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( (  ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ('

Results with latest git+https://github.com/huggingface/transformers.git@35b005bba4de1d4b3c3789451adb5cf7469b1522 :

Using model Helsinki-NLP/opus-mt-fr-en
Input 'Bonjour'
Expected 'Hello'
Translated to 'Hello.'

Loading weights: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 258/258 [00:00<00:00, 30717.04it/s]
Using model Helsinki-NLP/opus-mt-es-en
Input 'Hola'
Expected 'Hello'
Translated to 'Hello.'

Loading weights: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 257/257 [00:00<00:00, 30016.04it/s]
The tied weights mapping and config for this model specifies to tie model.shared.weight to lm_head.weight, but both are present in the checkpoints with different values, so we will NOT tie them. You should update the config with `tie_word_embeddings=False` to silence this warning.
Using model Helsinki-NLP/opus-mt-tc-big-cat_oci_spa-en
Input 'Hola'
Expected 'Hello'
Translated to '[                                          Saul Peruvian Woo- Nu Buy- Lac- fate  (  (--- (- (- (  --  - ( -- ( --- (  (  (  ( --------- formulations ( formulations ( formulations ( ( ( (  (  (  (  (  (  (  (  (  (  (  ( Johannesburg  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  (  ( (  ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( (  ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( (  ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( (  ( ( ( ( ( ( ( ( ( ( ( (  ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( (  ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( (  ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ('
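When comparing runs, it can help to pull the tied-weight warnings out of the noisy load output. The sketch below does that with a regular expression written against the warning wording quoted in this report. The function name `extract_untied_pairs` is hypothetical, and the pattern is an assumption that would need adjusting if the message text changes.

```python
import re
from typing import List, Tuple

# Pattern written against the warning wording quoted in this report (an assumption:
# it is not taken from the transformers source and may not match other versions).
TIE_WARNING = re.compile(
    r"specifies to tie (?P<source>\S+) to (?P<target>\S+?), but both are present"
)


def extract_untied_pairs(log_text: str) -> List[Tuple[str, str]]:
    """Return each (source, target) pair the loader reported it would NOT tie."""
    return [(m.group("source"), m.group("target")) for m in TIE_WARNING.finditer(log_text)]


# Two warnings copied from the logs in this report.
sample_log = (
    "The tied weights mapping and config for this model specifies to tie "
    "model.shared.weight to lm_head.weight, but both are present in the "
    "checkpoints with different values, so we will NOT tie them.\n"
    "The tied weights mapping and config for this model specifies to tie "
    "model.shared.weight to model.decoder.embed_tokens.weight, but both are "
    "present in the checkpoints, so we will NOT tie them.\n"
)
pairs = extract_untied_pairs(sample_log)
for source, target in pairs:
    print(f"{source} -> {target}")
```

Run over the 5.3.0 log and the later commit's log, this would show which pairs stopped being reported between the two, for example that only the lm_head.weight pair is still reported for the third model.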

Expected behavior

It's expected that all three models load and output a meaningful translation without errors. See the results from 4.57.6, repeated below.

Results with 4.57.6:

Using model Helsinki-NLP/opus-mt-fr-en
Input 'Bonjour'
Expected 'Hello'
Translated to 'Hello.'

Using model Helsinki-NLP/opus-mt-es-en
Input 'Hola'
Expected 'Hello'
Translated to 'Hello.'

Using model Helsinki-NLP/opus-mt-tc-big-cat_oci_spa-en
Input 'Hola'
Expected 'Hello'
Translated to 'Hello there.'

extent analysis

Fix Plan

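The loader warnings suggest setting tie_word_embeddings=False, but that only silences the warning. According to the report, it does not change the output of the broken model (Helsinki-NLP/opus-mt-tc-big-cat_oci_spa-en) and it breaks the two models that otherwise work, so it should not be treated as a fix. The only configuration shown to work in this report is transformers 4.57.6.

  • Stay on (or pin to) transformers 4.57.6, where all three models translate correctly, until the regression is resolved upstream.
  • Do not rely on tie_word_embeddings=False: it silences the warning without restoring correct output.

For reference, this is how the flag named in the warning would be passed at load time: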

import torch
from transformers import MarianMTModel, MarianTokenizer

# `translations` is the list of model/input/expected dicts from the reproduction script.
# ...

device = "cuda:0" if torch.cuda.is_available() else "cpu"

for item in translations:
    model_name = item["model"]
    language_input = item["input"]
    expected_output = item["expected"]

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    # Overrides the config flag named in the warning. Per the report, this does not
    # restore correct output and breaks the models that otherwise work.
    model = MarianMTModel.from_pretrained(model_name, tie_word_embeddings=False)

    inputs = tokenizer(language_input, return_tensors="pt", padding=True).to(device)
    model = model.to(device)
    outputs = model.generate(**inputs)
    output = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # ...

Verification

To verify a candidate fix, rerun the reproduction script and compare each output with the 4.57.6 results: 'Hello.' for opus-mt-fr-en and opus-mt-es-en, and 'Hello there.' for opus-mt-tc-big-cat_oci_spa-en.
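The comparison described above can be sketched as a small check against the 4.57.6 outputs from this report. The names `BASELINE` and `find_regressions` are hypothetical, and the third entry of `observed` is an abbreviated stand-in for the degenerate output, not the full string.

```python
from typing import Dict, List

# Outputs reported for transformers 4.57.6 in this issue.
BASELINE: Dict[str, str] = {
    "Helsinki-NLP/opus-mt-fr-en": "Hello.",
    "Helsinki-NLP/opus-mt-es-en": "Hello.",
    "Helsinki-NLP/opus-mt-tc-big-cat_oci_spa-en": "Hello there.",
}


def find_regressions(
    outputs: Dict[str, str],
    baseline: Dict[str, str] = BASELINE,
) -> List[str]:
    """Return the model names whose output differs from the baseline or is missing."""
    return [
        name
        for name, expected in baseline.items()
        if outputs.get(name, "").strip() != expected
    ]


# Example: the first two models match, the third produced degenerate text
# (abbreviated here for illustration).
observed = {
    "Helsinki-NLP/opus-mt-fr-en": "Hello.",
    "Helsinki-NLP/opus-mt-es-en": "Hello.",
    "Helsinki-NLP/opus-mt-tc-big-cat_oci_spa-en": "[ ( ( ( (",
}
regressions = find_regressions(observed)
print(regressions)
```

In practice, `observed` would be filled from the `output` values the reproduction script prints, and an empty result would mean every model matched its 4.57.6 output.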

Extra Tips

  • Test every model you depend on, not just one: in this report only one of the three Marian checkpoints produced garbage on v5.
  • Updating to the latest commit is not sufficient on its own: at commit 35b005bba4de1d4b3c3789451adb5cf7469b1522 most warnings disappeared, but the third model still produced the same degenerate output.
  • The problem is not device-specific: the reporter reproduced it on several systems and GPUs as well as on CPU.
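Because the report contrasts a working 4.x release with a failing 5.x build, a small guard on the installed version can act as a reminder to run the output comparison. This is a sketch under assumptions: the helper names are hypothetical, and treating every 5.x version as potentially affected is a guess, since the report covers only 5.3.0.dev0 and one later commit.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '5.3.0.dev0'."""
    return int(version.split(".", 1)[0])


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


def needs_baseline_check(version: Optional[str]) -> bool:
    """True for 5.x and later (assumption: the whole series may be affected)."""
    return version is not None and major_version(version) >= 5


current = installed_transformers_version()
if needs_baseline_check(current):
    print(f"transformers {current}: compare outputs against the 4.57.6 results")
else:
    print(f"transformers {current}: not in the series this report marks as failing")
```

The check only parses the major version, so it avoids a dependency on a version-parsing package, at the cost of not distinguishing individual 5.x releases.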
