transformers - ✅(Solved) Fix Cannot load local dataset with run_image_classification_no_trainer.py [1 pull request, 6 comments, 4 participants]

GitHub stats
huggingface/transformers#44190 · Fetched 2026-04-08 00:29:54
Comments: 6 · Participants: 4 · Timeline events: 14 · Reactions: 0
Timeline (top): commented ×6, mentioned ×2, subscribed ×2, closed ×1

PR fix notes

PR #44199: Fix local dataset loading priority in run_image_classification_no_tra…

Description (problem / solution / changelog)

What does this PR do?

This PR fixes an issue in run_image_classification_no_trainer.py where the script always loaded dataset_name (e.g., CIFAR10) even when --train_dir or --validation_dir was provided.

Now, when local dataset directories are specified, the script prioritizes loading the local imagefolder dataset instead of falling back to dataset_name.
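
The priority change described above can be sketched as follows. This is an illustration, not the PR's actual diff: the helper name resolve_dataset_source and its return shape are hypothetical, and it assumes the script passes local directories to the imagefolder loader through a data_files mapping with a "**" glob.

```python
import os
from typing import Optional


def resolve_dataset_source(
    dataset_name: Optional[str],
    train_dir: Optional[str] = None,
    validation_dir: Optional[str] = None,
) -> dict:
    """Build keyword arguments for datasets.load_dataset (hypothetical helper).

    Local directories take priority over dataset_name, so a default such as
    "cifar10" cannot shadow --train_dir / --validation_dir.
    """
    if train_dir is not None or validation_dir is not None:
        data_files = {}
        if train_dir is not None:
            data_files["train"] = os.path.join(train_dir, "**")
        if validation_dir is not None:
            data_files["validation"] = os.path.join(validation_dir, "**")
        return {"path": "imagefolder", "data_files": data_files}
    if dataset_name is not None:
        # No local directory was given: fall back to the named Hub dataset.
        return {"path": dataset_name}
    raise ValueError("Need either a dataset name or a training/validation folder.")
```

With this ordering, resolve_dataset_source("cifar10", train_dir="./beans/train") selects imagefolder, and the result could be forwarded as load_dataset(**kwargs).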

Reproduction

Before fix: Running with --train_dir still loaded CIFAR10 (10 classes).

After fix: Running with a local beans dataset correctly loads 3 classes:

  • angular_leaf_spot
  • bean_rust
  • healthy

Example:

accelerate launch run_image_classification_no_trainer.py \
  --train_dir ./beans/train \
  --validation_dir ./beans/validation

Fixes #44190

Changed files

  • examples/pytorch/image-classification/run_image_classification_no_trainer.py (modified, +3/-2)


System Info

  • Ubuntu 24.04.4 LTS
  • Python 3.12.3
  • PyTorch 2.10.0

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Save the official example script: run_image_classification_no_trainer.py
  2. Obtain a dataset with 3 classes, like AI-Lab-Makerere/beans, unzip it, and save it to the directory which contains the script. Resize all images to 224x224.
  3. Configure accelerate (same steps as the official docs)
    pip install git+https://github.com/huggingface/accelerate
    accelerate config
    accelerate test
  4. Run the script:
    accelerate launch run_image_classification_no_trainer.py --image_column_name img --output_dir ./default_model --train_dir ./train
    Here, ./train is the root directory of the beans dataset.
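
Before launching, it can help to confirm that ./train has the layout an imagefolder-style loader is generally expected to read, namely one subdirectory per class. A minimal standard-library sketch; the function name count_images_per_class and the list of image suffixes are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".webp"}


def count_images_per_class(root: str) -> dict:
    """Map each class subdirectory of root to the number of image files in it."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files next to the class folders
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts

# Example: count_images_per_class("./train") should show three class folders
# for the beans dataset if the archive was unpacked as described above.
```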

Result

The model is incorrectly trained on cifar10 even though a custom dataset was specified with --train_dir. The 'config.json' file created in the output directory lists 10 classes, confirming this issue:

{
  "architectures": [
    "ViTForImageClassification"
  ],
  "attention_probs_dropout_prob": 0.0,
  "dtype": "float32",
  "encoder_stride": 16,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "hidden_size": 768,
  "id2label": {
    "0": "airplane",
    "1": "automobile",
    "2": "bird",
    "3": "cat",
    "4": "deer",
    "5": "dog",
    "6": "frog",
    "7": "horse",
    "8": "ship",
    "9": "truck"
  },
  "image_size": 224,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "airplane": "0",
    "automobile": "1",
    "bird": "2",
    "cat": "3",
    "deer": "4",
    "dog": "5",
    "frog": "6",
    "horse": "7",
    "ship": "8",
    "truck": "9"
  },
  "layer_norm_eps": 1e-12,
  "model_type": "vit",
  "num_attention_heads": 12,
  "num_channels": 3,
  "num_hidden_layers": 12,
  "patch_size": 16,
  "pooler_act": "tanh",
  "pooler_output_size": 768,
  "problem_type": "single_label_classification",
  "qkv_bias": true,
  "transformers_version": "5.2.0"
}

Expected behavior

The model should be fine-tuned on the beans dataset, instead of falling back to CIFAR10.

Extended analysis

Fix Plan

1. Load local directories before dataset_name

The accelerate configuration is not the cause. accelerate config only stores launch and hardware settings; accelerate has no dataset option and no accelerate.init function. As PR #44199 describes, the example script always loaded dataset_name (e.g., CIFAR10) even when --train_dir or --validation_dir was provided, so the local data was never read.

Step 1: Use a script revision that includes the fix

Use a copy of examples/pytorch/image-classification/run_image_classification_no_trainer.py that contains the change from PR #44199 (+3/-2).

Step 2: Or patch an older copy by hand

If you must stay on an older copy, edit its dataset-loading branch so that it loads the local imagefolder dataset whenever --train_dir or --validation_dir is given, and falls back to dataset_name only when neither is set.

2. Verify the fix

Run the patched script again:

accelerate launch run_image_classification_no_trainer.py --output_dir ./default_model --train_dir ./train

--image_column_name img is omitted here, as in the PR's own example. img is the CIFAR10 column name, while imagefolder datasets normally expose the column as image, so keeping the flag will likely fail once the local data is actually loaded.

Then check config.json in the output directory. Its id2label should list only the 3 beans classes (angular_leaf_spot, bean_rust, healthy) rather than the 10 CIFAR10 classes.
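
The config.json check can also be scripted. A minimal sketch, assuming the output directory and class names from this report; the function names labels_in_config and config_matches are hypothetical.

```python
import json
from pathlib import Path


def labels_in_config(config_path: str) -> list:
    """Return the class names stored under id2label, ordered by class id."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    id2label = config.get("id2label", {})
    return [id2label[key] for key in sorted(id2label, key=int)]


def config_matches(config_path: str, expected_labels) -> bool:
    """True when the saved config lists exactly the expected class names."""
    return set(labels_in_config(config_path)) == set(expected_labels)

# Example (paths and labels taken from this report):
# config_matches("./default_model/config.json",
#                ["angular_leaf_spot", "bean_rust", "healthy"])
```

A False result with the ten CIFAR10 names in labels_in_config would indicate that the script still loaded the Hub dataset.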

Extra Tips

  • The reported command never passed --dataset_name, yet CIFAR10 was loaded, which suggests CIFAR10 is the script's default. Do not expect --train_dir alone to override it on an unpatched script.
  • Pass --validation_dir as well if you have a separate validation split, as in the PR example.
  • Upgrading accelerate does not address this bug, because the faulty logic lives in the example script.
