vllm - ✅(Solved) Fix Tokenizer error with Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-abliterated model - TokenizersBackend not found [1 pull requests, 3 comments, 3 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
vllm-project/vllm#38024Fetched 2026-04-08 01:27:00
View on GitHub
Comments
3
Participants
3
Timeline
10
Reactions
0
Timeline (top)
referenced ×4commented ×3cross-referenced ×1subscribed ×1

Error Message

ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.

Fix Action

Fix / Workaround

The error suggests that the model uses a custom tokenizer class TokenizersBackend that is not recognized by vLLM. Is this a known issue? Are there any workarounds or fixes available?

PR fix notes

PR #38099: [Bugfix] Add fallback for TokenizersBackend tokenizer class

Description (problem / solution / changelog)

Summary

Fixes https://github.com/vllm-project/vllm/issues/38024

Models using transformers 5.x may set "tokenizer_class": "TokenizersBackend" in tokenizer_config.json. When loaded with older transformers, this causes:

ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.

This error cannot be resolved by --trust-remote-code since TokenizersBackend is a built-in transformers backend class, not a custom tokenizer.

This PR adds:

  • An allowlist (_FAST_TOKENIZER_FALLBACK_CLASSES) of known forward-compatible tokenizer class names
  • A helper to read tokenizer_class from tokenizer_config.json
  • Fallback logic in CachedHfTokenizer.from_pretrained() that transparently loads via PreTrainedTokenizerFast when the class is in the allowlist
  • A warning logged suggesting users upgrade transformers
  • 10 unit tests covering all branches (fallback success, fallback failure, unknown class, unrelated errors, etc.)

Design decisions:

  • Allowlist-based (not a catch-all fallback) — prevents misfiring on models that genuinely need custom tokenizers
  • Triggers regardless of trust_remote_code — because trust_remote_code cannot resolve this specific issue
  • Precedent: vllm/tokenizers/deepseek_v32.py:88 already uses PreTrainedTokenizerFast.from_pretrained() for a similar purpose

Before / After

<details> <summary>Before fix (error)</summary>
======================================================================
BEFORE FIX: Simulating AutoTokenizer.from_pretrained() with
tokenizer_config.json containing tokenizer_class=TokenizersBackend
======================================================================
RuntimeError: Failed to load the tokenizer. If the tokenizer is a custom
tokenizer not yet available in the HuggingFace transformers library, consider
setting `trust_remote_code=True` in LLM or using the `--trust-remote-code`
flag in the CLI.

>>> Users are told to use --trust-remote-code,
>>> but that does NOT resolve this issue.
</details> <details> <summary>After fix (graceful fallback)</summary>
======================================================================
AFTER FIX: Simulating AutoTokenizer.from_pretrained() with
tokenizer_config.json containing tokenizer_class=TokenizersBackend
======================================================================
WARNING 03-25 19:00:21 [hf.py:172] Tokenizer class 'TokenizersBackend' is not
available in the installed version of transformers. Falling back to
PreTrainedTokenizerFast. Consider upgrading: `pip install --upgrade transformers`
SUCCESS: Tokenizer loaded via PreTrainedTokenizerFast fallback
  type    = MagicMock
  vocab   = 2
  special = ['<s>']

>>> Model loads successfully with graceful fallback.
>>> A warning is logged suggesting: pip install --upgrade transformers
</details>

Test plan

pytest tests/tokenizers_/test_hf.py -v -k "not test_cached_tokenizer"
<details> <summary>Test results (10 passed)</summary>
tests/tokenizers_/test_hf.py::TestFastTokenizerFallbackHelpers::test_fallback_classes_is_frozenset PASSED [ 10%]
tests/tokenizers_/test_hf.py::TestFastTokenizerFallbackHelpers::test_fallback_possible_when_class_in_allowlist PASSED [ 20%]
tests/tokenizers_/test_hf.py::TestFastTokenizerFallbackHelpers::test_fallback_not_possible_for_unknown_class PASSED [ 30%]
tests/tokenizers_/test_hf.py::TestFastTokenizerFallbackHelpers::test_fallback_not_possible_when_config_unreadable PASSED [ 40%]
tests/tokenizers_/test_hf.py::TestTokenizersBackendFallback::test_falls_back_to_pretrained_tokenizer_fast PASSED [ 50%]
tests/tokenizers_/test_hf.py::TestTokenizersBackendFallback::test_raises_runtime_error_for_unknown_class PASSED [ 60%]
tests/tokenizers_/test_hf.py::TestTokenizersBackendFallback::test_unrelated_valueerror_is_reraised PASSED [ 70%]
tests/tokenizers_/test_hf.py::TestTokenizersBackendFallback::test_fallback_works_even_with_trust_remote_code PASSED [ 80%]
tests/tokenizers_/test_hf.py::TestTokenizersBackendFallback::test_fallback_triggers_on_execute_tokenizer_file_error PASSED [ 90%]
tests/tokenizers_/test_hf.py::TestTokenizersBackendFallback::test_fallback_failure_raises_runtime_error PASSED [100%]

================= 10 passed, 2 deselected, 2 warnings in 2.09s =================
</details>

Pre-commit: All hooks passed (ruff, mypy, typos, SPDX headers, etc.)

Notes

  • This PR is NOT a duplicate — no existing open PR addresses #38024
  • AI assistance was used (Claude Code) for implementation and testing
  • This is a defensive fix; upgrading transformers remains the recommended long-term solution

🤖 Generated with Claude Code

Changed files

  • vllm/tokenizers/hf.py (modified, +4/-1)

Code Example

ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.

---

sudo docker run -d --name vllm-huihui-qwen35-claude \
  --runtime=nvidia \
  -e CUDA_VISIBLE_DEVICES=4,5,6,7 \
  -v /data:/data \
  -p 8002:8000 \
  --ipc=host \
  vllm/vllm-openai:v0.17.1 \
  python3 -m vllm.entrypoints.openai.api_server \
  --model /data/models/Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-abliterated \
  --tensor-parallel-size 4 \
  --max-model-len 100000 \
  --gpu-memory-utilization 0.85 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_xml \
  --dtype bfloat16 \
  --kv-cache-dtype fp8 \
  --max-num-seqs 10 \
  --host 0.0.0.0 \
  --port 8000
RAW_BUFFERClick to expand / collapse

Issue Description

When trying to run the model Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-abliterated with vLLM, I encounter a tokenizer error.

Model Information

  • Model Name: Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-abliterated
  • Model Architecture: Qwen3_5MoeForConditionalGeneration (MoE)
  • Parameters: 35B total, 256 experts, 8 experts per token
  • Quantization: AWQ 4-bit
  • Location: /data/models/Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-abliterated

Error Message

ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.

Environment

  • vLLM Version: 0.17.1 and nightly
  • CUDA: Available
  • GPU: 4x GPUs (tensor-parallel-size=4)
  • Python: 3.x

Command Used

sudo docker run -d --name vllm-huihui-qwen35-claude \
  --runtime=nvidia \
  -e CUDA_VISIBLE_DEVICES=4,5,6,7 \
  -v /data:/data \
  -p 8002:8000 \
  --ipc=host \
  vllm/vllm-openai:v0.17.1 \
  python3 -m vllm.entrypoints.openai.api_server \
  --model /data/models/Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-abliterated \
  --tensor-parallel-size 4 \
  --max-model-len 100000 \
  --gpu-memory-utilization 0.85 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_xml \
  --dtype bfloat16 \
  --kv-cache-dtype fp8 \
  --max-num-seqs 10 \
  --host 0.0.0.0 \
  --port 8000

Model Tokenizer Files

The model contains:

  • tokenizer.json (19 MB)
  • tokenizer_config.json (5.3 KB)

Tried Solutions

  • vLLM v0.17.1 - ❌ Error persists
  • vLLM nightly - ❌ Error persists
  • --tokenizer-mode auto - ❌ Error persists
  • --trust-remote-code - ❌ Error persists
  • --enforce-eager - ❌ Error persists

Question

The error suggests that the model uses a custom tokenizer class TokenizersBackend that is not recognized by vLLM. Is this a known issue? Are there any workarounds or fixes available?

Logs

Full logs are available in the attached file or can be reproduced by running the command above.

Thank you!

extent analysis

Fix Plan

To resolve the ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported error, follow these steps:

  • Ensure the TokenizersBackend class is correctly defined and imported in your code.
  • If using a custom tokenizer, verify that it is properly registered with the vllm library.
  • Try updating the vllm library to the latest version, as this issue may have been resolved in a recent update.

Here's an example of how to register a custom tokenizer:

import vllm

# Define the custom tokenizer class
class TokenizersBackend:
    # Tokenizer implementation

# Register the custom tokenizer
vllm.tokenization.register_tokenizer("TokenizersBackend", TokenizersBackend)

Alternatively, you can try specifying the tokenizer class explicitly when running the vllm command:

sudo docker run -d --name vllm-huihui-qwen35-claude \
  --runtime=nvidia \
  -e CUDA_VISIBLE_DEVICES=4,5,6,7 \
  -v /data:/data \
  -p 8002:8000 \
  --ipc=host \
  vllm/vllm-openai:v0.17.1 \
  python3 -m vllm.entrypoints.openai.api_server \
  --model /data/models/Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-abliterated \
  --tensor-parallel-size 4 \
  --max-model-len 100000 \
  --gpu-memory-utilization 0.85 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_xml \
  --dtype bfloat16 \
  --kv-cache-dtype fp8 \
  --max-num-seqs 10 \
  --host 0.0.0.0 \
  --port 8000 \
  --tokenizer-class TokenizersBackend

Verification

To verify that the fix worked, run the vllm command again and check for any error messages related to the tokenizer. If the issue persists, try checking the logs for more detailed error information.

Extra Tips

  • Ensure that the custom tokenizer class is correctly implemented and compatible with the vllm library.
  • If using a custom tokenizer, consider submitting a pull request to the vllm repository to add support for your tokenizer.
  • Keep your vllm library up to date to ensure you have the latest features and bug fixes.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING