vllm - ✅(Solved) Fix [Bug]: Jamba tool parser crashes on Mistral-style [TOOL_CALLS] models with standard HF tokenizer (e.g., Apriel-Nemotron-15b) [1 pull request, 3 comments, 3 participants]

GitHub stats

vllm-project/vllm#38674 (fetched 2026-04-08 01:58:37)

  • Comments: 3
  • Participants: 3
  • Timeline events: 8
  • Reactions: 0
  • Timeline (top): commented ×3, cross-referenced ×1, labeled ×1, mentioned ×1

Error Message

Actual result: HTTP 500 with the message "Jamba Tool parser could not locate tool calls start/end tokens in the tokenizer!"

Server log: INFO: 127.0.0.1:55058 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error

Root Cause

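  • The jamba parser crashes because <tool_calls> is not in the vocab.
  • The mistral parser is not a fit because it expects MistralTokenizer (tekken/sentencepiece).

The jamba parser hardcodes the old <tool_calls>/</tool_calls> format in its init (vllm/tool_parsers/jamba_tool_parser.py), so a tokenizer that only knows [TOOL_CALLS] fails the vocabulary lookup.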

PR fix notes

PR #38695: [Bugfix] Support [TOOL_CALLS] single-token format in Jamba tool parser

Description (problem / solution / changelog)

The Jamba tool parser hardcodes <tool_calls>/</tool_calls> XML tags, crashing on models that use a [TOOL_CALLS] single token (e.g., Apriel-Nemotron-15b-Thinker). Auto-detect the format from tokenizer vocabulary.

Fixes #38674

Purpose

The jamba tool call parser crashes with RuntimeError on the first tool call request when used with models that have [TOOL_CALLS] in their tokenizer vocabulary instead of <tool_calls>/</tool_calls> XML tags (e.g., ServiceNow-AI/Apriel-Nemotron-15b-Thinker).

Changes

  • vllm/tool_parsers/jamba_tool_parser.py: Replace hardcoded <tool_calls>/</tool_calls> token setup with vocabulary-based auto-detection. Check for [TOOL_CALLS] first (single-token format), fall back to <tool_calls>/</tool_calls> (tagged format), raise RuntimeError if neither found. Update extract_tool_calls to guard against empty regex matches. Update extract_tool_calls_streaming to branch extraction based on detected format.

  • tests/tool_parsers/test_jamba_tool_parser.py: Add ServiceNow-AI/Apriel-Nemotron-15b-Thinker as a second test model for the single-token format. Add test proving the tagged parser ignores single-token output. Add parametrized extraction and streaming tests for single-token format including array-in-arguments edge case.

  • docs/features/tool_calling.md: Update Jamba section to document both supported formats and add Apriel-Nemotron-15b-Thinker to the supported models list.
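
The detection order described in the first bullet can be sketched as follows. This is a minimal illustration, not the code from the PR: `ToolCallFormat` and `detect_tool_call_format` are hypothetical names, and the vocabulary is modelled as a plain token-to-id dict.

```python
from typing import NamedTuple, Optional


class ToolCallFormat(NamedTuple):
    """Hypothetical container for the detected delimiters."""

    start_token: str
    end_token: Optional[str]  # None for the single-token format


def detect_tool_call_format(vocab: dict[str, int]) -> ToolCallFormat:
    """Pick a tool-call format based on which markers the tokenizer knows."""
    # The single-token format is checked first, as the PR notes describe.
    if "[TOOL_CALLS]" in vocab:
        return ToolCallFormat("[TOOL_CALLS]", None)
    # Fall back to the tagged format; both tags must be present.
    if "<tool_calls>" in vocab and "</tool_calls>" in vocab:
        return ToolCallFormat("<tool_calls>", "</tool_calls>")
    # Neither format is available: same error text as the existing parser.
    raise RuntimeError(
        "Jamba Tool parser could not locate tool calls start/end "
        "tokens in the tokenizer!"
    )
```

Returning the end token as `None` is one way to let later extraction code branch on the detected format without a separate flag.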

Test Plan

pytest tests/tool_parsers/test_jamba_tool_parser.py -v
vllm serve ServiceNow-AI/Apriel-Nemotron-15b-Thinker \
    --enable-auto-tool-choice --tool-call-parser jamba --max-model-len 4096
curl -s http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model":"ServiceNow-AI/Apriel-Nemotron-15b-Thinker",
       "messages":[{"role":"user","content":"What is the weather in SF?"}],
       "tools":[{"type":"function","function":{"name":"get_weather",
       "description":"Get weather","parameters":{"type":"object",
       "properties":{"location":{"type":"string"}},"required":["location"]}}}],
       "tool_choice":"required","max_tokens":200}'

Test Result

Unit tests: 16 passed (8 tagged format with Jamba-tiny-dev, 8 single-token format with Apriel-Nemotron-15b-Thinker)

Test | Result
Existing tagged format tests (extraction + streaming) | 8 passed, no regression
Tagged parser ignores single-token output | Passed
Single-token extraction (single, with content, parallel, array args) | 4 passed
Single-token streaming (no tools, single, with content) | 3 passed
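
To picture what the non-streaming extraction cases in the table exercise, here is a rough sketch. It assumes the model emits the marker followed by a JSON array of objects with `name` and `arguments` keys; that output shape, and the function name `extract_tool_calls_sketch`, are assumptions for illustration rather than the parser's real interface.

```python
import json
from typing import Optional


def extract_tool_calls_sketch(text: str, start: str, end: Optional[str]):
    """Split model output into (content, tool_calls).

    `start` and `end` are the detected delimiters. `end` is None for the
    single-token format, where the JSON array runs to the end of the text.
    """
    if start not in text:
        return text, []  # no marker for this format: treat as plain content
    content, _, rest = text.partition(start)
    if end is not None:
        rest = rest.partition(end)[0]  # keep only what sits inside the tags
    rest = rest.strip()
    if not rest:
        return content or None, []  # guard: marker present, nothing after it
    # json.loads copes with nested arrays inside the arguments object.
    calls = [
        {"name": call["name"], "arguments": json.dumps(call["arguments"])}
        for call in json.loads(rest)
    ]
    return content or None, calls
```

Parsing the whole array with `json.loads` instead of a bracket-matching regex is a deliberate choice here, since arguments that themselves contain arrays would otherwise confuse a naive pattern.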

Changed files

  • docs/features/tool_calling.md (modified, +4/-1)
  • tests/tool_parsers/test_jamba_tool_parser.py (modified, +210/-1)
  • vllm/tool_parsers/jamba_tool_parser.py (modified, +48/-21)


Your current environment

  • vLLM version: v0.18.0
  • PyTorch version: 2.10.0+cu128
  • Python version: 3.12
  • GPU: NVIDIA A100-SXM4-80GB
  • OS: Ubuntu 22.04

🐛 Describe the bug

Model: ServiceNow-AI/Apriel-Nemotron-15b-Thinker (HuggingFace)

The jamba tool call parser hardcodes <tool_calls> / </tool_calls> XML tags as the expected tool call delimiters. Models that use the Mistral-style [TOOL_CALLS] single-token format with a standard HuggingFace tokenizer (not MistralTokenizer) cannot use any built-in tool parser:

  • The jamba parser crashes because <tool_calls> is not in the vocab.
  • The mistral parser is not a fit because it expects MistralTokenizer (tekken/sentencepiece).

The parser hardcodes the old format in init:

vllm/tool_parsers/jamba_tool_parser.py

self.tool_calls_start_token: str = "<tool_calls>"
self.tool_calls_end_token: str = "</tool_calls>"
...
self.tool_calls_start_token_id = self.vocab.get(self.tool_calls_start_token)
self.tool_calls_end_token_id = self.vocab.get(self.tool_calls_end_token)
if self.tool_calls_start_token_id is None or self.tool_calls_end_token_id is None:
    raise RuntimeError(
        "Jamba Tool parser could not locate tool calls start/end "
        "tokens in the tokenizer!"
    )

How to reproduce

Start the server:

vllm serve ServiceNow-AI/Apriel-Nemotron-15b-Thinker \
    --enable-auto-tool-choice \
    --tool-call-parser jamba \
    --max-model-len 4096 \
    --port 8002

Server starts and loads the model successfully. The parser crash happens on the first tool call request (lazy init).

Send a tool call request:

curl -s http://localhost:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ServiceNow-AI/Apriel-Nemotron-15b-Thinker",
    "messages": [{"role": "user", "content": "What is the weather in San Francisco? Use the get_weather tool."}],
    "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get the current weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}}],
    "tool_choice": "required",
    "max_tokens": 200
  }'
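
For scripted reproduction, the same request can be built from Python with only the standard library. This is a sketch under the assumption that the server from the previous step is listening on port 8002; `build_request` is a hypothetical helper name, and the actual send is left commented out so the snippet runs without a server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8002"  # port taken from the vllm serve command

PAYLOAD = {
    "model": "ServiceNow-AI/Apriel-Nemotron-15b-Thinker",
    "messages": [{
        "role": "user",
        "content": "What is the weather in San Francisco? Use the get_weather tool.",
    }],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }],
    "tool_choice": "required",
    "max_tokens": 200,
}


def build_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request that mirrors the curl command."""
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(PAYLOAD).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it against a running server (urlopen raises
# urllib.error.HTTPError on a 500 response):
# with urllib.request.urlopen(build_request()) as resp:
#     print(json.load(resp))
```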

Actual result — 500 error:


{
    "error": {
        "message": "Jamba Tool parser could not locate tool calls start/end tokens in the tokenizer!",
        "type": "InternalServerError",
        "param": null,
        "code": 500
    }
}

Server log confirms:

INFO: 127.0.0.1:55058 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
RuntimeError: Jamba Tool parser could not locate tool calls start/end tokens in the tokenizer!

Expected result (200 with tool calls):

{
    "choices": [
        {
            "message": {
                "role": "assistant",
                "tool_calls": [
                    {
                        "id": "chatcmpl-tool-919478de650130b6",
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": "{\"location\": \"San Francisco\"}"
                        }
                    }
                ]
            },
            "finish_reason": "tool_calls"
        }
    ]
}
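
A client-side check for the two outcomes above might look like the following sketch. The helper name `tool_calls_from_response` is hypothetical; it only relies on the fields visible in the actual and expected responses shown in this report.

```python
import json


def tool_calls_from_response(response: dict) -> list[tuple[str, dict]]:
    """Return (function name, parsed arguments) pairs from a chat completion."""
    if "error" in response:
        # Error bodies carry a message and a code, as in the 500 case above.
        err = response["error"]
        raise RuntimeError(f"{err.get('code')}: {err.get('message')}")
    results = []
    for choice in response.get("choices", []):
        for call in choice["message"].get("tool_calls") or []:
            fn = call["function"]
            # "arguments" is a JSON-encoded string, so decode it once more.
            results.append((fn["name"], json.loads(fn["arguments"])))
    return results
```

Applied to the expected response, this yields a single `get_weather` call whose arguments decode to a dict with a `location` key.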


extent analysis

TL;DR

The Jamba tool parser needs to be modified to support the Mistral-style [TOOL_CALLS] single-token format for models using standard HuggingFace tokenizers.

Guidance

  • The issue arises from the hardcoded XML tags in the Jamba tool parser, which are not compatible with the Mistral-style format used by the ServiceNow-AI/Apriel-Nemotron-15b-Thinker model.
  • To fix this, the jamba_tool_parser.py file needs to be updated to support the Mistral-style format, potentially by adding an option to use a different token format.
  • The self.tool_calls_start_token and self.tool_calls_end_token variables could be modified to support the [TOOL_CALLS] token, or an additional parser could be created to handle this format.
  • Before making changes, verify that the model and tokenizer are correctly configured and that the issue is indeed caused by the hardcoded XML tags.
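
One way to carry out the verification in the last bullet is to look up the candidate markers in the tokenizer vocabulary. The sketch below works on any token-to-id mapping; `report_markers` is a hypothetical helper, and the commented usage assumes the Hugging Face `transformers` package is installed and the model files are accessible.

```python
from typing import Optional

# Markers named in this issue: the single-token form and the tagged pair.
KNOWN_MARKERS = ("[TOOL_CALLS]", "<tool_calls>", "</tool_calls>")


def report_markers(vocab: dict[str, int]) -> dict[str, Optional[int]]:
    """Map each known marker to its token id, or None when it is absent."""
    return {marker: vocab.get(marker) for marker in KNOWN_MARKERS}


# Possible usage against the real tokenizer (downloads model files):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("ServiceNow-AI/Apriel-Nemotron-15b-Thinker")
# print(report_markers(tok.get_vocab()))
```

If only `[TOOL_CALLS]` maps to an id, the hardcoded tags are the cause of the crash rather than a misconfigured tokenizer.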

Example

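# Simplified illustration: point the parser at the Mistral-style marker
self.tool_calls_start_token: str = "[TOOL_CALLS]"
# The single-token format has no closing tag, so no end token is set here
self.tool_calls_end_token: str | None = None

Note: This is a simplified example and would need further changes to the extraction logic to work. PR #38695 takes a different route and auto-detects the format from the tokenizer vocabulary instead of hardcoding either one.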

Notes

  • The fix may require changes to the jamba_tool_parser.py file, which could potentially introduce compatibility issues with other models or tokenizers.
  • It is essential to thoroughly test the modified parser to ensure it works correctly with different models and tokenizers.

Recommendation

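Prefer the fix in PR #38695, which makes the Jamba parser detect the [TOOL_CALLS] single-token format from the tokenizer vocabulary. On a build without that change (the report was filed against v0.18.0), a local patch to jamba_tool_parser.py along the same lines is a possible workaround.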
