vllm - ✅(Solved) Fix [Bug]: Jamba tool parser crashes on Mistral-style [TOOL_CALLS] models with standard HF tokenizer (e.g., Apriel-Nemotron-15b) [1 pull requests, 3 comments, 3 participants]

vllm 2026-04-01 01:37:58


GitHub stats

vllm-project/vllm#38674 (fetched 2026-04-08 01:58:37)


Timeline (top)

commented ×3, cross-referenced ×1, labeled ×1, mentioned ×1

Error Message

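Actual result: 500 Internal Server Error. Server log:
INFO: 127.0.0.1:55058 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
RuntimeError: Jamba Tool parser could not locate tool calls start/end tokens in the tokenizer!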

Root Cause

The jamba parser crashes because <tool_calls> is not in the tokenizer vocabulary. The mistral parser is not a fit either, because it expects a MistralTokenizer (tekken/sentencepiece). The jamba parser hardcodes the old tagged format in its __init__.

PR fix notes

PR #38695: [Bugfix] Support [TOOL_CALLS] single-token format in Jamba tool parser

Repository: vllm-project/vllm
Author: oromanenko-nv
State: closed | merged: False
Link: https://github.com/vllm-project/vllm/pull/38695

Description (problem / solution / changelog)

The Jamba tool parser hardcodes <tool_calls>/</tool_calls> XML tags, crashing on models that use a [TOOL_CALLS] single token (e.g., Apriel-Nemotron-15b-Thinker). Auto-detect the format from tokenizer vocabulary.
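A minimal sketch of vocabulary-based detection, following the order listed under Changes: check [TOOL_CALLS] first, fall back to the <tool_calls>/</tool_calls> pair, and raise RuntimeError if neither is present. The ToolCallFormat enum and detect_tool_call_format function are hypothetical names for illustration, not the PR's actual code; vocab is assumed to be a token-to-id dict like the parser's self.vocab.

```python
from enum import Enum


class ToolCallFormat(Enum):
    # Hypothetical labels for the two delimiter styles described in the PR
    SINGLE_TOKEN = "single_token"  # [TOOL_CALLS] followed by a JSON array
    TAGGED = "tagged"              # <tool_calls> ... </tool_calls>


def detect_tool_call_format(vocab: dict[str, int]) -> ToolCallFormat:
    """Pick the delimiter style from a token-to-id vocabulary mapping."""
    # Single-token format is checked first, as described in the PR
    if "[TOOL_CALLS]" in vocab:
        return ToolCallFormat.SINGLE_TOKEN
    # Fall back to the original tagged format
    if "<tool_calls>" in vocab and "</tool_calls>" in vocab:
        return ToolCallFormat.TAGGED
    # Neither format found: keep the original error message
    raise RuntimeError(
        "Jamba Tool parser could not locate tool calls start/end "
        "tokens in the tokenizer!"
    )
```

Checking the single token first means a vocabulary that happens to contain both styles is treated as single-token; whether that is the right priority for every model is a judgment the PR makes, not something this sketch verifies.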

Fixes #38674

Purpose

The jamba tool call parser crashes with RuntimeError on the first tool call request when used with models that have [TOOL_CALLS] in their tokenizer vocabulary instead of <tool_calls>/</tool_calls> XML tags (e.g., ServiceNow-AI/Apriel-Nemotron-15b-Thinker).

Changes

vllm/tool_parsers/jamba_tool_parser.py: Replace hardcoded <tool_calls>/</tool_calls> token setup with vocabulary-based auto-detection. Check for [TOOL_CALLS] first (single-token format), fall back to <tool_calls>/</tool_calls> (tagged format), raise RuntimeError if neither found. Update extract_tool_calls to guard against empty regex matches. Update extract_tool_calls_streaming to branch extraction based on detected format.
tests/tool_parsers/test_jamba_tool_parser.py: Add ServiceNow-AI/Apriel-Nemotron-15b-Thinker as a second test model for the single-token format. Add test proving the tagged parser ignores single-token output. Add parametrized extraction and streaming tests for single-token format including array-in-arguments edge case.
docs/features/tool_calling.md: Update Jamba section to document both supported formats and add Apriel-Nemotron-15b-Thinker to the supported models list.
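For the single-token format, extraction has to find [TOOL_CALLS] and parse the JSON that follows it, since there is no closing tag. The sketch below assumes model output shaped like text[TOOL_CALLS][{"name": ..., "arguments": {...}}] (an assumption about the output, not confirmed in this issue). It swaps in json.JSONDecoder.raw_decode instead of a regex: because raw_decode parses one complete JSON value, nested arrays inside "arguments" (the array-in-arguments edge case) do not end the match early. extract_single_token_tool_calls is a hypothetical name, not the PR's function.

```python
import json
from typing import Optional

TOOL_CALLS_TOKEN = "[TOOL_CALLS]"


def extract_single_token_tool_calls(model_output: str) -> tuple[Optional[str], list[dict]]:
    """Split raw model output into (content, tool_calls) for the [TOOL_CALLS] format."""
    if TOOL_CALLS_TOKEN not in model_output:
        # No tool call token: everything is ordinary assistant content
        return model_output, []

    before, _, payload = model_output.partition(TOOL_CALLS_TOKEN)
    content = before.strip() or None
    payload = payload.strip()
    if not payload:
        # Guard against an empty match: token emitted with nothing after it
        return content, []

    try:
        # raw_decode parses one complete JSON value, so nested arrays inside
        # "arguments" (e.g. {"ids": [1, 2]}) do not end the match early
        calls, _ = json.JSONDecoder().raw_decode(payload)
    except json.JSONDecodeError:
        # Unparseable payload: return the raw text as content instead of crashing
        return model_output, []

    if isinstance(calls, dict):
        calls = [calls]  # tolerate a single object instead of an array
    if not isinstance(calls, list):
        return model_output, []

    tool_calls = [
        # OpenAI-style responses carry "arguments" as a JSON-encoded string
        {"name": call["name"], "arguments": json.dumps(call.get("arguments", {}))}
        for call in calls
        if isinstance(call, dict) and "name" in call
    ]
    return content, tool_calls
```

Falling back to returning the raw text on malformed JSON keeps the request from failing with a 500, at the cost of surfacing the model's raw output to the client.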

Test Plan

pytest tests/tool_parsers/test_jamba_tool_parser.py -v

vllm serve ServiceNow-AI/Apriel-Nemotron-15b-Thinker \
    --enable-auto-tool-choice --tool-call-parser jamba --max-model-len 4096

curl -s http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model":"ServiceNow-AI/Apriel-Nemotron-15b-Thinker",
       "messages":[{"role":"user","content":"What is the weather in SF?"}],
       "tools":[{"type":"function","function":{"name":"get_weather",
       "description":"Get weather","parameters":{"type":"object",
       "properties":{"location":{"type":"string"}},"required":["location"]}}}],
       "tool_choice":"required","max_tokens":200}'

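The same request can be sent from Python using only the standard library, which is convenient for scripting the check. This is a hypothetical helper (build_weather_request and send are illustrative names) that assumes the server from the Test Plan is listening on localhost:8000.

```python
import json
import urllib.request

# Assumes the server started in the Test Plan, listening on port 8000
BASE_URL = "http://localhost:8000/v1/chat/completions"


def build_weather_request(model: str = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker") -> dict:
    """Same payload as the curl command: one get_weather tool, tool_choice=required."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": "What is the weather in SF?"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get weather",
                "parameters": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"],
                },
            },
        }],
        "tool_choice": "required",
        "max_tokens": 200,
    }


def send(payload: dict, url: str = BASE_URL) -> dict:
    """POST the payload as JSON and return the decoded response body."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses such as a 500
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

Against an unpatched server, send(build_weather_request()) should raise urllib.error.HTTPError with code 500; with the fix, it should return a response whose message contains tool_calls.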
Test Result

Unit tests: 16 passed (8 tagged format with Jamba-tiny-dev, 8 single-token format with Apriel-Nemotron-15b-Thinker)

Test	Result
Existing tagged format tests (extraction + streaming)	8 passed, no regression
Tagged parser ignores single-token output	Passed
Single-token extraction (single, with content, parallel, array args)	4 passed
Single-token streaming (no tools, single, with content)	3 passed

Changed files

docs/features/tool_calling.md (modified, +4/-1)
tests/tool_parsers/test_jamba_tool_parser.py (modified, +210/-1)
vllm/tool_parsers/jamba_tool_parser.py (modified, +48/-21)


Your current environment

vLLM version: v0.18.0
PyTorch version: 2.10.0+cu128
Python version: 3.12
GPU: NVIDIA A100-SXM4-80GB
OS: Ubuntu 22.04

🐛 Describe the bug

Model: ServiceNow-AI/Apriel-Nemotron-15b-Thinker (HuggingFace)

The jamba tool call parser hardcodes <tool_calls> / </tool_calls> XML tags as the expected tool call delimiters. Models that use the Mistral-style [TOOL_CALLS] single-token format with a standard HuggingFace tokenizer (not MistralTokenizer) cannot use any built-in tool parser:

vllm/tool_parsers/jamba_tool_parser.py

self.tool_calls_start_token: str = "<tool_calls>"
self.tool_calls_end_token: str = "</tool_calls>"
...
self.tool_calls_start_token_id = self.vocab.get(self.tool_calls_start_token)
self.tool_calls_end_token_id = self.vocab.get(self.tool_calls_end_token)
if self.tool_calls_start_token_id is None or self.tool_calls_end_token_id is None:
    raise RuntimeError(
        "Jamba Tool parser could not locate tool calls start/end "
        "tokens in the tokenizer!"
    )
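To check ahead of time whether a tokenizer will trip this hardcoded check, the same lookup can be reproduced against its vocabulary. A minimal sketch; diagnose_vocab is a hypothetical helper, and vocab is any token-to-id dict.

```python
# Delimiters the unpatched parser requires, plus the Mistral-style token
TAGGED_TOKENS = ("<tool_calls>", "</tool_calls>")
SINGLE_TOKEN = "[TOOL_CALLS]"


def diagnose_vocab(vocab: dict[str, int]) -> dict:
    """Report which tool-call delimiters exist in a token-to-id mapping."""
    token_ids = {tok: vocab.get(tok) for tok in (*TAGGED_TOKENS, SINGLE_TOKEN)}
    return {
        "token_ids": token_ids,
        # Same condition as the hardcoded check: any missing tagged id -> RuntimeError
        "unpatched_parser_raises": any(token_ids[t] is None for t in TAGGED_TOKENS),
        "has_single_token": token_ids[SINGLE_TOKEN] is not None,
    }
```

With transformers installed, a real vocabulary can be obtained with AutoTokenizer.from_pretrained("ServiceNow-AI/Apriel-Nemotron-15b-Thinker").get_vocab(). Per this issue, that model's tokenizer contains [TOOL_CALLS] but not the tagged pair, so unpatched_parser_raises would be expected to come back True.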

How to reproduce

Start the server:

vllm serve ServiceNow-AI/Apriel-Nemotron-15b-Thinker \
    --enable-auto-tool-choice \
    --tool-call-parser jamba \
    --max-model-len 4096 \
    --port 8002

Server starts and loads the model successfully. The parser crash happens on the first tool call request (lazy init).

Send a tool call request:

curl -s http://localhost:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ServiceNow-AI/Apriel-Nemotron-15b-Thinker",
    "messages": [{"role": "user", "content": "What is the weather in San Francisco? Use the get_weather tool."}],
    "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get the current weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}}],
    "tool_choice": "required",
    "max_tokens": 200
  }'

Actual result (500 error):

{
    "error": {
        "message": "Jamba Tool parser could not locate tool calls start/end tokens in the tokenizer!",
        "type": "InternalServerError",
        "param": null,
        "code": 500
    }
}

Server log confirms:

INFO: 127.0.0.1:55058 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
RuntimeError: Jamba Tool parser could not locate tool calls start/end tokens in the tokenizer!

Expected result (200 with tool calls):

{
    "choices": [
        {
            "message": {
                "role": "assistant",
                "tool_calls": [
                    {
                        "id": "chatcmpl-tool-919478de650130b6",
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": "{\"location\": \"San Francisco\"}"
                        }
                    }
                ]
            },
            "finish_reason": "tool_calls"
        }
    ]
}
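A small check that a response matches the expected result above, useful for scripted verification after applying the fix. extract_weather_location is a hypothetical helper; it assumes the OpenAI-compatible response shape shown, where function.arguments is a JSON-encoded string rather than a nested object.

```python
import json


def extract_weather_location(response: dict) -> str:
    """Validate a chat-completions response and return the get_weather location."""
    choice = response["choices"][0]
    if choice.get("finish_reason") != "tool_calls":
        raise ValueError(f"expected finish_reason 'tool_calls', got {choice.get('finish_reason')!r}")
    tool_calls = choice["message"].get("tool_calls") or []
    if not tool_calls:
        raise ValueError("response contains no tool calls")
    function = tool_calls[0]["function"]
    if function["name"] != "get_weather":
        raise ValueError(f"unexpected tool {function['name']!r}")
    # "arguments" is a JSON-encoded string, so it must be decoded separately
    return json.loads(function["arguments"])["location"]
```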


extent analysis

TL;DR

The Jamba tool parser needs to be modified to support the Mistral-style [TOOL_CALLS] single-token format for models using standard HuggingFace tokenizers.

Guidance

The issue arises from the hardcoded XML tags in the Jamba tool parser, which are not compatible with the Mistral-style format used by the ServiceNow-AI/Apriel-Nemotron-15b-Thinker model.
To fix this, the jamba_tool_parser.py file needs to be updated to support the Mistral-style format, potentially by adding an option to use a different token format.
The hardcoded self.tool_calls_start_token and self.tool_calls_end_token setup could be changed to recognize the [TOOL_CALLS] token (which has no matching end token), or an additional parser could be created to handle this format.
Before making changes, verify that the model and tokenizer are correctly configured and that the issue is indeed caused by the hardcoded XML tags.

Example

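# Sketch for jamba_tool_parser.py: the Mistral-style format uses a single
# [TOOL_CALLS] start token with no closing tag, so only it is looked up
self.tool_calls_start_token: str = "[TOOL_CALLS]"
self.tool_calls_end_token = None  # no "[/TOOL_CALLS]" counterpart is used
self.tool_calls_start_token_id = self.vocab.get(self.tool_calls_start_token)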

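Note: This is a simplified example. Because there is no end token, extract_tool_calls and extract_tool_calls_streaming must also change to parse the JSON that follows [TOOL_CALLS], and models that still use <tool_calls>/</tool_calls> need a fallback.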

Notes

The fix may require changes to the jamba_tool_parser.py file, which could potentially introduce compatibility issues with other models or tokenizers.
It is essential to thoroughly test the modified parser to ensure it works correctly with different models and tokenizers.

Recommendation

Apply workaround: Modify jamba_tool_parser.py to detect the Mistral-style [TOOL_CALLS] single-token format from the tokenizer vocabulary (the approach taken in PR #38695), so the model can use the jamba parser with its standard HuggingFace tokenizer.


