vllm - [Bug]: Granite 3.3 / 4.0 H-Small Python-style tool calls not converted to OpenAI tool_calls format

vllm-project/vllm#43116 · Fetched 2026-05-20 03:39:44

🐛 Describe the bug

When using ibm-granite/granite-3.3-8b-instruct or ibm-granite/granite-4.0-h-small with vLLM's OpenAI-compatible server, the model generates Python-style function calls as plain text in the content field instead of populating the tool_calls array in the OpenAI format.

This breaks all agent frameworks and clients that rely on the OpenAI tool-calling protocol.


Root Cause

Granite 3.3 and Granite 4.0 H-Small use a Python-style tool call format natively:

get_weather(location="San Francisco", unit="celsius")
search_web(query="vLLM release notes")

The existing Granite4ToolParser (registered as granite4) handles the XML <tool_call> format used by Granite 4.0 Tiny/Base — it does not handle the Python-style output of Granite 3.3 or H-Small.

Passing --tool-call-parser granite4 to these models has no effect since the <tool_call> tokens never appear in their output.
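
To illustrate why a tag-based parser finds nothing here, the sketch below searches model output for <tool_call> blocks. It is not the actual Granite4ToolParser; the function name `extract_tagged_calls` and the JSON payload inside the tags are assumptions made for illustration only.

```python
import re

# Hypothetical stand-in for a parser keyed on <tool_call> tags.
TOOL_CALL_TAG = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tagged_calls(output: str) -> list[str]:
    """Return the raw payload of every <tool_call>...</tool_call> block."""
    return [payload.strip() for payload in TOOL_CALL_TAG.findall(output)]


# Assumed shape of tagged output (the payload format is illustrative).
xml_style = '<tool_call>\n{"name": "get_weather", "arguments": {"location": "San Francisco"}}\n</tool_call>'
# Python-style output as reported for Granite 3.3 / 4.0 H-Small.
python_style = 'get_weather(location="San Francisco", unit="celsius")'

print(len(extract_tagged_calls(xml_style)))
print(extract_tagged_calls(python_style))
```

Because the Python-style text contains no tags, the search returns an empty list and the output is passed through as ordinary content.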


Steps to Reproduce

vllm serve ibm-granite/granite-3.3-8b-instruct \
  --tool-call-parser granite4 \
  --chat-template examples/tool_chat_template_granite.jinja

Then query the server with a tool definition:

import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="x")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="ibm-granite/granite-3.3-8b-instruct",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=tools,
    tool_choice="auto",
)

print(response.choices[0].message.tool_calls)   # ❌ None
print(response.choices[0].message.content)      # ❌ 'get_weather(location="San Francisco")'

Expected Behavior

tool_calls should be populated:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\": \"San Francisco\", \"unit\": \"celsius\"}"
        }
      }]
    }
  }]
}
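
Note that in this format `arguments` is a JSON-encoded string rather than a nested object, so clients decode it before dispatching. A minimal sketch of that step, working on a plain dict copy of the response above; the helper name `decode_tool_calls` is hypothetical:

```python
import json


def decode_tool_calls(message: dict) -> list[tuple[str, dict]]:
    """Return (function name, decoded arguments) for each tool call in a message."""
    decoded = []
    # tool_calls may be missing or None when the model answered with plain content.
    for call in message.get("tool_calls") or []:
        function = call["function"]
        decoded.append((function["name"], json.loads(function["arguments"])))
    return decoded


expected_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": "{\"location\": \"San Francisco\", \"unit\": \"celsius\"}",
        },
    }],
}

print(decode_tool_calls(expected_message))
```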

Actual Behavior

tool_calls is None. The raw Python call appears in content:

get_weather(location="San Francisco", unit="celsius")

Environment

  • vLLM version: v0.20.x / v0.21.x (confirmed on both)
  • Models affected: ibm-granite/granite-3.3-8b-instruct, ibm-granite/granite-4.0-h-small
  • Models NOT affected: ibm-granite/granite-4.0-tiny-preview (uses XML format, handled by granite4)

Fix / PR

A new parser GranitePythonicToolParser (registered as --tool-call-parser granite_pythonic) has been implemented and submitted in:

➡️ PR #43113: [Tool Parser] Add GranitePythonicToolParser for Granite 3.3 / 4.0 H-Small Python-style tool calls

The parser:

  • Uses ast.parse (no eval) to safely extract keyword arguments
  • Supports both batch and streaming modes
  • Is tokenizer-agnostic (no special tokens required)
  • Converts Python-style calls → OpenAI tool_calls format
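
The conversion described in the list above can be sketched as follows for the non-streaming case. This is not the code from the PR: the function name `pythonic_to_tool_calls`, the id scheme, and the choice to reject positional arguments are all assumptions made for illustration.

```python
import ast
import json
import uuid


def pythonic_to_tool_calls(text: str) -> list[dict]:
    """Turn lines such as get_weather(location="SF") into OpenAI-style tool_calls.

    Returns an empty list if any non-blank line is not a keyword-only call,
    so the caller can fall back to treating the text as ordinary content.
    """
    tool_calls = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            # mode="eval" parses one expression into an AST; nothing is executed.
            node = ast.parse(line, mode="eval").body
        except (SyntaxError, ValueError):
            return []
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            return []
        if node.args or any(kw.arg is None for kw in node.keywords):
            return []  # positional or **kwargs arguments carry no parameter name
        try:
            # literal_eval accepts only literals (strings, numbers, lists, dicts, ...).
            arguments = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            encoded = json.dumps(arguments)
        except (ValueError, TypeError):
            return []
        tool_calls.append({
            "id": f"call_{uuid.uuid4().hex[:24]}",  # hypothetical id scheme
            "type": "function",
            "function": {"name": node.func.id, "arguments": encoded},
        })
    return tool_calls


sample = 'get_weather(location="San Francisco", unit="celsius")\nsearch_web(query="vLLM release notes")'
for call in pythonic_to_tool_calls(sample):
    print(call["function"]["name"], call["function"]["arguments"])
```

A streaming variant would additionally need to buffer tokens until a call's closing parenthesis arrives before parsing; that part is not shown here.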

Workaround (Until PR is Merged)

Copy granite_pythonic_tool_parser.py locally and load it via --tool-parser-plugin:

vllm serve ibm-granite/granite-3.3-8b-instruct \
  --tool-parser-plugin ./granite_pythonic_tool_parser.py \
  --tool-call-parser granite_pythonic

Before submitting a new issue...

  • Made sure I already searched for relevant issues (related: #43104)
  • Checked documentation and chatbot
  • Fix is already submitted via PR #43113
