vllm - [Bug]: Granite 3.3 / 4.0 H-Small Python-style tool calls not converted to OpenAI tool_calls format

Root Cause

Granite 3.3 and Granite 4.0 H-Small use a Python-style tool call format natively:

get_weather(location="San Francisco", unit="celsius")
search_web(query="vLLM release notes")

The existing Granite4ToolParser (registered as granite4) handles the XML <tool_call> format used by Granite 4.0 Tiny/Base. It does not handle the Python-style output of Granite 3.3 or H-Small.

Passing --tool-call-parser granite4 to these models therefore has no effect: the <tool_call> tokens never appear in their output, so the parser finds nothing to convert and the call is returned as plain content.
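
To make the mismatch concrete, here is a minimal sketch of a tag-based matcher run against both output styles. The regex and the helper name find_xml_tool_calls are hypothetical stand-ins, not the actual Granite4ToolParser code, and the JSON body inside the <tool_call> tags is only an assumed shape for illustration.

```python
import re

# Hypothetical stand-in for an XML-style matcher; the real parser may differ.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def find_xml_tool_calls(text: str) -> list[str]:
    """Return the raw bodies of any <tool_call>...</tool_call> blocks."""
    return [body.strip() for body in TOOL_CALL_RE.findall(text)]


# Assumed XML-style output (body shape is illustrative only).
xml_output = (
    '<tool_call>{"name": "get_weather", '
    '"arguments": {"location": "San Francisco"}}</tool_call>'
)
# Python-style output as reported for Granite 3.3 / 4.0 H-Small.
pythonic_output = 'get_weather(location="San Francisco", unit="celsius")'

print(len(find_xml_tool_calls(xml_output)))   # one tagged block found
print(find_xml_tool_calls(pythonic_output))   # nothing to match, so no tool calls
```

Any parser keyed on the tags behaves the same way on Python-style text: with no tags to anchor on, the whole response falls through as content.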

🐛 Describe the bug

When using ibm-granite/granite-3.3-8b-instruct or ibm-granite/granite-4.0-h-small with vLLM's OpenAI-compatible server, the model generates Python-style function calls as plain text in the content field instead of populating the tool_calls array in the OpenAI format.

This breaks agent frameworks and clients that rely on the OpenAI tool-calling protocol: they read calls from tool_calls, find it empty, and treat the response as an ordinary text reply.

Steps to Reproduce

vllm serve ibm-granite/granite-3.3-8b-instruct \
  --enable-auto-tool-choice \
  --tool-call-parser granite4 \
  --chat-template examples/tool_chat_template_granite.jinja

import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="x")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="ibm-granite/granite-3.3-8b-instruct",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=tools,
    tool_choice="auto",
)

print(response.choices[0].message.tool_calls)   # ❌ None
print(response.choices[0].message.content)      # ❌ 'get_weather(location="San Francisco")'

Expected Behavior

tool_calls should be populated:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\": \"San Francisco\", \"unit\": \"celsius\"}"
        }
      }]
    }
  }]
}
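
For context on why the field matters, the sketch below shows how a client would typically consume a message of this shape. The message dict mirrors the expected response above; the helper name decode_tool_calls is hypothetical. Note that arguments is a JSON-encoded string, so it has to be decoded before use.

```python
import json

# Message shaped like the expected response above.
message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": "{\"location\": \"San Francisco\", \"unit\": \"celsius\"}",
        },
    }],
}


def decode_tool_calls(msg: dict) -> list[tuple[str, dict]]:
    """Return (function name, decoded arguments) pairs; [] if tool_calls is empty or None."""
    decoded = []
    for call in msg.get("tool_calls") or []:
        fn = call["function"]
        decoded.append((fn["name"], json.loads(fn["arguments"])))
    return decoded


decoded = decode_tool_calls(message)
print(decoded)

# With the bug, tool_calls is None and the call text sits in content instead.
buggy = {"role": "assistant", "content": 'get_weather(location="San Francisco")', "tool_calls": None}
print(decode_tool_calls(buggy))
```

A client written this way sees no calls at all in the buggy case, which is why the response is handled as a plain text answer.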

Actual Behavior

tool_calls is None. The raw Python call appears in content:

get_weather(location="San Francisco", unit="celsius")

Environment

vLLM version: v0.20.x / v0.21.x (confirmed on both)
Models affected: ibm-granite/granite-3.3-8b-instruct, ibm-granite/granite-4.0-h-small
Models NOT affected: ibm-granite/granite-4.0-tiny-preview (uses XML format, handled by granite4)

Fix / PR

A new parser, GranitePythonicToolParser (selected with --tool-call-parser granite_pythonic), has been implemented and submitted in:

➡️ PR #43113: [Tool Parser] Add GranitePythonicToolParser for Granite 3.3 / 4.0 H-Small Python-style tool calls

The parser:

Uses ast.parse (no eval) to safely extract keyword arguments
Supports both batch and streaming modes
Is tokenizer-agnostic (no special tokens required)
Converts Python-style calls → OpenAI tool_calls format
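
The approach in the list above can be sketched in a few lines. This is an illustrative sketch only, not the code from PR #43113: the function name parse_pythonic_calls, the call ID scheme, and the choice to reject positional arguments are all assumptions made here. It covers the batch case and relies on ast.parse plus ast.literal_eval, so no model output is ever executed.

```python
import ast
import json
import uuid


def parse_pythonic_calls(text: str) -> list[dict]:
    """Convert lines like `f(a="x")` into OpenAI-style tool_call dicts.

    Returns [] unless the text consists purely of keyword-only calls.
    """
    try:
        tree = ast.parse(text.strip())
    except (SyntaxError, ValueError):
        return []  # ordinary prose, or a call that is still incomplete

    tool_calls = []
    for stmt in tree.body:
        # Each statement must be a bare call on a simple function name.
        if not (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)):
            return []
        call = stmt.value
        if not isinstance(call.func, ast.Name) or call.args:
            return []  # assumption: positional arguments are rejected
        try:
            # literal_eval accepts only literals, so nothing is executed.
            args = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
            if None in args:  # `**kwargs` appears as a keyword with arg=None
                return []
            arguments = json.dumps(args)
        except (ValueError, TypeError):
            return []  # non-literal or non-JSON-serializable argument
        tool_calls.append({
            "id": f"call_{uuid.uuid4().hex[:12]}",  # hypothetical ID scheme
            "type": "function",
            "function": {"name": call.func.id, "arguments": arguments},
        })
    return tool_calls


calls = parse_pythonic_calls(
    'get_weather(location="San Francisco", unit="celsius")\n'
    'search_web(query="vLLM release notes")'
)
print([c["function"]["name"] for c in calls])
```

Returning an empty list on any non-call statement keeps ordinary prose in content. A streaming variant could reuse the same idea by buffering text until ast.parse stops raising SyntaxError, though how the PR handles streaming is not shown here.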

Workaround (Until PR is Merged)

Copy granite_pythonic_tool_parser.py from the PR to a local path and load it via --tool-parser-plugin:

vllm serve ibm-granite/granite-3.3-8b-instruct \
  --enable-auto-tool-choice \
  --tool-parser-plugin ./granite_pythonic_tool_parser.py \
  --tool-call-parser granite_pythonic

Before submitting a new issue...

Made sure I already searched for relevant issues (related: #43104)
Checked documentation and chatbot
Fix is already submitted via PR #43113
