vllm - ✅(Solved) Fix Granite 3.3 8B / 4.0 H-Small tool calls not parsed into OpenAI-compatible format [1 pull request, 1 comment, 2 participants]

rdwj · 2026-05-19T14:28:19Z




Issue: vllm-project/vllm#43104 (fetched 2026-05-20 03:39:49)
Author: rdwj
Participants: rdwj, samlkrystof
Timeline: cross-referenced ×2, commented ×1, referenced ×1

When using Granite 3.3 8B or Granite 4.0 H-Small (ibm-granite/granite-4.0-h-small) with vLLM's OpenAI-compatible server, the model does NOT emit proper OpenAI-compatible tool_calls in responses. Instead, it writes Python code that attempts to call the tools directly.


Fixed

Fixed by PR: [Tool Parser] Add GranitePythonicToolParser for Granite 3.3 / 4.0 H-Small Python-style tool calls (https://github.com/vllm-project/vllm/pull/43113). The PR was still open and unmerged when this page was fetched.

PR fix notes

PR #43113: [Tool Parser] Add GranitePythonicToolParser for Granite 3.3 / 4.0 H-Small Python-style tool calls

Repository: vllm-project/vllm
Author: AkshatRaj00
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/43113

Description (problem / solution / changelog)

Summary

Fixes #43104

Granite 3.3 8B (ibm-granite/granite-3.3-8b-instruct) and Granite 4.0 H-Small (ibm-granite/granite-4.0-h-small) emit tool invocations as Python-style function calls rather than the XML <tool_call> format handled by the existing Granite4ToolParser:

```python
# What the model currently outputs (Python-style)
get_weather(location="San Francisco", unit="celsius")
```

What OpenAI-compatible clients expect:

```json
{
  "tool_calls": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\": \"San Francisco\", \"unit\": \"celsius\"}"
    }
  }]
}
```

Without a matching parser, any agent or framework relying on the OpenAI tool-calling protocol silently fails — the raw Python code ends up in content instead of tool_calls.
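To make the expected mapping concrete, here is a minimal sketch of converting one Python-style line into an OpenAI-style `tool_calls` entry. It is not the PR's code: the helper name `python_call_to_tool_call` and the `call_` id format are hypothetical, and it assumes every argument is passed by keyword with a literal value. It parses with `ast.parse` and reads argument values with `ast.literal_eval`, so nothing in the model output is ever executed.

```python
import ast
import json
import uuid
from typing import Optional


def python_call_to_tool_call(line: str) -> Optional[dict]:
    """Convert one Python-style call such as get_weather(location="SF")
    into an OpenAI-style tool_calls entry, or return None if the line is
    not a single bare-name call with literal keyword arguments."""
    try:
        call = ast.parse(line.strip(), mode="eval").body  # parse only, never execute
    except SyntaxError:
        return None  # ordinary prose usually fails to parse
    # Accept get_weather(...) but reject attribute calls like os.system(...)
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        return None
    if call.args:
        return None  # positional args carry no parameter names to map to
    arguments = {}
    for kw in call.keywords:
        if kw.arg is None:
            return None  # **kwargs unpacking is not supported
        try:
            # literal_eval rejects names, calls, and other non-literal nodes
            arguments[kw.arg] = ast.literal_eval(kw.value)
        except (ValueError, TypeError):
            return None
    try:
        arguments_json = json.dumps(arguments)  # clients expect a JSON *string*
    except TypeError:
        return None  # e.g. a set literal has no JSON equivalent
    return {
        "id": f"call_{uuid.uuid4().hex[:24]}",  # hypothetical id scheme
        "type": "function",
        "function": {"name": call.func.id, "arguments": arguments_json},
    }


tool_call = python_call_to_tool_call('get_weather(location="San Francisco", unit="celsius")')
print(tool_call["function"])
# {'name': 'get_weather', 'arguments': '{"location": "San Francisco", "unit": "celsius"}'}
```

Rejecting positional arguments is a deliberate choice in this sketch: without the tool's JSON schema there is no reliable way to know which parameter a bare `"San Francisco"` belongs to.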

What This PR Does

New file: `vllm/tool_parsers/granite_pythonic_tool_parser.py`

Adds GranitePythonicToolParser which:

| Feature | Detail |
|---|---|
| Format detected | `func_name(kw1=val1, kw2=val2)`, one call per line |
| Argument parsing | Uses `ast.parse` (no `eval`) for safe kwargs extraction |
| Batch mode | `extract_tool_calls` (full response) |
| Streaming mode | `extract_tool_calls_streaming` (line-buffered) |
| Multiple calls | Multiple consecutive calls on separate lines are all converted |
| Plain text passthrough | Non-call lines are returned as `content` unchanged |
| No special tokens needed | Tokenizer-agnostic; works out of the box |
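The batch behaviour summarized in the table (one call per line, several calls converted, everything else passed through as `content`) could look roughly like the sketch below. `split_content_and_calls`, `_parse_call`, and the optional `known_tools` filter are hypothetical names for illustration, not the PR's `extract_tool_calls` implementation.

```python
import ast
import json
from typing import Optional


def _parse_call(line: str) -> Optional[tuple]:
    """Return (name, arguments_json) if the line is exactly one
    name(kw=literal, ...) call; otherwise None. Nothing is executed."""
    try:
        node = ast.parse(line.strip(), mode="eval").body
    except SyntaxError:
        return None
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name) or node.args:
        return None
    try:
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        if None in kwargs:
            return None  # **kwargs unpacking is not supported
        return node.func.id, json.dumps(kwargs)
    except (ValueError, TypeError):
        return None  # non-literal or non-JSON-serializable argument


def split_content_and_calls(text: str, known_tools: Optional[set] = None):
    """Batch mode: lines that are complete calls become tool calls; every
    other line is kept, in order, as plain content."""
    content_lines, tool_calls = [], []
    for line in text.splitlines():
        parsed = _parse_call(line)
        # Optionally ignore calls to functions the request never declared
        if parsed is None or (known_tools is not None and parsed[0] not in known_tools):
            content_lines.append(line)
            continue
        name, arguments = parsed
        tool_calls.append({"type": "function",
                           "function": {"name": name, "arguments": arguments}})
    content = "\n".join(content_lines).strip() or None
    return content, tool_calls


reply = ('Let me check both cities.\n'
         'get_weather(location="San Francisco")\n'
         'get_weather(location="Boston", unit="celsius")')
content, calls = split_content_and_calls(reply, known_tools={"get_weather"})
print(content)     # Let me check both cities.
print(len(calls))  # 2
```

The `known_tools` filter is an added assumption: restricting matches to tools declared in the request keeps an incidental line such as `print("hi")` in an explanation from being misread as a tool call.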

Updated: `vllm/tool_parsers/__init__.py`

Registers the new parser as `granite_pythonic` in `_TOOL_PARSERS_TO_REGISTER`.

New file: `tests/tool_parsers/test_granite_pythonic_tool_parser.py`

Unit tests (tokenizer-free) covering:

- Single tool call extraction
- Multiple sequential tool calls
- Tool call with no arguments
- Mixed content + tool calls
- Plain text passthrough
- Streaming character-by-character
- `ToolParserManager` registration
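A line-buffered streaming mode, exercised character by character as in the test list, might be sketched like this. The class `LineBufferedToolStream` and its `{"content": ...}` / `{"tool_call": ...}` event dicts are hypothetical stand-ins, not vLLM's streaming delta types; the sketch assumes, as the PR states, that each call fits on one line.

```python
import ast
import json
from typing import Optional


def _parse_call(line: str) -> Optional[tuple]:
    """Return (name, arguments_json) for a single name(kw=literal, ...) line."""
    try:
        node = ast.parse(line.strip(), mode="eval").body
    except SyntaxError:
        return None
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name) or node.args:
        return None
    try:
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        if None in kwargs:
            return None  # **kwargs unpacking is not supported
        return node.func.id, json.dumps(kwargs)
    except (ValueError, TypeError):
        return None


class LineBufferedToolStream:
    """Hypothetical line-buffered classifier: text is held until a newline
    arrives, then each complete line is emitted as content or a tool call."""

    def __init__(self) -> None:
        self.buffer = ""

    def _classify(self, line: str) -> list:
        parsed = _parse_call(line)
        if parsed is None:
            return [{"content": line}]
        name, arguments = parsed
        return [{"tool_call": {"name": name, "arguments": arguments}}]

    def feed(self, delta: str) -> list:
        """Accept a streamed delta; return events for every completed line."""
        self.buffer += delta
        events = []
        while "\n" in self.buffer:
            line, self.buffer = self.buffer.split("\n", 1)
            events += self._classify(line + "\n")  # keep the newline in content
        return events

    def finish(self) -> list:
        """Flush the trailing partial line when the stream ends."""
        line, self.buffer = self.buffer, ""
        return self._classify(line) if line else []


stream = LineBufferedToolStream()
events = []
for ch in 'Checking now.\nget_weather(location="Paris")':  # one character per delta
    events += stream.feed(ch)
events += stream.finish()
print(events)
```

The trade-off of buffering whole lines is latency: ordinary prose is held back until its newline arrives. A real implementation might flush content earlier once a partial line clearly cannot start a call (for example, when it does not begin with an identifier followed by `(`).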

Usage

```bash
vllm serve ibm-granite/granite-3.3-8b-instruct \
  --tool-call-parser granite_pythonic \
  --chat-template examples/tool_chat_template_granite.jinja
```

For Granite 4.0 H-Small:

```bash
vllm serve ibm-granite/granite-4.0-h-small \
  --tool-call-parser granite_pythonic \
  --chat-template examples/tool_chat_template_granite.jinja
```
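To check the server end to end, a client can send a request that offers a tool and inspect the returned message. This sketch uses only the standard library; the `get_weather` schema, the helper names `build_request` and `ask`, and the base URL `http://localhost:8000/v1` (vLLM's default port) are assumptions. vLLM's tool-calling documentation lists `--enable-auto-tool-choice` as required together with `--tool-call-parser` for automatic tool choice, so the sketch assumes that flag is added to the `vllm serve` commands above.

```python
import json
import urllib.request

# Hypothetical tool definition matching the get_weather example in this issue.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}


def build_request(model: str, prompt: str) -> dict:
    """Chat-completions payload that offers one tool and lets the model choose."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",
    }


def ask(payload: dict, base_url: str = "http://localhost:8000/v1") -> dict:
    """POST the payload and return the assistant message (needs a running server)."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]
```

With the server running, `ask(build_request("ibm-granite/granite-4.0-h-small", "What's the weather in San Francisco?"))` should return a message whose `tool_calls` list is populated, rather than one whose `content` holds the raw `get_weather(...)` line.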

Testing

```bash
pytest tests/tool_parsers/test_granite_pythonic_tool_parser.py -v
```

Checklist

- [x] New parser added in `vllm/tool_parsers/`
- [x] Registered in `vllm/tool_parsers/__init__.py`
- [x] Unit tests added in `tests/tool_parsers/`
- [x] No `eval()` used; argument parsing via `ast.parse` only
- [x] Both batch and streaming modes implemented
- [x] Existing parsers and tests unaffected
- [ ] Docs update for `tool_calling.md` (follow-up if maintainers request)

CC @rdwj (issue reporter) @njhill @WoosukKwon

Changed files

- `tests/tool_parsers/test_granite_pythonic_tool_parser.py` (added, +171/-0)
- `vllm/tool_parsers/__init__.py` (modified, +4/-0)
- `vllm/tool_parsers/granite_pythonic_tool_parser.py` (added, +309/-0)



Environment

- vLLM version: tested against `vllm/vllm-openai:v0.20.1`
- Model: `ibm-granite/granite-3.3-8b-instruct`, `ibm-granite/granite-4.0-h-small`
- API: OpenAI-compatible chat completions endpoint (`/v1/chat/completions`)

Expected Behavior

When tools are provided in the request and the model decides to use one, the response should contain a proper tool_calls array in the OpenAI format:

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\": \"San Francisco\"}"
        }
      }]
    }
  }]
}
```
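A small client-side check, written as a hypothetical helper `check_tool_call_message`, can tell the expected shape above apart from the actual behavior reported below. It encodes one detail that is easy to miss: `function.arguments` is a JSON-encoded string, not a nested object.

```python
import json


def check_tool_call_message(message: dict) -> list:
    """Return a list of problems; an empty list means the assistant message
    has the OpenAI tool-calling shape shown in Expected Behavior."""
    calls = message.get("tool_calls")
    if not calls:
        return ["no tool_calls; the call may have leaked into content"]
    problems = []
    for i, call in enumerate(calls):
        if call.get("type") != "function":
            problems.append(f"tool_calls[{i}].type is not 'function'")
        fn = call.get("function", {})
        if not fn.get("name"):
            problems.append(f"tool_calls[{i}].function.name is missing")
        args = fn.get("arguments")
        # arguments must be a JSON *string* that decodes to an object
        if not isinstance(args, str):
            problems.append(f"tool_calls[{i}].function.arguments is not a string")
            continue
        try:
            if not isinstance(json.loads(args), dict):
                problems.append(f"tool_calls[{i}].function.arguments is not a JSON object")
        except json.JSONDecodeError:
            problems.append(f"tool_calls[{i}].function.arguments is not valid JSON")
    return problems
```

Run against the expected message above, it returns an empty list; run against a message whose `content` is `get_weather(location="San Francisco")` and which has no `tool_calls`, it reports the missing `tool_calls`.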

Actual Behavior

The model generates Python code in the content field instead of using the tool_calls structure:

```python
get_weather(location="San Francisco")
```

This appears to be the model's native tool-calling format, but it's not being parsed into OpenAI-compatible tool_calls by vLLM's --tool-call-parser.

Impact

Agents and frameworks that depend on the OpenAI-compatible tool calling protocol cannot use Granite 3.3 8B or Granite 4.0 H-Small for tool-based workflows, even though the model is clearly capable of understanding and attempting to use tools.

Possible Solution

The --tool-call-parser flag should handle Granite's Python-style tool call format and transform it into the OpenAI-compatible structure. This may require adding a Granite-specific parser (similar to how other models have custom parsers) or extending an existing Python-based parser to recognize Granite's output pattern.

Related

- Other models with custom tool formats (DeepSeek, Gemma4, etc.) have dedicated parsers
- Issue #27661 discusses consolidated tool call parser implementations
