vllm - ✅(Solved) Fix Granite 3.3 8B / 4.0 H-Small tool calls not parsed into OpenAI-compatible format [1 pull request, 1 comment, 2 participants]

GitHub stats
vllm-project/vllm#43104, fetched 2026-05-20 03:39:49
Comments: 1
Participants: 2
Timeline: 4
Reactions: 0
Timeline (top): cross-referenced ×2, commented ×1, referenced ×1

When using Granite 3.3 8B or Granite 4.0 H-Small (ibm-granite/granite-4.0-h-small) with vLLM's OpenAI-compatible server, the model does NOT emit proper OpenAI-compatible tool_calls in responses. Instead, it writes Python code that attempts to call the tools directly.

Root Cause

Both models emit tool invocations as Python-style function calls, for example get_weather(location="San Francisco"), rather than the XML <tool_call> format handled by the existing Granite4ToolParser. With no parser matching that format, vLLM leaves the raw call text in content and never populates tool_calls.

Fix Action

Fixed

PR fix notes

PR #43113: [Tool Parser] Add GranitePythonicToolParser for Granite 3.3 / 4.0 H-Small Python-style tool calls

Description (problem / solution / changelog)

Summary

Fixes #43104

Granite 3.3 8B (ibm-granite/granite-3.3-8b-instruct) and Granite 4.0 H-Small (ibm-granite/granite-4.0-h-small) emit tool invocations as Python-style function calls rather than the XML <tool_call> format handled by the existing Granite4ToolParser:

# What the model currently outputs (Python-style)
get_weather(location="San Francisco", unit="celsius")

// What OpenAI-compatible clients expect
{
  "tool_calls": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\": \"San Francisco\", \"unit\": \"celsius\"}"
    }
  }]
}

Without a matching parser, any agent or framework relying on the OpenAI tool-calling protocol silently fails — the raw Python code ends up in content instead of tool_calls.


What This PR Does

New file: vllm/tool_parsers/granite_pythonic_tool_parser.py

Adds GranitePythonicToolParser which:

Feature | Detail
Format detected | func_name(kw1=val1, kw2=val2), one call per line
Argument parsing | Uses ast.parse (no eval) for safe kwargs extraction
Batch mode | extract_tool_calls, full response
Streaming mode | extract_tool_calls_streaming, line-buffered
Multiple calls | Multiple consecutive calls on separate lines all converted
Plain text passthrough | Non-call lines returned as content unchanged
No special tokens needed | Tokenizer-agnostic; works out of the box
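
The ast-based argument parsing and batch extraction listed above can be illustrated with a minimal sketch. This is not the PR's code: the function names parse_pythonic_line and extract_tool_calls_sketch, the call-ID format, and the rule of accepting only keyword arguments with literal values are all assumptions made for illustration.

```python
import ast
import json
import uuid


def parse_pythonic_line(line: str):
    """Return an OpenAI-style tool call dict, or None if the line is not a call.

    Hypothetical helper for illustration; not taken from the vLLM PR.
    """
    try:
        node = ast.parse(line.strip(), mode="eval").body
    except (SyntaxError, ValueError):
        return None
    # Accept only `name(kw=literal, ...)`: a bare function name, keywords only.
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        return None
    if node.args or any(kw.arg is None for kw in node.keywords):
        return None
    try:
        # literal_eval only builds literals, so no model-written code is executed.
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        arguments = json.dumps(kwargs)
    except (ValueError, TypeError):
        return None
    return {
        "id": f"call_{uuid.uuid4().hex[:24]}",  # assumed ID format
        "type": "function",
        "function": {"name": node.func.id, "arguments": arguments},
    }


def extract_tool_calls_sketch(text: str):
    """Split a full response into (content, tool_calls), one call per line."""
    content_lines, tool_calls = [], []
    for line in text.splitlines():
        call = parse_pythonic_line(line)
        if call is None:
            content_lines.append(line)  # plain text passes through unchanged
        else:
            tool_calls.append(call)
    content = "\n".join(content_lines).strip() or None
    return content, tool_calls
```

A production parser would probably also check the function name against the tools declared in the request, since this sketch treats any syntactically valid keyword-only call as a tool call.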

Updated: vllm/tool_parsers/__init__.py

Registers the new parser as granite_pythonic in _TOOL_PARSERS_TO_REGISTER.

New file: tests/tool_parsers/test_granite_pythonic_tool_parser.py

Unit tests (tokenizer-free) covering:

  • Single tool call extraction
  • Multiple sequential tool calls
  • Tool call with no arguments
  • Mixed content + tool calls
  • Plain text passthrough
  • Streaming character-by-character
  • ToolParserManager registration
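
To make the "streaming character-by-character" case concrete, here is a rough sketch of line buffering, assuming the streaming path holds text until a newline arrives and then classifies each complete line. The class LineBufferedSplitter and the helpers looks_like_call and classify_stream are hypothetical names; the PR's extract_tool_calls_streaming may be structured quite differently.

```python
import ast


class LineBufferedSplitter:
    """Accumulate streamed text and release it one complete line at a time."""

    def __init__(self):
        self._buffer = ""

    def feed(self, delta: str):
        """Add a chunk; return the lines it completed (newlines stripped)."""
        self._buffer += delta
        # Everything before the last "\n" is complete; the tail stays buffered.
        *complete, self._buffer = self._buffer.split("\n")
        return complete

    def flush(self):
        """Return the trailing partial line at end of stream, if any."""
        rest, self._buffer = self._buffer, ""
        return [rest] if rest else []


def looks_like_call(line: str) -> bool:
    """True if the line parses as a call on a bare function name."""
    try:
        node = ast.parse(line.strip(), mode="eval").body
    except (SyntaxError, ValueError):
        return False
    return isinstance(node, ast.Call) and isinstance(node.func, ast.Name)


def classify_stream(deltas):
    """Yield (kind, line) pairs for an iterable of streamed text chunks."""
    splitter = LineBufferedSplitter()
    for delta in deltas:
        for line in splitter.feed(delta):
            yield ("tool_call" if looks_like_call(line) else "content", line)
    for line in splitter.flush():
        yield ("tool_call" if looks_like_call(line) else "content", line)
```

Iterating over a string feeds it one character at a time, which mirrors the character-by-character test. The trade-off in this sketch is latency: content is only released once its line is complete.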

Usage

vllm serve ibm-granite/granite-3.3-8b-instruct \
  --tool-call-parser granite_pythonic \
  --chat-template examples/tool_chat_template_granite.jinja

For Granite 4.0 H-Small:

vllm serve ibm-granite/granite-4.0-h-small \
  --tool-call-parser granite_pythonic \
  --chat-template examples/tool_chat_template_granite.jinja
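
Once a server is running, a request that should trigger a tool call might look like the sketch below. It uses only the standard library. The base URL (vLLM's default port on localhost), the get_weather schema, and the helper names build_payload and chat are assumptions for illustration. Note that vLLM's tool-calling documentation pairs --tool-call-parser with --enable-auto-tool-choice; the commands above omit that flag, so it may need to be added for tool_choice "auto" to work.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: default vLLM port, local host


def build_payload(model: str, question: str) -> dict:
    """Build a chat completion request that offers one example tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # example tool, matching the issue text
                "description": "Get the current weather for a location.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string"},
                        "unit": {"type": "string",
                                 "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }],
        "tool_choice": "auto",
    }


def chat(payload: dict) -> dict:
    """POST the payload to /chat/completions and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (requires a running server):
#   reply = chat(build_payload("ibm-granite/granite-3.3-8b-instruct",
#                              "What is the weather in San Francisco?"))
#   print(reply["choices"][0]["message"].get("tool_calls"))
```

With the parser active, the printed value should be a list of tool calls rather than None.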

Testing

pytest tests/tool_parsers/test_granite_pythonic_tool_parser.py -v

Checklist

  • New parser added in vllm/tool_parsers/
  • Registered in vllm/tool_parsers/__init__.py
  • Unit tests added in tests/tool_parsers/
  • No eval() used — argument parsing via ast.parse only
  • Both batch and streaming modes implemented
  • Existing parsers and tests unaffected
  • Docs update for tool_calling.md (follow-up if maintainers request)

CC @rdwj (issue reporter) @njhill @WoosukKwon

Changed files

  • tests/tool_parsers/test_granite_pythonic_tool_parser.py (added, +171/-0)
  • vllm/tool_parsers/__init__.py (modified, +4/-0)
  • vllm/tool_parsers/granite_pythonic_tool_parser.py (added, +309/-0)

Original issue (#43104)

Description

When using Granite 3.3 8B or Granite 4.0 H-Small (ibm-granite/granite-4.0-h-small) with vLLM's OpenAI-compatible server, the model does NOT emit proper OpenAI-compatible tool_calls in responses. Instead, it writes Python code that attempts to call the tools directly.

Environment

  • vLLM version: tested against vllm/vllm-openai:v0.20.1
  • Model: ibm-granite/granite-3.3-8b-instruct, ibm-granite/granite-4.0-h-small
  • API: OpenAI-compatible chat completions endpoint (/v1/chat/completions)

Expected Behavior

When tools are provided in the request and the model decides to use one, the response should contain a proper tool_calls array in the OpenAI format:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\": \"San Francisco\"}"
        }
      }]
    }
  }]
}

Actual Behavior

The model generates Python code in the content field instead of using the tool_calls structure:

get_weather(location="San Francisco")

This appears to be the model's native tool-calling format, but it's not being parsed into OpenAI-compatible tool_calls by vLLM's --tool-call-parser.
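
A client can tell the expected and actual behaviors apart with a small check like the one below. It is a diagnostic sketch only: the function name diagnose, the returned labels, and the regular expression are assumptions, and the regex is a loose heuristic rather than a real parser.

```python
import re

# Loose heuristic for a line of the form `name(...)`; not a full Python parser.
CALL_LINE = re.compile(r"^\s*([A-Za-z_]\w*)\(.*\)\s*$")


def diagnose(response: dict, tool_names: set) -> str:
    """Classify a chat completion response from the server.

    Returns "parsed" when tool_calls is populated, "unparsed_pythonic_call"
    when a known tool name appears as a call in content, else "no_tool_call".
    """
    message = response["choices"][0]["message"]
    if message.get("tool_calls"):
        return "parsed"
    content = message.get("content") or ""
    for line in content.splitlines():
        match = CALL_LINE.match(line)
        if match and match.group(1) in tool_names:
            return "unparsed_pythonic_call"
    return "no_tool_call"
```

Run against the two responses in this report, the expected one is classified as "parsed" and the actual one as "unparsed_pythonic_call".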

Impact

Agents and frameworks that depend on the OpenAI-compatible tool calling protocol cannot use Granite 3.3 8B or Granite 4.0 H-Small for tool-based workflows, even though the model is clearly capable of understanding and attempting to use tools.

Possible Solution

The --tool-call-parser flag should handle Granite's Python-style tool call format and transform it into the OpenAI-compatible structure. This may require adding a Granite-specific parser (similar to how other models have custom parsers) or extending an existing Python-based parser to recognize Granite's output pattern.

Related

  • Other models with custom tool formats (DeepSeek, Gemma4, etc.) have dedicated parsers
  • Issue #27661 discusses consolidated tool call parser implementations
