vllm - ✅(Solved) Fix [Bug]: GLM47 Tool Call Bug [1 pull requests, 1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
vllm-project/vllm#37277Fetched 2026-04-08 00:48:21
View on GitHub
Comments
1
Participants
2
Timeline
3
Reactions
0
Author
Timeline (top)
closed ×1commented ×1labeled ×1

Error Message

`#!/usr/bin/env python3

-- coding: utf-8 --

""" Simple tool to test LLM structured output support. """

import os from typing import List, Optional from pydantic import BaseModel, Field from langchain_openai import ChatOpenAI import time

--- 1. Define the structured output schema ---

class TestProfile(BaseModel): """Schema for the extracted user profile.""" name: str = Field(description="Full name of the user") age: int = Field(description="Age of the user") email: Optional[str] = Field(default=None, description="Email address") interests: List[str] = Field(description="List of interests or hobbies")

--- 2. Define test cases ---

TEST_PROMPTS = [ "My name is Alex Johnson, I'm 28 years old. I enjoy hiking and photography.", "The user is Taylor Smith, age 35, contact at [email protected]. Likes reading sci-fi and gaming.", "Chris Lee. 42. Interests include jazz music and woodworking.", ]

--- 3. Core test function ---

def test_structured_output(model_name: str, api_key: str, base_url: str): """ Tests a single model's ability to output structured data. """ print(f"\n🔍 Testing model: {model_name}") print("-" * 40)

try:
    # Create the LLM client
    llm = ChatOpenAI(
        model=model_name,
        openai_api_key=api_key,
        openai_api_base=base_url,
        temperature=0.1,
    )

    # Bind the structured output schema
    structured_llm = llm.with_structured_output(TestProfile)

    for i, prompt in enumerate(TEST_PROMPTS, 1):
        print(f"\nPrompt {i}: '{prompt[:50]}...'")

        try:
            start = time.time()
            # Invoke the model
            result: TestProfile = structured_llm.invoke([
                ("system", "Extract the user profile information from the following text."),
                ("user", prompt)
            ])
            elapsed = time.time() - start

            print(f"   ✅ SUCCESS ({elapsed:.2f}s)")
            print(f"      -> Name: {result.name}, Age: {result.age}")
            print(f"      -> Email: {result.email}")
            print(f"      -> Interests: {', '.join(result.interests)}")

        except Exception as e:
            print(f"   ❌ FAILED")
            print(f"      -> Error: {type(e).__name__}: {e}")

except Exception as e:
    print(f"   ⚠️  Failed to initialize model client.")
    print(f"      -> Error: {e}")

--- 4. Main execution block ---

if name == "main": # Configuration (Set your own environment variables or replace here) API_KEY = os.getenv("OPENAI_API_KEY", "your_api_key_here") BASE_URL = os.getenv("OPENAI_BASE_URL", "http://your.api.base.url/v1") MODEL_TO_TEST = "glm-4-plus" # Change this to your target model

print("🚀 Starting structured output compatibility test...")
test_structured_output(MODEL_TO_TEST, API_KEY, BASE_URL)
print("\nTest complete.")`

Fix Action

Fixed

PR fix notes

PR #37386: fix(glm47): improve tool call parsing and content normalization

Description (problem / solution / changelog)

Summary

  • Improve GLM-4.7 func_detail_regex: Use \S+? instead of .*? for the function name capture group, and make the arg group greedy (.* vs .*?) so all argument pairs are captured correctly. This produces cleaner function names without trailing whitespace/newlines.
  • Simplify func_arg_regex: Replace redundant (?:\\n|\s)* with \s* between </arg_key> and <arg_value> tags.
  • Normalize empty content to None: In Glm4MoeModelToolParser.extract_tool_calls, return content=None instead of content="" when there is no meaningful text before the tool call. This aligns with the OpenAI API convention where content is null when the assistant only produces tool calls.
  • Add GLM-4.7-specific tests: New test file covering zero-argument tool calls, inline args (no newline between name and args), newline-separated args, multiple tool calls, content normalization, and streaming scenarios.
  • Update existing GLM-4.5 tests: Fix expected_content values from "" to None to match the content normalization change.

Test plan

  • Existing GLM-4.5 parser tests pass with updated expected values
  • New GLM-4.7 parser tests cover all reported failure scenarios from #37277, #32436, #33877
  • pre-commit run --all-files passes

Fixes #37277 Related: #32436, #33877

Changed files

  • tests/tool_parsers/test_glm47_moe_tool_parser.py (added, +168/-0)
  • tests/tool_parsers/test_glm4_moe_tool_parser.py (modified, +3/-3)
  • vllm/tool_parsers/glm47_moe_tool_parser.py (modified, +16/-2)
  • vllm/tool_parsers/glm4_moe_tool_parser.py (modified, +6/-1)

Code Example

`#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Simple tool to test LLM structured output support.
"""

import os
from typing import List, Optional
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI
import time

# --- 1. Define the structured output schema ---
class TestProfile(BaseModel):
    """Schema for the extracted user profile."""
    name: str = Field(description="Full name of the user")
    age: int = Field(description="Age of the user")
    email: Optional[str] = Field(default=None, description="Email address")
    interests: List[str] = Field(description="List of interests or hobbies")


# --- 2. Define test cases ---
TEST_PROMPTS = [
    "My name is Alex Johnson, I'm 28 years old. I enjoy hiking and photography.",
    "The user is Taylor Smith, age 35, contact at [email protected]. Likes reading sci-fi and gaming.",
    "Chris Lee. 42. Interests include jazz music and woodworking.",
]


# --- 3. Core test function ---
def test_structured_output(model_name: str, api_key: str, base_url: str):
    """
    Tests a single model's ability to output structured data.
    """
    print(f"\n🔍 Testing model: {model_name}")
    print("-" * 40)

    try:
        # Create the LLM client
        llm = ChatOpenAI(
            model=model_name,
            openai_api_key=api_key,
            openai_api_base=base_url,
            temperature=0.1,
        )

        # Bind the structured output schema
        structured_llm = llm.with_structured_output(TestProfile)

        for i, prompt in enumerate(TEST_PROMPTS, 1):
            print(f"\nPrompt {i}: '{prompt[:50]}...'")

            try:
                start = time.time()
                # Invoke the model
                result: TestProfile = structured_llm.invoke([
                    ("system", "Extract the user profile information from the following text."),
                    ("user", prompt)
                ])
                elapsed = time.time() - start

                print(f"   ✅ SUCCESS ({elapsed:.2f}s)")
                print(f"      -> Name: {result.name}, Age: {result.age}")
                print(f"      -> Email: {result.email}")
                print(f"      -> Interests: {', '.join(result.interests)}")

            except Exception as e:
                print(f"   ❌ FAILED")
                print(f"      -> Error: {type(e).__name__}: {e}")

    except Exception as e:
        print(f"   ⚠️  Failed to initialize model client.")
        print(f"      -> Error: {e}")


# --- 4. Main execution block ---
if __name__ == "__main__":
    # Configuration (Set your own environment variables or replace here)
    API_KEY = os.getenv("OPENAI_API_KEY", "your_api_key_here")
    BASE_URL = os.getenv("OPENAI_BASE_URL", "http://your.api.base.url/v1")
    MODEL_TO_TEST = "glm-4-plus"  # Change this to your target model

    print("🚀 Starting structured output compatibility test...")
    test_structured_output(MODEL_TO_TEST, API_KEY, BASE_URL)
    print("\nTest complete.")`
RAW_BUFFERClick to expand / collapse

Your current environment

Env:vllm 0.17.1 GLM4.7 FP8, openai api 8*h20

python3 -m vllm.entrypoints.openai.api_server
--host "0.0.0.0"
--port "8000"
--model /models/GLM-4.7-FP8/
--served-model-name local-glm4-7
--tensor-parallel-size "8"
--enable-chunked-prefill
--enable-expert-parallel
--max_num_batched_tokens "4096"
--gpu-memory-utilization "0.9"
--enable-prefix-caching
--enable-auto-tool-choice
--tool-call-parser glm47
--reasoning-parser glm45
--speculative-config.num_speculative_tokens "1"
--speculative-config.method mtp
--enable-prompt-tokens-details
--uvicorn-log-level info

🐛 Describe the bug

Using tool call Might fail

`#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Simple tool to test LLM structured output support.
"""

import os
from typing import List, Optional
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI
import time

# --- 1. Define the structured output schema ---
class TestProfile(BaseModel):
    """Schema for the extracted user profile."""
    name: str = Field(description="Full name of the user")
    age: int = Field(description="Age of the user")
    email: Optional[str] = Field(default=None, description="Email address")
    interests: List[str] = Field(description="List of interests or hobbies")


# --- 2. Define test cases ---
TEST_PROMPTS = [
    "My name is Alex Johnson, I'm 28 years old. I enjoy hiking and photography.",
    "The user is Taylor Smith, age 35, contact at [email protected]. Likes reading sci-fi and gaming.",
    "Chris Lee. 42. Interests include jazz music and woodworking.",
]


# --- 3. Core test function ---
def test_structured_output(model_name: str, api_key: str, base_url: str):
    """
    Tests a single model's ability to output structured data.
    """
    print(f"\n🔍 Testing model: {model_name}")
    print("-" * 40)

    try:
        # Create the LLM client
        llm = ChatOpenAI(
            model=model_name,
            openai_api_key=api_key,
            openai_api_base=base_url,
            temperature=0.1,
        )

        # Bind the structured output schema
        structured_llm = llm.with_structured_output(TestProfile)

        for i, prompt in enumerate(TEST_PROMPTS, 1):
            print(f"\nPrompt {i}: '{prompt[:50]}...'")

            try:
                start = time.time()
                # Invoke the model
                result: TestProfile = structured_llm.invoke([
                    ("system", "Extract the user profile information from the following text."),
                    ("user", prompt)
                ])
                elapsed = time.time() - start

                print(f"   ✅ SUCCESS ({elapsed:.2f}s)")
                print(f"      -> Name: {result.name}, Age: {result.age}")
                print(f"      -> Email: {result.email}")
                print(f"      -> Interests: {', '.join(result.interests)}")

            except Exception as e:
                print(f"   ❌ FAILED")
                print(f"      -> Error: {type(e).__name__}: {e}")

    except Exception as e:
        print(f"   ⚠️  Failed to initialize model client.")
        print(f"      -> Error: {e}")


# --- 4. Main execution block ---
if __name__ == "__main__":
    # Configuration (Set your own environment variables or replace here)
    API_KEY = os.getenv("OPENAI_API_KEY", "your_api_key_here")
    BASE_URL = os.getenv("OPENAI_BASE_URL", "http://your.api.base.url/v1")
    MODEL_TO_TEST = "glm-4-plus"  # Change this to your target model

    print("🚀 Starting structured output compatibility test...")
    test_structured_output(MODEL_TO_TEST, API_KEY, BASE_URL)
    print("\nTest complete.")`

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

To fix the issue with the tool call failing, we need to adjust the model configuration and the way we invoke the model. Here are the steps:

  • Update the MODEL_TO_TEST variable to match the served model name in the API server configuration.
  • Ensure the OPENAI_API_KEY and OPENAI_BASE_URL environment variables are set correctly.
  • Modify the test_structured_output function to handle potential exceptions and errors.

Example Code

# Update the model name to match the served model name
MODEL_TO_TEST = "local-glm4-7"

# Ensure the API key and base URL are set correctly
API_KEY = os.getenv("OPENAI_API_KEY", "your_api_key_here")
BASE_URL = os.getenv("OPENAI_BASE_URL", "http://localhost:8000/v1")

# Modify the test function to handle exceptions
def test_structured_output(model_name: str, api_key: str, base_url: str):
    try:
        llm = ChatOpenAI(
            model=model_name,
            openai_api_key=api_key,
            openai_api_base=base_url,
            temperature=0.1,
        )
        structured_llm = llm.with_structured_output(TestProfile)
        
        for i, prompt in enumerate(TEST_PROMPTS, 1):
            try:
                result: TestProfile = structured_llm.invoke([
                    ("system", "Extract the user profile information from the following text."),
                    ("user", prompt)
                ])
                print(f"   ✅ SUCCESS")
                print(f"      -> Name: {result.name}, Age: {result.age}")
                print(f"      -> Email: {result.email}")
                print(f"      -> Interests: {', '.join(result.interests)}")
            except Exception as e:
                print(f"   ❌ FAILED")
                print(f"      -> Error: {type(e).__name__}: {e}")
    except Exception as e:
        print(f"   ⚠️  Failed to initialize model client.")
        print(f"      -> Error: {e}")

Verification

To verify the fix, run the updated code and check the output for successful invocations of the model. Ensure that the model name, API key, and base URL are correct and match the configuration of the API server.

Extra Tips

  • Make sure to replace the your_api_key_here and http://your.api.base.url/v1 placeholders with the actual values for your OpenAI API key and base URL.
  • If you encounter issues with the model invocation, check the API server logs for errors and adjust the configuration as needed.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

vllm - ✅(Solved) Fix [Bug]: GLM47 Tool Call Bug [1 pull requests, 1 comments, 2 participants]