vllm - ✅(Solved) Fix [Bug]: GLM47 Tool Call Bug [1 pull requests, 1 comments, 2 participants]

vllm2026-03-17 09:14:37

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

vllm-project/vllm#37277•Fetched 2026-04-08 00:48:21

View on GitHub

Comments

Participants

Timeline

Reactions

Author

xi1212

Participants

xi1212

yanghui1-arch

Timeline (top)

closed ×1commented ×1labeled ×1

Error Message

`#!/usr/bin/env python3

-- coding: utf-8 --

""" Simple tool to test LLM structured output support. """

import os from typing import List, Optional from pydantic import BaseModel, Field from langchain_openai import ChatOpenAI import time

--- 1. Define the structured output schema ---

class TestProfile(BaseModel): """Schema for the extracted user profile.""" name: str = Field(description="Full name of the user") age: int = Field(description="Age of the user") email: Optional[str] = Field(default=None, description="Email address") interests: List[str] = Field(description="List of interests or hobbies")

--- 2. Define test cases ---

TEST_PROMPTS = [ "My name is Alex Johnson, I'm 28 years old. I enjoy hiking and photography.", "The user is Taylor Smith, age 35, contact at [email protected]. Likes reading sci-fi and gaming.", "Chris Lee. 42. Interests include jazz music and woodworking.", ]

--- 3. Core test function ---

def test_structured_output(model_name: str, api_key: str, base_url: str): """ Tests a single model's ability to output structured data. """ print(f"\n🔍 Testing model: {model_name}") print("-" * 40)

try:
    # Create the LLM client
    llm = ChatOpenAI(
        model=model_name,
        openai_api_key=api_key,
        openai_api_base=base_url,
        temperature=0.1,
    )

    # Bind the structured output schema
    structured_llm = llm.with_structured_output(TestProfile)

    for i, prompt in enumerate(TEST_PROMPTS, 1):
        print(f"\nPrompt {i}: '{prompt[:50]}...'")

        try:
            start = time.time()
            # Invoke the model
            result: TestProfile = structured_llm.invoke([
                ("system", "Extract the user profile information from the following text."),
                ("user", prompt)
            ])
            elapsed = time.time() - start

            print(f"   ✅ SUCCESS ({elapsed:.2f}s)")
            print(f"      -> Name: {result.name}, Age: {result.age}")
            print(f"      -> Email: {result.email}")
            print(f"      -> Interests: {', '.join(result.interests)}")

        except Exception as e:
            print(f"   ❌ FAILED")
            print(f"      -> Error: {type(e).__name__}: {e}")

except Exception as e:
    print(f"   ⚠️  Failed to initialize model client.")
    print(f"      -> Error: {e}")

--- 4. Main execution block ---

if name == "main": # Configuration (Set your own environment variables or replace here) API_KEY = os.getenv("OPENAI_API_KEY", "your_api_key_here") BASE_URL = os.getenv("OPENAI_BASE_URL", "http://your.api.base.url/v1") MODEL_TO_TEST = "glm-4-plus" # Change this to your target model

print("🚀 Starting structured output compatibility test...")
test_structured_output(MODEL_TO_TEST, API_KEY, BASE_URL)
print("\nTest complete.")`

Fix Action

Fixed

Fixed by PR: fix(glm47): improve tool call parsing and content normalization (https://github.com/vllm-project/vllm/pull/37386)

PR fix notes

PR #37386: fix(glm47): improve tool call parsing and content normalization

Repository: vllm-project/vllm
Author: karanb192
State: closed | merged: True
Link: https://github.com/vllm-project/vllm/pull/37386

Description (problem / solution / changelog)

Summary

Improve GLM-4.7 func_detail_regex: Use \S+? instead of .*? for the function name capture group, and make the arg group greedy (.* vs .*?) so all argument pairs are captured correctly. This produces cleaner function names without trailing whitespace/newlines.
Simplify func_arg_regex: Replace redundant (?:\\n|\s)* with \s* between </arg_key> and <arg_value> tags.
Normalize empty content to None: In Glm4MoeModelToolParser.extract_tool_calls, return content=None instead of content="" when there is no meaningful text before the tool call. This aligns with the OpenAI API convention where content is null when the assistant only produces tool calls.
Add GLM-4.7-specific tests: New test file covering zero-argument tool calls, inline args (no newline between name and args), newline-separated args, multiple tool calls, content normalization, and streaming scenarios.
Update existing GLM-4.5 tests: Fix expected_content values from "" to None to match the content normalization change.

Test plan

Existing GLM-4.5 parser tests pass with updated expected values
New GLM-4.7 parser tests cover all reported failure scenarios from #37277, #32436, #33877
pre-commit run --all-files passes

Fixes #37277 Related: #32436, #33877

Changed files

tests/tool_parsers/test_glm47_moe_tool_parser.py (added, +168/-0)
tests/tool_parsers/test_glm4_moe_tool_parser.py (modified, +3/-3)
vllm/tool_parsers/glm47_moe_tool_parser.py (modified, +16/-2)
vllm/tool_parsers/glm4_moe_tool_parser.py (modified, +6/-1)

Code Example

`#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Simple tool to test LLM structured output support.
"""

import os
from typing import List, Optional
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI
import time

# --- 1. Define the structured output schema ---
class TestProfile(BaseModel):
    """Schema for the extracted user profile."""
    name: str = Field(description="Full name of the user")
    age: int = Field(description="Age of the user")
    email: Optional[str] = Field(default=None, description="Email address")
    interests: List[str] = Field(description="List of interests or hobbies")


# --- 2. Define test cases ---
TEST_PROMPTS = [
    "My name is Alex Johnson, I'm 28 years old. I enjoy hiking and photography.",
    "The user is Taylor Smith, age 35, contact at [email protected]. Likes reading sci-fi and gaming.",
    "Chris Lee. 42. Interests include jazz music and woodworking.",
]


# --- 3. Core test function ---
def test_structured_output(model_name: str, api_key: str, base_url: str):
    """
    Tests a single model's ability to output structured data.
    """
    print(f"\n🔍 Testing model: {model_name}")
    print("-" * 40)

    try:
        # Create the LLM client
        llm = ChatOpenAI(
            model=model_name,
            openai_api_key=api_key,
            openai_api_base=base_url,
            temperature=0.1,
        )

        # Bind the structured output schema
        structured_llm = llm.with_structured_output(TestProfile)

        for i, prompt in enumerate(TEST_PROMPTS, 1):
            print(f"\nPrompt {i}: '{prompt[:50]}...'")

            try:
                start = time.time()
                # Invoke the model
                result: TestProfile = structured_llm.invoke([
                    ("system", "Extract the user profile information from the following text."),
                    ("user", prompt)
                ])
                elapsed = time.time() - start

                print(f"   ✅ SUCCESS ({elapsed:.2f}s)")
                print(f"      -> Name: {result.name}, Age: {result.age}")
                print(f"      -> Email: {result.email}")
                print(f"      -> Interests: {', '.join(result.interests)}")

            except Exception as e:
                print(f"   ❌ FAILED")
                print(f"      -> Error: {type(e).__name__}: {e}")

    except Exception as e:
        print(f"   ⚠️  Failed to initialize model client.")
        print(f"      -> Error: {e}")


# --- 4. Main execution block ---
if __name__ == "__main__":
    # Configuration (Set your own environment variables or replace here)
    API_KEY = os.getenv("OPENAI_API_KEY", "your_api_key_here")
    BASE_URL = os.getenv("OPENAI_BASE_URL", "http://your.api.base.url/v1")
    MODEL_TO_TEST = "glm-4-plus"  # Change this to your target model

    print("🚀 Starting structured output compatibility test...")
    test_structured_output(MODEL_TO_TEST, API_KEY, BASE_URL)
    print("\nTest complete.")`

RAW_BUFFERClick to expand / collapse

Your current environment

Env：vllm 0.17.1 GLM4.7 FP8， openai api 8*h20

python3 -m vllm.entrypoints.openai.api_server
--host "0.0.0.0"
--port "8000"
--model /models/GLM-4.7-FP8/
--served-model-name local-glm4-7
--tensor-parallel-size "8"
--enable-chunked-prefill
--enable-expert-parallel
--max_num_batched_tokens "4096"
--gpu-memory-utilization "0.9"
--enable-prefix-caching
--enable-auto-tool-choice
--tool-call-parser glm47
--reasoning-parser glm45
--speculative-config.num_speculative_tokens "1"
--speculative-config.method mtp
--enable-prompt-tokens-details
--uvicorn-log-level info

🐛 Describe the bug

Using tool call Might fail

`#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Simple tool to test LLM structured output support.
"""

import os
from typing import List, Optional
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI
import time

# --- 1. Define the structured output schema ---
class TestProfile(BaseModel):
    """Schema for the extracted user profile."""
    name: str = Field(description="Full name of the user")
    age: int = Field(description="Age of the user")
    email: Optional[str] = Field(default=None, description="Email address")
    interests: List[str] = Field(description="List of interests or hobbies")


# --- 2. Define test cases ---
TEST_PROMPTS = [
    "My name is Alex Johnson, I'm 28 years old. I enjoy hiking and photography.",
    "The user is Taylor Smith, age 35, contact at [email protected]. Likes reading sci-fi and gaming.",
    "Chris Lee. 42. Interests include jazz music and woodworking.",
]


# --- 3. Core test function ---
def test_structured_output(model_name: str, api_key: str, base_url: str):
    """
    Tests a single model's ability to output structured data.
    """
    print(f"\n🔍 Testing model: {model_name}")
    print("-" * 40)

    try:
        # Create the LLM client
        llm = ChatOpenAI(
            model=model_name,
            openai_api_key=api_key,
            openai_api_base=base_url,
            temperature=0.1,
        )

        # Bind the structured output schema
        structured_llm = llm.with_structured_output(TestProfile)

        for i, prompt in enumerate(TEST_PROMPTS, 1):
            print(f"\nPrompt {i}: '{prompt[:50]}...'")

            try:
                start = time.time()
                # Invoke the model
                result: TestProfile = structured_llm.invoke([
                    ("system", "Extract the user profile information from the following text."),
                    ("user", prompt)
                ])
                elapsed = time.time() - start

                print(f"   ✅ SUCCESS ({elapsed:.2f}s)")
                print(f"      -> Name: {result.name}, Age: {result.age}")
                print(f"      -> Email: {result.email}")
                print(f"      -> Interests: {', '.join(result.interests)}")

            except Exception as e:
                print(f"   ❌ FAILED")
                print(f"      -> Error: {type(e).__name__}: {e}")

    except Exception as e:
        print(f"   ⚠️  Failed to initialize model client.")
        print(f"      -> Error: {e}")


# --- 4. Main execution block ---
if __name__ == "__main__":
    # Configuration (Set your own environment variables or replace here)
    API_KEY = os.getenv("OPENAI_API_KEY", "your_api_key_here")
    BASE_URL = os.getenv("OPENAI_BASE_URL", "http://your.api.base.url/v1")
    MODEL_TO_TEST = "glm-4-plus"  # Change this to your target model

    print("🚀 Starting structured output compatibility test...")
    test_structured_output(MODEL_TO_TEST, API_KEY, BASE_URL)
    print("\nTest complete.")`

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

To fix the issue with the tool call failing, we need to adjust the model configuration and the way we invoke the model. Here are the steps:

Update the MODEL_TO_TEST variable to match the served model name in the API server configuration.
Ensure the OPENAI_API_KEY and OPENAI_BASE_URL environment variables are set correctly.
Modify the test_structured_output function to handle potential exceptions and errors.

Example Code

# Update the model name to match the served model name
MODEL_TO_TEST = "local-glm4-7"

# Ensure the API key and base URL are set correctly
API_KEY = os.getenv("OPENAI_API_KEY", "your_api_key_here")
BASE_URL = os.getenv("OPENAI_BASE_URL", "http://localhost:8000/v1")

# Modify the test function to handle exceptions
def test_structured_output(model_name: str, api_key: str, base_url: str):
    try:
        llm = ChatOpenAI(
            model=model_name,
            openai_api_key=api_key,
            openai_api_base=base_url,
            temperature=0.1,
        )
        structured_llm = llm.with_structured_output(TestProfile)
        
        for i, prompt in enumerate(TEST_PROMPTS, 1):
            try:
                result: TestProfile = structured_llm.invoke([
                    ("system", "Extract the user profile information from the following text."),
                    ("user", prompt)
                ])
                print(f"   ✅ SUCCESS")
                print(f"      -> Name: {result.name}, Age: {result.age}")
                print(f"      -> Email: {result.email}")
                print(f"      -> Interests: {', '.join(result.interests)}")
            except Exception as e:
                print(f"   ❌ FAILED")
                print(f"      -> Error: {type(e).__name__}: {e}")
    except Exception as e:
        print(f"   ⚠️  Failed to initialize model client.")
        print(f"      -> Error: {e}")

Verification

To verify the fix, run the updated code and check the output for successful invocations of the model. Ensure that the model name, API key, and base URL are correct and match the configuration of the API server.

Extra Tips

Make sure to replace the your_api_key_here and http://your.api.base.url/v1 placeholders with the actual values for your OpenAI API key and base URL.
If you encounter issues with the model invocation, check the API server logs for errors and adjust the configuration as needed.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #ssr #installation #tensor shape #indexing error #inference speed #output truncation #environment variable

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

vllm - ✅(Solved) Fix [Bug]: GLM47 Tool Call Bug [1 pull requests, 1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

-- coding: utf-8 --

--- 1. Define the structured output schema ---

--- 2. Define test cases ---

--- 3. Core test function ---

--- 4. Main execution block ---

Fix Action

Fixed

PR fix notes

PR #37386: fix(glm47): improve tool call parsing and content normalization

Description (problem / solution / changelog)

Summary

Test plan

Changed files

Code Example

Your current environment

🐛 Describe the bug

Before submitting a new issue...

extent analysis

Fix Plan

Example Code

Verification

Extra Tips

Still need to ship something?

TRENDING

vllm - ✅(Solved) Fix [Bug]: GLM47 Tool Call Bug [1 pull requests, 1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

-- coding: utf-8 --

--- 1. Define the structured output schema ---

--- 2. Define test cases ---

--- 3. Core test function ---

--- 4. Main execution block ---

Fix Action

Fixed

PR fix notes

PR #37386: fix(glm47): improve tool call parsing and content normalization

Description (problem / solution / changelog)

Summary

Test plan

Changed files

Code Example

Your current environment

🐛 Describe the bug

Before submitting a new issue...

extent analysis

Fix Plan

Example Code

Verification

Extra Tips

Still need to ship something?

RELATED_DISCOVERY

TRENDING