hermes - Feature: Content-bound tool call extraction layer (fallback for models emitting tool calls as text/XML)

Problem

When certain models (MiniMax M2.7, DeepSeek V4, open-weight models on OpenRouter/NIM) emit tool calls as text content instead of via the structured tool_calls response field, Hermes silently discards the tool invocations and displays raw XML/text to the user. The agent loop never executes the tools.

This affects:

MiniMax M2.7 via NVIDIA NIM: emits <minimax:tool_call><invoke name="..."><parameter ...> XML blocks in content
DeepSeek V4 via certain proxies: emits _execute\ntool_name: ...\ncommand: ... text blocks in content
Open-weight models (Hermes-4, Gemma variants) via OpenRouter: emit <{...}> or <function_call> blocks in content
Any model that fails to use structured function calling, regardless of cause

Current behavior

The existing strip_think_blocks() in agent/agent_runtime_helpers.py (L484-527) recognizes and strips 6 XML tool-call tag families from content:

<tool_call>...</tool_call>
<tool_calls>...</tool_calls>
<function_call>...</function_call>
etc.

But it only removes them — it never extracts and executes the tool calls. The content is cleaned for display, but the tool invocations are silently lost.

Additionally, namespace-prefixed <minimax:tool_call> tags and the _execute text format are not handled by any code path; they pass through as raw text.
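On the stripping side, the namespace gap is mostly a tag-name problem. A minimal sketch, assuming the only difference is an optional `prefix:` before the tag name; `_TOOL_TAG_BLOCK_RE` and `strip_tool_call_tags()` are hypothetical names, not the existing regexes in `strip_think_blocks()`:

```python
import re

# Hypothetical pattern, not the actual regexes in strip_think_blocks().
# An optional "prefix:" lets <minimax:tool_call> match like <tool_call>;
# the backreference (?P=tag) requires the closing tag to use the same name.
_TOOL_TAG_BLOCK_RE = re.compile(
    r"<(?P<tag>(?:[A-Za-z0-9_]+:)?(?:tool_calls?|function_call))\b[^>]*>.*?</(?P=tag)>",
    re.DOTALL,
)


def strip_tool_call_tags(text: str) -> str:
    """Remove tool-call tag blocks (namespaced or not) from display text."""
    return _TOOL_TAG_BLOCK_RE.sub("", text).strip()
```

Stripping alone still drops the call, which is the core problem described here; in the proposal it would only run on whatever the extraction step leaves behind.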

Evidence: Code path analysis

| Format | File:Line | Current handling | Tool executed? |
|---|---|---|---|
| `<tool_call>JSON</tool_call>` | `agent_runtime_helpers.py:486` | Stripped by `re.sub` | ❌ |
| `<function_call>...</function_call>` | `agent_runtime_helpers.py:486` | Stripped by `re.sub` | ❌ |
| `<function name="...">...` | `agent_runtime_helpers.py:497` | Stripped (boundary-gated) | ❌ |
| `<minimax:tool_call>...` | No code exists | Passes through as text | ❌ |
| `_execute\ntool_name:...` | No code exists | Passes through as text | ❌ |
| `<{...}>` (Copilot ACP) | `copilot_acp_client.py:30` | Parsed and executed | ✅ |
| Structured `tool_calls` field | `chat_completion_helpers.py:484` | Parsed and executed | ✅ |

The only code path that extracts tool calls from content text is copilot_acp_client.py — and it's scoped exclusively to the Codex/ACP transport. The main chat completion path (streaming + non-streaming) has zero content→tool-call extraction.

Related issues

#27834 — MiniMax/DeepSeek XML tool calls rendered as text
#741 — Model outputs tool calls as text (closed as "Hermes model not supported", but the architectural gap remains)
#28238 — Strip reasoning_content for providers that reject it
#27930 — Strip reasoning_content for OpenAI-compatible providers

Proposed Solution

Content-Bound Tool Call Extraction Layer

Add a content→tool_calls extraction step in build_assistant_message() (chat_completion_helpers.py) that runs before strip_think_blocks(). This creates a fallback path: if the structured tool_calls field is empty but the content contains recognizable tool call patterns, extract them into the standard msg["tool_calls"] format.
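A minimal sketch of the fallback gate, assuming `build_assistant_message()` works on an OpenAI-style message dict. `apply_content_tool_call_fallback()` and the `Extractor` alias are hypothetical names; the extractor is any callable returning `(cleaned_content, tool_calls)`, such as the `_extract_content_tool_calls()` proposed in the Design section.

```python
from typing import Callable

# Hypothetical signature: any callable returning (cleaned_content, tool_calls).
Extractor = Callable[[str], tuple[str, list[dict]]]


def apply_content_tool_call_fallback(msg: dict, extract: Extractor) -> dict:
    """Fallback-only: populate msg["tool_calls"] from content when the structured field is empty."""
    if msg.get("tool_calls"):
        return msg  # structured tool calls always take priority
    cleaned, calls = extract(msg.get("content") or "")
    if calls:
        msg["tool_calls"] = calls
        # OpenAI-style assistant messages may carry null content alongside tool_calls.
        msg["content"] = cleaned or None
    return msg
```

In `build_assistant_message()` this would sit just before `strip_think_blocks()`, so the stripper only sees what the extractor left.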

Design

```python
# In chat_completion_helpers.py, before the strip_think_blocks() call:
import re

def _extract_content_tool_calls(content: str) -> tuple[str, list[dict]]:
    """Extract tool calls embedded in content text.

    Returns (cleaned_content, extracted_tool_calls).
    If tool calls are found, they are removed from content and
    returned in OpenAI tool_calls format.
    """
    tool_calls = []

    # Pattern registry: each entry is (compiled regex, parser_func).
    # re.DOTALL lets a block span several lines; the _parse_* helpers
    # are not shown here.
    patterns = [
        # Generic JSON blocks: <{"name":"...","arguments":{...}}>
        (re.compile(r'<\s*(\{.*?\})\s*>', re.DOTALL), _parse_json_tool_call),
        # MiniMax XML: <minimax:tool_call><invoke name="..."><parameter ...>
        (re.compile(r'<minimax:tool_call>\s*<invoke\s+name="([^"]+)">(.*?)</invoke>\s*</minimax:tool_call>', re.DOTALL), _parse_minimax_xml),
        # Standard XML: <tool_call>...</tool_call> containing JSON
        (re.compile(r'<tool_call[^>]*>\s*(\{.*?\})\s*</tool_call>', re.DOTALL), _parse_json_tool_call),
        # Function-call XML: <function_call>{"name":"..."}</function_call>
        (re.compile(r'<function_call[^>]*>\s*(\{.*?\})\s*</function_call>', re.DOTALL), _parse_json_tool_call),
        # Gemma style: <function name="...">JSON</function>
        (re.compile(r'<function\s+name="([^"]+)">\s*(.*?)\s*</function>', re.DOTALL), _parse_gemma_function),
    ]
    # ... apply patterns, build tool_calls, clean content
```

Key properties

Fallback-only: Only activates when assistant_message.tool_calls is empty/None. Structured tool calls always take priority.
Pattern registry: New XML/text formats can be added as registry entries without touching the core extraction logic, so supporting a provider's format means adding one (regex, parser) pair.
Zero breaking changes: Existing behavior is preserved — strip_think_blocks() still runs on the remaining content.
Reuse existing code: copilot_acp_client.py already has _TOOL_CALL_BLOCK_RE and _TOOL_CALL_JSON_RE — these patterns can be consolidated.
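One way the registry could stay open to new formats is a small decorator. This sketch uses hypothetical names throughout and registers the `_execute` format as its example, assuming the layout is an `_execute` line followed by `key: value` lines with the tool name under `tool_name`; only the leading lines of that format appear in this issue, so the layout is a guess.

```python
import json
import re
import uuid
from typing import Callable, Optional

# Hypothetical registry API; none of these names exist in Hermes today.
Parser = Callable[[re.Match], Optional[dict]]
_REGISTRY: list[tuple[re.Pattern, Parser]] = []


def register_tool_call_pattern(regex: str, flags: int = re.DOTALL) -> Callable[[Parser], Parser]:
    """Decorator that appends (compiled regex, parser) to the registry."""
    def deco(parser: Parser) -> Parser:
        _REGISTRY.append((re.compile(regex, flags), parser))
        return parser
    return deco


def extract_with_registry(content: str) -> tuple[str, list[dict]]:
    """Core loop: unchanged when a new format is registered."""
    calls: list[dict] = []
    for pattern, parser in _REGISTRY:
        def _sub(m: re.Match) -> str:
            call = parser(m)
            if call is None:
                return m.group(0)  # keep unparseable text visible
            calls.append(call)
            return ""
        content = pattern.sub(_sub, content)
    return content.strip(), calls


# Example registration for the DeepSeek-style `_execute` text block.
# ASSUMED layout: "_execute" on its own line, then "key: value" lines;
# "tool_name" is the tool, every other key becomes an argument.
@register_tool_call_pattern(r"^_execute\n((?:[A-Za-z_]+:.*(?:\n|$))+)", re.MULTILINE)
def _parse_execute_block(m: re.Match) -> Optional[dict]:
    fields = {}
    for line in m.group(1).splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    name = fields.pop("tool_name", None)
    if not name:
        return None
    return {
        "id": f"call_{uuid.uuid4().hex[:24]}",
        "type": "function",
        "function": {"name": name, "arguments": json.dumps(fields)},
    }
```

The same decorator is one possible hook for the plugin-based registration mentioned under Implementation Notes.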

Implementation scope

| Component | Lines | Description |
|---|---|---|
| `_extract_content_tool_calls()` | ~80 | Core extraction function with pattern registry |
| Integration in `build_assistant_message()` | ~5 | Call before `strip_think_blocks()` |
| Integration in streaming path | ~10 | Apply after content accumulation |
| Tests | ~120 | Per-pattern tests + integration |
| Total | ~215 | |

Implementation Notes

The existing _TOOL_CALL_BLOCK_RE and _TOOL_CALL_JSON_RE in copilot_acp_client.py already handle <{...}> JSON extraction. These should be consolidated into a shared utility.
strip_think_blocks() should be updated to also handle <minimax:tool_call> namespace-prefixed tags (currently missed entirely).
For streaming: tool call XML tags arrive across multiple chunks. Content accumulation should buffer partial XML until a complete tag pair is detected, then extract. This is similar to how tool_calls_acc buffers structured tool call deltas.
Provider-specific patterns (like <minimax:tool_call>) could be registered via the plugin system if maintainers prefer extensibility over a fixed list.
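For the streaming note, a sketch of a buffer that releases display text chunk by chunk but holds back anything from an opening tool-call tag (or a trailing fragment such as `<tool_`) until the matching closing tag arrives, then hands the complete block to an extractor. The class name, tag list and method names are assumptions, and the generic `<{...}>` and `_execute` formats are not covered.

```python
import re
from typing import Callable

# Hypothetical: (cleaned_text, tool_calls) extractor, e.g. _extract_content_tool_calls.
Extractor = Callable[[str], tuple[str, list[dict]]]

# ASSUMED tag set; a real version would derive it from the pattern registry.
_TAG_NAMES = ("minimax:tool_call", "tool_call", "function_call", "function")
_OPEN_RE = re.compile(r"<(" + "|".join(re.escape(t) for t in _TAG_NAMES) + r")\b")


class StreamingToolCallBuffer:
    """Buffers streamed content so tool-call tags split across chunks are never shown raw."""

    def __init__(self, extract: Extractor) -> None:
        self.extract = extract
        self.buf = ""  # text not yet released for display
        self.tool_calls: list[dict] = []  # calls recovered so far

    def _partial_open_len(self) -> int:
        # Length of a trailing "<tool_..." fragment that may still become an opening tag.
        start = self.buf.rfind("<")
        if start == -1:
            return 0
        tail = self.buf[start + 1:]
        return len(self.buf) - start if any(t.startswith(tail) for t in _TAG_NAMES) else 0

    def feed(self, chunk: str) -> str:
        """Add a chunk; return the text that is now safe to display."""
        self.buf += chunk
        out: list[str] = []
        while True:
            m = _OPEN_RE.search(self.buf)
            if m is None:
                cut = len(self.buf) - self._partial_open_len()
                out.append(self.buf[:cut])
                self.buf = self.buf[cut:]
                return "".join(out)
            close = f"</{m.group(1)}>"
            end = self.buf.find(close, m.end())
            if end == -1:
                # Opening tag seen, closing tag not yet: release the prefix, hold the rest.
                out.append(self.buf[:m.start()])
                self.buf = self.buf[m.start():]
                return "".join(out)
            end += len(close)
            cleaned, calls = self.extract(self.buf[m.start():end])
            self.tool_calls.extend(calls)
            out.append(self.buf[:m.start()] + cleaned)
            self.buf = self.buf[end:]

    def flush(self) -> str:
        """End of stream: hand any still-held (unterminated) text to the extractor."""
        cleaned, calls = self.extract(self.buf)
        self.tool_calls.extend(calls)
        self.buf = ""
        return cleaned
```

Unterminated blocks are resolved only at `flush()`, once the full content is known; a block the extractor cannot parse is released as raw text, matching the non-streaming behavior of leaving it in the content.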

Impact

Fixes #27834 without requiring model-side changes
Addresses the architectural gap that caused #741 (any model emitting tool calls as text)
Improves reliability for all OpenRouter/NIM/proxy configurations where structured tool calls may be lost
Low overhead: extraction only runs when tool_calls is empty (which includes ordinary text-only replies, not just the failure case), and then costs a few regex scans over the response content
