ollama - ✅(Solved) Fix /api/generate returns HTTP 500 {"error":"EOF"} with qwen3.5:9b when prompt requests <tool_call> XML output [1 pull requests, 1 comments, 2 participants]

GitHub stats

  • Issue: ollama/ollama#14986 (fetched 2026-04-08 01:08:21)
  • Comments: 1
  • Participants: 2
  • Timeline events: 5 (commented ×1, cross-referenced ×1, labeled ×1, referenced ×1)
  • Reactions: 2

/api/generate works for plain text prompts and simple XML-like prompts, but can return HTTP 500 with {"error":"EOF"} when the prompt asks the model to emit <tool_call>...</tool_call> style XML output.

This reproduces even without using native Ollama tools. The prompt only contains tool-like XML tags as plain text instructions.

Possibly related to other Qwen XML/tool-call parsing issues, but this reproduces on qwen3.5:9b with /api/generate and without native tools.

Error Message

HTTPError 500
{"error":"EOF"}

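Root Cause

The Qwen3.5 model always has a built-in parser registered, so <tool_call> tags in the model output are parsed even when no native Ollama tools are in use. When the text between the tags is not valid Qwen3-coder XML, the parse error (typically EOF) propagates to the request handler and is returned as HTTP 500 / {"error":"EOF"}.

Fix Action

Fixed by PR #15011, which makes the Qwen tool-call parser fall back to plain content when the <tool_call> XML fails to parse.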

PR fix notes

PR #15011: model/parsers: fall back to content when qwen tool call XML parse fails

Description (problem / solution / changelog)

Summary

Fixes #14986

Root Cause

The Qwen3.5 model always has a builtin parser registered (qwen3.5). When a user prompt instructs the model to emit <tool_call>...</tool_call>-style XML as plain text — without registering native Ollama tools — the Qwen3CoderParser still scans for those delimiters in the model output, finds them, and calls parseToolCall().

If the content between the tags is not valid Qwen3-coder XML (e.g. a JSON payload such as {"name": "get_weather", ...}), xml.Unmarshal returns an error (typically EOF). That error propagated all the way up to GenerateHandler / ChatHandler, which sent {"error":"EOF"} and returned HTTP 500.
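The mismatch described above can be illustrated outside Ollama. The sketch below is Python rather than the Go code touched by the PR, and it uses Python's xml.etree.ElementTree as a stand-in for Go's xml.Unmarshal, so the exception type and message differ from the EOF seen in the issue. Both payload strings are made-up examples; the XML one is merely well-formed and is not claimed to match the exact Qwen3-coder schema.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(text: str) -> bool:
    """Return True if text is a well-formed XML document, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical text found between <tool_call> and </tool_call>.
json_style = '{"name": "get_weather", "arguments": {"city": "Shanghai"}}'
xml_style = "<function><name>get_weather</name></function>"

print(parses_as_xml(json_style))  # JSON payload: the XML parser rejects it
print(parses_as_xml(xml_style))   # well-formed XML: the XML parser accepts it
```

A strict XML parser has no way to interpret the JSON payload, so the only open question is what the caller does with the resulting error.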

Fix

In Qwen3CoderParser.Add(), when parseToolCall fails:

  • Before: return the error immediately
  • After: log a warning and write the raw tool-call text (including the wrapping tags) into the content strings.Builder, then break

This means the caller always receives a usable HTTP 200 response with the raw model output instead of an internal server error.

Safety: when real tools are registered and the model produces well-formed Qwen3-coder XML, parseToolCall succeeds and the existing path is taken unchanged.
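The before/after behaviour can be sketched in a few lines. This is a simplified, non-streaming Python illustration and not the Go code in model/parsers/qwen3coder.go: split_output and parse_tool_call are hypothetical names, the strict parser is a toy stand-in that only accepts a single <name> element, and unlike the PR (which breaks after the fallback) the loop here keeps scanning.

```python
import logging
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_call(inner: str) -> dict:
    """Toy strict parser (hypothetical): accepts only <name>...</name>."""
    match = re.fullmatch(r"\s*<name>(\w+)</name>\s*", inner)
    if match is None:
        raise ValueError("unexpected tool call body")
    return {"name": match.group(1)}


def split_output(output: str):
    """Split model output into (content, tool_calls) without ever raising."""
    content = []
    tool_calls = []
    pos = 0
    for match in TOOL_CALL_RE.finditer(output):
        content.append(output[pos:match.start()])
        try:
            tool_calls.append(parse_tool_call(match.group(1)))
        except ValueError as err:
            # Fallback: log a warning and keep the raw block, tags included.
            logging.warning("tool call parse failed, keeping raw text: %s", err)
            content.append(match.group(0))
        pos = match.end()
    content.append(output[pos:])
    return "".join(content), tool_calls
```

With a JSON-style block the function returns the original text unchanged and an empty tool-call list; with a block the toy parser accepts, the block is removed from the content and reported as a tool call, which mirrors the unchanged success path.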

Changes

  • model/parsers/qwen3coder.go: on parse failure, fall back to content instead of returning an error
  • model/parsers/qwen35_test.go: add regression test that initialises the parser with no tools (matching the /api/generate code path) and feeds it a JSON-style <tool_call> block — expects no error and non-empty content

Testing

New test: TestQwen35ParserToolCallAsPlainTextFallback


Changed files

  • model/parsers/qwen35_test.go (modified, +32/-0)
  • model/parsers/qwen3coder.go (modified, +10/-2)


What is the issue?

Environment

  • Ollama version: v0.18.2
  • Model: qwen3.5:9b
  • Endpoint: /api/generate
  • Native Ollama tools: not used

Minimal reproduction

I used the following script:

import json
import urllib.request
import urllib.error

HOST = "http://127.0.0.1:11434"
MODEL = "qwen3.5:9b"

tests = [
    ("plain_text", "hello"),
    ("simple_xml", "<user_request>查上海天气</user_request>"),
    ("xml_with_tool_hint", "<tools>[{\"name\":\"get_weather\"}]</tools>\n<user_request>查上海天气</user_request>\n请只输出一个 <tool_call>...</tool_call>"),
    ("minitest1","<tools>[{\"name\":\"get_weather\"}]</tools>\n<user_request>查上海天气</user_request>\n请只输出一个工具调用块"),
]

for name, prompt in tests:
    payload = {
        "model": MODEL,
        "stream": False,
        "prompt": prompt,
    }
    req = urllib.request.Request(
        f"{HOST}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    print(f"\n=== TEST: {name} ===")
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            body = resp.read().decode("utf-8")
            print("HTTP", resp.status)
            print(body[:1000])
    except urllib.error.HTTPError as e:
        body = e.read().decode("utf-8", errors="replace")
        print("HTTPError", e.code)
        print(body)
    except Exception as e:
        print("Exception", repr(e))

Results

1) plain_text

Prompt:

hello

Result:

HTTP 200
{"model":"qwen3.5:9b","created_at":"2026-03-20T23:46:44.373588553Z","response":"Hello! 👋 How can I help you today?", ...}

2) simple_xml

Prompt:

<user_request>查上海天气</user_request>

Result:

HTTP 200
{"model":"qwen3.5:9b","created_at":"2026-03-20T23:47:08.663588206Z","response":"很抱歉,作为人工智能助手,我暂时无法直接获取实时的天气数据。...", ...}

3) xml_with_tool_hint

Prompt:

<tools>[{"name":"get_weather"}]</tools>
<user_request>查上海天气</user_request>
请只输出一个 <tool_call>...</tool_call>

Result:

HTTPError 500
{"error":"EOF"}

4) minitest1

Prompt:

<tools>[{"name":"get_weather"}]</tools>
<user_request>查上海天气</user_request>
请只输出一个工具调用块

Result:

HTTP 200
{"model":"qwen3.5:9b","created_at":"2026-03-20T23:47:56.226133671Z","response":"```json
{
  \"tool_name\": \"get_weather\",
  \"parameters\": {
    \"location\": \"上海\",
    \"city_name\": \"Shanghai\",
    \"country\": \"CN\"
  }
}
```", ...}

Expected behavior

The server should not return HTTP 500 / {"error":"EOF"} because the prompt contains <tool_call>-style XML text.

Even if Ollama or the model-side parser dislikes this format, the request should fail gracefully, for example by:

  • returning raw model text, or
  • returning a controlled parse error with a clear message

but not an internal server error.
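If the server returns the raw model text, as the first option suggests, a client can recover the tool call itself. The following is a minimal sketch, assuming the model places a JSON object between the tags; the issue does not show the model's actual output for the failing prompt, so that format and the function name extract_tool_call are assumptions.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_call(response_text: str):
    """Return the decoded JSON tool call, or None if absent or undecodable."""
    match = TOOL_CALL_RE.search(response_text)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        # The block is not JSON; leave interpretation to the caller.
        return None
```

Returning None instead of raising keeps the client on the same "fail gracefully" footing that the issue asks of the server.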

Actual behavior

When the prompt explicitly asks for <tool_call>...</tool_call> or <tool_call></tool_call>, /api/generate can fail with:

{"error":"EOF"}

Notes

This seems to be triggered specifically by the <tool_call> XML pattern in the prompt.

Important detail: this reproduction does not use native Ollama tools. The issue appears to happen even when these tags are just plain prompt text.

System

  • OS: Linux
  • GPU: Nvidia
  • CPU: Intel
  • Ollama version: 0.18.2

extent analysis

Fix Plan

To stop /api/generate from returning HTTP 500 with {"error":"EOF"} when the model emits <tool_call>...</tool_call> text, the server-side parser has to tolerate tool-call blocks it cannot parse. The merged change is PR #15011 in model/parsers/qwen3coder.go (Go); the Python below is only an illustrative sketch with hypothetical function names. The steps are:

  • Make tool-call parsing tolerant: when the text between <tool_call> tags in the model output is not valid tool-call XML, treat it as ordinary content instead of failing the request.
  • Catch the parse error where it occurs: wrap the parsing step in a try-except block (an error check in Go) so the error never reaches the HTTP handler.
  • Return a controlled result: reply with the raw model text, or a clear parse-error message, instead of an internal server error.

Example code:

def handle_generate(model, raw_output):
    # raw_output is the text produced by the model, not the prompt
    try:
        response = generate_response(raw_output)
        return {"model": model, "response": response}
    except Exception as e:
        # Last resort: return a controlled error message
        return {"error": "Failed to generate response: " + str(e)}

In the generate_response function, the tool-call handling falls back to the raw text when the block does not parse:

import re
import xml.etree.ElementTree as ET

TOOL_CALL = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)

def generate_response(raw_output):
    # Look for a <tool_call> block in the model output
    match = TOOL_CALL.search(raw_output)
    if match is None:
        # No tool call: return the plain text response
        return raw_output
    block = match.group(0)
    try:
        # Try to parse the block as XML
        ET.fromstring(block)
    except ET.ParseError:
        # Not valid XML (e.g. a JSON payload): fall back to the raw text
        return raw_output
    # Well-formed block: the existing tool-call path would take over here
    return "[tool call] " + block

Verification

To verify the fix, rerun the reproduction script against a build that includes PR #15011. The xml_with_tool_hint prompt, which previously returned HTTP 500 with {"error":"EOF"}, should now return HTTP 200 with the raw model output, and the other three prompts should still return HTTP 200.
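The check can be automated with a small helper. This is a sketch under the same assumptions as the reproduction script (a local Ollama server at 127.0.0.1:11434 with qwen3.5:9b pulled); classify and check are hypothetical helper names, and the three labels are an arbitrary choice for this example.

```python
import json
import urllib.error
import urllib.request

HOST = "http://127.0.0.1:11434"  # same host as the reproduction script
MODEL = "qwen3.5:9b"


def classify(status: int, body: str) -> str:
    """Label a response as 'ok', 'regression' (the original bug), or 'other_error'."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    if status == 500 and data.get("error") == "EOF":
        return "regression"
    if status == 200 and "response" in data:
        return "ok"
    return "other_error"


def check(prompt: str) -> str:
    """Send one non-streaming /api/generate request and classify the result."""
    payload = json.dumps({"model": MODEL, "stream": False, "prompt": prompt})
    req = urllib.request.Request(
        f"{HOST}/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            return classify(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as e:
        return classify(e.code, e.read().decode("utf-8", errors="replace"))
```

Calling check() with the xml_with_tool_hint prompt from the reproduction script should yield "ok" on a fixed build and "regression" on v0.18.2, the version reported in the issue. Because the model's output is not deterministic, a single "ok" on an unfixed build does not prove the bug is gone.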

Extra Tips

  • Make sure to test the updated code thoroughly to ensure that it handles all possible edge cases.
  • Consider adding additional logging to help diagnose any issues that may arise in the future.
  • If the issue persists, try to isolate the problem by testing individual components of the code to identify the root cause.
