ollama - 💡(How to fix) Fix Regression: Severe queue delay + tool loop hang in Ollama v0.23.2 (MLX / macOS)


What is the issue?

Summary

Ollama v0.23.2 introduces two major regressions on macOS (Apple Silicon, MLX backend):

1. Severe queue delay: up to 2–3 minutes between tasks
2. Tool loop hang: model enters repeated search_web / fetch_url loop with no completion

These issues were not present in v0.23.1 and render v0.23.2 unsuitable for production use.

Environment

- OS: macOS (Apple Silicon)
- Hardware: Apple system (MLX backend in use)
- Ollama version: 0.23.2
- Previous working version: 0.23.1
- Interface: Open WebUI
- Model: qwen3.6:35b-a3b-mlx-bf16

Issue 1: Queue Delay Regression

Behavior

- After completing a request, the next request is delayed significantly
- Observed delay: 2–3 minutes between tasks
- Occurs even with:
  - no concurrent jobs
  - idle system
  - sufficient RAM (no memory pressure)

Expected Behavior

- Immediate or near-immediate task start (as in v0.23.1)

Actual Behavior

- Requests sit idle before execution begins
- Appears to be queueing or scheduling regression

Issue 2: Tool Loop Hang (search_web / fetch_url)

Behavior

- Model enters repeated tool calls:
  - search_web
  - fetch_url
- No final response is produced
- Loop continues indefinitely

Observations

- Occurs without any other jobs pending
- Seen in Open WebUI tool activity panel:
  - multiple search calls
  - growing source list (e.g., 10+ sources)
- Requires manual interruption

Failure Rate

- Observed in 2 out of 5 prompts (~40%)

Expected Behavior

- Model completes tool use and returns a final answer

Actual Behavior

- Infinite or long-running tool loop with no completion

Reproduction

Queue Delay

1. Run a prompt using:

       ollama run qwen3.6:35b-a3b-mlx-bf16

2. After completion, immediately submit another prompt
3. Observe delay before execution begins (up to several minutes)

Tool Loop

1. Use Open WebUI with tool-enabled model
2. Submit a prompt requiring web lookup
3. Observe repeated search_web / fetch_url calls
4. No final response returned

Impact

- Breaks multi-user and sequential workflows
- Makes system appear unresponsive
- Requires manual intervention to stop tool loops
- Not suitable for production environments

Additional Notes

- System shows no resource constraints (RAM healthy, minimal swap)
- No concurrency required to reproduce
- Appears related to:
  - scheduling / queue handling
  - tool execution loop control

Request

Please investigate:

- task scheduling / queue handling changes in 0.23.2
- tool execution termination conditions
- interaction between MLX backend and tool loop handling
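To make the "termination conditions" point concrete, here is a minimal client-side sketch of a guard that stops a tool loop after a round limit or after the same call repeats too often. It is an illustration only, not how Ollama or Open WebUI handle tool calls. The names `run_with_tool_guard`, `chat_step`, and `execute_tool`, the limits, and the message shape are all hypothetical and would need adapting to the real client.

```python
import json
from typing import Callable


def run_with_tool_guard(
    chat_step: Callable[[list], dict],
    execute_tool: Callable[[str, dict], str],
    messages: list,
    max_rounds: int = 6,
    max_repeats: int = 2,
) -> dict:
    """Drive a tool-calling loop, aborting if it stops making progress.

    chat_step(messages) is assumed (hypothetically) to return a dict with
    "content" and an optional "tool_calls" list of {"name", "arguments"}.
    """
    history = list(messages)  # work on a copy so the caller's list is untouched
    seen = {}                 # (tool name, serialized arguments) -> call count

    for round_no in range(1, max_rounds + 1):
        reply = chat_step(history)
        calls = reply.get("tool_calls") or []

        # No tool calls means the model produced its final answer.
        if not calls:
            return {"status": "completed", "rounds": round_no,
                    "content": reply.get("content", "")}

        for call in calls:
            args = call.get("arguments", {})
            key = (call["name"], json.dumps(args, sort_keys=True))
            seen[key] = seen.get(key, 0) + 1

            # The identical call issued too many times is treated as a loop.
            if seen[key] > max_repeats:
                return {"status": "aborted_repeat", "rounds": round_no,
                        "content": ""}

            result = execute_tool(call["name"], args)
            history.append({"role": "tool", "name": call["name"],
                            "content": result})

    # The round budget ran out without a final answer.
    return {"status": "aborted_max_rounds", "rounds": max_rounds, "content": ""}
```

A guard like this would only cap the symptom on the client. It would not explain why the model keeps calling search_web / fetch_url in 0.23.2, so it is a stopgap rather than a fix.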

Happy to provide additional logs or run targeted tests if needed.
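One such targeted test could time back-to-back requests against the local Ollama HTTP API and compare wall-clock latency with the `total_duration` the server reports (in nanoseconds) for each `/api/generate` call. This is a rough sketch under assumptions: the default host `http://localhost:11434`, the helper names, and the test prompt are illustrative, and whether `total_duration` includes any scheduler wait is not confirmed here, so the "unaccounted" figure is only a proxy for queue delay.

```python
import json
import time
import urllib.request

NS_PER_SECOND = 1_000_000_000


def unaccounted_seconds(wall_seconds: float, response: dict) -> float:
    """Wall-clock time not covered by the server-reported total_duration."""
    reported = response.get("total_duration", 0) / NS_PER_SECOND
    return max(0.0, wall_seconds - reported)


def timed_generate(model: str, prompt: str,
                   host: str = "http://localhost:11434",
                   timeout: float = 600.0):
    """Send one non-streaming generate request; return (wall seconds, JSON body)."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    started = time.monotonic()
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        body = json.load(reply)
    return time.monotonic() - started, body


def measure_back_to_back(model: str, runs: int = 3) -> list:
    """Issue requests one after another and record the latency of each."""
    rows = []
    for index in range(runs):
        wall, body = timed_generate(model, "Say hello in one word.")
        rows.append({
            "run": index + 1,
            "wall_s": round(wall, 2),
            "unaccounted_s": round(unaccounted_seconds(wall, body), 2),
        })
    return rows
```

Calling `measure_back_to_back("qwen3.6:35b-a3b-mlx-bf16")` on both 0.23.1 and 0.23.2 and comparing the second and third rows would show whether the delay appears outside Open WebUI as well.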

Relevant log output

OS

No response

GPU

No response

CPU

No response

Ollama version

No response
