ollama - 💡(How to fix) Fix Tool calling is not streaming on macOS with MLX, causing timeout when write tool outputs large code

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

When using tool calling with MLX backend on macOS, the tool call responses are not streaming. This causes severe issues when a tool like write needs to output large amounts of code — the client waits a long time without any response and eventually times out.

Root Cause

When using tool calling with MLX backend on macOS, the tool call responses are not streaming. This causes severe issues when a tool like write needs to output large amounts of code — the client waits a long time without any response and eventually times out.

Code Example

ollama pull qwen3.6:27b-coding-nvfp4
RAW_BUFFERClick to expand / collapse

Summary

When using tool calling with MLX backend on macOS, the tool call responses are not streaming. This causes severe issues when a tool like write needs to output large amounts of code — the client waits a long time without any response and eventually times out.

Environment

  • OS: macOS (Apple Silicon)
  • Backend: MLX
  • Model: qwen3.6:27b-coding-nvfp4
  • Ollama version: Latest main

Expected behavior

Tool call outputs should be streamed incrementally to the client, just like regular text completion streaming. This matches the behavior when not using MLX (e.g., CUDA backend), where tool calls stream properly and clients don't timeout.

Actual behavior

When a tool call is triggered (e.g., write tool generating a large file with hundreds or thousands of lines of code), the entire tool output is buffered and only sent at the very end. This means:

  1. The client receives no intermediate chunks for an extended period
  2. For large code outputs, the wait can be 30 seconds or more
  3. Client-side timeouts are triggered (typically around 30-60s depending on the client)
  4. The request fails even though Ollama is still generating in the background

Steps to reproduce

  1. Pull the model on macOS with MLX support:
ollama pull qwen3.6:27b-coding-nvfp4
  1. Use the chat API with tool/function calling enabled, providing a prompt that triggers a tool to generate a large file (e.g., 500+ lines of code)

  2. Observe that during tool execution, no streaming chunks are received by the client until the entire tool output is complete

  3. The client times out before receiving the final response

Additional context

This is particularly problematic for coding models used in agent workflows where code generation via tools is very common. Non-streaming tool calls significantly degrade the user experience and reliability of AI coding assistants on macOS.

This issue may be related to how the MLX backend handles tool call chunking/streaming compared to other backends.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

Tool call outputs should be streamed incrementally to the client, just like regular text completion streaming. This matches the behavior when not using MLX (e.g., CUDA backend), where tool calls stream properly and clients don't timeout.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

ollama - 💡(How to fix) Fix Tool calling is not streaming on macOS with MLX, causing timeout when write tool outputs large code