Error Message

From the user's point of view, the command takes a long time and eventually starts showing messages about timeouts. Running grep ERROR ~/.claude/debug/latest shows "[ERROR] Error streaming, falling back to non-streaming mode: The operation timed out." along with entries such as "[ERROR] API error (attempt 1/11): undefined Request timed out." No file is created.

Fix Action

Fix proposed

Proposed in PR: Fix streaming timeout during tool call composition (https://github.com/ollama/ollama/pull/14932). The PR is listed as open and not merged.

PR fix notes

PR #14932: Fix streaming timeout during tool call composition

Repository: ollama/ollama
Author: joaquinhuigomez
State: open | merged: False
Link: https://github.com/ollama/ollama/pull/14932

Description (problem / solution / changelog)

(No description)

Changed files

anthropic/anthropic.go (modified, +11/-0)
server/routes.go (modified, +4/-1)

Code Example

$ ollama ps
NAME                    ID              SIZE     PROCESSOR          CONTEXT    UNTIL               
glm-4.7-flash:latest    d1a8a26252f1    21 GB    46%/54% CPU/GPU    65536      59 minutes from now

---

if req.Stream != nil && *req.Stream {
    ch <- res
}

---

if r.Message.Thinking == "" && r.Message.Content == "" && !r.Done && len(r.Message.ToolCalls) == 0 {
    events = append(events, StreamEvent{
        Event: "ping",
        Data: PingEvent{
            Type: "ping",
        },
    })
}

This is a simpler approach that would also resolve the behavior described in https://github.com/ollama/ollama/issues/14858 . I'll redescribe the issue so you don't have to read an obsolete request.

Environment

Ollama 0.18
Claude Code 2.1.70
OS/hardware: Ubuntu 24.04, AMD Ryzen 5, GeForce RTX 3060

If it matters, I'm running everything locally, ethernet cable unplugged.

$ ollama ps
NAME                    ID              SIZE     PROCESSOR          CONTEXT    UNTIL               
glm-4.7-flash:latest    d1a8a26252f1    21 GB    46%/54% CPU/GPU    65536      59 minutes from now

Summary

While a model is composing a tool call, no responses are sent to the client, even in streaming mode. During that time, the client may time out, causing the request to repeatedly fail and be retried for (in my testing) about an hour.

At least for the Anthropic API, this can be avoided by sending ping events - this works for me on a local build.
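For reference, here is a minimal sketch of what a ping looks like on the wire when events are delivered as server-sent events. The pingEvent type and the sseFrame helper are illustrative names made up for this example, not Ollama's actual definitions; only the frame layout (an event line, a data line, then a blank line) is the standard SSE shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pingEvent mirrors the shape of an Anthropic-style ping payload.
// The type name is illustrative, not Ollama's actual definition.
type pingEvent struct {
	Type string `json:"type"`
}

// sseFrame renders one server-sent event: an "event:" line, a "data:" line
// holding the JSON payload, and a blank line that terminates the frame.
func sseFrame(event string, payload any) (string, error) {
	data, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("event: %s\ndata: %s\n\n", event, data), nil
}

func main() {
	frame, err := sseFrame("ping", pingEvent{Type: "ping"})
	if err != nil {
		panic(err)
	}
	fmt.Print(frame)
}
```

A frame like this carries no content, so its only effect is to reset the client's idle timer while the model is still working.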

Reproduction Steps

1. Run Claude Code via Ollama: ollama launch claude --model glm-4.7-flash
2. At the prompt, enter something like: Recall a summary of each book of the Old Testament and write them to old_testament_summaries.md

Note: While this is sufficient to reproduce the behavior on my system, the underlying issue is a timeout; a more powerful computer might be able to successfully complete the task quickly enough that you don't see it. Setting CUDA_VISIBLE_DEVICES="" to force CPU inference or specifying a longer text should make the issue more obvious.

Desired behavior

old_testament_summaries.md should be created and contain text.

Current behavior

Observations

https://ollama.com/blog/streaming-tool initially made me think this was expected to work, but reading it more closely I see that it only talks about streaming tool responses, not composing the arguments, so I'm filing this as a feature request instead of as a bug.

I believe this is due to tools/tools.go:Add() gathering up responses before parsing them into a ToolCall. Certainly anthropic/anthropic.go:Process() is not set up to stream tool call arguments - it expects tool calls to be given in one shot. A packet trace supports this theory - I see Ollama talking to the model and coming up with text, while the Claude-Ollama connection has no traffic and Claude eventually gives up.
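To illustrate the theory, the following is a simplified sketch of a parser that holds back output until a whole tool call can be decoded. It is not the code in tools/tools.go; bufferingParser, toolCall, and the JSON shape of the chunks are all assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// toolCall is a hypothetical, simplified tool call.
type toolCall struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// bufferingParser is a hypothetical stand-in for a parser that withholds
// model output until a complete tool call is available.
type bufferingParser struct {
	buf strings.Builder
}

// Add appends a chunk and reports a tool call only once the buffered text
// is complete, valid JSON. Until then it returns ok == false, so a caller
// that forwards only non-empty results sends nothing downstream.
func (p *bufferingParser) Add(chunk string) (call toolCall, ok bool) {
	p.buf.WriteString(chunk)
	if err := json.Unmarshal([]byte(p.buf.String()), &call); err != nil {
		return toolCall{}, false
	}
	p.buf.Reset()
	return call, true
}

func main() {
	chunks := []string{`{"name":"Write",`, `"arguments":{"path":`, `"a.md"}}`}
	p := &bufferingParser{}
	silent := 0
	for _, c := range chunks {
		if call, ok := p.Add(c); ok {
			fmt.Printf("tool call %q ready after %d silent chunks\n", call.Name, silent)
		} else {
			silent++
		}
	}
}
```

With long arguments, such as the full text of a file to write, the number of silent chunks grows with the argument length, which matches the idle connection seen in the packet trace.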

Proposed Changes

Disclaimer: While this suggestion fixes my use case, I'm not confident this won't break other things - someone more knowledgeable needs to verify this is a good approach.

server/routes.go:ChatHandler

https://github.com/ollama/ollama/blob/bbbad97686205cfd897a9e4e931889a3598a0652/server/routes.go#L2443-L2448 only passes along events if the parser comes up with non-empty output. While tools/tools.go:Add() is gathering up responses, it generates empty output, so during that time nothing reaches anthropic/anthropic.go:Process().

Instead of doing nothing when there's empty output, other layers should still be notified so they can react if desired.

if req.Stream != nil && *req.Stream {
    ch <- res
}
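The snippet above only shows the added branch. As a rough, self-contained sketch of the resulting decision, assuming the existing code forwards a response when it has content or marks completion, the logic could look like the following. chatResponse and forward are hypothetical names, and the real condition in ChatHandler may check more fields.

```go
package main

import "fmt"

// chatResponse is a hypothetical, trimmed-down response type.
type chatResponse struct {
	Content string
	Done    bool
}

// forward sends res on ch when it carries content or marks completion.
// For streaming requests it also sends otherwise-empty responses so that
// downstream layers can react, for example by emitting a keep-alive.
// It reports whether the response was sent.
func forward(ch chan<- chatResponse, res chatResponse, stream *bool) bool {
	if res.Content != "" || res.Done {
		ch <- res
		return true
	}
	if stream != nil && *stream {
		ch <- res
		return true
	}
	return false
}

func main() {
	ch := make(chan chatResponse, 2)
	streaming := true
	fmt.Println(forward(ch, chatResponse{}, &streaming)) // empty, streaming
	fmt.Println(forward(ch, chatResponse{}, nil))        // empty, not streaming
}
```

Keeping the non-streaming path unchanged limits the behavior change to clients that have opted into incremental delivery.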

anthropic/anthropic.go:Process()

anthropic.go defines a PingEvent, though it's never used anywhere. If Process sees an empty message, it could emit a PingEvent:

if r.Message.Thinking == "" && r.Message.Content == "" && !r.Done && len(r.Message.ToolCalls) == 0 {
    events = append(events, StreamEvent{
        Event: "ping",
        Data: PingEvent{
            Type: "ping",
        },
    })
}
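To make the condition easier to try out in isolation, here is a small runnable sketch with trimmed-down stand-ins for the types involved. The lowercase types and the process function are assumptions for illustration only; the real Process() also produces content and tool call events, which are omitted here.

```go
package main

import "fmt"

// Simplified stand-ins for the real response and event types.
type message struct {
	Thinking  string
	Content   string
	ToolCalls []string // placeholder for real tool call values
}

type chatResponse struct {
	Message message
	Done    bool
}

type pingEvent struct {
	Type string `json:"type"`
}

type streamEvent struct {
	Event string
	Data  any
}

// isEmptyChunk reports whether a response carries nothing the client could
// render: no thinking, no content, no tool calls, and not the final chunk.
func isEmptyChunk(r chatResponse) bool {
	return r.Message.Thinking == "" && r.Message.Content == "" &&
		!r.Done && len(r.Message.ToolCalls) == 0
}

// process returns a single ping event for an empty chunk and nil otherwise.
func process(r chatResponse) []streamEvent {
	if isEmptyChunk(r) {
		return []streamEvent{{Event: "ping", Data: pingEvent{Type: "ping"}}}
	}
	return nil
}

func main() {
	chunks := []chatResponse{
		{}, // parser still buffering a tool call
		{Message: message{Content: "hello"}},
		{Done: true},
	}
	for i, c := range chunks {
		fmt.Printf("chunk %d: %d ping event(s)\n", i, len(process(c)))
	}
}
```

The !r.Done check matters: the final chunk can also have empty content, and it should end the stream rather than be mistaken for a keep-alive.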

Considerations

https://github.com/ollama/ollama/blob/bbbad97686205cfd897a9e4e931889a3598a0652/server/routes.go#L610-L614 suggests that quashing empty events was a conscious choice to avoid triggering bad client behavior. What was the bad behavior and is there another way to avoid it? Does a similar change need to happen here?

extent analysis

Fix Plan

To resolve the issue of no responses being sent to the client while a model is composing a tool call, we need to modify the code to send ping events in streaming mode. Here are the steps:

In server/routes.go:ChatHandler, modify the code to pass along events even if the parser comes up with empty output:

if req.Stream != nil && *req.Stream {
    ch <- res
}

In anthropic/anthropic.go:Process(), add a check for empty messages and emit a PingEvent:

if r.Message.Thinking == "" && r.Message.Content == "" && !r.Done && len(r.Message.ToolCalls) == 0 {
    events = append(events, StreamEvent{
        Event: "ping",
        Data: PingEvent{
            Type: "ping",
        },
    })
}

Verification

To verify that the fix worked, run the reproduction steps again and check for the following:

The client should no longer time out while the model is composing a tool call.
The old_testament_summaries.md file should be created and contain text.
The debug logs should no longer show timeout errors.

Extra Tips

Make sure to test the changes thoroughly to avoid introducing any regressions.
Consider adding additional logging or monitoring to detect any potential issues with the ping events.
Review the code changes to ensure they align with the project's coding standards and best practices.

ollama - ✅(Solved) Fix Feature Request: anthropic/anthropic.go:Process() should send ping events during tool call composition to avoid a streaming timeout [1 pull requests, 1 participants]
