ollama - ✅(Solved) Fix OpenAI-compatible streaming: tool_calls index is always 0 for multiple tool calls [2 pull requests]

ollama · 2026-04-09 17:25:52

Fix Action

Workaround

HTTP proxy that reassigns correct sequential indices based on unique tool call id values: opencode-bench/fix-proxy

PR fix notes

PR #14277: fix(openai-compatible): handle reused tool call index from Ollama

Repository: vercel/ai
Author: CPIDLE
State: open | merged: False
Link: https://github.com/vercel/ai/pull/14277

Description (problem / solution / changelog)

Summary

Fix streaming parser to handle providers (Ollama) that reuse index: 0 for all tool calls in a single response
Add test case for parallel tool calls with reused indices

Problem

Ollama's OpenAI-compatible streaming API sends all tool calls with index: 0 instead of incrementing (0, 1, 2...). This causes the @ai-sdk/openai-compatible streaming parser to either:

Skip the second and later tool calls (when the first is already marked hasFinished)
Merge their arguments into the first call (corrupting the JSON)

This results in a 100% failure rate on any task requiring multiple tool calls in one response.

Verified with: Ollama 0.20.2/0.20.4, models: qwen3-coder:30b, gemma4:e4b, qwen2.5-coder:7b

Ollama issue: https://github.com/ollama/ollama/issues/15457

Fix

When toolCalls[index] is already occupied by a different tool call (different id), treat the incoming delta as a new tool call by assigning index = toolCalls.length.

This is backwards-compatible: providers that send correct indices are unaffected since the id will match the existing entry.
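
The rule described above can be sketched as follows. This is a minimal illustration, not the code in PR #14277: the `ToolCallDelta` and `ToolCall` types and the `accumulate` function are hypothetical names, and the sketch assumes that the first delta of a tool call carries its `id` while continuation deltas may omit it.

```typescript
// Hypothetical shapes for illustration; not the SDK's actual types.
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface ToolCall {
  id: string;
  name: string;
  arguments: string;
}

// Accumulate streamed deltas, tolerating providers that reuse an index.
function accumulate(deltas: ToolCallDelta[]): ToolCall[] {
  const toolCalls: ToolCall[] = [];
  // Maps the index reported by the provider to the slot currently used for it.
  const slotFor = new Map<number, number>();

  for (const delta of deltas) {
    const slot = slotFor.get(delta.index);
    const existing = slot === undefined ? undefined : toolCalls[slot];

    // No slot yet, or the slot holds a call with a different id: start a new call
    // at the end of the array (index = toolCalls.length).
    if (existing === undefined || (delta.id !== undefined && delta.id !== existing.id)) {
      slotFor.set(delta.index, toolCalls.length);
      toolCalls.push({
        id: delta.id ?? "",
        name: delta.function?.name ?? "",
        arguments: delta.function?.arguments ?? "",
      });
      continue;
    }

    // Same call: append the argument fragment.
    existing.arguments += delta.function?.arguments ?? "";
  }
  return toolCalls;
}

const calls = accumulate([
  { index: 0, id: "abc123", function: { name: "file_write", arguments: '{"filePath":"hello.py"}' } },
  { index: 0, id: "def456", function: { name: "file_write", arguments: '{"filePath":"world.py"}' } },
]);
console.log(calls.length); // 2
```

Providers that send correct indices take the same path as before, because a delta with a matching or absent `id` is appended to the existing entry.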

Test plan

Added test: should handle parallel tool calls with reused index (Ollama bug)
Verified with real Ollama streaming: dual-file write goes from 0/12 to 12/12 pass
Existing parallel tool call tests still pass (correct indices unaffected)

Changed files

packages/openai-compatible/src/chat/openai-compatible-chat-language-model.test.ts (modified, +68/-0)
packages/openai-compatible/src/chat/openai-compatible-chat-language-model.ts (modified, +14/-1)

PR #15467: model/parsers: fix missing parallel tool call indices

Repository: ollama/ollama
Author: drifkin
State: closed | merged: True
Link: https://github.com/ollama/ollama/pull/15467

Description (problem / solution / changelog)

We were missing setting the function index for several models that can make parallel tool calls.

In the future we may want to consider putting some sort of post-parse hook and relieve the parsers of this duty.

Fixes: #15457
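
The post-parse hook idea mentioned above could look roughly like this. Ollama's parsers are written in Go; this TypeScript sketch only illustrates the idea, and `ParsedToolCall` and `assignIndices` are hypothetical names, not Ollama APIs.

```typescript
// Hypothetical shape for a tool call produced by a model-specific parser.
interface ParsedToolCall {
  name: string;
  arguments: Record<string, unknown>;
  index?: number;
}

// Hypothetical post-parse hook: number the calls once, in emission order,
// so individual parsers do not each have to set the index themselves.
// `startAt` lets a streaming caller continue numbering across chunks.
function assignIndices(calls: ParsedToolCall[], startAt = 0): ParsedToolCall[] {
  return calls.map((call, i) => ({ ...call, index: startAt + i }));
}

const parsed: ParsedToolCall[] = [
  { name: "file_write", arguments: { filePath: "hello.py" } },
  { name: "file_write", arguments: { filePath: "world.py" } },
];
const indexed = assignIndices(parsed);
console.log(indexed.map((c) => c.index)); // [ 0, 1 ]
```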

Changed files

model/parsers/cogito.go (modified, +9/-2)
model/parsers/cogito_test.go (modified, +4/-2)
model/parsers/deepseek3.go (modified, +7/-0)
model/parsers/deepseek3_test.go (modified, +4/-2)
model/parsers/functiongemma.go (modified, +10/-3)
model/parsers/functiongemma_test.go (modified, +6/-0)
model/parsers/gemma4.go (modified, +7/-0)
model/parsers/gemma4_test.go (modified, +4/-2)
model/parsers/lfm2.go (modified, +7/-0)
model/parsers/lfm2_test.go (modified, +8/-4)
model/parsers/ministral.go (modified, +7/-0)
model/parsers/ministral_test.go (modified, +49/-0)
model/parsers/olmo3.go (modified, +9/-2)
model/parsers/olmo3_test.go (modified, +2/-0)
model/parsers/qwen3vl.go (modified, +7/-0)
model/parsers/qwen3vl_nonthinking_test.go (modified, +46/-0)


What is the issue?

When a model returns multiple tool calls in a single streaming response via the OpenAI-compatible API (/v1/chat/completions), all tool call chunks have index: 0 instead of incrementing indices (0, 1, 2...).

This differs from #7881, which concerned the index field being missing entirely (fixed in v0.4.7). The field is now present but is always 0.

Reproduction

curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:7b",
    "stream": true,
    "messages": [
      {"role": "system", "content": "Use the provided tools."},
      {"role": "user", "content": "Create hello.py with print(\"hello\") and world.py with print(\"world\")."}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "file_write",
        "description": "Write content to a file",
        "parameters": {
          "type": "object",
          "properties": {
            "filePath": {"type": "string"},
            "content": {"type": "string"}
          },
          "required": ["filePath", "content"]
        }
      }
    }]
  }'

Actual output (abbreviated)

data: {...,"tool_calls":[{"id":"abc123","function":{"arguments":"{\"filePath\":\"hello.py\",...}","name":"file_write"},"type":"function","index":0}]}
data: {...,"tool_calls":[{"id":"def456","function":{"arguments":"{\"filePath\":\"world.py\",...}","name":"file_write"},"type":"function","index":0}]}

Both chunks have "index": 0 despite having different id values.

Expected output

The second tool call should have "index": 1:

data: {...,"tool_calls":[{...,"index":0}]}   # first tool call
data: {...,"tool_calls":[{...,"index":1}]}   # second tool call

Per OpenAI's streaming spec, index should enumerate tool calls sequentially.
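
One way to check a captured stream for this symptom is sketched below. It assumes each `data:` line carries a JSON chunk with tool calls under `choices[0].delta.tool_calls`, as in OpenAI's streaming format (the output above abbreviates that envelope as `...`); `findReusedIndices` is a hypothetical helper name.

```typescript
// Returns the indices that were reported with more than one distinct tool call id.
function findReusedIndices(sseText: string): number[] {
  const idsByIndex = new Map<number, Set<string>>();

  for (const line of sseText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue;

    let chunk: any;
    try {
      chunk = JSON.parse(payload);
    } catch {
      continue; // skip lines that are not valid JSON
    }

    const toolCalls = chunk?.choices?.[0]?.delta?.tool_calls;
    if (!Array.isArray(toolCalls)) continue;
    for (const call of toolCalls) {
      if (typeof call?.index !== "number" || typeof call?.id !== "string") continue;
      const ids = idsByIndex.get(call.index) ?? new Set<string>();
      ids.add(call.id);
      idsByIndex.set(call.index, ids);
    }
  }

  const reused: number[] = [];
  idsByIndex.forEach((ids, index) => {
    if (ids.size > 1) reused.push(index);
  });
  return reused;
}

const sample = [
  'data: {"choices":[{"delta":{"tool_calls":[{"id":"abc123","index":0}]}}]}',
  'data: {"choices":[{"delta":{"tool_calls":[{"id":"def456","index":0}]}}]}',
  "data: [DONE]",
].join("\n");
console.log(findReusedIndices(sample)); // [ 0 ]
```

An empty result means no index was shared between different `id` values in the captured text.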

Impact

The Vercel AI SDK (@ai-sdk/openai-compatible) uses index as the array key to track tool calls. When all indices are 0, the second tool call either gets merged into the first or silently dropped, causing 100% failure rate on any task requiring multiple tool calls in one response.

Tested with:

Ollama 0.20.2 (local) and 0.20.4 (remote)
Models: qwen2.5-coder:7b, qwen3-coder:30b, gemma4:e2b, gemma4:e4b
Also occurs when going through LiteLLM (ollama_chat/ backend)
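
To see why a reused index is so destructive, consider a simplified accumulator that trusts `index` as the array key. This illustrates the merge failure mode only and is not the SDK's actual parser; `Delta`, `naiveAccumulate`, and `isValidJson` are hypothetical names.

```typescript
// Hypothetical, simplified delta: only the fields needed for the illustration.
interface Delta {
  index: number;
  id?: string;
  args: string;
}

// Naive accumulation: every delta with the same index is treated as one call.
function naiveAccumulate(deltas: Delta[]): string[] {
  const argsByIndex: string[] = [];
  for (const d of deltas) {
    argsByIndex[d.index] = (argsByIndex[d.index] ?? "") + d.args;
  }
  return argsByIndex;
}

function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// Two distinct calls, both reported with index 0, as in the actual output above.
const merged = naiveAccumulate([
  { index: 0, id: "abc123", args: '{"filePath":"hello.py"}' },
  { index: 0, id: "def456", args: '{"filePath":"world.py"}' },
]);
console.log(merged.length, isValidJson(merged[0] ?? "")); // 1 false
```

The two argument objects end up concatenated in a single slot, so the caller sees one tool call whose arguments no longer parse.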

Workaround

HTTP proxy that reassigns correct sequential indices based on unique tool call id values: opencode-bench/fix-proxy
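
The core of such a proxy is a per-response rewrite of each `data:` line. The sketch below is not the code of opencode-bench/fix-proxy, which is not shown here; it assumes tool calls live under `choices[0].delta.tool_calls` and that each tool call carries a unique `id`. `createIndexFixer` is a hypothetical name.

```typescript
// Returns a function that rewrites one SSE line, assigning indices by first-seen id.
// Create one fixer per streamed response so numbering restarts at 0 each time.
function createIndexFixer(): (line: string) => string {
  const indexById = new Map<string, number>();

  return (line: string): string => {
    if (!line.startsWith("data: ")) return line;
    const payload = line.slice("data: ".length);
    if (payload.trim() === "[DONE]") return line;

    let chunk: any;
    try {
      chunk = JSON.parse(payload);
    } catch {
      return line; // pass through anything that is not valid JSON
    }

    const toolCalls = chunk?.choices?.[0]?.delta?.tool_calls;
    if (!Array.isArray(toolCalls)) return line;

    for (const call of toolCalls) {
      // Deltas without an id keep the index reported by the server.
      if (typeof call?.id !== "string" || call.id === "") continue;
      let index = indexById.get(call.id);
      if (index === undefined) {
        index = indexById.size;
        indexById.set(call.id, index);
      }
      call.index = index;
    }
    return "data: " + JSON.stringify(chunk);
  };
}

const fix = createIndexFixer();
const first = fix('data: {"choices":[{"delta":{"tool_calls":[{"id":"abc123","index":0}]}}]}');
const second = fix('data: {"choices":[{"delta":{"tool_calls":[{"id":"def456","index":0}]}}]}');
console.log(second); // the second tool call now carries "index":1
```

Keying on `id` works here because, in the output reported above, each tool call arrives with its own `id`; a stream that splits one call across several deltas without repeating the `id` would need extra bookkeeping.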

Environment

OS: Linux (DGX Spark) + Windows 11
Ollama: 0.20.2 / 0.20.4
GPU: RTX 4090 / Grace CPU

extent analysis

TL;DR

Ollama must emit sequential index values (0, 1, 2...) for the tool calls in a streamed response. ollama/ollama PR #15467 (merged) addresses this by setting the function index in the affected model parsers.

Guidance

Verify that the issue is indeed caused by the non-incrementing index field in the tool call responses by checking the API responses for multiple tool calls.
Consider using the provided HTTP proxy workaround (opencode-bench/fix-proxy) to reassign correct sequential indices based on unique tool call id values.
If the indices are still wrong after upgrading, check the model's parser under model/parsers: per PR #15467, several parsers for models that can make parallel tool calls were not setting the function index.
Test the fix with different models and environments to ensure the issue is fully resolved.

Example

No code snippet is provided as the issue is related to the API response and not a specific code implementation.

Notes

The issue is specific to Ollama's implementation rather than to the OpenAI-compatible API format in general. The HTTP proxy workaround is a temporary measure for Ollama versions that do not include the fix from PR #15467.

Recommendation

Apply the workaround using the HTTP proxy (opencode-bench/fix-proxy) to correct the index field in the tool call responses, as this provides an immediate solution to the problem.
