ollama - 💡(How to fix) Fix Claude Code & Ollama Integration - Invalid tool parameters & CPU Fallback [8 comments, 3 participants]

ollama2026-04-07 12:39:06

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

ollama/ollama#15390•Fetched 2026-04-08 03:01:15

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Participants

Timeline (top)

commented ×8labeled ×1

When using Claude Code (CLI) with a local Ollama instance, the agent consistently fails during tool execution (e.g., entering "Plan Mode" or reading files). The model generates invalid JSON for the tool calls, leading to a loop of "Invalid tool parameters" errors.

Additionally, specific configurations cause extreme CPU spikes (100%+) and slow response times (50s+), which seems to be related to an unintended vision-processing overhead and a Flash Attention fallback.

Root Cause

Code Example

environment:
      - OLLAMA_SCHED_SPREAD=true
      - OLLAMA_NUM_CTX=32768
      - OLLAMA_FLASH_ATTENTION=0  # Setting to 1 causes 100% CPU load instead of GPU boost
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

---

time=2026-04-07T12:23:41.133Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc ... FlashAttention:Disabled KvSize:32768 ...}"
time=2026-04-07T12:23:41.392Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=147.336691ms shape="[2560 256]"
time=2026-04-07T12:23:41.579Z level=INFO source=ggml.go:494 msg="offloaded 43/43 layers to GPU"
[GIN] 2026/04/07 - 12:24:32 | 200 | 53.364869091s | 192.168.66.36 | POST "/v1/messages?beta=true"

RAW_BUFFERClick to expand / collapse

What is the issue?

Description

System Environment

OS: Linux (Docker Deployment)
GPU: 2x NVIDIA GeForce RTX 3060 (12GB VRAM each)
Ollama Version: 0.20.3
Model: gemma4 (Local blob: sha256-4c27e0f5...)
Claude Code Command: ollama launch cloude

Docker Configuration (`docker-compose.yml`)

    environment:
      - OLLAMA_SCHED_SPREAD=true
      - OLLAMA_NUM_CTX=32768
      - OLLAMA_FLASH_ATTENTION=0  # Setting to 1 causes 100% CPU load instead of GPU boost
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Steps to Reproduce

Connect Claude Code to the local Ollama instance.

Provide a complex coding task (e.g., "Fix my project architecture and MQTT connection").

The agent attempts to initialize its internal "Plan Mode" tool.

The CLI returns: ⎿ Invalid tool parameters.

The model enters a loop: It apologizes for the wrong parameters and retries with the same (or similarly broken) JSON schema until the process is aborted.

Suspected Causes

Tool Parameter Formatting: The gemma4 model (likely due to its template or architecture) does not produce the exact JSON schema required by Claude Code's tool definitions.

Vision Encoder Overhead: The runner executes vision-related code (vision: encoded) for code-only prompts, which increases latency significantly and might interfere with the attention mechanism.

Flash Attention Regression: OLLAMA_FLASH_ATTENTION=1 results in a massive CPU spike. This suggests that the presence of the vision projector forces a fallback to a CPU-based attention implementation that is not optimized for large contexts.

Context Management: There is a discrepancy between the requested NUM_CTX=32768 and the actual prompt processing speed/stability wenn tools are involved.

Additional Context

The CPU load remains normal only wenn OLLAMA_FLASH_ATTENTION is disabled. However, the tool-calling issue persists regardless of this setting, preventing the agent from completing multi-step tasks.

Relevant log output

time=2026-04-07T12:23:41.133Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc ... FlashAttention:Disabled KvSize:32768 ...}"
time=2026-04-07T12:23:41.392Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=147.336691ms shape="[2560 256]"
time=2026-04-07T12:23:41.579Z level=INFO source=ggml.go:494 msg="offloaded 43/43 layers to GPU"
[GIN] 2026/04/07 - 12:24:32 | 200 | 53.364869091s | 192.168.66.36 | POST "/v1/messages?beta=true"

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

extent analysis

TL;DR

Disable OLLAMA_FLASH_ATTENTION and adjust OLLAMA_NUM_CTX to a lower value to mitigate CPU spikes and tool parameter issues.

Guidance

Verify that the gemma4 model template or architecture is compatible with Claude Code's tool definitions to resolve the JSON schema mismatch.
Investigate the vision encoder overhead by checking if the vision: encoded log message is related to the tool execution, and consider optimizing or disabling vision-related code for code-only prompts.
Test with a lower OLLAMA_NUM_CTX value (e.g., 16384 or 8192) to reduce context management discrepancies and potential performance issues.
Monitor CPU load and response times after applying these changes to ensure the fixes are effective.

Example

No code snippet is provided as the issue is related to configuration and model compatibility.

Notes

The provided log output suggests that the OLLAMA_FLASH_ATTENTION setting has a significant impact on CPU load, and disabling it may help mitigate the issue. However, the root cause of the tool parameter formatting issue remains unclear and may require further investigation into the gemma4 model or Claude Code's tool definitions.

Recommendation

Apply workaround: Disable OLLAMA_FLASH_ATTENTION and adjust OLLAMA_NUM_CTX to a lower value, as this may help mitigate the CPU spikes and tool parameter issues, allowing for further debugging and potential resolution of the underlying compatibility problem.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#generation error #database connection #vector store #embedding generation #cache error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

ollama - 💡(How to fix) Fix Claude Code & Ollama Integration - Invalid tool parameters & CPU Fallback [8 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Code Example

What is the issue?

Description

System Environment

Docker Configuration (`docker-compose.yml`)

Steps to Reproduce

Suspected Causes

Additional Context

Relevant log output

OS

GPU

CPU

Ollama version

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

ollama - 💡(How to fix) Fix Claude Code & Ollama Integration - Invalid tool parameters & CPU Fallback [8 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Code Example

What is the issue?

Description

System Environment

Docker Configuration (docker-compose.yml)

Steps to Reproduce

Suspected Causes

Additional Context

Relevant log output

OS

GPU

CPU

Ollama version

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING

Docker Configuration (`docker-compose.yml`)