langchain - ✅(Solved) Fix `ChatOllama.bind_tools()` with large system prompt suppresses reasoning output on `gemma4:26b` (investigating upstream) [1 pull requests, 4 comments, 3 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
langchain-ai/langchain#36569Fetched 2026-04-08 03:01:00
View on GitHub
Comments
4
Participants
3
Timeline
19
Reactions
0
Author
Assignees
Timeline (top)
commented ×4labeled ×3renamed ×3subscribed ×3

When ChatOllama is configured with reasoning=True and used with bind_tools() on a tool with a moderately complex schema, combined with a large system prompt (~2-3k tokens), gemma4:26b stops emitting reasoning_content in streamed chunks entirely.

Key observations from the original reporter's bisection (langchain-ai/deepagents#2445):

  • Large prompt alone: reasoning present
  • Simple tool alone: reasoning present
  • Complex tool + small prompt: reasoning present (reduced)
  • Complex tool + large prompt: reasoning drops to zero

The suppression appears specific to gemma4:26b — smaller Gemma 4 variants were not affected.

This may be a model-level quirk where the Ollama request payload (tool schemas + large system prompt) pushes the model into a mode where it skips reasoning. Worth investigating whether ChatOllama can structure the request differently (e.g., tool schema placement, prompt ordering) to avoid triggering this.

Error Message

Error Message and Stack Trace (if applicable)

No error — reasoning tokens are silently dropped. Expected output shows non-zero reasoning counts for all four cases; actual output:

Root Cause

When ChatOllama is configured with reasoning=True and used with bind_tools() on a tool with a moderately complex schema, combined with a large system prompt (~2-3k tokens), gemma4:26b stops emitting reasoning_content in streamed chunks entirely.

Key observations from the original reporter's bisection (langchain-ai/deepagents#2445):

  • Large prompt alone: reasoning present
  • Simple tool alone: reasoning present
  • Complex tool + small prompt: reasoning present (reduced)
  • Complex tool + large prompt: reasoning drops to zero

The suppression appears specific to gemma4:26b — smaller Gemma 4 variants were not affected.

This may be a model-level quirk where the Ollama request payload (tool schemas + large system prompt) pushes the model into a mode where it skips reasoning. Worth investigating whether ChatOllama can structure the request differently (e.g., tool schema placement, prompt ordering) to avoid triggering this.

Fix Action

Fix / Workaround

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

PR fix notes

PR #2: fix(ollama): isolate tool schemas in ChatOllama to prevent reasoning suppression

Description (problem / solution / changelog)

Fixes #36569.

When using ChatOllama.bind_tools() with moderately complex tool schemas and a massive SystemMessage prompt, some reasoning models (like gemma4:26b) stop emitting <think> reasoning traces.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

Summary by CodeRabbit

  • Bug Fixes
    • Improved message handling for tool-calling scenarios to ensure proper system message management and correct message sequencing when tools are active.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Changed files

  • libs/partners/ollama/langchain_ollama/chat_models.py (modified, +13/-0)

Code Example

import asyncio

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ollama import ChatOllama

# Any tool with a complex schema works — using a minimal stand-in for
# Deep Agents' TodoListMiddleware.write_todos tool, which has a large
# JSON schema with nested objects/enums.
from langchain_core.tools import tool
from pydantic import BaseModel, Field
from typing import Literal


class TodoItem(BaseModel):
    id: str = Field(description="Unique identifier")
    title: str = Field(description="Short title")
    status: Literal["not_started", "in_progress", "completed"] = "not_started"
    priority: Literal["high", "medium", "low"] = "medium"
    details: str = Field(default="", description="Extended description")


class TodoList(BaseModel):
    todos: list[TodoItem] = Field(description="Full list of todo items")


@tool(args_schema=TodoList)
def write_todos(todos: list[TodoItem]) -> str:
    """Write and replace the full todo list."""
    return "ok"


# A large system prompt (~2-3k tokens), similar in size to Deep Agents'
# BASE_AGENT_PROMPT. The exact content doesn't matter — length does.
LARGE_PROMPT = "You are a helpful assistant.\n\n" + (
    "## Guidelines\n\n"
    + "\n".join(
        f"- Rule {i}: Follow best practices for step {i}." for i in range(200)
    )
)


async def count(label, runnable, messages):
    counts = {"reasoning": 0, "text": 0}
    async for chunk in runnable.astream(messages):
        ak = getattr(chunk, "additional_kwargs", None)
        rc = ak.get("reasoning_content") if isinstance(ak, dict) else None
        content = getattr(chunk, "content", None)
        if rc:
            counts["reasoning"] += 1
        elif content:
            counts["text"] += 1
    print(label, counts)


async def main():
    llm = ChatOllama(
        model="gemma4:26b",
        base_url="http://localhost:11434",
        reasoning=True,
        num_ctx=8192,
        temperature=0,
    )

    small = [
        SystemMessage(content="You are helpful."),
        HumanMessage(content="hola"),
    ]
    big = [
        SystemMessage(content=LARGE_PROMPT),
        HumanMessage(content="hola"),
    ]

    await count("no tools, small prompt", llm, small)
    await count("no tools, big prompt", llm, big)
    await count("tools, small prompt", llm.bind_tools([write_todos]), small)
    await count("tools, big prompt", llm.bind_tools([write_todos]), big)


asyncio.run(main())

---

no tools, small prompt {'reasoning': 12, 'text': 5}
no tools, big prompt {'reasoning': 15, 'text': 6}
tools, small prompt {'reasoning': 8, 'text': 4}
tools, big prompt {'reasoning': 0, 'text': 7}   # <-- reasoning gone
RAW_BUFFERClick to expand / collapse

Checked other resources

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package (Required)

  • langchain-ollama

Related Issues / PRs

Cross-ref from upstream report: langchain-ai/deepagents#2445

Reproduction Steps / Example Code (Python)

import asyncio

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ollama import ChatOllama

# Any tool with a complex schema works — using a minimal stand-in for
# Deep Agents' TodoListMiddleware.write_todos tool, which has a large
# JSON schema with nested objects/enums.
from langchain_core.tools import tool
from pydantic import BaseModel, Field
from typing import Literal


class TodoItem(BaseModel):
    id: str = Field(description="Unique identifier")
    title: str = Field(description="Short title")
    status: Literal["not_started", "in_progress", "completed"] = "not_started"
    priority: Literal["high", "medium", "low"] = "medium"
    details: str = Field(default="", description="Extended description")


class TodoList(BaseModel):
    todos: list[TodoItem] = Field(description="Full list of todo items")


@tool(args_schema=TodoList)
def write_todos(todos: list[TodoItem]) -> str:
    """Write and replace the full todo list."""
    return "ok"


# A large system prompt (~2-3k tokens), similar in size to Deep Agents'
# BASE_AGENT_PROMPT. The exact content doesn't matter — length does.
LARGE_PROMPT = "You are a helpful assistant.\n\n" + (
    "## Guidelines\n\n"
    + "\n".join(
        f"- Rule {i}: Follow best practices for step {i}." for i in range(200)
    )
)


async def count(label, runnable, messages):
    counts = {"reasoning": 0, "text": 0}
    async for chunk in runnable.astream(messages):
        ak = getattr(chunk, "additional_kwargs", None)
        rc = ak.get("reasoning_content") if isinstance(ak, dict) else None
        content = getattr(chunk, "content", None)
        if rc:
            counts["reasoning"] += 1
        elif content:
            counts["text"] += 1
    print(label, counts)


async def main():
    llm = ChatOllama(
        model="gemma4:26b",
        base_url="http://localhost:11434",
        reasoning=True,
        num_ctx=8192,
        temperature=0,
    )

    small = [
        SystemMessage(content="You are helpful."),
        HumanMessage(content="hola"),
    ]
    big = [
        SystemMessage(content=LARGE_PROMPT),
        HumanMessage(content="hola"),
    ]

    await count("no tools, small prompt", llm, small)
    await count("no tools, big prompt", llm, big)
    await count("tools, small prompt", llm.bind_tools([write_todos]), small)
    await count("tools, big prompt", llm.bind_tools([write_todos]), big)


asyncio.run(main())

Error Message and Stack Trace (if applicable)

No error — reasoning tokens are silently dropped. Expected output shows non-zero reasoning counts for all four cases; actual output:

no tools, small prompt {'reasoning': 12, 'text': 5}
no tools, big prompt {'reasoning': 15, 'text': 6}
tools, small prompt {'reasoning': 8, 'text': 4}
tools, big prompt {'reasoning': 0, 'text': 7}   # <-- reasoning gone

Description

When ChatOllama is configured with reasoning=True and used with bind_tools() on a tool with a moderately complex schema, combined with a large system prompt (~2-3k tokens), gemma4:26b stops emitting reasoning_content in streamed chunks entirely.

Key observations from the original reporter's bisection (langchain-ai/deepagents#2445):

  • Large prompt alone: reasoning present
  • Simple tool alone: reasoning present
  • Complex tool + small prompt: reasoning present (reduced)
  • Complex tool + large prompt: reasoning drops to zero

The suppression appears specific to gemma4:26b — smaller Gemma 4 variants were not affected.

This may be a model-level quirk where the Ollama request payload (tool schemas + large system prompt) pushes the model into a mode where it skips reasoning. Worth investigating whether ChatOllama can structure the request differently (e.g., tool schema placement, prompt ordering) to avoid triggering this.

System Info

Reported versions from langchain-ai/deepagents#2445:

  • langchain-ollama: 1.0.1
  • langchain: 1.2.14
  • langchain_core: (paired with langchain 1.2.14)
  • ollama python package: 0.6.1
  • Model: gemma4:26b via local Ollama server
  • Platform: Linux

extent analysis

TL;DR

The issue can be mitigated by restructuring the ChatOllama request to avoid triggering the model's reasoning suppression, potentially by adjusting tool schema placement or prompt ordering.

Guidance

  • Investigate alternative ways to structure the Ollama request payload, such as placing the tool schema before or after the large system prompt, to see if this affects the model's reasoning output.
  • Test the same scenario with smaller Gemma 4 variants to confirm that the issue is specific to gemma4:26b.
  • Consider adding logging or debugging statements to the ChatOllama code to inspect the request payload and response from the Ollama server, which may provide insight into the cause of the reasoning suppression.
  • Evaluate the impact of reducing the complexity of the tool schema or the size of the system prompt on the model's reasoning output.

Example

No code example is provided, as the issue is likely related to the interaction between the ChatOllama library and the Ollama model, and requires further investigation into the request payload and response.

Notes

The root cause of the issue is unclear, but it appears to be related to the specific combination of the gemma4:26b model, the complex tool schema, and the large system prompt. Further experimentation and debugging are needed to determine the best course of action.

Recommendation

Apply a workaround by restructuring the ChatOllama request payload, as this may help avoid triggering the model's reasoning suppression. This approach is recommended because it does not require modifying the underlying model or library code, and can be implemented and tested quickly.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING