langchain - ✅(Solved) Fix `ChatOllama.bind_tools()` with large system prompt suppresses reasoning output on `gemma4:26b` (investigating upstream) [1 pull requests, 4 comments, 3 participants]

langchain2026-04-06 18:45:38

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

langchain-ai/langchain#36569•Fetched 2026-04-08 03:01:00

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Participants

Assignees

Timeline (top)

commented ×4labeled ×3renamed ×3subscribed ×3

When ChatOllama is configured with reasoning=True and used with bind_tools() on a tool with a moderately complex schema, combined with a large system prompt (~2-3k tokens), gemma4:26b stops emitting reasoning_content in streamed chunks entirely.

Key observations from the original reporter's bisection (langchain-ai/deepagents#2445):

Large prompt alone: reasoning present
Simple tool alone: reasoning present
Complex tool + small prompt: reasoning present (reduced)
Complex tool + large prompt: reasoning drops to zero

The suppression appears specific to gemma4:26b — smaller Gemma 4 variants were not affected.

This may be a model-level quirk where the Ollama request payload (tool schemas + large system prompt) pushes the model into a mode where it skips reasoning. Worth investigating whether ChatOllama can structure the request differently (e.g., tool schema placement, prompt ordering) to avoid triggering this.

Error Message

Error Message and Stack Trace (if applicable)

No error — reasoning tokens are silently dropped. Expected output shows non-zero reasoning counts for all four cases; actual output:

Root Cause

Key observations from the original reporter's bisection (langchain-ai/deepagents#2445):

Large prompt alone: reasoning present
Simple tool alone: reasoning present
Complex tool + small prompt: reasoning present (reduced)
Complex tool + large prompt: reasoning drops to zero

The suppression appears specific to gemma4:26b — smaller Gemma 4 variants were not affected.

Fix Action

Fix / Workaround

This is a bug, not a usage question.
I added a clear and descriptive title that summarizes this issue.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
This is not related to the langchain-community package.
I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

PR fix notes

PR #2: fix(ollama): isolate tool schemas in ChatOllama to prevent reasoning suppression

Repository: isatyamks/langchain
Author: isatyamks
State: open | merged: False
Link: https://github.com/isatyamks/langchain/pull/2

Description (problem / solution / changelog)

Fixes #36569.

When using ChatOllama.bind_tools() with moderately complex tool schemas and a massive SystemMessage prompt, some reasoning models (like gemma4:26b) stop emitting <think> reasoning traces.

Summary by CodeRabbit

Bug Fixes
- Improved message handling for tool-calling scenarios to ensure proper system message management and correct message sequencing when tools are active.

Changed files

libs/partners/ollama/langchain_ollama/chat_models.py (modified, +13/-0)

Code Example

import asyncio

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ollama import ChatOllama

# Any tool with a complex schema works — using a minimal stand-in for
# Deep Agents' TodoListMiddleware.write_todos tool, which has a large
# JSON schema with nested objects/enums.
from langchain_core.tools import tool
from pydantic import BaseModel, Field
from typing import Literal


class TodoItem(BaseModel):
    id: str = Field(description="Unique identifier")
    title: str = Field(description="Short title")
    status: Literal["not_started", "in_progress", "completed"] = "not_started"
    priority: Literal["high", "medium", "low"] = "medium"
    details: str = Field(default="", description="Extended description")


class TodoList(BaseModel):
    todos: list[TodoItem] = Field(description="Full list of todo items")


@tool(args_schema=TodoList)
def write_todos(todos: list[TodoItem]) -> str:
    """Write and replace the full todo list."""
    return "ok"


# A large system prompt (~2-3k tokens), similar in size to Deep Agents'
# BASE_AGENT_PROMPT. The exact content doesn't matter — length does.
LARGE_PROMPT = "You are a helpful assistant.\n\n" + (
    "## Guidelines\n\n"
    + "\n".join(
        f"- Rule {i}: Follow best practices for step {i}." for i in range(200)
    )
)


async def count(label, runnable, messages):
    counts = {"reasoning": 0, "text": 0}
    async for chunk in runnable.astream(messages):
        ak = getattr(chunk, "additional_kwargs", None)
        rc = ak.get("reasoning_content") if isinstance(ak, dict) else None
        content = getattr(chunk, "content", None)
        if rc:
            counts["reasoning"] += 1
        elif content:
            counts["text"] += 1
    print(label, counts)


async def main():
    llm = ChatOllama(
        model="gemma4:26b",
        base_url="http://localhost:11434",
        reasoning=True,
        num_ctx=8192,
        temperature=0,
    )

    small = [
        SystemMessage(content="You are helpful."),
        HumanMessage(content="hola"),
    ]
    big = [
        SystemMessage(content=LARGE_PROMPT),
        HumanMessage(content="hola"),
    ]

    await count("no tools, small prompt", llm, small)
    await count("no tools, big prompt", llm, big)
    await count("tools, small prompt", llm.bind_tools([write_todos]), small)
    await count("tools, big prompt", llm.bind_tools([write_todos]), big)


asyncio.run(main())

---

no tools, small prompt {'reasoning': 12, 'text': 5}
no tools, big prompt {'reasoning': 15, 'text': 6}
tools, small prompt {'reasoning': 8, 'text': 4}
tools, big prompt {'reasoning': 0, 'text': 7}   # <-- reasoning gone

RAW_BUFFERClick to expand / collapse

Checked other resources

This is a bug, not a usage question.
I added a clear and descriptive title that summarizes this issue.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
This is not related to the langchain-community package.
I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package (Required)

langchain-ollama

Related Issues / PRs

Cross-ref from upstream report: langchain-ai/deepagents#2445

Reproduction Steps / Example Code (Python)

import asyncio

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ollama import ChatOllama

# Any tool with a complex schema works — using a minimal stand-in for
# Deep Agents' TodoListMiddleware.write_todos tool, which has a large
# JSON schema with nested objects/enums.
from langchain_core.tools import tool
from pydantic import BaseModel, Field
from typing import Literal


class TodoItem(BaseModel):
    id: str = Field(description="Unique identifier")
    title: str = Field(description="Short title")
    status: Literal["not_started", "in_progress", "completed"] = "not_started"
    priority: Literal["high", "medium", "low"] = "medium"
    details: str = Field(default="", description="Extended description")


class TodoList(BaseModel):
    todos: list[TodoItem] = Field(description="Full list of todo items")


@tool(args_schema=TodoList)
def write_todos(todos: list[TodoItem]) -> str:
    """Write and replace the full todo list."""
    return "ok"


# A large system prompt (~2-3k tokens), similar in size to Deep Agents'
# BASE_AGENT_PROMPT. The exact content doesn't matter — length does.
LARGE_PROMPT = "You are a helpful assistant.\n\n" + (
    "## Guidelines\n\n"
    + "\n".join(
        f"- Rule {i}: Follow best practices for step {i}." for i in range(200)
    )
)


async def count(label, runnable, messages):
    counts = {"reasoning": 0, "text": 0}
    async for chunk in runnable.astream(messages):
        ak = getattr(chunk, "additional_kwargs", None)
        rc = ak.get("reasoning_content") if isinstance(ak, dict) else None
        content = getattr(chunk, "content", None)
        if rc:
            counts["reasoning"] += 1
        elif content:
            counts["text"] += 1
    print(label, counts)


async def main():
    llm = ChatOllama(
        model="gemma4:26b",
        base_url="http://localhost:11434",
        reasoning=True,
        num_ctx=8192,
        temperature=0,
    )

    small = [
        SystemMessage(content="You are helpful."),
        HumanMessage(content="hola"),
    ]
    big = [
        SystemMessage(content=LARGE_PROMPT),
        HumanMessage(content="hola"),
    ]

    await count("no tools, small prompt", llm, small)
    await count("no tools, big prompt", llm, big)
    await count("tools, small prompt", llm.bind_tools([write_todos]), small)
    await count("tools, big prompt", llm.bind_tools([write_todos]), big)


asyncio.run(main())

Error Message and Stack Trace (if applicable)

No error — reasoning tokens are silently dropped. Expected output shows non-zero reasoning counts for all four cases; actual output:

no tools, small prompt {'reasoning': 12, 'text': 5}
no tools, big prompt {'reasoning': 15, 'text': 6}
tools, small prompt {'reasoning': 8, 'text': 4}
tools, big prompt {'reasoning': 0, 'text': 7}   # <-- reasoning gone

Description

Key observations from the original reporter's bisection (langchain-ai/deepagents#2445):

Large prompt alone: reasoning present
Simple tool alone: reasoning present
Complex tool + small prompt: reasoning present (reduced)
Complex tool + large prompt: reasoning drops to zero

The suppression appears specific to gemma4:26b — smaller Gemma 4 variants were not affected.

System Info

Reported versions from langchain-ai/deepagents#2445:

langchain-ollama: 1.0.1
langchain: 1.2.14
langchain_core: (paired with langchain 1.2.14)
ollama python package: 0.6.1
Model: gemma4:26b via local Ollama server
Platform: Linux

extent analysis

TL;DR

The issue can be mitigated by restructuring the ChatOllama request to avoid triggering the model's reasoning suppression, potentially by adjusting tool schema placement or prompt ordering.

Guidance

Investigate alternative ways to structure the Ollama request payload, such as placing the tool schema before or after the large system prompt, to see if this affects the model's reasoning output.
Test the same scenario with smaller Gemma 4 variants to confirm that the issue is specific to gemma4:26b.
Consider adding logging or debugging statements to the ChatOllama code to inspect the request payload and response from the Ollama server, which may provide insight into the cause of the reasoning suppression.
Evaluate the impact of reducing the complexity of the tool schema or the size of the system prompt on the model's reasoning output.

Example

No code example is provided, as the issue is likely related to the interaction between the ChatOllama library and the Ollama model, and requires further investigation into the request payload and response.

Notes

The root cause of the issue is unclear, but it appears to be related to the specific combination of the gemma4:26b model, the complex tool schema, and the large system prompt. Further experimentation and debugging are needed to determine the best course of action.

Recommendation

Apply a workaround by restructuring the ChatOllama request payload, as this may help avoid triggering the model's reasoning suppression. This approach is recommended because it does not require modifying the underlying model or library code, and can be implemented and tested quickly.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#agent execution #callback error #memory management #API rate limit #retriever error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

langchain - ✅(Solved) Fix `ChatOllama.bind_tools()` with large system prompt suppresses reasoning output on `gemma4:26b` (investigating upstream) [1 pull requests, 4 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Error Message and Stack Trace (if applicable)

Root Cause

Fix Action

Fix / Workaround

PR fix notes

PR #2: fix(ollama): isolate tool schemas in ChatOllama to prevent reasoning suppression

Description (problem / solution / changelog)

Summary by CodeRabbit

Changed files

Code Example

Checked other resources

Package (Required)

Related Issues / PRs

Reproduction Steps / Example Code (Python)

Error Message and Stack Trace (if applicable)

Description

System Info

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

langchain - ✅(Solved) Fix `ChatOllama.bind_tools()` with large system prompt suppresses reasoning output on `gemma4:26b` (investigating upstream) [1 pull requests, 4 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Error Message and Stack Trace (if applicable)

Root Cause

Fix Action

Fix / Workaround

PR fix notes

PR #2: fix(ollama): isolate tool schemas in ChatOllama to prevent reasoning suppression

Description (problem / solution / changelog)

Summary by CodeRabbit

Changed files

Code Example

Checked other resources

Package (Required)

Related Issues / PRs

Reproduction Steps / Example Code (Python)

Error Message and Stack Trace (if applicable)

Description

System Info

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING