ollama - ✅(Solved) Fix /v1/responses endpoint ignores reasoning_effort: "none" — thinking not disabled for gemma4:e2b [1 pull requests, 1 participants]

ollama · 2026-04-16 21:57:31

GitHub stats

ollama/ollama#15635 • Fetched 2026-04-17 08:27:03

Author

waywardgeek

Participants

waywardgeek

Timeline (top)

labeled ×1

PR fix notes

PR #15664: openai: honor reasoning_effort in /v1/responses endpoint

Repository: ollama/ollama
Author: balgaly
State: open | merged: False
Link: https://github.com/ollama/ollama/pull/15664

Description (problem / solution / changelog)

Fixes #15635.

Problem

/v1/chat/completions correctly disables thinking when reasoning_effort: "none" is passed, but /v1/responses ignores reasoning.effort entirely. Reproduction from the issue:

# Chat Completions — works (0.5s, no reasoning)
curl -s http://localhost:11434/v1/chat/completions \
  -d '{"model":"gemma4:e2b","messages":[{"role":"user","content":"Reply with ONLY: [1,3]"}],"reasoning_effort":"none","stream":false}'

# Responses — thinking NOT disabled (4.7s, full reasoning)
curl -s http://localhost:11434/v1/responses \
  -d '{"model":"gemma4:e2b","input":"Reply with ONLY: [1,3]","reasoning":{"effort":"none"},"stream":false}'

Root cause

openai/responses.go:FromResponsesRequest deserializes r.Reasoning.Effort (it is already used later for echoing reasoning config back to the client) but never converts it into the api.ChatRequest.Think field. The Chat Completions path at openai/openai.go:625-644 already performs this mapping.

Fix

Mirror the existing Chat Completions logic in FromResponsesRequest:

effort == "none" → Think{Value: false} (suppresses thinking)
effort == "high" | "medium" | "low" → Think{Value: <effort>}
any other value → the same validation error the Chat Completions path returns

Validation, error message, and accepted effort values are identical across the two endpoints, so behavior stays consistent.

Tests

TestFromResponsesRequest_ReasoningEffort covers:

"none" disables thinking (Think.Value == false)
"low", "medium", "high" set the string value
missing reasoning leaves Think unset
invalid effort returns an error

All existing ./openai/... and ./middleware/... tests still pass.

Notes

Only reasoning.effort is handled here — OpenAI's Responses API does not expose a flat reasoning_effort (that is a Chat Completions shape). Clients that sent {"reasoning_effort": "none"} on /v1/responses before this PR were effectively sending an unknown field; after the PR they should move to the canonical {"reasoning": {"effort": "none"}} form (which also did not work before).
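For clients moving to the canonical shape, the request body can be built as in the sketch below. The type and function names (reasoningConfig, responsesRequest, buildResponsesBody) are hypothetical client-side helpers that cover only the fields used in the reproduction request; they are not part of Ollama or of any OpenAI SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// reasoningConfig and responsesRequest are illustrative client-side types,
// limited to the fields that appear in the reproduction request.
type reasoningConfig struct {
	Effort string `json:"effort"`
}

type responsesRequest struct {
	Model     string           `json:"model"`
	Input     string           `json:"input"`
	Reasoning *reasoningConfig `json:"reasoning,omitempty"`
	Stream    bool             `json:"stream"`
}

// buildResponsesBody returns a JSON body for /v1/responses.
// An empty effort omits the reasoning object entirely instead of sending
// a flat reasoning_effort field.
func buildResponsesBody(model, input, effort string) ([]byte, error) {
	req := responsesRequest{Model: model, Input: input}
	if effort != "" {
		req.Reasoning = &reasoningConfig{Effort: effort}
	}
	return json.Marshal(req)
}

func main() {
	body, err := buildResponsesBody("gemma4:e2b", "Reply with ONLY: [1,3]", "none")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The printed body nests effort under reasoning, the same shape as the Responses reproduction request in the Problem section.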

Changed files

openai/responses.go (modified, +14/-0)
openai/responses_test.go (modified, +73/-0)

What is the issue?

Description

The /v1/chat/completions endpoint correctly disables thinking when reasoning_effort: "none" is set, but the /v1/responses endpoint ignores it entirely. The model still produces full thinking output, adding significant latency.

Environment

Ollama: HEAD-2bb7ea0 (also reproduced on 0.20.7 stable)
Model: gemma4:e2b
OS: macOS (Apple Silicon)

Reproduction

Chat Completions works correctly (0.5s, no thinking):

curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma4:e2b", "messages": [{"role": "user", "content": "Reply with ONLY: [1,3]"}], "reasoning_effort": "none", "stream": false}'

Result: 6 tokens, no reasoning field, ~0.5s

Responses API: thinking NOT disabled (4.7s, full reasoning):

curl -s http://localhost:11434/v1/responses \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma4:e2b", "input": "Reply with ONLY: [1,3]", "reasoning_effort": "none", "stream": false}'

Result: 200+ tokens, full reasoning output block with thinking content, ~4.7s

Also tried "reasoning": {"effort": "none"} on the Responses endpoint — same result.

Expected Behavior

Both endpoints should honor reasoning_effort: "none" and suppress thinking output. The Ollama docs list "Reasoning/thinking control (for thinking models)" as a supported feature for Chat Completions, and list reasoning_effort in the Responses API supported fields.

Impact

This is a significant latency issue for applications using thinking models as lightweight classifiers via the Responses API. In our case, we use gemma4:e2b as a relevance judge for memory recall. It should return a JSON array in ~0.5s but takes ~5s due to unnecessary thinking.

Relevant log output

OS: macOS
GPU: Apple
CPU: Apple
Ollama version: HEAD-2bb7ea0 (also reproduced on 0.20.7 stable)

extent analysis

TL;DR

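The /v1/responses endpoint ignores the reasoning effort setting, so thinking stays enabled even when the effort is "none". According to PR #15664 (open and unmerged when this page was fetched), FromResponsesRequest reads reasoning.effort but never converts it into the Think field of the chat request.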

Guidance

Until the fix in PR #15664 is merged, neither reasoning_effort: "none" nor "reasoning": {"effort": "none"} disables thinking on /v1/responses; the issue reports the same result for both forms.
For latency-sensitive calls, send the request to /v1/chat/completions with reasoning_effort: "none", which the issue reports completing in about 0.5s with no reasoning field.
Once the fix is available, use the nested "reasoning": {"effort": "none"} form on /v1/responses. The PR handles only that shape, not the flat reasoning_effort field.
Track https://github.com/ollama/ollama/pull/15664 for merge status before relying on the new behavior.

Example

The reproduction commands in the issue body show both cases: the Chat Completions request that returns without reasoning and the Responses request that still produces full thinking output.

Notes

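The issue was reported with gemma4:e2b on HEAD-2bb7ea0 and on 0.20.7 stable. The PR attributes the root cause to request translation in openai/responses.go rather than to the model, which suggests other thinking models are likely affected as well, though only gemma4:e2b was tested in the report.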

Recommendation

Apply workaround: route requests that must skip thinking through /v1/chat/completions with reasoning_effort: "none" until PR #15664 is merged, then switch to "reasoning": {"effort": "none"} on /v1/responses.

#api #prompt template #agent execution #callback error #latency issue
