ollama - ✅(Solved) Fix /v1/responses endpoint ignores reasoning_effort: "none" (thinking not disabled for gemma4:e2b) [1 pull request, 1 participant]

GitHub stats

ollama/ollama#15635 · fetched 2026-04-17 08:27:03
Comments: 0 · Participants: 1 · Timeline: 1 (labeled ×1) · Reactions: 0

PR fix notes

PR #15664: openai: honor reasoning_effort in /v1/responses endpoint

Description (problem / solution / changelog)

Fixes #15635.

Problem

/v1/chat/completions correctly disables thinking when reasoning_effort: "none" is passed, but /v1/responses ignores reasoning.effort entirely. Reproduction from the issue:

# Chat Completions — works (0.5s, no reasoning)
curl -s http://localhost:11434/v1/chat/completions \
  -d '{"model":"gemma4:e2b","messages":[{"role":"user","content":"Reply with ONLY: [1,3]"}],"reasoning_effort":"none","stream":false}'

# Responses — thinking NOT disabled (4.7s, full reasoning)
curl -s http://localhost:11434/v1/responses \
  -d '{"model":"gemma4:e2b","input":"Reply with ONLY: [1,3]","reasoning":{"effort":"none"},"stream":false}'

Root cause

openai/responses.go:FromResponsesRequest deserializes r.Reasoning.Effort (it is already used later for echoing reasoning config back to the client) but never converts it into the api.ChatRequest.Think field. The Chat Completions path at openai/openai.go:625-644 already performs this mapping.

Fix

Mirror the existing Chat Completions logic in FromResponsesRequest:

  • effort == "none" → Think{Value: false} (suppresses thinking)
  • effort == "high" | "medium" | "low" → Think{Value: <effort>}
  • any other value → the same validation error the Chat Completions path returns

Validation, error message, and accepted effort values are identical across the two endpoints, so behavior stays consistent.

Tests

TestFromResponsesRequest_ReasoningEffort covers:

  • "none" disables thinking (Think.Value == false)
  • "low", "medium", "high" set the string value
  • missing reasoning leaves Think unset
  • invalid effort returns an error

All existing ./openai/... and ./middleware/... tests still pass.
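The four cases listed above fit Go's usual table-driven layout. The sketch below is hypothetical and is not the contents of openai/responses_test.go. mapEffort stands in for the conversion inside FromResponsesRequest, and a nil effort models a request with no reasoning object. The loop runs from main so the file works with go run instead of go test.

```go
package main

import "fmt"

// mapEffort is a hypothetical stand-in for the conversion inside
// FromResponsesRequest. A nil effort means the request had no "reasoning"
// object. It returns whether Think was set, its value, and any error.
func mapEffort(effort *string) (bool, any, error) {
	if effort == nil {
		return false, nil, nil // Think left unset
	}
	switch *effort {
	case "none":
		return true, false, nil
	case "low", "medium", "high":
		return true, *effort, nil
	default:
		return false, nil, fmt.Errorf("invalid reasoning effort %q", *effort)
	}
}

func strPtr(s string) *string { return &s }

type testCase struct {
	name      string
	effort    *string
	wantSet   bool
	wantValue any
	wantErr   bool
}

var cases = []testCase{
	{"none disables thinking", strPtr("none"), true, false, false},
	{"low", strPtr("low"), true, "low", false},
	{"medium", strPtr("medium"), true, "medium", false},
	{"high", strPtr("high"), true, "high", false},
	{"missing reasoning leaves Think unset", nil, false, nil, false},
	{"invalid effort errors", strPtr("extreme"), false, nil, true},
}

// runCases returns the names of failing cases (empty when all pass).
func runCases() []string {
	var failed []string
	for _, tc := range cases {
		set, val, err := mapEffort(tc.effort)
		if (err != nil) != tc.wantErr || set != tc.wantSet || val != tc.wantValue {
			failed = append(failed, tc.name)
		}
	}
	return failed
}

func main() {
	failed := runCases()
	fmt.Printf("%d cases, %d failed %v\n", len(cases), len(failed), failed)
}
```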

Notes

  • Only reasoning.effort is handled here — OpenAI's Responses API does not expose a flat reasoning_effort (that is a Chat Completions shape). Clients that sent {"reasoning_effort": "none"} on /v1/responses before this PR were effectively sending an unknown field; after the PR they should move to the canonical {"reasoning": {"effort": "none"}} form (which also did not work before).
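The following Go sketch shows why the flat field had no effect. It decodes both request shapes into a trimmed Responses struct; responsesBody and effortOf are illustrative, hypothetical names. The sketch assumes the server decodes with encoding/json's default behavior, which silently ignores unknown keys such as reasoning_effort. Whether Ollama's handler decodes exactly this way is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// responsesBody is a hypothetical, trimmed Responses request shape: effort is
// nested under "reasoning", and there is no flat "reasoning_effort" field.
type responsesBody struct {
	Model     string `json:"model"`
	Reasoning *struct {
		Effort string `json:"effort"`
	} `json:"reasoning,omitempty"`
}

// effortOf decodes a request body and returns the nested effort, or "" when
// the reasoning object is absent.
func effortOf(body string) (string, error) {
	var r responsesBody
	if err := json.Unmarshal([]byte(body), &r); err != nil {
		return "", err
	}
	if r.Reasoning == nil {
		return "", nil
	}
	return r.Reasoning.Effort, nil
}

func main() {
	flat := `{"model":"gemma4:e2b","input":"hi","reasoning_effort":"none"}`
	nested := `{"model":"gemma4:e2b","input":"hi","reasoning":{"effort":"none"}}`

	e1, _ := effortOf(flat) // "reasoning_effort" is an unknown key here and is dropped
	e2, _ := effortOf(nested)
	fmt.Printf("flat=%q nested=%q\n", e1, e2)
}
```

Decoding alone is not enough. Before the PR, even the nested value reached the request struct but was never mapped onto Think.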

Changed files

  • openai/responses.go (modified, +14/-0)
  • openai/responses_test.go (modified, +73/-0)

What is the issue?

Description

The /v1/chat/completions endpoint correctly disables thinking when reasoning_effort: "none" is set, but the /v1/responses endpoint ignores it entirely. The model still produces full thinking output, adding significant latency.

Environment

  • Ollama: HEAD-2bb7ea0 (also reproduced on 0.20.7 stable)
  • Model: gemma4:e2b
  • OS: macOS (Apple Silicon)

Reproduction

Chat Completions: works correctly (0.5s, no thinking):

curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{ "model": "gemma4:e2b", "messages": [{"role": "user", "content": "Reply with ONLY: [1,3]"}], "reasoning_effort": "none", "stream": false }'

Result: 6 tokens, no reasoning field, ~0.5s.

Responses API: thinking NOT disabled (4.7s, full reasoning):

curl -s http://localhost:11434/v1/responses \
  -H "Content-Type: application/json" \
  -d '{ "model": "gemma4:e2b", "input": "Reply with ONLY: [1,3]", "reasoning_effort": "none", "stream": false }'

Result: 200+ tokens, a full reasoning output block with thinking content, ~4.7s.

Also tried "reasoning": {"effort": "none"} on the Responses endpoint — same result.

Expected behavior

Both endpoints should honor reasoning_effort: "none" and suppress thinking output. The Ollama docs list "Reasoning/thinking control (for thinking models)" as a supported feature for Chat Completions, and list reasoning_effort in the Responses API supported fields.

Impact

This is a significant latency issue for applications using thinking models as lightweight classifiers via the Responses API. In our case, we use gemma4:e2b as a relevance judge for memory recall: it should return a JSON array in ~0.5s but takes ~5s due to unnecessary thinking.

Relevant log output

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

Ollama: HEAD-2bb7ea0 (also reproduced on 0.20.7 stable)

extent analysis

TL;DR

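The /v1/responses endpoint ignored the requested effort because FromResponsesRequest never mapped reasoning.effort onto the request's Think field. PR #15664 adds that mapping, so {"reasoning": {"effort": "none"}} disables thinking once the fix is in your build.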

Guidance

  • Confirm how /v1/responses handles reasoning effort in the Ollama code. Before the fix, FromResponsesRequest in openai/responses.go read r.Reasoning.Effort but never set api.ChatRequest.Think.
  • Send the canonical nested form "reasoning": {"effort": "none"} rather than the flat reasoning_effort field, which is a Chat Completions parameter. The issue reports that this form also failed before the fix.
  • Compare with /v1/chat/completions (openai/openai.go:625-644), where reasoning_effort is already mapped to Think, and mirror that logic.
  • Track PR #15664 (openai: honor reasoning_effort in /v1/responses endpoint), which fixes #15635.

Example

The curl reproductions in the issue serve as the example. After PR #15664, the /v1/responses request should carry "reasoning": {"effort": "none"} instead of "reasoning_effort": "none".

Notes

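The issue is not specific to HEAD-2bb7ea0, since it was also reproduced on 0.20.7 stable. Because the root cause is endpoint-level (the missing effort-to-Think mapping in FromResponsesRequest), it likely affects other thinking models used through /v1/responses as well, not just gemma4:e2b.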

Recommendation

Apply workaround: until a build with PR #15664 is available, call /v1/chat/completions with "reasoning_effort": "none". The report shows that path already suppresses thinking (~0.5s vs ~4.7s). After upgrading, use "reasoning": {"effort": "none"} on /v1/responses.
