ollama: Add logprobs support to OpenAI-compatible /v1/chat/completions endpoint

What are you trying to do?

Expose the token-level logprobs already returned by Ollama's native /api/generate endpoint through the OpenAI-compatible /v1/chat/completions endpoint as well, matching the OpenAI API shape (logprobs: true, optional top_logprobs: N, response under choices[0].logprobs.content[].top_logprobs).
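For illustration, a client request with these fields might be built as in the sketch below. The helper name build_chat_request is hypothetical, and the base URL assumes Ollama's default local address; the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_chat_request(model, prompt, top_logprobs=5, base_url="http://localhost:11434"):
    """Hypothetical helper: build (but do not send) a logprobs-enabled chat request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "logprobs": True,              # ask for per-token log probabilities
        "top_logprobs": top_logprobs,  # number of alternatives per position
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",  # assumed default Ollama address
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("gemma4", "Hello")
```

Passing the request to urllib.request.urlopen would send it. With the behaviour described in this issue, the reply would come back without choices[0].logprobs, because the compat layer ignores both fields.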

Why

Ollama already computes and returns logprobs on /api/generate (see #13497). Tools built against the OpenAI SDK talk to Ollama via the /v1/chat/completions compat layer; one example is LLMbench (https://github.com/vector-lab-tools/LLMbench), a research instrument for the comparative close reading of LLM outputs. That layer currently drops the logprobs and top_logprobs request fields without any error or warning, so logprob-dependent features (token-probability visualisations, sampling-distribution probes, grammar-probe phase B/C, etc.) cannot run against models hosted locally through Ollama, even though the underlying runtime has the data.

The OpenAI-compat surface is the natural integration point for the wider tooling ecosystem (LangChain, LiteLLM, llm CLI, Vercel AI SDK, and research workbenches in the Critical AI / digital humanities space). Adding logprobs there would unlock token-level analysis against open-weight models for a large number of downstream tools without each having to special-case Ollama's native API.

Proposed shape

Match OpenAI's existing schema so SDKs don't need provider-specific branches:

Request:

{
  "model": "gemma4",
  "messages": [{"role": "user", "content": "Hello"}],
  "logprobs": true,
  "top_logprobs": 5
}

Response (choices[0].logprobs):

{
  "content": [
    {
      "token": "Hello",
      "logprob": -0.0012,
      "bytes": [72, 101, 108, 108, 111],
      "top_logprobs": [
        {"token": "Hello", "logprob": -0.0012, "bytes": [72, 101, 108, 108, 111]},
        {"token": "Hi",    "logprob": -7.42,   "bytes": [72, 105]}
      ]
    }
  ]
}
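As a rough sketch of what a downstream tool could do with this shape, the snippet below converts each logprob to a probability with exp() and decodes the bytes field. The helper name summarize_logprobs is hypothetical, and the sample payload is copied from the proposed response above.

```python
import math

# Sample payload copied from the proposed choices[0].logprobs shape.
sample_logprobs = {
    "content": [
        {
            "token": "Hello",
            "logprob": -0.0012,
            "bytes": [72, 101, 108, 108, 111],
            "top_logprobs": [
                {"token": "Hello", "logprob": -0.0012, "bytes": [72, 101, 108, 108, 111]},
                {"token": "Hi", "logprob": -7.42, "bytes": [72, 105]},
            ],
        }
    ]
}


def summarize_logprobs(logprobs):
    """Hypothetical helper: flatten choices[0].logprobs into per-token rows."""
    rows = []
    for entry in logprobs.get("content") or []:
        raw = entry.get("bytes")
        rows.append({
            "token": entry["token"],
            # A logprob is a natural log, so exp() recovers the probability.
            "probability": math.exp(entry["logprob"]),
            # Decode leniently: a single token may hold a partial UTF-8 sequence.
            "text_from_bytes": None if raw is None
            else bytes(raw).decode("utf-8", errors="replace"),
            "alternatives": [
                (alt["token"], math.exp(alt["logprob"]))
                for alt in entry.get("top_logprobs", [])
            ],
        })
    return rows


rows = summarize_logprobs(sample_logprobs)
```

The lenient decode is a deliberate choice: when one character is split across several tokens, concatenating the bytes of consecutive entries before decoding is likely the safer way to rebuild the text.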

The internal data is already present (#13497 shows it in /api/generate); this is mainly a translation/wiring task in the compat layer, plus honouring top_logprobs for the top-K view.
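The translation step could look roughly like the sketch below. It is written in Python purely for illustration (Ollama itself is written in Go), and the input shape is an assumption: a list of entries with token, logprob, and an optional top_logprobs list. The function name to_openai_logprobs and the way bytes is derived are likewise hypothetical, not Ollama's actual code.

```python
def to_openai_logprobs(native_entries, top_k=0):
    """Hypothetical translation: native-style entries -> OpenAI logprobs object.

    Assumed input: [{"token": str, "logprob": float, "top_logprobs": [...]}, ...]
    """
    def convert(entry):
        token = entry["token"]
        return {
            "token": token,
            "logprob": entry["logprob"],
            # Assumption: the token is valid text, so UTF-8 encoding yields its bytes.
            "bytes": list(token.encode("utf-8")),
        }

    content = []
    for entry in native_entries:
        item = convert(entry)
        # Honour top_logprobs: keep only the K most likely candidates.
        candidates = sorted(
            entry.get("top_logprobs", []),
            key=lambda c: c["logprob"],
            reverse=True,
        )
        item["top_logprobs"] = [convert(c) for c in candidates[:top_k]]
        content.append(item)
    return {"content": content}


native = [{
    "token": "Hello",
    "logprob": -0.0012,
    "top_logprobs": [
        {"token": "Hi", "logprob": -7.42},
        {"token": "Hello", "logprob": -0.0012},
    ],
}]
converted = to_openai_logprobs(native, top_k=1)
```

With top_k=1 only the most likely candidate survives, and with top_k=0 the top_logprobs list is empty while the per-token logprob is still reported.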

Related

  • #13497 — Logprobs on /api/generate and a UTF-8 display bug
  • #13638 — Logprobs not returned from Ollama Cloud API (closed)
  • #3795 — logit_bias support (related OpenAI-compat gap)

Happy to test against LLMbench's logprob views once a build is available.
