openclaw - ✅(Solved) Fix [Bug]: mlx-vlm report 0% context usage — prompt_tokens vs input_tokens field mismatch [2 pull requests, 5 comments, 4 participants]

GitHub stats
openclaw/openclaw#49316, fetched 2026-04-08 00:56:37
Comments: 5 · Participants: 4 · Timeline: 13 · Reactions: 0
Timeline (top): commented ×5, cross-referenced ×4, labeled ×2, referenced ×1

When using mlx-vlm as a local model server via the openai-completions API, the Telegram status always shows Context: 0/Xk (0%), regardless of actual token usage.

Root Cause

mlx-vlm returns usage fields with different names than the OpenAI standard:

Field          | OpenAI standard    | mlx-vlm / vLLM
Input tokens   | prompt_tokens      | input_tokens
Output tokens  | completion_tokens  | output_tokens

Example mlx-vlm streaming response:

{
  "usage": {
    "input_tokens": 11,
    "output_tokens": 5,
    "total_tokens": 16,
    "prompt_tps": 21.55,
    "generation_tps": 81.56
  }
}

Note: prompt_tokens and completion_tokens are absent.

parseChunkUsage() in @mariozechner/pi-ai/dist/providers/openai-completions.js only reads prompt_tokens/completion_tokens, resulting in 0 for both input and output.
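
The mismatch can be reproduced in isolation. The sketch below is not the actual parseChunkUsage() source; readStandardUsage is a hypothetical stand-in that reads only the OpenAI-standard names, applied to the usage payload from the example above.

```typescript
// Shape of the usage object; every field is optional because servers differ.
interface RawUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  input_tokens?: number;
  output_tokens?: number;
  total_tokens?: number;
}

// Hypothetical stand-in for a parser that only knows the OpenAI-standard names.
function readStandardUsage(usage: RawUsage): { input: number; output: number } {
  return {
    input: usage.prompt_tokens ?? 0,
    output: usage.completion_tokens ?? 0,
  };
}

// Usage payload from the mlx-vlm example above.
const mlxUsage: RawUsage = { input_tokens: 11, output_tokens: 5, total_tokens: 16 };

const parsed = readStandardUsage(mlxUsage);
console.log(parsed); // { input: 0, output: 0 }
```

Because both counts come back as 0, anything computed from them, including the context percentage, stays at zero.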

Fix Action

Fixed

PR fix notes

PR #2325: fix: fall back to input_tokens/output_tokens for OpenAI-compatible servers

Description (problem / solution / changelog)

Problem

Some OpenAI-compatible servers (mlx-vlm, vLLM) return usage fields with different names than the OpenAI standard:

Field          | OpenAI standard    | mlx-vlm / vLLM
Input tokens   | prompt_tokens      | input_tokens
Output tokens  | completion_tokens  | output_tokens

parseChunkUsage() in openai-completions.ts only reads prompt_tokens/completion_tokens, so when these fields are absent, both input and output are reported as 0 — causing context usage to always show 0%.

Fix

Use nullish coalescing (??) to fall back to input_tokens/output_tokens when the standard fields are not present:

const input = (rawUsage.prompt_tokens ?? rawUsage.input_tokens ?? 0) - cachedTokens;
const outputTokens = (rawUsage.completion_tokens ?? rawUsage.output_tokens ?? 0) + reasoningTokens;

Using ?? instead of || ensures that an explicit 0 value for prompt_tokens is respected and doesn't trigger the fallback.
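
A self-contained sketch of the fallback follows. It assumes cachedTokens and reasoningTokens are already-extracted numbers, since the PR excerpt does not show how they are derived; parseUsage and its parameters are hypothetical names used only for illustration.

```typescript
interface RawUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  input_tokens?: number;
  output_tokens?: number;
}

// Hypothetical helper mirroring the two lines from the PR.
// cachedTokens and reasoningTokens default to 0 here; how the real code
// derives them is not shown in the PR description.
function parseUsage(
  rawUsage: RawUsage,
  cachedTokens = 0,
  reasoningTokens = 0,
): { input: number; output: number } {
  const input = (rawUsage.prompt_tokens ?? rawUsage.input_tokens ?? 0) - cachedTokens;
  const output = (rawUsage.completion_tokens ?? rawUsage.output_tokens ?? 0) + reasoningTokens;
  return { input, output };
}

// mlx-vlm style payload: falls back to the alias fields.
const mlx = parseUsage({ input_tokens: 11, output_tokens: 5 });

// An explicit 0 in the standard field is kept; `||` would have fallen through to 7.
const explicitZero = parseUsage({ prompt_tokens: 0, input_tokens: 7 });

console.log(mlx, explicitZero);
```

The second call is the case the ?? choice protects: a server that legitimately reports prompt_tokens as 0 is not overridden by an alias field.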

Related

Fixes openclaw/openclaw#49316

Changed files

  • packages/ai/src/providers/openai-completions.ts (modified, +6/-2)

PR #49357: fix: normalize mlx-vlm usage token aliases

Description (problem / solution / changelog)

Summary

  • normalize input_tokens / output_tokens usage aliases in normalizeUsage()
  • add a regression test covering mlx-vlm / vLLM-style usage payloads

Fixes #49316.

Notes

  • This keeps the change narrowly scoped to usage normalization so /status and related displays can consume mlx-vlm token usage correctly.
  • I did not run the full workspace test suite in this fresh shallow clone because dependencies are not installed by default in this cron flow.
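
As a sketch only, here is what such a normalization and regression check might look like. The actual normalizeUsage() in src/agents/usage.ts is not shown in the PR description, so the function name normalizeUsageSketch, its signature, and the NormalizedUsage shape below are all assumptions.

```typescript
// Assumed output shape; the real type in src/agents/usage.ts may differ.
interface NormalizedUsage {
  input: number;
  output: number;
  total: number;
}

// Hypothetical normalizer accepting both the OpenAI names and the aliases.
function normalizeUsageSketch(raw: Record<string, unknown>): NormalizedUsage {
  // Return the first key that holds a finite number, or undefined if none does.
  const num = (...keys: string[]): number | undefined => {
    for (const key of keys) {
      const value = raw[key];
      if (typeof value === "number" && Number.isFinite(value)) return value;
    }
    return undefined;
  };
  const input = num("prompt_tokens", "input_tokens") ?? 0;
  const output = num("completion_tokens", "output_tokens") ?? 0;
  const total = num("total_tokens") ?? input + output;
  return { input, output, total };
}

// Regression-style check with an mlx-vlm / vLLM-style payload.
const normalized = normalizeUsageSketch({
  input_tokens: 11,
  output_tokens: 5,
  total_tokens: 16,
  prompt_tps: 21.55,
  generation_tps: 81.56,
});
console.log(normalized);
```

Normalizing at this layer keeps the alias handling in one place, so /status and similar displays do not each need to know about provider-specific field names.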

Changed files

  • src/agents/usage.test.ts (modified, +15/-0)
  • src/agents/usage.ts (modified, +4/-0)

Bug type

Behavior bug (incorrect output/state without crash)

Steps to reproduce

  1. Set up mlx-vlm server (python -m mlx_vlm.server --host 127.0.0.1 --port 8000)
  2. Configure openclaw with mlx-vlm as openai-completions provider:
    "mlx-local": {
      "baseUrl": "http://127.0.0.1:8000/v1",
      "apiKey": "none",
      "api": "openai-completions",
      "models": [{ "id": "./models/Qwen3.5-35B-A3B-bf16", "contextWindow": 131072 }]
    }
  3. Send any message via Telegram
  4. Run /status

Expected behavior

Context should display actual usage, e.g. Context: 14k/131k (11%)

Actual behavior

Context always shows Context: 0/131k (0%) even after multiple messages.
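
The displayed value follows from the parsed input-token count and the configured contextWindow. Below is a rough sketch of that arithmetic; formatContext and its rounding rules are assumptions chosen to match the strings in this report, not OpenClaw's actual formatter.

```typescript
// Hypothetical formatter: rounds token counts to the nearest thousand
// and the ratio to the nearest whole percent.
function formatContext(usedTokens: number, contextWindow: number): string {
  const k = (n: number): string => (n >= 1000 ? `${Math.round(n / 1000)}k` : `${n}`);
  const percent = contextWindow > 0 ? Math.round((usedTokens / contextWindow) * 100) : 0;
  return `Context: ${k(usedTokens)}/${k(contextWindow)} (${percent}%)`;
}

// With usage parsed as 0, the status never moves.
console.log(formatContext(0, 131072)); // Context: 0/131k (0%)

// With roughly 14k input tokens correctly parsed.
console.log(formatContext(14000, 131072)); // Context: 14k/131k (11%)
```

For the second call, 14000 / 131072 is about 0.107, which rounds to 11%.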

OpenClaw version

OpenClaw: 2026.3.13 (61d171a)

Operating system

macOS Darwin 25.3.0 (Apple Silicon)

Install method

npm

Model

Model: Qwen3.5-35B-A3B-bf16 (65GB, bf16)

Provider / routing chain

mlx-vlm (openai-completions) → http://127.0.0.1:8000/v1 → local Qwen3.5-35B-A3B-bf16

extent analysis

Fix Plan

To fix the issue, parseChunkUsage() in @mariozechner/pi-ai (shipped as dist/providers/openai-completions.js) must accept the usage field names that mlx-vlm returns, without dropping support for the OpenAI-standard names.

  • Reading only input_tokens and output_tokens makes mlx-vlm work, but servers that send the standard prompt_tokens/completion_tokens would then report 0:
function parseChunkUsage(chunk) {
  // Guard against chunks that carry no usage object.
  const usage = chunk.usage ?? {};
  const inputTokens = usage.input_tokens ?? 0;
  const outputTokens = usage.output_tokens ?? 0;
  // ...
}
  • Preferably, handle both cases: read the standard names first, fall back to the aliases, and use ?? rather than || so that an explicit 0 is respected:
function parseChunkUsage(chunk) {
  // Guard against chunks that carry no usage object.
  const usage = chunk.usage ?? {};
  const inputTokens = usage.prompt_tokens ?? usage.input_tokens ?? 0;
  const outputTokens = usage.completion_tokens ?? usage.output_tokens ?? 0;
  // ...
}

Verification

To verify that the fix worked, you can:

  • Send a message via Telegram
  • Run /status and check that the Context displays the actual usage, e.g. Context: 14k/131k (11%)

Extra Tips

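  • Changes made directly to files under dist/ in node_modules are lost when the package is reinstalled. Once a release of @mariozechner/pi-ai containing the fix is available, update to it instead of keeping a local patch.
  • If you're using a custom build of @mariozechner/pi-ai, apply the change to the TypeScript source and rebuild the package.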
