openclaw - ✅(Solved) Fix [Bug]: mlx-vlm report 0% context usage — prompt_tokens vs input_tokens field mismatch [2 pull requests, 5 comments, 4 participants]

Q: Expected behavior

Context should display actual usage, e.g. `Context: 14k/131k (11%)`

openclaw · 2026-03-18 00:17:18

openclaw/openclaw#49316•Fetched 2026-04-08 00:56:37

When mlx-vlm is used as a local model server through the openai-completions API, the Telegram status always shows Context: 0/Xk (0%), regardless of actual token usage.

Root Cause

mlx-vlm returns usage fields with different names than the OpenAI standard:

Field	OpenAI Standard	mlx-vlm / vLLM
Input tokens	`prompt_tokens`	`input_tokens`
Output tokens	`completion_tokens`	`output_tokens`

Example mlx-vlm streaming response:

{
  "usage": {
    "input_tokens": 11,
    "output_tokens": 5,
    "total_tokens": 16,
    "prompt_tps": 21.55,
    "generation_tps": 81.56
  }
}

Note: prompt_tokens and completion_tokens are absent.

parseChunkUsage() in @mariozechner/pi-ai/dist/providers/openai-completions.js only reads prompt_tokens/completion_tokens, resulting in 0 for both input and output.
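
The mismatch can be reproduced in isolation. The sketch below is not the actual parseChunkUsage() source: `RawUsage` and `readStandardUsage` are hypothetical names, and the function only mimics the behavior described above (read the standard field names, default to 0). The `prompt_tps` and `generation_tps` fields from the example payload are left out for brevity.

```typescript
// Hypothetical shape covering both naming schemes; every field is optional.
interface RawUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  input_tokens?: number;
  output_tokens?: number;
  total_tokens?: number;
}

// Mimics a parser that only knows the OpenAI standard field names.
function readStandardUsage(usage: RawUsage): { input: number; output: number } {
  return {
    input: usage.prompt_tokens ?? 0,
    output: usage.completion_tokens ?? 0,
  };
}

// Token counts taken from the mlx-vlm example response above.
const mlxVlmUsage: RawUsage = { input_tokens: 11, output_tokens: 5, total_tokens: 16 };

// Neither standard field is present, so both counts fall through to 0.
const parsed = readStandardUsage(mlxVlmUsage);
```

With both counts at 0, any context display derived from them stays at 0%, even though `total_tokens` in the same payload is 16.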

Fix Action

Fixes were proposed in two pull requests; neither was recorded as merged when this page was fetched:

PR: fix: fall back to input_tokens/output_tokens for OpenAI-compatible servers (https://github.com/badlogic/pi-mono/pull/2325), closed without merge
PR: fix: normalize mlx-vlm usage token aliases (https://github.com/openclaw/openclaw/pull/49357), open

PR fix notes

PR #2325: fix: fall back to input_tokens/output_tokens for OpenAI-compatible servers

Repository: badlogic/pi-mono
Author: ddpie
State: closed | merged: False
Link: https://github.com/badlogic/pi-mono/pull/2325

Description (problem / solution / changelog)

Problem

Some OpenAI-compatible servers (mlx-vlm, vLLM) return usage fields with different names than the OpenAI standard:

Field	OpenAI Standard	mlx-vlm / vLLM
Input tokens	`prompt_tokens`	`input_tokens`
Output tokens	`completion_tokens`	`output_tokens`

parseChunkUsage() in openai-completions.ts only reads prompt_tokens/completion_tokens, so when these fields are absent, both input and output are reported as 0 — causing context usage to always show 0%.

Fix

Use nullish coalescing (??) to fall back to input_tokens/output_tokens when the standard fields are not present:

const input = (rawUsage.prompt_tokens ?? rawUsage.input_tokens ?? 0) - cachedTokens;
const outputTokens = (rawUsage.completion_tokens ?? rawUsage.output_tokens ?? 0) + reasoningTokens;

Using ?? instead of || ensures that an explicit 0 value for prompt_tokens is respected and doesn't trigger the fallback.
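
The difference between ?? and || only shows up when a standard field is present but set to 0. A minimal sketch, assuming a hypothetical payload that carries both field names; `UsageFields`, `withNullish`, and `withLogicalOr` are illustrative names and not part of pi-ai:

```typescript
// Hypothetical subset of a usage payload, input side only.
interface UsageFields {
  prompt_tokens?: number;
  input_tokens?: number;
}

// Falls back only when prompt_tokens is null or undefined.
function withNullish(u: UsageFields): number {
  return u.prompt_tokens ?? u.input_tokens ?? 0;
}

// Falls back on any falsy value, which includes an explicit 0.
function withLogicalOr(u: UsageFields): number {
  return u.prompt_tokens || u.input_tokens || 0;
}

const explicitZero: UsageFields = { prompt_tokens: 0, input_tokens: 11 };
const aliasOnly: UsageFields = { input_tokens: 11 };

const keptZero = withNullish(explicitZero);      // 0: the explicit value is respected
const overridden = withLogicalOr(explicitZero);  // 11: the 0 is treated as missing
const fromAlias = withNullish(aliasOnly);        // 11: the alias is used when the standard field is absent
```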

Fixes openclaw/openclaw#49316

Changed files

packages/ai/src/providers/openai-completions.ts (modified, +6/-2)

PR #49357: fix: normalize mlx-vlm usage token aliases

Repository: openclaw/openclaw
Author: choiking
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/49357

Description (problem / solution / changelog)

Summary

Normalize input_tokens / output_tokens usage aliases in normalizeUsage().
Add a regression test covering mlx-vlm / vLLM-style usage payloads.

Fixes #49316.

Notes

This keeps the change narrowly scoped to usage normalization so /status and related displays can consume mlx-vlm token usage correctly.
I did not run the full workspace test suite in this fresh shallow clone because dependencies are not installed by default in this cron flow.
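
As an illustration of what alias normalization plus a regression check could look like, here is a minimal sketch. It is not the code from src/agents/usage.ts: `normalizeUsageSketch`, `RawUsage`, and `NormalizedUsage` are hypothetical names, and the real normalizeUsage() signature may differ.

```typescript
// Hypothetical raw payload accepting either naming scheme.
interface RawUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  input_tokens?: number;
  output_tokens?: number;
  total_tokens?: number;
}

// Hypothetical canonical shape consumed by status displays.
interface NormalizedUsage {
  input: number;
  output: number;
  total: number;
}

// Prefer the standard names, fall back to the aliases, then to 0.
function normalizeUsageSketch(raw: RawUsage | undefined): NormalizedUsage {
  const input = raw?.prompt_tokens ?? raw?.input_tokens ?? 0;
  const output = raw?.completion_tokens ?? raw?.output_tokens ?? 0;
  // Use the reported total when present, otherwise derive it.
  const total = raw?.total_tokens ?? (input + output);
  return { input, output, total };
}

// Regression-style inputs: the same counts under both naming schemes.
const openAiStyle = normalizeUsageSketch({ prompt_tokens: 11, completion_tokens: 5, total_tokens: 16 });
const mlxVlmStyle = normalizeUsageSketch({ input_tokens: 11, output_tokens: 5, total_tokens: 16 });
const missingUsage = normalizeUsageSketch(undefined);
```

Both payload styles should normalize to the same result, and a chunk without any usage object should yield zeros rather than throw.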

Changed files

src/agents/usage.test.ts (modified, +15/-0)
src/agents/usage.ts (modified, +4/-0)

Bug type

Behavior bug (incorrect output/state without crash)

Steps to reproduce

1. Set up the mlx-vlm server (python -m mlx_vlm.server --host 127.0.0.1 --port 8000).

2. Configure openclaw with mlx-vlm as an openai-completions provider:

"mlx-local": {
  "baseUrl": "http://127.0.0.1:8000/v1",
  "apiKey": "none",
  "api": "openai-completions",
  "models": [{ "id": "./models/Qwen3.5-35B-A3B-bf16", "contextWindow": 131072 }]
}

3. Send any message via Telegram.
4. Run /status.

Expected behavior

Context should display actual usage, e.g. Context: 14k/131k (11%)

Actual behavior

Context always shows Context: 0/131k (0%) even after multiple messages.
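
The arithmetic behind the two status lines can be sketched as follows. `formatContext` and `formatTokens` are hypothetical helpers written for this illustration; the rounding and "k" suffix rules are assumptions chosen to match the strings quoted in this report, not OpenClaw's actual formatting code.

```typescript
// Render a token count as "14k" style; values under 1000 are shown as-is (assumed rule).
function formatTokens(n: number): string {
  return n < 1000 ? String(n) : `${Math.round(n / 1000)}k`;
}

// Build a status line from used tokens and the configured context window.
function formatContext(usedTokens: number, contextWindow: number): string {
  const percent = contextWindow > 0 ? Math.round((usedTokens / contextWindow) * 100) : 0;
  return `Context: ${formatTokens(usedTokens)}/${formatTokens(contextWindow)} (${percent}%)`;
}

// contextWindow comes from the provider config in the steps to reproduce.
const contextWindow = 131072;

// Expected: 14000 / 131072 is about 10.7%, which rounds to 11%.
const expectedLine = formatContext(14000, contextWindow);

// Actual: the parsed usage is 0, so the display never moves.
const actualLine = formatContext(0, contextWindow);
```

Under these assumptions `expectedLine` is `Context: 14k/131k (11%)` and `actualLine` is `Context: 0/131k (0%)`, which matches the reported symptom: the window size is read correctly from the config, and only the used-token count is wrong.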

OpenClaw version

OpenClaw: 2026.3.13 (61d171a)

Operating system

macOS Darwin 25.3.0 (Apple Silicon)

Install method

npm

Model

Model: Qwen3.5-35B-A3B-bf16 (65GB, bf16)

Provider / routing chain

mlx-vlm (openai-completions) → http://127.0.0.1:8000/v1 → local Qwen3.5-35B-A3B-bf16

Config file / key location

No response

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response

extent analysis

Fix Plan

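To fix the issue, parseChunkUsage() in @mariozechner/pi-ai (source file packages/ai/src/providers/openai-completions.ts, shipped as dist/providers/openai-completions.js) needs to accept the usage field names that mlx-vlm returns.

Reading input_tokens and output_tokens instead of prompt_tokens and completion_tokens would fix mlx-vlm but break servers that return the standard names, so this variant is not recommended:

function parseChunkUsage(chunk) {
  // Streaming chunks may not carry a usage object at all.
  const usage = chunk.usage ?? {};
  // Reads only the mlx-vlm names; prompt_tokens/completion_tokens are ignored.
  const inputTokens = usage.input_tokens ?? 0;
  const outputTokens = usage.output_tokens ?? 0;
  // ...
}

Instead, handle both cases. Prefer the standard names and use ?? rather than || so that an explicit 0 is respected:

function parseChunkUsage(chunk) {
  // Streaming chunks may not carry a usage object at all.
  const usage = chunk.usage ?? {};
  // Standard names first, mlx-vlm / vLLM aliases as fallback.
  const inputTokens = usage.prompt_tokens ?? usage.input_tokens ?? 0;
  const outputTokens = usage.completion_tokens ?? usage.output_tokens ?? 0;
  // ...
}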

Verification

To verify that the fix worked, you can:

Send a message via Telegram
Run /status and check that the Context displays the actual usage, e.g. Context: 14k/131k (11%)

Extra Tips

A patch applied directly to the compiled file under node_modules is overwritten when @mariozechner/pi-ai is reinstalled or updated, so prefer a released version that contains the fix once one is available.
If you're using a custom build of @mariozechner/pi-ai, rebuild the package after applying the fix.
