openclaw - 💡(How to fix) Fix [Bug]: omlx Provider Fails to Report Token Usage — Dashboard Shows ?/131k (?%) and Compactions: 0

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Describe the bug

When using the omlx provider (MLX local inference via http://127.0.0.1:8200/v1), the OpenClaw dashboard UI fails to correctly display or track context window usage and compaction statistics. Specifically, the /status output shows 📚 Context: ?/131k (?%) and 🧹 Compactions: 0 even when the session has accumulated substantial conversation history.

The ? placeholders indicate that OpenClaw cannot resolve token counts from the omlx provider's response, making it impossible to monitor how close the session is to its 131k context window limit or verify whether auto-compaction is functioning properly.

To Reproduce

Configure the omlx provider in openclaw.json with a local MLX endpoint (e.g., http://127.0.0.1:8200/v1) serving Qwen3.6-35B-A3B-UD-MLX-4bit with contextWindow: 131072 Start a direct chat session and send multiple messages to build up conversation history Run /status or check the dashboard session card Observe 📚 Context: ?/131k (?%) instead of actual token counts like 45k/131k (34%) Run openclaw status --usage — token usage data is similarly unavailable for omlx sessions Expected behavior

The context usage should display actual token counts (e.g., 📚 Context: 45,231/131,072 (34%)) based on the provider's usage field in the completion response Compaction count should increment correctly when auto-compaction triggers near the context limit The ? and (?%) placeholders should only appear when token tracking is genuinely unavailable, not when the provider returns valid usage data Screenshots

N/A — the issue manifests as ? placeholders in the status display rather than a visual error.

Desktop (please complete the following information):

macOS Version: 26.5.7 oMLX Version: 0.3.8 (OpenClaw 2026.5.7, provider is a local MLX inference server on port 8200) Additional context

The omlx provider is configured as a standard OpenAI-compatible endpoint (api: "openai-completions") pointing to a local MLX inference server. The model definition includes contextWindow: 131072 and maxTokens: 8192.

The issue likely stems from one of these:

The MLX server's completion response doesn't include a properly structured usage object (with prompt_tokens, completion_tokens, total_tokens fields) that OpenClaw expects from OpenAI-compatible APIs OpenClaw's token counting logic doesn't handle the response format from this specific MLX backend correctly The cost fields are all 0 (free/self-hosted), which may cause the token accounting pipeline to skip usage tracking entirely The provider config looks like:

"omlx": { "baseUrl": "http://127.0.0.1:8200/v1", "apiKey": "omlx", "api": "openai-completions", "models": [{ "id": "Qwen3.6-35B-A3B-UD-MLX-4bit", "contextWindow": 131072, "maxTokens": 8192, "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 } }] } Other providers (mimo, volcengine, my-gpt) display token counts correctly, so this appears specific to the omlx/MLX provider path.


Compactions: 0 — the context compression feature is also broken. Instead of triggering when the context nears its threshold, it waits until overflow occurs, at which point OpenClaw force-triggers a compaction. But this halts the running task, which then restarts from scratch, overflows again, and gets stuck in an endless loop.

Error Message

N/A — the issue manifests as ? placeholders in the status display rather than a visual error.

Root Cause

Describe the bug

When using the omlx provider (MLX local inference via http://127.0.0.1:8200/v1), the OpenClaw dashboard UI fails to correctly display or track context window usage and compaction statistics. Specifically, the /status output shows 📚 Context: ?/131k (?%) and 🧹 Compactions: 0 even when the session has accumulated substantial conversation history.

The ? placeholders indicate that OpenClaw cannot resolve token counts from the omlx provider's response, making it impossible to monitor how close the session is to its 131k context window limit or verify whether auto-compaction is functioning properly.

To Reproduce

Configure the omlx provider in openclaw.json with a local MLX endpoint (e.g., http://127.0.0.1:8200/v1) serving Qwen3.6-35B-A3B-UD-MLX-4bit with contextWindow: 131072 Start a direct chat session and send multiple messages to build up conversation history Run /status or check the dashboard session card Observe 📚 Context: ?/131k (?%) instead of actual token counts like 45k/131k (34%) Run openclaw status --usage — token usage data is similarly unavailable for omlx sessions Expected behavior

The context usage should display actual token counts (e.g., 📚 Context: 45,231/131,072 (34%)) based on the provider's usage field in the completion response Compaction count should increment correctly when auto-compaction triggers near the context limit The ? and (?%) placeholders should only appear when token tracking is genuinely unavailable, not when the provider returns valid usage data Screenshots

N/A — the issue manifests as ? placeholders in the status display rather than a visual error.

Desktop (please complete the following information):

macOS Version: 26.5.7 oMLX Version: 0.3.8 (OpenClaw 2026.5.7, provider is a local MLX inference server on port 8200) Additional context

The omlx provider is configured as a standard OpenAI-compatible endpoint (api: "openai-completions") pointing to a local MLX inference server. The model definition includes contextWindow: 131072 and maxTokens: 8192.

The issue likely stems from one of these:

The MLX server's completion response doesn't include a properly structured usage object (with prompt_tokens, completion_tokens, total_tokens fields) that OpenClaw expects from OpenAI-compatible APIs OpenClaw's token counting logic doesn't handle the response format from this specific MLX backend correctly The cost fields are all 0 (free/self-hosted), which may cause the token accounting pipeline to skip usage tracking entirely The provider config looks like:

"omlx": { "baseUrl": "http://127.0.0.1:8200/v1", "apiKey": "omlx", "api": "openai-completions", "models": [{ "id": "Qwen3.6-35B-A3B-UD-MLX-4bit", "contextWindow": 131072, "maxTokens": 8192, "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 } }] } Other providers (mimo, volcengine, my-gpt) display token counts correctly, so this appears specific to the omlx/MLX provider path.


Compactions: 0 — the context compression feature is also broken. Instead of triggering when the context nears its threshold, it waits until overflow occurs, at which point OpenClaw force-triggers a compaction. But this halts the running task, which then restarts from scratch, overflows again, and gets stuck in an endless loop.

RAW_BUFFERClick to expand / collapse

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

Describe the bug

When using the omlx provider (MLX local inference via http://127.0.0.1:8200/v1), the OpenClaw dashboard UI fails to correctly display or track context window usage and compaction statistics. Specifically, the /status output shows 📚 Context: ?/131k (?%) and 🧹 Compactions: 0 even when the session has accumulated substantial conversation history.

The ? placeholders indicate that OpenClaw cannot resolve token counts from the omlx provider's response, making it impossible to monitor how close the session is to its 131k context window limit or verify whether auto-compaction is functioning properly.

To Reproduce

Configure the omlx provider in openclaw.json with a local MLX endpoint (e.g., http://127.0.0.1:8200/v1) serving Qwen3.6-35B-A3B-UD-MLX-4bit with contextWindow: 131072 Start a direct chat session and send multiple messages to build up conversation history Run /status or check the dashboard session card Observe 📚 Context: ?/131k (?%) instead of actual token counts like 45k/131k (34%) Run openclaw status --usage — token usage data is similarly unavailable for omlx sessions Expected behavior

The context usage should display actual token counts (e.g., 📚 Context: 45,231/131,072 (34%)) based on the provider's usage field in the completion response Compaction count should increment correctly when auto-compaction triggers near the context limit The ? and (?%) placeholders should only appear when token tracking is genuinely unavailable, not when the provider returns valid usage data Screenshots

N/A — the issue manifests as ? placeholders in the status display rather than a visual error.

Desktop (please complete the following information):

macOS Version: 26.5.7 oMLX Version: 0.3.8 (OpenClaw 2026.5.7, provider is a local MLX inference server on port 8200) Additional context

The omlx provider is configured as a standard OpenAI-compatible endpoint (api: "openai-completions") pointing to a local MLX inference server. The model definition includes contextWindow: 131072 and maxTokens: 8192.

The issue likely stems from one of these:

The MLX server's completion response doesn't include a properly structured usage object (with prompt_tokens, completion_tokens, total_tokens fields) that OpenClaw expects from OpenAI-compatible APIs OpenClaw's token counting logic doesn't handle the response format from this specific MLX backend correctly The cost fields are all 0 (free/self-hosted), which may cause the token accounting pipeline to skip usage tracking entirely The provider config looks like:

"omlx": { "baseUrl": "http://127.0.0.1:8200/v1", "apiKey": "omlx", "api": "openai-completions", "models": [{ "id": "Qwen3.6-35B-A3B-UD-MLX-4bit", "contextWindow": 131072, "maxTokens": 8192, "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 } }] } Other providers (mimo, volcengine, my-gpt) display token counts correctly, so this appears specific to the omlx/MLX provider path.


Compactions: 0 — the context compression feature is also broken. Instead of triggering when the context nears its threshold, it waits until overflow occurs, at which point OpenClaw force-triggers a compaction. But this halts the running task, which then restarts from scratch, overflows again, and gets stuck in an endless loop.

Steps to reproduce

  1. https://github.com/jundot/omlx
  2. run omlx server
  3. '/Applications/oMLX.app/Contents/MacOS/omlx-cli' launch openclaw --model 'Qwen3.6-35B-A3B-UD-MLX-4bit' --api-key 'omlx' --tools-profile 'full'
  4. openclaw web ui ; /status ; 📚 Context: ?/131k (?%)

Expected behavior

show 📚 Context: 45,231/131,072 (34%))

Actual behavior

📚 Context: ?/131k (?%) Compactions: 0

Compactions: 0 — the context compression feature is also broken. Instead of triggering when the context nears its threshold, it waits until overflow occurs, at which point OpenClaw force-triggers a compaction. But this halts the running task, which then restarts from scratch, overflows again, and gets stuck in an endless loop.

OpenClaw version

2026.5.7

Operating system

macOS 26.5.7

Install method

npm

Model

omlx & local model & qwen3.6

Provider / routing chain

openclaw -> omlx -> local qwen3.6

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

Now the local model is almost unusable. I have to set the context window very large just to keep things running, but that means my machine is constantly on the verge of crashing as the context grows. If I set the context window too small, frequent overflows make it impossible to execute tasks at all.

Additional information

Looking forward to the fix

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

show 📚 Context: 45,231/131,072 (34%))

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix [Bug]: omlx Provider Fails to Report Token Usage — Dashboard Shows ?/131k (?%) and Compactions: 0