hermes: [Bug] state.db input_tokens and cache_read_tokens incorrectly recorded for MiMo/xiaomi provider

StepCodex · 2026-06-09T05:31:34Z


Bug Description

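Hermes state.db records incorrect token values for the MiMo (xiaomi) provider:

1. input_tokens is far higher than the miss tokens reported by the API: 879,344 in state.db versus 247,803 on the dashboard (about 3.5x), or about 3.9x the 226,818 miss figure derived from the agent log.
2. cache_read_tokens is always 0 even though caching is active (the API reports an 89.3% hit rate).

This makes state.db unusable for billing analysis and cache hit rate calculation.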

Evidence

API Official Data (from MiMo dashboard)

- Total Input: 2,318,203 tokens
- Cache Hit: 2,070,400 tokens (89.3%)
- Cache Miss: 247,803 tokens

Hermes Agent Log

- Total prompt_tokens (cumulative): 2,229,419 tokens
- Cache Hit (from log): 1,981,824 tokens
- Cache Miss (calculated): 226,818 tokens (note: total minus hit is 247,595, so this figure does not reconcile exactly)

Hermes state.db

SELECT input_tokens, cache_read_tokens FROM sessions WHERE id = 'e03df98b569c';
-- Result: 879344 | 0

DB shows input_tokens=879,344 (should be ~247,803) and cache_read_tokens=0 (should be ~2,070,400)
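
The figures above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the numbers quoted in this report; the variable names are illustrative and not taken from the Hermes code base.

```python
# Figures quoted in the Evidence section of this report.
api_total_input = 2_318_203
api_cache_hit = 2_070_400
api_cache_miss = 247_803

log_prompt_total = 2_229_419
log_cache_hit = 1_981_824

db_input_tokens = 879_344

# Dashboard hit + miss should add up to the dashboard total.
assert api_cache_hit + api_cache_miss == api_total_input

# Cache hit rate according to the dashboard.
hit_rate = api_cache_hit / api_total_input

# How far state.db's input_tokens is from the API-reported miss count.
overcount = db_input_tokens / api_cache_miss

# Miss count implied by the agent log (total minus hit).
log_miss_by_subtraction = log_prompt_total - log_cache_hit

print(f"API cache hit rate: {hit_rate:.1%}")
print(f"state.db input_tokens vs API miss: {overcount:.2f}x")
print(f"Agent log miss by subtraction: {log_miss_by_subtraction:,}")
```

Whichever baseline is used, the state.db value is several times larger than the miss count, while cache_read_tokens stays at 0.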

Root Cause Analysis

Code Flow

1. API Response Parsing (agent/usage_pricing.py:738-757):

    # OpenAI-compatible API (MiMo uses this)
    prompt_total = response.usage.prompt_tokens  # Total input (includes cache)
    cache_read_tokens = details.cached_tokens    # From prompt_tokens_details
    input_tokens = prompt_total - cache_read_tokens - cache_write_tokens  # Miss

2. Session Accumulation (agent/conversation_loop.py:1607-1610):

    agent.session_input_tokens += canonical_usage.input_tokens
    agent.session_cache_read_tokens += canonical_usage.cache_read_tokens

3. DB Storage (agent/turn_finalizer.py:341-344):

    "input_tokens": agent.session_input_tokens,
    "cache_read_tokens": agent.session_cache_read_tokens,

Hypothesis

The MiMo API does not return prompt_tokens_details.cached_tokens (or uses a non-standard field name), causing:

1. normalize_usage() returns cache_read_tokens=0
2. input_tokens = prompt_total - 0 = prompt_total (all input is treated as a miss)
3. The DB accumulates prompt_total instead of the actual miss count

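This would explain why:

- DB input_tokens (879,344) is much higher than the actual miss count (247,803)
- DB cache_read_tokens is 0 despite a cache hit rate of about 89%

The hypothesis is still unconfirmed: the raw usage payload has not been inspected, and 879,344 is well below the 2,229,419 cumulative prompt_tokens in the agent log, so the session row may not cover the same traffic as the dashboard totals.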
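
The effect described in the hypothesis can be reproduced with a small simulation. This is a simplified stand-in for the three stages in the code flow, not the real Hermes code: `CanonicalUsage`, `normalize_usage_sketch` and `accumulate` are hypothetical names, `cache_write_tokens` is left out for brevity, and the token counts are made up.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class CanonicalUsage:
    input_tokens: int
    cache_read_tokens: int


def normalize_usage_sketch(usage) -> CanonicalUsage:
    """Simplified stand-in for the parsing step (not the real normalize_usage)."""
    prompt_total = usage.prompt_tokens
    details = getattr(usage, "prompt_tokens_details", None)
    # If the details object is missing, the cache hit count silently becomes 0.
    cache_read = getattr(details, "cached_tokens", 0) or 0
    return CanonicalUsage(prompt_total - cache_read, cache_read)


def accumulate(turns) -> dict:
    """Sum per-turn usage the way the session counters do, then build the DB row."""
    session_input = 0
    session_cache_read = 0
    for usage in turns:
        canonical = normalize_usage_sketch(usage)
        session_input += canonical.input_tokens
        session_cache_read += canonical.cache_read_tokens
    return {"input_tokens": session_input, "cache_read_tokens": session_cache_read}


# Two turns that carry the standard OpenAI-style details object.
with_details = [
    SimpleNamespace(prompt_tokens=1000,
                    prompt_tokens_details=SimpleNamespace(cached_tokens=900)),
    SimpleNamespace(prompt_tokens=1200,
                    prompt_tokens_details=SimpleNamespace(cached_tokens=1000)),
]
# The same turns, but the provider omits prompt_tokens_details entirely.
without_details = [
    SimpleNamespace(prompt_tokens=1000),
    SimpleNamespace(prompt_tokens=1200),
]

print(accumulate(with_details))
print(accumulate(without_details))
```

With the details object present, only the miss tokens land in input_tokens. Without it, every prompt token is counted as a miss and cache_read_tokens stays at 0, which matches the pattern seen in state.db.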

Steps to Reproduce

1. Configure Hermes with the MiMo provider:

    providers:
      xiaomi:
        base_url: https://token-plan-cn.xiaomimimo.com/v1
        model: mimo-v2.5

2. Run a multi-turn conversation with WebUI or Gateway.
3. Check state.db:

    SELECT input_tokens, cache_read_tokens FROM sessions ORDER BY started_at DESC LIMIT 1;

4. Compare with the MiMo API dashboard data; the values will not match.
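
The state.db check can also be scripted. The sketch below uses Python's standard `sqlite3` module and only the table and column names that appear in the queries in this report; the rest of the schema is assumed, and `latest_session_hit_rate` is a hypothetical helper. It runs against an in-memory database so it is self-contained.

```python
import sqlite3


def latest_session_hit_rate(conn: sqlite3.Connection):
    """Return (input_tokens, cache_read_tokens, hit_rate) for the newest session."""
    row = conn.execute(
        "SELECT input_tokens, cache_read_tokens FROM sessions "
        "ORDER BY started_at DESC LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # Treat NULL columns as 0 so the arithmetic below cannot fail.
    input_tokens, cache_read = (value or 0 for value in row)
    total = input_tokens + cache_read
    hit_rate = cache_read / total if total else 0.0
    return input_tokens, cache_read, hit_rate


# Demo database: the schema is a guess that only includes the columns
# used by the queries in this report.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions ("
    "id TEXT, started_at REAL, input_tokens INTEGER, cache_read_tokens INTEGER)"
)
conn.execute("INSERT INTO sessions VALUES ('e03df98b569c', 1.0, 879344, 0)")

print(latest_session_hit_rate(conn))
```

To run it against a real installation, replace `":memory:"` with the path to state.db and drop the CREATE/INSERT statements. A hit rate of 0.0 on a provider with caching enabled is the symptom described here.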

Expected Behavior

state.db should record:

- input_tokens = actual miss tokens (new KV computation)
- cache_read_tokens = actual cache hit tokens

Actual Behavior

state.db records:

- input_tokens ≈ cumulative prompt_tokens (includes cache)
- cache_read_tokens = 0

Proposed Fix

Option 1: Add MiMo-specific usage parsing

Check whether MiMo returns cache data in a non-standard field, then add detection for it in normalize_usage():

# MiMo-specific: check for alternative cache fields
if not cache_read_tokens:
    cache_read_tokens = _to_int(getattr(response_usage, "cache_tokens", 0))
if not cache_read_tokens:
    cache_read_tokens = _to_int(getattr(response_usage, "cached_tokens", 0))
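
A slightly more general version of this fallback is sketched below. It walks a list of candidate field paths and accepts both dict and attribute-style usage objects. Only the first path is the standard OpenAI-style field; `cache_tokens` and top-level `cached_tokens` are the guesses from the snippet above and are not confirmed MiMo field names. `CANDIDATE_PATHS`, `_get` and `extract_cache_read_tokens` are hypothetical names.

```python
from types import SimpleNamespace

# Candidate locations for the cache-hit count. Only the first entry is the
# standard OpenAI-style field; the others are unconfirmed guesses.
CANDIDATE_PATHS = (
    ("prompt_tokens_details", "cached_tokens"),
    ("cache_tokens",),
    ("cached_tokens",),
)


def _get(obj, key):
    """Read a key from either a dict or an attribute-style object."""
    if obj is None:
        return None
    if isinstance(obj, dict):
        return obj.get(key)
    return getattr(obj, key, None)


def extract_cache_read_tokens(usage) -> int:
    """Return the first positive cache-hit count found, else 0."""
    for path in CANDIDATE_PATHS:
        value = usage
        for key in path:
            value = _get(value, key)
        if isinstance(value, bool):
            continue  # bool is a subclass of int; never treat it as a count
        if isinstance(value, (int, float)) and value > 0:
            return int(value)
    return 0


standard = SimpleNamespace(
    prompt_tokens=1000,
    prompt_tokens_details=SimpleNamespace(cached_tokens=900),
)
top_level = {"prompt_tokens": 1000, "cache_tokens": 750}
missing = {"prompt_tokens": 1000}

print(extract_cache_read_tokens(standard))
print(extract_cache_read_tokens(top_level))
print(extract_cache_read_tokens(missing))
```

Whatever field turns out to be correct, the miss count has to be computed after the fallback runs, so that input_tokens is prompt_total minus the recovered cache hit count.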

Option 2: Log raw API response for debugging

Add debug logging to capture MiMo's actual usage structure:

logger.debug("MiMo usage response: %s", response.usage)
logger.debug("prompt_tokens_details: %s", getattr(response.usage, "prompt_tokens_details", None))
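
If the usage object's default string form hides nested or extra fields, it may help to log it as JSON instead. The sketch below is one way to do that under stated assumptions: the logger name, `usage_to_dict` and `log_raw_usage` are hypothetical, and the `model_dump()` branch assumes the usage object is a pydantic v2 model, which should be verified for the client library in use.

```python
import json
import logging
from types import SimpleNamespace

logger = logging.getLogger("usage_debug")  # hypothetical logger name


def usage_to_dict(usage):
    """Best-effort conversion of a usage object to plain data for logging."""
    if usage is None or isinstance(usage, (str, int, float, bool)):
        return usage
    if isinstance(usage, dict):
        return {key: usage_to_dict(value) for key, value in usage.items()}
    if isinstance(usage, (list, tuple)):
        return [usage_to_dict(value) for value in usage]
    if hasattr(usage, "model_dump"):  # pydantic v2 models expose model_dump()
        return usage_to_dict(usage.model_dump())
    if hasattr(usage, "__dict__"):
        return {key: usage_to_dict(value) for key, value in vars(usage).items()}
    return repr(usage)


def log_raw_usage(provider: str, usage) -> str:
    """Log the full usage structure as JSON and return the serialized payload."""
    payload = json.dumps(usage_to_dict(usage), sort_keys=True, default=repr)
    logger.debug("raw usage from %s: %s", provider, payload)
    return payload


logging.basicConfig(level=logging.DEBUG)
example = SimpleNamespace(prompt_tokens=10, prompt_tokens_details=None)
print(log_raw_usage("xiaomi", example))
```

Capturing one such payload from a real MiMo response would show directly whether a cache hit count is present and under which key.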

Workaround

Use the agent log or the official API dashboard for billing analysis. Do not rely on state.db token values for the MiMo provider.

Environment

- Hermes Agent: latest main
- Provider: xiaomi (MiMo)
- Model: mimo-v2.5
- Python: 3.11+

Related

- #29553: cache tokens missing from SSE events (different layer)
- #41177: Desktop UI shows 0% cache hit (downstream effect)
- This issue focuses on the DB storage layer, specifically for the MiMo provider
