hermes - [Setup]: How do I get token usage tracking working with a local OpenAI-compatible provider (LocalAI) on v0.14.0?

What's Going Wrong?

What I'm trying to do

Get /usage to show real token counts when chatting against a LocalAI backend. Currently every counter reads 0 and Cost status shows unknown, even though the model responds correctly and the context bar displays the right ceiling (172K).

What I've tried

LocalAI requires stream_options: {include_usage: true} to emit the final usage chunk on streamed responses (standard OpenAI streaming spec behavior). I confirmed LocalAI honors this with a direct curl:

curl -sN https://localai.example.internal/v1/chat/completions \
  -H "Authorization: Bearer $LOCALAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"production","messages":[{"role":"user","content":"say hi"}],"stream":true,"stream_options":{"include_usage":true}}' | tail -2

Returns:

data: {"...","choices":[],"usage":{"prompt_tokens":12,"completion_tokens":209,"total_tokens":221}}
data: [DONE]

So LocalAI's side is fine. The question is how to get Hermes to send stream_options.include_usage on the main-model request.
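
For reference, this is roughly how a client picks up that final chunk. With include_usage set, usage arrives in the last data: event, whose choices list is empty. Earlier chunks carry usage: null, and some servers omit the field entirely. Below is a minimal stdlib sketch that assumes the stream is already split into lines. extract_usage is a hypothetical helper, not a Hermes or OpenAI SDK function. The sample stream is simplified, and only its usage numbers come from the curl output above.

```python
import json
from typing import Iterable, Optional


def extract_usage(sse_lines: Iterable[str]) -> Optional[dict]:
    """Return the usage object from an OpenAI-style SSE chat stream, if any.

    Hypothetical helper (not part of Hermes or the OpenAI SDK): scans `data:`
    events, stops at `[DONE]`, and keeps the last non-null `usage` it sees.
    """
    usage = None
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines, comments, event: fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("usage"):  # null or missing on content chunks
            usage = chunk["usage"]
    return usage


# Simplified sample stream; only the usage numbers come from the curl output.
sample = [
    'data: {"choices":[{"delta":{"content":"hi"}}],"usage":null}',
    'data: {"choices":[],"usage":{"prompt_tokens":12,"completion_tokens":209,"total_tokens":221}}',
    "data: [DONE]",
]
print(extract_usage(sample))  # {'prompt_tokens': 12, 'completion_tokens': 209, 'total_tokens': 221}
```

If the request never asked for include_usage, no chunk carries a usage object and the function returns None. That would match the all-zero counters shown by /usage.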

I tried adding extra_body to my providers: block in two places, once at the provider level and once under the production model entry (config below), but /usage still reads zero in both cases. The config loads without errors, and hermes config show reflects the change.

Current config (`/opt/data/config.yaml`)

model:
  default: production
  provider: LocalAI
  context_length: 172000

providers:
  LocalAI:
    api_key: ${LOCALAI_API_KEY}
    base_url: https://localai.example.internal/v1
    extra_body:
      stream_options:
        include_usage: true
    models:
      production:
        context_length: 172000
        extra_body:
          stream_options:
            include_usage: true
      production-embedding:
        context_length: 8192
      production-tiny:
        context_length: 64000
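
To make the expectation concrete, the sketch below shows the request body this config would produce if both extra_body levels were deep-merged into a main-model request. The merge order (provider first, then model, with the model winning) and the helper names are assumptions for illustration. This is not a description of Hermes' actual code; whether v0.14.0 applies providers: extra_body to main-model requests at all is the open question. For comparison, the OpenAI Python SDK's extra_body argument also adds extra keys to the JSON request body.

```python
import copy
import json


def deep_merge(base: dict, extra: dict) -> dict:
    """Recursively merge `extra` into a copy of `base`; `extra` wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def build_chat_body(model: str, messages: list, provider_extra: dict, model_extra: dict) -> dict:
    """Hypothetical: the body a client *would* send if it honored both extra_body levels."""
    body = {"model": model, "messages": messages, "stream": True}
    body = deep_merge(body, provider_extra)  # providers.LocalAI.extra_body
    body = deep_merge(body, model_extra)     # providers.LocalAI.models.production.extra_body
    return body


# Mirrors the config above: both levels set stream_options.include_usage.
body = build_chat_body(
    "production",
    [{"role": "user", "content": "say hi"}],
    provider_extra={"stream_options": {"include_usage": True}},
    model_extra={"stream_options": {"include_usage": True}},
)
print(json.dumps(body))
```

The merged body has the same keys and values as the one the working curl command sends.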

I also tried extra_body.chat_template_kwargs, following the example in the providers documentation, with the same result: no effect on /usage.

What `/usage` shows

📊 Session Token Usage
────────────────────────────────────────
Model:                     production
Input tokens:                       0
[...all counters 0...]
API calls:                          1
Cost status:                 unknown
────────────────────────────────────────
Current context:  0 / 172,000 (0%)

LocalAI logs during a Hermes-initiated request show clean prompt processing (34,226 tokens through the slot) but no final usage chunk being emitted, which is consistent with include_usage being absent from the request body Hermes sends.
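
One way to confirm what Hermes actually sends, rather than inferring it from LocalAI logs, is to point base_url at a throwaway local endpoint that records each request body. The sketch below is a minimal stdlib version. The handler, helper names, port, and stub reply are all hypothetical. It assumes the client sends a Content-Length header, and it answers POSTs with an empty stream. Use it only to inspect the outgoing request, not to chat. Other calls Hermes might make, such as GET requests, are not answered.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

captured: list = []  # parsed JSON bodies of every POST received, in order


class CaptureHandler(BaseHTTPRequestHandler):
    """Hypothetical debug stub: logs each POST body and replies with an empty SSE stream."""

    def do_POST(self) -> None:
        # Assumes the client sends Content-Length (chunked bodies are not handled).
        length = int(self.headers.get("Content-Length") or 0)
        body = json.loads(self.rfile.read(length) or b"{}")
        captured.append(body)
        stream_options = body.get("stream_options") or {}
        print(
            f"{self.path}: stream={body.get('stream')} "
            f"include_usage={stream_options.get('include_usage')}"
        )
        reply = b"data: [DONE]\n\n"  # ends the stream immediately, no content
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format: str, *args) -> None:
        pass  # suppress the default per-request access log


def start_capture_server(port: int = 0) -> ThreadingHTTPServer:
    """Serve CaptureHandler on 127.0.0.1 in a daemon thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), CaptureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Because Hermes runs in Docker, start this somewhere the container can reach, for example inside the same container with `server = start_capture_server(8089)`. Then temporarily set base_url to `http://127.0.0.1:8089/v1`, send one message, and inspect `captured[-1]` to see whether `stream_options` is present.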

Environment

Hermes Agent v0.14.0 (2026.5.16), Docker, Up to date
Python 3.13.5, OpenAI SDK 2.24.0
LocalAI behind reverse proxy at https://localai.example.internal/v1
Model: custom production (llama-cpp backend, 172K context)

Questions

Is extra_body under providers: supposed to be forwarded on the main-model request, or does the "auxiliary/compression/fallback only" note from the configuration docs also apply here?
If it's not the right knob, is there a supported way to either (a) force stream_options.include_usage on streamed requests, or (b) disable HTTP-layer streaming so usage comes back in the non-streamed response body (which works fine on curl)?
Is there an env var I'm missing (HERMES_*) that controls this?
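
On option (b): in a non-streamed chat completion, usage is a top-level field of the single JSON response, so no extra flag is needed. The curl check above confirms this. OpenAI's API reference also says stream_options should only be set when stream is true, so a client that toggles streaming would drop it on non-streamed calls. The helpers below are hypothetical and only illustrate the two request shapes. Whether Hermes exposes a switch to disable streaming is not established here.

```python
from typing import Any, Optional


def chat_payload(model: str, messages: list, stream: bool) -> dict[str, Any]:
    """Hypothetical: build a chat/completions body, requesting usage only where allowed."""
    body: dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    if stream:
        # Streamed responses only report usage when asked, in a final chunk.
        body["stream_options"] = {"include_usage": True}
    return body


def usage_from_response(response: dict) -> Optional[dict]:
    """Hypothetical: non-streamed responses carry usage at the top level of the JSON body."""
    return response.get("usage")
```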

Happy to provide more logs / try patches. Thanks!

Steps Taken

See above.

Installation Method

Docker

Operating System

Ubuntu 24.04

Python Version

No response

Hermes Version

v0.14.0

Debug Report

Full Error Output

See above

What I've Already Tried

No response
