Hermes: long-session context, memory, gateway, and Open WebUI stability improvements

This is a field report from a real Hermes Agent local installation used for long-running project work through the local API server and Open WebUI. The main reliability issues are long-session context growth, tool-output noise, late/fragile compression, repeated memory failures, unclear timeout states, and large session storage growth.

Reporter: @Arentai86 / OpenZen

Environment

  • OS: macOS
  • Installation path: custom Hermes Agent installer
  • UI/API path: Open WebUI connected to Hermes Agent
  • Local API server: http://127.0.0.1:8642
  • Hermes home: ~/.hermes
  • Runtime source: ~/.hermes/runtime/server-source
  • Config: ~/.hermes/config.yaml
  • Session storage:
    • ~/.hermes/state.db
    • ~/.hermes/sessions

Observed local state before tuning:

  • ~/.hermes was about 2.4G
  • ~/.hermes/state.db was about 629M
  • ~/.hermes/sessions was about 622M
  • Around 3000 session files existed
  • Some session JSON files were larger than 10M

Problems Observed

1. Context grows too fast during normal project work

Hermes keeps too much old operational material in the active context:

  • long shell outputs
  • repeated build logs
  • failed patch attempts
  • repeated missing-file errors
  • repeated tool failures
  • completed tool-call results
  • old task state that is no longer needed verbatim

Expected:

  • Completed work is compacted into concise summaries.
  • Large tool outputs are pruned from the active prompt.
  • Only recent actionable state remains verbatim.
  • Long project chats remain usable for hours.

Actual:

  • Context usage rises quickly.
  • After 20-30 minutes, the agent becomes slow or appears stuck.
  • The user has to start a new chat or repeatedly ask what happened.

2. Default compression settings are too conservative for long chats

Default-style settings observed locally:

compression:
  enabled: true
  threshold: 0.5
  target_ratio: 0.2
  protect_last_n: 20
  hygiene_hard_message_limit: 400
  protect_first_n: 3

Problems:

  • threshold: 0.5 appears to delay compression until about half of the context window is in use, by which point the prompt is already very large.
  • protect_last_n: 20 pins too much recent noise.
  • hygiene_hard_message_limit: 400 allows huge chat histories before forced cleanup.
  • The gateway hygiene path fires very late for real-world project sessions.

Expected:

  • Long-running coding/project chats compress earlier.
  • Old tool-heavy turns become eligible for cleanup sooner.
  • Message-count safety valves fire before the session is nearly unusable.
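
To make the difference between the two thresholds concrete, here is a minimal arithmetic sketch. It assumes that compression.threshold is a fraction of model.context_length and that the unit is tokens; the report implies this ("compress around 30-40% context usage") but does not state it. The helper name compression_trigger_tokens is hypothetical and not part of Hermes.

```python
def compression_trigger_tokens(context_length: int, threshold: float) -> int:
    """Context size at which compression would start.

    Assumption: threshold is a fraction of the model context length.
    """
    return round(context_length * threshold)


# model.context_length from the tuned config in this report
CONTEXT_LENGTH = 272_000

default_trigger = compression_trigger_tokens(CONTEXT_LENGTH, 0.5)   # default-style value
tuned_trigger = compression_trigger_tokens(CONTEXT_LENGTH, 0.32)    # tuned value

print(f"default threshold 0.5  -> compress at {default_trigger:,} tokens")
print(f"tuned threshold 0.32   -> compress at {tuned_trigger:,} tokens")
print(f"difference             -> {default_trigger - tuned_trigger:,} tokens earlier")
```

Under those assumptions, the tuned setting starts compression roughly 49,000 tokens earlier in the session than the default-style setting.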

3. Tool output caps are too high for project sessions

Default-style settings observed locally:

tool_output:
  max_bytes: 50000
  max_lines: 2000
  max_line_length: 2000

Problems:

  • A single tool call can add a large amount of low-value output.
  • Repeated build/test/search commands quickly dominate context.
  • Failed tool calls can store enough text to make later recovery harder.

Expected:

  • Tool outputs are aggressively capped in active context.
  • Full output can remain available in logs/files.
  • The active prompt receives summaries rather than raw bulk output.
  • Repeated similar outputs are deduplicated or replaced with compact status.
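
The capping behavior described above could look roughly like the following sketch. The function cap_tool_output is a hypothetical illustration, not the Hermes implementation; only the three parameter names mirror the tool_output config keys from this report. Keeping the tail of the output rather than the head is a design assumption.

```python
def cap_tool_output(
    text: str,
    max_bytes: int = 24_000,
    max_lines: int = 600,
    max_line_length: int = 1_200,
) -> str:
    """Cap tool output before it enters the active prompt (hypothetical helper)."""
    lines = text.splitlines()
    omitted = max(0, len(lines) - max_lines)
    # Keep the tail: the end of a build or test log usually carries the final status.
    kept = lines[omitted:]
    kept = [
        line if len(line) <= max_line_length
        else line[:max_line_length] + " [line truncated]"
        for line in kept
    ]
    body = "\n".join(kept)
    encoded = body.encode("utf-8")
    if len(encoded) > max_bytes:
        # Drop bytes from the front; errors="ignore" discards a split multibyte character.
        body = encoded[-max_bytes:].decode("utf-8", errors="ignore")
    if omitted:
        # Leave a compact marker so the model knows output was cut.
        body = f"[{omitted} earlier lines omitted]\n{body}"
    return body
```

The full, uncapped output would still need to be written to a log or file elsewhere, as the expected behavior above requires; this sketch only covers what reaches the prompt.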

4. Agent appears frozen on provider/API timeouts

Logs showed long waits such as:

No response from provider for 300s (non-streaming, model: gpt-5.5). Aborting call.
API call failed (attempt 1/3): TimeoutError
Provider: openai-codex
Endpoint: https://chatgpt.com/backend-api/codex

Expected:

  • UI/API clearly shows whether the agent is waiting for the model, compacting context, retrying, or has failed.
  • Long provider stalls do not look like silent hangs.
  • The agent produces a recoverable final status if the model call times out after tools have run.

Actual:

  • Open WebUI can make Hermes look frozen.
  • User cannot easily tell whether the agent is still working, retrying, compacting, or dead.
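
One way to make these states visible is to emit an explicit status event on every attempt instead of waiting silently. The sketch below is an illustration under assumptions: TurnStatus, call_with_status, and the emit callback are hypothetical names, and the three-attempt limit only mirrors the "attempt 1/3" seen in the log above.

```python
from enum import Enum
from typing import Callable, Optional


class TurnStatus(Enum):
    WAITING_FOR_MODEL = "waiting for model"
    RETRYING = "retrying"
    COMPACTING = "compacting"  # would be emitted by the compression path, not shown here
    FAILED = "failed"


def call_with_status(
    call_model: Callable[[], str],
    emit: Callable[[TurnStatus, str], None],
    max_attempts: int = 3,
) -> Optional[str]:
    """Call the provider, reporting a status before every attempt."""
    for attempt in range(1, max_attempts + 1):
        status = TurnStatus.WAITING_FOR_MODEL if attempt == 1 else TurnStatus.RETRYING
        emit(status, f"attempt {attempt}/{max_attempts}")
        try:
            return call_model()
        except TimeoutError:
            continue  # the next loop iteration emits RETRYING
    # Every attempt timed out: report a final status rather than hanging.
    emit(TurnStatus.FAILED, f"provider timed out after {max_attempts} attempts")
    return None
```

A client such as Open WebUI could render each emitted event as a progress line, so a 300-second provider stall shows up as "waiting for model" rather than as a frozen chat.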

5. Auxiliary compression can fail or route poorly

Logs showed auxiliary compression attempting unavailable or unhealthy providers, then failing:

Auxiliary compression: connection error on auto and no fallback available
Failed to generate context summary

Expected:

  • Context compression uses the same working provider/profile as the main agent unless explicitly configured otherwise.
  • If auxiliary summarization fails, Hermes falls back to a safe static summary or deterministic compaction path.
  • Compression failure does not leave the session stuck at a huge context size.

Actual:

  • Compression may be delayed or fail.
  • Large context remains active, increasing the chance of more timeouts.
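
The deterministic fallback requested above can be sketched as follows. This is not how Hermes currently behaves; summarize_with_fallback, the llm_summarize callback, and the message dict shape (role and content keys) are assumptions made for illustration.

```python
from typing import Callable, Dict, List


def summarize_with_fallback(
    messages: List[Dict[str, str]],
    llm_summarize: Callable[[List[Dict[str, str]]], str],
    max_chars_per_message: int = 200,
) -> str:
    """Try the LLM summarizer; on provider failure, build a static summary."""
    try:
        return llm_summarize(messages)
    except (ConnectionError, TimeoutError):
        # Deterministic path: no model call, so it cannot fail the same way.
        lines = ["[static summary: auxiliary summarizer unavailable]"]
        for message in messages:
            text = " ".join(message["content"].split())  # collapse whitespace
            lines.append(f"- {message['role']}: {text[:max_chars_per_message]}")
        return "\n".join(lines)
```

The static summary is lower quality than an LLM summary, but it still shrinks the context, so a failed summarization no longer leaves the session at its full size.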

6. Memory tool can repeatedly fail and add noise

Observed repeated memory errors:

Memory at 1,361/1,375 chars. Adding this entry would exceed the limit.
Replacement would put memory over the limit.
Tool loop warning: same_tool_failure_warning; memory

Expected:

  • Memory writes self-compact or replace older memory automatically.
  • Repeated failed memory writes stop quickly.
  • Memory errors do not pollute active project context.

Actual:

  • Hermes repeatedly attempts memory operations that cannot fit.
  • The failures add more noise to the session and logs.
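
A small guard can stop the retry loop once the same limit error has occurred a few times in a row. The sketch below is hypothetical: MemoryStore and its fields are not Hermes classes, and the cutoff of two consecutive failures is an arbitrary example value.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemoryStore:
    """Toy memory store that suspends writes after repeated limit errors."""

    char_limit: int
    max_limit_failures: int = 2
    entries: List[str] = field(default_factory=list)
    consecutive_limit_failures: int = 0

    def used(self) -> int:
        return sum(len(entry) for entry in self.entries)

    def add(self, entry: str) -> str:
        if self.consecutive_limit_failures >= self.max_limit_failures:
            # Refuse without retrying, so no further error text reaches the context.
            return "suspended"
        if self.used() + len(entry) > self.char_limit:
            self.consecutive_limit_failures += 1
            return "over_limit"
        self.entries.append(entry)
        self.consecutive_limit_failures = 0
        return "ok"
```

With a limit of 1,375 characters and 1,361 already used, as in the log above, the second failed write would be the last one attempted until memory is compacted or an entry is removed.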

7. Session database and transcript storage grow without enough automatic cleanup

Observed:

  • ~/.hermes/state.db about 629M
  • ~/.hermes/sessions about 622M
  • Around 3000 session files
  • sessions.auto_prune disabled
  • sessions.retention_days set high

Expected:

  • Old sessions are pruned automatically by default or after installer setup.
  • Database is vacuumed after pruning.
  • Large historical transcripts do not degrade current runtime behavior.

Actual:

  • Session storage grows substantially.
  • Long-running installations accumulate a lot of old data.
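
Until pruning is automatic, a manual cleanup along these lines is possible. This sketch assumes that each session is a separate file directly under the sessions directory and that state.db is a SQLite database; the report does not confirm either. The function names are hypothetical, and dry_run defaults to True so nothing is deleted by accident.

```python
import sqlite3
import time
from pathlib import Path
from typing import List, Optional


def prune_sessions(
    sessions_dir: Path,
    retention_days: int,
    dry_run: bool = True,
    now: Optional[float] = None,
) -> List[Path]:
    """Return session files older than retention_days; delete them unless dry_run."""
    cutoff = (time.time() if now is None else now) - retention_days * 86_400
    stale = sorted(
        path for path in sessions_dir.iterdir()
        if path.is_file() and path.stat().st_mtime < cutoff
    )
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale


def vacuum_database(db_path: Path) -> None:
    """Rebuild a SQLite file so that freed pages are returned to the filesystem."""
    # isolation_level=None keeps the connection in autocommit mode;
    # VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(str(db_path), isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
```

Run it first with the default dry_run=True, for example prune_sessions(Path.home() / ".hermes" / "sessions", 30), and inspect the returned list before deleting anything. Stop Hermes and back up state.db before vacuuming.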

8. UI does not surface context cleanup clearly enough

Default-style settings observed locally:

display:
  runtime_footer:
    enabled: false
  cleanup_progress: false

Expected:

  • UI shows context usage.
  • UI shows when cleanup/compaction is running.
  • User sees a clear status instead of assuming the agent froze.

Actual:

  • The agent may be compacting, retrying, or waiting, but the user sees little useful status.
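
As an illustration of what a useful footer could contain, the sketch below renders the three fields named in the tuned config later in this report (model, context_pct, cwd) plus an optional status. render_footer is a hypothetical function, and the output format is invented for this example.

```python
from typing import Optional


def render_footer(
    model: str,
    used_tokens: int,
    context_length: int,
    cwd: str,
    status: Optional[str] = None,
) -> str:
    """Build a one-line runtime footer with context usage as a percentage."""
    context_pct = 100 * used_tokens / context_length
    parts = [model, f"ctx {context_pct:.0f}%", cwd]
    if status:
        parts.append(status)  # e.g. "compacting", "retrying", "waiting for model"
    return " | ".join(parts)


print(render_footer("gpt-5.5", 87_040, 272_000, "~/project", "compacting"))
```

Even this single line would answer the two questions the user currently cannot: how full the context is, and whether the agent is doing something.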

Suggested Improvements

  1. Tune default compression for long project chats:
    • compress around 30-40% context usage
    • keep fewer recent messages verbatim
    • reduce hard message-count threshold
    • preserve only critical head messages plus rolling summary
  2. Add stricter tool-output context hygiene:
    • lower default tool_output.max_bytes
    • lower default tool_output.max_lines
    • cap line length more aggressively
    • summarize long command/build/test outputs before adding to active context
  3. Make completed tool phases compact automatically:
    • replace finished tool call groups with concise summaries
    • keep file paths, changed files, commands run, and final status
    • remove raw output once it is no longer needed for the next step
  4. Improve auxiliary compression routing:
    • default auxiliary compression to the active working provider/profile
    • skip unavailable providers when routing is set to auto
    • use deterministic fallback summary when LLM summarization fails
    • do not let failed summarization block future turns indefinitely
  5. Improve timeout and stuck-turn handling:
    • show visible waiting for model, retrying, compacting, and failed states
    • after provider timeout, return a clear final status to the user
    • preserve completed tool results and tell the user what was done
  6. Improve memory behavior:
    • compact memory automatically before writing
    • stop repeated memory attempts after the same limit error
    • keep memory failures out of the main project context when possible
  7. Enable safer session maintenance defaults:
    • enable session auto-prune
    • reduce retention for local transient sessions
    • vacuum the database after prune
    • provide a doctor warning when session storage becomes very large
  8. Improve Open WebUI/API integration status:
    • expose context usage in response metadata
    • expose cleanup/compaction status
    • make long-running requests visibly alive
    • avoid silent hangs when the backend is retrying
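
Item 3 in the list above (compacting completed tool phases) can be sketched as a function that replaces a finished group of tool calls with a short record. The record keeps commands, changed files, and final status, and drops raw output. compact_tool_group and the call dict keys (command, exit_code, changed_files) are assumptions for illustration, not Hermes data structures.

```python
from typing import Any, Dict, List


def compact_tool_group(calls: List[Dict[str, Any]]) -> str:
    """Summarize a finished group of tool calls without their raw output."""
    commands = [call["command"] for call in calls]
    changed = sorted({path for call in calls for path in call.get("changed_files", [])})
    failed = [call["command"] for call in calls if call["exit_code"] != 0]
    # The group's final status is taken from its last call.
    status = "failed" if calls and calls[-1]["exit_code"] != 0 else "ok"

    lines = [
        f"[compacted {len(calls)} tool calls, final status: {status}]",
        "commands: " + "; ".join(commands),
    ]
    if changed:
        lines.append("changed files: " + ", ".join(changed))
    if failed:
        lines.append("failed commands: " + "; ".join(failed))
    return "\n".join(lines)
```

A failed test run followed by a patch and a passing test run would collapse to four short lines, which is usually all the next step needs to know.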

Example Safer Defaults

These values worked better for long local project sessions:

model:
  context_length: 272000

compression:
  enabled: true
  threshold: 0.32
  target_ratio: 0.12
  protect_last_n: 8
  protect_first_n: 2
  hygiene_hard_message_limit: 120
  abort_on_summary_failure: false

tool_output:
  max_bytes: 24000
  max_lines: 600
  max_line_length: 1200

auxiliary:
  compression:
    provider: openai-codex
    model: gpt-5.5
    base_url: https://chatgpt.com/backend-api/codex
    timeout: 600
    context_length: 272000

memory:
  memory_char_limit: 3200
  user_char_limit: 2400
  nudge_interval: 20
  flush_min_turns: 10

sessions:
  auto_prune: true
  retention_days: 30
  vacuum_after_prune: true
  min_interval_hours: 12

display:
  persistent_output_max_lines: 80
  cleanup_progress: true
  runtime_footer:
    enabled: true
    fields:
      - model
      - context_pct
      - cwd

Acceptance Criteria

  • Hermes can work in one project chat for hours without hitting context failure.
  • Completed tool work is automatically compacted out of active context.
  • Large tool outputs do not dominate future prompts.
  • Compression starts before the session is already near failure.
  • Memory failures do not repeat endlessly.
  • User sees clear status during compaction/retry/model wait.
  • Open WebUI/API requests do not look silently frozen.
  • Old sessions are pruned/vacuumed automatically or clearly flagged by doctor checks.

Why this matters

Hermes is useful for serious local project work, but long sessions currently require too much manual recovery and context hygiene from the user. Better defaults around compression, tool-output summarization, memory failure handling, visible status, and session maintenance would make Hermes much more reliable as a daily local agent runtime.
