hermes - ✅(Solved) Fix [Feature]: API server loses session continuity after context compression because X-Hermes-Session-Id stays on the old parent session [2 pull requests, 4 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#16938 (fetched 2026-04-29 06:38:14)
Comments: 4
Participants: 2
Timeline events: 15
Reactions: 0
Timeline (top): commented ×4, labeled ×4, cross-referenced ×2, mentioned ×2

When Hermes is used through the OpenAI-compatible API server (POST /v1/chat/completions) with session continuity enabled via X-Hermes-Session-Id, context compression can rotate the internal AIAgent.session_id to a new child session.

However, the API server continues to return and accept the stale parent session id. External clients that rely on X-Hermes-Session-Id therefore keep sending the old session id, causing subsequent requests to reload the uncompressed parent history instead of the compressed continuation.

This makes automatic context compression effectively unsafe for API-server clients that maintain session continuity externally.

Root Cause

This affects external adapters/bridges that talk to Hermes through the API server instead of being native Hermes gateway platforms.

My concrete use case is a SeaTalk bridge:

  • SeaTalk bridge owns session_key -> session_id
  • It sends X-Hermes-Session-Id: <current_session_id> to /v1/chat/completions
  • Hermes runs normally until context compression triggers
  • Hermes creates a new child session internally
  • The bridge never learns that new session id
  • The next SeaTalk message sends the old parent id again
  • Hermes reloads the stale uncompressed parent history

The compression-triggering request itself may complete, but all following turns are desynchronized.
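On the client side, a bridge of this kind could refresh its stored mapping from whatever id the server returns. The following is a minimal sketch, assuming the server returns the effective id in the X-Hermes-Session-Id response header; the function name, the mapping shape, and the example session key are hypothetical and not taken from the SeaTalk bridge.

```python
from typing import Dict, Mapping

SESSION_HEADER = "X-Hermes-Session-Id"


def remember_effective_session(
    mapping: Dict[str, str],
    session_key: str,
    response_headers: Mapping[str, str],
) -> str:
    """Store the session id the server returned, if any, and return the id to send next."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    returned = lowered.get(SESSION_HEADER.lower(), "").strip()
    if returned:
        # Overwrite the stored id; after compression this is the child session.
        mapping[session_key] = returned
    return mapping.get(session_key, "")
```

Called after every response, this keeps session_key -> session_id pointing at the latest continuation without the bridge needing to know that compression happened.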

Fix Action

Fix / Workaround

The dashboard session endpoints (GET /api/sessions/{session_id} and GET /api/sessions) can be used as a workaround because session metadata includes end_reason and parent_session_id, and the session list projects compression roots to their latest continuation with _lineage_root_id.

But that requires external clients to depend on the dashboard API or directly inspect Hermes SQLite state. That is a workaround, not a clean API-server contract.

PR fix notes

PR #17019: fix: return effective session_id after context compression (#16938)

Description (problem / solution / changelog)

Problem

When Hermes is used through the OpenAI-compatible API server (POST /v1/chat/completions) with session continuity via X-Hermes-Session-Id, context compression can rotate the internal AIAgent.session_id to a new child session.

However, the API server continues to return the stale parent session id in the X-Hermes-Session-Id response header. External clients that rely on this header keep sending the old session id, causing subsequent requests to reload the uncompressed parent history instead of the compressed continuation.

Fix

_run_agent() now includes result["session_id"] = agent.session_id after run_conversation() completes, capturing the effective session id (which may differ from the input if compression occurred).

The non-streaming response path uses result.get("session_id", session_id) instead of the original session_id variable when setting the X-Hermes-Session-Id header.

Before vs After

Scenario | Before | After
No compression | Returns original session_id | Returns original session_id (same)
Compression triggered | Returns stale parent session_id | Returns new child session_id
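The two scenarios can be reproduced in isolation with a stand-in agent. This is an illustrative sketch, not the code in api_server.py: FakeAgent, run_agent, response_headers, the "_child" suffix, and the "final_response" key are all hypothetical; only the result["session_id"] assignment and the result.get("session_id", session_id) fallback mirror the PR description.

```python
from typing import Any, Dict


class FakeAgent:
    """Stand-in for AIAgent: rotates session_id when 'compression' happens."""

    def __init__(self, session_id: str, compress: bool) -> None:
        self.session_id = session_id
        self._compress = compress

    def run_conversation(self, message: str) -> Dict[str, Any]:
        if self._compress:
            # Mimics the rotation to a child session described in the issue.
            self.session_id = self.session_id + "_child"
        return {"final_response": "ok"}


def run_agent(agent: FakeAgent, message: str) -> Dict[str, Any]:
    result = agent.run_conversation(message)
    # Capture the id after the run; it may differ from the one the request supplied.
    result["session_id"] = agent.session_id
    return result


def response_headers(result: Dict[str, Any], session_id: str) -> Dict[str, str]:
    # Fall back to the request's id if the result carries none.
    return {"X-Hermes-Session-Id": result.get("session_id", session_id)}
```

With compress=False the header equals the id that was sent; with compress=True it carries the rotated id instead.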

Notes

  • The streaming path (_write_sse_chat_completion) sets the session header before the agent completes, so it still uses the original session_id. A follow-up could send the effective session_id as a custom SSE event at stream end.
  • This is a minimal, backward-compatible change — no config changes required.
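For the streaming follow-up mentioned in the notes, one possible shape is a single Server-Sent Event written at the end of the stream. The sketch below only shows the wire format; the event name hermes.session and the payload layout are assumptions, since the PR does not define them.

```python
import json


def format_session_event(session_id: str, event_name: str = "hermes.session") -> str:
    """Serialise one Server-Sent Event carrying the effective session id."""
    payload = json.dumps({"session_id": session_id})
    # An SSE event is a block of 'field: value' lines terminated by a blank line.
    return f"event: {event_name}\ndata: {payload}\n\n"
```

A client that ignores unknown event types would be unaffected, which is one reason a named event may be preferable to changing the existing chunk format.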

Fixes #16938

Changed files

  • gateway/platforms/api_server.py (modified, +8/-1)

PR #17060: fix: resolve 7 identified issues [automated]

Description (problem / solution / changelog)

Summary

This PR resolves 7 issues identified in the Hermes Agent repository.


Issues Resolved

1. #17048 — Docker tmpfs size override

Files: tools/environments/docker.py

Problem: spaCy and other tools that download large models fail with ENOSPC on the Docker backend because the default 512MB /tmp limit is insufficient.

Fix: Added tmp_tmp_size, var_tmp_tmp_size, and run_tmp_size parameters to the DockerEnvironment constructor, plus corresponding environment variables (HERMES_DOCKER_TMP_TMP_SIZE, etc.), to allow fine-tuning of the tmpfs limits.
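As an illustration of how such an override might be resolved, the sketch below builds a docker run --tmpfs argument for /tmp. The function name, the precedence order (explicit argument, then environment, then default), and the use of CLI flags are assumptions; only the tmp_tmp_size parameter name, the HERMES_DOCKER_TMP_TMP_SIZE variable, and the 512MB default come from the PR text.

```python
import os
from typing import List, Mapping, Optional

DEFAULT_TMP_SIZE = "512m"  # the 512MB default mentioned in the PR description


def tmp_tmpfs_args(
    tmp_tmp_size: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Build a docker-run --tmpfs argument for /tmp with an overridable size."""
    # Precedence assumed here: explicit argument, then environment, then default.
    size = tmp_tmp_size or env.get("HERMES_DOCKER_TMP_TMP_SIZE") or DEFAULT_TMP_SIZE
    return ["--tmpfs", f"/tmp:rw,size={size}"]
```

Passing the environment as a parameter keeps the function easy to exercise without touching the real process environment.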


2. #17003 — MCP HTTP keepalive

Files: tools/mcp_tool.py

Problem: Long-lived MCP HTTP sessions can be orphaned after ~12h of inactivity when TCP keepalives expire at the OS/LB level, causing a silent failure on the next tool call.

Fix: Added a periodic list_tools() probe every 180 seconds inside _wait_for_lifecycle_event. If the probe fails, it triggers a reconnect.
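The general pattern of a periodic liveness probe can be sketched with asyncio alone. This is not the code in tools/mcp_tool.py: keepalive_loop, its parameters, and the stop event are hypothetical, and probe stands in for a call such as list_tools().

```python
import asyncio
from typing import Awaitable, Callable


async def keepalive_loop(
    probe: Callable[[], Awaitable[object]],
    reconnect: Callable[[], Awaitable[None]],
    stop: asyncio.Event,
    interval: float = 180.0,
) -> None:
    """Call probe() every `interval` seconds until `stop` is set; reconnect on failure."""
    while not stop.is_set():
        try:
            # Sleep for one interval, waking early if shutdown is requested.
            await asyncio.wait_for(stop.wait(), timeout=interval)
            break
        except asyncio.TimeoutError:
            pass
        try:
            await probe()
        except Exception:
            # Any probe failure is treated as a dead connection.
            await reconnect()
```

Waiting on the stop event rather than calling asyncio.sleep() lets shutdown interrupt the 180-second wait immediately.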


3. #17034 — image_edit not exposed in the toolset

Files: tools/image_generation_tool.py, toolsets.py, agent/display.py, hermes_cli/tools_config.py

Problem: The image_edit tool was not registered in the toolset system, so it appeared neither in the tool listing nor in the configurator.

Fix: Implemented the image_edit_tool() function using the FAL image-to-image/edit endpoint and added it to the image_gen toolset, with the corresponding schema, handler, and registry entry.


4. #16964 — DingTalk file content crash

Files: gateway/platforms/dingtalk.py

Problem: When DingTalk delivers file content via callback, the message contains a data field that is a string of escaped XML, not a dict. The old code called json.loads(data) expecting a dict, causing a crash.

Fix: Check isinstance(data, str) before parsing; attempt to parse as JSON first, falling back to raw text.
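The described fallback can be illustrated as a small normalising function. This is a sketch under assumptions, not the adapter's code: parse_callback_data and the "text" key used for the raw fallback are hypothetical.

```python
import json
from typing import Any, Dict


def parse_callback_data(data: Any) -> Dict[str, Any]:
    """Normalise a callback `data` field that may be a dict, a JSON string, or raw text."""
    if isinstance(data, dict):
        return data
    if isinstance(data, str):
        try:
            parsed = json.loads(data)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict):
            return parsed
        # Not JSON (for example escaped XML), or JSON that is not an object.
        return {"text": data}
    return {}
```

Checking the parsed type as well as catching the decode error matters, because a JSON string or number parses successfully but is still not a dict.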


5. #17013 — QQBot duplicate session entries

Files: gateway/platforms/qqbot/adapter.py

Problem: When the Tencent server resends a message (retry), the old code called self.session.update() on every retry, creating duplicate entries in the history.

Fix: Added a check to skip session.update() when the message ID is the same as the last one processed.
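The check amounts to remembering the last processed message id. A minimal sketch of that idea follows; DedupingHistory and its method names are hypothetical and do not reflect the QQBot adapter's actual session object.

```python
from typing import List, Optional


class DedupingHistory:
    """Appends a message only if its id differs from the last one processed."""

    def __init__(self) -> None:
        self.entries: List[str] = []
        self._last_message_id: Optional[str] = None

    def record(self, message_id: str, text: str) -> bool:
        if message_id == self._last_message_id:
            # Same id as the previous delivery: treat it as a retry and skip it.
            return False
        self._last_message_id = message_id
        self.entries.append(text)
        return True
```

Comparing only against the last id handles back-to-back retries; a retry that arrives after a different message would need a set or bounded cache of recent ids instead.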


6. #16974 — Termux shebang/env fix

Files: setup-hermes.sh

Problem: #!/usr/bin/env bash does not work on Termux (bash lives at /data/data/com.termux/files/usr/bin/bash); getprop may not exist, leaving ANDROID_API_LEVEL empty.

Fix: Added set -euo pipefail to the script header; ANDROID_API_LEVEL now uses ${VAR:-$(cmd || echo "29")} to guarantee a fallback.


7. #16938 — API server session continuity after compression

Files: gateway/platforms/api_server.py

Problem: When the agent compresses context, it creates a child session ID, but the parent ID was returned in the X-Hermes-Session-Id header, causing clients to resend messages to the wrong session.

Fix: Call db.get_compression_tip() before loading history, and extract agent.session_id from the result so the correct ID is returned in the header.


Modified Files

File | Changes
tools/environments/docker.py | +55 lines: configurable tmpfs
tools/mcp_tool.py | +39/-4: keepalive probe
tools/image_generation_tool.py | +151: complete image_edit tool
toolsets.py | +4: image_edit in the image_gen toolset
agent/display.py | +4: image_edit rendering
hermes_cli/tools_config.py | +1: image_edit listing
gateway/platforms/dingtalk.py | +22: text-type fallback
gateway/platforms/qqbot/adapter.py | +12/-7: retry dedup
setup-hermes.sh | +3/-2: set -euo pipefail + ANDROID_API_LEVEL
gateway/platforms/api_server.py | +10/-1: compression tip + session_id

Branches: Sldark23:fix-7-issues-v2 -> NousResearch/hermes-agent:main

Changed files

  • REPORT-fix-7-issues-2026-04-28.md (added, +178/-0)
  • agent/display.py (modified, +3/-1)
  • agent/file_safety.py (modified, +83/-1)
  • cli.py (modified, +6/-2)
  • gateway/platforms/api_server.py (modified, +10/-1)
  • gateway/platforms/dingtalk.py (modified, +22/-0)
  • gateway/platforms/discord.py (modified, +165/-6)
  • gateway/platforms/qqbot/adapter.py (modified, +12/-7)
  • gateway/run.py (modified, +22/-2)
  • hermes_cli/tools_config.py (modified, +1/-1)
  • run_agent.py (modified, +2/-1)
  • setup-hermes.sh (modified, +3/-2)
  • tools/environments/docker.py (modified, +76/-4)
  • tools/image_generation_tool.py (modified, +151/-0)
  • tools/mcp_tool.py (modified, +39/-4)
  • toolsets.py (modified, +2/-2)

Code Example

provided_session_id = request.headers.get("X-Hermes-Session-Id", "").strip()
...
session_id = provided_session_id
...
history = db.get_messages_as_conversation(session_id)

---

return web.json_response(response_data, headers={"X-Hermes-Session-Id": session_id})

---

self._session_db.end_session(self.session_id, "compression")
old_session_id = self.session_id
self.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
self._session_db.create_session(
    session_id=self.session_id,
    ...
    parent_session_id=old_session_id,
)

---

session_id = db.get_compression_tip(provided_session_id) or provided_session_id

---

result = agent.run_conversation(...)
effective_session_id = getattr(agent, "session_id", session_id)
result["session_id"] = effective_session_id
return result, usage

---

effective_session_id = result.get("session_id", session_id)
return web.json_response(
    response_data,
    headers={"X-Hermes-Session-Id": effective_session_id},
)

---

Problem or Use Case

Current behavior

In gateway/platforms/api_server.py, chat completions loads history from the exact provided session id:

provided_session_id = request.headers.get("X-Hermes-Session-Id", "").strip()
...
session_id = provided_session_id
...
history = db.get_messages_as_conversation(session_id)

Later, the response header echoes the same session_id:

return web.json_response(response_data, headers={"X-Hermes-Session-Id": session_id})

But in run_agent.py, _compress_context() ends the old session and rotates self.session_id:

self._session_db.end_session(self.session_id, "compression")
old_session_id = self.session_id
self.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
self._session_db.create_session(
    session_id=self.session_id,
    ...
    parent_session_id=old_session_id,
)

The native gateway path already handles this correctly. In gateway/run.py, it detects that agent.session_id != session_id, updates the session store, and resets history_offset so the compressed transcript is persisted correctly.

The API server path does not appear to have the equivalent logic. APIServerAdapter._run_agent() returns only (result, usage) and does not surface the effective agent.session_id after the run.

Expected behavior

For non-streaming POST /v1/chat/completions:

  1. If the provided X-Hermes-Session-Id points to a session that has been compressed, the API server should resolve it to the latest compression continuation before loading history.
  2. If the agent creates a new session during the request, the API server should return the effective new session id in X-Hermes-Session-Id.
  3. External clients can then update their stored session mapping and continue the compressed conversation.

For streaming chat completions, response headers are sent before the run finishes, so the effective session id may need to be surfaced through a final custom SSE event, final chunk metadata, or another documented mechanism.

Suggested fix

There are two complementary fixes that would make this robust:

1. Resolve stale compression parents on input

When X-Hermes-Session-Id is provided, call something equivalent to:

session_id = db.get_compression_tip(provided_session_id) or provided_session_id

before loading history.

Hermes already has SessionDB.get_compression_tip(session_id), which walks parent_session_id chains for compression continuations.

This would make old clients safer even if they still send a stale parent id.
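To make the chain walk concrete, here is a self-contained sketch of what resolving a compression tip could look like. It is not SessionDB.get_compression_tip: the function name resolve_compression_tip, the sessions table layout in SCHEMA, and the tie-break by id are assumptions; only the parent_session_id and end_reason fields are named in the issue.

```python
import sqlite3
from typing import Optional

# Hypothetical minimal schema; the real Hermes tables may differ.
SCHEMA = (
    "CREATE TABLE sessions ("
    "id TEXT PRIMARY KEY, parent_session_id TEXT, end_reason TEXT)"
)


def resolve_compression_tip(conn: sqlite3.Connection, session_id: str) -> Optional[str]:
    """Follow compression children from session_id; return the newest one, or None."""
    current = session_id
    seen = {current}
    while True:
        # Only follow a child if its parent was ended by compression.
        row = conn.execute(
            "SELECT c.id FROM sessions AS c "
            "JOIN sessions AS p ON c.parent_session_id = p.id "
            "WHERE p.id = ? AND p.end_reason = 'compression' "
            "ORDER BY c.id DESC LIMIT 1",
            (current,),
        ).fetchone()
        if row is None or row[0] in seen:
            break  # no further continuation, or a cycle guard tripped
        current = row[0]
        seen.add(current)
    return current if current != session_id else None
```

Returning None when there is no continuation matches the `... or provided_session_id` idiom above, so unknown or never-compressed ids pass through unchanged.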

2. Return the effective session id on output

Change APIServerAdapter._run_agent() so it can include the final agent.session_id, for example:

result = agent.run_conversation(...)
effective_session_id = getattr(agent, "session_id", session_id)
result["session_id"] = effective_session_id
return result, usage

Then POST /v1/chat/completions can return:

effective_session_id = result.get("session_id", session_id)
return web.json_response(
    response_data,
    headers={"X-Hermes-Session-Id": effective_session_id},
)

Why dashboard session endpoints are not enough

The dashboard exposes:

  • GET /api/sessions/{session_id}
  • GET /api/sessions

These can be used as a workaround because session metadata includes end_reason, parent_session_id, and the session list projects compression roots to their latest continuation with _lineage_root_id.

But that requires external clients to depend on the dashboard API or directly inspect Hermes SQLite state. That is a workaround, not a clean API-server contract.

The OpenAI-compatible API server already advertises session continuity through X-Hermes-Session-Id, so it should remain valid across Hermes-managed context compression.

Reproduction outline

  1. Start Hermes gateway with API server enabled and API key configured.
  2. Send multiple /v1/chat/completions requests with the same X-Hermes-Session-Id.
  3. Grow the session until context compression triggers.
  4. Observe that AIAgent._compress_context() creates a child session with parent_session_id=<old_session_id>.
  5. Observe that the API response still returns X-Hermes-Session-Id: <old_session_id>.
  6. Send another request with that old id.
  7. The API server loads history from the stale parent session instead of the compressed continuation.

Impact

This breaks long-running conversations for external API clients and custom messaging bridges. The user sees repeated compression, missing recent turns, or eventual context overflow, even though compression technically succeeded.

Native Hermes gateway platforms appear to handle session splits correctly; the issue is specific to the API server session-continuity contract.

Proposed Solution

issue.md

Alternatives Considered

issue.md

Feature Type

New tool

Scope

None

Contribution

  • I'd like to implement this myself and submit a PR

Debug Report (optional)

extent analysis

TL;DR

To fix the issue with session continuity in the OpenAI-compatible API server, update the API server to resolve stale compression parents on input and return the effective session id on output.

Guidance

  • Resolve stale compression parents by calling db.get_compression_tip(provided_session_id) before loading history.
  • Update APIServerAdapter._run_agent() to include the final agent.session_id in the result it returns.
  • Return the effective session id in the X-Hermes-Session-Id header of the API response.
  • Consider adding a custom SSE event or final chunk metadata for streaming chat completions to surface the effective session id.

Example

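# 1. On input: resolve a stale parent id before loading history.
session_id = db.get_compression_tip(provided_session_id) or provided_session_id

# 2. In APIServerAdapter._run_agent(): surface the id the agent ended on.
result = agent.run_conversation(...)
effective_session_id = getattr(agent, "session_id", session_id)
result["session_id"] = effective_session_id
return result, usage

# 3. In the non-streaming handler: return that id in the response header.
effective_session_id = result.get("session_id", session_id)
return web.json_response(
    response_data,
    headers={"X-Hermes-Session-Id": effective_session_id},
)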

Notes

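This guidance relies on SessionDB.get_compression_tip(session_id), which the issue reports already exists in Hermes and walks parent_session_id chains for compression continuations. If it behaves differently in your version, another way of resolving stale compression parents would be needed.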

Recommendation

Apply the suggested fix to update the API server to handle session continuity correctly. This will ensure that external clients can maintain a valid session id even after context compression occurs.
