hermes - ✅(Solved) Fix run_agent.py never calls should_compress_preflight() — LCM deferred maintenance is dead code [4 pull requests, 1 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#20316 (fetched 2026-05-06 06:37:22)
Comments: 1, Participants: 2, Timeline events: 12, Reactions: 0
Timeline (top): labeled ×5, cross-referenced ×4, referenced ×2, commented ×1

Root Cause

run_agent.py never invokes context_engine.should_compress_preflight(messages), which means the LCM plugin's deferred maintenance system (incremental compaction below the 75% threshold) never fires.

Fix Action

Fixed

PR fix notes

PR #1: fix(run_agent): call should_compress_preflight for sub-threshold LCM deferred maintenance

Description (problem / solution / changelog)

Summary

Fixes #20316. The preflight compression block in run_agent.py only checked the hardcoded threshold_tokens (75% of context), never delegating to ContextEngine.should_compress_preflight(). This meant the LCM plugin's incremental leaf compaction (triggered below 75% via raw_backlog debt tracking) never fired.

Root Cause

The preflight section (~line 10753) only fires compression when:

_preflight_tokens >= self.context_compressor.threshold_tokens

This threshold-only check bypasses the LCM engine's should_compress_preflight(messages) method entirely. The method exists specifically for "cheap pre-API-call checks" and handles raw_backlog debt tracking internally.

Fix

Add an elif block after the threshold overflow check that delegates to the engine:

elif hasattr(self.context_compressor, "should_compress_preflight"):
    if self.context_compressor.should_compress_preflight(messages):
        messages, active_system_prompt = self._compress_context(...)
        conversation_history = None

The built-in ContextCompressor.should_compress_preflight() returns False, so this is a no-op for non-LCM engines — backward compatible.
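The branch order described above can be sketched in isolation. This is a minimal illustration, not the code in run_agent.py: `BuiltinCompressor`, `BacklogEngine`, `max_backlog`, and `preflight_decision` are hypothetical names, and the real block calls `self._compress_context(...)` rather than returning a label.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


class BuiltinCompressor:
    """Stand-in for a threshold-only compressor whose hook returns False."""

    def __init__(self, threshold_tokens: int) -> None:
        self.threshold_tokens = threshold_tokens

    def should_compress_preflight(self, messages: List[Message]) -> bool:
        return False


class BacklogEngine(BuiltinCompressor):
    """Hypothetical engine that requests maintenance once messages pile up."""

    def __init__(self, threshold_tokens: int, max_backlog: int) -> None:
        super().__init__(threshold_tokens)
        self.max_backlog = max_backlog  # assumed knob, for illustration only

    def should_compress_preflight(self, messages: List[Message]) -> bool:
        return len(messages) >= self.max_backlog


def preflight_decision(engine: Any, messages: List[Message], preflight_tokens: int) -> str:
    """Return which branch of the preflight block would run."""
    if preflight_tokens >= engine.threshold_tokens:
        return "threshold"  # existing overflow path
    elif hasattr(engine, "should_compress_preflight"):
        if engine.should_compress_preflight(messages):
            return "engine"  # new sub-threshold path
    return "skip"
```

With a 150K threshold and a 10K-token request, the stand-in built-in compressor yields "skip" while the hypothetical backlog engine yields "engine", which is the backward-compatibility property the PR relies on.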

Impact

  • LCM_DEFERRED_MAINTENANCE_ENABLED=1 and LCM_LEAF_CHUNK_TOKENS=20000 env vars now actually work
  • Sessions below the 75% threshold but with high raw_backlog debt now get incremental leaf compaction
  • lcm_lifecycle_state should show non-zero maintenance attempts

Closes #20316

Changed files

  • run_agent.py (modified, +11/-0)

PR #3: fix(run_agent): call should_compress_preflight for sub-threshold LCM deferred maintenance

Description (problem / solution / changelog)

Summary

Fixes #20316. The preflight compression block in run_agent.py only checked the hardcoded threshold_tokens (75% of context), never delegating to ContextEngine.should_compress_preflight(). This meant the LCM plugin's incremental leaf compaction (triggered below 75% via raw_backlog debt tracking) never fired.

Root Cause

The preflight section only fires compression when:

_preflight_tokens >= self.context_compressor.threshold_tokens

This bypasses the LCM engine, which has a separate should_compress_preflight(messages) method for sub-threshold deferred maintenance.

Fix

Add an elif block after the threshold overflow check:

elif hasattr(self.context_compressor, "should_compress_preflight"):
    if self.context_compressor.should_compress_preflight(messages):
        messages, active_system_prompt = self._compress_context(...)
        conversation_history = None

Built-in ContextCompressor returns False so this is backward compatible.

Additional Fixes (collateral test cleanups)

  • credential_pool.py: os.environ now correctly wins over .env (was reversed)
  • tui_gateway/server.py: ValueError handler now clears session["pending_title"]
  • test_concurrent_interrupt.py: Added missing _tool_guardrails to _Stub class
  • test_delegate.py: Updated 3 mock assertions for new target_model kwarg
  • test_daytona_environment.py / test_vercel_sandbox_environment.py: cd assertion matches actual builtin cd output

Closes #20316

Changed files

  • cron/scheduler.py (modified, +7/-2)
  • run_agent.py (modified, +11/-0)
  • tests/agent/test_bedrock_1m_context.py (modified, +1/-1)
  • tests/conftest.py (modified, +10/-0)
  • tests/gateway/test_discord_free_response.py (modified, +8/-4)
  • tests/hermes_cli/test_model_provider_persistence.py (modified, +3/-1)
  • tests/hermes_cli/test_model_validation.py (modified, +7/-2)
  • tests/hermes_cli/test_update_gateway_restart.py (modified, +18/-7)
  • tests/run_agent/test_concurrent_interrupt.py (modified, +7/-1)
  • tests/tools/test_daytona_environment.py (modified, +1/-1)
  • tests/tools/test_delegate.py (modified, +3/-3)
  • tests/tools/test_vercel_sandbox_environment.py (modified, +1/-1)
  • tui_gateway/server.py (modified, +13/-0)

PR #20424: fix(run_agent): call should_compress_preflight() for sub-threshold engines (#20316)

Description (problem / solution / changelog)

Summary

  • run_conversation now consults ContextEngine.should_compress_preflight() when the request is below threshold_tokens, so engines like hermes-lcm can run incremental leaf-chunk compaction (or other deferred maintenance) without waiting for the 75% context fill cutoff.
  • Default ContextEngine.should_compress_preflight() still returns False — the built-in ContextCompressor is unaffected.
  • Exceptions raised by the engine hook are caught at debug level and treated as "skip preflight", so a buggy plugin can't break an otherwise-healthy turn.
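The exception handling in the last bullet might look roughly like the guard below. The PR's actual code is not shown on this page, so this is only a sketch under that description: `safe_engine_preflight`, the logger name, and the two demo engine classes are all hypothetical.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger("preflight_sketch")  # hypothetical logger name


def safe_engine_preflight(engine: Any, messages: List[Dict[str, Any]]) -> bool:
    """Ask the engine whether sub-threshold maintenance is wanted.

    Any exception from the hook is logged at debug level and treated as
    "skip preflight", so a faulty plugin cannot abort the turn.
    """
    hook = getattr(engine, "should_compress_preflight", None)
    if hook is None:
        return False
    try:
        return bool(hook(messages))
    except Exception:
        logger.debug("should_compress_preflight raised; skipping", exc_info=True)
        return False


class BrokenEngine:
    """Demo engine whose hook always fails."""

    def should_compress_preflight(self, messages: List[Dict[str, Any]]) -> bool:
        raise RuntimeError("plugin bug")


class EagerEngine:
    """Demo engine whose hook always requests maintenance."""

    def should_compress_preflight(self, messages: List[Dict[str, Any]]) -> bool:
        return True
```

Calling `safe_engine_preflight(BrokenEngine(), [])` returns False instead of raising, so the turn would proceed without preflight compression.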

Closes #20316

Testing

  • scripts/run_tests.sh tests/run_agent/test_run_agent.py::TestRunConversation::test_engine_preflight_fires_below_threshold tests/run_agent/test_run_agent.py::TestRunConversation::test_engine_preflight_skipped_when_returns_false tests/run_agent/test_run_agent.py::TestRunConversation::test_engine_preflight_exception_does_not_break_turn -q
▶ running pytest with 4 workers, hermetic env, in /tmp/hermes-r2-1-fix
  (TZ=UTC LANG=C.UTF-8 PYTHONHASHSEED=0; all credential env vars unset)
bringing up nodes...
bringing up nodes...

...                                                                      [100%]
3 passed in 4.03s
  • scripts/run_tests.sh tests/agent/test_context_engine.py -q
...................                                                      [100%]
19 passed in 1.73s
  • scripts/run_tests.sh tests/run_agent/test_run_agent.py::TestRunConversation::test_context_compression_triggered tests/run_agent/test_run_agent.py::TestRunConversation::test_glm_prompt_exceeds_max_length_triggers_compression -q
..                                                                       [100%]
2 passed in 6.34s

Changed files

  • run_agent.py (modified, +31/-0)
  • tests/run_agent/test_run_agent.py (modified, +136/-0)

PR #4: fix(run_agent): call should_compress_preflight for sub-threshold LCM deferred maintenance

Description (problem / solution / changelog)

Summary

Fixes #20316. The preflight compression block in run_agent.py only checked the hardcoded threshold_tokens (75% of context), never delegating to ContextEngine.should_compress_preflight(). This meant the LCM plugin's incremental leaf compaction (triggered below 75% via raw_backlog debt tracking) never fired.

Root Cause

The preflight section only fires compression when:

_preflight_tokens >= self.context_compressor.threshold_tokens

This bypasses the LCM engine, which has a separate should_compress_preflight(messages) method for sub-threshold deferred maintenance.

Fix

Add an elif block after the threshold overflow check:

elif hasattr(self.context_compressor, "should_compress_preflight"):
    if self.context_compressor.should_compress_preflight(messages):
        messages, active_system_prompt = self._compress_context(...)
        conversation_history = None

Built-in ContextCompressor returns False so this is backward compatible.

Changes (rebased onto current main, 1 conflict resolved)

  • run_agent.py: add should_compress_preflight call
  • credential_pool.py: os.environ now correctly wins over .env
  • tui_gateway/server.py: ValueError handler clears pending_title
  • test_concurrent_interrupt.py: add missing _append_guardrail_observation
  • test_delegate.py: mock assertions updated for new target_model kwarg
  • test_daytona_environment.py / test_vercel_sandbox_environment.py: cd assertion matches actual output

Closes #20316

Changed files

  • README.md (modified, +2/-1)
  • README.zh-CN.md (added, +186/-0)
  • agent/auxiliary_client.py (modified, +67/-5)
  • agent/context_compressor.py (modified, +4/-1)
  • agent/credential_pool.py (modified, +1/-1)
  • agent/i18n.py (modified, +5/-2)
  • agent/memory_manager.py (modified, +6/-8)
  • agent/memory_provider.py (modified, +8/-9)
  • agent/model_metadata.py (modified, +11/-0)
  • agent/transports/__init__.py (modified, +13/-1)
  • agent/transports/chat_completions.py (modified, +158/-90)
  • agent/transports/types.py (modified, +15/-14)
  • cli.py (modified, +12/-0)
  • cron/scheduler.py (modified, +7/-2)
  • environments/README.md (modified, +1/-1)
  • gateway/platforms/api_server.py (modified, +117/-8)
  • gateway/platforms/discord.py (modified, +7/-2)
  • gateway/platforms/telegram.py (modified, +14/-16)
  • gateway/run.py (modified, +64/-28)
  • hermes_cli/auth.py (modified, +45/-0)
  • hermes_cli/config.py (modified, +45/-1)
  • hermes_cli/doctor.py (modified, +109/-26)
  • hermes_cli/kanban.py (modified, +209/-2)
  • hermes_cli/kanban_db.py (modified, +422/-107)
  • hermes_cli/kanban_diagnostics.py (added, +649/-0)
  • hermes_cli/main.py (modified, +426/-139)
  • hermes_cli/models.py (modified, +51/-0)
  • hermes_cli/plugins.py (modified, +35/-5)
  • hermes_cli/tips.py (modified, +1/-1)
  • hermes_state.py (modified, +39/-0)
  • locales/en.yaml (modified, +1/-1)
  • locales/fr.yaml (added, +24/-0)
  • locales/tr.yaml (added, +24/-0)
  • locales/uk.yaml (added, +24/-0)
  • optional-skills/mlops/flash-attention/SKILL.md (modified, +0/-4)
  • optional-skills/mlops/saelens/references/README.md (modified, +0/-1)
  • plugins/kanban/dashboard/dist/index.js (modified, +336/-237)
  • plugins/kanban/dashboard/dist/style.css (modified, +170/-0)
  • plugins/kanban/dashboard/plugin_api.py (modified, +237/-59)
  • plugins/memory/hindsight/__init__.py (modified, +145/-6)
  • plugins/model-providers/README.md (added, +70/-0)
  • plugins/model-providers/ai-gateway/__init__.py (added, +43/-0)
  • plugins/model-providers/ai-gateway/plugin.yaml (added, +5/-0)
  • plugins/model-providers/alibaba-coding-plan/__init__.py (added, +21/-0)
  • plugins/model-providers/alibaba-coding-plan/plugin.yaml (added, +5/-0)
  • plugins/model-providers/alibaba/__init__.py (added, +13/-0)
  • plugins/model-providers/alibaba/plugin.yaml (added, +5/-0)
  • plugins/model-providers/anthropic/__init__.py (added, +52/-0)
  • plugins/model-providers/anthropic/plugin.yaml (added, +5/-0)
  • plugins/model-providers/arcee/__init__.py (added, +13/-0)
  • plugins/model-providers/arcee/plugin.yaml (added, +5/-0)
  • plugins/model-providers/azure-foundry/__init__.py (added, +21/-0)
  • plugins/model-providers/azure-foundry/plugin.yaml (added, +5/-0)
  • plugins/model-providers/bedrock/__init__.py (added, +29/-0)
  • plugins/model-providers/bedrock/plugin.yaml (added, +5/-0)
  • plugins/model-providers/copilot-acp/__init__.py (added, +34/-0)
  • plugins/model-providers/copilot-acp/plugin.yaml (added, +5/-0)
  • plugins/model-providers/copilot/__init__.py (added, +58/-0)
  • plugins/model-providers/copilot/plugin.yaml (added, +5/-0)
  • plugins/model-providers/custom/__init__.py (added, +68/-0)
  • plugins/model-providers/custom/plugin.yaml (added, +5/-0)
  • plugins/model-providers/deepseek/__init__.py (added, +20/-0)
  • plugins/model-providers/deepseek/plugin.yaml (added, +5/-0)
  • plugins/model-providers/gemini/__init__.py (added, +72/-0)
  • plugins/model-providers/gemini/plugin.yaml (added, +5/-0)
  • plugins/model-providers/gmi/__init__.py (added, +26/-0)
  • plugins/model-providers/gmi/plugin.yaml (added, +5/-0)
  • plugins/model-providers/huggingface/__init__.py (added, +20/-0)
  • plugins/model-providers/huggingface/plugin.yaml (added, +5/-0)
  • plugins/model-providers/kilocode/__init__.py (added, +14/-0)
  • plugins/model-providers/kilocode/plugin.yaml (added, +5/-0)
  • plugins/model-providers/kimi-coding/__init__.py (added, +71/-0)
  • plugins/model-providers/kimi-coding/plugin.yaml (added, +5/-0)
  • plugins/model-providers/minimax/__init__.py (added, +45/-0)
  • plugins/model-providers/minimax/plugin.yaml (added, +5/-0)
  • plugins/model-providers/nous/__init__.py (added, +53/-0)
  • plugins/model-providers/nous/plugin.yaml (added, +5/-0)
  • plugins/model-providers/nvidia/__init__.py (added, +21/-0)
  • plugins/model-providers/nvidia/plugin.yaml (added, +5/-0)
  • plugins/model-providers/ollama-cloud/__init__.py (added, +14/-0)
  • plugins/model-providers/ollama-cloud/plugin.yaml (added, +5/-0)
  • plugins/model-providers/openai-codex/__init__.py (added, +15/-0)
  • plugins/model-providers/openai-codex/plugin.yaml (added, +5/-0)
  • plugins/model-providers/opencode-zen/__init__.py (added, +30/-0)
  • plugins/model-providers/opencode-zen/plugin.yaml (added, +5/-0)
  • plugins/model-providers/openrouter/__init__.py (added, +86/-0)
  • plugins/model-providers/openrouter/plugin.yaml (added, +5/-0)
  • plugins/model-providers/qwen-oauth/__init__.py (added, +82/-0)
  • plugins/model-providers/qwen-oauth/plugin.yaml (added, +5/-0)
  • plugins/model-providers/stepfun/__init__.py (added, +14/-0)
  • plugins/model-providers/stepfun/plugin.yaml (added, +5/-0)
  • plugins/model-providers/xai/__init__.py (added, +15/-0)
  • plugins/model-providers/xai/plugin.yaml (added, +5/-0)
  • plugins/model-providers/xiaomi/__init__.py (added, +13/-0)
  • plugins/model-providers/xiaomi/plugin.yaml (added, +5/-0)
  • plugins/model-providers/zai/__init__.py (added, +21/-0)
  • plugins/model-providers/zai/plugin.yaml (added, +5/-0)
  • providers/README.md (added, +78/-0)
  • providers/__init__.py (added, +191/-0)
  • providers/base.py (added, +165/-0)

Code Example

if _preflight_tokens >= self.context_compressor.threshold_tokens:
    # compress...

---

# Existing threshold check fires compress() for overflow...
# NEW: let the engine decide if sub-threshold maintenance is needed
elif hasattr(self.context_compressor, "should_compress_preflight"):
    if self.context_compressor.should_compress_preflight(messages):
        messages, active_system_prompt = self._compress_context(
            messages, system_message, approx_tokens=_preflight_tokens,
            task_id=effective_task_id,
        )
        conversation_history = None

Summary

run_agent.py never invokes context_engine.should_compress_preflight(messages), which means the LCM plugin's deferred maintenance system (incremental compaction below the 75% threshold) never fires.

Context

The ContextEngine ABC defines should_compress_preflight(messages) (agent/context_engine.py:100) specifically for engines that can do cheap pre-API-call checks. The hermes-lcm plugin implements this to:

  1. Ingest messages into the immutable store
  2. Check raw_backlog debt (accumulated tokens outside the fresh tail)
  3. Return True when raw_tokens >= leaf_chunk_tokens (default 20K) — triggering incremental leaf compaction WITHOUT hitting the 75% context threshold
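The three steps above can be sketched with a toy engine. This is not the hermes-lcm implementation: `BacklogTrackingEngine`, `fresh_tail`, and the 4-characters-per-token estimate are assumptions made for illustration, and the sketch does not model the backlog shrinking after a compaction pass.

```python
from typing import Any, Dict, List


def estimate_tokens(message: Dict[str, Any]) -> int:
    """Crude estimate of about 4 characters per token (an assumption)."""
    return max(1, len(str(message.get("content", ""))) // 4)


class BacklogTrackingEngine:
    """Hypothetical engine illustrating ingest, debt check, and trigger."""

    def __init__(self, leaf_chunk_tokens: int = 20_000, fresh_tail: int = 4) -> None:
        self.leaf_chunk_tokens = leaf_chunk_tokens
        self.fresh_tail = fresh_tail  # newest messages excluded from the backlog
        self._store: List[Dict[str, Any]] = []  # append-only message store

    def should_compress_preflight(self, messages: List[Dict[str, Any]]) -> bool:
        # 1. Ingest any messages that are not yet in the store.
        self._store.extend(messages[len(self._store):])
        # 2. The raw backlog is everything outside the fresh tail.
        if self.fresh_tail:
            backlog = self._store[:-self.fresh_tail]
        else:
            backlog = self._store
        raw_tokens = sum(estimate_tokens(m) for m in backlog)
        # 3. Request compaction once the backlog reaches one leaf chunk.
        return raw_tokens >= self.leaf_chunk_tokens
```

Because the decision depends only on backlog size, it can return True long before the request approaches the 75% context threshold.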

The Bug

In run_agent.py, the preflight compression section (line ~10580) has its own hardcoded threshold check:

if _preflight_tokens >= self.context_compressor.threshold_tokens:
    # compress...

This never delegates to the engine's should_compress_preflight(). Zero references to this method exist in run_agent.py.

Similarly, the post-response path (line ~13293) only calls should_compress(prompt_tokens) which is also purely threshold-based.

Impact

  • With a 200K context (Opus) and a 75% threshold, compaction does not fire until the request reaches 150K tokens
  • Most gateway sessions never hit this → LCM accumulates 3,200+ messages with only 1 summary node ever created
  • LCM_DEFERRED_MAINTENANCE_ENABLED=1 and LCM_LEAF_CHUNK_TOKENS=20000 env vars have no effect
  • LCM_CACHE_FRIENDLY_CONDENSATION_ENABLED=1 also inert (only runs during compaction)
  • lcm_lifecycle_state shows 0 maintenance attempts, 0 active debt
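The gap between the two triggers follows from simple arithmetic on the figures quoted above. The helper name `threshold_gap` is hypothetical, and the count ignores details such as the fresh tail, so treat it as a rough bound only.

```python
from typing import Tuple


def threshold_gap(context_window: int, threshold_ratio: float,
                  leaf_chunk_tokens: int) -> Tuple[int, int]:
    """Return (threshold_tokens, whole leaf chunks that fit below it)."""
    threshold_tokens = int(context_window * threshold_ratio)
    leaf_chunks_below_threshold = threshold_tokens // leaf_chunk_tokens
    return threshold_tokens, leaf_chunks_below_threshold


# 200K context, 75% threshold, 20K leaf chunk (values from this issue)
print(threshold_gap(200_000, 0.75, 20_000))  # (150000, 7)
```

Roughly seven leaf-chunk-sized backlogs fit under the 150K threshold, which is why a session can accumulate a large backlog without the threshold-only path ever firing.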

Proposed Fix

In the preflight section (~line 10580), after the existing threshold check, add a fallback that delegates to the engine:

# Existing threshold check fires compress() for overflow...
# NEW: let the engine decide if sub-threshold maintenance is needed
elif hasattr(self.context_compressor, "should_compress_preflight"):
    if self.context_compressor.should_compress_preflight(messages):
        messages, active_system_prompt = self._compress_context(
            messages, system_message, approx_tokens=_preflight_tokens,
            task_id=effective_task_id,
        )
        conversation_history = None

This preserves backward compatibility (built-in ContextCompressor returns False by default) while enabling LCM's incremental maintenance.

Environment

  • Hermes Agent v0.12.0 (2026.4.30)
  • hermes-lcm v0.8.0
  • Model: claude-opus-4-7 via Bedrock (200K context)
  • Config: context.engine: lcm, plugins.enabled: [hermes-lcm]

Related

  • hermes-lcm ContextEngine ABC defines should_compress_preflight() with explicit documentation that it's for "Quick rough check before the API call"
  • The method already handles ingestion + debt tracking internally — just needs to be called

extent analysis

TL;DR

The proposed fix involves adding a fallback in the preflight section of run_agent.py to delegate to the engine's should_compress_preflight method for sub-threshold maintenance decisions.

Guidance

  • Verify that the should_compress_preflight method is correctly implemented in the hermes-lcm plugin to handle ingestion and debt tracking.
  • Add the proposed fallback code in the preflight section of run_agent.py to enable the engine's incremental maintenance.
  • Test the fix with the provided environment configuration (Hermes Agent v0.12.0, hermes-lcm v0.8.0, and claude-opus-4-7 model) to ensure the LCM plugin's deferred maintenance system fires correctly.
  • Monitor the lcm_lifecycle_state to confirm that maintenance attempts are being made and active debt is being tracked.
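One way to check the guidance above without a live model is a small mock-based test. The sketch below exercises a simplified stand-in (`run_preflight`, a hypothetical helper mirroring the proposed branch), not the real run_conversation code path.

```python
from typing import Any, Callable, List
from unittest.mock import MagicMock


def run_preflight(engine: Any, messages: List[Any], preflight_tokens: int,
                  compress: Callable[[List[Any]], List[Any]]) -> List[Any]:
    """Simplified stand-in for the preflight block (hypothetical helper)."""
    if preflight_tokens >= engine.threshold_tokens:
        return compress(messages)
    elif hasattr(engine, "should_compress_preflight"):
        if engine.should_compress_preflight(messages):
            return compress(messages)
    return messages


def check_engine_hook_fires_below_threshold() -> None:
    engine = MagicMock()
    engine.threshold_tokens = 150_000
    engine.should_compress_preflight.return_value = True
    compress = MagicMock(return_value=["summary"])
    result = run_preflight(engine, ["m1", "m2"], 10_000, compress)
    assert result == ["summary"]
    engine.should_compress_preflight.assert_called_once_with(["m1", "m2"])


def check_engine_hook_skipped_when_false() -> None:
    engine = MagicMock()
    engine.threshold_tokens = 150_000
    engine.should_compress_preflight.return_value = False
    compress = MagicMock()
    assert run_preflight(engine, ["m1"], 10_000, compress) == ["m1"]
    compress.assert_not_called()
```

Both checks pass only if the hook is consulted below the threshold and its answer decides whether compression runs.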

Example

The proposed fix code snippet is already provided in the issue:

elif hasattr(self.context_compressor, "should_compress_preflight"):
    if self.context_compressor.should_compress_preflight(messages):
        messages, active_system_prompt = self._compress_context(
            messages, system_message, approx_tokens=_preflight_tokens,
            task_id=effective_task_id,
        )
        conversation_history = None

Notes

This fix assumes that the should_compress_preflight method is correctly implemented in the hermes-lcm plugin and that the environment configuration is correctly set up.

Recommendation

Apply the proposed fix by adding the fallback code in the preflight section of run_agent.py. It enables the LCM plugin's incremental maintenance system without modifying the existing threshold-based compression logic.
