hermes - ✅(Solved) Fix run_agent.py never calls should_compress_preflight() — LCM deferred maintenance is dead code [4 pull requests, 1 comments, 2 participants]

hermes2026-05-05 16:56:15

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#20316•Fetched 2026-05-06 06:37:22

View on GitHub

Comments

Participants

Timeline

Reactions

Author

maxatv

Participants

alt-glitch

maxatv

Timeline (top)

labeled ×5cross-referenced ×4referenced ×2commented ×1

run_agent.py never invokes context_engine.should_compress_preflight(messages), which means the LCM plugin's deferred maintenance system (incremental compaction below the 75% threshold) never fires.

Root Cause

Fix Action

Fixed

Fixed by PR: fix(run_agent): call should_compress_preflight for sub-threshold LCM deferred maintenance (https://github.com/wali-reheman/hermes-agent/pull/1)
Fixed by PR: fix(run_agent): call should_compress_preflight for sub-threshold LCM deferred maintenance (https://github.com/wali-reheman/hermes-agent/pull/3)
Fixed by PR: fix(run_agent): call should_compress_preflight() for sub-threshold engines (#20316) (https://github.com/NousResearch/hermes-agent/pull/20424)

PR fix notes

PR #1: fix(run_agent): call should_compress_preflight for sub-threshold LCM deferred maintenance

Repository: wali-reheman/hermes-agent
Author: wali-reheman
State: closed | merged: False
Link: https://github.com/wali-reheman/hermes-agent/pull/1

Description (problem / solution / changelog)

Summary

Fixes #20316. The preflight compression block in run_agent.py only checked the hardcoded threshold_tokens (75% of context), never delegating to ContextEngine.should_compress_preflight(). This meant the LCM plugin's incremental leaf compaction (triggered below 75% via raw_backlog debt tracking) never fired.

Root Cause

The preflight section (~line 10753) only fires compression when:

_preflight_tokens >= self.context_compressor.threshold_tokens

This threshold-only check bypasses the LCM engine's should_compress_preflight(messages) method entirely. The method exists specifically for "cheap pre-API-call checks" and handles raw_backlog debt tracking internally.

Fix

Add an elif block after the threshold overflow check that delegates to the engine:

elif hasattr(self.context_compressor, "should_compress_preflight"):
    if self.context_compressor.should_compress_preflight(messages):
        messages, active_system_prompt = self._compress_context(...)
        conversation_history = None

The built-in ContextCompressor.should_compress_preflight() returns False, so this is a no-op for non-LCM engines — backward compatible.

Impact

LCM_DEFERRED_MAINTENANCE_ENABLED=1 and LCM_LEAF_CHUNK_TOKENS=20000 env vars now actually work
Sessions below the 75% threshold but with high raw_backlog debt now get incremental leaf compaction
lcm_lifecycle_state should show non-zero maintenance attempts

Closes #20316

Changed files

run_agent.py (modified, +11/-0)

PR #3: fix(run_agent): call should_compress_preflight for sub-threshold LCM deferred maintenance

Repository: wali-reheman/hermes-agent
Author: wali-reheman
State: closed | merged: False
Link: https://github.com/wali-reheman/hermes-agent/pull/3

Description (problem / solution / changelog)

Summary

Fixes #20316. The preflight compression block in run_agent.py only checked the hardcoded threshold_tokens (75% of context), never delegating to ContextEngine.should_compress_preflight(). This meant the LCM plugin incremental leaf compaction (triggered below 75% via raw_backlog debt tracking) never fired.

Root Cause

The preflight section only fires compression when:

_preflight_tokens >= self.context_compressor.threshold_tokens

This bypasses LCM engine which has a separate should_compress_preflight(messages) method for sub-threshold deferred maintenance.

Fix

Add an elif block after the threshold overflow check:

elif hasattr(self.context_compressor, "should_compress_preflight"):
    if self.context_compressor.should_compress_preflight(messages):
        messages, active_system_prompt = self._compress_context(...)
        conversation_history = None

Built-in ContextCompressor returns False so this is backward compatible.

Additional Fixes (collateral test cleanups)

credential_pool.py: os.environ now correctly wins over .env (was reversed)
tui_gateway/server.py: ValueError handler now clears session["pending_title"]
test_concurrent_interrupt.py: Added missing _tool_guardrails to _Stub class
test_delegate.py: Updated 3 mock assertions for new target_model kwarg
test_daytona_environment.py / test_vercel_sandbox_environment.py: cd assertion matches actual builtin cd output

Closes #20316

Changed files

cron/scheduler.py (modified, +7/-2)
run_agent.py (modified, +11/-0)
tests/agent/test_bedrock_1m_context.py (modified, +1/-1)
tests/conftest.py (modified, +10/-0)
tests/gateway/test_discord_free_response.py (modified, +8/-4)
tests/hermes_cli/test_model_provider_persistence.py (modified, +3/-1)
tests/hermes_cli/test_model_validation.py (modified, +7/-2)
tests/hermes_cli/test_update_gateway_restart.py (modified, +18/-7)
tests/run_agent/test_concurrent_interrupt.py (modified, +7/-1)
tests/tools/test_daytona_environment.py (modified, +1/-1)
tests/tools/test_delegate.py (modified, +3/-3)
tests/tools/test_vercel_sandbox_environment.py (modified, +1/-1)
tui_gateway/server.py (modified, +13/-0)

PR #20424: fix(run_agent): call should_compress_preflight() for sub-threshold engines (#20316)

Repository: NousResearch/hermes-agent
Author: Beandon13
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/20424

Description (problem / solution / changelog)

Summary

run_conversation now consults ContextEngine.should_compress_preflight() when the request is below threshold_tokens, so engines like hermes-lcm can run incremental leaf-chunk compaction (or other deferred maintenance) without waiting for the 75% context fill cutoff.
Default ContextEngine.should_compress_preflight() still returns False — the built-in ContextCompressor is unaffected.
Exceptions raised by the engine hook are caught at debug level and treated as "skip preflight", so a buggy plugin can't break an otherwise-healthy turn.

Closes #20316

Testing

scripts/run_tests.sh tests/run_agent/test_run_agent.py::TestRunConversation::test_engine_preflight_fires_below_threshold tests/run_agent/test_run_agent.py::TestRunConversation::test_engine_preflight_skipped_when_returns_false tests/run_agent/test_run_agent.py::TestRunConversation::test_engine_preflight_exception_does_not_break_turn -q

▶ running pytest with 4 workers, hermetic env, in /tmp/hermes-r2-1-fix
  (TZ=UTC LANG=C.UTF-8 PYTHONHASHSEED=0; all credential env vars unset)
bringing up nodes...
bringing up nodes...

...                                                                      [100%]
3 passed in 4.03s

scripts/run_tests.sh tests/agent/test_context_engine.py -q

...................                                                      [100%]
19 passed in 1.73s

scripts/run_tests.sh tests/run_agent/test_run_agent.py::TestRunConversation::test_context_compression_triggered tests/run_agent/test_run_agent.py::TestRunConversation::test_glm_prompt_exceeds_max_length_triggers_compression -q

..                                                                       [100%]
2 passed in 6.34s

Changed files

run_agent.py (modified, +31/-0)
tests/run_agent/test_run_agent.py (modified, +136/-0)

PR #4: fix(run_agent): call should_compress_preflight for sub-threshold LCM deferred maintenance

Repository: wali-reheman/hermes-agent
Author: wali-reheman
State: open | merged: False
Link: https://github.com/wali-reheman/hermes-agent/pull/4

Description (problem / solution / changelog)

Summary

Fixes #20316. The preflight compression block in run_agent.py only checked the hardcoded threshold_tokens (75% of context), never delegating to ContextEngine.should_compress_preflight(). This meant the LCM plugin incremental leaf compaction (triggered below 75% via raw_backlog debt tracking) never fired.

Root Cause

The preflight section only fires compression when:

_preflight_tokens >= self.context_compressor.threshold_tokens

This bypasses LCM engine which has a separate should_compress_preflight(messages) method for sub-threshold deferred maintenance.

Fix

Add an elif block after the threshold overflow check:

elif hasattr(self.context_compressor, "should_compress_preflight"):
    if self.context_compressor.should_compress_preflight(messages):
        messages, active_system_prompt = self._compress_context(...)
        conversation_history = None

Built-in ContextCompressor returns False so this is backward compatible.

Changes (rebased onto current main, 1 conflict resolved)

run_agent.py: add should_compress_preflight call
credential_pool.py: os.environ now correctly wins over .env
tui_gateway/server.py: ValueError handler clears pending_title
test_concurrent_interrupt.py: add missing _append_guardrail_observation
test_delegate.py: mock assertions updated for new target_model kwarg
test_daytona_environment.py / test_vercel_sandbox_environment.py: cd assertion matches actual output

Closes #20316

Changed files

README.md (modified, +2/-1)
README.zh-CN.md (added, +186/-0)
agent/auxiliary_client.py (modified, +67/-5)
agent/context_compressor.py (modified, +4/-1)
agent/credential_pool.py (modified, +1/-1)
agent/i18n.py (modified, +5/-2)
agent/memory_manager.py (modified, +6/-8)
agent/memory_provider.py (modified, +8/-9)
agent/model_metadata.py (modified, +11/-0)
agent/transports/__init__.py (modified, +13/-1)
agent/transports/chat_completions.py (modified, +158/-90)
agent/transports/types.py (modified, +15/-14)
cli.py (modified, +12/-0)
cron/scheduler.py (modified, +7/-2)
environments/README.md (modified, +1/-1)
gateway/platforms/api_server.py (modified, +117/-8)
gateway/platforms/discord.py (modified, +7/-2)
gateway/platforms/telegram.py (modified, +14/-16)
gateway/run.py (modified, +64/-28)
hermes_cli/auth.py (modified, +45/-0)
hermes_cli/config.py (modified, +45/-1)
hermes_cli/doctor.py (modified, +109/-26)
hermes_cli/kanban.py (modified, +209/-2)
hermes_cli/kanban_db.py (modified, +422/-107)
hermes_cli/kanban_diagnostics.py (added, +649/-0)
hermes_cli/main.py (modified, +426/-139)
hermes_cli/models.py (modified, +51/-0)
hermes_cli/plugins.py (modified, +35/-5)
hermes_cli/tips.py (modified, +1/-1)
hermes_state.py (modified, +39/-0)
locales/en.yaml (modified, +1/-1)
locales/fr.yaml (added, +24/-0)
locales/tr.yaml (added, +24/-0)
locales/uk.yaml (added, +24/-0)
optional-skills/mlops/flash-attention/SKILL.md (modified, +0/-4)
optional-skills/mlops/saelens/references/README.md (modified, +0/-1)
plugins/kanban/dashboard/dist/index.js (modified, +336/-237)
plugins/kanban/dashboard/dist/style.css (modified, +170/-0)
plugins/kanban/dashboard/plugin_api.py (modified, +237/-59)
plugins/memory/hindsight/__init__.py (modified, +145/-6)
plugins/model-providers/README.md (added, +70/-0)
plugins/model-providers/ai-gateway/__init__.py (added, +43/-0)
plugins/model-providers/ai-gateway/plugin.yaml (added, +5/-0)
plugins/model-providers/alibaba-coding-plan/__init__.py (added, +21/-0)
plugins/model-providers/alibaba-coding-plan/plugin.yaml (added, +5/-0)
plugins/model-providers/alibaba/__init__.py (added, +13/-0)
plugins/model-providers/alibaba/plugin.yaml (added, +5/-0)
plugins/model-providers/anthropic/__init__.py (added, +52/-0)
plugins/model-providers/anthropic/plugin.yaml (added, +5/-0)
plugins/model-providers/arcee/__init__.py (added, +13/-0)
plugins/model-providers/arcee/plugin.yaml (added, +5/-0)
plugins/model-providers/azure-foundry/__init__.py (added, +21/-0)
plugins/model-providers/azure-foundry/plugin.yaml (added, +5/-0)
plugins/model-providers/bedrock/__init__.py (added, +29/-0)
plugins/model-providers/bedrock/plugin.yaml (added, +5/-0)
plugins/model-providers/copilot-acp/__init__.py (added, +34/-0)
plugins/model-providers/copilot-acp/plugin.yaml (added, +5/-0)
plugins/model-providers/copilot/__init__.py (added, +58/-0)
plugins/model-providers/copilot/plugin.yaml (added, +5/-0)
plugins/model-providers/custom/__init__.py (added, +68/-0)
plugins/model-providers/custom/plugin.yaml (added, +5/-0)
plugins/model-providers/deepseek/__init__.py (added, +20/-0)
plugins/model-providers/deepseek/plugin.yaml (added, +5/-0)
plugins/model-providers/gemini/__init__.py (added, +72/-0)
plugins/model-providers/gemini/plugin.yaml (added, +5/-0)
plugins/model-providers/gmi/__init__.py (added, +26/-0)
plugins/model-providers/gmi/plugin.yaml (added, +5/-0)
plugins/model-providers/huggingface/__init__.py (added, +20/-0)
plugins/model-providers/huggingface/plugin.yaml (added, +5/-0)
plugins/model-providers/kilocode/__init__.py (added, +14/-0)
plugins/model-providers/kilocode/plugin.yaml (added, +5/-0)
plugins/model-providers/kimi-coding/__init__.py (added, +71/-0)
plugins/model-providers/kimi-coding/plugin.yaml (added, +5/-0)
plugins/model-providers/minimax/__init__.py (added, +45/-0)
plugins/model-providers/minimax/plugin.yaml (added, +5/-0)
plugins/model-providers/nous/__init__.py (added, +53/-0)
plugins/model-providers/nous/plugin.yaml (added, +5/-0)
plugins/model-providers/nvidia/__init__.py (added, +21/-0)
plugins/model-providers/nvidia/plugin.yaml (added, +5/-0)
plugins/model-providers/ollama-cloud/__init__.py (added, +14/-0)
plugins/model-providers/ollama-cloud/plugin.yaml (added, +5/-0)
plugins/model-providers/openai-codex/__init__.py (added, +15/-0)
plugins/model-providers/openai-codex/plugin.yaml (added, +5/-0)
plugins/model-providers/opencode-zen/__init__.py (added, +30/-0)
plugins/model-providers/opencode-zen/plugin.yaml (added, +5/-0)
plugins/model-providers/openrouter/__init__.py (added, +86/-0)
plugins/model-providers/openrouter/plugin.yaml (added, +5/-0)
plugins/model-providers/qwen-oauth/__init__.py (added, +82/-0)
plugins/model-providers/qwen-oauth/plugin.yaml (added, +5/-0)
plugins/model-providers/stepfun/__init__.py (added, +14/-0)
plugins/model-providers/stepfun/plugin.yaml (added, +5/-0)
plugins/model-providers/xai/__init__.py (added, +15/-0)
plugins/model-providers/xai/plugin.yaml (added, +5/-0)
plugins/model-providers/xiaomi/__init__.py (added, +13/-0)
plugins/model-providers/xiaomi/plugin.yaml (added, +5/-0)
plugins/model-providers/zai/__init__.py (added, +21/-0)
plugins/model-providers/zai/plugin.yaml (added, +5/-0)
providers/README.md (added, +78/-0)
providers/__init__.py (added, +191/-0)
providers/base.py (added, +165/-0)

Code Example

if _preflight_tokens >= self.context_compressor.threshold_tokens:
    # compress...

---

# Existing threshold check fires compress() for overflow...
# NEW: let the engine decide if sub-threshold maintenance is needed
elif hasattr(self.context_compressor, "should_compress_preflight"):
    if self.context_compressor.should_compress_preflight(messages):
        messages, active_system_prompt = self._compress_context(
            messages, system_message, approx_tokens=_preflight_tokens,
            task_id=effective_task_id,
        )
        conversation_history = None

RAW_BUFFERClick to expand / collapse

Summary

Context

The ContextEngine ABC defines should_compress_preflight(messages) (agent/context_engine.py:100) specifically for engines that can do cheap pre-API-call checks. The hermes-lcm plugin implements this to:

Ingest messages into the immutable store
Check raw_backlog debt (accumulated tokens outside the fresh tail)
Return True when raw_tokens >= leaf_chunk_tokens (default 20K) — triggering incremental leaf compaction WITHOUT hitting the 75% context threshold

The Bug

In run_agent.py, the preflight compression section (line ~10580) has its own hardcoded threshold check:

if _preflight_tokens >= self.context_compressor.threshold_tokens:
    # compress...

This never delegates to the engine's should_compress_preflight(). Zero references to this method exist in run_agent.py.

Similarly, the post-response path (line ~13293) only calls should_compress(prompt_tokens) which is also purely threshold-based.

Impact

With 200K context (Opus) and 75% threshold = 150K tokens before compaction fires
Most gateway sessions never hit this → LCM accumulates 3,200+ messages with only 1 summary node ever created
LCM_DEFERRED_MAINTENANCE_ENABLED=1 and LCM_LEAF_CHUNK_TOKENS=20000 env vars have no effect
LCM_CACHE_FRIENDLY_CONDENSATION_ENABLED=1 also inert (only runs during compaction)
lcm_lifecycle_state shows 0 maintenance attempts, 0 active debt

Proposed Fix

In the preflight section (~line 10580), after the existing threshold check, add a fallback that delegates to the engine:

# Existing threshold check fires compress() for overflow...
# NEW: let the engine decide if sub-threshold maintenance is needed
elif hasattr(self.context_compressor, "should_compress_preflight"):
    if self.context_compressor.should_compress_preflight(messages):
        messages, active_system_prompt = self._compress_context(
            messages, system_message, approx_tokens=_preflight_tokens,
            task_id=effective_task_id,
        )
        conversation_history = None

This preserves backward compatibility (built-in ContextCompressor returns False by default) while enabling LCM's incremental maintenance.

Environment

Hermes Agent v0.12.0 (2026.4.30)
hermes-lcm v0.8.0
Model: claude-opus-4-7 via Bedrock (200K context)
Config: context.engine: lcm, plugins.enabled: [hermes-lcm]

hermes-lcm ContextEngine ABC defines should_compress_preflight() with explicit documentation that it's for "Quick rough check before the API call"
The method already handles ingestion + debt tracking internally — just needs to be called

extent analysis

TL;DR

The proposed fix involves adding a fallback in the preflight section of run_agent.py to delegate to the engine's should_compress_preflight method for sub-threshold maintenance decisions.

Guidance

Verify that the should_compress_preflight method is correctly implemented in the hermes-lcm plugin to handle ingestion and debt tracking.
Add the proposed fallback code in the preflight section of run_agent.py to enable the engine's incremental maintenance.
Test the fix with the provided environment configuration (Hermes Agent v0.12.0, hermes-lcm v0.8.0, and claude-opus-4-7 model) to ensure the LCM plugin's deferred maintenance system fires correctly.
Monitor the lcm_lifecycle_state to confirm that maintenance attempts are being made and active debt is being tracked.

Example

The proposed fix code snippet is already provided in the issue:

elif hasattr(self.context_compressor, "should_compress_preflight"):
    if self.context_compressor.should_compress_preflight(messages):
        messages, active_system_prompt = self._compress_context(
            messages, system_message, approx_tokens=_preflight_tokens,
            task_id=effective_task_id,
        )
        conversation_history = None

Notes

This fix assumes that the should_compress_preflight method is correctly implemented in the hermes-lcm plugin and that the environment configuration is correctly set up.

Recommendation

Apply the proposed workaround by adding the fallback code in the preflight section of run_agent.py, as it enables the LCM plugin's incremental maintenance system without modifying the existing threshold-based compression logic.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #tool integration #LLM response #prompt template #agent execution

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

hermes - ✅(Solved) Fix run_agent.py never calls should_compress_preflight() — LCM deferred maintenance is dead code [4 pull requests, 1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fixed

PR fix notes

PR #1: fix(run_agent): call should_compress_preflight for sub-threshold LCM deferred maintenance

Description (problem / solution / changelog)

Summary

Root Cause

Fix

Impact

Changed files

PR #3: fix(run_agent): call should_compress_preflight for sub-threshold LCM deferred maintenance

Description (problem / solution / changelog)

Summary

Root Cause

Fix

Additional Fixes (collateral test cleanups)

Changed files

PR #20424: fix(run_agent): call should_compress_preflight() for sub-threshold engines (#20316)

Description (problem / solution / changelog)

Summary

Testing

Changed files

PR #4: fix(run_agent): call should_compress_preflight for sub-threshold LCM deferred maintenance

Description (problem / solution / changelog)

Summary

Root Cause

Fix

Changes (rebased onto current main, 1 conflict resolved)

Changed files

Code Example

Summary

Context

The Bug

Impact

Proposed Fix

Environment

Related

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING