hermes: [Feature] update should support rollback and auto-rollback (commit/confirm with timeout)

GitHub stats
NousResearch/hermes-agent#13603 · Fetched 2026-04-22 08:05:28
Comments: 0 · Participants: 1 · Timeline: 1 (labeled ×1) · Reactions: 0

Problem or Use Case

Problem

/update and hermes update have no rollback mechanism. If an update breaks the gateway, the agent, or a critical tool, the user must manually SSH in and run git reflog / git reset --hard to recover. Worse, if the update breaks gateway startup, the user loses their remote control channel entirely — the very channel they used to trigger the update.

Current Behavior

  • Update does git pull --ff-only (or git reset --hard origin/main if histories diverge)
  • Local changes are stashed, but there's no record of the pre-update commit for easy reversion
  • If the new version is broken, recovery requires terminal access and git knowledge
  • No health check after update — "it deployed" ≠ "it works"

Proposed Behavior: Commit/Confirm Pattern

Inspired by JunOS commit confirmed <timeout> — a well-proven pattern for remote configuration changes where a bad change can lock you out of the management channel:

/update
→ (shows changelog — see companion issue)
/update confirm
→ "⚕ Updating... gateway will restart.
    You have 10 minutes to send /confirm.
    If I don't hear from you, I'll auto-rollback to <commit-sha>."
→ (update, restart)
→ (gateway comes back up)
→ "✓ Update complete! Running <new-version>.
    Send /confirm within 9m to keep this version,
    or I'll auto-rollback."

/confirm
→ "✓ Update confirmed. Rollback timer cancelled."

If no /confirm arrives within the timeout:

→ (auto-rollback to saved commit SHA)
→ (pip install, gateway restart)
→ "⚠ No /confirm received — rolled back to <old-version>.
    The update may have caused issues. You can retry with /update
    or investigate with /debug."
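
The two transcripts above amount to a small state machine. A minimal sketch follows, assuming hypothetical state and event names that do not appear in the issue; it only models the transitions and does not perform any update or rollback.

```python
from enum import Enum


class UpdateState(Enum):
    IDLE = "idle"                  # no update in flight
    AWAITING_CONFIRM = "awaiting"  # update applied, rollback timer running
    CONFIRMED = "confirmed"        # user sent /confirm in time
    ROLLED_BACK = "rolled_back"    # timer expired or user asked to roll back


# (state, event) -> next state. Event names are illustrative only.
TRANSITIONS = {
    (UpdateState.IDLE, "update_confirm"): UpdateState.AWAITING_CONFIRM,
    (UpdateState.AWAITING_CONFIRM, "confirm"): UpdateState.CONFIRMED,
    (UpdateState.AWAITING_CONFIRM, "timeout"): UpdateState.ROLLED_BACK,
    (UpdateState.AWAITING_CONFIRM, "rollback"): UpdateState.ROLLED_BACK,
}


def next_state(state: UpdateState, event: str) -> UpdateState:
    """Return the next state, or the same state if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Events that do not apply (for example /confirm with no update pending) leave the state unchanged, which keeps a stray command from triggering anything.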

Implementation Sketch

Pre-update checkpoint:

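# In cmd_update(), before git pull:
# PROJECT_ROOT is assumed to be defined by the surrounding module.
import json
import subprocess
from pathlib import Path

pre_update_sha = subprocess.check_output(
    ["git", "rev-parse", "HEAD"], cwd=PROJECT_ROOT
).decode().strip()

# Save to ~/.hermes/.update_checkpoint.json
checkpoint = {
    "pre_update_sha": pre_update_sha,
    "timestamp": "...",  # time of the update; format left open in the issue
    "timeout_seconds": 600,
    "confirmed": False,
}
checkpoint_path = Path.home() / ".hermes" / ".update_checkpoint.json"
checkpoint_path.write_text(json.dumps(checkpoint, indent=2))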

Post-restart watchdog:

  • Gateway startup checks for .update_checkpoint.json
  • If found and confirmed == false, starts a countdown timer
  • Timer sends a reminder at T-2min, T-1min
  • On expiry: runs git reset --hard <pre_update_sha>, reinstalls deps, restarts gateway
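
The watchdog's decision logic can be sketched as a few pure functions. This is only an illustration: it assumes the checkpoint's timestamp is stored as epoch seconds (the issue leaves the format open), and the function names and return values are hypothetical.

```python
import json
from pathlib import Path

# Reminder offsets from the issue: T-2min and T-1min before expiry.
REMINDER_OFFSETS = (120, 60)


def load_checkpoint(path: Path):
    """Return the checkpoint dict, or None if the file is missing or unreadable."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def seconds_remaining(checkpoint: dict, now: float) -> float:
    """Time left before auto-rollback (assumes an epoch-seconds timestamp)."""
    deadline = checkpoint["timestamp"] + checkpoint["timeout_seconds"]
    return deadline - now


def watchdog_action(checkpoint, now: float) -> str:
    """Decide what the watchdog should do: "idle", "wait", or "rollback"."""
    if checkpoint is None or checkpoint.get("confirmed"):
        return "idle"
    if seconds_remaining(checkpoint, now) <= 0:
        return "rollback"
    return "wait"


def due_reminders(checkpoint: dict, now: float, already_sent: set) -> list:
    """Reminder offsets that have been reached but not yet sent."""
    remaining = seconds_remaining(checkpoint, now)
    return [o for o in REMINDER_OFFSETS if remaining <= o and o not in already_sent]
```

Because the deadline is derived from the timestamp saved before the restart, time spent updating and restarting counts against the timeout, which matches the "within 9m" message in the transcript.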

/confirm command:

  • Sets confirmed = true in checkpoint file
  • Cancels the watchdog timer
  • Deletes the checkpoint file
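
A minimal sketch of the on-disk part of /confirm, assuming the checkpoint is the JSON file described earlier. The function name is hypothetical, and cancelling the in-process timer is left to the caller.

```python
import json
from pathlib import Path


def confirm_update(checkpoint_path: Path) -> bool:
    """Handle /confirm: mark the checkpoint confirmed, then remove it.

    Returns True if a pending update was confirmed, False if there was
    nothing to confirm (no checkpoint file, or it was unreadable).
    """
    try:
        checkpoint = json.loads(checkpoint_path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return False

    # Write confirmed=True first so a crash between the two steps still
    # leaves a file that a watchdog would treat as confirmed.
    checkpoint["confirmed"] = True
    checkpoint_path.write_text(json.dumps(checkpoint, indent=2))

    # Remove the file once the confirmation has been recorded.
    checkpoint_path.unlink(missing_ok=True)
    return True
```

Writing the flag before deleting the file is one way to make the three listed steps safe to interrupt; an implementation could equally just delete the file.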

Manual rollback:

/rollback update    — immediately revert to pre-update commit
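
Both the manual rollback and the timer-driven one reduce to the same steps. The sketch below is an assumption-laden illustration: the function names are hypothetical, the pip invocation is a guess (the issue only says "pip install"), and the gateway restart is omitted because the issue does not say how it is performed.

```python
import subprocess
import sys


def rollback_commands(pre_update_sha: str) -> list:
    """Commands for a rollback, in order. The pip invocation is an assumption."""
    return [
        ["git", "reset", "--hard", pre_update_sha],
        # Assumed reinstall step; the issue only says "pip install".
        [sys.executable, "-m", "pip", "install", "-e", "."],
    ]


def run_rollback(pre_update_sha: str, project_root: str, runner=subprocess.run) -> None:
    """Run each rollback command in project_root, stopping at the first failure."""
    for argv in rollback_commands(pre_update_sha):
        # check=True makes subprocess.run raise CalledProcessError on failure.
        runner(argv, cwd=project_root, check=True)
```

Passing the runner as a parameter lets the sequence be exercised in tests without touching a real repository.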

Edge Cases

  1. Gateway won't start after update — The watchdog can't run inside a dead gateway. Options:

    • A separate lightweight watchdog process spawned before the update (like a cron job or systemd timer)
    • hermes update itself sets a system-level timer (the at command or a temporary systemd timer) as a dead-man's switch
  2. User is slow but update is fine — Reminder messages help; timeout should be configurable (/update --timeout 30m)

  3. Multiple rapid updates — Each update overwrites the checkpoint; only the most recent pre-update SHA is saved

  4. Stashed local changes — The existing stash mechanism should be preserved; rollback should also restore the stash
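
Edge cases 1 and 2 can be sketched together: parse a user-supplied timeout such as 30m, then build a systemd-run command that arms a transient timer outside the gateway process. This is a sketch under assumptions: the unit name and the rollback command passed in are hypothetical, and only the command line is built here, not executed.

```python
import re

_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_timeout(text: str) -> int:
    """Parse a duration such as "30m", "600s" or "1h" into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip().lower())
    if not match:
        raise ValueError(f"unrecognised timeout: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def dead_mans_switch_argv(timeout_seconds: int, rollback_argv: list,
                          unit: str = "hermes-update-rollback") -> list:
    """Build a systemd-run command that fires rollback_argv after the timeout.

    The unit name and the rollback command are hypothetical.
    """
    return [
        "systemd-run", "--user",
        f"--unit={unit}",
        f"--on-active={timeout_seconds}s",
        *rollback_argv,
    ]
```

With this approach /confirm would also need to cancel the timer, for example by stopping the transient .timer unit, so that a confirmed update is not rolled back later.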

Motivation

Any system where applying a change can break the management channel needs a dead-man's switch. This is standard practice in network operations (JunOS commit confirmed, IOS XE configure replace ... revert trigger timer). Hermes is in the same category — /update modifies the very codebase that handles /update, and the gateway restart means there's a window where a broken update = total loss of remote control.


[This enhancement request generated by Hermes agent at my behest]

Proposed Solution

Enhancement request to improve the robustness of the /update command, especially when it is operated remotely over a gateway.

Alternatives Considered

No response

Feature Type

New bundled skill

Scope

Medium (few files, < 300 lines)

Contribution

  • I'd like to implement this myself and submit a PR

Debug Report (optional)

extent analysis

TL;DR

Implement a commit/confirm pattern for the /update command to allow for automatic rollback in case of a failed update.

Guidance

  • Introduce a pre-update checkpoint to save the current commit SHA before updating, allowing for easy rollback if needed.
  • Implement a post-restart watchdog that checks for the presence of a checkpoint file and initiates a countdown timer for automatic rollback if the update is not confirmed within a specified timeout.
  • Create a /confirm command to cancel the watchdog timer and confirm the update, deleting the checkpoint file in the process.
  • Consider implementing a manual rollback command (/rollback update) to immediately revert to the pre-update commit.

Example

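# In cmd_update(), before git pull:
# PROJECT_ROOT is assumed to be defined by the surrounding module.
import json
import subprocess
from pathlib import Path

pre_update_sha = subprocess.check_output(
    ["git", "rev-parse", "HEAD"], cwd=PROJECT_ROOT
).decode().strip()

# Save to ~/.hermes/.update_checkpoint.json
checkpoint = {
    "pre_update_sha": pre_update_sha,
    "timestamp": "...",  # time of the update; format left open in the issue
    "timeout_seconds": 600,
    "confirmed": False,
}
checkpoint_path = Path.home() / ".hermes" / ".update_checkpoint.json"
checkpoint_path.write_text(json.dumps(checkpoint, indent=2))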

Notes

The proposed solution requires careful consideration of edge cases, such as a gateway that won't start after an update, or multiple rapid updates. Additionally, the existing stash mechanism for local changes should be preserved, and the rollback process should restore the stash.

Recommendation

Apply the proposed commit/confirm pattern to the /update command: it rolls back automatically when an update is not confirmed within the timeout, so the system stays recoverable even if the update breaks the gateway.
