hermes - 💡(How to fix) Fix Cron inactivity timeout can mark a job failed while the agent thread keeps running [1 participants]

GitHub stats
NousResearch/hermes-agent#18004 (fetched 2026-05-01 05:54:28)
Comments: 0 · Participants: 1 · Timeline: 3 (labeled ×3) · Reactions: 0

Cron job execution uses a ThreadPoolExecutor to run agent.run_conversation(). On inactivity timeout, the scheduler shuts down the executor with wait=False, cancel_futures=True, calls agent.interrupt(), and raises a timeout error. This marks the cron job as failed, but it does not guarantee that the already-running worker thread has stopped.

Impact

A timed-out autonomous cron job can continue executing file, network, terminal, or other side effects after the scheduler has recorded the run as failed. This can overlap with retries or later scheduled runs and can make cron state and user-visible delivery inaccurate. In standalone cron processes, a non-daemon worker thread may also keep the process alive unexpectedly.

Evidence

In cron/scheduler.py, cron execution submits the agent run to a one-worker ThreadPoolExecutor and polls for completion/activity. When the inactivity limit is exceeded, the timeout path calls:

  • _cron_pool.shutdown(wait=False, cancel_futures=True)
  • agent.interrupt("Cron job timed out (inactivity)")
  • raises TimeoutError(...)

In Python, cancel_futures=True only cancels futures that have not started. It does not kill an already-running thread, and wait=False explicitly does not wait for that thread to exit. If the agent is blocked in non-interruptible I/O, tool execution, or an API call that does not promptly observe the interrupt, the side-effecting work can continue after cron has treated the run as failed.
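
The behavior described above can be reproduced in isolation. The following is a minimal sketch, not the scheduler's actual code: `long_running_job` is a hypothetical stand-in for `agent.run_conversation()`, and the `side_effects` list stands in for file, network, or terminal activity. It assumes Python 3.9 or later, where `cancel_futures` was added.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

side_effects = []            # stands in for file/network/terminal side effects
release = threading.Event()  # lets the demo control when the blocked job resumes

def long_running_job():
    # Hypothetical stand-in for agent.run_conversation(): blocks, then acts.
    release.wait(timeout=5)
    side_effects.append("side effect after timeout")
    return "done"

pool = ThreadPoolExecutor(max_workers=1)
running = pool.submit(long_running_job)  # picked up by the single worker
queued = pool.submit(long_running_job)   # waits in the queue behind it

time.sleep(0.1)  # give the worker thread time to start the first job

# Same call the timeout path makes: returns immediately, stops nothing in flight.
pool.shutdown(wait=False, cancel_futures=True)

queued_cancelled = queued.cancelled()    # True: it had not started yet
running_cancelled = running.cancelled()  # False: it was already running

release.set()                        # the "timed-out" job now continues
result = running.result(timeout=5)   # and completes normally

print(queued_cancelled, running_cancelled, result, side_effects)
```

Only the queued future is cancelled. The job that was already running finishes and records its side effect after `shutdown()` has returned, which is the window in which cron has already marked the run as failed.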

Expected behavior

When cron reports a timeout/failure, the job's execution context should either be stopped or safely isolated so it cannot continue side effects as if still active.

Suggested direction

Run cron jobs in a supervised process that can be terminated on timeout, or enforce cancellation through every agent/tool/API path and do not complete timeout handling until the worker has actually stopped or is otherwise isolated from further side effects.
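
One way to read the second option is sketched below: request cancellation, then wait for the worker to confirm it has stopped before timeout handling completes. This is an illustration under assumptions, not the hermes implementation. The `supervised_run` helper, the `(stop, touch)` job signature, and the three demo jobs are all hypothetical; how `agent.interrupt()` propagates internally is not shown in the issue.

```python
import threading
import time

def supervised_run(job, inactivity_limit, grace_period):
    """Run job(stop, touch) in a thread.

    Returns (status, worker_stopped). worker_stopped is False when the
    worker ignored the stop request for the whole grace period.
    """
    stop = threading.Event()
    last_activity = [time.monotonic()]

    def touch():
        # The job calls this whenever it makes progress.
        last_activity[0] = time.monotonic()

    worker = threading.Thread(target=job, args=(stop, touch), daemon=True)
    worker.start()

    while True:
        worker.join(timeout=0.05)
        if not worker.is_alive():
            return "ok", True
        if time.monotonic() - last_activity[0] > inactivity_limit:
            stop.set()                         # request cancellation
            worker.join(timeout=grace_period)  # wait for confirmation
            return "timeout", not worker.is_alive()

def active_job(stop, touch):
    # Reports activity and finishes quickly.
    for _ in range(3):
        touch()
        time.sleep(0.02)

def cooperative_job(stop, touch):
    # Never reports activity, but honors the stop flag promptly.
    while not stop.wait(timeout=0.01):
        pass

def stubborn_job(stop, touch):
    # Simulates non-interruptible I/O that never checks the stop flag.
    time.sleep(1.0)

print(supervised_run(active_job, inactivity_limit=0.5, grace_period=1.0))
print(supervised_run(cooperative_job, inactivity_limit=0.1, grace_period=1.0))
print(supervised_run(stubborn_job, inactivity_limit=0.1, grace_period=0.2))
```

When `worker_stopped` comes back `False`, the thread is still alive and may still produce side effects, so a scheduler using this pattern would need to record that state (and, for example, hold back retries) rather than treat the run as finished. A thread cannot be forced to stop, which is why the process-based option is the stronger guarantee.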

extent analysis

TL;DR

Run cron jobs in a supervised process that can be terminated on timeout to prevent continued execution of side effects after a timeout.

Guidance

  • To address the issue, consider using a process-based approach instead of threading, allowing for more reliable termination of the cron job on timeout.
  • Implement a mechanism to supervise the cron job process and terminate it when a timeout occurs, ensuring that side effects are stopped.
  • Review the agent.run_conversation() method to identify potential non-interruptible I/O or API calls that may prevent the thread from exiting promptly.
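  • Do not rely on signals to stop the worker thread: Python cannot forcibly terminate a thread, and signal handlers run only in the main thread. Reliable cancellation needs either a cooperative stop flag that every blocking path checks, or process-level termination (terminate, then kill if the process does not exit).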

Example

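import subprocess
import sys

# Run the cron job in a supervised child process.
# NOTE: the `cron_job` module, its `run` argument, and `job_id` are placeholders,
# not a real hermes entry point.
def run_cron_job(job_id, timeout=300):
    process = subprocess.Popen([sys.executable, '-m', 'cron_job', 'run', job_id])
    try:
        return process.wait(timeout=timeout)  # wall-clock limit, 5 minutes by default
    except subprocess.TimeoutExpired:
        process.terminate()  # ask the child to exit (SIGTERM on POSIX)
        try:
            process.wait(timeout=10)
        except subprocess.TimeoutExpired:
            process.kill()  # force-stop a child that ignores terminate()
            process.wait()
        raise TimeoutError(f"Cron job {job_id} timed out")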
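
The example above enforces a fixed wall-clock limit, whereas the issue concerns an inactivity timeout. A rough sketch of an inactivity-based variant follows. It assumes a hypothetical convention in which the child reports progress by printing lines to stdout; the `run_with_inactivity_limit` helper and the two demo commands are illustrative only and do not correspond to any hermes entry point.

```python
import subprocess
import sys
import threading
import time

def run_with_inactivity_limit(cmd, inactivity_limit, kill_grace=5.0):
    """Run cmd and stop it if it prints nothing for inactivity_limit seconds.

    Returns (status, returncode), where status is "ok" or "timeout".
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    last_activity = [time.monotonic()]

    def pump():
        # Every line the child prints counts as activity.
        for _line in proc.stdout:
            last_activity[0] = time.monotonic()

    threading.Thread(target=pump, daemon=True).start()

    while proc.poll() is None:
        if time.monotonic() - last_activity[0] > inactivity_limit:
            proc.terminate()                 # polite request first
            try:
                proc.wait(timeout=kill_grace)
            except subprocess.TimeoutExpired:
                proc.kill()                  # then force-stop
                proc.wait()
            return "timeout", proc.returncode
        time.sleep(0.05)
    return "ok", proc.returncode

# Demo children: one stays silent, one prints a few heartbeat lines and exits.
quiet_child = [sys.executable, "-c", "import time; time.sleep(30)"]
chatty_child = [
    sys.executable, "-u", "-c",
    "import time\nfor i in range(3):\n    print(i)\n    time.sleep(0.1)",
]

print(run_with_inactivity_limit(quiet_child, inactivity_limit=0.5))
print(run_with_inactivity_limit(chatty_child, inactivity_limit=5.0))
```

Because the job runs in its own process, the function returns "timeout" only after the child has actually exited, so no side effects can continue once the run is recorded as failed. The trade-off is that the agent and its configuration have to be reconstructed inside the child rather than shared in memory.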

Notes

The suggested approach requires significant changes to the existing cron job execution mechanism. Careful consideration should be given to the potential impact on existing functionality and performance.

Recommendation

Apply workaround: Run cron jobs in a supervised process that can be terminated on timeout, as this approach provides a more reliable way to prevent continued execution of side effects after a timeout.
