hermes - 💡(How to fix) Fix [Feature]: Cron job stage persistence + partial retry mechanism - Real world case: 2 million tokens wasted due to push failures [2 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#17071 · Fetched 2026-04-29 06:37:23
Comments: 2 · Participants: 2 · Timeline: 5 · Reactions: 0
Timeline (top): labeled ×3, commented ×2

Fix Action

Fix / Workaround

This is like asking an employee to write a proposal,
print it, and mail it. If the mailing fails, they start
over and rewrite the entire proposal. That is simply
not logical.
My Temporary Workaround
I split the job into 4 separate cron jobs:
1. 08:55 Routing check
2. 08:56 Search news → save to local JSON
3. 08:59 Read JSON, generate briefing → save to local MD
4. 09:00 Read MD and push to WeChat


Problem or Use Case

Background
I'm a Hermes user and encountered a very serious token
waste issue today that I believe affects all cron users.
Real World Scenario
My cron job is a "daily briefing" with roughly this
flow:

 1. Run script to check Volcengine API routing
 2. Search news from 3 engines using the multi-search-engine skill
 3. LLM generates the briefing content
 4. Push to WeChat
                                                              
 The Problem                                                  
 Today step 4 (push to WeChat) failed, and Hermes             
 automatically retried 10 times.                              
                                                              
 But instead of only retrying step 4, it re-executed the      
 ENTIRE job from step 1!                                      
                                                              
 Result:                                                      
 - Each retry costs ~80k tokens                               
 - 2 million tokens burned in one day                         
 - The push action itself costs almost 0 tokens               
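
A rough way to see the scale of the waste is to compare the two retry policies numerically. The sketch below takes the ~80k tokens per run quoted above at face value. The function names are hypothetical and the push cost of 0 is the issue's own approximation.

```python
# Approximate per-run cost quoted in the issue (assumption: it is roughly constant).
TOKENS_PER_FULL_RUN = 80_000


def full_rerun_cost(retries: int, tokens_per_run: int = TOKENS_PER_FULL_RUN) -> int:
    """Tokens spent when every retry re-executes the whole job."""
    return retries * tokens_per_run


def stage_only_cost(retries: int, push_tokens: int = 0) -> int:
    """Tokens spent when only the push stage is retried."""
    return retries * push_tokens


def full_runs_for(total_tokens: int, tokens_per_run: int = TOKENS_PER_FULL_RUN) -> int:
    """How many complete executions a given token total corresponds to."""
    return total_tokens // tokens_per_run
```

Under these assumptions, 10 whole-job retries cost about 800k tokens, the reported 2 million tokens corresponds to about 25 full executions, and retrying only the push would cost close to nothing.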
 My Pain Point                                                
 The push failure is a WeChat API issue and has nothing to
 do with search or generation. The content was already
 generated, so why search and generate it all again?

 This is like asking an employee to write a proposal,
 print it, and mail it. If the mailing fails, they start
 over and rewrite the entire proposal. That is simply
 not logical.
 My Temporary Workaround                                      
 I split the job into 4 separate cron jobs:                   
 1. 08:55 Routing check                                       
 2. 08:56 Search news → save to local JSON                    
 3. 08:59 Read JSON, generate briefing → save to local MD     
 4. 09:00 Read MD and push to WeChat                          
                                                              
 Now even if push fails 100 times, the tokens spent           
 earlier are never wasted.
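
The last of the four split jobs can be sketched as a small script that only reads the saved Markdown file and retries the push. This is a minimal sketch and not Hermes code: `send` is a hypothetical callable standing in for whatever actually delivers the message to WeChat, and the freshness check is an added assumption, because time-separated cron jobs do not know whether the earlier job finished.

```python
import time
from datetime import date, datetime
from pathlib import Path
from typing import Callable, Optional


def is_fresh(path: Path, today: Optional[date] = None) -> bool:
    """True if the file exists, is non-empty, and was last modified today."""
    if not path.is_file() or path.stat().st_size == 0:
        return False
    today = today or date.today()
    return datetime.fromtimestamp(path.stat().st_mtime).date() == today


def push_with_retry(
    path: Path,
    send: Callable[[str], None],  # hypothetical delivery function
    attempts: int = 5,
    delay_s: float = 0.0,
) -> bool:
    """Retry only the push; the briefing is read from disk, never regenerated."""
    if not is_fresh(path):
        return False  # upstream job did not produce today's briefing
    text = path.read_text(encoding="utf-8")
    for attempt in range(1, attempts + 1):
        try:
            send(text)
            return True
        except Exception:
            if attempt < attempts:
                time.sleep(delay_s)
    return False
```

Because the briefing is loaded once from the file, each failed attempt costs only the delivery call itself.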

Proposed Solution

I hope Hermes can natively support "stage-based
workflows":

 yaml                                                         
 workflow:                                                    
   name: Insurance Tech Daily Briefing                        
   stages:                                                    
     - name: Routing Check                                    
       run: python3 ~/.hermes/scripts/add-volc-route.sh       
       retry: 2                                               
                                                              
     - name: News Search                                      
       skill: multi-search-engine                             
       retry: 2                                               
       depends_on: Routing Check                              
       output: materials.json  # Output saved; downstream stages can read it
                                                              
     - name: Content Generation                               
       skill: zhibao-insurance-tech-daily-briefing            
       retry: 1                                               
       depends_on: News Search                                
       input: materials.json                                  
       output: briefing.md                                    
                                                              
     - name: WeChat Push                                      
       run: hermes send --platform weixin --file briefing.md
       retry: 5                                               
       depends_on: Content Generation                         
       input: briefing.md                                     
                                                              
 Core Requirements                                            
 1. Stage Isolation: Each stage can have an independent retry count
 2. Output Persistence: Each stage's output can be saved to a file for downstream stages
 3. Partial Retry: If a stage fails, only that stage is retried; successful stages are preserved
 4. Traceability: All intermediate outputs are preserved for debugging
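
The four requirements above can be illustrated with a small runner that records each finished stage on disk and skips it on the next invocation. This is a sketch under stated assumptions, not the Hermes implementation: the `Stage` class, `run_workflow`, and the `.done` marker files are all hypothetical, and `retry` is read as the number of additional attempts after the first.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Optional


@dataclass
class Stage:
    name: str
    action: Callable[[Path], None]  # does the work; may write files into workdir
    retry: int = 0                  # assumed: extra attempts after the first


def run_workflow(stages: List[Stage], workdir: Path) -> Optional[str]:
    """Run stages in order; return the name of the failed stage, or None."""
    workdir.mkdir(parents=True, exist_ok=True)
    for stage in stages:
        marker = workdir / (stage.name.replace(" ", "_") + ".done")
        if marker.exists():
            continue  # finished in an earlier run; its outputs are preserved
        for _ in range(stage.retry + 1):
            try:
                stage.action(workdir)
                marker.touch()  # persist success so a re-run skips this stage
                break
            except Exception:
                continue  # retry this stage only
        else:
            return stage.name  # retries exhausted; earlier stages stay done
    return None
```

Calling `run_workflow` again with the same `workdir` resumes at the failed stage. A daily job would presumably use one directory per run (for example, named by date) so that yesterday's markers do not suppress today's stages.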
 One Sentence Summary                                         
 > Don't let a push failure waste all the tokens spent on search and generation. The cost of failure should be limited to the stage that failed.
                                                              
 Thanks for building such an amazing product! 🙏

Alternatives Considered

No response

Feature Type

Other

Scope

None

Contribution

  • I'd like to implement this myself and submit a PR

Debug Report (optional)

extent analysis

TL;DR

Implement stage-based workflows in Hermes to prevent token waste by retrying only the failed stage instead of the entire job.

Guidance

  • Break down the cron job into separate stages with independent retry counts to prevent unnecessary re-execution of successful stages.
  • Implement output persistence to save each stage's output for downstream stages, allowing for partial retry.
  • Consider adding traceability features to preserve intermediate outputs for debugging purposes.
  • Review the proposed solution's core requirements to ensure they meet the specific needs of the use case.

Example

workflow:
  name: Insurance Tech Daily Briefing
  stages:
    - name: Routing Check
      run: python3 ~/.hermes/scripts/add-volc-route.sh
      retry: 2
    - name: News Search
      skill: multi-search-engine
      retry: 2
      depends_on: Routing Check
      output: materials.json
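
The `depends_on` keys in a definition like this imply an execution order. One way to derive it, assuming the YAML has already been parsed into a list of dictionaries, is a topological sort with Python's standard `graphlib` module (Python 3.9+). The `execution_order` helper is hypothetical; the proposal does not specify how Hermes would resolve dependencies.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_order(stages: List[dict]) -> List[str]:
    """Order stage names so that every stage runs after its dependencies."""
    graph: Dict[str, Set[str]] = {}
    for stage in stages:
        dep = stage.get("depends_on")
        if dep is None:
            deps: Set[str] = set()
        elif isinstance(dep, str):
            deps = {dep}        # single dependency, as in the proposal
        else:
            deps = set(dep)     # assumed extension: a list of dependencies
        graph[stage["name"]] = deps
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())
```

For the linear chain in the proposal, the result is simply the stages in their listed order, regardless of how they are arranged in the file.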

Notes

The proposed solution requires changes to the Hermes workflow system, which may involve significant development and testing effort. Evaluate the feasibility and potential impact of such changes before committing to them.

Recommendation

Apply workaround by splitting the job into separate cron jobs, as described in the temporary workaround, until a native stage-based workflow solution is implemented. This approach can help mitigate token waste while a more permanent solution is developed.
