hermes: /goal judge over-continues exploratory goals unless the assistant explicitly says the goal is complete

Summary

A Discord /goal with an exploratory objective produced a valid synthesis/proposal answer, but the goal judge repeatedly returned continue because the assistant did not explicitly state that the goal was complete.

The synthetic continuation loop escalated the task from “review context and reflect on ways to help” into producing multiple concrete artifacts that the user had only mentioned as examples of possible help.

The issue is that /goal appears too dependent on explicit completion phrasing. For exploratory goals such as “review / reflect / suggest options”, a high-quality synthesis can satisfy the goal even without a magic phrase like “goal complete”.

Expected behavior

For exploratory goals, /goal should stop when the assistant has reasonably completed the review and produced actionable recommendations, unless the original goal clearly requires concrete follow-up artifacts.

A response should not be judged incomplete solely because it lacks explicit wording such as “the goal is complete”.

Actual behavior

  1. User set a broad exploratory goal:
    • review recent tasks from two work domains;
    • include overdue/today/tomorrow items and task notes;
    • review related chat context;
    • review the user's same-day notes/journal context;
    • reflect on ways the assistant could help;
    • examples included writing tickets, preparing messages, preparing an investigation, or maybe creating Kanban activities.
  2. The assistant reviewed the context and produced a large synthesis with a menu of possible ways to help.
  3. The goal judge returned continue because the answer did not explicitly confirm that the review was complete.
  4. Hermes injected repeated synthetic messages:

       [Continuing toward your standing goal]
       Goal: <original exploratory goal>

       Continue working toward this goal. Take the next concrete step. If you believe the goal is complete, state so explicitly and stop.

  5. The assistant escalated from reflection/proposal into producing multiple concrete deliverables:
    • ticket drafts;
    • dependency/blocker wording;
    • handoff text;
    • messages to third parties;
    • technical brief material.
  6. The loop only stopped once the assistant explicitly wrote that the goal was complete.
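
The sequence above can be reproduced in miniature. The following is a minimal sketch, not Hermes code: `phrase_dependent_judge`, `run_goal_loop`, `make_scripted_assistant`, and the `max_turns` cap are hypothetical names introduced for illustration, and the judge stub simply assumes the behavior this issue suspects (only explicit completion wording yields `done`). The continuation prompt text is the one quoted in step 4.

```python
from typing import Callable, List

# Continuation prompt as quoted in this issue.
CONTINUATION_PROMPT = (
    "[Continuing toward your standing goal]\n"
    "Goal: {goal}\n\n"
    "Continue working toward this goal. Take the next concrete step. "
    "If you believe the goal is complete, state so explicitly and stop."
)


def phrase_dependent_judge(response: str) -> str:
    """Hypothetical stand-in for the suspected behavior:
    only explicit completion wording produces 'done'."""
    return "done" if "goal is complete" in response.lower() else "continue"


def run_goal_loop(
    goal: str,
    assistant: Callable[[str], str],
    judge: Callable[[str], str],
    max_turns: int = 10,  # assumption: a safety cap for this sketch only
) -> List[str]:
    """Run turns until the judge says 'done' or max_turns is reached.
    Returns every assistant response produced."""
    responses = [assistant(goal)]
    while judge(responses[-1]) == "continue" and len(responses) < max_turns:
        responses.append(assistant(CONTINUATION_PROMPT.format(goal=goal)))
    return responses


def make_scripted_assistant(script: List[str]) -> Callable[[str], str]:
    """Assistant stub that replays canned responses in order."""
    turns = iter(script)
    return lambda _prompt: next(turns)


script = [
    "Synthesis of the tasks plus a menu of ways I could help.",  # already sufficient
    "Draft ticket ...",                                           # escalation begins
    "Handoff text ...",
    "Message to a third party ...",
    "The goal is complete. Deliverables: ...",
]
responses = run_goal_loop(
    "review context and reflect on ways to help",
    make_scripted_assistant(script),
    phrase_dependent_judge,
)
print(len(responses) - 1, "synthetic continuations before the judge said done")
```

With this script the stub judge forces four synthetic continuations even though the first response already answered the request, which mirrors the escalation described in steps 3 to 6.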

Sanitized evidence

Initial user goal shape:

18:28:51 inbound message: msg='observe tasks from two work domains ... overdue/today/tomorrow tasks ... notes ... chat context ... same-day notes/journal context ... reflect on ways you can help, for example writing tickets, preparing a message, or preparing an investigation ... maybe create kanban activities ... or maybe resolve differently.'

First complete-enough answer:

18:35:21 Turn ended: reason=text_response(finish_reason=stop)
         response_len=10022

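Goal judge then continued (the logged reason is in Portuguese; in English: "The response shows analysis and proposals, but does not explicitly confirm that the full sweep was completed, nor … [truncated]"):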

18:35:22 hermes_cli.goals: goal judge: verdict=continue reason=A resposta mostra análise e propostas, mas não confirma explicitamente que a varredura completa foi concluída nem entreg… [truncated]
18:35:28 gateway.run: inbound message: msg='[Continuing toward your standing goal] Goal: observe tasks ...'

Repeated continuation pattern:

18:36:16 goal judge: verdict=continue ...
18:36:17 inbound synthetic continuation ...

18:36:44 goal judge: verdict=continue ...
18:36:44 inbound synthetic continuation ...

18:37:30 goal judge: verdict=continue ...
18:37:32 inbound synthetic continuation ...

18:38:17 goal judge: verdict=continue ...
18:38:23 inbound synthetic continuation ...

18:39:22 goal judge: verdict=continue ...
18:39:29 inbound synthetic continuation ...

Loop stops only after explicit completion wording:

18:39:38 Turn ended: reason=text_response(finish_reason=stop) response_len=822
18:39:39 hermes_cli.goals: goal judge: verdict=done reason=The agent explicitly states the goal is complete and lists the deliverables produced, so the goal is satisfied.
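
To check whether another session shows the same pattern, the verdict lines can be pulled out of a log excerpt and counted. This is a rough sketch that assumes only the line shape visible in the sanitized excerpts above (`HH:MM:SS ... goal judge: verdict=<word>`); real Hermes logs may carry extra fields, and `judge_verdicts` is a hypothetical helper, not part of `hermes_cli`. The reasons in the sample are abbreviated to `...`.

```python
import re
from collections import Counter

# Matches lines such as
# "18:35:22 hermes_cli.goals: goal judge: verdict=continue reason=..."
# and the shortened form "18:36:16 goal judge: verdict=continue ...".
VERDICT_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) .*?goal judge: verdict=(\w+)")


def judge_verdicts(log_text: str) -> list[tuple[str, str]]:
    """Return (timestamp, verdict) pairs for every goal-judge line."""
    pairs = []
    for line in log_text.splitlines():
        match = VERDICT_RE.match(line.strip())
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


# Lines taken from the sanitized evidence, with reasons abbreviated.
sample_log = """\
18:35:21 Turn ended: reason=text_response(finish_reason=stop)
18:35:22 hermes_cli.goals: goal judge: verdict=continue reason=...
18:35:28 gateway.run: inbound message: msg='[Continuing toward your standing goal] ...'
18:36:16 goal judge: verdict=continue ...
18:36:17 inbound synthetic continuation ...
18:36:44 goal judge: verdict=continue ...
18:37:30 goal judge: verdict=continue ...
18:38:17 goal judge: verdict=continue ...
18:39:22 goal judge: verdict=continue ...
18:39:38 Turn ended: reason=text_response(finish_reason=stop) response_len=822
18:39:39 hermes_cli.goals: goal judge: verdict=done reason=...
"""

verdicts = judge_verdicts(sample_log)
counts = dict(Counter(verdict for _, verdict in verdicts))
print(counts)
```

On the excerpt in this issue, that yields six `continue` verdicts between 18:35:22 and 18:39:22 before the single `done` at 18:39:39.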

Why this matters

This can turn planning/reflection goals into unwanted execution. The user may ask the assistant to inspect context and suggest possible help, but /goal can keep pushing until the assistant manufactures additional deliverables.

That is especially risky when the “possible help” examples include sending messages, editing files, creating tickets, or performing investigations.

Suspected cause

The goal judge appears to use explicit completion language as a stronger signal than the actual semantic sufficiency of the response.

This creates a bad incentive: unless the assistant says “the goal is complete”, the loop may keep running even when the useful answer has already been delivered.

Proposed fixes / invariants

  1. Treat exploratory/review/proposal goals as completable by a sufficient synthesis, even without explicit “goal complete” wording.
  2. Distinguish examples of possible next actions from required deliverables.
  3. Add goal-judge guidance such as:
    • if the user asked to “review/reflect/suggest”, a concrete recommendation list can satisfy the goal;
    • do not require producing every artifact mentioned as an example;
    • do not continue merely to force an explicit completion phrase.
  4. Consider making the continuation prompt safer for exploratory goals:
    • “If the previous answer substantially satisfied the review/proposal request, mark complete instead of producing extra artifacts.”
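
One possible shape for these invariants is sketched below. Everything here is an assumption for illustration: the verb list, the example markers, and the helper names (`is_exploratory`, `split_examples`, `build_judge_prompt`, `build_continuation_prompt`) are hypothetical and do not describe how Hermes builds its prompts today. The guidance strings are taken from the proposals above, and the keyword heuristics are deliberately crude; a real fix might let the judge model make the exploratory/required distinction itself.

```python
import re

# Hypothetical: goals phrased with these verbs are treated as exploratory.
EXPLORATORY_VERBS = ("review", "reflect", "suggest", "observe", "explore", "propose")
# Hypothetical: markers that introduce optional examples, not requirements.
EXAMPLE_MARKERS = re.compile(r"\b(?:for example|for instance|such as|maybe)\b", re.IGNORECASE)

# Guidance wording taken from the proposed fixes in this issue.
JUDGE_GUIDANCE = (
    "- If the user asked to review/reflect/suggest, a concrete recommendation "
    "list can satisfy the goal.\n"
    "- Do not require producing every artifact mentioned as an example.\n"
    "- Do not continue merely to force an explicit completion phrase."
)
SAFE_CONTINUATION_HINT = (
    "If the previous answer substantially satisfied the review/proposal request, "
    "mark complete instead of producing extra artifacts."
)


def is_exploratory(goal: str) -> bool:
    """Crude keyword check: does the goal use an exploratory verb?"""
    words = set(re.findall(r"[a-z]+", goal.lower()))
    return any(verb in words for verb in EXPLORATORY_VERBS)


def split_examples(goal: str) -> tuple[str, str]:
    """Split the goal at the first example marker:
    (required part, optional examples)."""
    match = EXAMPLE_MARKERS.search(goal)
    if match is None:
        return goal.strip(), ""
    return goal[: match.start()].strip(" ,;"), goal[match.start():].strip()


def build_judge_prompt(goal: str, response: str) -> str:
    """Assemble a judge prompt that separates requirements from examples."""
    required, examples = split_examples(goal)
    parts = [f"Goal (required part): {required}"]
    if examples:
        parts.append(f"Optional examples, not required deliverables: {examples}")
    if is_exploratory(goal):
        parts.append("This goal is exploratory. Guidance:\n" + JUDGE_GUIDANCE)
    parts.append(f"Assistant response:\n{response}")
    parts.append("Answer with verdict=done or verdict=continue and a short reason.")
    return "\n\n".join(parts)


def build_continuation_prompt(goal: str) -> str:
    """Continuation prompt from this issue, plus the safer hint for exploratory goals."""
    prompt = (
        "[Continuing toward your standing goal]\n"
        f"Goal: {goal}\n\n"
        "Continue working toward this goal. Take the next concrete step. "
        "If you believe the goal is complete, state so explicitly and stop."
    )
    if is_exploratory(goal):
        prompt += "\n" + SAFE_CONTINUATION_HINT
    return prompt


goal = ("review my tasks and reflect on ways you can help, "
        "for example writing tickets or preparing a message")
print(build_judge_prompt(goal, "Here is a synthesis and a menu of options ..."))
```

Splitting the goal text before it reaches the judge keeps the "for example" clause visible but clearly labeled as optional, so the judge has less reason to demand every listed artifact.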

Related issues

  • #26986 — keep persistent goals active when the response explicitly reports incomplete work. This issue is the inverse: do not keep goals active when the response functionally completed an exploratory goal.
  • #27585 — /goal can spam repeated completion messages when judge errors fail-open to continue. Related because both involve continuation after terminal-ish answers.
  • #28649 — gateway /goal continuation loop behavior on Telegram/Discord.
  • #18467 / #33618 — /goal state and session-id/compression lifecycle. Related but not required to reproduce this case.
