openclaw - ✅(Solved) Fix Feature: composable termination algebra + GSAR grounding scorer for agents [1 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#77981 (fetched 2026-05-06 06:18:20)
Comments: 1
Participants: 2
Timeline: 2 (commented ×1, cross-referenced ×1)
Reactions: 2

Fix Action

Fixed

PR fix notes

PR #75165: feat(agents): composable termination algebra + GSAR grounding scorer

Description (problem / solution / changelog)

Closes #77981

What this does

Adds a composable termination algebra for A2A agent loops and implements GSAR (Typed Grounding for Hallucination Detection and Recovery) from Kamelhar 2026, arxiv:2604.23366 as a first-class TerminationCondition. Both are wired into the A2A ping-pong loop in sessions-send-tool via an optional termination param.


Proof 1 — the algebra saves turns

Task: "Where is the nearest coffee shop?"

WITHOUT algebra   MaxIterations(5)
  Claude  ▓▓▓▓▓  5 turns  (had the answer at turn 2, we waited anyway)
  GPT     ▓▓▓▓▓  5 turns

WITH algebra   TextMention("FOUND IT").or(MaxIterations(5))
  Claude  ▓▓░░░  2 turns  ← "Blue Bottle on Mission St. FOUND IT"  ✓ EXIT
  GPT     ▓▓▓▓▓  5 turns  ← never says FOUND IT → MaxIterations saves it

  saved: 3 turns = 60% less waiting for Claude

Claude naturally signals completion. GPT does not. The combinator captures this difference without any provider-specific code.


Proof 2 — soft signals must be paired with a hard bound

soft_signal                        ✗  UNSAFE — may never stop
soft_signal.or(MaxIterations(N))   ✓  SAFE   — always terminates

Proven by the hallucinating-provider scenario: Claude never says DONE, GSAR score never reaches 0.80 — MaxIterations fires at turn 5 every time. Budget holds.

The AND combinator lets you guard against vague early answers:

const detailed = new CustomCondition(s => [s.replyText.length >= 40, "detailed_enough"]);
const cond = new TextMention("FOUND IT").and(detailed).or(new MaxIterations(5));
// turn 1: "FOUND IT!" — too short, fails `detailed`
// turn 2: "Blue Bottle on Mission St, 4 min walk. FOUND IT" — passes both ✓
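
The coffee-shop comparison and the AND guard above can be sketched in a few lines. This is a minimal model, not the code in termination.ts: the `Condition` class, the `textMention` and `maxIterations` factories, the `runLoop` driver, and the `max_iterations:N` reason string are all hypothetical names chosen for illustration. The only shapes taken from the PR text are the `[boolean, reason]` callback result and the `replyText` field.

```typescript
// Hypothetical minimal model of the algebra; the real termination.ts may differ.
interface LoopState {
  turn: number;      // 1-based turn counter
  replyText: string; // latest reply from the peer agent
}

// A check returns [shouldStop, reason], mirroring the CustomCondition callback above.
type Check = (s: LoopState) => [boolean, string];

class Condition {
  readonly check: Check;
  constructor(check: Check) {
    this.check = check;
  }

  // Stop when either side stops; the left side is consulted first.
  or(other: Condition): Condition {
    return new Condition((s) => {
      const left = this.check(s);
      return left[0] ? left : other.check(s);
    });
  }

  // Stop only when both sides stop on the same turn.
  and(other: Condition): Condition {
    return new Condition((s) => {
      const [a, ra] = this.check(s);
      const [b, rb] = other.check(s);
      return [a && b, `${ra}&${rb}`];
    });
  }
}

const textMention = (needle: string) =>
  new Condition((s) => [s.replyText.includes(needle), `text_mention:${needle}`]);
const maxIterations = (n: number) =>
  new Condition((s) => [s.turn >= n, `max_iterations:${n}`]);

// Feed scripted replies through a condition and report where the loop stops.
function runLoop(replies: string[], cond: Condition): { turns: number; reason: string } {
  for (let i = 0; i < replies.length; i++) {
    const [stop, reason] = cond.check({ turn: i + 1, replyText: replies[i] });
    if (stop) return { turns: i + 1, reason };
  }
  return { turns: replies.length, reason: "replies_exhausted" };
}

// Proof 1: the same condition serves both providers.
const stopOrCap = textMention("FOUND IT").or(maxIterations(5));
const claudeLike = ["Checking nearby options.", "Blue Bottle on Mission St. FOUND IT", "-", "-", "-"];
const gptLike = ["Checking.", "Blue Bottle on Mission St.", "Anything else?", "Still here.", "Goodbye."];
const claudeRun = runLoop(claudeLike, stopOrCap); // stops at turn 2 on the text signal
const gptRun = runLoop(gptLike, stopOrCap);       // stops at turn 5 on the hard bound

// Proof 2: the AND guard rejects a short "FOUND IT!" and accepts the detailed reply.
const detailed = new Condition((s) => [s.replyText.length >= 40, "detailed_enough"]);
const guarded = textMention("FOUND IT").and(detailed).or(maxIterations(5));
const guardedRun = runLoop(
  ["FOUND IT!", "Blue Bottle on Mission St, 4 min walk. FOUND IT"],
  guarded,
);
```

Because the hard bound is just another operand of `or`, dropping it is visible at the call site: a policy with no `maxIterations` term is the unsafe case from Proof 2.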

Proof 3 — GSAR score formula (arxiv:2604.23366 §3.2)

        W(G) + W(K)
S = ─────────────────────────    ∈ [0, 1]
     W(G) + W(U) + ρ·W(X) + W(K)

G = grounded   U = ungrounded   X = contradicted   K = complementary
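
As a worked instance of the formula, assume unit weight per claim, ρ = 1, and a hypothetical partition of 3 grounded, 1 ungrounded, 1 contradicted, and 1 complementary claim. These numbers are chosen for illustration only and are not taken from the paper or the tests.

```latex
S = \frac{W(G) + W(K)}{W(G) + W(U) + \rho \cdot W(X) + W(K)}
  = \frac{3 + 1}{3 + 1 + 1 \cdot 1 + 1}
  = \frac{4}{6} \approx 0.667
```

Moving the single ungrounded claim into G under the same assumptions gives (4 + 1) / (4 + 0 + 1 + 1) = 5/6 ≈ 0.833, so the score rises as evidence is found.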

6 structural properties proven in tests:

| Property | Statement | Test |
| --- | --- | --- |
| P1 Boundedness | S ∈ [0,1] always | extreme partitions, empty partition |
| P2 Grounded monotonicity | U→G never decreases S | migration path across 10 steps |
| P3 Contradiction penalty | adding X never increases S | incremental X additions |
| P4 Complementary value | K contributes but < equivalent G | weight comparison |
| P5 Non-suppression | X stays in denominator even at ρ=0 | ρ sweep |
| P6 Asymmetry | w(inference) < w(tool_match) strictly decreases S | weight maps |
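
A minimal sketch of the score and of numeric checks for P1 to P3 follows. It assumes the partition carries already-summed weights per category and that an empty partition scores 0. Both are assumptions, as are the names `Partition`, `gsarScore`, `migrationScores`, and `contradictionScores`; gsar.ts may define these differently.

```typescript
// Hypothetical shapes; the field names follow the partition objects in the live-test logs.
interface Partition {
  grounded: number;      // W(G): summed weight of grounded claims
  ungrounded: number;    // W(U)
  contradicted: number;  // W(X)
  complementary: number; // W(K)
}

function gsarScore(p: Partition, rho = 1): number {
  const numerator = p.grounded + p.complementary;
  const denominator = p.grounded + p.ungrounded + rho * p.contradicted + p.complementary;
  // Assumption: a zero denominator (for example an empty partition) scores 0.
  return denominator === 0 ? 0 : numerator / denominator;
}

// P2: migrate weight from U to G one unit at a time and record S at each step.
function migrationScores(total: number): number[] {
  const scores: number[] = [];
  for (let g = 0; g <= total; g++) {
    scores.push(
      gsarScore({ grounded: g, ungrounded: total - g, contradicted: 0, complementary: 0 }),
    );
  }
  return scores;
}

// P3: add contradicted weight one unit at a time to a fixed base partition.
function contradictionScores(steps: number): number[] {
  const scores: number[] = [];
  for (let x = 0; x <= steps; x++) {
    scores.push(gsarScore({ grounded: 3, ungrounded: 1, contradicted: x, complementary: 0 }));
  }
  return scores;
}

const isNonDecreasing = (xs: number[]) => xs.every((v, i) => i === 0 || v >= xs[i - 1]);
const isNonIncreasing = (xs: number[]) => xs.every((v, i) => i === 0 || v <= xs[i - 1]);
const inUnitInterval = (xs: number[]) => xs.every((v) => v >= 0 && v <= 1);

const p2Path = migrationScores(10);    // runs from 0 (all ungrounded) to 1 (all grounded)
const p3Path = contradictionScores(5); // starts at 0.75 and falls as X grows
```

P4 and P6 depend on the per-evidence weight map, which this sketch deliberately leaves out.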

Proof 4 — GSAR decision tiers and threshold stability

score   0.0──────────0.65──────────0.80──────────1.0
              replan      regenerate      proceed
                             τ_proceed = 0.80 sits on a
                             ≥85%-accuracy plateau across [0.75, 0.85]
                             not a cliff — safe to ship as default

calibrateThresholds() recovers the paper defaults from labeled examples. The sensitivity analysis (gsar.calibration.test.ts) shows monotonic behavior across all parameters, with no cliff edges.
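
The tier diagram above maps onto a small decision function. The sketch below is an assumption about its shape, not the one in gsar.ts: the names `Decision`, `Thresholds`, `DEFAULTS`, and `decide` are hypothetical, and treating both boundaries as inclusive lower bounds is a guess. The live log later in this description does show a score of 0.800 mapping to proceed, which is consistent with an inclusive upper threshold.

```typescript
type Decision = "replan" | "regenerate" | "proceed";

interface Thresholds {
  regenerate: number; // scores below this trigger a replan
  proceed: number;    // scores at or above this exit the loop (tau_proceed)
}

// Defaults read off the tier diagram.
const DEFAULTS: Thresholds = { regenerate: 0.65, proceed: 0.8 };

function decide(score: number, t: Thresholds = DEFAULTS): Decision {
  if (score >= t.proceed) return "proceed";
  if (score >= t.regenerate) return "regenerate";
  return "replan";
}
```

Keeping the thresholds in a plain object means a calibration step can return a new `Thresholds` value without touching the decision logic.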


Proof 5 — GSAR wired into a recovery loop

// Algorithm 1 from the paper — bounded recovery, termination guaranteed
new GroundednessCondition(scorer).or(new MaxIterations(K_max))

| Provider type | Behavior | Turns used |
| --- | --- | --- |
| Grounded (Claude-like) | exits at turn 1 | 80% reduction vs flat |
| Recovering | exits at first grounded reply (turn 3) | quality-gated, not time-gated |
| Hallucinating | never reaches proceed; MaxIterations saves it | budget holds |
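
The three provider rows can be reproduced with a bounded loop over scripted scores. This sketch replaces the real scorer with a hypothetical per-turn score script, so `recoveryLoop`, `scripted`, and the score values are illustrative assumptions. Only the K_max bound, the 0.80 proceed threshold, and the `grounded:proceed:s=` reason format come from this description.

```typescript
interface LoopResult {
  turns: number;
  reason: string;
}

// Bounded recovery: exit on the first score at or above tauProceed, else stop at kMax.
function recoveryLoop(
  scoreAt: (turn: number) => number,
  kMax: number,
  tauProceed = 0.8,
): LoopResult {
  for (let turn = 1; turn <= kMax; turn++) {
    const s = scoreAt(turn);
    if (s >= tauProceed) {
      return { turns: turn, reason: `grounded:proceed:s=${s.toFixed(3)}` };
    }
  }
  // The hard bound fires only when no turn reached the proceed tier.
  return { turns: kMax, reason: "max_iterations" };
}

// Hypothetical provider: replays a fixed list of scores, repeating the last one.
const scripted = (scores: number[]) => (turn: number) =>
  scores[Math.min(turn, scores.length) - 1];

const K_MAX = 5;
const groundedRun = recoveryLoop(scripted([0.9]), K_MAX);              // exits at turn 1
const recoveringRun = recoveryLoop(scripted([0.3, 0.6, 0.85]), K_MAX); // exits at turn 3
const hallucinatingRun = recoveryLoop(scripted([0.4]), K_MAX);         // runs all 5 turns

// Turn savings relative to a flat MaxIterations(K_MAX) loop.
const groundedSavings = 1 - groundedRun.turns / K_MAX;
```

With K_max = 5, an exit at turn 1 uses one fifth of the budget, which is where the 80% figure in the table comes from.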

Proof 6 — live integration test with real Anthropic API

src/agents/termination.algebra.live.test.ts — 6 tests against a real Claude endpoint:

OPENCLAW_LIVE_TEST=1 pnpm test:live -- src/agents/termination.algebra.live.test.ts

 ✓  flat MaxIterations: Claude runs all 5 turns regardless of answer quality
    [algebra-live] flat: 5 turns used
    [algebra-live] turn-1 reply: "The capital of France is Paris."

 ✓  TextMention("DONE").or(MaxIterations): Claude exits early once it says DONE
    [algebra-live] algebra: exited at turn 1, reason=text_mention:DONE
    [algebra-live] reply: "The capital of France is Paris. DONE"

 ✓  algebra exits Claude early across 5 independent runs — consistent savings
    [algebra-live] run 1: flat=5  algebra=1  reason=text_mention:DONE
    [algebra-live] run 2: flat=5  algebra=1  reason=text_mention:DONE
    [algebra-live] run 3: flat=5  algebra=1  reason=text_mention:DONE
    [algebra-live] run 4: flat=5  algebra=1  reason=text_mention:DONE
    [algebra-live] run 5: flat=5  algebra=1  reason=text_mention:DONE
    [algebra-live] avg flat=5.0  avg algebra=1.0  saved=80%

 ✓  Claude annotates grounded vs ungrounded claims with [G]/[U] tags on request
    [gsar-live] annotated reply: "Water boils at 100°C at sea level [G]. It does not boil at 50°C [X]."
    [gsar-live] partition: {"grounded":1,"ungrounded":0,"contradicted":1,"complementary":0}

 ✓  GroundednessCondition exits loop once Claude produces a fully grounded reply
    [gsar-live] turn 1: score=0.333  partition={"grounded":1,"ungrounded":2,...}
    [gsar-live] turn 2: score=0.800  partition={"grounded":4,"ungrounded":0,...}
    [gsar-live] exited at turn 2, reason=grounded:proceed:s=0.800

 ✓  GSAR property P2 holds on real replies: factual reply scores higher than vague
    [gsar-live] vague:    score=0.000  partition={"grounded":0,"ungrounded":3,...}
    [gsar-live] grounded: score=1.000  partition={"grounded":3,"ungrounded":0,...}
    [gsar-live] grounded decision: proceed

 Test Files  1 passed (1)
      Tests  6 passed (6)
   Duration  18.4s

Run it yourself: OPENCLAW_LIVE_TEST=1 ANTHROPIC_API_KEY=sk-ant-... pnpm test:live -- src/agents/termination.algebra.live.test.ts


Unit test coverage — 128/128 passing

pnpm test src/agents/termination.test.ts \
          src/agents/termination.complex.test.ts \
          src/agents/gsar.test.ts \
          src/agents/gsar.openclaw.test.ts \
          src/agents/gsar.calibration.test.ts

 Test Files  5 passed (5)
      Tests  128 passed (128)
   Duration  164ms

| File | Purpose |
| --- | --- |
| termination.ts | Core algebra: conditions, combinators, Awaitable<T> |
| termination.test.ts | Unit tests + Anthropic vs OpenAI behavioral proof + coffee shop illustrated proof |
| termination.complex.test.ts | Complex composition, adversarial providers, real-world recipes |
| sessions-send-tool.a2a.ts | A2A loop integration |
| gsar.ts | GSAR scorer, decision function, GroundednessCondition, calibration helpers |
| gsar.test.ts | Structural properties P1–P6, recovery scenarios |
| gsar.openclaw.test.ts | OpenClaw-specific scenarios (SearXNG, message draft, code review) |
| gsar.calibration.test.ts | Sensitivity analysis, derivability, stability plateau |
| termination.algebra.live.test.ts | Live proof against real Anthropic API, 5 iterations |

Reviewer notes

  • ReplyPattern stateful regex bug fixed (ClawSweeper P2): RegExp.test() on a pattern with the /g or /y flag advances lastIndex, causing alternating true/false results on repeated calls. Fixed by setting this.pattern.lastIndex = 0 before each check. Regression test included.
  • gsar.ts is a library module imported only by tests today — added to the deadcode allowlist. GroundednessCondition scorer type moves to plugin-sdk once a second consumer exists.
  • Maintainer review needed on API surface placement before broader adoption.
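
The lastIndex behavior behind the ReplyPattern fix is standard JavaScript and easy to reproduce in isolation. In the sketch below, `ReplyPatternSketch` is a hypothetical stand-in for the real class; only the reset of `lastIndex` before `test()` is taken from the note above.

```typescript
// Buggy usage: a /g regex remembers where its last match ended.
function alternatingResults(): boolean[] {
  const pattern = /DONE/g;
  const reply = "All set. DONE";
  // The 1st call matches and moves lastIndex past the match; the 2nd call
  // searches from there, fails, and resets lastIndex to 0; the 3rd matches again.
  return [pattern.test(reply), pattern.test(reply), pattern.test(reply)];
}

// Hypothetical stand-in for ReplyPattern with the fix applied.
class ReplyPatternSketch {
  private readonly pattern: RegExp;

  constructor(pattern: RegExp) {
    this.pattern = pattern;
  }

  matches(reply: string): boolean {
    this.pattern.lastIndex = 0; // clear state left behind by earlier calls
    return this.pattern.test(reply);
  }
}

const buggy = alternatingResults(); // [true, false, true]

const fixedGlobal = new ReplyPatternSketch(/DONE/g);
const fixedGlobalResults = [1, 2, 3].map(() => fixedGlobal.matches("All set. DONE"));

const fixedSticky = new ReplyPatternSketch(/DONE/y);
const fixedStickyResults = [1, 2, 3].map(() => fixedSticky.matches("DONE"));
```

An alternative would be to rebuild the RegExp without the stateful flags, but resetting `lastIndex` preserves whatever flags the caller supplied.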

Changed files

  • scripts/deadcode-unused-files.allowlist.mjs (modified, +1/-0)
  • scripts/demo-gsar-algebra.ts (added, +459/-0)
  • src/agents/gsar.calibration.test.ts (added, +329/-0)
  • src/agents/gsar.openclaw.test.ts (added, +623/-0)
  • src/agents/gsar.test.ts (added, +524/-0)
  • src/agents/gsar.ts (added, +369/-0)
  • src/agents/termination.algebra.live.test.ts (added, +416/-0)
  • src/agents/termination.complex.test.ts (added, +529/-0)
  • src/agents/termination.test.ts (added, +352/-0)
  • src/agents/termination.ts (added, +207/-0)
  • src/agents/tools/sessions-send-tool.a2a.ts (modified, +12/-0)

Problem / motivation

The agents runtime currently composes termination conditions ad hoc per call site, which makes it hard to reason about precedence, override, and shadowing of stop conditions. Existing GSAR (grounding-scored agent reasoning) signals also aren't represented as a first-class scorer that termination policy can read.

Proposed: introduce a composable termination algebra (and/or/not over typed conditions) plus a GSAR grounding scorer wired into that algebra, so termination policy is expressible as data instead of imperative branching across runtime call sites.
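
One way to read "policy as data" is a serializable tree of typed conditions plus a single evaluator. The sketch below is an illustration of that idea only; the issue does not specify a data model, and the `Policy` union, its `kind` tags, and `shouldStop` are hypothetical names.

```typescript
// Hypothetical data model for a termination policy.
type Policy =
  | { kind: "textMention"; needle: string }
  | { kind: "maxIterations"; limit: number }
  | { kind: "and"; left: Policy; right: Policy }
  | { kind: "or"; left: Policy; right: Policy }
  | { kind: "not"; inner: Policy };

interface LoopState {
  turn: number;      // 1-based turn counter
  replyText: string; // latest reply from the peer agent
}

// One evaluator replaces per-call-site branching.
function shouldStop(p: Policy, s: LoopState): boolean {
  switch (p.kind) {
    case "textMention":
      return s.replyText.includes(p.needle);
    case "maxIterations":
      return s.turn >= p.limit;
    case "and":
      return shouldStop(p.left, s) && shouldStop(p.right, s);
    case "or":
      return shouldStop(p.left, s) || shouldStop(p.right, s);
    case "not":
      return !shouldStop(p.inner, s);
  }
}

// Because the policy is plain data, it can be loaded from JSON configuration.
const policy: Policy = JSON.parse(
  '{"kind":"or","left":{"kind":"textMention","needle":"FOUND IT"},"right":{"kind":"maxIterations","limit":5}}',
);

const stopsOnSignal = shouldStop(policy, { turn: 2, replyText: "Blue Bottle. FOUND IT" });
const continuesEarly = shouldStop(policy, { turn: 1, replyText: "Still looking." });
const stopsOnBound = shouldStop(policy, { turn: 5, replyText: "Still looking." });
```

With this shape, precedence is explicit in the nesting of the tree, so questions about which stop condition shadows another can be answered by inspecting the value rather than tracing call sites.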

Tracking PR

Implementation in #75165.

extent analysis

TL;DR

Introduce a composable termination algebra to express termination policy as data, allowing for more manageable and predictable termination conditions.

Guidance

  • Review the proposed implementation in PR #75165 to understand the changes and how they address the current issues with termination conditions.
  • Consider how the introduction of a composable termination algebra (and/or/not over typed conditions) will impact existing code and termination policies.
  • Evaluate the benefits of representing GSAR signals as a first-class scorer within the termination algebra.
  • Assess the potential effects on precedence, override, and shadowing of stop conditions with the new algebra.

Notes

The success of this approach depends on the correct implementation of the composable termination algebra and its integration with the GSAR grounding scorer.

Recommendation

Adopt the composable termination algebra as proposed in PR #75165 to make termination conditions easier to manage and predict.
