openclaw - ✅(Solved) Fix Feature: composable termination algebra + GSAR grounding scorer for agents [1 pull requests, 1 comments, 2 participants]

openclaw · 2026-05-05 17:39:41


GitHub stats

openclaw/openclaw#77981•Fetched 2026-05-06 06:18:20


Author

fede-kamel

Participants

clawsweeper[bot]

fede-kamel

Timeline (top)

commented ×1, cross-referenced ×1

Fix Action

Fixed

Fixed by PR: feat(agents): composable termination algebra + GSAR grounding scorer (https://github.com/openclaw/openclaw/pull/75165)

PR fix notes

PR #75165: feat(agents): composable termination algebra + GSAR grounding scorer

Repository: openclaw/openclaw
Author: fede-kamel
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/75165

Description (problem / solution / changelog)

Closes #77981

What this does

Adds a composable termination algebra for A2A agent loops and implements GSAR (Typed Grounding for Hallucination Detection and Recovery; Kamelhar 2026, arxiv:2604.23366) as a first-class TerminationCondition. Both are wired into the A2A ping-pong loop in sessions-send-tool via an optional termination param.

Proof 1 — the algebra saves turns

Task: "Where is the nearest coffee shop?"

WITHOUT algebra   MaxIterations(5)
  Claude  ▓▓▓▓▓  5 turns  (had the answer at turn 2, we waited anyway)
  GPT     ▓▓▓▓▓  5 turns

WITH algebra   TextMention("FOUND IT").or(MaxIterations(5))
  Claude  ▓▓░░░  2 turns  ← "Blue Bottle on Mission St. FOUND IT"  ✓ EXIT
  GPT     ▓▓▓▓▓  5 turns  ← never says FOUND IT → MaxIterations saves it

  saved: 3 turns = 60% less waiting for Claude

Claude naturally signals completion. GPT does not. The combinator captures this difference without any provider-specific code.
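To make the OR combinator concrete, here is a minimal sketch of how such an algebra could be built. It is not the code from termination.ts: the `LoopState` shape, the `Condition` base class, the `runLoop` helper, and the `max_iterations:N` reason string are assumptions made for illustration (only the `text_mention:…` reason format appears in the live log further down).

```typescript
// Hypothetical shapes for illustration; the real termination.ts may differ.
interface LoopState {
  turn: number; // 1-based turn counter
  replyText: string; // latest reply in the loop
}
type Verdict = [boolean, string]; // [shouldStop, reason]

class Condition {
  private readonly fn: (s: LoopState) => Verdict;
  constructor(fn: (s: LoopState) => Verdict) {
    this.fn = fn;
  }
  check(s: LoopState): Verdict {
    return this.fn(s);
  }
  // OR: stop as soon as either operand fires, and report which one did.
  or(other: Condition): Condition {
    return new Condition((s) => {
      const left = this.check(s);
      return left[0] ? left : other.check(s);
    });
  }
}

class TextMention extends Condition {
  constructor(needle: string) {
    super((s) => [s.replyText.includes(needle), `text_mention:${needle}`]);
  }
}

class MaxIterations extends Condition {
  constructor(limit: number) {
    super((s) => [s.turn >= limit, `max_iterations:${limit}`]);
  }
}

// Replays scripted replies and reports the turn on which the condition fired.
function runLoop(replies: string[], cond: Condition): { turns: number; reason: string } {
  for (const [i, replyText] of replies.entries()) {
    const [stop, reason] = cond.check({ turn: i + 1, replyText });
    if (stop) return { turns: i + 1, reason };
  }
  return { turns: replies.length, reason: "replies_exhausted" };
}

const coffeeCond = new TextMention("FOUND IT").or(new MaxIterations(5));
const claudeLike = ["Searching nearby.", "Blue Bottle on Mission St. FOUND IT", "-", "-", "-"];
const gptLike = ["Searching.", "Still searching.", "Some options.", "Narrowing down.", "Blue Bottle."];

console.log(runLoop(claudeLike, coffeeCond)); // stops at turn 2 via the text mention
console.log(runLoop(gptLike, coffeeCond)); // stops at turn 5 via the hard bound
```

Neither scripted provider needs its own branch; the same composed condition handles both, which is the point the coffee-shop comparison makes.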

Proof 2 — soft signals must be paired with a hard bound

soft_signal                        ✗  UNSAFE — may never stop
soft_signal.or(MaxIterations(N))   ✓  SAFE   — always terminates

Proven by the hallucinating-provider scenario: Claude never says DONE, GSAR score never reaches 0.80 — MaxIterations fires at turn 5 every time. Budget holds.

The AND combinator lets you guard against vague early answers:

const detailed = new CustomCondition(s => [s.replyText.length >= 40, "detailed_enough"]);
const cond = new TextMention("FOUND IT").and(detailed).or(new MaxIterations(5));
// turn 1: "FOUND IT!" — too short, fails `detailed`
// turn 2: "Blue Bottle on Mission St, 4 min walk. FOUND IT" — passes both ✓
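The AND guard can be traced the same way. The sketch below is an illustrative stand-in, not the PR's implementation: the `Condition` base class, the way `and` joins reason strings, and the `pending` placeholder are all assumptions. It replays the two replies from the comments above.

```typescript
// Hypothetical shapes for illustration; the real termination.ts may differ.
interface LoopState {
  turn: number;
  replyText: string;
}
type Verdict = [boolean, string]; // [shouldStop, reason]

class Condition {
  private readonly fn: (s: LoopState) => Verdict;
  constructor(fn: (s: LoopState) => Verdict) {
    this.fn = fn;
  }
  check(s: LoopState): Verdict {
    return this.fn(s);
  }
  // AND: fire only when both operands fire on the same turn.
  and(other: Condition): Condition {
    return new Condition((s) => {
      const [l, lReason] = this.check(s);
      const [r, rReason] = other.check(s);
      return l && r ? [true, `${lReason}&${rReason}`] : [false, "pending"];
    });
  }
  // OR: fire when either operand fires.
  or(other: Condition): Condition {
    return new Condition((s) => {
      const left = this.check(s);
      return left[0] ? left : other.check(s);
    });
  }
}

class CustomCondition extends Condition {}

class TextMention extends Condition {
  constructor(needle: string) {
    super((s) => [s.replyText.includes(needle), `text_mention:${needle}`]);
  }
}

class MaxIterations extends Condition {
  constructor(limit: number) {
    super((s) => [s.turn >= limit, `max_iterations:${limit}`]);
  }
}

const detailed = new CustomCondition((s) => [s.replyText.length >= 40, "detailed_enough"]);
const guarded = new TextMention("FOUND IT").and(detailed).or(new MaxIterations(5));

// Turn 1 mentions the marker but is too short, so the AND branch stays closed.
const turn1 = guarded.check({ turn: 1, replyText: "FOUND IT!" });
// Turn 2 is 47 characters long and mentions the marker, so both operands fire.
const turn2 = guarded.check({
  turn: 2,
  replyText: "Blue Bottle on Mission St, 4 min walk. FOUND IT",
});

console.log(turn1[0], turn2[0]); // false true
```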

Proof 3 — GSAR score formula (arxiv:2604.23366 §3.2)

            W(G) + W(K)
S = ───────────────────────────    ∈ [0, 1]
    W(G) + W(U) + ρ·W(X) + W(K)

G = grounded   U = ungrounded   X = contradicted   K = complementary
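A direct transcription of the formula may help when reading the property table that follows. This is a sketch under stated assumptions, not the scorer in gsar.ts: the `Claim` shape, per-claim weights, the explicit `rho` argument, and returning 0 for an empty partition are all choices made here for illustration.

```typescript
type Bucket = "grounded" | "ungrounded" | "contradicted" | "complementary";

// Hypothetical claim shape: each claim carries its bucket and an evidence weight.
interface Claim {
  bucket: Bucket;
  weight: number;
}

// S = (W(G) + W(K)) / (W(G) + W(U) + rho * W(X) + W(K))
function gsarScore(claims: Claim[], rho: number): number {
  // W(b) is the total weight of the claims that fall in bucket b.
  const W = (b: Bucket): number =>
    claims.filter((c) => c.bucket === b).reduce((sum, c) => sum + c.weight, 0);
  const numerator = W("grounded") + W("complementary");
  const denominator =
    W("grounded") + W("ungrounded") + rho * W("contradicted") + W("complementary");
  // Assumption: an empty (or zero-weight) partition scores 0 rather than NaN.
  return denominator === 0 ? 0 : numerator / denominator;
}

// Helper: n unit-weight claims in one bucket.
function repeat(bucket: Bucket, n: number): Claim[] {
  return Array.from({ length: n }, (): Claim => ({ bucket, weight: 1 }));
}

const mostlyUngrounded = [...repeat("grounded", 1), ...repeat("ungrounded", 2)];
const allGrounded = repeat("grounded", 3);
const withContradiction = [...repeat("grounded", 3), ...repeat("contradicted", 1)];

console.log(gsarScore(mostlyUngrounded, 1).toFixed(3)); // "0.333"
console.log(gsarScore(allGrounded, 1).toFixed(3)); // "1.000"
console.log(gsarScore(withContradiction, 1).toFixed(3)); // "0.750"
```

With unit weights, the first two results agree with the scores the live log reports for a 1 grounded / 2 ungrounded partition and for a 3 grounded partition, although the weights actually used in that run are not stated. The third shows the contradiction penalty: adding a contradicted claim lowers the score.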

6 structural properties proven in tests:

| Property | Statement | Test |
| --- | --- | --- |
| P1 Boundedness | S ∈ [0,1] always | extreme partitions, empty partition |
| P2 Grounded monotonicity | U→G never decreases S | migration path across 10 steps |
| P3 Contradiction penalty | adding X never increases S | incremental X additions |
| P4 Complementary value | K contributes but < equivalent G | weight comparison |
| P5 Non-suppression | X stays in denominator even at ρ=0 | ρ sweep |
| P6 Asymmetry | w(inference) < w(tool_match) strictly decreases S | weight maps |

Proof 4 — GSAR decision tiers and threshold stability

score   0.0──────────0.65──────────0.80──────────1.0
              replan      regenerate      proceed
                                   ▲
                             τ_proceed = 0.80 sits on a
                             ≥85%-accuracy plateau across [0.75, 0.85]
                             not a cliff — safe to ship as default

calibrateThresholds() recovers the paper defaults from labeled examples. The sensitivity analysis in gsar.calibration.test.ts checks that results change monotonically as each parameter is swept, with no cliff edges.
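The three tiers in the diagram reduce to a small decision function. The sketch below assumes both thresholds are inclusive lower bounds; the live log shows a score of 0.800 mapping to proceed, but the handling of exactly 0.65 is a guess, and the function and parameter names are hypothetical.

```typescript
type Decision = "replan" | "regenerate" | "proceed";

// Maps a GSAR score to a tier using the default thresholds from the diagram.
// Assumption: a score equal to a threshold lands in the higher tier.
function decide(score: number, tauRegenerate = 0.65, tauProceed = 0.8): Decision {
  if (score >= tauProceed) return "proceed";
  if (score >= tauRegenerate) return "regenerate";
  return "replan";
}

console.log(decide(0.333)); // "replan"
console.log(decide(0.7)); // "regenerate"
console.log(decide(0.8)); // "proceed"
```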

Proof 5 — GSAR wired into a recovery loop

// Algorithm 1 from the paper — bounded recovery, termination guaranteed
new GroundednessCondition(scorer).or(new MaxIterations(K_max))

| Provider type | Behavior | Outcome |
| --- | --- | --- |
| Grounded (Claude-like) | exits at turn 1 | 80% reduction vs flat |
| Recovering | exits at first grounded reply (turn 3) | quality-gated, not time-gated |
| Hallucinating | never reaches proceed; MaxIterations ends the loop | budget holds |
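The three provider behaviors can be reproduced with a bounded loop over scripted scores. This is a simplified, synchronous sketch rather than the paper's Algorithm 1 or the PR's code: `Attempt`, `recover`, and `scripted` are hypothetical names, the scores are made up, and the `max_iterations:N` reason string is an assumption (the `grounded:proceed:s=…` format follows the live log).

```typescript
// Hypothetical result of one generation turn: the reply and its GSAR score.
interface Attempt {
  reply: string;
  score: number;
}

// Bounded recovery: exit on the first reply that reaches tauProceed,
// otherwise stop after kMax turns so the loop always terminates.
function recover(
  generate: (turn: number) => Attempt,
  kMax: number,
  tauProceed = 0.8,
): { turns: number; reason: string } {
  for (let turn = 1; turn <= kMax; turn++) {
    const { score } = generate(turn);
    if (score >= tauProceed) {
      return { turns: turn, reason: `grounded:proceed:s=${score.toFixed(3)}` };
    }
  }
  return { turns: kMax, reason: `max_iterations:${kMax}` };
}

// Scripted provider: returns the given scores in order, repeating the last one.
function scripted(scores: number[]): (turn: number) => Attempt {
  return (turn) => ({
    reply: `reply ${turn}`,
    score: scores[Math.min(turn, scores.length) - 1] ?? 0,
  });
}

const groundedRun = recover(scripted([0.9]), 5);
const recoveringRun = recover(scripted([0.3, 0.5, 0.85]), 5);
const hallucinatingRun = recover(scripted([0.2]), 5);

console.log(groundedRun.turns, recoveringRun.turns, hallucinatingRun.turns); // 1 3 5
```

Exiting at turn 1 of a 5-turn budget is where the 80% figure in the table comes from; the hallucinating run shows why the hard bound is still required.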

Proof 6 — live integration test with real Anthropic API

src/agents/termination.algebra.live.test.ts — 6 tests against a real Claude endpoint:

OPENCLAW_LIVE_TEST=1 pnpm test:live -- src/agents/termination.algebra.live.test.ts

 ✓  flat MaxIterations: Claude runs all 5 turns regardless of answer quality
    [algebra-live] flat: 5 turns used
    [algebra-live] turn-1 reply: "The capital of France is Paris."

 ✓  TextMention("DONE").or(MaxIterations): Claude exits early once it says DONE
    [algebra-live] algebra: exited at turn 1, reason=text_mention:DONE
    [algebra-live] reply: "The capital of France is Paris. DONE"

 ✓  algebra exits Claude early across 5 independent runs — consistent savings
    [algebra-live] run 1: flat=5  algebra=1  reason=text_mention:DONE
    [algebra-live] run 2: flat=5  algebra=1  reason=text_mention:DONE
    [algebra-live] run 3: flat=5  algebra=1  reason=text_mention:DONE
    [algebra-live] run 4: flat=5  algebra=1  reason=text_mention:DONE
    [algebra-live] run 5: flat=5  algebra=1  reason=text_mention:DONE
    [algebra-live] avg flat=5.0  avg algebra=1.0  saved=80%

 ✓  Claude annotates grounded vs ungrounded claims with [G]/[U] tags on request
    [gsar-live] annotated reply: "Water boils at 100°C at sea level [G]. It does not boil at 50°C [X]."
    [gsar-live] partition: {"grounded":1,"ungrounded":0,"contradicted":1,"complementary":0}

 ✓  GroundednessCondition exits loop once Claude produces a fully grounded reply
    [gsar-live] turn 1: score=0.333  partition={"grounded":1,"ungrounded":2,...}
    [gsar-live] turn 2: score=0.800  partition={"grounded":4,"ungrounded":0,...}
    [gsar-live] exited at turn 2, reason=grounded:proceed:s=0.800

 ✓  GSAR property P2 holds on real replies: factual reply scores higher than vague
    [gsar-live] vague:    score=0.000  partition={"grounded":0,"ungrounded":3,...}
    [gsar-live] grounded: score=1.000  partition={"grounded":3,"ungrounded":0,...}
    [gsar-live] grounded decision: proceed

 Test Files  1 passed (1)
      Tests  6 passed (6)
   Duration  18.4s

Run it yourself: OPENCLAW_LIVE_TEST=1 ANTHROPIC_API_KEY=sk-ant-... pnpm test:live -- src/agents/termination.algebra.live.test.ts
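The annotated-reply test above implies a step that turns inline tags into a partition. One way that step could look is sketched below; the `[K]` tag for complementary claims and the function name are assumptions, since the log only shows `[G]`, `[U]`, and `[X]`.

```typescript
type Partition = {
  grounded: number;
  ungrounded: number;
  contradicted: number;
  complementary: number;
};

// Tag letters follow the G/U/X/K legend of the score formula; [K] is assumed.
const TAG_TO_BUCKET = {
  G: "grounded",
  U: "ungrounded",
  X: "contradicted",
  K: "complementary",
} as const;

// Counts inline [G]/[U]/[X]/[K] annotations in a reply.
function partitionFromTags(reply: string): Partition {
  const partition: Partition = { grounded: 0, ungrounded: 0, contradicted: 0, complementary: 0 };
  for (const match of reply.matchAll(/\[([GUXK])\]/g)) {
    const tag = match[1] as keyof typeof TAG_TO_BUCKET;
    partition[TAG_TO_BUCKET[tag]] += 1;
  }
  return partition;
}

const annotated = "Water boils at 100°C at sea level [G]. It does not boil at 50°C [X].";
console.log(JSON.stringify(partitionFromTags(annotated)));
// {"grounded":1,"ungrounded":0,"contradicted":1,"complementary":0}
```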

Unit test coverage — 128/128 passing

pnpm test src/agents/termination.test.ts \
          src/agents/termination.complex.test.ts \
          src/agents/gsar.test.ts \
          src/agents/gsar.openclaw.test.ts \
          src/agents/gsar.calibration.test.ts

 Test Files  5 passed (5)
      Tests  128 passed (128)
   Duration  164ms

| File | Purpose |
| --- | --- |
| `termination.ts` | Core algebra: conditions, combinators, `Awaitable<T>` |
| `termination.test.ts` | Unit tests + Anthropic vs OpenAI behavioral proof + coffee shop illustrated proof |
| `termination.complex.test.ts` | Complex composition, adversarial providers, real-world recipes |
| `sessions-send-tool.a2a.ts` | A2A loop integration |
| `gsar.ts` | GSAR scorer, decision function, `GroundednessCondition`, calibration helpers |
| `gsar.test.ts` | Structural properties P1–P6, recovery scenarios |
| `gsar.openclaw.test.ts` | OpenClaw-specific scenarios (SearXNG, message draft, code review) |
| `gsar.calibration.test.ts` | Sensitivity analysis, derivability, stability plateau |
| `termination.algebra.live.test.ts` | Live proof against real Anthropic API, 5 iterations |

Reviewer notes

ReplyPattern stateful regex bug fixed (ClawSweeper P2): RegExp.test() on a pattern with the g or y flag advances lastIndex, causing alternating true/false results on repeated calls. Fixed by setting this.pattern.lastIndex = 0 before each check. Regression test included.
gsar.ts is a library module imported only by tests today — added to the deadcode allowlist. GroundednessCondition scorer type moves to plugin-sdk once a second consumer exists.
Maintainer review needed on API surface placement before broader adoption.
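The lastIndex behavior behind the ReplyPattern fix is standard JavaScript and easy to reproduce. In the sketch below, the class body is a hypothetical reduction of ReplyPattern (only the name and the `this.pattern.lastIndex = 0` reset come from the notes above).

```typescript
// A global (or sticky) regex remembers where the last match ended.
const stateful = /DONE/g;
const buggyResults = [stateful.test("DONE"), stateful.test("DONE"), stateful.test("DONE")];
console.log(buggyResults); // [true, false, true]

// Hypothetical reduction of ReplyPattern: reset lastIndex before every check.
class ReplyPattern {
  private readonly pattern: RegExp;
  constructor(pattern: RegExp) {
    this.pattern = pattern;
  }
  matches(text: string): boolean {
    this.pattern.lastIndex = 0; // clear state left over from the previous call
    return this.pattern.test(text);
  }
}

const fixed = new ReplyPattern(/DONE/g);
const fixedResults = [fixed.matches("DONE"), fixed.matches("DONE"), fixed.matches("DONE")];
console.log(fixedResults); // [true, true, true]
```

The second buggy call starts searching at index 4, finds nothing, and resets lastIndex to 0, which is why the results alternate.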

Changed files

scripts/deadcode-unused-files.allowlist.mjs (modified, +1/-0)
scripts/demo-gsar-algebra.ts (added, +459/-0)
src/agents/gsar.calibration.test.ts (added, +329/-0)
src/agents/gsar.openclaw.test.ts (added, +623/-0)
src/agents/gsar.test.ts (added, +524/-0)
src/agents/gsar.ts (added, +369/-0)
src/agents/termination.algebra.live.test.ts (added, +416/-0)
src/agents/termination.complex.test.ts (added, +529/-0)
src/agents/termination.test.ts (added, +352/-0)
src/agents/termination.ts (added, +207/-0)
src/agents/tools/sessions-send-tool.a2a.ts (modified, +12/-0)


Problem / motivation

The agents runtime currently composes termination conditions ad hoc per call site, which makes it hard to reason about precedence, override, and shadowing of stop conditions. Existing GSAR (grounding-scored agent reasoning) signals also aren't represented as a first-class scorer that termination policy can read.

Proposed: introduce a composable termination algebra (and/or/not over typed conditions) plus a GSAR grounding scorer wired into that algebra, so termination policy is expressible as data instead of imperative branching across runtime call sites.
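To illustrate what "policy as data" could mean in practice, the sketch below encodes a policy as a plain JSON-serializable tree and evaluates it with one interpreter. This is a hypothetical encoding written for this summary; the PR itself composes condition objects with method calls, and none of the names here come from it.

```typescript
interface LoopState {
  turn: number;
  replyText: string;
}

// Hypothetical data encoding of and/or/not over two typed leaf conditions.
type Policy =
  | { kind: "text_mention"; text: string }
  | { kind: "max_iterations"; limit: number }
  | { kind: "and"; all: Policy[] }
  | { kind: "or"; any: Policy[] }
  | { kind: "not"; of: Policy };

// One interpreter replaces per-call-site branching.
function shouldStop(policy: Policy, state: LoopState): boolean {
  switch (policy.kind) {
    case "text_mention":
      return state.replyText.includes(policy.text);
    case "max_iterations":
      return state.turn >= policy.limit;
    case "and":
      return policy.all.every((p) => shouldStop(p, state));
    case "or":
      return policy.any.some((p) => shouldStop(p, state));
    case "not":
      return !shouldStop(policy.of, state);
  }
}

// Stop on "DONE" unless the reply also says "not sure"; always stop by turn 5.
const policy: Policy = {
  kind: "or",
  any: [
    {
      kind: "and",
      all: [
        { kind: "text_mention", text: "DONE" },
        { kind: "not", of: { kind: "text_mention", text: "not sure" } },
      ],
    },
    { kind: "max_iterations", limit: 5 },
  ],
};

console.log(shouldStop(policy, { turn: 1, replyText: "Paris. DONE" })); // true
console.log(shouldStop(policy, { turn: 1, replyText: "DONE, but not sure" })); // false
console.log(shouldStop(policy, { turn: 5, replyText: "still thinking" })); // true
```

Because the policy is a value, precedence and shadowing are visible in its structure and it can be logged, diffed, or stored alongside a session.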

Tracking PR

Implementation in #75165.

extent analysis

TL;DR

Introduce a composable termination algebra to express termination policy as data, allowing for more manageable and predictable termination conditions.

Guidance

Review the proposed implementation in PR #75165 to understand the changes and how they address the current issues with termination conditions.
Consider how the introduction of a composable termination algebra (and/or/not over typed conditions) will impact existing code and termination policies.
Evaluate the benefits of representing GSAR signals as a first-class scorer within the termination algebra.
Assess the potential effects on precedence, override, and shadowing of stop conditions with the new algebra.

Notes

The success of this approach depends on the correct implementation of the composable termination algebra and its integration with the GSAR grounding scorer.

Recommendation

Apply workaround: Introduce the composable termination algebra as proposed, to improve the manageability and predictability of termination conditions.
