hermes - Research Report: How AI Agent Frameworks Enforce Behavioral Constraints


AI Agent Behavioral Constraints: Comprehensive Research Report

Date: 2026-05-21
Researched by: Hermes Agent subagent
Context: Understanding how to enforce behavioral constraints on AI agents, for Hermes Agent (80+ skills, terminal/file/code tools, subagents)


Executive Summary

AI agent frameworks enforce constraints through three layers:

| Layer | Mechanism | Reliability | Bypassable? |
|---|---|---|---|
| Prompt-based | System prompts, AGENTS.md, CLAUDE.md, skill descriptions | Soft / Probabilistic | Yes, routinely |
| Tool-based | Hooks, git pre-commit, CI checks, lint gates | Medium-Hard / Deterministic at checkpoints | Partially |
| Architecture-based | Mode systems, tool filtering, policy engines | Hardest / Deterministic | No (model can't override harness) |

The fundamental finding: For LLMs, rules are suggestions, not laws. Compliance is probabilistic, not deterministic. The most effective strategy combines all three layers.


1. Prompt-Based Constraints

How it works: System prompts, AGENTS.md, CLAUDE.md, .cursorrules, skill descriptions, and SOUL.md are injected into the LLM's context window.

Hermes Agent's assembly order (from agent/prompt_builder.py):

  1. SOUL.md (replaces hardcoded identity)
  2. Tool-aware behavior guidance
  3. Honcho personality (when active)
  4. Optional system message
  5. Frozen MEMORY snapshot
  6. User profile
  7. Skills index (80+ skills as <available_skills>)
  8. Project context files (priority: .hermes.md > AGENTS.md > CLAUDE.md > .cursorrules)
  9. Timestamp + session
  10. Platform hint
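
The project context priority in step 8 can be sketched as a first-match lookup. This is a minimal sketch and not the code in agent/prompt_builder.py: the helper name `pick_context_file` is hypothetical, and it assumes that only the highest-priority file found is loaded.

```python
from pathlib import Path
from typing import Optional

# Priority order taken from step 8 above, highest priority first.
CONTEXT_FILE_PRIORITY = [".hermes.md", "AGENTS.md", "CLAUDE.md", ".cursorrules"]


def pick_context_file(project_dir: Path) -> Optional[Path]:
    """Return the highest-priority context file present in project_dir.

    Hypothetical helper: assumes only the first match is used.
    """
    for name in CONTEXT_FILE_PRIORITY:
        candidate = project_dir / name
        if candidate.is_file():
            return candidate
    return None
```

Under this assumption, a repository that contains both AGENTS.md and CLAUDE.md would contribute only AGENTS.md to the prompt.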

Critical Limitations

  1. Rush Mode: Under rapid-fire requests, the model prioritizes task completion over process compliance (Yajin Zhou, 2026).
  2. Context Compaction: When the context window exceeds its limit and is compacted, process standards are dropped and only task objectives survive.
  3. AI Risk Assessment: The model reads the rules, understands them, then decides they don't apply ("just a small change").
  4. Probability, Not Determinism: The model treats rules as one input signal among many.
  5. Shorter Rules Work Better: Rule files of 300+ lines see near-zero compliance; files of 5-10 lines show significant improvement.
  6. Claude Code: Official docs state that CLAUDE.md is context, not enforced law (GitHub issue #42863).

Superpowers' Psychological Persuasion

Superpowers (obra/superpowers, 89K+ stars) draws on Cialdini's 7 Principles of Persuasion, for example:

  • Authority: "This is mandatory / The Iron Law"
  • Commitment: "You have committed to checking skills first"
  • Scarcity: Time pressure framing
  • Social Proof: "All experienced developers follow this"

The Wharton paper "Call Me a Jerk: Persuading AI" reports that these principles have a statistically significant impact on LLM compliance.


2. Tool-Based Constraints

The Enforcement Spectrum

| Level | Mechanism | Determinism | Bypassable? |
|---|---|---|---|
| Softest | System prompt rules | Probabilistic | Yes, routinely |
| Medium | Skill descriptions | Probabilistic | Yes, under pressure |
| Medium | Pre-commit hooks | Deterministic at commit time | Only by --no-verify |
| Hard | CI pipeline checks | Deterministic at push/PR time | Force-push/admin override |
| Hard | Claude Code PreToolUse Hooks | Deterministic per tool call | No (agent cannot bypass) |
| Hardest | Airia policy engine | Deterministic at infra layer | No (outside model's reach) |

Claude Code Hooks — The Current SOTA

Released early 2026. Hooks expose 21 lifecycle events; the most powerful is PreToolUse, which can block tool calls:

| Event | Can Block? | Use Case |
|---|---|---|
| PreToolUse | YES | Security gates, file protection, command blocking |
| PostToolUse | No | Auto-formatting, linting, logging |
| UserPromptSubmit | No | Secret scanning, context injection |
| Stop | No | Quality checks, summary generation |
| PreCompact | No | Save state before memory loss |

Key insight: Hooks are "traffic lights" — the agent cannot bypass them because the check happens in the harness, not in the LLM.
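
To make the "traffic light" idea concrete, here is a minimal sketch of a PreToolUse-style gate. It assumes the harness hands the hook a JSON event with `tool_name` and `tool_input` fields and treats exit code 2 as "block this call"; those details should be checked against the current Claude Code hooks reference. The protected paths, blocked commands, and function names are hypothetical.

```python
import json
import sys

# Hypothetical policy: path and command fragments this project wants to protect.
PROTECTED_PATH_PARTS = (".env", ".git/")
BLOCKED_COMMAND_PARTS = ("rm -rf", "git push --force")

ALLOW, BLOCK = 0, 2  # assumption: exit code 2 tells the harness to block the call


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, reason) for one PreToolUse-style event."""
    tool = event.get("tool_name", "")
    tool_input = event.get("tool_input", {})

    if tool == "Bash":
        command = tool_input.get("command", "")
        for part in BLOCKED_COMMAND_PARTS:
            if part in command:
                return BLOCK, f"Blocked command fragment: {part!r}"

    if tool in ("Edit", "Write"):
        path = tool_input.get("file_path", "")
        for part in PROTECTED_PATH_PARTS:
            if part in path:
                return BLOCK, f"Protected path: {path}"

    return ALLOW, ""


def run_hook(raw_json: str) -> int:
    """Parse the event JSON, print any block reason to stderr, return the exit code."""
    code, reason = decide(json.loads(raw_json))
    if reason:
        print(reason, file=sys.stderr)
    return code
```

A real hook script would end with something like `sys.exit(run_hook(sys.stdin.read()))`. The decision is made by ordinary code running outside the model, which is why the agent cannot talk its way past it.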

Airia Agent Constraints — Enterprise Policy Engine

Deterministic IF-THEN policy engine between agents and resources:

  • Context Aggregator → Policy Evaluation Engine → Policy Enforcement Engine
  • Sub-10ms latency for simple policies
  • Supports: Allow, Block, Limit, or Require Human Approval
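
The IF-THEN idea can be illustrated with a small rule evaluator. This is a sketch of the general pattern only, not Airia's API: the `Policy` and `Decision` types, the example rules, and the first-match-wins ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    LIMIT = "limit"
    REQUIRE_HUMAN_APPROVAL = "require_human_approval"


@dataclass(frozen=True)
class Policy:
    name: str
    condition: Callable[[dict], bool]  # the IF part, evaluated on request context
    decision: Decision                 # the THEN part


def evaluate(policies: list[Policy], context: dict,
             default: Decision = Decision.ALLOW) -> Decision:
    """Return the decision of the first policy whose condition matches."""
    for policy in policies:
        if policy.condition(context):
            return policy.decision
    return default


# Hypothetical example policies.
POLICIES = [
    Policy("no-prod-db-writes",
           lambda c: c.get("resource") == "prod-db" and c.get("action") == "write",
           Decision.BLOCK),
    Policy("large-refunds-need-approval",
           lambda c: c.get("action") == "refund" and c.get("amount", 0) > 500,
           Decision.REQUIRE_HUMAN_APPROVAL),
]
```

Because the conditions are plain predicates over the request context, the same input always yields the same decision, independent of anything the model says.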

3. Architecture-Based Constraints

Kilocode Mode System (Inspiration for Hermes #476)

Each mode = persona + tool permission ruleset + behavioral constraints. Rules use glob matching, and the LAST matching rule wins.

| Mode | Tool Access |
|---|---|
| Code | Full access (read, edit, terminal, browser, MCP) |
| Ask | Read-only: NO edit, NO terminal |
| Architect | Read + Markdown-only edit |
| Debug | Full access + debug prompt |
| Orchestrator | Read + bash + task delegation; NO direct edit |
| Review | Read + conditional edit |

Why this is the hardest constraint: If a mode denies Edit, the harness removes the tool before the LLM ever sees it, so there is nothing for the model to override.
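
Last-match-wins glob resolution can be sketched with the standard library's `fnmatch`. This is an illustrative sketch, not Kilocode's actual rule syntax: the rule tuples, the tool names, and the deny-by-default behavior for unmatched tools are assumptions.

```python
from fnmatch import fnmatchcase

# Each rule is (glob pattern over tool names, allowed?). Later rules override earlier ones.
ASK_MODE_RULES = [
    ("*", False),     # deny everything by default
    ("read*", True),  # then allow read-style tools
]


def is_allowed(tool_name: str, rules: list[tuple[str, bool]]) -> bool:
    """Last matching rule wins; a tool that matches no rule is denied."""
    allowed = False
    for pattern, verdict in rules:
        if fnmatchcase(tool_name, pattern):
            allowed = verdict
    return allowed


def visible_tools(all_tools: list[str], rules: list[tuple[str, bool]]) -> list[str]:
    """Filter the tool list before it is ever shown to the model."""
    return [tool for tool in all_tools if is_allowed(tool, rules)]
```

With these rules, `visible_tools(["read_file", "edit_file", "terminal"], ASK_MODE_RULES)` keeps only `read_file`. Swapping the two rules would deny everything, because the catch-all `*` would then be the last match.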

Hermes Agent's Current State (#476)

Hermes has the ingredients but no mode system that links them: /personality, /prompt, SOUL.md, --tools. The gap: there is no concept of a mode that simultaneously changes persona, restricts tools, and modifies behavioral guidance.

Proposed: Mode dataclass + modified _execute_tool_calls() for mode-aware filtering.
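
One possible shape for that proposal is sketched below. It is an assumption-laden illustration, not the design in #476: the field names on `Mode`, the call and result dictionaries, and the `registry` argument are hypothetical, and `execute_tool_calls` only stands in for Hermes's real `_execute_tool_calls()`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    persona: str                   # prompt text swapped in for this mode
    allowed_tools: frozenset[str]  # tool allowlist enforced by the harness
    guidance: str = ""             # extra behavioral rules for the prompt


def execute_tool_calls(mode: Mode, tool_calls: list[dict], registry: dict) -> list[dict]:
    """Run each requested call only if the active mode allows its tool."""
    results = []
    for call in tool_calls:
        name = call["name"]
        if name not in mode.allowed_tools:
            # Rejected in the harness; the tool function is never invoked.
            results.append({"name": name,
                            "error": f"tool '{name}' is not available in {mode.name} mode"})
            continue
        results.append({"name": name, "result": registry[name](**call.get("args", {}))})
    return results
```

Returning an error entry instead of raising keeps the conversation going: the model learns the tool is unavailable, but the denied function is never executed.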

Superpowers' Process Architecture

Not just prompts: a workflow with architectural gates between stages:

  1. Brainstorming: Hard gate — no code until design approved
  2. Planning: 2-5 minute tasks with exact paths
  3. TDD: "NO CODE WITHOUT A FAILING TEST FIRST. Write code before the test? Delete it."
  4. Subagent-driven development: Fresh agent per task, two-stage review
  5. Code review: Separate subagent, no session history sharing

4. Framework Comparison

| Framework | Prompt Constraints | Tool Constraints | Architecture Constraints | Unique Strength |
|---|---|---|---|---|
| Hermes Agent | SOUL.md, AGENTS.md, 80+ skills index | Terminal/file/code tools, cron, gateway callbacks | Subagent delegation with skip_context_files | Learning loop writes own skills |
| Claude Code | CLAUDE.md, .claude/rules/*.md | Hooks (21 events) | Modes/agents via SDK | Best hooks system |
| Cursor | .cursorrules, .cursor/rules/*.mdc | Hooks (subset) | None significant | IDE integration, broadest user base |
| Superpowers | SKILL.md with persuasion principles | Subagent review (separate context) | Brainstorming→Plan→TDD gates | Psychological persuasion + pressure scenarios |
| Airia | N/A | Policy engine at infra layer | Deterministic IF-THEN | Enterprise policy governance |
| Kilocode | Mode-specific persona | Tool permission rulesets | Mode system = persona + tools + constraints | Last-match-wins rule resolution |

5. Practical Recommendation for Hermes Agent

Three-Layer Strategy

Layer 1: Prompt — Optimize for Compliance (Immediate)

A. SOUL.md — Keep 5-10 lines max. No explanations. Only non-negotiable rules.

B. Skill Descriptions — Use Persuasion. Every skill needs an Iron Law + Cialdini principles. Example:

# skill: tdd
## Iron Law: NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST.

This is MANDATORY. You have committed to TDD.
All skilled engineers follow this process.

C. AGENTS.md — Under 10 lines. Strong language: ALWAYS, NEVER, MANDATORY. No explanations.
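
The length and wording guidance for SOUL.md and AGENTS.md can be checked mechanically. The sketch below is one hypothetical way to do it: the function names, the choice to count only non-blank lines, and the default budget of 10 lines are assumptions, not an existing Hermes tool.

```python
STRONG_WORDS = ("ALWAYS", "NEVER", "MANDATORY")


def count_rule_lines(text: str) -> int:
    """Count non-blank lines in a rule file."""
    return sum(1 for line in text.splitlines() if line.strip())


def audit_rule_file(name: str, text: str, max_lines: int = 10) -> list[str]:
    """Return human-readable warnings; an empty list means the file passes."""
    warnings = []
    n = count_rule_lines(text)
    if n > max_lines:
        warnings.append(f"{name}: {n} non-blank lines exceeds the budget of {max_lines}")
    if not any(word in text for word in STRONG_WORDS):
        warnings.append(f"{name}: no ALWAYS/NEVER/MANDATORY rule found")
    return warnings
```

Running a check like this in CI turns a soft style guideline for rule files into a deterministic checkpoint.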

Layer 2: Tool Enforcement — Add Hard Checkpoints (Short-term)

A. PreToolUse-style filtering in _execute_tool_calls() — check current mode before executing tools.

B. Skill-level guard functions — preconditions before critical skill actions execute.

C. Subagent review pattern — separate review subagent with clean context (no session history sharing).
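
A skill-level guard function (item B) could take the form of a decorator that checks a precondition before the action runs. This is a minimal sketch under stated assumptions: the `guard` decorator, the `GuardError` type, the session `state` dictionary, and the `failing_test_seen` flag are all hypothetical names, and how that flag gets set by the harness is left open.

```python
from functools import wraps
from typing import Callable


class GuardError(RuntimeError):
    """Raised when a skill action is attempted before its preconditions hold."""


def guard(precondition: Callable[[dict], bool], message: str):
    """Decorator: run the action only if precondition(state) is true."""
    def decorate(action):
        @wraps(action)
        def wrapper(state: dict, *args, **kwargs):
            if not precondition(state):
                raise GuardError(message)
            return action(state, *args, **kwargs)
        return wrapper
    return decorate


@guard(lambda state: state.get("failing_test_seen", False),
       "TDD guard: no production code without a failing test first")
def write_production_code(state: dict, path: str) -> str:
    # Placeholder action; a real skill would write the file here.
    return f"wrote {path}"
```

The important property is that the flag is recorded by the harness (for example, from an observed failing test run) rather than asserted by the agent, which matches the advice not to trust agent self-attestations.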

Layer 3: Architecture — Mode System (Medium-term)

A. Implement Mode system (#476): Mode dataclass combining persona + tool whitelist + behavioral rules.

B. Tool-based mode enforcement: If mode denies a tool, harness rejects it — LLM cannot override.

C. Mode propagation to subagents via delegate_task.
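
Mode propagation (item C) can be reduced to one rule: a subagent never receives a tool its parent mode denies. The sketch below illustrates that rule with set intersection; the `delegate_task` signature and the returned spec dictionary are hypothetical stand-ins, not Hermes's real delegation API.

```python
def child_tools(parent_allowed: frozenset[str], requested: frozenset[str]) -> frozenset[str]:
    """A subagent gets at most the tools its parent mode allows."""
    return parent_allowed & requested


def delegate_task(task: str, parent_allowed: frozenset[str],
                  requested: frozenset[str]) -> dict:
    """Hypothetical stand-in for delegate_task: build the subagent's spec."""
    return {
        "task": task,
        "allowed_tools": child_tools(parent_allowed, requested),
    }
```

For example, a parent limited to `read_file` and `terminal` that delegates a task requesting `read_file` and `edit_file` would produce a subagent with `read_file` only, so delegation cannot be used to escape the parent's mode.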

Priority Order

  1. Immediate: Trim SOUL.md + audit skill descriptions with persuasion language
  2. Short-term: Implement Mode system (#476) — single highest-value architectural change
  3. Medium-term: PreToolUse-style hooks + skill guard functions
  4. Long-term: Subagent review gate + learning loop for self-improving constraints

What NOT to Do

  • ❌ Don't write 300-line rule files (proven ineffective)
  • ❌ Don't rely on prompts alone for critical constraints
  • ❌ Don't assume subagents inherit parent constraints
  • ❌ Don't trust agent self-attestations

Key References

  1. Hermes Agent Prompt Assembly
  2. Hermes Agent Mode System (#476)
  3. Claude Code Hooks
  4. Why AI Agents Break Rules (Yajin Zhou, 2026)
  5. Superpowers Framework
  6. Cursor Rules Empirical Study (arXiv:2512.18925)
  7. Airia Agent Constraints
  8. Knostic — Why Cursor Ignores Rules
  9. Superpowers Persuasion Analysis
  10. AIOS: LLM Agent Operating System (arXiv:2403.16971)
  11. Pixelmojo Claude Code Hooks Guide
  12. Speakeasy AI Agent Hooks Guide
  13. Termdock Superpowers Deep Dive
