AI Agent Behavioral Constraints: Comprehensive Research Report

StepCodex · 2026-05-21T05:04:38Z


Date: 2026-05-21
Researched by: Hermes Agent subagent
Context: Understanding how to enforce behavioral constraints on AI agents, specifically for Hermes Agent (80+ skills, terminal/file/code tools, subagents)

Executive Summary

AI agent frameworks enforce constraints through three layers:

Layer	Mechanism	Reliability	Bypassable?
Prompt-based	System prompts, AGENTS.md, CLAUDE.md, skill descriptions	Soft / Probabilistic	Yes, routinely
Tool-based	Hooks, git pre-commit, CI checks, lint gates	Medium-Hard / Deterministic at checkpoints	Partially
Architecture-based	Mode systems, tool filtering, policy engines	Hardest / Deterministic	No (model can't override harness)

The fundamental finding: For LLMs, rules are suggestions, not laws. Compliance is probabilistic, not deterministic. The most effective strategy combines all three layers.

1. Prompt-Based Constraints

How it works: System prompts, AGENTS.md, CLAUDE.md, .cursorrules, skill descriptions, and SOUL.md are injected into the LLM's context window.

Hermes Agent's assembly order (from agent/prompt_builder.py):

1. SOUL.md (replaces hardcoded identity)
2. Tool-aware behavior guidance
3. Honcho personality (when active)
4. Optional system message
5. Frozen MEMORY snapshot
6. User profile
7. Skills index (80+ skills as <available_skills>)
8. Project context files (priority: .hermes.md > AGENTS.md > CLAUDE.md > .cursorrules)
9. Timestamp + session
10. Platform hint
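
The assembly order above can be sketched as an ordered list of optional sections. This is a minimal illustration, not the code in agent/prompt_builder.py: `Section`, `first_existing`, and `build_system_prompt` are hypothetical names, and the sketch assumes that the context-file priority means "use the first file that exists", which the report does not state explicitly.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Section:
    name: str
    render: Callable[[], Optional[str]]  # returns None when the section is inactive


def first_existing(files: dict[str, str], priority: list[str]) -> Optional[str]:
    """Return the content of the highest-priority context file that exists."""
    for filename in priority:
        if filename in files:
            return files[filename]
    return None


def build_system_prompt(sections: list[Section]) -> str:
    """Concatenate active sections in list order, skipping inactive ones."""
    parts = []
    for section in sections:
        text = section.render()
        if text:
            parts.append(text.strip())
    return "\n\n".join(parts)


# Hypothetical project files; only the highest-priority one is injected.
project_files = {"AGENTS.md": "NEVER push to main.", "CLAUDE.md": "Use tabs."}
CONTEXT_PRIORITY = [".hermes.md", "AGENTS.md", "CLAUDE.md", ".cursorrules"]

sections = [
    Section("soul", lambda: "You are Hermes."),
    Section("honcho_personality", lambda: None),  # inactive in this run
    Section("skills_index", lambda: "<available_skills>tdd, review</available_skills>"),
    Section("project_context", lambda: first_existing(project_files, CONTEXT_PRIORITY)),
]

prompt = build_system_prompt(sections)
print(prompt)
```

Everything produced this way is still plain context for the model, which is why the limitations below apply to every section regardless of its position.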

Critical Limitations

1. Rush Mode: Under rapid-fire requests, the model prioritizes task completion over process compliance (Yajin Zhou, 2026).
2. Context Compaction: Once the context window exceeds its limit and is compacted, process standards are dropped; only task objectives survive.
3. AI Risk Assessment: The model reads the rules, understands them, then decides they don't apply ("just a small change").
4. Probability, Not Determinism: The model treats rules as one input signal among many.
5. Shorter Rules Work Better: Rule files of 300+ lines see near-zero compliance; files of 5-10 lines show significant improvement.
6. Claude Code: Official docs state that CLAUDE.md is context, not enforced law (GitHub issue #42863).

Superpowers' Psychological Persuasion

Superpowers (obra/superpowers, 89K+ stars) applies Cialdini's seven principles of persuasion to its skill text, including:

Authority: "This is mandatory / The Iron Law"
Commitment: "You have committed to checking skills first"
Scarcity: Time pressure framing
Social Proof: "All experienced developers follow this"

The Wharton paper "Call Me a Jerk: Persuading AI to Comply with Objectionable Requests" reports that these persuasion techniques have a statistically significant effect on LLM compliance.

2. Tool-Based Constraints

The Enforcement Spectrum

Level	Mechanism	Determinism	Bypassable?
Softest	System prompt rules	Probabilistic	Yes, routinely
Medium	Skill descriptions	Probabilistic	Yes, under pressure
Medium	Pre-commit hooks	Deterministic at commit time	Only by `--no-verify`
Hard	CI pipeline checks	Deterministic at push/PR time	Force-push/admin override
Hard	Claude Code PreToolUse Hooks	Deterministic per tool call	No (agent cannot bypass)
Hardest	Airia policy engine	Deterministic at infra layer	No (outside model's reach)

Claude Code Hooks — The Current SOTA

Released in early 2026, with 21 lifecycle events. The most powerful is PreToolUse, which can block tool calls before they run:

Event	Can Block?	Use Case
PreToolUse	YES (blocks the tool call)	Security gates, file protection, command blocking
PostToolUse	No (the tool has already run)	Auto-formatting, linting, logging
UserPromptSubmit	Yes (can reject the prompt)	Secret scanning, context injection
Stop	Yes (can force the agent to continue)	Quality checks, summary generation
PreCompact	No	Save state before memory loss

Key insight: Hooks are "traffic lights" — the agent cannot bypass them because the check happens in the harness, not in the LLM.
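
A PreToolUse check can be sketched as a small decision function. Claude Code passes the hook a JSON event on stdin (including `tool_name` and `tool_input`) and treats exit code 2 as a block, feeding stderr back to the model. The deny rules below (`PROTECTED_PATHS`, `DANGEROUS_COMMANDS`) are hypothetical, and the sketch evaluates a sample event directly rather than reading stdin.

```python
import json
import re

# Hypothetical deny rules, for illustration only.
PROTECTED_PATHS = (".env", "secrets/")
DANGEROUS_COMMANDS = re.compile(r"\brm\s+-rf\b|\bgit\s+push\s+.*--force\b")


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message); exit code 2 asks the harness to block the call."""
    tool = event.get("tool_name", "")
    tool_input = event.get("tool_input", {})

    if tool == "Bash":
        command = tool_input.get("command", "")
        if DANGEROUS_COMMANDS.search(command):
            return 2, f"Blocked dangerous command: {command}"

    if tool in ("Edit", "Write"):
        path = tool_input.get("file_path", "")
        if any(marker in path for marker in PROTECTED_PATHS):
            return 2, f"Blocked write to protected path: {path}"

    return 0, ""  # allow everything else


# A real hook would parse sys.stdin, print the message to stderr,
# and call sys.exit(code). Here we evaluate one sample event instead.
sample = json.loads('{"tool_name": "Bash", "tool_input": {"command": "rm -rf build"}}')
code, message = decide(sample)
print(code, message)
```

The decision never passes through the model: the harness runs the script and honors its exit code, so no prompt can talk its way past the check.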

Airia Agent Constraints — Enterprise Policy Engine

Deterministic IF-THEN policy engine between agents and resources:

Context Aggregator → Policy Evaluation Engine → Policy Enforcement Engine
Sub-10ms latency for simple policies
Supports: Allow, Block, Limit, or Require Human Approval
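
To make the IF-THEN idea concrete, here is a minimal sketch of a deterministic policy evaluator with the four outcomes listed above. It is not Airia's API: `Policy`, `Action`, and `evaluate` are hypothetical names, and first-match-wins ordering with an allow default is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    LIMIT = "limit"
    REQUIRE_APPROVAL = "require_human_approval"


@dataclass(frozen=True)
class Policy:
    name: str
    condition: Callable[[dict], bool]  # the IF part, evaluated on aggregated context
    action: Action                     # the THEN part


def evaluate(policies: list[Policy], context: dict,
             default: Action = Action.ALLOW) -> Action:
    """Return the action of the first policy whose condition matches the context."""
    for policy in policies:
        if policy.condition(context):
            return policy.action
    return default


policies = [
    Policy(
        "no-prod-deletes",
        lambda c: c.get("resource", "").startswith("prod/") and c.get("operation") == "delete",
        Action.BLOCK,
    ),
    Policy("large-payments", lambda c: c.get("amount", 0) > 1000, Action.REQUIRE_APPROVAL),
]

print(evaluate(policies, {"resource": "prod/db", "operation": "delete"}))
print(evaluate(policies, {"resource": "dev/db", "operation": "read"}))
```

Because the conditions are ordinary predicates over request context, the same input always yields the same decision, independent of anything the model says.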

3. Architecture-Based Constraints

Kilocode Mode System (Inspiration for Hermes #476)

Each mode = persona + tool permission ruleset + behavioral constraints. Rules use glob matching, and the LAST matching rule wins.

Mode	Tool Access
Code	Full access (read, edit, terminal, browser, MCP)
Ask	Read-only — NO edit, NO terminal
Architect	Read + Markdown-only edit
Debug	Full access + debug prompt
Orchestrator	Read + bash + task delegation — NO direct edit
Review	Read + conditional edit

Why this is the hardest constraint: If a mode denies Edit, the harness removes the tool from the model's tool list before the LLM makes any decision.
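
Last-match-wins resolution over glob patterns can be sketched with the standard library's `fnmatch`. The rule format, tool names, and the deny default here are hypothetical and are not taken from Kilocode's configuration schema.

```python
from fnmatch import fnmatchcase


def resolve(rules: list[tuple[str, str]], tool: str, default: str = "deny") -> str:
    """Walk every rule in order; the last pattern that matches decides."""
    decision = default
    for pattern, verdict in rules:
        if fnmatchcase(tool, pattern):
            decision = verdict  # later matches overwrite earlier ones
    return decision


def visible_tools(rules: list[tuple[str, str]], all_tools: list[str]) -> list[str]:
    """Tools the harness would expose to the model under this ruleset."""
    return [tool for tool in all_tools if resolve(rules, tool) == "allow"]


ALL_TOOLS = ["read_file", "edit_file", "terminal", "delegate_task"]

# Hypothetical rulesets: start broad, then narrow or widen with later rules.
ASK_MODE = [("*", "deny"), ("read_*", "allow")]
ORCHESTRATOR_MODE = [("*", "allow"), ("edit_*", "deny")]

print(visible_tools(ASK_MODE, ALL_TOOLS))
print(visible_tools(ORCHESTRATOR_MODE, ALL_TOOLS))
```

Putting the broad rule first and the exceptions after it keeps each ruleset short, which is the practical appeal of last-match-wins.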

Hermes Agent's Current State (#476)

Hermes has the ingredients (/personality, /prompt, SOUL.md, --tools) but no mode system that links them. The gap: there is no concept of a mode that simultaneously changes persona, restricts tools, and modifies behavioral guidance.

Proposed: Mode dataclass + modified _execute_tool_calls() for mode-aware filtering.
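
One possible shape for that proposal is sketched below. The real signature of `_execute_tool_calls()` is not shown in this report, so `execute_tool_calls`, the `Mode` fields, and the call/registry formats are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Mode:
    name: str
    persona: str
    allowed_tools: frozenset
    guidance: tuple = ()  # extra behavioral rules injected into the prompt


def execute_tool_calls(mode: Mode, tool_calls: list[dict],
                       registry: dict[str, Callable]) -> list[dict]:
    """Run each requested call, rejecting any tool the active mode does not allow."""
    results = []
    for call in tool_calls:
        name = call["name"]
        if name not in mode.allowed_tools:
            # The rejection happens in the harness; the model only sees the error.
            results.append({
                "name": name,
                "ok": False,
                "error": f"tool '{name}' is not permitted in mode '{mode.name}'",
            })
            continue
        output = registry[name](**call.get("args", {}))
        results.append({"name": name, "ok": True, "output": output})
    return results


registry = {
    "read_file": lambda path: f"<contents of {path}>",
    "edit_file": lambda path, text: f"wrote {len(text)} chars to {path}",
}
ask = Mode(name="ask", persona="Answer questions only.",
           allowed_tools=frozenset({"read_file"}))

results = execute_tool_calls(ask, [
    {"name": "read_file", "args": {"path": "README.md"}},
    {"name": "edit_file", "args": {"path": "README.md", "text": "hi"}},
], registry)
for result in results:
    print(result)
```

A stricter variant would also drop disallowed tools from the schema sent to the model, so the call is never proposed in the first place.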

Superpowers' Process Architecture

Superpowers is not just prompts; it is a workflow architecture with gates between stages:

Brainstorming: Hard gate — no code until design approved
Planning: 2-5 minute tasks with exact paths
TDD: "NO CODE WITHOUT A FAILING TEST FIRST. Write code before the test? Delete it."
Subagent-driven development: Fresh agent per task, two-stage review
Code review: Separate subagent, no session history sharing
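
If a harness wanted to enforce the first and third gates in code rather than in prose, a minimal sketch might look like the following. Superpowers itself expresses these gates in skill text; `WorkflowState`, `GateError`, and `check_can_write_code` are hypothetical names.

```python
from dataclasses import dataclass


class GateError(RuntimeError):
    """Raised when a workflow gate has not been satisfied."""


@dataclass
class WorkflowState:
    design_approved: bool = False
    failing_test_seen: bool = False


def check_can_write_code(state: WorkflowState) -> None:
    """Refuse production edits until both gates are open."""
    if not state.design_approved:
        raise GateError("brainstorming gate: design has not been approved")
    if not state.failing_test_seen:
        raise GateError("TDD gate: no failing test has been recorded")


state = WorkflowState()
try:
    check_can_write_code(state)
except GateError as err:
    print(err)

state.design_approved = True
state.failing_test_seen = True
check_can_write_code(state)  # passes silently once both gates are open
print("code edits allowed")
```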

4. Framework Comparison

Framework	Prompt Constraints	Tool Constraints	Architecture Constraints	Unique Strength
Hermes Agent	SOUL.md, AGENTS.md, 80+ skills index	Terminal/file/code tools, cron, gateway callbacks	Subagent delegation with `skip_context_files`	Learning loop writes own skills
Claude Code	CLAUDE.md, `.claude/rules/*.md`	Hooks (21 events)	Modes/agents via SDK	Best hooks system
Cursor	`.cursorrules`, `.cursor/rules/*.mdc`	Hooks (subset)	None significant	IDE integration, broadest user base
Superpowers	SKILL.md with persuasion principles	Subagent review (separate context)	Brainstorming→Plan→TDD gates	Psychological persuasion + pressure scenarios
Airia	N/A	Policy engine at infra layer	Deterministic IF-THEN	Enterprise policy governance
Kilocode	Mode-specific persona	Tool permission rulesets	Mode system = persona + tools + constraints	Last-match-wins rule resolution

5. Practical Recommendation for Hermes Agent

Three-Layer Strategy

Layer 1: Prompt — Optimize for Compliance (Immediate)

A. SOUL.md — Keep 5-10 lines max. No explanations. Only non-negotiable rules.

B. Skill Descriptions — Use Persuasion. Every skill needs an Iron Law + Cialdini principles. Example:

# skill: tdd
## Iron Law: NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST.

This is MANDATORY. You have committed to TDD.
All skilled engineers follow this process.

C. AGENTS.md — Under 10 lines. Strong language: ALWAYS, NEVER, MANDATORY. No explanations.
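
The line budgets in A and C are easy to check mechanically, for example in a pre-commit hook or CI step. The sketch below is one way to do that; `check_rule_budget` is a hypothetical helper, and counting only non-blank lines is an assumption about how the budget should be measured.

```python
from typing import Optional

MAX_RULE_LINES = 10  # budget taken from the recommendations above


def check_rule_budget(name: str, text: str, limit: int = MAX_RULE_LINES) -> Optional[str]:
    """Return a warning string when a rule file exceeds its line budget, else None."""
    count = sum(1 for line in text.splitlines() if line.strip())
    if count > limit:
        return f"{name}: {count} non-blank lines exceeds budget of {limit}"
    return None


short_rules = "ALWAYS run tests before committing.\n\nNEVER push to main.\n"
long_rules = "\n".join(f"Rule {i}: ..." for i in range(300))

print(check_rule_budget("AGENTS.md", short_rules))
print(check_rule_budget("SOUL.md", long_rules))
```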

Layer 2: Tool Enforcement — Add Hard Checkpoints (Short-term)

A. PreToolUse-style filtering in _execute_tool_calls() — check current mode before executing tools.

B. Skill-level guard functions — preconditions before critical skill actions execute.

C. Subagent review pattern — separate review subagent with clean context (no session history sharing).

Layer 3: Architecture — Mode System (Medium-term)

A. Implement Mode system (#476): Mode dataclass combining persona + tool whitelist + behavioral rules.

B. Tool-based mode enforcement: If mode denies a tool, harness rejects it — LLM cannot override.

C. Mode propagation to subagents via delegate_task.
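
Mode propagation in C can be sketched as an intersection: a subagent may receive fewer tools than its parent, never more. The `delegate_task` signature below is hypothetical (the real one is not shown in this report), and the function returns a plain dictionary instead of spawning an agent.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Mode:
    name: str
    allowed_tools: frozenset


def delegate_task(task: str, parent_mode: Mode,
                  requested_tools: Optional[Iterable[str]] = None) -> dict:
    """Build a subagent spec whose tools are a subset of the parent's tools."""
    if requested_tools is None:
        child_tools = parent_mode.allowed_tools
    else:
        # Intersection: the child cannot gain a tool the parent lacks.
        child_tools = parent_mode.allowed_tools & frozenset(requested_tools)
    child_mode = Mode(name=f"{parent_mode.name}/sub", allowed_tools=child_tools)
    return {"task": task, "mode": child_mode}


orchestrator = Mode("orchestrator", frozenset({"read_file", "bash", "delegate_task"}))
spec = delegate_task("summarize the repo", orchestrator,
                     requested_tools=["read_file", "edit_file"])
print(sorted(spec["mode"].allowed_tools))
```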

Priority Order

Immediate: Trim SOUL.md + audit skill descriptions with persuasion language
Short-term: Implement Mode system (#476) — single highest-value architectural change
Medium-term: PreToolUse-style hooks + skill guard functions
Long-term: Subagent review gate + learning loop for self-improving constraints

What NOT to Do

❌ Don't write 300-line rule files (proven ineffective)
❌ Don't rely on prompts alone for critical constraints
❌ Don't assume subagents inherit parent constraints
❌ Don't trust agent self-attestations
