hermes - [Feature Proposal] 4 Design Patterns from Tencent Marvis: Pre-built Agent Profiles, Cloud-Local Routing, Desktop GUI Agent, Tiered Security

Summary

Tencent launched Marvis on May 20, 2026 — an OS-level AI assistant that embeds between the user and the operating system. After studying its architecture in depth, I identified 4 design patterns that could significantly improve Hermes without requiring a rewrite. This issue maps each pattern to Hermes' existing architecture with concrete implementation suggestions.

Note for the international team: Marvis is a China-market product (QQ login required, Chinese UI). I've extracted all the relevant architecture details below so you don't need to register or download it. The value is in the design patterns, not the product itself.


What is Marvis? (Quick Context)

Marvis is not a chatbot. It's an AI middleware layer sitting between the user and the OS:

```text
Traditional AI:    User → Chat Window → Text Output
Marvis:            User → Natural Language → OS APIs → Files / Settings / Apps / Hardware
```

Key stats (for context, not to copy):

  • Built by Tencent's App Store team (PC ecosystem veterans)
  • Windows + Mac + Android (iOS coming), cross-device sync
  • Free tier: 10 million tokens/day (Tencent eats the cloud cost)
  • Has strategic partnership with Microsoft for Windows API access
  • Launched May 20, 2026 — less than a week ago, already gaining traction

The 4 Design Patterns Hermes Should Adopt

🥇 Pattern 1: Pre-Configured Specialized Sub-Agents ("1+5 Architecture")

What Marvis does

Ships with 1 PM Agent (orchestrator) + 5 specialist agents pre-configured out of the box:

```text
User Natural Language
┌─────────────────┐
│  PM Agent       │  Understands intent, decomposes tasks, parallel dispatches
│  (Orchestrator) │  Powered by Hunyuan + DeepSeek V4 (cloud)
└──┬──┬──┬──┬──┬──┘
   │  │  │  │  │
   ↓  ↓  ↓  ↓  ↓
┌──────┐ ┌──────┐ ┌──────┐ ┌───────┐ ┌──────┐
│ File │ │System│ │ App  │ │Browser│ │Search│
│Agent │ │Agent │ │Agent │ │Agent  │ │Agent │
└──────┘ └──────┘ └──────┘ └───────┘ └──────┘
```

Each specialist has a fixed, narrow scope:

| Agent | Scope | Implementation |
|---|---|---|
| File Agent | File search/read/write/convert, image content search, OCR | Local file index + semantic search |
| Computer Agent ("System Agent" in the diagram) | System settings, hardware diagnostics, cleanup | Windows API direct calls (not simulated clicks) |
| App Agent | GUI control of desktop + Android apps | Visual recognition + simulated input |
| Browser Agent | Web interaction, data scraping, form filling | Browser takeover + DOM parsing |
| Search Agent | Web search + information aggregation | Search engine API calls |

The PM Agent uses a structured task dispatch protocol — not just forwarding the user's raw text, but packaging it with context, history, dependencies, and expected output schema.
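To make "structured dispatch" concrete, here is a minimal sketch of what such an envelope and its parallel scheduling could look like in Hermes. `SubTask` and `dispatch_waves` are hypothetical names (nothing like them exists in Hermes or is documented for Marvis). Subtasks whose dependencies are all satisfied land in the same "wave" and could be dispatched in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class SubTask:
    """Hypothetical dispatch envelope: the goal plus what a specialist needs."""
    task_id: str
    profile: str                                        # e.g. "file-agent"
    goal: str
    context: dict = field(default_factory=dict)         # cwd, relevant paths, prefs
    history: list = field(default_factory=list)         # relevant prior turns
    depends_on: list = field(default_factory=list)      # task_ids that must finish first
    output_schema: dict = field(default_factory=dict)   # shape the PM agent expects back


def dispatch_waves(tasks):
    """Group subtasks into waves that can run in parallel.

    Every task in a wave has all dependencies satisfied by earlier waves.
    Unknown dependencies and cycles raise ValueError."""
    known = {t.task_id for t in tasks}
    for t in tasks:
        missing = [d for d in t.depends_on if d not in known]
        if missing:
            raise ValueError(f"{t.task_id} depends on unknown tasks {missing}")
    done, waves, remaining = set(), [], list(tasks)
    while remaining:
        ready = [t for t in remaining if all(d in done for d in t.depends_on)]
        if not ready:
            raise ValueError("dependency cycle among "
                             + ", ".join(t.task_id for t in remaining))
        waves.append([t.task_id for t in ready])
        done.update(t.task_id for t in ready)
        remaining = [t for t in remaining if t.task_id not in done]
    return waves


tasks = [
    SubTask("find", "file-agent", "Locate invoice PDFs from May",
            output_schema={"paths": "list[str]"}),
    SubTask("rate", "search-agent", "Look up today's USD/CNY exchange rate"),
    SubTask("total", "file-agent", "Sum the invoices and convert to USD",
            depends_on=["find", "rate"]),
]
print(dispatch_waves(tasks))  # [['find', 'rate'], ['total']]
```

The file search and the exchange-rate lookup are independent, so they share the first wave; the summing step waits for both.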

Hermes current state

  • ✅ Has delegate_task for spawning sub-agents
  • ✅ Has toolsets for scoping sub-agent capabilities
  • ✅ Has #9459 tracking "agent profiles for delegate_task"
  • ❌ No pre-configured specialist agents — every delegation requires manual toolset/prompt specification
  • ❌ Sub-agents are generalists by default, not specialists by design

Concrete proposal

Step 1 — Define agent profiles in config:

```yaml
# ~/.hermes/agent_profiles.yaml
profiles:
  file-agent:
    description: "File search, read, write, convert, OCR"
    toolsets: [terminal, file, vision]
    system_prompt: |
      You are a file specialist. Your job is to locate, read, and process files.
      - Search by content, not just filename
      - For images: use vision tools to describe content
      - Always return absolute file paths in results
    preferred_model: "deepseek-v4-flash"  # cheaper for simple file ops

  system-agent:
    description: "System diagnostics, settings, cleanup"
    toolsets: [terminal]
    system_prompt: |
      You are a system operations specialist.
      - Diagnose system issues (disk, memory, network, processes)
      - NEVER run destructive commands (rm -rf, format, dd) without explicit confirmation
      - Prefer read-only diagnostics first
    risk_level: medium

  browser-agent:
    description: "Web interaction, scraping, form automation"
    toolsets: [browser]
    system_prompt: |
      You are a web interaction specialist.
      - Navigate, extract, fill forms, click buttons
      - Always return the URL you're on and what you found

  search-agent:
    description: "Web search and information aggregation"
    toolsets: [web]
    system_prompt: |
      You are a search and research specialist.
      - Search broadly first, then drill down
      - Always cite sources with URLs
      - Synthesize findings into a structured summary
```

Step 2 — Extend delegate_task to accept profile names:

```python
# Current:
delegate_task(goal="find all invoices", toolsets=["terminal", "file"])

# Proposed:
delegate_task(goal="find all invoices", profile="file-agent")
# Resolves to: toolsets + system_prompt + model from agent_profiles.yaml
```
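One way the profile lookup could work, sketched under the assumption that `agent_profiles.yaml` has already been parsed into a plain dict (e.g. with PyYAML's `yaml.safe_load`) and that explicit arguments override profile defaults. `resolve_delegation` and its parameters are hypothetical, not an existing Hermes API; defaulting `risk_level` to medium mirrors the default in the Pattern 4 enforcement sketch.

```python
def resolve_delegation(profiles, profile=None, toolsets=None, system_prompt=None,
                       model=None, default_model=None):
    """Merge a named profile with explicit overrides.

    Explicit arguments win; anything left unset falls back to the profile,
    then to the session default model (None means "inherit the session model").
    Unknown profile names fail loudly instead of silently spawning a generalist."""
    base = {}
    if profile is not None:
        if profile not in profiles:
            raise KeyError(f"unknown agent profile: {profile!r}")
        base = profiles[profile]
    resolved_toolsets = toolsets if toolsets is not None else base.get("toolsets")
    if not resolved_toolsets:
        raise ValueError("delegation needs toolsets, explicitly or via a profile")
    return {
        "toolsets": list(resolved_toolsets),
        "system_prompt": (system_prompt if system_prompt is not None
                          else base.get("system_prompt", "")),
        "model": model or base.get("preferred_model") or default_model,
        "risk_level": base.get("risk_level", "medium"),
    }


PROFILES = {
    "file-agent": {
        "toolsets": ["terminal", "file", "vision"],
        "system_prompt": "You are a file specialist.",
        "preferred_model": "deepseek-v4-flash",
    },
    "search-agent": {"toolsets": ["web"],
                     "system_prompt": "You are a search and research specialist."},
}

print(resolve_delegation(PROFILES, profile="file-agent")["model"])  # deepseek-v4-flash
```

Failing on an unknown profile name is a deliberate choice here: a typo should not quietly degrade into the generalist behavior this pattern is trying to replace.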

Step 3 — PM Agent auto-routing (future, optional):

The main Agent could auto-detect task type and route to the right specialist:

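```python
# In run_agent.py, before delegation:
# classify_task returns task types ("file_ops" | "system_ops" | "web_search" | ...),
# while profiles are keyed by agent name, so map one to the other explicitly.
TASK_TO_PROFILE = {"file_ops": "file-agent", "system_ops": "system-agent", "web_search": "search-agent"}
task_type = classify_task(user_message)
profile = TASK_TO_PROFILE.get(task_type)
if profile in agent_profiles:
    delegate_task(goal=user_message, profile=profile)
```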

This is the lowest-hanging fruit — it leverages existing infrastructure and is tracked in #9459.
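`classify_task` does not exist yet. A deliberately crude keyword version would be enough to start with and could later be swapped for a small local model. The keyword lists and the `TASK_TO_PROFILE` mapping below are illustrative assumptions, not tuned values.

```python
import re

# Illustrative rules, checked in order; the first matching task type wins.
TASK_KEYWORDS = [
    ("system_ops", ["disk", "memory", "cpu", "process", "network", "cleanup"]),
    ("web_search", ["search", "look up", "weather", "news", "latest"]),
    ("file_ops", ["file", "folder", "invoice", "pdf", "document", "rename"]),
]

# Hypothetical mapping from task type to a profile name in agent_profiles.yaml.
TASK_TO_PROFILE = {
    "file_ops": "file-agent",
    "system_ops": "system-agent",
    "web_search": "search-agent",
}


def classify_task(message):
    """Return a task type, or None (handle it in the main agent)."""
    text = message.lower()
    for task_type, keywords in TASK_KEYWORDS:
        for kw in keywords:
            # Whole-word match; the optional "s" also catches plurals ("invoices").
            if re.search(r"\b" + re.escape(kw) + r"s?\b", text):
                return task_type
    return None


def pick_profile(message, agent_profiles):
    """Map a message to a configured specialist profile, if there is one."""
    profile = TASK_TO_PROFILE.get(classify_task(message))
    return profile if profile in agent_profiles else None
```

Rule order matters: "search my files for X" would go to the search agent here, which is exactly the kind of misroute that argues for replacing the keyword lists with a model later.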


🥈 Pattern 2: Cloud-Local Auto Routing

What Marvis does

Marvis doesn't make users manually choose between cloud and local models. It auto-routes based on task characteristics:

```text
User Input → PM Agent analyzes:
  ├─ "Organize my invoices"        → Cloud intent understanding + Local file search
  ├─ "Review this contract's risk" → Pure local, no cloud upload (privacy)
  └─ "What's the weather today"    → Cloud search
```

The routing logic:

| Factor | Routes to Cloud | Routes to Local |
|---|---|---|
| Task complexity | Multi-step planning, ambiguous intent | Simple, well-defined operations |
| Data sensitivity | Public information | Personal files, contracts, financial data |
| Connectivity requirement | Web search, API calls | File ops, system settings |
| Cost sensitivity | Complex reasoning needs big models | Simple tasks waste cloud tokens |

The key insight: Marvis does heavy pre-processing locally (file indexing, image OCR, text extraction) BEFORE sending anything to the cloud. This means cloud models get structured, pre-digested input instead of raw data — cutting token usage by 60-80%.

Hermes current state

  • ✅ Supports multiple model providers
  • /model command for manual switching
  • ❌ One model per session — no runtime routing
  • ❌ All tool output goes to the cloud model regardless of sensitivity
  • ❌ No local pre-processing pipeline

Concrete proposal

Step 1 — Task classifier (lightweight, rule-based first):

```yaml
# ~/.hermes/routing.yaml
routing:
  enabled: true
  default_model: "deepseek-v4-pro"       # cloud, for general use

  local_models:
    primary: "qwen2.5-7b-local"          # via ollama or similar
    fallback: "deepseek-v4-flash"        # cheap cloud if local unavailable

  rules:
    - name: "privacy-sensitive"
      patterns:
        keywords: ["contract", "financial", "passport", "password", "confidential", "NDA"]
        file_paths: ["~/Documents/finance/", "~/Desktop/tax/"]
      route_to: local
      reason: "contains sensitive data"

    - name: "simple-file-ops"
      patterns:
        intents: ["read_file", "list_directory", "find_file", "file_stats"]
      route_to: local
      reason: "simple operation, no cloud needed"

    - name: "web-search"
      patterns:
        intents: ["web_search", "browser_navigate"]
      route_to: cloud
      reason: "requires internet access"

    - name: "complex-reasoning"
      patterns:
        keywords: ["explain", "analyze", "compare", "refactor", "design"]
        min_complexity: 0.7  # heuristic score
      route_to: cloud
      reason: "needs large model reasoning"
```
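A first-match evaluator for rules shaped like the `routing.yaml` above could look like this. It is a sketch under several assumptions: the file is already parsed into a dict, `intent` comes from a classifier like the Pattern 1 one, `complexity` is a 0 to 1 heuristic supplied by the caller, and a rule fires if any of its keyword/path/intent patterns matches, with `min_complexity` acting as an extra gate. None of this is existing Hermes behavior.

```python
import os
import re


def _keyword_hit(text, keywords):
    # Whole-word, case-insensitive; the optional "s" lets "contract" match "contracts".
    return any(re.search(r"\b" + re.escape(k.lower()) + r"s?\b", text) for k in keywords)


def _path_hit(paths, prefixes):
    expanded = [os.path.expanduser(p) for p in prefixes]
    return any(os.path.expanduser(p).startswith(pre) for p in paths for pre in expanded)


def match_route(rules, message, intent=None, paths=(), complexity=0.0):
    """Return (route_to, rule_name) for the first matching rule, or (None, None)."""
    text = message.lower()
    for rule in rules:
        pat = rule.get("patterns", {})
        hit = (_keyword_hit(text, pat.get("keywords", []))
               or _path_hit(paths, pat.get("file_paths", []))
               or intent in pat.get("intents", []))
        if hit and complexity >= pat.get("min_complexity", 0.0):
            return rule["route_to"], rule["name"]
    return None, None


RULES = [
    {"name": "privacy-sensitive", "route_to": "local",
     "patterns": {"keywords": ["contract", "financial", "passport", "password",
                               "confidential", "NDA"],
                  "file_paths": ["~/Documents/finance/", "~/Desktop/tax/"]}},
    {"name": "simple-file-ops", "route_to": "local",
     "patterns": {"intents": ["read_file", "list_directory", "find_file", "file_stats"]}},
    {"name": "web-search", "route_to": "cloud",
     "patterns": {"intents": ["web_search", "browser_navigate"]}},
    {"name": "complex-reasoning", "route_to": "cloud",
     "patterns": {"keywords": ["explain", "analyze", "compare", "refactor", "design"],
                  "min_complexity": 0.7}},
]

print(match_route(RULES, "Analyze this contract", complexity=0.9))
# ('local', 'privacy-sensitive')
```

Because evaluation is first-match, keeping the privacy rule at the top is what keeps "analyze this contract" local even though it also satisfies the complex-reasoning rule.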

Step 2 — Local pre-processing pipeline:

Before sending context to the cloud model, run a lightweight local model to:

  • Extract key information from tool outputs (summarize large file contents)
  • Classify task sensitivity
  • Structure raw data (JSON-ify free text)

This mirrors what Hermes already does with context compression (#31684), but using a local model instead of just truncation.
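A sketch of where that pre-processing step could sit, assuming a hypothetical `local_summarize(text, budget)` callable backed by whatever local model is configured (Ollama or similar) and prompted to drop identifiers. Oversized outputs are condensed before they enter the cloud context; sensitive outputs are never forwarded raw. When no local model is available, sensitive content is withheld entirely and oversized content falls back to a model-free head/tail cut, which is roughly today's truncation behavior. The 4000-character budget is an arbitrary placeholder.

```python
from typing import Callable, Optional


def head_tail_summarize(text: str, budget: int) -> str:
    """Model-free fallback: keep the start and the end of the text."""
    if len(text) <= budget:
        return text
    half = (budget - 5) // 2              # 5 == len("\n...\n")
    if half <= 0:
        return text[:budget]
    return text[:half] + "\n...\n" + text[-half:]


def prepare_for_cloud(tool_output: str, sensitive: bool,
                      local_summarize: Optional[Callable[[str, int], str]] = None,
                      budget_chars: int = 4000) -> str:
    """Decide what part of a tool result may enter the cloud model's context."""
    if sensitive:
        if local_summarize is None:
            # No local model: the raw content never leaves the machine.
            return "[sensitive tool output withheld; processed locally only]"
        # Only the local model's digest is forwarded, never the raw output.
        return "[summarized locally]\n" + local_summarize(tool_output, budget_chars)[:budget_chars]
    if len(tool_output) > budget_chars:
        summarize = local_summarize or head_tail_summarize
        return "[summarized locally]\n" + summarize(tool_output, budget_chars)[:budget_chars]
    return tool_output
```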

Step 3 — Integration point in run_agent.py:

```python
# Pseudo-code for where routing would hook in:
def select_model(self, user_message: str, context: dict) -> str:
    if not self.routing_enabled:
        return self.default_model

    task_type = self.classify_task(user_message)
    sensitivity = self.assess_sensitivity(user_message, context)

    if sensitivity == "high" or task_type in ["simple_file_ops"]:
        return self.route_to_local()
    elif task_type in ["web_search", "complex_reasoning"]:
        return self.route_to_cloud()

    return self.default_model
```
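One detail worth pinning down is what `route_to_local()` does when the local model is offline. The `routing.yaml` above falls back to `deepseek-v4-flash`, which is a cloud model: fine for cost-driven routing, but it would silently defeat the privacy rule. A sketch that keeps the fallback while refusing it for sensitive requests (`resolve_model` and `LocalModelUnavailable` are hypothetical names):

```python
class LocalModelUnavailable(RuntimeError):
    """Raised when a privacy-bound request cannot be served locally."""


def resolve_model(route, routing_cfg, local_available, sensitive):
    """Turn a route decision ('local' | 'cloud' | None) into a model name."""
    local = routing_cfg["local_models"]
    if route == "local":
        if local_available:
            return local["primary"]
        if sensitive:
            # Never fall back to a cloud model for data the rules marked private.
            raise LocalModelUnavailable("local model offline; refusing cloud fallback")
        return local["fallback"]
    return routing_cfg["default_model"]


CFG = {
    "default_model": "deepseek-v4-pro",
    "local_models": {"primary": "qwen2.5-7b-local", "fallback": "deepseek-v4-flash"},
}

print(resolve_model("local", CFG, local_available=False, sensitive=False))
# deepseek-v4-flash
```

Surfacing an error to the user ("start your local model or handle this yourself") is arguably better UX than quietly uploading a contract.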

🥉 Pattern 3: Desktop/GUI Agent Capabilities

What Marvis does

Marvis can see and control desktop applications, not just the terminal. Examples:

  • "Open WeChat, find the last message from Mom, tell her I'll be late"
  • "Open my stock trading app, check AAPL price, screenshot the chart"
  • "Find the Windows setting that disables lock screen ads" (uses Windows API, not clicking around blindly)

Implementation: Windows API direct calls (via Microsoft partnership) + GUI visual recognition for apps without APIs.

Hermes current state

  • ✅ Excellent terminal/CLI tools
  • ✅ Browser tools for web interaction
  • vision_analyze for image understanding
  • ❌ Cannot interact with desktop GUI applications
  • ❌ Cannot see what's on the user's screen
  • ✅ #29379 tracking Canvas Mode (GUI direction already under discussion)

Concrete proposal (phased)

Phase 1 — Screen Capture + Click/Type (low effort, WSL-compatible):

Add a desktop toolset with minimal primitives:

```python
# New tools (Linux via xdotool/ydotool, Windows via existing Win API)
@register_tool(toolset="desktop", risk_level="medium")
def desktop_screenshot(region: str = "full") -> str:
    """Capture screen and return path for vision analysis.
    region: 'full' | 'active_window' | 'selection'"""
    # Linux: import -window root /tmp/screenshot.png
    # Windows: existing win32 API
    pass

@register_tool(toolset="desktop", risk_level="medium")
def desktop_click(x: int, y: int, button: str = "left") -> str:
    """Click at screen coordinates."""
    # Linux: xdotool mousemove X Y click 1
    # Windows: SetCursorPos + SendInput (mouse_event is the older, superseded API)
    pass

@register_tool(toolset="desktop", risk_level="medium")
def desktop_type(text: str) -> str:
    """Type text at current focus."""
    # Linux: xdotool type "..."
    pass

@register_tool(toolset="desktop", risk_level="low")
def desktop_list_windows() -> list:
    """List all open windows with titles and positions."""
    # Linux: wmctrl -l -G  (plain -l omits positions; -G adds x/y/width/height)
    # Windows: EnumWindows
    pass
```
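On Linux these primitives map onto existing X11 command-line tools. A sketch, assuming an X11 session with `xdotool`, `wmctrl`, and ImageMagick's `import` on `PATH` (Wayland would need `ydotool` or a portal instead). The helper names are hypothetical. Building argv lists separately from running them keeps the commands testable and sidesteps shell quoting of user-supplied text.

```python
import shutil
import subprocess

# xdotool numbers mouse buttons 1 (left), 2 (middle), 3 (right).
XDOTOOL_BUTTONS = {"left": "1", "middle": "2", "right": "3"}


def screenshot_cmd(path: str, window: str = "root") -> list:
    # ImageMagick `import -window root out.png` grabs the whole screen;
    # pass an X window id instead of "root" to capture a single window.
    return ["import", "-window", window, path]


def click_cmd(x: int, y: int, button: str = "left") -> list:
    if button not in XDOTOOL_BUTTONS:
        raise ValueError(f"unsupported button: {button!r}")
    return ["xdotool", "mousemove", str(int(x)), str(int(y)),
            "click", XDOTOOL_BUTTONS[button]]


def type_cmd(text: str) -> list:
    # Passed as its own argv element, so no shell quoting is involved.
    return ["xdotool", "type", text]


def list_windows_cmd() -> list:
    return ["wmctrl", "-l", "-G"]


def parse_wmctrl_geometry(output: str) -> list:
    """Parse `wmctrl -l -G` lines: id, desktop, x, y, width, height, host, title."""
    windows = []
    for line in output.splitlines():
        parts = line.split(None, 7)
        if len(parts) < 7:
            continue
        wid, desktop, x, y, w, h = parts[:6]
        windows.append({
            "id": wid, "desktop": int(desktop),
            "x": int(x), "y": int(y), "width": int(w), "height": int(h),
            "title": parts[7].strip() if len(parts) == 8 else "",
        })
    return windows


def run(argv: list) -> str:
    """Execute a command built above and return its stdout."""
    if shutil.which(argv[0]) is None:
        raise RuntimeError(f"{argv[0]} not found on PATH")
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout
```

For example, `parse_wmctrl_geometry(run(list_windows_cmd()))` would back `desktop_list_windows`, and `run(click_cmd(640, 400))` would back `desktop_click`.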

This alone would enable use cases like:

  • "Take a screenshot of this error dialog and explain what's wrong"
  • "Fill out this form in my browser" (Agent sees the screen, identifies fields, types)
  • "Close all Slack windows"

Phase 2 — OS API Integration (higher effort, platform-specific):

```python
# Dedicated platform adapters (similar to terminal backends)
@register_tool(toolset="desktop")
def system_setting_get(key: str) -> str:
    """Read a system setting by name."""
    # Windows: registry or Settings app API
    # Linux: gsettings / dconf
    # macOS: defaults read
    pass

@register_tool(toolset="desktop")
def file_search_semantic(query: str, path: str = "~") -> list:
    """Find files by content description, not just filename.
    'the PDF invoice from last month' → finds the right file"""
    # Uses local embedding model to index files
    pass
```
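The per-platform commands named in those comments can be sketched as argv builders, assuming a setting is addressed as a (namespace, name) pair: a GSettings schema and key on Linux, a `defaults` domain and key on macOS, a registry path and value name on Windows. For `file_search_semantic`, ranking reduces to cosine similarity between a query vector and pre-computed file vectors. The toy bag-of-words `embed` below is only a stand-in for a real local embedding model, and every name here is hypothetical.

```python
import math
from collections import Counter


def setting_cmd(system: str, namespace: str, name: str) -> list:
    """Build the read-only command for a setting on the given platform."""
    if system == "Linux":
        return ["gsettings", "get", namespace, name]
    if system == "Darwin":
        return ["defaults", "read", namespace, name]
    if system == "Windows":
        return ["reg", "query", namespace, "/v", name]
    raise ValueError(f"unsupported platform: {system}")


def embed(text: str) -> Counter:
    # Stand-in for a local embedding model: plain word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_files(query: str, index: dict, top_k: int = 3) -> list:
    """index maps file path -> vector built from the file's extracted text."""
    q = embed(query)
    scored = sorted(((cosine(q, vec), path) for path, vec in index.items()),
                    reverse=True)
    return [path for score, path in scored[:top_k] if score > 0]


index = {
    "/home/u/docs/invoice_2026_04.pdf": embed("invoice april 2026 total due acme"),
    "/home/u/docs/notes.txt": embed("meeting notes roadmap"),
}
print(rank_files("april invoice", index))  # ['/home/u/docs/invoice_2026_04.pdf']
```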

WSL consideration: Since many Hermes users (especially developers on Windows) use WSL, the desktop tools should detect the environment and proxy through the Windows host when running in WSL. The existing wsl-windows-interop patterns in Hermes could be extended.
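WSL is commonly detected by looking for "microsoft" in the kernel release string (WSL2 kernels report something like `5.15.x-microsoft-standard-WSL2`, WSL1 something like `4.4.0-19041-Microsoft`) or by the `WSL_DISTRO_NAME` environment variable that WSL sets. A sketch of backend selection on that basis; the backend names are hypothetical, and the "wsl-proxy" path would then shell out to Windows binaries such as `powershell.exe` through WSL interop. Plain Linux is labeled "x11" only for brevity; Wayland sessions would need their own backend.

```python
import os
import platform


def is_wsl(release=None, env=None) -> bool:
    """True when running inside WSL (1 or 2)."""
    env = os.environ if env is None else env
    if "WSL_DISTRO_NAME" in env:
        return True
    release = platform.uname().release if release is None else release
    return "microsoft" in release.lower()


def desktop_backend(system=None, release=None, env=None) -> str:
    """Pick which (hypothetical) desktop adapter the desktop toolset should use."""
    system = platform.system() if system is None else system
    if system == "Windows":
        return "win32"
    if system == "Darwin":
        return "macos"
    if system == "Linux":
        return "wsl-proxy" if is_wsl(release, env) else "x11"
    raise RuntimeError(f"unsupported platform: {system}")
```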


4️⃣ Pattern 4: Tiered Security Classification

What Marvis does

Every operation is classified into 3 risk tiers with automatic enforcement:

| Tier | Examples | Handling |
|---|---|---|
| 🟢 Low | Read files, search, display info | AI auto-executes |
| 🟡 Medium | Delete files, modify settings, install software | AI proposes plan → User must confirm → Execute |
| 🔴 High | Payments, transfers, auth changes, sudo rm -rf / | AI forbidden from executing. Must be done manually. |

This is NOT just a prompt-level suggestion — it's enforced at the execution layer. Medium-risk operations trigger a "hard check" (L2 hard inquiry) that cannot be bypassed by the AI.

Hermes current state

  • ✅ Terminal confirmation dialogs for dangerous commands (implicit)
  • requires_approval parameter in some tool calls
  • ❌ No systematic risk classification across all tools
  • ❌ Risk level is not visible to the Agent's reasoning
  • ❌ Some dangerous operations can slip through by phrasing

Concrete proposal

Step 1 — Add risk_level to tool registration:

```python
# In tools/registry.py or tool definitions
@register_tool(risk_level="low")
def read_file(path: str, ...): ...

@register_tool(risk_level="medium")
def delete_file(path: str): ...

@register_tool(risk_level="high", human_only=True)
def execute_payment(amount: float, ...): ...
```
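A sketch of how `register_tool` could record this metadata into the `TOOL_RISK_LEVELS` and `TOOL_HUMAN_ONLY` tables that the execution-layer check reads. It is a standalone decorator written for illustration; Hermes' actual registry in `tools/registry.py` may be structured differently. Validating at registration time means a typo like `risk_level="hgih"` fails at import instead of silently defaulting.

```python
RISK_TIERS = ("low", "medium", "high")
TOOLS: dict = {}
TOOL_RISK_LEVELS: dict = {}
TOOL_HUMAN_ONLY: dict = {}
TOOL_TOOLSETS: dict = {}


def register_tool(risk_level: str = "medium", human_only: bool = False,
                  toolset: str = "default"):
    """Decorator that records a tool and its risk metadata at import time."""
    if risk_level not in RISK_TIERS:
        raise ValueError(f"risk_level must be one of {RISK_TIERS}, got {risk_level!r}")
    if human_only and risk_level != "high":
        raise ValueError("human_only tools must be registered as high risk")

    def decorator(fn):
        name = fn.__name__
        if name in TOOLS:
            raise ValueError(f"tool {name!r} registered twice")
        TOOLS[name] = fn
        TOOL_RISK_LEVELS[name] = risk_level
        TOOL_HUMAN_ONLY[name] = human_only
        TOOL_TOOLSETS[name] = toolset
        return fn
    return decorator


@register_tool(risk_level="low", toolset="file")
def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()


@register_tool(risk_level="high", human_only=True)
def execute_payment(amount: float) -> str:
    raise NotImplementedError("never executed by the agent")
```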

Step 2 — Inject risk rules into system prompt:

```python
# Auto-generated from tool registry, injected into every Agent session
RISK_RULES = """
## Operation Safety Tiers

When using tools, always respect these risk levels:

🟢 LOW risk (auto-execute): read_file, search, grep, list, stat, cat
   → Execute immediately, no confirmation needed

🟡 MEDIUM risk (confirm first): delete, move, write, install, systemctl, chmod
   → Before executing: explain what you're about to do and wait for confirmation
   → NEVER batch medium-risk operations without individual confirmation

🔴 HIGH risk (FORBIDDEN): payments, sudo destructive, auth changes, rm -rf /
   → DO NOT execute these under any circumstances
   → Tell the user to perform these operations themselves
"""
```
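Since the comment says this block is generated from the tool registry, here is a sketch of that generation step: group registered tool names by tier and render one heading line per tier. It takes the risk table as a plain dict shaped like `TOOL_RISK_LEVELS`; the wording is illustrative. Generating it keeps the prompt in sync with the tools that actually exist instead of a hand-maintained list.

```python
TIER_TEXT = {
    "low": ("🟢 LOW risk (auto-execute)",
            "Execute immediately, no confirmation needed"),
    "medium": ("🟡 MEDIUM risk (confirm first)",
               "Explain the plan and wait for confirmation, one operation at a time"),
    "high": ("🔴 HIGH risk (FORBIDDEN)",
             "Do not execute; tell the user to perform it themselves"),
}


def render_risk_rules(risk_levels: dict) -> str:
    """Render the safety-tier prompt section from a {tool_name: tier} table."""
    lines = ["## Operation Safety Tiers", ""]
    for tier in ("low", "medium", "high"):
        names = sorted(n for n, lvl in risk_levels.items() if lvl == tier)
        if not names:
            continue  # don't mention tiers that have no registered tools
        title, rule = TIER_TEXT[tier]
        lines.append(f"{title}: {', '.join(names)}")
        lines.append(f"   → {rule}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


print(render_risk_rules({"read_file": "low", "delete_file": "medium",
                         "execute_payment": "high"}))
```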

Step 3 — Enforce at execution layer (not just prompt):

```python
# In model_tools.py handle_function_call()
def handle_function_call(tool_name: str, args: dict, task_id: str):
    risk = TOOL_RISK_LEVELS.get(tool_name, "medium")  # unknown tools default to medium

    # High-risk tools are never executed by the agent, whether or not they are
    # also flagged human_only. Requiring both flags would let an unflagged
    # high-risk tool skip the refusal and the medium-risk confirmation entirely.
    if risk == "high" or TOOL_HUMAN_ONLY.get(tool_name, False):
        return {"error": "This operation requires human execution. Refusing."}

    if risk == "medium" and not confirmation_granted(tool_name, args, task_id):
        return {"requires_confirmation": True, "plan": f"About to run {tool_name} with {args}"}

    return execute_tool(tool_name, args)
```
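`confirmation_granted` is left undefined above. A minimal sketch, assuming approvals are recorded by the UI layer (CLI prompt or gateway button) and are single-use and bound to the exact arguments, so one "yes" cannot be replayed for a different path or stretched across a batch. This is what would enforce "NEVER batch medium-risk operations" at the execution layer rather than in the prompt. `ApprovalStore` and `APPROVALS` are hypothetical names.

```python
import hashlib
import json


class ApprovalStore:
    """One-shot approvals keyed by (task_id, tool_name, exact args)."""

    def __init__(self):
        self._pending = set()

    @staticmethod
    def _key(tool_name: str, args: dict, task_id: str) -> str:
        payload = json.dumps([task_id, tool_name, args], sort_keys=True, default=str)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def approve(self, tool_name: str, args: dict, task_id: str) -> None:
        """Called by the UI when the user confirms the proposed plan."""
        self._pending.add(self._key(tool_name, args, task_id))

    def consume(self, tool_name: str, args: dict, task_id: str) -> bool:
        """True exactly once per approval; the approval is then used up."""
        key = self._key(tool_name, args, task_id)
        if key in self._pending:
            self._pending.remove(key)
            return True
        return False


APPROVALS = ApprovalStore()


def confirmation_granted(tool_name: str, args: dict, task_id: str) -> bool:
    return APPROVALS.consume(tool_name, args, task_id)
```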

What NOT to Copy from Marvis

These Marvis traits are anti-patterns that Hermes should deliberately avoid:

| Marvis weakness | Why Hermes should NOT adopt it |
|---|---|
| Closed ecosystem: cross-device relies on proprietary Tencent app engine | Hermes' MCP + A2A + multi-platform gateway is the right open approach |
| Non-extensible: users cannot create custom Skills/workflows | Hermes' Skill system is a core differentiator; keep it flexible |
| Model lock-in: local = Qwen only, cloud = Hunyuan/DS only | Hermes' provider-agnostic design is a major strength |
| No long-running autonomy: no cron, no idle loops, no overnight tasks | Hermes' cron system + background processes are already superior |
| Search quality issues: early users report poor retrieval accuracy | Hermes' multi-search backends (anysearch, browser, etc.) provide better flexibility |

Related Existing Issues

  • #29379 — Native Canvas Mode (GUI/visual direction already being discussed)
  • #9459 — Agent profiles for delegate_task (pre-configured roles)
  • #514 — A2A Protocol Support (Agent-to-Agent standard, complementary)
  • #11922 — Multi-agent communication & per-channel persona
  • #31684 — compress_context (local pre-processing overlaps with Pattern 2)
  • #25545 — Skill Orchestration / Workflow Composition (pre-built workflows)

Why Now

Marvis launched 5 days ago. It's the first consumer product that proves:

  1. Multi-Agent architecture is not academic — 6 agents working in parallel, on consumer hardware, for non-technical users
  2. OS-level AI middleware is viable — users are willing to let AI control their desktop if safety guarantees are clear
  3. Cloud-local routing saves real money — Tencent's free 10M token/day is only sustainable because >60% of processing stays local
  4. GUI Agent is the natural extension of Code Agent — the jump from "AI that writes code" to "AI that operates your computer" is smaller than it seems

Hermes already has better infrastructure than Marvis in many ways (MCP, multi-model, cron, Skills, platform gateway). What's missing is:

  1. Default configurations that make the infrastructure accessible (Pattern 1)
  2. Intelligence layer that routes between capabilities (Pattern 2)
  3. Desktop reach beyond the terminal (Pattern 3)
  4. Safety guardrails that make desktop control trustworthy (Pattern 4)

None of these require a rewrite — they're incremental additions to existing architecture.


Implementation Priority (my recommendation)

| Priority | Pattern | Effort | Impact | Dependency |
|---|---|---|---|---|
| 🥇 P0 | Pattern 4: Security Tiers | ~200 LOC | Safety baseline for all other patterns | None |
| 🥈 P1 | Pattern 1: Pre-built Agent Profiles | ~500 LOC | Unlocks delegation UX | #9459 |
| 🥉 P2 | Pattern 2: Cloud-Local Routing | ~800 LOC | Cost optimization | Pattern 1 for routing targets |
| P3 | Pattern 3: Desktop GUI Agent | ~2000+ LOC | Major differentiation | Pattern 4 for safety |

Security first (Pattern 4), then pre-built agents (Pattern 1), then smart routing (Pattern 2), then desktop reach (Pattern 3). Each builds on the previous.


Researched and drafted by a Hermes user who also runs OpenClaw. Happy to help with testing or refining any of these ideas.
