hermes - [Feature Proposal] 4 Design Patterns from Tencent Marvis: Pre-built Agent Profiles, Cloud-Local Routing, Desktop GUI Agent, Tiered Security

hermes · 2026-05-25 02:15:31


On May 20, 2026, Tencent launched Marvis, an OS-level AI assistant that sits between the user and the operating system. After studying its architecture in depth, I identified 4 design patterns that could significantly improve Hermes without requiring a rewrite. This issue maps each pattern to Hermes' existing architecture with concrete implementation suggestions.

Note for international team: Marvis is a China-market product (QQ login required, Chinese UI). I've extracted all the relevant architecture details below so you don't need to register or download it. The value is in the design patterns, not the product itself.



What is Marvis? (Quick Context)

Marvis is not a chatbot. It's an AI middleware layer sitting between the user and the OS:

Traditional AI:    User → Chat Window → Text Output
Marvis:            User → Natural Language → OS APIs → Files / Settings / Apps / Hardware

Key stats (for context, not to copy):

Built by Tencent's App Store team (PC ecosystem veterans)
Windows + Mac + Android (iOS coming), cross-device sync
Free tier: 10 million tokens/day (Tencent eats the cloud cost)
Has strategic partnership with Microsoft for Windows API access
Launched May 20, 2026 — less than a week ago, already gaining traction

The 4 Design Patterns Hermes Should Adopt

🥇 Pattern 1: Pre-Configured Specialized Sub-Agents ("1+5 Architecture")

What Marvis does

Ships with 1 PM Agent (orchestrator) + 5 specialist agents pre-configured out of the box:

User Natural Language
       ↓
┌─────────────────┐
│  PM Agent        │  Understands intent, decomposes tasks, parallel dispatches
│  (Orchestrator)  │  Powered by Hunyuan + DeepSeek V4 (cloud)
└──┬──┬──┬──┬──┬──┘
   │  │  │  │  │
   ↓  ↓  ↓  ↓  ↓
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐
│ File │ │System│ │ App  │ │Browser│ │Search│
│Agent │ │Agent │ │Agent │ │Agent  │ │Agent │
└──────┘ └──────┘ └──────┘ └──────┘ └──────┘

Each specialist has a fixed, narrow scope:

| Agent | Scope | Implementation |
|---|---|---|
| File Agent | File search/read/write/convert, image content search, OCR | Local file index + semantic search |
| Computer Agent | System settings, hardware diagnostics, cleanup | Windows API direct calls (not simulated clicks) |
| App Agent | GUI control of desktop + Android apps | Visual recognition + simulated input |
| Browser Agent | Web interaction, data scraping, form filling | Browser takeover + DOM parsing |
| Search Agent | Web search + information aggregation | Search engine API calls |

The PM Agent uses a structured task dispatch protocol — not just forwarding the user's raw text, but packaging it with context, history, dependencies, and expected output schema.
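
A dispatch envelope of that kind could be sketched as follows. The class name, field names, and the `ready_tasks` helper are illustrative assumptions; Marvis's actual wire format is not public in this issue.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TaskDispatch:
    """Hypothetical envelope a PM agent could send to a specialist."""
    task_id: str
    goal: str                                          # one decomposed sub-task, not the raw user text
    context: dict = field(default_factory=dict)        # facts the specialist needs
    history: list = field(default_factory=list)        # relevant prior turns
    depends_on: list = field(default_factory=list)     # task_ids that must finish first
    output_schema: dict = field(default_factory=dict)  # expected shape of the result

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False)


def ready_tasks(tasks, done):
    """Return the tasks that are not finished and whose dependencies are all in `done`."""
    return [t for t in tasks if t.task_id not in done and set(t.depends_on) <= done]


# Example: the summary step waits for the file search to finish.
find = TaskDispatch("t1", "locate invoice PDFs", output_schema={"paths": "list[str]"})
total = TaskDispatch("t2", "total the invoices", depends_on=["t1"])
```

Tracking `depends_on` explicitly is what lets an orchestrator dispatch independent sub-tasks in parallel while holding back the ones that need earlier results.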

Hermes current state

✅ Has delegate_task for spawning sub-agents
✅ Has toolsets for scoping sub-agent capabilities
✅ Has #9459 tracking "agent profiles for delegate_task"
❌ No pre-configured specialist agents — every delegation requires manual toolset/prompt specification
❌ Sub-agents are generalists by default, not specialists by design

Concrete proposal

Step 1 — Define agent profiles in config:

# ~/.hermes/agent_profiles.yaml
profiles:
  file-agent:
    description: "File search, read, write, convert, OCR"
    toolsets: [terminal, file, vision]
    system_prompt: |
      You are a file specialist. Your job is to locate, read, and process files.
      - Search by content, not just filename
      - For images: use vision tools to describe content
      - Always return absolute file paths in results
    preferred_model: "deepseek-v4-flash"  # cheaper for simple file ops
    
  system-agent:
    description: "System diagnostics, settings, cleanup"
    toolsets: [terminal]
    system_prompt: |
      You are a system operations specialist.
      - Diagnose system issues (disk, memory, network, processes)
      - NEVER run destructive commands (rm -rf, format, dd) without explicit confirmation
      - Prefer read-only diagnostics first
    risk_level: medium
    
  browser-agent:
    description: "Web interaction, scraping, form automation"
    toolsets: [browser]
    system_prompt: |
      You are a web interaction specialist.
      - Navigate, extract, fill forms, click buttons
      - Always return the URL you're on and what you found
    
  search-agent:
    description: "Web search and information aggregation"
    toolsets: [web]
    system_prompt: |
      You are a search and research specialist.
      - Search broadly first, then drill down
      - Always cite sources with URLs
      - Synthesize findings into a structured summary

Step 2 — Extend delegate_task to accept profile names:

# Current:
delegate_task(goal="find all invoices", toolsets=["terminal", "file"])

# Proposed:
delegate_task(goal="find all invoices", profile="file-agent")
# Resolves to: toolsets + system_prompt + model from agent_profiles.yaml
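
One way the resolution step might work is sketched below. `resolve_profile` and the `AGENT_PROFILES` dict are hypothetical names; the dict stands in for the parsed YAML, and the assumption that explicit arguments override profile values is a design choice, not existing Hermes behavior.

```python
# Stand-in for the parsed contents of agent_profiles.yaml (assumed structure).
AGENT_PROFILES = {
    "file-agent": {
        "toolsets": ["terminal", "file", "vision"],
        "system_prompt": "You are a file specialist.",
        "preferred_model": "deepseek-v4-flash",
    },
    "search-agent": {
        "toolsets": ["web"],
        "system_prompt": "You are a search and research specialist.",
    },
}


def resolve_profile(profile=None, *, toolsets=None, system_prompt=None, model=None,
                    profiles=AGENT_PROFILES, default_model="deepseek-v4-pro"):
    """Merge a named profile with explicit overrides; explicit arguments win."""
    if profile is not None and profile not in profiles:
        raise KeyError(f"unknown agent profile: {profile!r}")
    base = profiles[profile] if profile is not None else {}
    return {
        "toolsets": list(toolsets if toolsets is not None else base.get("toolsets", [])),
        "system_prompt": system_prompt if system_prompt is not None
        else base.get("system_prompt", ""),
        "model": model or base.get("preferred_model", default_model),
    }
```

Keeping the old `toolsets=` argument working alongside `profile=` means existing `delegate_task` call sites would not need to change.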

Step 3 — PM Agent auto-routing (future, optional):

The main Agent could auto-detect task type and route to the right specialist:

# In run_agent.py, before delegation:
# classify_task returns a profile name ("file-agent" | "system-agent" | ...) or None,
# so the result can be looked up in agent_profiles directly
profile = classify_task(user_message)
if profile in agent_profiles:
    delegate_task(goal=user_message, profile=profile)

This is the lowest-hanging fruit — it leverages existing infrastructure and is tracked in #9459.
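
A rule-based first version of `classify_task` could be as small as the sketch below. The keyword table is invented for illustration, and this version returns profile names directly so the result can be passed to `delegate_task(profile=...)` without a second mapping.

```python
import re

# Hypothetical keyword table: profile name -> trigger words.
# Dict order decides which profile wins a tie.
PROFILE_KEYWORDS = {
    "file-agent": ["file", "folder", "pdf", "invoice", "rename"],
    "system-agent": ["disk", "memory", "cpu", "process", "cleanup"],
    "browser-agent": ["website", "form", "login", "click"],
    "search-agent": ["search", "news", "weather", "look up"],
}


def classify_task(message, keywords=PROFILE_KEYWORDS):
    """Return the profile with the most keyword hits, or None if nothing matches."""
    text = message.lower()
    best, best_hits = None, 0
    for profile, words in keywords.items():
        # A leading word boundary lets "invoice" match "invoices" but not "reinvoice".
        hits = sum(1 for w in words if re.search(rf"\b{re.escape(w)}", text))
        if hits > best_hits:
            best, best_hits = profile, hits
    return best
```

Returning `None` for unmatched messages keeps the main agent as the default handler, so a weak classifier degrades to today's behavior instead of misrouting.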

🥈 Pattern 2: Cloud-Local Auto Routing

What Marvis does

Marvis doesn't make users manually choose between cloud and local models. It auto-routes based on task characteristics:

User Input → PM Agent analyzes:
  ├─ "Organize my invoices"        → Cloud intent understanding + Local file search
  ├─ "Review this contract's risk" → Pure local, no cloud upload (privacy)
  └─ "What's the weather today"    → Cloud search

The routing logic:

| Factor | Routes to Cloud | Routes to Local |
|---|---|---|
| Task complexity | Multi-step planning, ambiguous intent | Simple, well-defined operations |
| Data sensitivity | Public information | Personal files, contracts, financial data |
| Connectivity requirement | Web search, API calls | File ops, system settings |
| Cost sensitivity | Complex reasoning needs big models | Simple tasks waste cloud tokens |

The key insight: Marvis does heavy pre-processing locally (file indexing, image OCR, text extraction) BEFORE sending anything to the cloud. This means cloud models get structured, pre-digested input instead of raw data — cutting token usage by 60-80%.

Hermes current state

✅ Supports multiple model providers
✅ /model command for manual switching
❌ One model per session — no runtime routing
❌ All tool output goes to the cloud model regardless of sensitivity
❌ No local pre-processing pipeline

Concrete proposal

Step 1 — Task classifier (lightweight, rule-based first):

# ~/.hermes/routing.yaml
routing:
  enabled: true
  default_model: "deepseek-v4-pro"       # cloud, for general use
  
  local_models:
    primary: "qwen2.5-7b-local"          # via ollama or similar
    fallback: "deepseek-v4-flash"        # cheap cloud if local unavailable
  
  rules:
    - name: "privacy-sensitive"
      patterns:
        keywords: ["contract", "financial", "passport", "password", "confidential", "NDA"]
        file_paths: ["~/Documents/finance/", "~/Desktop/tax/"]
      route_to: local
      reason: "contains sensitive data"
      
    - name: "simple-file-ops"
      patterns:
        intents: ["read_file", "list_directory", "find_file", "file_stats"]
      route_to: local
      reason: "simple operation, no cloud needed"
      
    - name: "web-search"
      patterns:
        intents: ["web_search", "browser_navigate"]
      route_to: cloud
      reason: "requires internet access"
      
    - name: "complex-reasoning"
      patterns:
        keywords: ["explain", "analyze", "compare", "refactor", "design"]
        min_complexity: 0.7  # heuristic score
      route_to: cloud
      reason: "needs large model reasoning"
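
Evaluating such rules could look roughly like this. `match_rule` and the flattened rule dicts are assumptions about how the parsed config might be shaped, and first-match-wins ordering is a design choice made for this sketch.

```python
import os
import re

# Two rules from the config, as they might look once parsed (assumed structure).
RULES = [
    {"name": "privacy-sensitive", "route_to": "local",
     "keywords": ["contract", "financial", "passport", "password", "confidential", "nda"],
     "file_paths": ["~/Documents/finance/", "~/Desktop/tax/"]},
    {"name": "web-search", "route_to": "cloud",
     "intents": ["web_search", "browser_navigate"]},
]


def match_rule(message, intent=None, paths=(), rules=RULES):
    """Return the first matching rule, or None. Rule order is priority order."""
    text = message.lower()
    touched = [os.path.expanduser(p) for p in paths]
    for rule in rules:
        # The word boundary keeps "nda" from matching inside "calendar".
        if any(re.search(rf"\b{re.escape(k)}", text) for k in rule.get("keywords", [])):
            return rule
        if intent is not None and intent in rule.get("intents", []):
            return rule
        prefixes = [os.path.expanduser(p) for p in rule.get("file_paths", [])]
        if any(p.startswith(pre) for p in touched for pre in prefixes):
            return rule
    return None
```

Listing the privacy rule first matters: a request such as "search the web for NDA templates" matches both rules, and first-match-wins keeps it on the local route.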

Step 2 — Local pre-processing pipeline:

Before sending context to the cloud model, run a lightweight local model to:

Extract key information from tool outputs (summarize large file contents)
Classify task sensitivity
Structure raw data (JSON-ify free text)

This mirrors what Hermes already does with context compression (#31684), but using a local model instead of just truncation.
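
As a rough illustration of the pre-processing step, the sketch below wraps raw tool output in a small JSON envelope before it would be sent to a cloud model. `preprocess_tool_output` and its parameters are hypothetical; `summarize` is a placeholder for a call into a local model, not a real Hermes hook.

```python
import json


def preprocess_tool_output(tool_name, output, summarize=None, max_chars=2000):
    """Condense raw tool output into a small JSON envelope for the cloud model.

    When the output is too long and no local summarizer is supplied, the
    function falls back to keeping only the head and tail of the text.
    """
    envelope = {
        "tool": tool_name,
        "chars": len(output),
        "lines": output.count("\n") + 1,
    }
    if len(output) <= max_chars:
        envelope["content"] = output          # short enough to send as-is
    elif summarize is not None:
        envelope["summary"] = summarize(output)
    else:
        half = max_chars // 2
        envelope["head"] = output[:half]
        envelope["tail"] = output[-half:]
    return json.dumps(envelope, ensure_ascii=False)
```

The size metadata is kept even when the content is summarized, so the cloud model can still tell a 10-line file from a 10,000-line one.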

Step 3 — Integration point in run_agent.py:

# Pseudo-code for where routing would hook in:
def select_model(self, user_message: str, context: dict) -> str:
    if not self.routing_enabled:
        return self.default_model
    
    # classify_task returns a rule name from routing.yaml, e.g. "simple-file-ops"
    task_type = self.classify_task(user_message)
    sensitivity = self.assess_sensitivity(user_message, context)
    
    # Sensitivity is checked first so private data never takes a cloud route
    if sensitivity == "high" or task_type == "simple-file-ops":
        return self.route_to_local()
    if task_type in ("web-search", "complex-reasoning"):
        return self.route_to_cloud()
    
    return self.default_model
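
One detail worth pinning down is the fallback: the routing config lists a cloud model as the fallback for local models, which would send privacy-sensitive work to the cloud whenever the local model is down. The sketch below shows one possible policy that fails closed instead. `pick_model` and `LocalModelUnavailable` are hypothetical names, and the model names are the ones used in the config example.

```python
class LocalModelUnavailable(RuntimeError):
    """Raised when sensitive work cannot be kept on the local model."""


def pick_model(route, sensitivity, local_available, *,
               local="qwen2.5-7b-local",
               cloud="deepseek-v4-pro",
               cloud_fallback="deepseek-v4-flash"):
    """Choose a model name without ever sending high-sensitivity work to the cloud."""
    if route == "cloud" and sensitivity != "high":
        return cloud
    # From here on the task wants, or is forced onto, the local model.
    if local_available:
        return local
    if sensitivity == "high":
        # Fail closed: refusing is safer than silently uploading private data.
        raise LocalModelUnavailable("sensitive task and no local model is reachable")
    return cloud_fallback
```

With this policy the cheap cloud fallback only applies to tasks that were routed local for cost reasons, never for privacy reasons.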

🥉 Pattern 3: Desktop/GUI Agent Capabilities

What Marvis does

Marvis can see and control desktop applications — not just terminal. Examples:

"Open WeChat, find the last message from Mom, tell her I'll be late"
"Open my stock trading app, check AAPL price, screenshot the chart"
"Find the Windows setting that disables lock screen ads" (uses Windows API, not clicking around blindly)

Implementation: Windows API direct calls (via Microsoft partnership) + GUI visual recognition for apps without APIs.

Hermes current state

✅ Excellent terminal/CLI tools
✅ Browser tools for web interaction
✅ vision_analyze for image understanding
✅ #29379 tracking Canvas Mode (GUI direction already under discussion)
❌ Cannot interact with desktop GUI applications
❌ Cannot see what's on the user's screen

Concrete proposal (phased)

Phase 1 — Screen Capture + Click/Type (low effort, WSL-compatible):

Add a desktop toolset with minimal primitives:

# New tools (Linux via xdotool/ydotool, Windows via existing Win API)
@register_tool(toolset="desktop", risk_level="medium")
def desktop_screenshot(region: str = "full") -> str:
    """Capture screen and return path for vision analysis. 
    region: 'full' | 'active_window' | 'selection'"""
    # Linux: import -window root /tmp/screenshot.png
    # Windows: existing win32 API
    pass

@register_tool(toolset="desktop", risk_level="medium")
def desktop_click(x: int, y: int, button: str = "left") -> str:
    """Click at screen coordinates."""
    # Linux: xdotool mousemove X Y click 1
    # Windows: SetCursorPos + mouse_event
    pass

@register_tool(toolset="desktop", risk_level="medium")
def desktop_type(text: str) -> str:
    """Type text at current focus."""
    # Linux: xdotool type "..."
    pass

@register_tool(toolset="desktop", risk_level="low")
def desktop_list_windows() -> list:
    """List all open windows with titles and positions."""
    # Linux: wmctrl -l
    # Windows: EnumWindows
    pass

This alone would enable use cases like:

"Take a screenshot of this error dialog and explain what's wrong"
"Fill out this form in my browser" (Agent sees the screen, identifies fields, types)
"Close all Slack windows"
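
For the Linux side, the click and window-listing primitives might be filled in along these lines. This is a sketch assuming an X11 session with `xdotool` and `wmctrl` installed (xdotool does not drive Wayland sessions); the helper names `click_argv` and `parse_wmctrl` are made up for the example.

```python
import subprocess

# xdotool numbers mouse buttons 1 (left), 2 (middle), 3 (right).
XDOTOOL_BUTTONS = {"left": 1, "middle": 2, "right": 3}


def click_argv(x, y, button="left"):
    """Build the xdotool command line for a click without executing it."""
    if x < 0 or y < 0:
        raise ValueError("screen coordinates must be non-negative")
    return ["xdotool", "mousemove", str(x), str(y),
            "click", str(XDOTOOL_BUTTONS[button])]


def parse_wmctrl(listing):
    """Parse `wmctrl -l` output: window id, desktop number, host, title."""
    windows = []
    for line in listing.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        windows.append({
            "id": parts[0],
            "desktop": int(parts[1]),  # -1 marks a window shown on all desktops
            "host": parts[2],
            "title": parts[3] if len(parts) > 3 else "",
        })
    return windows


def desktop_click(x, y, button="left"):
    """Run the click. Needs xdotool and a running X11 session."""
    subprocess.run(click_argv(x, y, button), check=True)
    return f"clicked {button} at ({x}, {y})"
```

Passing an argument list to `subprocess.run` instead of a shell string avoids quoting problems when a model supplies the values.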

Phase 2 — OS API Integration (higher effort, platform-specific):

# Dedicated platform adapters (similar to terminal backends)
@register_tool(toolset="desktop")
def system_setting_get(key: str) -> str:
    """Read a system setting by name."""
    # Windows: registry or Settings app API
    # Linux: gsettings / dconf
    # macOS: defaults read
    pass

@register_tool(toolset="desktop")
def file_search_semantic(query: str, path: str = "~") -> list:
    """Find files by content description, not just filename.
    'the PDF invoice from last month' → finds the right file"""
    # Uses local embedding model to index files
    pass
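
A platform adapter for reading settings might start as a thin dispatch over each OS's own command-line tool, roughly as below. `setting_read_argv` is a hypothetical helper, and splitting the setting into a `namespace` and a `key` is an assumption of this sketch (the stub above takes a single `key`).

```python
import sys


def setting_read_argv(namespace, key, platform_name=None):
    """Build the command that reads one setting on the current platform.

    namespace: a gsettings schema (Linux), a defaults domain (macOS),
               or a registry key path (Windows).
    """
    platform_name = sys.platform if platform_name is None else platform_name
    if platform_name.startswith("linux"):
        return ["gsettings", "get", namespace, key]
    if platform_name == "darwin":
        return ["defaults", "read", namespace, key]
    if platform_name == "win32":
        return ["reg", "query", namespace, "/v", key]
    raise NotImplementedError(f"no settings backend for {platform_name!r}")
```

For example, `setting_read_argv("org.gnome.desktop.interface", "gtk-theme", "linux")` yields the `gsettings get` invocation for the GTK theme. Only the read path is sketched here; writes would belong in a higher risk tier.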

WSL consideration: Since many Hermes users (especially developers on Windows) use WSL, the desktop tools should detect the environment and proxy through the Windows host when running in WSL. The existing wsl-windows-interop patterns in Hermes could be extended.
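
Environment detection could be sketched like this. Checking `/proc/version` for "microsoft" and checking the `WSL_DISTRO_NAME` variable are common heuristics rather than an official API, and the backend names returned by `desktop_backend` are invented for the example.

```python
import os
import sys


def is_wsl(proc_version=None, environ=None):
    """Heuristic WSL check: WSL_DISTRO_NAME or a Microsoft kernel string."""
    environ = os.environ if environ is None else environ
    if environ.get("WSL_DISTRO_NAME"):
        return True
    if proc_version is None:
        try:
            with open("/proc/version", encoding="utf-8") as fh:
                proc_version = fh.read()
        except OSError:
            return False  # no /proc: not Linux, so not WSL
    return "microsoft" in proc_version.lower()


def desktop_backend(platform_name=None, proc_version=None, environ=None):
    """Pick which (hypothetical) desktop adapter to load."""
    platform_name = sys.platform if platform_name is None else platform_name
    if platform_name == "win32":
        return "windows-native"
    if platform_name == "darwin":
        return "macos"
    if is_wsl(proc_version, environ):
        return "windows-via-wsl"  # proxy calls through the Windows host
    return "linux"
```

The optional parameters exist so the detection can be unit-tested without running inside WSL.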

4️⃣ Pattern 4: Tiered Security Classification

What Marvis does

Every operation is classified into 3 risk tiers with automatic enforcement:

| Tier | Examples | Handling |
|---|---|---|
| 🟢 Low | Read files, search, display info | AI auto-executes |
| 🟡 Medium | Delete files, modify settings, install software | AI proposes plan → User must confirm → Execute |
| 🔴 High | Payments, transfers, auth changes, `sudo rm -rf /` | AI forbidden from executing. Must be done manually. |

This is NOT just a prompt-level suggestion — it's enforced at the execution layer. Medium-risk operations trigger a "hard check" (L2 hard inquiry) that cannot be bypassed by the AI.

Hermes current state

✅ Terminal confirmation dialogs for dangerous commands (implicit)
✅ requires_approval parameter in some tool calls
❌ No systematic risk classification across all tools
❌ Risk level is not visible to the Agent's reasoning
❌ Some dangerous operations can slip through by phrasing

Concrete proposal

Step 1 — Add risk_level to tool registration:

# In tools/registry.py or tool definitions
# (**kwargs stands in for each tool's remaining parameters)
@register_tool(risk_level="low")
def read_file(path: str, **kwargs): ...

@register_tool(risk_level="medium")
def delete_file(path: str): ...

@register_tool(risk_level="high", human_only=True)
def execute_payment(amount: float, **kwargs): ...
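
A decorator with this shape could be implemented roughly as follows. This is a sketch, not the current Hermes registry: `TOOL_RISK_LEVELS` and `TOOL_HUMAN_ONLY` are the lookup tables the enforcement step assumes, and the two registered tools are toy examples.

```python
TOOL_RISK_LEVELS = {}   # tool name -> "low" | "medium" | "high"
TOOL_HUMAN_ONLY = {}    # tool name -> bool
VALID_RISK_LEVELS = ("low", "medium", "high")


def register_tool(risk_level="medium", human_only=False, toolset=None):
    """Record a tool's risk metadata at import time and return it unchanged."""
    if risk_level not in VALID_RISK_LEVELS:
        raise ValueError(f"risk_level must be one of {VALID_RISK_LEVELS}")

    def decorator(func):
        TOOL_RISK_LEVELS[func.__name__] = risk_level
        TOOL_HUMAN_ONLY[func.__name__] = human_only
        func.toolset = toolset
        return func

    return decorator


@register_tool(risk_level="low", toolset="file")
def read_file(path):
    with open(path, encoding="utf-8") as fh:
        return fh.read()


@register_tool(risk_level="high", human_only=True)
def execute_payment(amount):
    raise PermissionError("payments must be made by the user")
```

Defaulting to `"medium"` means a tool registered without an explicit level still triggers confirmation rather than running silently.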

Step 2 — Inject risk rules into system prompt:

# Auto-generated from tool registry, injected into every Agent session
RISK_RULES = """
## Operation Safety Tiers

When using tools, always respect these risk levels:

🟢 LOW risk (auto-execute): read_file, search, grep, list, stat, cat
   → Execute immediately, no confirmation needed

🟡 MEDIUM risk (confirm first): delete, move, write, install, systemctl, chmod
   → Before executing: explain what you're about to do and wait for confirmation
   → NEVER batch medium-risk operations without individual confirmation

🔴 HIGH risk (FORBIDDEN): payments, sudo destructive, auth changes, rm -rf /
   → DO NOT execute these under any circumstances
   → Tell the user to perform these operations themselves
"""
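
Generating that text from the registry, instead of maintaining it by hand, might look like the sketch below. `build_risk_rules` is a hypothetical helper that takes a name-to-level mapping such as the one a tool registry would hold.

```python
TIER_HEADERS = {
    "low": "LOW risk (auto-execute)",
    "medium": "MEDIUM risk (confirm first)",
    "high": "HIGH risk (FORBIDDEN)",
}


def build_risk_rules(risk_levels):
    """Render a {tool_name: level} mapping as prompt text, one line per tier."""
    lines = ["## Operation Safety Tiers", ""]
    for tier in ("low", "medium", "high"):
        names = sorted(name for name, level in risk_levels.items() if level == tier)
        lines.append(f"{TIER_HEADERS[tier]}: {', '.join(names) or '(none)'}")
    return "\n".join(lines)
```

Deriving the prompt from the same table the executor enforces keeps the two from drifting apart when tools are added.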

Step 3 — Enforce at execution layer (not just prompt):

# In model_tools.py handle_function_call()
def handle_function_call(tool_name: str, args: dict, task_id: str):
    risk = TOOL_RISK_LEVELS.get(tool_name, "medium")  # unknown tools default to medium
    
    # High-risk and human-only tools are always refused, whichever flag is set
    if risk == "high" or TOOL_HUMAN_ONLY.get(tool_name, False):
        return {"error": "This operation requires human execution. Refusing."}
    
    if risk == "medium" and not confirmation_granted(tool_name, args, task_id):
        return {"requires_confirmation": True, "plan": f"About to run {tool_name} with {args}"}
    
    return execute_tool(tool_name, args)
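
The `confirmation_granted` check is left undefined above. One possible shape, sketched under the assumption that approvals come from the UI layer and are bound to the exact call, is shown here; `grant_confirmation` and the in-memory `_APPROVED` set are hypothetical.

```python
import hashlib
import json

_APPROVED = set()  # fingerprints of calls the user has approved


def _call_key(tool_name, args, task_id):
    """Fingerprint one exact call so an approval cannot be reused for another."""
    payload = json.dumps([task_id, tool_name, args], sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def grant_confirmation(tool_name, args, task_id):
    """Called by the UI layer after the user approves; never exposed as a tool."""
    _APPROVED.add(_call_key(tool_name, args, task_id))


def confirmation_granted(tool_name, args, task_id):
    """True once per approved call; the approval is consumed on use."""
    key = _call_key(tool_name, args, task_id)
    if key in _APPROVED:
        _APPROVED.discard(key)
        return True
    return False
```

Binding the approval to the arguments means confirming "delete /tmp/a.txt" does not also authorize deleting a different path, and consuming it on use prevents a single approval from covering a batch.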

What NOT to Copy from Marvis

These are intentional anti-patterns that Hermes should avoid:

| Marvis weakness | Why Hermes should NOT adopt it |
|---|---|
| Closed ecosystem: cross-device relies on proprietary Tencent app engine | Hermes' MCP + A2A + multi-platform gateway is the right open approach |
| Non-extensible: users cannot create custom Skills/workflows | Hermes' Skill system is a core differentiator; keep it flexible |
| Model lock-in: local = Qwen only, cloud = Hunyuan/DS only | Hermes' provider-agnostic design is a major strength |
| No long-running autonomy: no cron, no idle loops, no overnight tasks | Hermes' cron system + background processes are already superior |
| Search quality issues: early users report poor retrieval accuracy | Hermes' multi-search backends (anysearch, browser, etc.) provide better flexibility |

Related Existing Issues

#29379 — Native Canvas Mode (GUI/visual direction already being discussed)
#9459 — Agent profiles for delegate_task (pre-configured roles)
#514 — A2A Protocol Support (Agent-to-Agent standard, complementary)
#11922 — Multi-agent communication & per-channel persona
#31684 — compress_context (local pre-processing overlaps with Pattern 2)
#25545 — Skill Orchestration / Workflow Composition (pre-built workflows)

Why Now

Marvis launched 5 days ago. It's the first consumer product that proves:

Multi-Agent architecture is not academic — 6 agents working in parallel, on consumer hardware, for non-technical users
OS-level AI middleware is viable — users are willing to let AI control their desktop if safety guarantees are clear
Cloud-local routing saves real money — Tencent's free 10M token/day is only sustainable because >60% of processing stays local
GUI Agent is the natural extension of Code Agent — the jump from "AI that writes code" to "AI that operates your computer" is smaller than it seems

Hermes already has better infrastructure than Marvis in many ways (MCP, multi-model, cron, Skills, platforms gateway). What's missing is:

Default configurations that make the infrastructure accessible (Pattern 1)
Intelligence layer that routes between capabilities (Pattern 2)
Desktop reach beyond the terminal (Pattern 3)
Safety guardrails that make desktop control trustworthy (Pattern 4)

None of these require a rewrite — they're incremental additions to existing architecture.

Implementation Priority (my recommendation)

| Priority | Pattern | Effort | Impact | Dependency |
|---|---|---|---|---|
| 🥇 P0 | Pattern 4: Security Tiers | ~200 LOC | Safety baseline for all other patterns | None |
| 🥈 P1 | Pattern 1: Pre-built Agent Profiles | ~500 LOC | Unlocks delegation UX | #9459 |
| 🥉 P2 | Pattern 2: Cloud-Local Routing | ~800 LOC | Cost optimization | Pattern 1 for routing targets |
| P3 | Pattern 3: Desktop GUI Agent | ~2000+ LOC | Major differentiation | Pattern 4 for safety |

Security first (Pattern 4), then pre-built agents (Pattern 1), then smart routing (Pattern 2), then desktop reach (Pattern 3). Each builds on the previous.

Researched and drafted by a Hermes user who also runs OpenClaw. Happy to help with testing or refining any of these ideas.
