hermes - 💡(How to fix) Fix RFC: On-demand tool/skill/MCP discovery — decouple schema registration from process lifecycle

Root Cause

The architecture tightly couples two concerns that should be separate:

Schema registration — telling the LLM "these capabilities exist and here's how to call them"
Process lifecycle — keeping server processes/connections alive

These are conflated for MCP servers (start → connect → discover → keep-alive), and for built-in tools/skills they're conflated via the system prompt (register all schemas upfront because there's no on-demand discovery mechanism).

Fix Action

Fix / Workaround

Concerns & Mitigations

Concern	Mitigation
LLM might not discover tools it needs	LLMs are good at asking; the index makes ALL tools visible by name+description
Cold-start latency on first MCP tool call	Acceptable trade-off — happens once per MCP server per session, vs every startup
LLM might call tool_get_schema too often	Cache schema in context after first fetch
Stateful MCP servers (memory, databases)	Always-on flag per-server config: `keepalive: true`
Prompt cache invalidation	Dynamic tools are injected mid-session, same as existing `/reload-mcp` pattern

Problem

Currently, Hermes eagerly loads everything at startup:

All built-in tools — tool schemas for all 26 CLI tools are baked into the system prompt (~12K tokens)
All MCP servers — 7+ servers are started as subprocesses and kept alive for the session duration, each contributing tool schemas (77 tools, ~25K tokens)
All installed skills — 24 skill directories are scanned and their content is available for injection

This means a first API call carries ~37K tokens of tool definitions alone, before any actual conversation context. With models like deepseek-v4-flash, processing this many input tokens adds 5-10 seconds to every first response, even for trivial queries.

The real cost: in a typical session, the LLM uses 3-5 tools out of 103 registered, and 1-2 skills out of 24 available. The other 95% of schema content is dead weight on every LLM call.

Root Cause

The architecture tightly couples two concerns that should be separate:

Schema registration — telling the LLM "these capabilities exist and here's how to call them"
Process lifecycle — keeping server processes/connections alive

Proposal: Capability Registry with On-Demand Loading

Core idea

Replace the current "register everything at startup" model with a two-phase discovery pattern:

Phase 1 — Index (startup, fast): ``` For each MCP server: connect → tools/list → cache schema → disconnect (or keep minimal heartbeat) register tool NAME + DESCRIPTION in a lightweight index

For built-in tools: register tool NAME + DESCRIPTION in the same index

For skills: register skill NAME + DESCRIPTION in an index ```

The LLM gets a compact index (~200 tokens) of available capabilities — just names and one-line descriptions, not full JSON schemas. It can then request the full schema for specific tools it decides to use.

Phase 2 — On-demand (when LLM decides to use a capability): ``` LLM: calls tool_get_schema("mcp_zhihu_search_content") Agent: returns full JSON schema for that tool then if the LLM calls it, starts zhihu MCP server → forwards the tool call

LLM: calls skill_load("github-code-review") Agent: reads SKILL.md content and injects it into conversation context ```

Benefits

Aspect	Current	Proposed
Startup time	~12s (20s before zhihu fix)	~2s (only index, no MCP process start)
Input tokens on first call	~37K tool schemas	~200 (compact index)
MCP server processes	7 running all session	0-1 running at a time (on-demand)
Session memory pressure	100+ tool schemas in context	Only schemas for tools LLM actually uses
Skill loading	All scanned at startup, injected when loaded	Index only, loaded on skill_get_content call

Design Sketch

New built-in tool: `tool_get_schema(tool_name: str)` Returns the full OpenAI-format JSON schema for a specific tool by name. The LLM calls this when it sees a promising tool in the index and wants the full parameters.

On-demand MCP server start: ```python async def handle_mcp_tool_call(server_name, tool_name, args): if server_name not in _servers: # Cold start — connect on first use server = await _connect_server(server_name, config) _servers[server_name] = server return await server.session.call_tool(tool_name, arguments=args) ```

Concerns & Mitigations

Concern	Mitigation
LLM might not discover tools it needs	LLMs are good at asking; the index makes ALL tools visible by name+description
Cold-start latency on first MCP tool call	Acceptable trade-off — happens once per MCP server per session, vs every startup
LLM might call tool_get_schema too often	Cache schema in context after first fetch
Stateful MCP servers (memory, databases)	Always-on flag per-server config: `keepalive: true`
Prompt cache invalidation	Dynamic tools are injected mid-session, same as existing `/reload-mcp` pattern

Migration Path

Phase 0 (now): Add `tool_get_schema` built-in tool alongside existing tools
Phase 1: MCP servers start in "probe" mode by default (connect → index → disconnect), controllable via `mcp_servers.<name>.mode: probe` (default) vs `keepalive`
Phase 2: Build tool index into the system prompt instead of full schemas
Phase 3: Extend the same pattern to skills (index in system prompt, load on `skill_get_content`)

Each phase is independently shippable and valuable on its own.

Existing issue #30754 (skill progressive disclosure — complementary; reduces skill token waste)
Slash command `/reload-mcp` (existing hot-reload mechanism)
`tools/mcp_tool.py` — current eager-connect implementation

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix RFC: On-demand tool/skill/MCP discovery — decouple schema registration from process lifecycle

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Concerns & Mitigations

Problem

Root Cause

Proposal: Capability Registry with On-Demand Loading

Core idea

Benefits

Design Sketch

Concerns & Mitigations

Migration Path

Related

Still need to ship something?

TRENDING