hermes - Feature Request: Skill list bloat causes massive context window inflation (need vector-based skill routing or lazy loading)

 Problem                                                                                                                                                                            
                                                                                                                                                                                    
 The current skill system injects ALL installed skills (name + category + description) into the system prompt on every turn. With the default skill hub, users easily               
 accumulate 200+ skills, which adds ~10-15K tokens to every single API call.                                                                                                        
                                                                                                                                                                                    
 For reference, my setup has 243 skills. The <available_skills> block alone consumes a significant portion of the context window, even though most skills are never used in a       
 given conversation.                                                                                                                                                                

 Current Architecture
                                                                                                                                                                                    
 build_skills_system_prompt() in prompt_builder.py scans all skill directories and builds a complete index:
                                                                                                                                                                                    
 <available_skills>                                                                                                                                                                 
   category: description                                                                                                                                                            
     - skill_name: description...                                                                                                                                                   
     - skill_name: description...                                                                                                                                                   
   (repeated 200+ times)                                                                                                                                                            
 </available_skills>                                                                                                                                                                
                                                                                                                                                                                    
 This is injected into the system prompt unconditionally on every turn.                                                                                                             
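To make the cost concrete, here is a minimal sketch of what rendering the full index looks like and how its size could be estimated. It is not the actual Hermes code: the `Skill` dataclass, `render_skill_index()`, and `rough_token_count()` are hypothetical names, the category line omits the category description for brevity, and the 4-characters-per-token rule is only a rough heuristic for English text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    # Hypothetical container; the real skill metadata may carry more fields.
    name: str
    category: str
    description: str


def render_skill_index(skills: list[Skill]) -> str:
    """Render every installed skill into one <available_skills> block."""
    by_category: dict[str, list[Skill]] = {}
    for skill in skills:
        by_category.setdefault(skill.category, []).append(skill)

    lines = ["<available_skills>"]
    for category in sorted(by_category):
        lines.append(f"  {category}:")
        for skill in by_category[category]:
            lines.append(f"    - {skill.name}: {skill.description}")
    lines.append("</available_skills>")
    return "\n".join(lines)


def rough_token_count(text: str) -> int:
    """Rough heuristic only: about 4 characters per token."""
    return len(text) // 4
```

Running `rough_token_count(render_skill_index(skills))` over the installed skills gives a quick estimate of how much of every request is spent on the index before the user has typed anything.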

 Proposed Solutions

 Option A: Vector-based skill routing (preferred)
                                                                                                                                                                                    
 Before building the system prompt, use semantic search (e.g., Hindsight pgvector) to retrieve only the top-K most relevant skills based on the user's message. This would:         
                                                                                                                                                                                    
 - Reduce context by 60-80% for typical conversations                                                                                                                               
 - Improve skill discovery quality (semantic match > keyword scan)                                                                                                                  
 - Be backward compatible (still load all skills as fallback if retrieval fails)                                                                                                    
                                                                                                                                                                                    
 Implementation sketch:                                                                                                                                                             
 1. Build a vector index of all skill_name + description + category                                                                                                                 
 2. In build_skills_system_prompt(), accept user_message parameter                                                                                                                  
 3. Query the index, return only top-K matching skills                                                                                                                              
 4. Add config option: skills.retrieval_mode: vector | all                                                                                                                          
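The four steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed patch: `embed()` is a bag-of-words stand-in so the example runs without any external service (a real deployment would call an embedding model and store vectors in something like pgvector), and `SkillIndex`, `select_skills()`, and the `k` parameter are hypothetical names. The skill descriptions in the usage section are invented for the example.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    category: str
    description: str


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SkillIndex:
    """Step 1: index skill_name + description + category once, up front."""

    def __init__(self, skills: list[Skill]):
        self.skills = list(skills)
        self.vectors = [
            embed(f"{s.name} {s.description} {s.category}") for s in self.skills
        ]

    def top_k(self, user_message: str, k: int = 5) -> list[Skill]:
        """Step 3: return up to k skills with a non-zero similarity score."""
        query = embed(user_message)
        scored = [(cosine(query, vec), s) for vec, s in zip(self.vectors, self.skills)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [skill for score, skill in scored[:k] if score > 0]


def select_skills(index: SkillIndex, user_message: str,
                  retrieval_mode: str = "vector", k: int = 5) -> list[Skill]:
    """Steps 2 and 4: pick skills for this turn, falling back to all of them."""
    if retrieval_mode != "vector":
        return list(index.skills)
    try:
        matches = index.top_k(user_message, k)
    except Exception:
        matches = []  # retrieval failed: fall through to the full list
    return matches or list(index.skills)


if __name__ == "__main__":
    # Descriptions below are made up for illustration.
    index = SkillIndex([
        Skill("vllm-operations", "devops", "Operate and tune vLLM inference servers"),
        Skill("comfyui", "creative", "Generate images with ComfyUI workflows"),
        Skill("buffett-search", "finance", "Search shareholder letters"),
    ])
    for skill in select_skills(index, "help me tune my vllm server"):
        print(skill.name)
```

Returning the full list when nothing matches keeps the behaviour backward compatible, at the cost of occasionally paying the full token price on turns where retrieval finds nothing.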

 Option B: Lazy skill loading
                                                                                                                                                                                    
 Only inject skill names (no descriptions) into the system prompt. When the agent decides to load a skill via skill_view(), fetch the full content then. This is similar to how     
 OpenClaw handles MCP tools.                                                                                                                                                        
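A minimal sketch of the lazy approach, assuming one directory per skill containing a `SKILL.md` file (that layout, the `LazySkillStore` class, and the `skill_view()` signature shown here are assumptions for illustration; only the function name `skill_view()` comes from the existing system):

```python
from pathlib import Path


def render_skill_names(skill_names: list[str]) -> str:
    """Names-only index: one short line per skill, no descriptions."""
    lines = ["<available_skills>"]
    lines.extend(f"  - {name}" for name in sorted(skill_names))
    lines.append("</available_skills>")
    return "\n".join(lines)


class LazySkillStore:
    """Reads a skill's full content only when it is first requested."""

    def __init__(self, root: Path):
        self.root = root
        self._cache: dict[str, str] = {}

    def names(self) -> list[str]:
        # Assumed layout: <root>/<skill_name>/SKILL.md
        return sorted(
            p.name for p in self.root.iterdir() if (p / "SKILL.md").is_file()
        )

    def skill_view(self, name: str) -> str:
        """Return the full skill text, loading and caching it on first use."""
        if name not in self._cache:
            if name not in self.names():
                # Also rejects names such as "../x" that escape the root.
                raise KeyError(f"unknown skill: {name}")
            path = self.root / name / "SKILL.md"
            self._cache[name] = path.read_text(encoding="utf-8")
        return self._cache[name]
```

The trade-off is that the model has to pick a skill from its name alone, so this works best when skill names are descriptive.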

 Option C: Skill grouping with config-based filtering
                                                                                                                                                                                    
 Add skills.groups in config.yaml to define logical groups, and skills.active_groups to control which groups are visible. Users can toggle groups on/off without modifying          
 individual skills.                                                                                                                                                                 
                                                                                                                                                                                    
 Example (config.yaml):
 skills:                                                                                                                                                                            
   groups:                                                                                                                                                                          
     finance: [buffett-search, financial-meeting-summary, ...]                                                                                                                      
     devops: [sglang-install, vllm-operations, ...]                                                                                                                                 
     creative: [comfyui, ascii-art, ...]                                                                                                                                            
   active_groups: [finance, devops]  # Only these appear in system prompt                                                                                                           
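The filtering itself could be as small as the sketch below. It takes the already-parsed `skills:` section as a plain dict (loading it from config.yaml is left out), and `visible_skills()` is a hypothetical helper name. Two behaviours are assumptions rather than part of the proposal: when `active_groups` is absent every installed skill stays visible, and when it is present any skill that belongs to no active group is hidden. The example lists contain only the skills named above, without the elided entries.

```python
def visible_skills(installed: list[str], skills_config: dict) -> list[str]:
    """Return the installed skills that belong to at least one active group."""
    groups = skills_config.get("groups", {})
    active = skills_config.get("active_groups")
    if active is None:
        # Assumption: no active_groups key means "show everything".
        return list(installed)

    allowed: set[str] = set()
    for group in active:
        allowed.update(groups.get(group, []))
    # Preserve the installed order so the prompt stays stable between turns.
    return [name for name in installed if name in allowed]


if __name__ == "__main__":
    config = {
        "groups": {
            "finance": ["buffett-search", "financial-meeting-summary"],
            "devops": ["sglang-install", "vllm-operations"],
            "creative": ["comfyui", "ascii-art"],
        },
        "active_groups": ["finance", "devops"],
    }
    installed = [
        "buffett-search", "financial-meeting-summary",
        "sglang-install", "vllm-operations", "comfyui", "ascii-art",
    ]
    print(visible_skills(installed, config))
```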
                                                                                                                                                                                    
 Workaround                                                                                                                                                                         
                                                                                                                                                                                    
 Currently, users can manually disable skills via skills.disabled in config.yaml, but this is tedious and error-prone with 200+ skills. I had to manually categorize and            
 disable 90 skills I never use (gaming, smart home, social media, etc.) just to reduce context by ~37%.                                                                             

 Impact
                                                                                                                                                                                    
 - Token cost: Every conversation carries ~10-15K tokens of skill metadata that's mostly unused                                                                                     
 - Latency: Larger context = slower model inference                                                                                                                                 
 - Quality: More noise in system prompt can dilute instruction following                                                                                                            
                                                                                                                                                                                    
 This is especially painful for local LLM setups where context window and token cost matter more.