hermes - 💡(How to fix) Fix [Setup]: How can Hermes Agent be run offline with small 2B, 4B, 8B, or 14B parameter models when compute resources are limited?

Error Message

ValueError: Auxiliary compression model qwen3-8b has a context window of 28,400 tokens, which is below the minimum 64,000 required by Hermes Agent. Choose a compression model with at least 64K context (set auxiliary.compression.model in config.yaml), or set auxiliary.compression.context_length to override the detected value if it is wrong.

Code Example



01:53:23 INFO agent.auxiliary_client Vision auto-detect: using main provider custom (qwen3-8b)
01:53:23 INFO agent.auxiliary_client Auxiliary auto-detect: using main provider custom (qwen3-8b)
01:53:23 ERROR gateway.platforms.api_server Error running agent for streaming responses: Auxiliary compression model qwen3-8b has a context window of 28,400 tokens, which is below the minimum 64,000 required by Hermes Agent. Choose a compression model with at least 64K context (set auxiliary.compression.model in config.yaml), or set auxiliary.compression.context_length to override the detected value if it is wrong.
Traceback (most recent call last):
  File "/usr/local/lib/hermes-agent/gateway/platforms/api_server.py", line 1777, in _write_sse_responses
    result, agent_usage = await agent_task
                          ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/hermes-agent/gateway/platforms/api_server.py", line 2605, in _run_agent
    return await loop.run_in_executor(None, _run)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/uv/python/cpython-3.11.15-linux-x86_64-gnu/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/hermes-agent/gateway/platforms/api_server.py", line 2575, in _run
    agent = self._create_agent(
            ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/hermes-agent/gateway/platforms/api_server.py", line 830, in _create_agent
    agent = AIAgent(
            ^^^^^^^^
  File "/usr/local/lib/hermes-agent/run_agent.py", line 2192, in __init__
    self._check_compression_model_feasibility()
  File "/usr/local/lib/hermes-agent/run_agent.py", line 2711, in _check_compression_model_feasibility
    raise ValueError(
ValueError: Auxiliary compression model qwen3-8b has a context window of 28,400 tokens, which is below the minimum 64,000 required by Hermes Agent. Choose a compression model with at least 64K context (set auxiliary.compression.model in config.yaml), or set auxiliary.compression.context_length to override the detected value if it is wrong.
01:53:23 INFO aiohttp.access POST /v1/responses 200

What's Going Wrong?

How can Hermes Agent be run offline with small models (2B, 4B, 8B, or 14B parameters) when compute resources are limited? With qwen3-8b as the main model, a request to /v1/responses fails with:

Error running agent for streaming responses: Auxiliary compression model qwen3-8b has a context window of 28,400 tokens, which is below the minimum 64,000 required by Hermes Agent. Choose a compression model with at least 64K context (set auxiliary.compression.model in config.yaml), or set auxiliary.compression.context_length to override the detected value if it is wrong.

Steps Taken

context_length: 65535
max_tokens: 8192
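The error message names a specific key, auxiliary.compression.context_length, while the report does not say where the values above were set. A minimal Python sketch of the check the message describes may help show why the placement could matter. This is not Hermes Agent's actual code: the function names, the constant name, and the nested layout of the dotted key are assumptions made for illustration.

```python
# Hypothetical sketch of the feasibility check described in the error message.
# Names below are illustrative and are NOT taken from the Hermes Agent source.
MIN_COMPRESSION_CONTEXT = 64_000  # minimum stated in the error message


def effective_context(config: dict, detected: int) -> int:
    """Use auxiliary.compression.context_length if present, else the detected value."""
    override = (
        config.get("auxiliary", {})
        .get("compression", {})
        .get("context_length")
    )
    return override if override is not None else detected


def is_feasible(config: dict, detected: int) -> bool:
    """True when the effective context window meets the stated minimum."""
    return effective_context(config, detected) >= MIN_COMPRESSION_CONTEXT


# Values set at the top level: not the key the error message asks for.
top_level_only = {"context_length": 65535, "max_tokens": 8192}
# Assumed nesting for the dotted key auxiliary.compression.context_length.
with_override = {"auxiliary": {"compression": {"context_length": 65535}}}

print(is_feasible(top_level_only, detected=28_400))  # False
print(is_feasible(with_override, detected=28_400))   # True
```

If the inference server really serves qwen3-8b with a 28,400-token window, an override of this kind would only get past the check. Prompts longer than the real window would likely still fail, so raising the served context length or pointing auxiliary.compression.model at a model with at least 64K context is probably the more robust option.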

Installation Method

Docker

Operating System

Ubuntu22.04

Python Version

3.11.9

Hermes Version

0.12.0
