Hermes Agent [Setup]: How can Hermes Agent be run offline with small models (2B, 4B, 8B, 14B parameters) when compute resources are insufficient?

What's Going Wrong?

How can Hermes Agent be run offline with small models (2B, 4B, 8B, or 14B parameters) when compute resources are insufficient? With qwen3-8b as the main model, the agent fails at startup with the following error:

Error running agent for streaming responses: Auxiliary compression model qwen3-8b has a context window of 28,400 tokens, which is below the minimum 64,000 required by Hermes Agent. Choose a compression model with at least 64K context (set auxiliary.compression.model in config.yaml), or set auxiliary.compression.context_length to override the detected value if it is wrong.
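The error text implies a startup check that compares the compression model's context window against a 64,000-token floor, with a configured override taking precedence over the detected value. A minimal Python sketch of that rule follows. It assumes nothing about Hermes Agent's actual code: the function name `check_compression_context`, its parameters, and the constant name are hypothetical, and only the 64,000 minimum and the 28,400 detected value come from the error message above.

```python
from typing import Optional

# Minimum taken from the error message; the constant name is hypothetical.
MIN_COMPRESSION_CONTEXT = 64_000


def check_compression_context(
    model: str,
    detected_tokens: int,
    override_tokens: Optional[int] = None,
) -> int:
    """Return the effective context window, or raise if it is too small.

    Hypothetical helper: it illustrates the rule described in the error
    message, not the real Hermes Agent implementation.
    """
    # An explicit override, if present, wins over the auto-detected value.
    effective = override_tokens if override_tokens is not None else detected_tokens
    if effective < MIN_COMPRESSION_CONTEXT:
        raise ValueError(
            f"Auxiliary compression model {model} has a context window of "
            f"{effective:,} tokens, which is below the minimum "
            f"{MIN_COMPRESSION_CONTEXT:,} required."
        )
    return effective


# Values from the issue: detection reported 28,400 tokens for qwen3-8b.
try:
    check_compression_context("qwen3-8b", 28_400)
except ValueError as exc:
    print(f"rejected: {exc}")

# With an override of 65,535 the same check passes.
print(check_compression_context("qwen3-8b", 28_400, override_tokens=65_535))
```

Whether an override is appropriate depends on what the inference server really serves. If the server is limited to 28,400 tokens, declaring a larger value would only hide the mismatch, so the override is best reserved for the case the message describes: a detected value that is wrong.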

Steps Taken

context_length: 65535
max_tokens: 8192
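The two values above are listed without the config section they were placed in. The error message refers to the dotted path `auxiliary.compression.context_length` in config.yaml, which suggests a nested layout. The Python sketch below models that nesting as a plain dict and resolves a dotted path against it. The `lookup` helper and both example dicts are hypothetical illustrations; in particular, the top-level placement in `flat_config` is only one possible reading of what was tried, since the issue does not show it.

```python
from collections.abc import Mapping
from typing import Any


def lookup(config: Mapping[str, Any], dotted_path: str, default: Any = None) -> Any:
    """Resolve a dotted path such as 'a.b.c' in a nested mapping (hypothetical helper)."""
    node: Any = config
    for key in dotted_path.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return default
        node = node[key]
    return node


# Assumption: the values were set at the top level of the config.
flat_config = {"context_length": 65535, "max_tokens": 8192}

# Nested placement that matches the dotted path named in the error message.
nested_config = {
    "auxiliary": {
        "compression": {"model": "qwen3-8b", "context_length": 65535},
    },
}

path = "auxiliary.compression.context_length"
print(lookup(flat_config, path))    # None: the key is not under auxiliary.compression
print(lookup(nested_config, path))  # 65535
```

If the values were in fact set somewhere other than under `auxiliary.compression`, that could explain why the detected 28,400 tokens still applied to the compression model.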

Installation Method

Docker

Operating System

Ubuntu 22.04

Python Version

3.11.9

Hermes Version

0.12.0

Debug Report

Full Error Output

01:53:23 INFO agent.auxiliary_client Vision auto-detect: using main provider custom (qwen3-8b)
01:53:23 INFO agent.auxiliary_client Auxiliary auto-detect: using main provider custom (qwen3-8b)
01:53:23 ERROR gateway.platforms.api_server Error running agent for streaming responses: Auxiliary compression model qwen3-8b has a context window of 28,400 tokens, which is below the minimum 64,000 required by Hermes Agent. Choose a compression model with at least 64K context (set auxiliary.compression.model in config.yaml), or set auxiliary.compression.context_length to override the detected value if it is wrong.
Traceback (most recent call last):
  File "/usr/local/lib/hermes-agent/gateway/platforms/api_server.py", line 1777, in _write_sse_responses
    result, agent_usage = await agent_task
                          ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/hermes-agent/gateway/platforms/api_server.py", line 2605, in _run_agent
    return await loop.run_in_executor(None, _run)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/uv/python/cpython-3.11.15-linux-x86_64-gnu/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/hermes-agent/gateway/platforms/api_server.py", line 2575, in _run
    agent = self._create_agent(
            ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/hermes-agent/gateway/platforms/api_server.py", line 830, in _create_agent
    agent = AIAgent(
            ^^^^^^^^
  File "/usr/local/lib/hermes-agent/run_agent.py", line 2192, in __init__
    self._check_compression_model_feasibility()
  File "/usr/local/lib/hermes-agent/run_agent.py", line 2711, in _check_compression_model_feasibility
    raise ValueError(
ValueError: Auxiliary compression model qwen3-8b has a context window of 28,400 tokens, which is below the minimum 64,000 required by Hermes Agent. Choose a compression model with at least 64K context (set auxiliary.compression.model in config.yaml), or set auxiliary.compression.context_length to override the detected value if it is wrong.
01:53:23 INFO aiohttp.access POST /v1/responses 200
