hermes - 💡(How to fix) Fix MiniMax: stream stall + rebuild failure leaves worker hung (provider=minimax, api_key dropped during _replace_primary_openai_client)



Summary

When MiniMax-M2.7 streaming stalls mid-conversation (the server stops sending chunks), Hermes correctly detects the stall via the stale-stream timer in chat_completion_helpers.py and attempts to rebuild the OpenAI SDK client via _replace_primary_openai_client (run_agent.py:2443). The rebuild then fails with:

Failed to rebuild shared OpenAI client (stale_stream_pool_cleanup) ...
provider=minimax base_url=https://api.minimax.io/anthropic model=MiniMax-M2.7
error=The api_key client option must be set either by passing api_key to the client
or by setting the OPENAI_API_KEY environment variable

The failure leaves the worker session unable to recover: the process stays alive but cannot make further API calls. In kanban-worker mode this shows up as a hung task that times out when its claim expires. Every stall requires a manual kill -9 plus a force-complete.
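The issue does not show the stale-stream timer in chat_completion_helpers.py itself. As a rough illustration only, a stale-stream watchdog can be modeled as an iterator wrapper that raises once no chunk arrives within an idle timeout. `StaleStreamError` and `iter_with_stall_timeout` are hypothetical names, not Hermes APIs.

```python
import queue
import threading
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


class StaleStreamError(RuntimeError):
    """Raised when no chunk arrives within the allowed idle time (hypothetical)."""


def iter_with_stall_timeout(chunks: Iterable[T], idle_timeout: float) -> Iterator[T]:
    """Yield items from `chunks`, raising StaleStreamError when the gap between
    consecutive items (or before the first one) exceeds `idle_timeout` seconds."""
    q: "queue.Queue[Tuple[str, object]]" = queue.Queue()

    def pump() -> None:
        # Drain the blocking source on a daemon thread so the consumer can wait
        # on the queue with a timeout instead of blocking forever on the socket.
        try:
            for chunk in chunks:
                q.put(("chunk", chunk))
            q.put(("done", None))
        except Exception as exc:  # hand producer errors to the consumer
            q.put(("error", exc))

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            kind, payload = q.get(timeout=idle_timeout)
        except queue.Empty:
            raise StaleStreamError(f"no chunk received for {idle_timeout}s") from None
        if kind == "done":
            return
        if kind == "error":
            raise payload  # type: ignore[misc]
        yield payload  # type: ignore[misc]
```

The stalled producer thread is simply abandoned in this sketch. In the real client, tearing down the stuck connection is the job of the client rebuild, which is exactly the step that fails in this issue.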

Environment

  • Hermes Agent v0.13.0 → v0.14.0 (persists across update)
  • macOS 26.5, Apple Silicon, Python 3.11.15
  • Provider: minimax (Token Plan Key auth, sk-cp-...)
  • base_url: https://api.minimax.io/anthropic, model: MiniMax-M2.7

Reproduction

  1. Configure a profile with the MiniMax provider and MINIMAX_API_KEY set in .env
  2. Run a kanban-worker session that accumulates context > ~10k tokens:
    hermes -p <profile> --skills kanban-worker chat -q work kanban task <id>
  3. Wait. Once context grows past ~10k tokens, the MiniMax server drops the stream mid-response (no error, no close frame).
  4. Hermes detects the stale stream, tries to rebuild the client, and fails with the error above.

Reproduces 100% of the time on contexts above ~10k tokens. Observed today across 5+ separate sessions (3 profiles: anmaioyi, yefan, shihao). Context sizes at stall: 9,045 / 10,602 / 25,330 / 34,699 tokens.

Root cause

_replace_primary_openai_client reuses self._client_kwargs cached at init. For the minimax provider, auth uses a custom x-api-key header (per the MiniMax API docs) rather than the OpenAI SDK's native api_key parameter. Initial construction succeeds alongside default_headers={'x-api-key': KEY}, most likely because credential_pool populates api_key at init; the SDK's check only fails when api_key is None and OPENAI_API_KEY is unset, and it does not look at headers. On rebuild, however, the cached client_kwargs reaches OpenAI(**kwargs) with api_key unset, and the SDK validator falls through to checking the OPENAI_API_KEY environment variable (which MiniMax users don't have set).
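To make the failure mode concrete, here is a simplified stand-in for the SDK's api_key check, modeled on the error text above and the openai-python v1 client constructor (not copied from it): an explicit api_key wins, otherwise OPENAI_API_KEY is consulted, and None from both raises. `resolve_api_key` and `MissingApiKeyError` are hypothetical names, and the kwargs values are illustrative.

```python
import os
from typing import Mapping, Optional


class MissingApiKeyError(Exception):
    """Stand-in for the error the SDK raises when no api_key can be resolved."""


def resolve_api_key(api_key: Optional[str], environ: Mapping[str, str] = os.environ) -> str:
    # Simplified model of the SDK check: an explicit api_key wins, otherwise
    # OPENAI_API_KEY is consulted. Headers are never looked at.
    if api_key is None:
        api_key = environ.get("OPENAI_API_KEY")
    if api_key is None:
        raise MissingApiKeyError(
            "The api_key client option must be set either by passing api_key to the "
            "client or by setting the OPENAI_API_KEY environment variable"
        )
    return api_key


# Illustrative shape of the cached kwargs at rebuild time for provider=minimax.
cached_kwargs = {
    "base_url": "https://api.minimax.io/anthropic",
    "default_headers": {"x-api-key": "sk-cp-example"},
    "api_key": None,
}
```

Calling `resolve_api_key(cached_kwargs["api_key"], environ={})` raises even though `default_headers` carries a valid x-api-key, which matches the rebuild error in the log.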

Suggested fix

Ensure the api_key in the client kwargs is non-empty at rebuild time when the provider is minimax or minimax-cn, by reading the env var directly. Real auth still flows through default_headers={'x-api-key': KEY}; this only satisfies the SDK validator.

Local patch I verified works (run_agent.py:2443):

def _replace_primary_openai_client(self, *, reason: str) -> bool:
    with self._openai_client_lock():
        old_client = getattr(self, "client", None)
        try:
            # Inject api_key for providers that use custom x-api-key headers
            # but still need to satisfy the OpenAI SDK validator.
            kwargs = dict(self._client_kwargs)
            if getattr(self, "provider", None) in {"minimax", "minimax-cn"} and not kwargs.get("api_key"):
                import os
                env_name = "MINIMAX_API_KEY" if self.provider == "minimax" else "MINIMAX_CN_API_KEY"
                kwargs["api_key"] = os.environ.get(env_name) or "sk-placeholder"
            new_client = self._create_openai_client(kwargs, reason=reason, shared=True)
        except Exception as exc:
            logger.warning(
                "Failed to rebuild shared OpenAI client (%s) %s error=%s",
                reason, self._client_log_context(), exc,
            )
            return False
        self.client = new_client
    self._close_openai_client(old_client, reason=f"replace:{reason}", shared=True)
    return True
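The env-var injection in the patch can also be factored into a small pure function, which keeps the cached kwargs untouched and is easy to unit-test. This is a sketch; `with_minimax_api_key` and `MINIMAX_KEY_ENV` are hypothetical names, while the env var names and the "sk-placeholder" fallback come from the patch.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Env var per provider, as used by the patch above.
MINIMAX_KEY_ENV: Dict[str, str] = {
    "minimax": "MINIMAX_API_KEY",
    "minimax-cn": "MINIMAX_CN_API_KEY",
}


def with_minimax_api_key(
    client_kwargs: Mapping[str, Any],
    provider: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> Dict[str, Any]:
    """Return a copy of client_kwargs with api_key filled in for MiniMax providers.

    The input mapping is never mutated; other providers pass through unchanged.
    """
    kwargs = dict(client_kwargs)
    env_name = MINIMAX_KEY_ENV.get(provider or "")
    if env_name and not kwargs.get("api_key"):
        # Real auth still rides on default_headers["x-api-key"]; this only
        # satisfies the SDK's api_key check.
        kwargs["api_key"] = environ.get(env_name) or "sk-placeholder"
    return kwargs
```

Inside the patched method, the try block would then reduce to something like `self._create_openai_client(with_minimax_api_key(self._client_kwargs, getattr(self, "provider", None)), reason=reason, shared=True)`.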

A more robust fix would have create_openai_client in agent_runtime_helpers.py re-resolve credentials from credential_pool on every call rather than relying on cached kwargs, but the targeted patch above unblocks the immediate problem.
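The issue does not show the credential_pool or create_openai_client interfaces, so the following is only a sketch of the "re-resolve on every call" idea, built on a hypothetical `CredentialPool` class and `build_client_kwargs` function: each build asks the pool for the current key and writes it into both the x-api-key header and api_key, so nothing depends on what was cached at init.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Mapping

MINIMAX_PROVIDERS = frozenset({"minimax", "minimax-cn"})


@dataclass
class CredentialPool:
    """Hypothetical pool: provider name -> callable returning the current key."""
    sources: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def current_key(self, provider: str) -> str:
        source = self.sources.get(provider)
        if source is None:
            raise LookupError(f"no credential source for provider {provider!r}")
        return source()


def build_client_kwargs(
    base_kwargs: Mapping[str, Any], provider: str, pool: CredentialPool
) -> Dict[str, Any]:
    """Build fresh client kwargs, re-resolving the credential on every call."""
    key = pool.current_key(provider)
    kwargs = dict(base_kwargs)  # never mutate the cached template
    headers = dict(kwargs.get("default_headers") or {})
    if provider in MINIMAX_PROVIDERS:
        headers["x-api-key"] = key  # MiniMax authenticates via this header
    kwargs["default_headers"] = headers
    kwargs["api_key"] = key  # also satisfies the SDK's api_key check
    return kwargs
```

A side benefit of this design is that a rotated key is picked up on the next rebuild, instead of the stale value frozen into the cached kwargs.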

Verification of fix

Comparison of "Failed to rebuild" occurrences in errors.log (single user, single profile, comparable workloads):

| Period | Sessions tested (context size) | Rebuild failures |
|---|---|---|
| Pre-patch (today, 5+ sessions across 3 profiles) | mixed contexts, 5–35k tokens | 5+ |
| Post-patch (3 sessions) | 28k / 24k / 11k tokens | 0 |
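A count like the one in the table can be reproduced with a simple substring scan of the log. This sketch assumes errors.log is plain text with one message per line; the issue does not give its location, so the path is passed in explicitly, and both function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Union

NEEDLE = "Failed to rebuild shared OpenAI client"


def count_rebuild_failures(lines: Iterable[str], needle: str = NEEDLE) -> int:
    """Count log lines containing the rebuild-failure message."""
    return sum(1 for line in lines if needle in line)


def count_in_file(path: Union[str, Path], needle: str = NEEDLE) -> int:
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
    with open(path, encoding="utf-8", errors="replace") as fh:
        return count_rebuild_failures(fh, needle)
```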

Specific session post-patch (the stress test):

  • yefan, context ~28k tokens, completed in 73s, zero rebuild failures
  • Comparable pre-patch run on same profile: stalled at 10.6k tokens, hung 15+ min until killed

Related observations (separate issues, mentioning for context)

While investigating, I also hit:

  1. Setting HERMES_HOME=$profile_dir makes env_loader.py look for .env in the profile directory, missing the user-level ~/.hermes/.env. When gateways start via launchd (the plist sets HERMES_HOME=/Users/ben/.hermes/profiles/<name>), load_hermes_dotenv() reads $HERMES_HOME/.env, which doesn't exist in profile dirs by default. I worked around this by symlinking ~/.hermes/.env into each profile dir. Worth either documenting or automatically falling back to the user-level file when running under a profile.

  2. provider: minimax-oauth is rejected by the MiniMax server with 401 + "Please carry the API secret key in the 'X-Api-Key' field of the request header". OAuth tokens stored via hermes auth login minimax-oauth aren't accepted by api.minimax.io/anthropic; the server requires X-Api-Key (the Token Plan Key, format sk-cp-...). Switching provider: minimax-oauth → provider: minimax in the profile config and setting MINIMAX_API_KEY resolved auth. If minimax-oauth no longer works server-side, it might be worth deprecating it or remapping it at the client.
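For the first observation, the suggested auto-resolution could look roughly like the following. This is a hypothetical sketch, not Hermes code: `candidate_env_files` returns the profile-level .env first and the user-level ~/.hermes/.env as a fallback, without listing the same file twice.

```python
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_env_files(environ: Mapping[str, str], home: Optional[Path] = None) -> List[Path]:
    """Return .env paths to try, most specific first (hypothetical resolution order).

    1. $HERMES_HOME/.env  (profile-level, if HERMES_HOME is set)
    2. ~/.hermes/.env     (user-level fallback)
    """
    home = Path.home() if home is None else home
    user_env = home / ".hermes" / ".env"
    paths: List[Path] = []
    hermes_home = environ.get("HERMES_HOME")
    if hermes_home:
        paths.append(Path(hermes_home) / ".env")
    if user_env not in paths:  # avoid loading the same file twice
        paths.append(user_env)
    return paths
```

A loader would then read each path that exists without letting later files override keys already set, so profile values win and the user-level file fills gaps such as MINIMAX_API_KEY.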

Impact

  • All multi-step agent sessions with MiniMax-M2.7 hit the rebuild wall once context grows
  • For kanban-worker mode (multi-agent orchestration), every long task fails until manual recovery
  • Until patched, MiniMax-M2.7 is effectively limited to short single-turn sessions

Reporter notes

Happy to open a PR with the patch above if a maintainer confirms approach. Full errors.log excerpts and reproducer available if useful.

Diagnostic write-up + commit history of working around this: https://github.com/deathsky0725/heaveneye (see docs/hermes-issue-minimax-stream-stall.md + run_agent.py:2443 patch applied locally).
