hermes - 💡(How to fix) Fix [Bug]: AIAgent.init performs synchronous blocking network I/O (httpx.post) during construction

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

Additional Logs / Traceback (optional)

Root Cause

Description During cProfile benchmarking for an orchestration engine that requires spinning up ephemeral, per-turn agents, we identified a severe latency tax and architectural violation during agent instantiation. The AIAgent constructor performs a synchronous, blocking HTTP network call before returning the object. Constructors should strictly build memory state and should not establish outbound network sockets. The Call Chain A cold/warm start analysis reveals that ~96% of the instantiation time (approx. 150–160ms on a local network) is consumed by a hidden probe to a local Ollama server: AIAgent.init → self._check_compression_model_feasibility → get_model_context_length → _query_ollama_api_show → httpx._client.post (synchronous blocking network I/O) Impact

  1. Ephemeral & Multi-Agent Blocker For legacy, monolithic use cases where a single agent is instantiated once at CLI startup, this 150ms penalty is invisible. However, for architectures utilizing ephemeral agents (constructed per-turn) or dense multi-agent orchestration, this forces the runtime to pause and ping an external web server just to allocate an object, destroying the latency budget.
  2. Security & Observability (SSRF / DoS) Burying a synchronous network egress inside an object's boot sequence is a security anti-pattern. If AIAgent is instantiated in a web context or exposed to untrusted input routing, this unobservable network call introduces a potential Server-Side Request Forgery (SSRF) vector. Furthermore, because it is synchronous, a slow or hung local Ollama server will lock the main thread during object creation, creating an easy vector for thread exhaustion and Denial of Service (DoS). Proposed Solution Decouple the compression feasibility check from the constructor.

Fix Action

Fix / Workaround

We have successfully patched this in our downstream fork by extracting the probe into a lazy-evaluating Dispatcher and injecting the resolved context into the AIAgent constructor, which reduced per-turn agent instantiation time from 176ms down to ~15ms. Related Issues

#8499 — auxiliary.compression.context_length config ignored in _check_compression_model_feasibility. Same call site, different defect (config not passed through). Evidence that this function is already a known friction surface. #12977 — custom_providers.models.context_length not propagated to the auxiliary compression feasibility check. Another config-passthrough patch on the same function. The recurring pattern here suggests the right fix is structural (decouple from constructor) rather than additional parameter threading.

Hermes Agent: current main Provider: custom (Ollama, local) Profiling: cProfile, cold/warm start analysis Fork patch: Dispatcher-based lazy evaluation with DI into constructor Project contentgrove-autonomaton Sprint PMCreated by youGrove Canonical Standards4 linestextfailure-modes.md51 linesmdmother-prompt.md37 linesmdContentWhy don't you write that up. I'll file it.Drive

Code Example

import cProfile
from hermes.run_agent import AIAgent

cProfile.run("""
agent = AIAgent(
    provider="ollama",
    base_url="http://localhost:11434",
    model="your-model"
)
""", sort="cumulative")

---

for _ in range(10):
    agent = AIAgent(...)  # each iteration pays the full ~150ms

---

did not run contemporaneously

---
RAW_BUFFERClick to expand / collapse

Bug Description

Description During cProfile benchmarking for an orchestration engine that requires spinning up ephemeral, per-turn agents, we identified a severe latency tax and architectural violation during agent instantiation. The AIAgent constructor performs a synchronous, blocking HTTP network call before returning the object. Constructors should strictly build memory state and should not establish outbound network sockets. The Call Chain A cold/warm start analysis reveals that ~96% of the instantiation time (approx. 150–160ms on a local network) is consumed by a hidden probe to a local Ollama server: AIAgent.init → self._check_compression_model_feasibility → get_model_context_length → _query_ollama_api_show → httpx._client.post (synchronous blocking network I/O) Impact

  1. Ephemeral & Multi-Agent Blocker For legacy, monolithic use cases where a single agent is instantiated once at CLI startup, this 150ms penalty is invisible. However, for architectures utilizing ephemeral agents (constructed per-turn) or dense multi-agent orchestration, this forces the runtime to pause and ping an external web server just to allocate an object, destroying the latency budget.
  2. Security & Observability (SSRF / DoS) Burying a synchronous network egress inside an object's boot sequence is a security anti-pattern. If AIAgent is instantiated in a web context or exposed to untrusted input routing, this unobservable network call introduces a potential Server-Side Request Forgery (SSRF) vector. Furthermore, because it is synchronous, a slow or hung local Ollama server will lock the main thread during object creation, creating an easy vector for thread exhaustion and Denial of Service (DoS). Proposed Solution Decouple the compression feasibility check from the constructor.

Dependency Injection: Allow the feasibility state (or the resolved model_context_length) to be passed into init as an optional argument. Lazy Initialization / Async Resolution: If the context is not provided, the agent should resolve this state asynchronously after the object is instantiated (e.g., during the first actual reasoning loop), rather than blocking the constructor.

We have successfully patched this in our downstream fork by extracting the probe into a lazy-evaluating Dispatcher and injecting the resolved context into the AIAgent constructor, which reduced per-turn agent instantiation time from 176ms down to ~15ms. Related Issues

#8499 — auxiliary.compression.context_length config ignored in _check_compression_model_feasibility. Same call site, different defect (config not passed through). Evidence that this function is already a known friction surface. #12977 — custom_providers.models.context_length not propagated to the auxiliary compression feasibility check. Another config-passthrough patch on the same function. The recurring pattern here suggests the right fix is structural (decouple from constructor) rather than additional parameter threading.

Discovery Context This defect was identified during implementation of GRV-001, a governance standard for model-independent orchestration architecture. GRV-001 requires that agent runtimes treat model bindings as late-resolved configuration rather than constructor-time assumptions — making synchronous network probes during object construction a direct architectural violation. The profiling was conducted against a fork implementing GRV-001's Cognitive Router pattern, which instantiates ephemeral, tier-scoped agents per routing decision. Environment

Hermes Agent: current main Provider: custom (Ollama, local) Profiling: cProfile, cold/warm start analysis Fork patch: Dispatcher-based lazy evaluation with DI into constructor Project contentgrove-autonomaton Sprint PMCreated by youGrove Canonical Standards4 linestextfailure-modes.md51 linesmdmother-prompt.md37 linesmdContentWhy don't you write that up. I'll file it.Drive

Steps to Reproduce

  1. Start a local Ollama instance serving any model.
  2. Configure Hermes with auxiliary.compression.model pointed at the local Ollama endpoint.
  3. Profile AIAgent construction:
import cProfile
from hermes.run_agent import AIAgent

cProfile.run("""
agent = AIAgent(
    provider="ollama",
    base_url="http://localhost:11434",
    model="your-model"
)
""", sort="cumulative")
  1. Observe that httpx._client.post dominates the profile at ~150–160ms, called from _query_ollama_api_show via _check_compression_model_feasibility.
  2. Instantiate in a loop to confirm the penalty is per-construction, not a one-time cold start:
for _ in range(10):
    agent = AIAgent(...)  # each iteration pays the full ~150ms

Impact

1. Ephemeral & Multi-Agent Blocker

For legacy, monolithic use cases where a single agent is instantiated once at CLI startup, this 150ms penalty is invisible. However, for architectures utilizing ephemeral agents (constructed per-turn) or dense multi-agent orchestration, this forces the runtime to pause and ping an external web server just to allocate an object, destroying the latency budget.

2. Security & Observability (SSRF / DoS)

Burying a synchronous network egress inside an object's boot sequence is a security anti-pattern. If AIAgent is instantiated in a web context or exposed to untrusted input routing, this unobservable network call introduces a potential Server-Side Request Forgery (SSRF) vector. Furthermore, because it is synchronous, a slow or hung local Ollama server will lock the main thread during object creation, creating an easy vector for thread exhaustion and Denial of Service (DoS).

Expected Behavior

AIAgent.init should return a fully constructed object without performing any network I/O. Model context length and compression feasibility should be resolved either via dependency injection (caller provides the value) or lazily on first use during the reasoning loop. Construction time should be bounded by memory allocation, not network latency.

Actual Behavior

AIAgent.init blocks on a synchronous httpx.post to the Ollama /api/show endpoint every time an agent is constructed, adding ~150–160ms of mandatory network latency before the object is usable.

Affected Component

Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

No response

Debug Report

did not run contemporaneously

Operating System

macos 26.4.1 (25E253)

Python Version

No response

Hermes Version

No response

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

No response

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix [Bug]: AIAgent.init performs synchronous blocking network I/O (httpx.post) during construction