hermes - ✅(Solved) Fix [Bug]: HTTP 502 when using local LLM servers (llama.cpp on WSL2 + Windows 11) [1 pull requests, 1 participants]

GitHub stats
NousResearch/hermes-agent#14916 · Fetched 2026-04-24 10:44:19
Comments: 0 · Participants: 1 · Timeline: 6 · Reactions: 0
Timeline (top): labeled ×4, referenced ×2

Error Message

  • Send any message → HTTP 502 error: API call failed after 3 retries. HTTP 502: Error code: 502

Fix Action

Fixed

PR fix notes

PR #15056: fix(agent): skip TCP keepalives for local endpoints to avoid HTTP 502 (#14916)

Description (problem / solution / changelog)

What does this PR do?

Skip TCP keepalive socket-option injection for local endpoints (localhost, 127.0.0.1, RFC-1918 private ranges, etc.) in _create_openai_client().

The keepalive-enabled custom httpx.Client (added in #10324 / #11277) causes HTTP 502 errors when connecting to local LLM servers such as llama.cpp, Ollama, and vLLM, because those servers do not expect or handle SO_KEEPALIVE / TCP_KEEPIDLE socket options. Remote cloud providers still receive the keepalive injection so dead-peer detection continues to work.
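
The PR names an is_local_endpoint(base_url) guard, but its implementation is not shown on this page. The following is only a minimal sketch of how such a check could be written with the standard library, assuming that "local" means localhost, loopback, unspecified (0.0.0.0), and private addresses. The function name looks_like_local_endpoint is hypothetical and is not the project's helper.

```python
import ipaddress
from urllib.parse import urlparse


def looks_like_local_endpoint(base_url: str) -> bool:
    """Hypothetical sketch: True for localhost, loopback, 0.0.0.0, or private IPs."""
    # Parse the URL first so the scheme ("http://") is not mistaken for the host.
    # .hostname is lowercased and has the port (and IPv6 brackets) stripped.
    host = urlparse(base_url).hostname
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name other than localhost: treated as remote in this sketch.
        return False
    # is_private includes the RFC-1918 ranges; loopback and the unspecified
    # address are checked explicitly for readability.
    return addr.is_loopback or addr.is_private or addr.is_unspecified
```

With this sketch, http://localhost:1234/v1 and http://192.168.1.10:11434/v1 would count as local, while https://openrouter.ai/api/v1 would not.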

Related Issue

Fixes #14916

Type of Change

- [x] 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

- run_agent.py (line ~4570): Added is_local_endpoint(base_url) guard before injecting the keepalive httpx.Client. Local endpoints skip injection; remote endpoints continue to receive it.
- tests/run_agent/test_create_openai_client_local_endpoint.py: New test suite with 6 cases:
  - test_localhost_skips_http_client
  - test_127_0_0_1_skips_http_client
  - test_0_0_0_0_skips_http_client
  - test_openrouter_gets_http_client
  - test_cloud_provider_gets_http_client
  - test_explicit_http_client_preserved_for_localhost
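
The behaviour these test names describe can be sketched in isolation. The code below is an illustration only, not the code in run_agent.py: build_client_kwargs, KeepaliveClientStub, and LOCAL_HOSTS are hypothetical names, and the stub stands in for the real keepalive-enabled httpx.Client.

```python
from typing import Any, Optional
from urllib.parse import urlparse

# Assumption: only the hosts named in the test list are treated as local here.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "0.0.0.0"}


class KeepaliveClientStub:
    """Placeholder for a keepalive-enabled HTTP client (hypothetical)."""


def build_client_kwargs(base_url: str, http_client: Optional[Any] = None) -> dict:
    """Decide whether a keepalive client should be attached to the SDK kwargs."""
    kwargs: dict = {"base_url": base_url}
    if http_client is not None:
        # An explicitly supplied client is always preserved, even for localhost.
        kwargs["http_client"] = http_client
    elif urlparse(base_url).hostname not in LOCAL_HOSTS:
        # Remote endpoint: attach the keepalive client for dead-peer detection.
        kwargs["http_client"] = KeepaliveClientStub()
    # Local endpoint without an explicit client: no http_client key is added.
    return kwargs
```

The three branches correspond to the "skips", "gets", and "preserved" groups of test cases listed above.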

How to Test

1. Run the new tests: pytest tests/run_agent/test_create_openai_client_local_endpoint.py -v
2. Verify local endpoints skip keepalive: all 3 localhost tests should show http_client is None
3. Verify remote endpoints keep keepalive: openrouter / OpenAI tests should show http_client is injected
4. Run existing keepalive tests to confirm no regression: pytest tests/run_agent/test_create_openai_client_reuse.py tests/run_agent/test_create_openai_client_proxy_env.py -v

Checklist

Code

- [x] I've read the Contributing Guide
- [x] My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
- [x] I searched for existing PRs to make sure this isn't a duplicate
- [x] My PR contains only changes related to this fix/feature (no unrelated commits)
- [ ] I've run pytest tests/ -q and all tests pass
- [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
- [x] I've tested on my platform: macOS 15.x, Python 3.11

Documentation & Housekeeping

- [ ] I've updated relevant documentation (README, docs/, docstrings) — or N/A
- [ ] I've updated cli-config.yaml.example if I added/changed config keys — or N/A
- [ ] I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
- [ ] I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Changed files

  • run_agent.py (modified, +8/-3)
  • tests/run_agent/test_create_openai_client_local_endpoint.py (added, +153/-0)

Code Example

=== HERMES AGENT DEBUG REPORT ===
Generated: Fri Apr 24 12:50:00 CST 2026 (WSL2)

[SYSTEM]
OS: Ubuntu 24.04 (via WSL2)
Python: System / venv active
Status: Operational

[MODEL CONFIGURATION]
Model ID: Qwen3.6-35B-A3B-Q4_K_M.gguf
Provider: custom (llama-local)
Base URL: http://localhost:1234/v1
Connection: Active (Responding)
Quantization: 4-bit (Q4_K_M) -> ⚠️ Low precision for complex tasks

[AGENT SETTINGS]
Max Turns: 90
Reasoning Effort: Medium
Compression: Enabled (Threshold 0.7, Target Ratio 0.2)
Toolsets: ['hermes-cli']

[SERVICES STATUS]
llama-server: Running (Port 1234)
Training Runs: None active
Gateway: Healthy

[KNOWN ISSUES / NOTES]
- Q4_K_M quantization may cause logic degradation compared to Q8_0/FP16.
- No fallback providers configured in config.yaml.


Bug Description

In run_agent.py, _create_openai_client() contains skip logic for localhost endpoints that never triggers, because it calls startswith() on the full URL rather than on the hostname:

# Bug: "http://localhost:1234/v1".startswith("localhost") == False!
_skip_local_http_client = str(client_kwargs.get("base_url", "")).startswith(("localhost", "127.0.0.1"))

The base_url is "http://localhost:1234/v1" which starts with http://, not localhost. Therefore _skip_local_http_client is always False, and a custom httpx.Client with TCP keepalive socket options gets injected into the OpenAI SDK. This causes HTTP 502 errors when connecting to local endpoints.
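
The difference can be reproduced in a few lines. This is a standalone illustration using the base URL from the report; the hostname comparison via urllib.parse is shown as one possible alternative, not as the project's code.

```python
from urllib.parse import urlparse

base_url = "http://localhost:1234/v1"

# Prefix check from the bug report: compares against the start of the whole URL.
prefix_match = base_url.startswith(("localhost", "127.0.0.1"))

# Hostname check: parse the URL first, then compare only the host part.
host_match = urlparse(base_url).hostname in ("localhost", "127.0.0.1")

print(prefix_match)  # False: the string begins with "http://"
print(host_match)    # True
```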

Environment:

  • OS: WSL2 Ubuntu 24.04 + Windows 11 host
  • Hermes Agent version: Latest (post v0.11.0 release)
  • OpenAI SDK: 2.31.0
  • httpx: 0.28.1
  • Local server: llama.cpp llama-server on localhost:1234

The bug was introduced in commit 8c478983e (Teknium, Apr 16, 2026) which added TCP keepalive socket options for detecting dead peer connections (#10324). The skip logic uses the wrong string method — startswith() checks from position 0, but URLs start with their scheme (http:// or https://).
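
For context, this is roughly what "TCP keepalive socket options" means at the socket level. The sketch uses only Python's socket module; the helper name enable_tcp_keepalive and the timing values are illustrative assumptions and are not taken from the Hermes commit.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle: int = 30,
                         interval: int = 10, count: int = 3) -> None:
    """Turn on keepalive probes so a silently dead peer is eventually detected."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning constants are platform-specific (for example, TCP_KEEPIDLE is
    # not defined on every OS), so each one is applied only when available.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


# No connection is made; the options are set and read back on an unconnected socket.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_tcp_keepalive(s)
    keepalive_on = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```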

Steps to Reproduce

  • Run llama-server (or Ollama/vLLM) on localhost:1234

  • Configure ~/.hermes/config.yaml:

    model:
      provider: custom
      base_url: http://localhost:1234/v1
      api_key: llama-local
  • Start TUI with hermes --tui

  • Send any message → HTTP 502 error

Expected Behavior

Local endpoints should work without errors. The custom httpx.Client injection should be skipped for localhost/127.0.0.1 URLs.

Actual Behavior

All API calls to local endpoints fail with:

API call failed after 3 retries. HTTP 502: Error code: 502

Affected Component

CLI (interactive chat)

Messaging Platform (if gateway-related)

N/A (CLI only)

Operating System

Windows 11

Python Version

3.12.3

Hermes Version

Hermes Agent v0.11.0 (2026.4.23)

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

The bug is in run_agent.py, function _create_openai_client(). The localhost detection logic uses startswith() which fails because URLs start with their scheme (http://), not the hostname.

Fix: Change from:

_skip_local_http_client = str(client_kwargs.get("base_url", "")).startswith(("localhost", "127.0.0.1"))

To:

_base_url_raw = str(client_kwargs.get("base_url", "")).lower()
_skip_local_http_client = "localhost" in _base_url_raw or "127.0.0.1" in _base_url_raw

This uses substring matching (in) instead of prefix matching (startswith()), so it identifies localhost URLs regardless of their scheme prefix. The .lower() call makes the match case-insensitive for edge cases such as an uppercase hostname (for example, http://LOCALHOST:1234/).
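
One trade-off of substring matching is worth seeing concretely: it inspects the whole URL, so a remote host whose name merely contains "localhost" also matches. The sketch below compares the proposed logic with a hostname comparison. Both function names are hypothetical, and localhost.example.com is a made-up host used only for illustration.

```python
from urllib.parse import urlparse


def substring_is_local(base_url: str) -> bool:
    # Same logic as the proposed fix: case-insensitive substring match.
    raw = str(base_url).lower()
    return "localhost" in raw or "127.0.0.1" in raw


def hostname_is_local(base_url: str) -> bool:
    # Alternative: compare only the parsed host (urlparse lowercases it).
    return urlparse(str(base_url)).hostname in ("localhost", "127.0.0.1")


for url in (
    "http://localhost:1234/v1",
    "HTTP://LOCALHOST:1234/v1",
    "https://localhost.example.com/v1",  # hypothetical remote host
):
    print(url, substring_is_local(url), hostname_is_local(url))
```

Both checks agree on the first two URLs. Only the substring check reports the third as local.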

Are you willing to submit a PR for this?

- [x] I'd like to fix this myself and submit a PR

extent analysis

TL;DR

The fix involves changing the localhost detection logic in _create_openai_client() to use substring matching instead of prefix matching.

Guidance

  • Identify the run_agent.py file and locate the _create_openai_client() function.
  • Replace the existing _skip_local_http_client line with the proposed fix, which uses the in operator for substring matching.
  • Verify that the fix works by running the application and checking for HTTP 502 errors when connecting to local endpoints.
  • Test the application with different URL formats, such as http://localhost:1234/v1, https://127.0.0.1:1234/v1, and a mixed-case hostname such as http://LocalHost:1234/v1, to confirm that the match is case-insensitive and works with different schemes.

Example

_base_url_raw = str(client_kwargs.get("base_url", "")).lower()
_skip_local_http_client = "localhost" in _base_url_raw or "127.0.0.1" in _base_url_raw

Notes

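The proposed lines already tolerate a missing or non-string base_url: client_kwargs.get("base_url", "") falls back to an empty string and str() coerces other types. The real limitation is that substring matching inspects the whole URL, so a remote URL that merely contains "localhost" would also match, and other local addresses such as 0.0.0.0 or RFC-1918 private ranges are not covered. PR #15056 uses an is_local_endpoint(base_url) guard instead.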

Recommendation

Apply the proposed workaround by changing the localhost detection logic to use substring matching. This fix should resolve the HTTP 502 errors when connecting to local endpoints.
