hermes - ✅(Solved) Fix tracking: provider transport refactor (agent/transports/) [3 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#13473Fetched 2026-04-22 08:06:15
View on GitHub
Comments
0
Participants
1
Timeline
6
Reactions
0
Participants
Timeline (top)
cross-referenced ×4labeled ×1subscribed ×1

This is a two-cycle refactor of hermes-agent's provider infrastructure.

Cycle 1 (this issue): Transport layer — Extract format conversion and response normalization from run_agent.py into agent/transports/. Each transport owns convert_messages, convert_tools, build_kwargs, normalize_response. Client lifecycle, streaming, credentials, and prompt caching stay on AIAgent.

Cycle 2 (future): Provider modules — Consolidate per-provider quirks (currently scattered across 5+ files) into single-file provider definitions under providers/. Each provider module declares its auth, endpoints, client headers, temperature behavior, max_tokens defaults, message preprocessing, and extra_body construction in one place. Transports become generic — they read from the provider object instead of checking boolean flags. See Cycle 2 Design below.

Principle: Every PR wires its code to real production paths in the same PR. No dormant abstractions.


Root Cause

Problem Cycle 1 leaves behind: Provider quirks are still scattered across auth.py, runtime_provider.py, models.py, auxiliary_client.py, run_agent.py, and the transports themselves. Adding a new provider requires touching 5+ files. The ChatCompletionsTransport takes 20+ boolean params because each provider's quirks are passed as flags.

Fix Action

Fix / Workaround

PRStatusWhat it doesLines
PR 1 #12975✅ MergedExtract 10 Codex Responses API functions into agent/codex_responses_adapter.py-565 from run_agent.py
PR 2 #13347✅ MergedAdd agent/transports/types.py (NormalizedResponse, ToolCall, Usage) + migrate Anthropic normalize path+554
PR 3 #13366✅ MergedAdd ProviderTransport ABC + AnthropicTransport, wire all Anthropic paths (9 sites)+539/-45
PR 4 #13430🔄 OpenAdd ResponsesApiTransport, wire all Codex paths, remove 7 dead wrappers+590/-169
PR 5 #13447🔄 OpenAdd ChatCompletionsTransport, wire all default paths (210-line kwargs block extracted)+640/-227
PR 6 #13467🔄 OpenAdd BedrockTransport, wire all Bedrock paths+383/-13
PR 7📋 PlannedUnify dispatch — remove dead api_mode branches, collapse normalize shimsDep: 4,5,6
PR 8📋 PlannedSimplify runtime_provider.py — transport registry replaces manual api_mode routingDep: 7
PR 9📋 PlannedDocumentation — architecture guide, transport authoring guideDep: 8
  1. reasoning_content vs reasoning — two distinct fields downstream, transport merges them into reasoning. The thinking-prefill check reads reasoning_content separately.
  2. Prompt caching runs between convert and build_kwargsapply_anthropic_cache_control mutates messages after conversion. Transport can't produce final API-ready messages alone.
  3. ChatCompletionsTransport has 13 provider conditionals — flags passed as explicit params. Works but the param list is long. This is the primary motivation for Cycle 2.
  4. flush_memories and iteration_limit_summary have their own normalize dispatch — wired through transports now but still have separate code paths.
  5. Bedrock normalizes at dispatch, not at the main loop — the transport handles both shapes (raw boto3 dict + already-normalized SimpleNamespace).
  6. _ephemeral_max_output_tokens is consumed by both Anthropic and chat_completions branches — shared agent state that both transports need.

PR fix notes

PR #13805: feat: add ChatCompletionsTransport + wire all default paths

Description (problem / solution / changelog)

Salvages #13447 with regression fixes and Kimi port.

Third transport — handles the default chat_completions api_mode used by ~16 OpenAI-compatible providers. Closes the main PR 5 of the transport refactor series (issue #13473).

Changes vs #13447

  • Preserve tool_call.extra_content (Gemini thought_signature) via ToolCall.provider_data — the original shim stripped it, causing 400 errors on multi-turn Gemini 3 thinking.
  • Preserve reasoning_content distinctly from reasoning (DeepSeek/Moonshot) so the thinking-prefill retry check still triggers.
  • Port Kimi/Moonshot quirks that landed on main after the original PR (32000 max_tokens default, top-level reasoning_effort, extra_body.thinking).
  • Skip the SimpleNamespace shim in the main normalize loop — for chat_completions, response.choices[0].message is already the right shape.

Impact

run_agent.py: -239 lines in _build_api_kwargs default branch.

Transport coverage

api_modeTransportbuild_kwargsnormalizevalidate
anthropic_messagesAnthropicTransport
codex_responsesResponsesApiTransport
chat_completionsChatCompletionsTransport
bedrock_converse— (PR #13467)

Validation

Result
New transport tests39 pass (8 build_kwargs, 5 Kimi, 4 validate, 4 normalize, 3 cache, 3 basic)
tests/run_agent/885/885 pass (+ 15 skipped; the single test_concurrent_interrupt failure is a pre-existing flake on origin/main)
E2E — Gemini extra_contentLive check with real openai.types.chat.ChatCompletionMessageToolCall: provider_data["extra_content"] preserved ✅
E2E — Kimi build_kwargsmax_tokens=32000, reasoning_effort=high, extra_body.thinking={"type":"enabled"} ✅
E2E — Kimi thinking-offreasoning_effort omitted, thinking={"type":"disabled"} ✅
E2E — reasoning_contentpreserved separately in provider_data ✅

Closes #13447 (merging this credits @kshitijk4poor's original work).

Changed files

  • agent/transports/__init__.py (modified, +4/-0)
  • agent/transports/chat_completions.py (added, +387/-0)
  • run_agent.py (modified, +91/-239)
  • tests/agent/transports/test_chat_completions.py (added, +349/-0)

PR #13814: feat: add BedrockTransport + wire all Bedrock transport paths

Description (problem / solution / changelog)

Salvages #13467. Fourth and final transport — completes the transport layer with all four api_modes covered (issue #13473, Cycle 1).

Changes vs #13467

One adjustment from the original: the main normalize loop does NOT add a bedrock_converse branch to invoke normalize_response on the response. Bedrock's normalize_converse_response runs at the dispatch site (run_agent.py:5189), so the response already has the OpenAI-compatible .choices[0].message shape by the time the main loop sees it. Falling through to the chat_completions else branch is correct and sidesteps a redundant NormalizedResponse rebuild.

Everything else is preserved: build_kwargs, validate_response, finish_reason branch, normalize_response (still usable by direct callers), map_finish_reason.

Transport coverage — COMPLETE

api_modeTransportbuild_kwargsnormalizevalidate
anthropic_messagesAnthropicTransport
codex_responsesResponsesApiTransport
chat_completionsChatCompletionsTransport
bedrock_converseBedrockTransport

Validation

Result
BedrockTransport tests18 pass
All transport tests117 pass
All bedrock/converse tests across tests/agent/160 pass
tests/run_agent/885/885 pass (+ 15 skipped; the single test_concurrent_interrupt failure is pre-existing on origin/main)
E2E — build_kwargsmodel, region, max_tokens, guardrail ✅
E2E — validate_responseraw dict + normalized SimpleNamespace ✅
E2E — normalize_responsetext + tool call ✅
E2E — _build_api_kwargs integration

Closes #13467 (merging this credits @kshitijk4poor's original work).

Changed files

  • agent/transports/__init__.py (modified, +4/-0)
  • agent/transports/bedrock.py (added, +154/-0)
  • run_agent.py (modified, +30/-13)
  • tests/agent/transports/test_bedrock_transport.py (added, +164/-0)

PR #13862: refactor: unify transport dispatch + collapse normalize shims

Description (problem / solution / changelog)

Summary

PR 7 of the provider transport refactor (#13473). PRs 1-6 wired all 4 transports to production paths. This PR consolidates the wiring.

What this PR does

1. Consolidate 4 transport helpers → 1

Replace _get_anthropic_transport(), _get_codex_transport(), _get_chat_completions_transport(), _get_bedrock_transport() with one generic _get_transport(api_mode=None) that uses a shared dict cache. 22 call sites updated — no behavioral change, just less boilerplate.

2. Collapse 65-line main normalize block → 7 lines

Before (3 branches, each with its own SimpleNamespace construction):

if self.api_mode == "codex_responses":
    _ct = self._get_transport()
    _cnr = _ct.normalize_response(response)
    # ... 35 lines of SimpleNamespace shim with codex-specific fields ...
elif self.api_mode == "anthropic_messages":
    _transport = self._get_transport()
    _nr = _transport.normalize_response(response, strip_tool_prefix=...)
    # ... 26 lines of SimpleNamespace shim ...
else:
    assistant_message = response.choices[0].message

After:

_transport = self._get_transport()
_normalize_kwargs = {}
if self.api_mode == "anthropic_messages":
    _normalize_kwargs["strip_tool_prefix"] = self._is_anthropic_oauth
_nr = _transport.normalize_response(response, **_normalize_kwargs)
assistant_message = self._nr_to_assistant_message(_nr)
finish_reason = _nr.finish_reason

The shared _nr_to_assistant_message() static method handles all 4 api_modes — extracts provider_data fields (codex_reasoning_items, reasoning_details, call_id, response_item_id) into the SimpleNamespace shape downstream expects.

3. Wire chat_completions + bedrock normalize through transports

These were previously falling through to the raw response.choices[0].message else branch. Now all 4 api_modes go through transport.normalize_response().

4. Remove 8 dead codex adapter imports

_chat_content_to_responses_parts, _codex_chat_messages_to_responses_input, _codex_normalize_codex_response, _codex_preflight_codex_api_kwargs, _codex_preflight_codex_input_items, _codex_responses_tools, _codex_extract_responses_message_text, _codex_extract_responses_reasoning_text — all have zero callers after PRs 1-6.

Impact

  • run_agent.py: -46 lines (11,988 → 11,941)
  • 1 test file updated (import path change)

What stays as-is (per plan)

  • flush_memories and _iteration_limit_summary secondary dispatch — low-traffic auxiliary paths, follow-up issue
  • The _nr_to_assistant_message shim itself — removed when downstream code migrates to read NormalizedResponse directly

Test plan

  • 2,605 run_agent + agent tests pass (4 pre-existing failures)
  • Zero regressions from our changes
  • 6K+ errors in other dirs are pre-existing environment issue (same on main)

Changed files

  • run_agent.py (modified, +76/-123)
  • tests/run_agent/test_run_agent_multimodal_prologue.py (modified, +2/-1)

Code Example

@dataclass
class ToolCall:
    id: str | None          # Protocol's canonical ID (call_XXXX, toolu_XXXX, etc.)
    name: str
    arguments: str          # JSON string
    provider_data: dict | None = None   # Per-tool-call protocol metadata

@dataclass
class NormalizedResponse:
    content: str | None
    tool_calls: list[ToolCall] | None
    finish_reason: str                  # "stop", "tool_calls", "length", "content_filter"
    reasoning: str | None = None        # Cross-provider (Anthropic, Codex, DeepSeek, Gemini)
    usage: Usage | None = None
    provider_data: dict | None = None   # Response-level protocol state

---

PR1 ──→ PR4
PR2 ──→ PR3 ──→ PR4
              ──→ PR5
              ──→ PR6
                    PR4+5+6 ──→ PR7 ──→ PR8 ──→ PR9

---

# providers/kimi.py
class KimiProvider:
    name = "kimi-coding"
    aliases = ["kimi", "moonshot"]
    api_mode = "chat_completions"
    
    # Auth (currently in hermes_cli/auth.py)
    env_vars = ["KIMI_API_KEY", "MOONSHOT_API_KEY"]
    base_url = "https://api.kimi.com/v1"
    
    # Client quirks (currently in run_agent.py __init__)
    default_headers = {"User-Agent": "hermes-agent/1.0"}
    
    # Request quirks (currently in auxiliary_client.py)
    fixed_temperature = 0.6
    default_max_tokens = None

---

# providers/nvidia.py
class NvidiaProvider:
    name = "nvidia"
    api_mode = "chat_completions"
    env_vars = ["NVIDIA_API_KEY"]
    base_url = "https://integrate.api.nvidia.com/v1"
    default_max_tokens = 16384  # GLM-4.7 thinking exhaust fix

---

# providers/qwen.py
class QwenPortalProvider:
    name = "qwen-portal"
    api_mode = "chat_completions"
    env_vars = ["QWEN_API_KEY"]
    base_url = "https://portal.qwen.ai/api/v1"
    default_max_tokens = 65536
    
    def prepare_messages(self, messages):
        """Normalize content to list-of-dicts, inject cache_control."""
        ...
    
    def extra_body(self, session_id):
        return {
            "metadata": {"sessionId": session_id},
            "vl_high_resolution_images": True,
        }
RAW_BUFFERClick to expand / collapse

Overview

This is a two-cycle refactor of hermes-agent's provider infrastructure.

Cycle 1 (this issue): Transport layer — Extract format conversion and response normalization from run_agent.py into agent/transports/. Each transport owns convert_messages, convert_tools, build_kwargs, normalize_response. Client lifecycle, streaming, credentials, and prompt caching stay on AIAgent.

Cycle 2 (future): Provider modules — Consolidate per-provider quirks (currently scattered across 5+ files) into single-file provider definitions under providers/. Each provider module declares its auth, endpoints, client headers, temperature behavior, max_tokens defaults, message preprocessing, and extra_body construction in one place. Transports become generic — they read from the provider object instead of checking boolean flags. See Cycle 2 Design below.

Principle: Every PR wires its code to real production paths in the same PR. No dormant abstractions.


Shared Types (agent/transports/types.py)

@dataclass
class ToolCall:
    id: str | None          # Protocol's canonical ID (call_XXXX, toolu_XXXX, etc.)
    name: str
    arguments: str          # JSON string
    provider_data: dict | None = None   # Per-tool-call protocol metadata

@dataclass
class NormalizedResponse:
    content: str | None
    tool_calls: list[ToolCall] | None
    finish_reason: str                  # "stop", "tool_calls", "length", "content_filter"
    reasoning: str | None = None        # Cross-provider (Anthropic, Codex, DeepSeek, Gemini)
    usage: Usage | None = None
    provider_data: dict | None = None   # Response-level protocol state

Cycle 1: PR Tracker

PRStatusWhat it doesLines
PR 1 #12975✅ MergedExtract 10 Codex Responses API functions into agent/codex_responses_adapter.py-565 from run_agent.py
PR 2 #13347✅ MergedAdd agent/transports/types.py (NormalizedResponse, ToolCall, Usage) + migrate Anthropic normalize path+554
PR 3 #13366✅ MergedAdd ProviderTransport ABC + AnthropicTransport, wire all Anthropic paths (9 sites)+539/-45
PR 4 #13430🔄 OpenAdd ResponsesApiTransport, wire all Codex paths, remove 7 dead wrappers+590/-169
PR 5 #13447🔄 OpenAdd ChatCompletionsTransport, wire all default paths (210-line kwargs block extracted)+640/-227
PR 6 #13467🔄 OpenAdd BedrockTransport, wire all Bedrock paths+383/-13
PR 7📋 PlannedUnify dispatch — remove dead api_mode branches, collapse normalize shimsDep: 4,5,6
PR 8📋 PlannedSimplify runtime_provider.py — transport registry replaces manual api_mode routingDep: 7
PR 9📋 PlannedDocumentation — architecture guide, transport authoring guideDep: 8

Dependency Graph

PR1 ──→ PR4
PR2 ──→ PR3 ──→ PR4
              ──→ PR5
              ──→ PR6
                    PR4+5+6 ──→ PR7 ──→ PR8 ──→ PR9

What the Transport Owns vs What Stays on AIAgent

Transport ownsAIAgent keeps
convert_messages() — OpenAI msgs → provider formatClient construction (build_anthropic_client, etc.)
convert_tools() — OpenAI tools → provider formatClient rebuild/teardown on interrupt
build_kwargs() — assemble full API call kwargsCredential refresh/rotation
normalize_response() → NormalizedResponseStreaming (_call_anthropic, _run_codex_stream)
validate_response() — structural checkPrompt caching policy
extract_cache_stats() — provider-specific cache tokensRetry/interrupt threading
map_finish_reason() — provider stop reason → OpenAIFallback provider routing

Transport Coverage

api_modeTransportbuild_kwargsnormalizevalidatecache_statsfinish_reason
anthropic_messagesAnthropicTransport
codex_responsesResponsesApiTransport
chat_completionsChatCompletionsTransport
bedrock_converseBedrockTransport

Abort Points

Each PR delivers standalone value. Safe stopping points:

  • After PR 3 — one transport proven end-to-end, types established
  • After PR 6 — all 4 transports wired, transport layer complete
  • After PR 8 — runtime simplified, full Cycle 1 done
  • After PR 9 — documented, ready for Cycle 2

Known Gaps (from codebase stress test)

  1. reasoning_content vs reasoning — two distinct fields downstream, transport merges them into reasoning. The thinking-prefill check reads reasoning_content separately.
  2. Prompt caching runs between convert and build_kwargsapply_anthropic_cache_control mutates messages after conversion. Transport can't produce final API-ready messages alone.
  3. ChatCompletionsTransport has 13 provider conditionals — flags passed as explicit params. Works but the param list is long. This is the primary motivation for Cycle 2.
  4. flush_memories and iteration_limit_summary have their own normalize dispatch — wired through transports now but still have separate code paths.
  5. Bedrock normalizes at dispatch, not at the main loop — the transport handles both shapes (raw boto3 dict + already-normalized SimpleNamespace).
  6. _ephemeral_max_output_tokens is consumed by both Anthropic and chat_completions branches — shared agent state that both transports need.

Cycle 2: Provider Modules (Next)

Problem Cycle 1 leaves behind: Provider quirks are still scattered across auth.py, runtime_provider.py, models.py, auxiliary_client.py, run_agent.py, and the transports themselves. Adding a new provider requires touching 5+ files. The ChatCompletionsTransport takes 20+ boolean params because each provider's quirks are passed as flags.

Solution: Consolidate per-provider quirks into single-file provider modules under providers/. Each module declares everything about that provider in one place:

# providers/kimi.py
class KimiProvider:
    name = "kimi-coding"
    aliases = ["kimi", "moonshot"]
    api_mode = "chat_completions"
    
    # Auth (currently in hermes_cli/auth.py)
    env_vars = ["KIMI_API_KEY", "MOONSHOT_API_KEY"]
    base_url = "https://api.kimi.com/v1"
    
    # Client quirks (currently in run_agent.py __init__)
    default_headers = {"User-Agent": "hermes-agent/1.0"}
    
    # Request quirks (currently in auxiliary_client.py)
    fixed_temperature = 0.6
    default_max_tokens = None
# providers/nvidia.py
class NvidiaProvider:
    name = "nvidia"
    api_mode = "chat_completions"
    env_vars = ["NVIDIA_API_KEY"]
    base_url = "https://integrate.api.nvidia.com/v1"
    default_max_tokens = 16384  # GLM-4.7 thinking exhaust fix
# providers/qwen.py
class QwenPortalProvider:
    name = "qwen-portal"
    api_mode = "chat_completions"
    env_vars = ["QWEN_API_KEY"]
    base_url = "https://portal.qwen.ai/api/v1"
    default_max_tokens = 65536
    
    def prepare_messages(self, messages):
        """Normalize content to list-of-dicts, inject cache_control."""
        ...
    
    def extra_body(self, session_id):
        return {
            "metadata": {"sessionId": session_id},
            "vl_high_resolution_images": True,
        }

What changes:

  • Transport's build_kwargs receives a provider object instead of 20 flags
  • hermes_cli/auth.py reads ProviderConfig from provider modules
  • hermes_cli/runtime_provider.py resolves api_mode from provider registry
  • hermes_cli/models.py reads model lists from provider modules
  • auxiliary_client.py reads temperature/aux config from provider modules

What this enables:

  • Adding a new OpenAI-compatible provider = one file (providers/newprovider.py)
  • Each provider's behavior is testable in isolation
  • No more "search 5 files to understand how Kimi works"

Current quirk distribution (what Cycle 2 consolidates)

QuirkProviderCurrently inMoves to
Fixed temperature 0.6Kimiauxiliary_client.pyproviders/kimi.py
User-Agent headerKimirun_agent.py client initproviders/kimi.py
Default max_tokens 16384NVIDIAChatCompletionsTransportproviders/nvidia.py
Default max_tokens 65536QwenChatCompletionsTransportproviders/qwen.py
Message normalizationQwenrun_agent.py + transportproviders/qwen.py
vl_high_resolution_imagesQwenChatCompletionsTransportproviders/qwen.py
Developer role swapGPT-5/CodexChatCompletionsTransportproviders/openai_codex.py
think=false suppressionOllama/customChatCompletionsTransportproviders/custom.py
num_ctx overrideOllamaChatCompletionsTransportproviders/custom.py
Provider preferencesOpenRouterChatCompletionsTransportproviders/openrouter.py
Product attribution tagsNousChatCompletionsTransportproviders/nous.py
Reasoning extra_bodyOR/Nous/GitHubChatCompletionsTransporteach provider module
xAI conv headersxAI/GrokResponsesApiTransportproviders/xai.py
Thinking signaturesAnthropicAnthropicTransport → adapterproviders/anthropic.py
Guardrail configBedrockBedrockTransportproviders/bedrock.py
OAuth identity transformAnthropicadapterproviders/anthropic.py
Encrypted reasoningCodex/xAIResponsesApiTransporteach provider module

extent analysis

TL;DR

The primary motivation for Cycle 2 is to consolidate per-provider quirks into single-file provider modules, making it easier to add new providers and reduce code duplication.

Guidance

  1. Review the current quirk distribution: Examine the table in the issue body to understand how provider quirks are currently scattered across multiple files and how they will be consolidated in Cycle 2.
  2. Create provider modules: Start creating provider modules under providers/ for each provider, declaring their auth, endpoints, client headers, temperature behavior, and other quirks in one place.
  3. Update transports to use provider objects: Modify the transports to receive a provider object instead of multiple flags, allowing them to read the necessary information from the provider module.
  4. Refactor affected files: Update files like hermes_cli/auth.py, hermes_cli/runtime_provider.py, hermes_cli/models.py, and auxiliary_client.py to read configuration from the provider modules.

Example

# providers/kimi.py
class KimiProvider:
    name = "kimi-coding"
    aliases = ["kimi", "moonshot"]
    api_mode = "chat_completions"
    env_vars = ["KIMI_API_KEY", "MOONSHOT_API_KEY"]
    base_url = "https://api.kimi.com/v1"
    default_headers = {"User-Agent": "hermes-agent/1.0"}
    fixed_temperature = 0.6
    default_max_tokens = None

Notes

The solution involves significant refactoring, and it's essential to ensure that all provider quirks are correctly consolidated and that the transports are updated to use the new provider modules.

Recommendation

Apply the workaround by creating provider modules and updating the transports to use them, as this will simplify the codebase and make it easier to add new providers.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - ✅(Solved) Fix tracking: provider transport refactor (agent/transports/) [3 pull requests, 1 participants]