hermes - ✅(Solved) Fix Wrapped API errors lose nested body details and misclassify transient 402s as billing failures [4 pull requests, 1 participant]

GitHub stats
NousResearch/hermes-agent#14195 (fetched 2026-04-23 07:46:12)
Comments: 0 · Participants: 1 · Timeline events: 10 · Reactions: 0
Timeline (top): cross-referenced ×4, labeled ×3, referenced ×3

classify_error() walks __cause__ / __context__ to extract a nested status code, but _extract_error_body() only inspects the top-level exception. When an SDK/API error is wrapped, Hermes can keep the nested 402 status code but lose the nested body message that distinguishes transient rate limits from billing failures.

Error Message

  • status_code = _extract_status_code(error) walks the cause chain
  • body = _extract_error_body(error) does not walk the cause chain

So for a wrapped error like this:

  • outer exception: Exception("outer")
  • __cause__: provider/SDK exception with status_code=402 and body message "Usage limit reached, try again in 5 minutes"

the nested 402 survives but the nested body is lost. To reproduce, wrap a mock API error with status_code = 402 and body = {"error": {"message": "Usage limit reached, try again in 5 minutes"}} inside an outer exception, then pass the outer exception to classify_error().

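Root Cause

_extract_error_body() only inspects the top-level exception, while _extract_status_code() follows __cause__ / __context__. For a wrapped SDK error, the nested 402 status code survives but the nested body message is lost, so the error is classified as billing.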

Expected

Classify as a transient/rate-limit condition, because the nested body explicitly says to retry later.

Fix Action

Fixed

PR fix notes

PR #14219: fix(agent): preserve nested error bodies during classification

Description (problem / solution / changelog)

Summary

  • walk wrapped exceptions when extracting structured API error bodies
  • keep nested provider messages available for message-aware classification
  • add regression coverage for wrapped 402 rate-limit bodies

Root cause

  • classify_api_error() already walks __cause__ / __context__ for status codes
  • _extract_error_body() only inspected the top-level exception
  • wrapped SDK errors therefore kept status_code=402 but lost the nested body message that says the condition is transient

Fix

  • make _extract_error_body() traverse the same cause/context chain as _extract_status_code()
  • stop at the first structured body or response.json() payload found
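Below is a minimal sketch of that traversal. It assumes an SDK error exposes either a dict-valued body attribute or a response object with a json() method, as requests and httpx responses do. The names extract_error_body() and _body_of() are hypothetical, and this is not the code merged in PR #14219. The top-level exception is checked first, the walk stops at the first structured body, and a seen-set guards against cyclic chains.

```python
from typing import Optional


def _body_of(error: BaseException) -> Optional[dict]:
    """Hypothetical helper: structured body of ONE exception, or None."""
    body = getattr(error, "body", None)  # assumed SDK attribute
    if isinstance(body, dict) and body:
        return body
    response = getattr(error, "response", None)  # assumed HTTP response object
    json_method = getattr(response, "json", None)
    if callable(json_method):
        try:
            payload = json_method()
        except Exception:  # a non-JSON response body is skipped, not fatal
            return None
        if isinstance(payload, dict) and payload:
            return payload
    return None


def extract_error_body(error: Optional[BaseException]) -> dict:
    """Check the error itself first, then its __cause__/__context__ chain."""
    seen: set = set()  # ids already visited; guards against cyclic chains
    while error is not None and id(error) not in seen:
        seen.add(id(error))
        body = _body_of(error)
        if body is not None:
            return body  # first structured body wins (top level has precedence)
        # Prefer the explicit cause ("raise ... from"), else the implicit context.
        error = error.__cause__ or error.__context__
    return {}


# Usage: a wrapper exception hiding an SDK error whose body lives on .response.
class _FakeResponse:
    def json(self) -> dict:
        return {"error": {"message": "Usage limit reached, try again in 5 minutes"}}


class _FakeSDKError(Exception):
    def __init__(self) -> None:
        super().__init__("402")
        self.status_code = 402
        self.response = _FakeResponse()


outer = Exception("outer")
outer.__cause__ = _FakeSDKError()
print(extract_error_body(outer)["error"]["message"])  # Usage limit reached, try again in 5 minutes
```

Stopping at the first body found keeps a wrapper's own payload authoritative when it has one. The nested SDK body is used only when the wrapper carries nothing structured.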

Regression coverage

  • add a helper-level test proving _extract_error_body() reads the nested body from __cause__
  • add an end-to-end classifier test proving a wrapped 402 with "try again" now classifies as rate_limit

Testing

  • scripts/run_tests.sh tests/agent/test_error_classifier.py::TestExtractErrorBody::test_from_cause_chain tests/agent/test_error_classifier.py::TestClassifyApiError::test_wrapped_402_uses_nested_body_message
  • scripts/run_tests.sh tests/agent/test_error_classifier.py
  • scripts/run_tests.sh tests/agent/test_gemini_cloudcode.py::TestGeminiHttpErrorParsing::test_status_code_flows_through_error_classifier

Closes #14195

Changed files

  • agent/error_classifier.py (modified, +19/-13)
  • tests/agent/test_error_classifier.py (modified, +25/-0)

PR #14287: fix(agent): preserve nested API error bodies

Description (problem / solution / changelog)

Summary

  • make _extract_error_body walk cause/context the same way _extract_status_code already does
  • preserve nested SDK body messages when an outer wrapper exception hides the real API payload
  • add regressions for wrapped body extraction and wrapped 402 transient-limit classification

Testing

  • python3 -m pytest -o addopts='' tests/agent/test_error_classifier.py

Fixes #14195

Changed files

  • agent/error_classifier.py (modified, +19/-13)
  • tests/agent/test_error_classifier.py (modified, +27/-0)

PR #2: fix: apply P1 issue hardening (todo/registry/retry)

Description (problem / solution / changelog)

Conclusion

Tests for the core reproduction cases of four open P1 GitHub issues were added first. Runtime defensive logic was then applied with minimal changes.

Issues addressed

Change summary

  1. todo_tool hardening (#14185)
  • If todos is a string, attempt to parse it as JSON
  • Return a structured error if parsing fails
  • Add list type validation
  • Guard against non-dict inputs in _validate and _dedupe_by_id
  2. registry dispatch hardening (#14186)
  • Add _normalize_tool_name()
  • Handle CamelCase, the _tool suffix, and separator drift through a normalization fallback
  • e.g., TodoTool_tool -> todo, WriteFile_tool -> write_file
  3. run_agent classification fix (#14271)
  • Exclude json.JSONDecodeError from the local-validation fast-fail classification
  • Ensure transient JSON parsing failures take the retry path
  4. error classifier hardening (#14195)
  • Modify _extract_error_body() to follow the __cause__/__context__ chain and extract the nested body
  • Preserve the nested message ("usage limit ... try again") on wrapped 402 errors
  • Reduce billing misclassifications and improve the accuracy of transient rate_limit classification
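The registry-dispatch normalization fallback described above can be sketched as follows. normalize_tool_name() and resolve_tool() are hypothetical illustrations of the described behavior: CamelCase, the _tool suffix and separator drift. The real _normalize_tool_name() in tools/registry.py may differ. The sketch reproduces the two example mappings listed above.

```python
import re
from typing import Optional


def normalize_tool_name(name: str) -> str:
    """Hypothetical normalizer in the spirit of _normalize_tool_name()."""
    # Separator drift: treat runs of spaces, dots, and hyphens as "_".
    name = re.sub(r"[\s.\-]+", "_", name.strip())
    # CamelCase -> snake_case: insert "_" between a lowercase/digit and an uppercase.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    name = re.sub(r"_+", "_", name).lower().strip("_")
    # Drop trailing "_tool" suffixes ("todo_tool_tool" -> "todo").
    while name.endswith("_tool"):
        name = name[: -len("_tool")]
    return name


def resolve_tool(name: str, registry: dict) -> Optional[str]:
    """Exact lookup first; normalized lookup only as a fallback."""
    if name in registry:
        return name
    normalized = normalize_tool_name(name)
    return normalized if normalized in registry else None


registry = {"todo": object(), "write_file": object()}
print(resolve_tool("TodoTool_tool", registry))   # todo
print(resolve_tool("WriteFile_tool", registry))  # write_file
print(resolve_tool("unknown_tool", registry))    # None
```

Trying the exact name first keeps correctly spelled tool names on the fast path. Normalization only runs when a model emits a drifted name.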

Testing

  • scripts/run_tests.sh tests/tools/test_todo_tool.py tests/tools/test_registry.py tests/run_agent/test_run_agent.py::TestRetryExhaustion::test_jsondecode_error_is_retried_not_treated_as_local_validation
  • scripts/run_tests.sh tests/agent/test_error_classifier.py tests/tools/test_todo_tool.py tests/tools/test_registry.py tests/run_agent/test_run_agent.py::TestRetryExhaustion::test_jsondecode_error_is_retried_not_treated_as_local_validation

All passed.
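The run_agent change relies on a detail of Python's standard library: json.JSONDecodeError is a subclass of ValueError. A fast-fail check written against ValueError therefore also catches transient JSON parsing failures. The sketch below assumes such a check exists. LOCAL_VALIDATION_ERRORS and is_local_validation_error() are hypothetical names, not the actual condition in run_agent.py.

```python
import json

# Hypothetical stand-in for the local-validation fast-fail check; the real
# condition and exception list in run_agent.py may differ.
LOCAL_VALIDATION_ERRORS = (ValueError, TypeError)


def is_local_validation_error(error: BaseException) -> bool:
    # json.JSONDecodeError subclasses ValueError, so it must be excluded
    # explicitly or a truncated JSON response would fail fast instead of retrying.
    if isinstance(error, json.JSONDecodeError):
        return False
    return isinstance(error, LOCAL_VALIDATION_ERRORS)


try:
    json.loads('{"partial": ')
except json.JSONDecodeError as exc:
    decode_error = exc

print(issubclass(json.JSONDecodeError, ValueError))      # True
print(is_local_validation_error(decode_error))           # False (retried)
print(is_local_validation_error(ValueError("bad arg")))  # True (fails fast)
```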

Notes

  • The aideautomation/aide_hermes_agent repository's code structure differs from the current hermes-agent, so the same commits could not be cherry-picked.
  • The applicable changes were completed against the aideautomation/hermes-agent fork.

Changed files

  • agent/error_classifier.py (modified, +24/-13)
  • run_agent.py (modified, +1/-1)
  • tests/agent/test_error_classifier.py (modified, +23/-0)
  • tests/run_agent/test_run_agent.py (modified, +20/-0)
  • tests/tools/test_registry.py (modified, +24/-0)
  • tests/tools/test_todo_tool.py (modified, +23/-0)
  • tools/registry.py (modified, +43/-1)
  • tools/todo_tool.py (modified, +16/-2)

PR #14349: fix(agent): inspect wrapped API error bodies

Description (problem / solution / changelog)

Summary

  • Fixes #14195.
  • Makes _extract_error_body() walk wrapped exception cause/context chains, matching _extract_status_code().
  • Preserves top-level body precedence while allowing nested SDK bodies to drive retry/billing classification.

Root cause

Wrapped SDK exceptions could expose a nested HTTP status code while hiding the nested structured body. That made transient 402 usage-limit responses look like plain billing failures.

Tests

  • uv run --frozen --python 3.11 --extra dev pytest -o addopts='' tests/agent/test_error_classifier.py -q
  • git diff --check

Changed files

  • agent/error_classifier.py (modified, +19/-13)
  • tests/agent/test_error_classifier.py (modified, +38/-0)
Original issue (#14195)

Summary

classify_error() walks __cause__ / __context__ to extract a nested status code, but _extract_error_body() only inspects the top-level exception. When an SDK/API error is wrapped, Hermes can keep the nested 402 status code but lose the nested body message that distinguishes transient rate limits from billing failures.

Affected files

  • agent/error_classifier.py:263-266
  • agent/error_classifier.py:774-788

Why this is a bug

At classification time:

  • status_code = _extract_status_code(error) walks the cause chain
  • body = _extract_error_body(error) does not walk the cause chain

So a wrapped error like this:

  • outer exception: Exception("outer")
  • __cause__: provider/SDK exception with status_code=402 and body message "Usage limit reached, try again in 5 minutes"

gets classified using:

  • status code: 402 (nested cause found)
  • message/body: outer / {} (nested body lost)

Minimal reproduction

Wrap a mock API error with:

  • status_code = 402
  • body = {"error": {"message": "Usage limit reached, try again in 5 minutes"}}

inside an outer exception and pass the outer exception to classify_error().
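The wrapping can be reproduced with plain Python exception chaining. The sketch below is hypothetical: MockAPIError, status_code_walking_chain() and body_top_level_only() are illustrative stand-ins, not Hermes code. The two helpers only mimic the behavior the issue attributes to _extract_status_code() and to the pre-fix _extract_error_body().

```python
from typing import Optional


class MockAPIError(Exception):
    """Hypothetical provider/SDK error carrying a status code and a JSON body."""

    def __init__(self, message: str, status_code: int, body: dict) -> None:
        super().__init__(message)
        self.status_code = status_code
        self.body = body


def status_code_walking_chain(error: Optional[BaseException]) -> Optional[int]:
    """Mimics the described _extract_status_code(): follows __cause__/__context__."""
    seen = set()
    while error is not None and id(error) not in seen:
        seen.add(id(error))
        code = getattr(error, "status_code", None)
        if isinstance(code, int):
            return code
        error = error.__cause__ or error.__context__
    return None


def body_top_level_only(error: BaseException) -> dict:
    """Mimics the pre-fix _extract_error_body(): inspects only the outer exception."""
    body = getattr(error, "body", None)
    return body if isinstance(body, dict) else {}


def make_wrapped_error() -> Exception:
    try:
        try:
            raise MockAPIError(
                "Payment Required",
                status_code=402,
                body={"error": {"message": "Usage limit reached, try again in 5 minutes"}},
            )
        except MockAPIError as inner:
            raise Exception("outer") from inner  # outer.__cause__ is inner
    except Exception as outer:
        return outer


wrapped = make_wrapped_error()
print(status_code_walking_chain(wrapped))  # 402 (nested status code survives)
print(body_top_level_only(wrapped))        # {} (nested body is lost)
```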

Expected

Classify as a transient/rate-limit condition, because the nested body explicitly says to retry later.

Actual

Classifies as billing, because the nested body is ignored and only the status code survives.
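A toy, message-aware rule shows why losing the body flips the result. classify_402() and its regex are hypothetical simplifications, not the patterns used in agent/error_classifier.py. They only illustrate that the same 402 maps to rate_limit or billing depending on whether the nested message is available.

```python
import re

# Hypothetical, simplified rule for illustration only; the real classifier
# has more cases and its own patterns.
_TRANSIENT_HINTS = re.compile(r"try again|retry after|rate limit", re.IGNORECASE)


def classify_402(message: str) -> str:
    """Classify an HTTP 402 using the body message, if one is available."""
    if _TRANSIENT_HINTS.search(message):
        return "rate_limit"
    return "billing"


nested_message = "Usage limit reached, try again in 5 minutes"
print(classify_402(nested_message))  # rate_limit (nested body preserved: expected)
print(classify_402("outer"))         # billing (only the wrapper's text survives: actual)
```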

Suggested investigation

Make _extract_error_body() traverse __cause__ / __context__ the same way _extract_status_code() already does.

A regression test should cover wrapped 402/400 cases where only the nested body contains the decisive message.

Extent analysis

TL;DR

Modify _extract_error_body() to walk the __cause__ and __context__ chain to extract the nested error body.

Guidance

  • Update _extract_error_body() to walk the __cause__ and __context__ chain of the exception, checking the top-level exception first, to find the nested error body.
  • Verify the fix by testing with a wrapped API error that has a nested body message, such as the minimal reproduction example provided.
  • Add a regression test to cover wrapped 402/400 cases where only the nested body contains the decisive message.
  • Review the classify_error() function to ensure it correctly handles the updated error body extraction.

Example

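def _extract_error_body(error):
    # Check the top-level exception first, then follow __cause__ (explicit
    # "raise ... from") or __context__ (implicit chaining); first body wins.
    seen = set()  # guards against cyclic exception chains
    while error is not None and id(error) not in seen:
        seen.add(id(error))
        body = _body_from_single_exception(error)  # hypothetical: existing top-level extraction
        if body:
            return body
        error = error.__cause__ or error.__context__
    return {}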

Notes

This fix assumes that the nested error body is the first one found in the __cause__ or __context__ chain. If there are multiple nested error bodies, additional logic may be needed to determine which one to use.

Recommendation

Apply the fix by modifying _extract_error_body() to walk the __cause__ and __context__ chain. This lets the function extract the nested error body, so errors are correctly classified as transient or billing-related.
