hermes - ✅(Solved) Fix Generic 400/disconnect errors misclassified as context_overflow in 1M-context sessions [2 pull requests, 1 participant]

hermes · 2026-04-27 04:06:56

NousResearch/hermes-agent#16351•Fetched 2026-04-28 06:53:52

Author

JayGwod

Participants

JayGwod

Timeline (top)

referenced ×6, labeled ×3, cross-referenced ×2

Error Message

from agent.error_classifier import classify_api_error

class FakeHTTP400(Exception):
    status_code = 400
    body = {"error": {"message": "Error"}}
    def __str__(self):
        return "Error"

result = classify_api_error(
    FakeHTTP400(),
    provider="openai-codex",
    model="gpt-5.5",
    approx_tokens=74320,
    context_length=1_000_000,
    num_messages=432,
)

print(result.reason, result.retryable, result.should_compress)

Root Cause

Current agent/error_classifier.py has heuristics equivalent to:

# server disconnect path
is_large = approx_tokens > context_length * 0.6 or approx_tokens > 120000 or num_messages > 200

# generic 400 path
is_large = approx_tokens > context_length * 0.4 or approx_tokens > 80000 or num_messages > 80

The absolute fallbacks are reasonable for ~128K/200K context windows, but they are too aggressive for 1M-context sessions. A long session can have hundreds of messages while still being well below the actual context budget.
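To make the false positive concrete, the sketch below evaluates the generic-400 heuristic quoted above against the values from the reproduction. The wrapper name `is_large_generic_400_old` is hypothetical and exists only for illustration; the expression inside it is the one quoted in the issue, not a copy of the real source file.

```python
def is_large_generic_400_old(approx_tokens: int, context_length: int, num_messages: int) -> bool:
    """Pre-fix generic-400 heuristic as quoted in the issue (hypothetical wrapper name)."""
    relative = approx_tokens > context_length * 0.4  # relative pressure check
    abs_tokens = approx_tokens > 80000               # absolute token fallback
    abs_messages = num_messages > 80                 # absolute message-count fallback
    return relative or abs_tokens or abs_messages


# Values from the reproduction: about 7.4% of a 1M window, but 432 messages.
tokens, ctx, msgs = 74320, 1_000_000, 432
print(tokens > ctx * 0.4)                            # False: 74320 is below 400000
print(tokens > 80000)                                # False
print(msgs > 80)                                     # True: the only branch that fires
print(is_large_generic_400_old(tokens, ctx, msgs))   # True
```

Only the message-count fallback fires, which is why a session at roughly 7% of its window is still treated as large.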

Fix Action

Fix proposed (both pull requests below were open and unmerged when this page was fetched)

Proposed fix: fix(error_classifier): avoid large-context false overflow heuristics (https://github.com/NousResearch/hermes-agent/pull/16352)
Proposed fix: fix(error_classifier): gate absolute msg/token heuristics to small context windows (https://github.com/NousResearch/hermes-agent/pull/16380)

PR fix notes

PR #16352: fix(error_classifier): avoid large-context false overflow heuristics

Repository: NousResearch/hermes-agent
Author: JayGwod
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/16352

Description (problem / solution / changelog)

Summary

Fixes a large-context false-positive in agent.error_classifier: generic HTTP 400 responses and server disconnects should not be classified as context_overflow solely because a 1M-context session has many messages.

The previous heuristic used absolute fallbacks:

approx_tokens > 80000 or num_messages > 80
approx_tokens > 120000 or num_messages > 200

Those thresholds are useful proxies for smaller context windows, but they are too aggressive for explicitly large windows. A 1M-context session can have 432 messages and ~74K estimated tokens while still being far below the real budget.

This patch keeps the relative pressure checks for all models, but gates the absolute token/message-count fallbacks to smaller context windows (<= 256000).

Behavior covered

Generic 400 with approx_tokens=74320, context_length=1_000_000, num_messages=432 is now format_error, not context_overflow.
Server disconnect with the same low-pressure 1M context shape is now timeout, not context_overflow.
Existing smaller-window behavior remains covered by existing tests.

Test plan

RED before fix:

/home/ubuntu/.hermes/hermes-agent/venv/bin/python -m pytest \
  tests/agent/test_error_classifier.py::TestClassifyApiError::test_400_generic_many_messages_below_large_context_pressure_is_format_error \
  tests/agent/test_error_classifier.py::TestClassifyApiError::test_disconnect_many_messages_below_large_context_pressure_is_timeout \
  -v -o 'addopts='

Both tests failed with FailoverReason.context_overflow.

GREEN after fix:

/home/ubuntu/.hermes/hermes-agent/venv/bin/python -m pytest tests/agent/test_error_classifier.py -q -o 'addopts='
/home/ubuntu/.hermes/hermes-agent/venv/bin/python -m py_compile agent/error_classifier.py tests/agent/test_error_classifier.py
git diff --check

Result:

120 passed

Manual reproduction after fix:

FakeHTTP400 FailoverReason.format_error False False
Exception FailoverReason.timeout True False

Fixes #16351

Related: #14499, #14858, #14953, #15844, #6751

Changed files

agent/error_classifier.py (modified, +12/-2)
tests/agent/test_error_classifier.py (modified, +32/-0)

PR #16380: fix(error_classifier): gate absolute msg/token heuristics to small context windows

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/16380

Description (problem / solution / changelog)

Closes #16351.

Problem

agent/error_classifier.py flagged non-context errors as context_overflow in long-context (1M) Codex/GPT-5.x sessions, purely because num_messages > 80 (generic 400) or num_messages > 200 (disconnect) — even when approx_tokens was a fraction of the actual budget.

Repro from the issue:

classify_api_error(
    FakeHTTP400(),
    provider="openai-codex",
    model="gpt-5.5",
    approx_tokens=74320,
    context_length=1_000_000,
    num_messages=432,
)
# Before: FailoverReason.context_overflow (retryable=True, should_compress=True)
# After:  FailoverReason.format_error      (retryable=False, should_compress=False)

That sent format errors into the compression/probe-down path, causing unnecessary compaction and stale handoff pollution on 1M sessions.

Fix

Apply exactly the gate suggested in the issue body: scope absolute token/message-count fallbacks to context_length <= 256000. Relative pressure thresholds (> 0.6 for disconnect, > 0.4 for generic 400) still fire on any context size.

# server disconnect path
is_large = approx_tokens > context_length * 0.6 or (
    context_length <= 256000 and (approx_tokens > 120000 or num_messages > 200)
)

# generic 400 path
is_large = approx_tokens > context_length * 0.4 or (
    context_length <= 256000 and (approx_tokens > 80000 or num_messages > 80)
)

Existing behavior for ~128K/200K context windows is unchanged.

Tests

tests/agent/test_error_classifier.py — 4 new tests covering the 1M-context regime:

test_400_generic_1m_context_high_message_count_not_overflow — exact repro from issue (74K tokens, 432 msgs, 1M ctx) → format_error.
test_400_generic_1m_context_relative_pressure_still_overflow — 500K tokens / 1M ctx still → context_overflow.
test_disconnect_1m_context_high_message_count_is_timeout — 150K tokens, 300 msgs, 1M ctx → timeout.
test_disconnect_1m_context_relative_pressure_still_overflow — 700K tokens / 1M ctx still → context_overflow.

pytest tests/agent/test_error_classifier.py -q
122 passed (118 pre-existing + 4 new)
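The four scenarios listed above can be checked against the gated expressions in isolation. This is a minimal sketch, not the PR's test code: the function names and `SMALL_CONTEXT_LIMIT` are hypothetical, and the message count of 10 used for the two relative-pressure cases is an assumption, since the description gives only token counts for them. It computes only the `is_large` flag; the final `format_error` / `timeout` / `context_overflow` reason is decided by the real classifier.

```python
SMALL_CONTEXT_LIMIT = 256000  # gate value from the PR description (hypothetical constant name)


def is_large_disconnect(approx_tokens: int, context_length: int, num_messages: int) -> bool:
    """Gated server-disconnect heuristic, as written in the PR description."""
    return approx_tokens > context_length * 0.6 or (
        context_length <= SMALL_CONTEXT_LIMIT
        and (approx_tokens > 120000 or num_messages > 200)
    )


def is_large_generic_400(approx_tokens: int, context_length: int, num_messages: int) -> bool:
    """Gated generic-400 heuristic, as written in the PR description."""
    return approx_tokens > context_length * 0.4 or (
        context_length <= SMALL_CONTEXT_LIMIT
        and (approx_tokens > 80000 or num_messages > 80)
    )


ONE_MILLION = 1_000_000

# (label, heuristic, approx_tokens, num_messages); 10 messages is an assumed value.
scenarios = [
    ("400: 74K tokens, 432 msgs", is_large_generic_400, 74320, 432),
    ("400: 500K tokens", is_large_generic_400, 500_000, 10),
    ("disconnect: 150K tokens, 300 msgs", is_large_disconnect, 150_000, 300),
    ("disconnect: 700K tokens", is_large_disconnect, 700_000, 10),
]

results = {label: fn(tokens, ONE_MILLION, msgs) for label, fn, tokens, msgs in scenarios}
for label, flagged in results.items():
    print(f"{label}: is_large={flagged}")
```

Under these assumptions the two high-message-count cases are no longer flagged, while the two cases above the relative thresholds (40% and 60% of the window) still are.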

Changed files

agent/error_classifier.py (modified, +6/-2)
tests/agent/test_error_classifier.py (modified, +62/-0)


Bug description

agent.error_classifier.classify_api_error() can misclassify generic HTTP 400 errors and server disconnects as FailoverReason.context_overflow in explicitly large-context sessions (for example 1M-token Codex/GPT-5.x sessions), even when the prompt is far below the configured context window.

The problematic path is the absolute size/message-count heuristic. On current main, a generic 400 with many messages is classified as context overflow because num_messages > 80, even when approx_tokens is only ~74K against a 1M context window.

Minimal reproduction

from agent.error_classifier import classify_api_error

class FakeHTTP400(Exception):
    status_code = 400
    body = {"error": {"message": "Error"}}
    def __str__(self):
        return "Error"

result = classify_api_error(
    FakeHTTP400(),
    provider="openai-codex",
    model="gpt-5.5",
    approx_tokens=74320,
    context_length=1_000_000,
    num_messages=432,
)

print(result.reason, result.retryable, result.should_compress)

Current result:

FailoverReason.context_overflow True True

Expected result:

FailoverReason.format_error False False

A similar issue exists for server disconnect messages with the same low token pressure / high message count shape: the absolute num_messages > 200 branch classifies it as context_overflow instead of a transport/timeout condition.

Root cause

Current agent/error_classifier.py has heuristics equivalent to:

# server disconnect path
is_large = approx_tokens > context_length * 0.6 or approx_tokens > 120000 or num_messages > 200

# generic 400 path
is_large = approx_tokens > context_length * 0.4 or approx_tokens > 80000 or num_messages > 80

User impact

This sends non-context errors into the context-overflow recovery path. In long-context Codex sessions, that can cause unnecessary compression and runtime context probe-down from an explicit 1M window to lower probe tiers (currently 256K/128K depending on branch/version), which can lead to repeated compaction and stale handoff pollution.

Suggested fix

Gate the absolute token/message-count heuristics to smaller context windows, and require relative pressure for large-context models. For example:

# server disconnect path
is_large = approx_tokens > context_length * 0.6 or (
    context_length <= 256000 and (approx_tokens > 120000 or num_messages > 200)
)

# generic 400 path
is_large = approx_tokens > context_length * 0.4 or (
    context_length <= 256000 and (approx_tokens > 80000 or num_messages > 80)
)

This preserves existing behavior for smaller context windows while preventing 1M sessions from being classified as overflow solely because they have many messages.
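The claim that smaller windows are unaffected can be spot-checked by comparing the ungated and gated expressions over a small grid of inputs. This is a sketch under stated assumptions: the function names and the sample values are hypothetical, chosen only to straddle each threshold, and the expressions are transcribed from the issue text rather than from the real source file.

```python
from itertools import product


def old_400(t, c, n):
    return t > c * 0.4 or t > 80000 or n > 80


def new_400(t, c, n):
    return t > c * 0.4 or (c <= 256000 and (t > 80000 or n > 80))


def old_disconnect(t, c, n):
    return t > c * 0.6 or t > 120000 or n > 200


def new_disconnect(t, c, n):
    return t > c * 0.6 or (c <= 256000 and (t > 120000 or n > 200))


TOKEN_SAMPLES = [1_000, 50_000, 80_001, 120_001, 200_000]  # assumed sample points
MESSAGE_SAMPLES = [10, 81, 201, 432]                       # assumed sample points
PAIRS = (("generic_400", old_400, new_400), ("disconnect", old_disconnect, new_disconnect))


def differences(context_lengths):
    """Return every sampled input where the old and gated heuristics disagree."""
    diffs = []
    for c, t, n in product(context_lengths, TOKEN_SAMPLES, MESSAGE_SAMPLES):
        for label, old, new in PAIRS:
            if old(t, c, n) != new(t, c, n):
                diffs.append((label, c, t, n))
    return diffs


print(differences([128_000, 200_000, 256_000]))  # [] : identical at or below the gate
print(len(differences([1_000_000])) > 0)         # True: behavior changes only above the gate
```

At or below the 256000 gate the two forms are logically identical, so the empty list follows directly from the expressions; the grid only illustrates that.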

Related work

Related but not identical:

#14499: prevents direct long-context probe collapse by changing probe tiers
#14858: guards untrusted probe shrink when the guessed tier is below the current prompt estimate
#14953: preserves explicit context window after generic overflow
#15844: merged context-length propagation/probe-tier changes
#6751: fixed one Codex 400-format-error compression loop by parsing flat 400 bodies

This issue is specifically about the classifier entering context_overflow too early for large context windows due to absolute message-count/token heuristics.

extent analysis

TL;DR

Update the agent/error_classifier.py heuristics to gate absolute token/message-count checks based on context window size to prevent misclassification of large-context sessions.

Guidance

Review the current agent/error_classifier.py heuristics and update the conditions to include context window size checks, as suggested in the issue.
Verify the changes by running the minimal reproduction code with the updated heuristics and checking the classification result.
Test the updated classifier with various input scenarios, including large-context sessions with high message counts, to ensure correct classification.
Consider adding additional logging or monitoring to detect and report any potential misclassifications.
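One way to approach the logging suggestion is to record which branch of the heuristic fired, so that overflow classifications driven only by an absolute fallback are visible in logs. The sketch below is hypothetical throughout: `large_context_trigger`, its keyword parameters, and the logger name are illustrative and are not part of hermes-agent. The defaults mirror the generic-400 thresholds from the suggested fix.

```python
import logging

logger = logging.getLogger("error_classifier.sketch")  # hypothetical logger name


def large_context_trigger(
    approx_tokens: int,
    context_length: int,
    num_messages: int,
    *,
    relative: float = 0.4,
    abs_tokens: int = 80000,
    abs_messages: int = 80,
    gate: int = 256000,
):
    """Return the name of the branch that marks the request as large, or None."""
    if approx_tokens > context_length * relative:
        return "relative_pressure"
    trigger = None
    if context_length <= gate:  # absolute fallbacks apply only to smaller windows
        if approx_tokens > abs_tokens:
            trigger = "absolute_tokens"
        elif num_messages > abs_messages:
            trigger = "absolute_messages"
    if trigger is not None:
        # Absolute fallbacks are the branches most likely to produce false positives.
        logger.warning(
            "overflow heuristic fired on %s (tokens=%d, context=%d, messages=%d)",
            trigger, approx_tokens, context_length, num_messages,
        )
    return trigger


print(large_context_trigger(74320, 1_000_000, 432))   # None
print(large_context_trigger(40_000, 128_000, 432))    # absolute_messages
print(large_context_trigger(500_000, 1_000_000, 10))  # relative_pressure
```

For the disconnect path, the same helper could be called with `relative=0.6, abs_tokens=120000, abs_messages=200`.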

Example

# server disconnect path
is_large = approx_tokens > context_length * 0.6 or (
    context_length <= 256000 and (approx_tokens > 120000 or num_messages > 200)
)

# generic 400 path
is_large = approx_tokens > context_length * 0.4 or (
    context_length <= 256000 and (approx_tokens > 80000 or num_messages > 80)
)

Notes

The suggested fix is specific to the agent/error_classifier.py file and may require additional testing and validation to ensure correct behavior in all scenarios.

Recommendation

Apply the suggested fix: gate the absolute token/message-count checks in agent/error_classifier.py on context window size. This prevents large-context sessions from being classified as context_overflow solely because of a high message count, while leaving behavior for smaller context windows unchanged.

#api #agent setup #task chaining #parallel task #integration issue
