hermes - 💡(How to fix) Fix Discord thread action latency: 338s–405s responses with provider 300s timeout + retry (gpt-5.3-codex)

hermes · 2026-05-26 13:01:03



Summary

In a Discord thread session, simple user requests took about 5.6 to 6.7 minutes (338.4s and 404.9s) to complete. The gateway stayed responsive, but end-to-end latency was dominated by a 300s provider stall timeout followed by a retry.

User impact

User asked for a single-file edit ("Remove that shit from today’s log too")
Response took 338.4s
Prior turn in same thread took 404.9s
User explicitly reported frustration about the 6-minute delay

Environment / context

Platform: Discord thread
Model: gpt-5.3-codex
Provider: openai-codex
Host: Linux 5.15.0-177-generic
Session id (gateway trace key): 20260526_120015_527597c5
Langfuse trace id repeatedly shown: 880e9a572038cf13f6c5b3f3c9cd21d8

Evidence (gateway.log)

From /home/hermes/.hermes/logs/gateway.log:

6707| 2026-05-26 12:00:15.612 inbound message ("Let’s plan my workout for today")
6708| 2026-05-26 12:00:15.873 Langfuse tracing started
6712| 2026-05-26 12:07:00.468 response ready ... time=404.9s api_calls=11

6739| 2026-05-26 12:51:56.567 inbound message ("Remove that shit from today’s log too")
6740| 2026-05-26 12:51:56.821 Langfuse tracing started
6742| 2026-05-26 12:57:34.988 response ready ... time=338.4s api_calls=6

Memory/thread health around the same period looked stable:

6741| 12:53:04 memory_monitor rss=280MB threads=21
6744| 12:58:04 memory_monitor rss=280MB threads=20
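The "time=" values in the response-ready lines can be cross-checked against the inbound and response timestamps. The short Python sketch below does that. It also estimates how much of each turn would remain outside the stall, assuming each turn contained exactly one full 300s stall. That assumption follows the summary, but the log excerpt does not show which turn each timeout belongs to. The TURNS labels are illustrative names, not log fields.

```python
from datetime import datetime

# Timestamps copied from the gateway.log excerpt (the log shows no timezone).
FMT = "%Y-%m-%d %H:%M:%S.%f"

TURNS = {
    "workout plan": ("2026-05-26 12:00:15.612", "2026-05-26 12:07:00.468"),
    "log edit": ("2026-05-26 12:51:56.567", "2026-05-26 12:57:34.988"),
}

STALL_TIMEOUT_S = 300.0  # stall timeout reported in the Discord diagnostics


def turn_seconds(inbound: str, ready: str) -> float:
    """Wall-clock seconds from the inbound message to 'response ready'."""
    delta = datetime.strptime(ready, FMT) - datetime.strptime(inbound, FMT)
    return delta.total_seconds()


if __name__ == "__main__":
    for name, (inbound, ready) in TURNS.items():
        total = turn_seconds(inbound, ready)
        # Assumption: exactly one full stall timeout happened inside this turn.
        rest = total - STALL_TIMEOUT_S
        share = STALL_TIMEOUT_S / total
        print(f"{name}: {total:.1f}s total; a single 300s stall would be "
              f"{share:.0%} of it, leaving {rest:.1f}s")
```

Run as a script, it reproduces the logged 404.9s and 338.4s totals. Under the one-stall assumption, the stall alone would account for roughly 74% and 89% of the two turns. That would leave about 104.9s and 38.4s for the other API calls, tool execution, and the retry backoff.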

Evidence (Discord-visible runtime diagnostics)

From thread messages emitted by the bot runtime:

⏳ Still working... (3 min elapsed — iteration 7/60, waiting for non-streaming response (120s elapsed))
⚠️ No response from provider for 300s (non-streaming, model: gpt-5.3-codex). Aborting call.
⏳ Retrying in 3.0s (attempt 1/3)...
⏳ Still working... (6 min elapsed — iteration 8/60, receiving stream response)

A second similar timeout signal appeared later in the same thread:

⚠️ No response from provider for 300s (non-streaming, model: gpt-5.3-codex). Aborting call.
⏳ Retrying in 2.9s (attempt 1/3)...

What this suggests

The primary suspect is a provider-side or provider-transport stall on the non-streaming path, which forces the full 300s timeout to elapse before a retry is attempted. The gateway itself remained alive and responsive, and memory and thread counts looked normal.

Secondary factors that increase perceived latency:

Multiple tool calls in one turn (expected, and minor compared with the 300s timeout)
Occasional extra patch/read cycles caused by a stale-file warning and a failed match (not the cause of the 5+ minute delay, but they add overhead)
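The minimal asyncio sketch below models the primary suspect. It reproduces the abort-and-retry behavior that the Discord diagnostics describe: each attempt is abandoned after a fixed stall timeout and retried after a short backoff. It models the observable behavior only and is not hermes's provider code. call_with_stall_timeout, make_flaky_provider, and the 1000x scaled-down timings are illustrative assumptions. hermes's real check may track inactivity rather than total call time, although for a non-streaming call the two should roughly coincide.

```python
import asyncio
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_stall_timeout(
    make_call: Callable[[], Awaitable[T]],
    stall_timeout: float,
    max_attempts: int = 3,
    backoff: float = 3.0,
) -> T:
    """Abort each attempt after `stall_timeout` seconds and retry after `backoff`."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=stall_timeout)
        except asyncio.TimeoutError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the timeout to the caller
            print(f"No response for {stall_timeout}s. Retrying in {backoff}s "
                  f"(attempt {attempt}/{max_attempts})")
            await asyncio.sleep(backoff)
    raise AssertionError("unreachable")


def make_flaky_provider(hang_for: float, reply_after: float) -> Callable[[], Awaitable[str]]:
    """Hypothetical fake provider: the first call stalls, later calls answer quickly."""
    calls = 0

    async def call() -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(hang_for if calls == 1 else reply_after)
        return f"reply from attempt {calls}"

    return call


async def main() -> None:
    # Scaled down 1000x: 0.3s stands in for the 300s timeout, 0.003s for the 3s backoff.
    provider = make_flaky_provider(hang_for=10.0, reply_after=0.01)
    start = time.monotonic()
    reply = await call_with_stall_timeout(provider, stall_timeout=0.3, backoff=0.003)
    print(f"{reply} after {time.monotonic() - start:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

With the real values, a turn that hits one stall cannot finish sooner than 300s of stall plus the 3s backoff plus the retried call's own duration. That is consistent with both turns landing between 338s and 405s.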

Requested investigation

Inspect the openai-codex provider call path for non-streaming stalls in this session during:
- 2026-05-26 12:00–12:07 UTC
- 2026-05-26 12:51–12:58 UTC
Check whether the retry policy can fail over faster (e.g., a shorter stall timeout or a hedged retry) instead of waiting the full 300s.
Determine whether partial-progress checkpoints or user-visible incremental responses can reduce the perceived freeze during long, tool-heavy turns.
Correlate Langfuse trace 880e9a572038cf13f6c5b3f3c9cd21d8 with the provider request lifecycle and transport timings.
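The hedged-retry option from the list above could look roughly like the following asyncio sketch. Instead of waiting out the whole stall timeout, it starts a second, identical request once the first has run for hedge_after seconds without finishing, and the first successful result wins. hedged_call and its parameters are hypothetical names. The approach assumes the provider request can safely be sent twice.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Set, TypeVar

T = TypeVar("T")


async def hedged_call(
    make_call: Callable[[], Awaitable[T]],
    hedge_after: float,
    overall_timeout: float,
) -> T:
    """Start one attempt; if it is still running after `hedge_after` seconds,
    start a second, identical attempt and return the first successful result."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + overall_timeout
    first = asyncio.ensure_future(make_call())
    done, _ = await asyncio.wait({first}, timeout=hedge_after)
    if done:
        return first.result()  # fast path: no hedge needed (re-raises on failure)

    pending: Set[asyncio.Future] = {first, asyncio.ensure_future(make_call())}
    last_error: Optional[BaseException] = None
    try:
        while pending:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            done, pending = await asyncio.wait(
                pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()
                last_error = task.exception()
        if last_error is not None and not pending:
            raise last_error  # every attempt failed outright
        raise asyncio.TimeoutError("no hedged attempt finished before the deadline")
    finally:
        for task in pending:
            task.cancel()  # stop the losing attempt so it does not keep running
```

hedge_after would sit well below 300s, for example near a high percentile of normal non-streaming latency. That value is an assumption to tune, not something measured in this report. Hedging only the model request seems the safe scope. A duplicate model call costs tokens but has no side effects. Duplicating tool execution, such as the log edit in this thread, could apply a change twice.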

Nice-to-have instrumentation improvements

Include the provider request ID and per-attempt latency in gateway logs for each assistant turn
Emit explicit provider_call_start/provider_call_end/provider_timeout lines in agent.log
Surface the retry cause and per-attempt durations in a single structured event
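The instrumentation items above could be sketched with the standard logging, json, and contextlib modules as follows. The event names provider_call_start, provider_call_end, and provider_timeout come from the request. The provider_retry_summary event, the hermes.provider logger name, and the helper functions are hypothetical. The sketch assumes the transport raises the built-in TimeoutError on a stall (in Python 3.11+ asyncio.TimeoutError is the same class).

```python
import json
import logging
import time
import uuid
from contextlib import contextmanager
from typing import Iterator, List

# Hypothetical logger name; a real deployment would route its handler to agent.log.
log = logging.getLogger("hermes.provider")


def emit(event: str, **fields) -> None:
    """Write one structured event as a single JSON line."""
    log.info(json.dumps({"event": event, **fields}, sort_keys=True))


@contextmanager
def provider_call(session_id: str, model: str, attempt: int,
                  attempts: List[dict]) -> Iterator[dict]:
    """Emit a start event and an end/timeout event for one attempt, and record its latency."""
    ctx = {"session_id": session_id, "model": model, "attempt": attempt,
           "call_id": uuid.uuid4().hex, "provider_request_id": None}
    emit("provider_call_start", **ctx)
    start = time.monotonic()
    outcome = "interrupted"  # kept for BaseExceptions such as asyncio.CancelledError
    try:
        yield ctx  # the caller can set ctx["provider_request_id"] from the response
        outcome = "ok"
    except TimeoutError:
        outcome = "timeout"
        raise
    except Exception as exc:
        outcome = type(exc).__name__
        raise
    finally:
        latency = round(time.monotonic() - start, 3)
        event = "provider_timeout" if outcome == "timeout" else "provider_call_end"
        emit(event, outcome=outcome, latency_s=latency, **ctx)
        attempts.append({"attempt": attempt, "outcome": outcome, "latency_s": latency})


def emit_retry_summary(session_id: str, attempts: List[dict]) -> None:
    """One event per turn listing every attempt's outcome (the retry cause) and duration."""
    emit("provider_retry_summary", session_id=session_id, attempts=attempts,
         total_latency_s=round(sum(a["latency_s"] for a in attempts), 3))
```

A turn would wrap each provider attempt in provider_call and call emit_retry_summary once at the end. A 300s stall would then show up as a provider_timeout line with latency_s near 300, followed by the summary event.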

Repro notes

This occurred during active multi-thread Discord usage, while another thread was also being answered. It may be worth testing concurrency interactions under the same provider/model settings.
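The concurrency test suggested in the repro notes could start from a small asyncio harness like this one. It starts several simulated turns at once and records each one's wall-clock latency. The fake provider with a shared pool is only a hypothesis to probe: whether a stalled call in one Discord thread can delay another thread's calls. It does not describe how hermes or the openai-codex provider manages connections. fake_provider, main, and the pool sizes are illustrative, and real provider calls would replace the fake ones in an actual test.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict

Call = Callable[[], Awaitable[object]]


async def _timed(name: str, make_call: Call, results: Dict[str, float]) -> None:
    start = time.monotonic()
    await make_call()
    results[name] = time.monotonic() - start


async def run_concurrently(turns: Dict[str, Call]) -> Dict[str, float]:
    """Start every turn at the same moment and return each turn's wall-clock latency."""
    results: Dict[str, float] = {}
    await asyncio.gather(*(_timed(name, call, results) for name, call in turns.items()))
    return results


def fake_provider(pool: asyncio.Semaphore, duration: float) -> Call:
    """Hypothetical provider whose calls share a limited pool (connections, workers)."""
    async def call() -> None:
        async with pool:
            await asyncio.sleep(duration)
    return call


async def main(pool_size: int) -> Dict[str, float]:
    pool = asyncio.Semaphore(pool_size)  # created inside the running event loop
    return await run_concurrently({
        "thread A (stalled call)": fake_provider(pool, 0.5),
        "thread B (simple edit)": fake_provider(pool, 0.05),
    })


if __name__ == "__main__":
    for size in (1, 2):
        for name, seconds in asyncio.run(main(size)).items():
            print(f"pool={size} {name}: {seconds:.2f}s")
```

With a pool of 1, thread B's short call waits behind thread A's slow one. With a pool of 2, it finishes on its own schedule. Comparing per-thread latencies this way under real load is one way to confirm or rule out that kind of coupling.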
