openclaw - 💡(How to fix) Fix Per-channel timeout configuration (telegram/discord/webchat/cron) [1 comments, 2 participants]

GitHub stats

openclaw/openclaw#74249 · Fetched 2026-04-30 06:26:42
Comments: 1 · Participants: 2 · Timeline: 2 · Reactions: 2
Timeline (top): commented ×1, cross-referenced ×1


Problem

Today agents.defaults.timeoutSeconds is a single global knob that applies to all interactive channels (Telegram, Webchat, Discord) and is also the default for cron jobs that have no explicit per-job override.

This forces a single trade-off across very different latency expectations:

| Channel | User expectation | Realistic upper-bound |
|---|---|---|
| Telegram (interactive chat) | reply within seconds, fail-fast on hang | 60–120s |
| Discord (interactive) | similar to Telegram | 60–120s |
| Webchat (UI) | can wait for compaction / longer chains | 180–300s |
| Cron (background) | varies by job; some up to 30 min | 60–1800s, per job |

Because all interactive channels share one timeout:

  • If you lower it (e.g. 90s) to fail-fast on hung profiles → legitimate slow runs (Anthropic Opus compaction on large contexts) get killed
  • If you raise it (e.g. 360s) so compaction survives → a single hung OAuth profile (silent server-side throttle, no 429) blocks Telegram for 6 minutes

In production this manifests as continuous tweaking of the global timeout each time a different bug surfaces. My team's memory file shows: 180s → 240s → 360s → 600s → back down → up — over months, with no architectural progress, just whack-a-mole.

Reproduction (silent profile-hang scenario)

  1. Configure 3 OAuth profiles for openai-codex (e.g. ChatGPT Plus accounts)
  2. Hit one profile's daily quota → it returns "Try again in ~N min"
  3. After the cooldown elapses, OpenAI sometimes leaves that profile in a soft-throttle state where requests hang without responding (no 429, no error, just dead connection)
  4. Send a Telegram message that routes to the throttled profile via advanceAuthProfile
  5. Wait 6 minutes (the global timeoutSeconds) before failover happens
  6. The user-facing channel (Telegram) appears completely unresponsive

Proposed solution

Allow per-channel timeout overrides in agents.defaults and agents.list[i]:

```jsonc
{
  "agents": {
    "defaults": {
      "timeoutSeconds": 360,
      "channelTimeoutSeconds": {
        "telegram": 90,
        "discord": 90,
        "webchat": 240
      }
    }
  }
}
```

Resolution order at request time (highest priority first):

  1. Cron-job specific payload.timeoutSeconds
  2. Per-agent channelTimeoutSeconds[channel]
  3. Global channelTimeoutSeconds[channel]
  4. Per-agent timeoutSeconds
  5. Global timeoutSeconds
  6. DEFAULT_AGENT_TIMEOUT_SECONDS (600s in code)
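
The resolution order above can be sketched as a first-match lookup. This is a minimal illustration in Python, not OpenClaw's actual code: the function name `resolve_timeout_seconds` and its argument shapes are assumptions made for this example, and only the config keys and the 600s fallback come from the issue.

```python
from typing import Any, Mapping, Optional

# Fallback value quoted in the issue ("600s in code").
DEFAULT_AGENT_TIMEOUT_SECONDS = 600


def resolve_timeout_seconds(
    channel: str,
    defaults: Mapping[str, Any],
    agent: Optional[Mapping[str, Any]] = None,
    cron_payload: Optional[Mapping[str, Any]] = None,
) -> int:
    """Hypothetical helper: return the first timeout set, highest priority first."""
    agent = agent or {}
    cron_payload = cron_payload or {}
    candidates = (
        cron_payload.get("timeoutSeconds"),                      # 1. cron-job payload
        agent.get("channelTimeoutSeconds", {}).get(channel),     # 2. per-agent, per-channel
        defaults.get("channelTimeoutSeconds", {}).get(channel),  # 3. global, per-channel
        agent.get("timeoutSeconds"),                             # 4. per-agent
        defaults.get("timeoutSeconds"),                          # 5. global
    )
    for value in candidates:
        if value is not None:
            return value
    return DEFAULT_AGENT_TIMEOUT_SECONDS                         # 6. built-in default


defaults = {
    "timeoutSeconds": 360,
    "channelTimeoutSeconds": {"telegram": 90, "discord": 90, "webchat": 240},
}
telegram_timeout = resolve_timeout_seconds("telegram", defaults)
```

One consequence worth noting: because level 3 outranks level 4, a global per-channel value wins over a per-agent `timeoutSeconds`. An agent that needs a longer Telegram timeout would have to set its own `channelTimeoutSeconds.telegram`.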

Why this is more than just config sugar

The current architecture makes Telegram the worst-served lane: users expect <30s replies, but the global timeout has to accommodate Opus-compaction or 200k-context inference that legitimately needs 3–5 minutes. There's no way today to encode "Telegram should fail fast even if other lanes need longer".

Per-channel timeouts also enable a much faster failover loop on hung profiles. With Telegram at 90s, a silent-hang profile is rotated within 1.5 min instead of 6 min.
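
To illustrate why the channel timeout bounds the failover delay, here is a minimal asyncio sketch. It assumes a hypothetical `send(profile)` coroutine and a plain ordered list of profiles; it is not how OpenClaw's `advanceAuthProfile` is implemented, only a model of "give each profile at most the channel timeout, then rotate".

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def run_with_failover(
    profiles: Sequence[str],
    send: Callable[[str], Awaitable[str]],
    timeout_seconds: float,
) -> str:
    """Try each profile in order; rotate when one exceeds the channel timeout."""
    for profile in profiles:
        try:
            # wait_for cancels the pending request once the timeout expires.
            return await asyncio.wait_for(send(profile), timeout=timeout_seconds)
        except asyncio.TimeoutError:
            continue  # silent hang: abandon this profile and try the next one
    raise TimeoutError("all profiles timed out")
```

In this model the delay before a healthy profile answers is roughly the number of hung profiles ahead of it multiplied by the timeout, so one hung profile costs about 90s at a Telegram timeout of 90 and about 360s at the global 360.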

Related

  • #43187 (Telegram polling stall — restart-cycle hang)
  • #65517 (active-memory plugin event-loop starve → Telegram unresponsive)

Both share the same underlying pain: the gateway has no granularity for "Telegram needs to fail fast".

Workaround used today (suboptimal)

  • Manually clear authProfileOverride per session when a profile hangs
  • Manually demote the hanging profile via openclaw models auth order set
  • Accept that the global timeout will be "wrong" for at least one lane

This is reactive whack-a-mole and doesn't scale.

Acceptance criteria

  • channelTimeoutSeconds parses from config, schema-validated
  • Runtime resolves the right timeout per request based on channel type
  • Existing timeoutSeconds semantics unchanged when channelTimeoutSeconds not set (backward compatible)
  • Reasonable defaults shipped (e.g. telegram=120, discord=120, webchat=300, fallback to global)
  • Documented in agent config reference
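
The first and third criteria could be checked along these lines. This is a hedged sketch only: `parse_channel_timeouts` and `KNOWN_CHANNELS` are hypothetical names, the channel set is taken from the issue's examples, and the real schema validator may apply different rules.

```python
from collections.abc import Mapping
from typing import Any

# Assumed channel names, taken from the examples in this issue.
KNOWN_CHANNELS = {"telegram", "discord", "webchat"}


def parse_channel_timeouts(agent_config: Mapping[str, Any]) -> dict[str, int]:
    """Validate channelTimeoutSeconds; an absent key yields {} (existing behaviour)."""
    raw = agent_config.get("channelTimeoutSeconds")
    if raw is None:
        return {}
    if not isinstance(raw, Mapping):
        raise ValueError("channelTimeoutSeconds must be an object")
    parsed: dict[str, int] = {}
    for channel, seconds in raw.items():
        if channel not in KNOWN_CHANNELS:
            raise ValueError(f"unknown channel: {channel!r}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds <= 0:
            raise ValueError(f"{channel}: timeout must be a positive integer")
        parsed[channel] = seconds
    return parsed
```

Returning an empty mapping when the key is missing keeps the lookup falling through to `timeoutSeconds`, which is what the backward-compatibility criterion asks for.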

OpenClaw version: 2026.4.24

extent analysis

TL;DR

Implement per-channel timeout overrides in agents.defaults and agents.list[i] to allow for more granular control over timeout settings.

Guidance

  • Introduce a channelTimeoutSeconds object in the agents.defaults configuration to specify timeout values for each channel (e.g., Telegram, Discord, Webchat).
  • Update the resolution order to prioritize per-channel timeouts over the global timeout.
  • Set reasonable defaults for each channel, such as 120s for Telegram and Discord, and 300s for Webchat.
  • Ensure backward compatibility by maintaining the existing timeoutSeconds semantics when channelTimeoutSeconds is not set.

Example

```jsonc
{
  "agents": {
    "defaults": {
      "timeoutSeconds": 360,
      "channelTimeoutSeconds": {
        "telegram": 90,
        "discord": 90,
        "webchat": 240
      }
    }
  }
}
```

Notes

The proposed solution requires updates to the configuration schema and the runtime resolution of timeouts. It is essential to ensure that the new configuration is properly validated and that the existing timeoutSeconds semantics remain unchanged when channelTimeoutSeconds is not set.

Recommendation

Apply the proposed solution to introduce per-channel timeout overrides, as it provides a more granular and scalable approach to managing timeouts across different channels. This will enable faster failover on hung profiles and improve the overall responsiveness of the system.
