openclaw - 💡(How to fix) Fix Per-channel timeout configuration (telegram/discord/webchat/cron) [1 comments, 2 participants]

openclaw · 2026-04-29 09:22:13


GitHub stats

openclaw/openclaw#74249 · Fetched 2026-04-30 06:26:42


Author: fertilejim

Participants: clawsweeper[bot], fertilejim

Timeline (top): commented ×1, cross-referenced ×1

Error Message

After a rate-limit cooldown elapses, OpenAI sometimes leaves the affected OAuth profile in a soft-throttle state where requests hang without responding (no 429, no error, just a dead connection).

Root Cause

Because all interactive channels share one timeout:

If you lower it (e.g. 90s) to fail-fast on hung profiles → legitimate slow runs (Anthropic Opus compaction on large contexts) get killed
If you raise it (e.g. 360s) so compaction survives → a single hung OAuth profile (silent server-side throttle, no 429) blocks Telegram for 6 minutes


Problem

Today, agents.defaults.timeoutSeconds is a single global knob that applies to all interactive channels (Telegram, Webchat, Discord) and also serves as the default for cron jobs that have no per-job override.

This forces a single trade-off across very different latency expectations:

| Channel | User expectation | Realistic upper bound |
|---|---|---|
| Telegram (interactive chat) | Reply within seconds; fail fast on hang | 60–120s |
| Discord (interactive) | Similar to Telegram | 60–120s |
| Webchat (UI) | Can wait for compaction / longer chains | 180–300s |
| Cron (background) | Varies by job; some up to 30 min | 60–1800s, per job |

Because all interactive channels share one timeout:

If you lower it (e.g. 90s) to fail-fast on hung profiles → legitimate slow runs (Anthropic Opus compaction on large contexts) get killed
If you raise it (e.g. 360s) so compaction survives → a single hung OAuth profile (silent server-side throttle, no 429) blocks Telegram for 6 minutes

In production this shows up as constant re-tuning of the global timeout each time a different bug surfaces. My team's memory file records 180s → 240s → 360s → 600s → back down → back up over several months: no architectural progress, just whack-a-mole.

Reproduction (silent profile-hang scenario)

1. Configure 3 OAuth profiles for openai-codex (e.g. ChatGPT Plus accounts).
2. Hit one profile's daily quota; it returns "Try again in ~N min".
3. After the cooldown elapses, OpenAI sometimes leaves that profile in a soft-throttle state where requests hang without responding (no 429, no error, just a dead connection).
4. Send a Telegram message that routes to the throttled profile via advanceAuthProfile.
5. Failover only happens after the global timeoutSeconds elapses (6 minutes at 360s).
6. Until then, the user-facing channel (Telegram) appears completely unresponsive.
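In this scenario the hung request never settles, so the only thing that ends the wait is a deadline. A minimal TypeScript sketch of bounding a single attempt by a per-channel deadline, assuming a Promise-based request; withChannelTimeout and ChannelTimeoutError are hypothetical names, not OpenClaw APIs:

```typescript
// Hypothetical error type; OpenClaw's real error classes may differ.
class ChannelTimeoutError extends Error {
  readonly channel: string;
  readonly timeoutMs: number;

  constructor(channel: string, timeoutMs: number) {
    super(`${channel} request got no response within ${timeoutMs} ms`);
    this.name = "ChannelTimeoutError";
    this.channel = channel;
    this.timeoutMs = timeoutMs;
  }
}

// Races one in-flight model request against a per-channel deadline.
// A silently hung request (no 429, no error) never settles, so without
// the deadline the caller would wait indefinitely.
async function withChannelTimeout<T>(
  request: Promise<T>,
  channel: string,
  timeoutMs: number,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new ChannelTimeoutError(channel, timeoutMs)),
      timeoutMs,
    );
  });
  try {
    return await Promise.race([request, deadline]);
  } finally {
    // Clear the timer so a fast reply does not keep the process alive.
    clearTimeout(timer);
  }
}
```

Racing only stops the caller from waiting; it does not close the dead connection. A real implementation would likely also abort the underlying HTTP request, for example by passing an AbortController signal to fetch, before rotating to the next profile.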

Proposed solution

Allow per-channel timeout overrides in agents.defaults and agents.list[i]:

```jsonc
{
  "agents": {
    "defaults": {
      "timeoutSeconds": 360,
      "channelTimeoutSeconds": {
        "telegram": 90,
        "discord": 90,
        "webchat": 240
      }
    }
  }
}
```

Resolution order at request time (highest priority first):

1. Cron-job specific payload.timeoutSeconds
2. Per-agent channelTimeoutSeconds[channel]
3. Global channelTimeoutSeconds[channel]
4. Per-agent timeoutSeconds
5. Global timeoutSeconds
6. DEFAULT_AGENT_TIMEOUT_SECONDS (600s in code)
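The resolution order amounts to a chain of fallbacks. A minimal TypeScript sketch, assuming the config shape from the jsonc example; resolveTimeoutSeconds, RequestContext and the cronPayloadTimeoutSeconds field are hypothetical names, and only the 600s DEFAULT_AGENT_TIMEOUT_SECONDS value comes from the issue:

```typescript
// Hypothetical shapes mirroring the proposed config; not OpenClaw's real types.
interface AgentTimeoutConfig {
  timeoutSeconds?: number;
  channelTimeoutSeconds?: { [channel: string]: number | undefined };
}

interface RequestContext {
  channel: string; // e.g. "telegram", "discord", "webchat", "cron"
  cronPayloadTimeoutSeconds?: number; // a cron job's payload.timeoutSeconds, if set
}

// From the issue: the in-code fallback is 600s.
const DEFAULT_AGENT_TIMEOUT_SECONDS = 600;

// Walks the proposed order, highest priority first; `??` only skips
// values that are undefined (or null), so an explicit number always wins.
function resolveTimeoutSeconds(
  ctx: RequestContext,
  agent: AgentTimeoutConfig | undefined, // the matching agents.list[i] entry
  defaults: AgentTimeoutConfig | undefined, // agents.defaults
): number {
  return (
    ctx.cronPayloadTimeoutSeconds ?? // 1. cron-job payload
    agent?.channelTimeoutSeconds?.[ctx.channel] ?? // 2. per-agent, per-channel
    defaults?.channelTimeoutSeconds?.[ctx.channel] ?? // 3. global, per-channel
    agent?.timeoutSeconds ?? // 4. per-agent timeoutSeconds
    defaults?.timeoutSeconds ?? // 5. global timeoutSeconds
    DEFAULT_AGENT_TIMEOUT_SECONDS // 6. built-in default
  );
}
```

One consequence of this order: a per-agent timeoutSeconds does not override a global channelTimeoutSeconds entry, so an agent that needs longer Telegram runs would have to set its own channelTimeoutSeconds.telegram.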

Why this is more than just config sugar

The current architecture makes Telegram the worst-served lane: users expect <30s replies, but the global timeout has to accommodate Opus-compaction or 200k-context inference that legitimately needs 3–5 minutes. There's no way today to encode "Telegram should fail fast even if other lanes need longer".

Per-channel timeouts also enable a much faster failover loop on hung profiles. With Telegram at 90s, a silent-hang profile is rotated within 1.5 min instead of 6 min.
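To make the 1.5 min vs 6 min arithmetic concrete, here is a small TypeScript model of sequential failover. It assumes profiles are tried strictly in order, each attempt is abandoned exactly at the channel timeout, and rotation itself costs nothing; simulateFailover and ProfileBehaviour are hypothetical and not how advanceAuthProfile is actually implemented:

```typescript
// Hypothetical model: each auth profile either answers after latencySeconds
// or hangs silently (Infinity). An attempt that exceeds the channel timeout
// is abandoned at the deadline and the next profile is tried.
interface ProfileBehaviour {
  id: string;
  latencySeconds: number; // Infinity models a silent server-side hang
}

interface FailoverOutcome {
  servedBy?: string; // undefined when every attempt timed out
  elapsedSeconds: number;
}

function simulateFailover(
  profiles: ProfileBehaviour[],
  timeoutSeconds: number,
): FailoverOutcome {
  let elapsedSeconds = 0;
  for (const profile of profiles) {
    if (profile.latencySeconds <= timeoutSeconds) {
      // This profile answers before its deadline.
      return { servedBy: profile.id, elapsedSeconds: elapsedSeconds + profile.latencySeconds };
    }
    // Deadline hit: the full timeout is spent before rotating.
    elapsedSeconds += timeoutSeconds;
  }
  return { elapsedSeconds };
}
```

With one hung profile ahead of a healthy one answering in 20s, a 90s timeout delivers the reply after 110s versus 380s at 360s. With two profiles that both need 200s for a large compaction, 90s abandons both (180s spent, no reply) while 360s completes in 200s: the same trade-off the Problem section describes, which per-channel values let each lane settle separately.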

Related

#43187 (Telegram polling stall — restart-cycle hang)
#65517 (active-memory plugin event-loop starve → Telegram unresponsive)

Both share the same underlying pain: the gateway has no granularity for "Telegram needs to fail fast".

Workaround used today (suboptimal)

Manually clear authProfileOverride per session when a profile hangs
Manually demote the hanging profile via openclaw models auth order set
Accept that the global timeout will be "wrong" for at least one lane

This is reactive whack-a-mole and doesn't scale.

Acceptance criteria

channelTimeoutSeconds is parsed from config and schema-validated
Runtime resolves the right timeout per request based on channel type
Existing timeoutSeconds semantics unchanged when channelTimeoutSeconds not set (backward compatible)
Reasonable defaults shipped (e.g. telegram=120, discord=120, webchat=300, fallback to global)
Documented in agent config reference

OpenClaw version: 2026.4.24
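For the schema-validation acceptance criterion, a hand-rolled TypeScript sketch of what checking channelTimeoutSeconds might involve. It assumes values must be positive numbers of seconds and that unknown channel keys are errors; the channel list and validateChannelTimeoutSeconds are hypothetical, and the real config schema may well use a validation library instead:

```typescript
// Assumption: these are the channel keys the config accepts (taken from this issue).
const KNOWN_CHANNELS = new Set(["telegram", "discord", "webchat", "cron"]);

// Returns human-readable errors; an empty array means the value is valid.
function validateChannelTimeoutSeconds(
  value: unknown,
  path = "agents.defaults.channelTimeoutSeconds",
): string[] {
  // Absent is valid: existing timeoutSeconds semantics apply unchanged.
  if (value === undefined) return [];
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return [`${path}: expected an object mapping channel names to seconds`];
  }
  const entries = value as Record<string, unknown>;
  const errors: string[] = [];
  for (const channel of Object.keys(entries)) {
    const seconds = entries[channel];
    if (!KNOWN_CHANNELS.has(channel)) {
      errors.push(`${path}.${channel}: unknown channel`);
    } else if (typeof seconds !== "number" || !Number.isFinite(seconds) || seconds <= 0) {
      errors.push(`${path}.${channel}: expected a positive number of seconds`);
    }
  }
  return errors;
}
```

The same function could be called with a path such as "agents.list[0].channelTimeoutSeconds" to validate per-agent overrides, so error messages point at the exact config location.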

extent analysis

TL;DR

Implement per-channel timeout overrides in agents.defaults and agents.list[i] to allow for more granular control over timeout settings.

Guidance

Introduce a channelTimeoutSeconds object in the agents.defaults configuration to specify timeout values for each channel (e.g., Telegram, Discord, Webchat).
Update the resolution order to prioritize per-channel timeouts over the global timeout.
Set reasonable defaults for each channel, such as 120s for Telegram and Discord, and 300s for Webchat.
Ensure backward compatibility by maintaining the existing timeoutSeconds semantics when channelTimeoutSeconds is not set.

Example

```json
{
  "agents": {
    "defaults": {
      "timeoutSeconds": 360,
      "channelTimeoutSeconds": {
        "telegram": 90,
        "discord": 90,
        "webchat": 240
      }
    }
  }
}
```

Notes

The proposed solution requires updates to the configuration schema and the runtime resolution of timeouts. It is essential to ensure that the new configuration is properly validated and that the existing timeoutSeconds semantics remain unchanged when channelTimeoutSeconds is not set.

Recommendation

Apply the proposed solution to introduce per-channel timeout overrides, as it provides a more granular and scalable approach to managing timeouts across different channels. This will enable faster failover on hung profiles and improve the overall responsiveness of the system.
