openclaw - ✅(Solved) Fix Meta: correlated regression cluster in 2026.4.24 to 2026.4.26 around gateway startup/runtime/control-plane stability [1 pull requests, 2 comments, 3 participants]

GitHub stats
openclaw/openclaw#74630 · Fetched 2026-04-30 06:21:59
Comments: 2 · Participants: 3 · Timeline events: 6 · Reactions: 0
Timeline (top): cross-referenced ×3, commented ×2, subscribed ×1

This issue is a synthesis of the recent issue/comment corpus, not a claim that one exact root cause is already proven.

The evidence from issues created since 2026-04-17 strongly suggests that 2026.4.24 through 2026.4.26 should be treated as a correlated regression cluster rather than as a large set of unrelated bugs.

The recurring pattern across reports is:

  1. upgrade into 2026.4.24, 2026.4.25, or 2026.4.26
  2. gateway startup becomes slow, inconsistent, or only partially healthy
  3. bundled/plugin runtime-deps work, plugin bootstrap, startup discovery, or restart/reload paths appear to stall or churn
  4. loopback/gateway probe/WebSocket handshake failures start showing up
  5. long-lived channel connections then destabilize
  6. user-visible symptoms show up as WebSocket 1006, probe timeout, channels never connecting, delayed replies, stuck sessions, Telegram churn, Slack socket disconnects, or dropped/stale replies
  7. downgrade to an earlier known-good version removes the instability

This issue is meant to map the cluster and connect related reports. It is not asserting that every report below has one identical root cause, only that the issue and comment stream repeatedly points to a shared control-plane/runtime/bootstrap failure family.

PR fix notes

PR #74762: fix: gateway model catalog cache regression

Description (problem / solution / changelog)

Summary

Found one regression in the new gateway model catalog cache: it treats an empty catalog as a successful cached catalog, which breaks the underlying retry-on-empty contract.

What ClawSweeper Is Fixing

  • Medium: Gateway caches transient empty model catalogs until reload/restart (regression)
    • File: src/gateway/server-model-catalog.ts:49
    • Evidence: startGatewayModelCatalogRefresh() assigns lastSuccessfulCatalog = catalog for every resolved array, including []. Later, loadGatewayModelCatalog() returns lastSuccessfulCatalog whenever it is truthy, and empty arrays are truthy in JS. The underlying loader explicitly avoids caching empty results at src/agents/model-catalog.ts:215 because an empty catalog can come from transient dependency/filesystem/provider issues and should be retried.
    • Impact: if the first gateway catalog load returns [], models.list, TUI model surfaces, session/model metadata helpers, and related gateway callers keep seeing no models until a model config reload or process restart. This is worse than the prior behavior, where the next request retried immediately.
    • Suggested fix: preserve the underlying no-cache-on-empty behavior in the gateway wrapper. Do not mark an empty result as fresh; keep the cache stale or clear it so the next call retries. Add a regression test where the injected loader returns [] once and a non-empty catalog on the second call.
    • Confidence: high
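A minimal sketch of the suggested fix, under stated assumptions: `ModelEntry`, `CatalogLoader`, and `createCatalogCache` are hypothetical names for illustration, and the loader is synchronous here to keep the example short (the real gateway loader is presumably asynchronous). It is not the code in src/gateway/server-model-catalog.ts. The point it illustrates is that an empty array is truthy in JavaScript, so the cache has to check the length explicitly before recording a result as successful.

```typescript
// Hypothetical stand-ins for illustration; not the real openclaw types.
type ModelEntry = { id: string };
type CatalogLoader = () => ModelEntry[];

function createCatalogCache(load: CatalogLoader) {
  let lastSuccessfulCatalog: ModelEntry[] | undefined;

  return {
    get(): ModelEntry[] {
      // Only a non-empty catalog is ever stored, so a defined value is safe to serve.
      if (lastSuccessfulCatalog !== undefined) {
        return lastSuccessfulCatalog;
      }
      const catalog = load();
      // [] is truthy in JS, so test the length instead of the value itself.
      // An empty result is returned to the caller but not cached,
      // which lets the next call retry the loader.
      if (catalog.length > 0) {
        lastSuccessfulCatalog = catalog;
      }
      return catalog;
    },
    // Mirrors the "reload" path: drop the cache so the next get() reloads.
    invalidate(): void {
      lastSuccessfulCatalog = undefined;
    },
  };
}
```

With a loader that returns [] once and a model afterwards, the first get() yields [] and the second get() calls the loader again and returns the model, which is the retry-on-empty contract the PR describes.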

Expected Repair Surface

  • src/gateway/server-model-catalog.ts
  • src/gateway/server-model-catalog.test.ts
  • src/gateway/server-reload-handlers.ts

Source And Review Context

Expected validation

  • pnpm check:changed

ClawSweeper already ran:

  • pnpm docs:list
  • pnpm install after the first targeted test failed because node_modules was missing
  • pnpm test src/gateway/server-model-catalog.test.ts -- --reporter=verbose passed
  • Injected smoke with first loader call returning [] and second returning a model produced {"first":[],"second":[],"calls":1}, confirming the retry is suppressed
  • git diff --check 57a3d7f6e897f25073e313d5c24b6fb6f60575ae..6421e1f36a3cfdf3ab1b4502b36fe718e0d662d3
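The injected smoke result above can be reproduced with a small stand-alone sketch. `createBuggyCatalogCache` and `runSmoke` are hypothetical names, and the wrapper only mimics the behavior described in the finding (truthiness check on the cached array, synchronous loader for brevity); it is not ClawSweeper's actual harness.

```typescript
type Model = { id: string };

// Hypothetical wrapper that mimics the reported regression:
// every resolved array, including [], is stored as "successful".
function createBuggyCatalogCache(load: () => Model[]) {
  let lastSuccessfulCatalog: Model[] | undefined;
  return (): Model[] => {
    // Bug: an empty array is truthy, so [] is served from cache forever.
    if (lastSuccessfulCatalog) {
      return lastSuccessfulCatalog;
    }
    lastSuccessfulCatalog = load();
    return lastSuccessfulCatalog;
  };
}

function runSmoke(): { first: Model[]; second: Model[]; calls: number } {
  let calls = 0;
  // Injected loader: empty on the first call, one model on later calls.
  const get = createBuggyCatalogCache(() => {
    calls += 1;
    return calls === 1 ? [] : [{ id: "example-model" }];
  });
  const first = get();
  const second = get();
  return { first, second, calls };
}

console.log(JSON.stringify(runSmoke())); // {"first":[],"second":[],"calls":1}
```

The loader is called only once, so the second request never sees the model that a retry would have returned.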

Known review limits:

  • Full suite and live gateway smoke were not run; review used focused gateway tests and an injected runtime proof.

ClawSweeper Guardrails

  • Re-check the finding against latest main before changing code.
  • Keep the patch to the narrowest behavior change and matching regression coverage.
  • Do not merge automatically; this PR stays for maintainer review.

ClawSweeper 🐠 replacement reef notes:

  • Cluster: clawsweeper-commit-openclaw-openclaw-6421e1f36a3c
  • Source PRs: none
  • Credit: Detected by ClawSweeper commit review for 6421e1f36a3cfdf3ab1b4502b36fe718e0d662d3; original commit author: Peter Steinberger.
  • Validation: pnpm check:changed

fish notes: model gpt-5.5, reasoning medium; reviewed against da5e171ffab1.

Changed files

  • src/gateway/server-model-catalog.test.ts (modified, +18/-0)
  • src/gateway/server-model-catalog.ts (modified, +1/-1)
Meta: correlated regression cluster in 2026.4.24 to 2026.4.26 around gateway startup/runtime/control-plane stability

Why this looks like one regression family

Across the issue/comment corpus, the same higher-level themes recur:

  • bundled runtime deps / staged runtime-deps repair
  • plugin loader / plugin registry / manifest load behavior
  • startup-time gateway probe / readiness / handshake timing
  • event-loop starvation during startup or restart
  • restart / reload / stale runtime state after update
  • stuck processing / running session state causing channel-visible outages

A strong signal is that many channel-facing and transport-facing reports are explicitly being consolidated into a smaller set of control-plane/runtime trackers rather than being treated as isolated Telegram, Slack, or WebSocket bugs.

Release arc visible in the corpus

2026.4.24

Recurring reports around:

  • bonjour / CIAO crash-loop and hostname behavior
  • migration/runtime breakage
  • early update/restart instability

Relevant issues:

  • #72366
  • #72561
  • #72355
  • #72434
  • #72526
  • #72665
  • #73044

2026.4.25

Recurring reports around:

  • runtime-deps staging / plugin-loader / packaging fallout
  • startup-sidecar stalls
  • post-update unhealthy state
  • missing staged deps such as chokidar

Relevant issues:

  • #72846
  • #72848
  • #72882
  • #72956
  • #72992
  • #73176
  • #73140
  • #73332
  • #73524

2026.4.26

Recurring reports around:

  • event-loop starvation
  • CPU spin during or after startup
  • probe timeout / channel startup failure
  • loopback WebSocket handshake timeout / 1006
  • socket instability
  • stuck sessions and delayed or dropped replies

Relevant issues:

  • #72338
  • #73532
  • #73647
  • #73655
  • #73857
  • #73874
  • #74135
  • #74153
  • #74279
  • #74281
  • #74292
  • #74307
  • #74323
  • #74325
  • #74328
  • #74346
  • #74405
  • #74568
  • #74570

Symptom families that seem correlated

1. Gateway startup / readiness / probe / handshake instability

These issues repeatedly describe the gateway reaching ready or opening a listening socket, but remaining unhealthy for probes, channels, or loopback clients:

  • #72338
  • #73524
  • #74135
  • #74279
  • #74281
  • #74292
  • #74323
  • #74325
  • #74568

2. Runtime-deps / plugin bootstrap / staging / update fallout

These issues repeatedly connect startup breakage, plugin loss, stale runtime state, or broken update recovery to runtime-deps/bootstrap paths:

  • #72665
  • #72848
  • #72882
  • #72956
  • #72992
  • #73140
  • #73176
  • #73532
  • #73647
  • #74199
  • #74307
  • #74346
  • #74405
  • #74570
  • #74597

3. WebSocket / 1006 / loopback failures

These issues repeatedly show loopback handshake starvation, timeout, or disconnect loops that appear to be symptoms of blocked startup or runtime load:

  • #73044
  • #73524
  • #74135
  • #74279
  • #74292
  • #74323
  • #74449
  • #74568
  • #74583

4. Telegram-facing symptoms

Some Telegram issues look adapter-specific, but many severe incidents cluster around startup stalls, stuck sessions, runtime churn, or shared dispatch failure:

  • #72338
  • #73323
  • #73647
  • #74154
  • #74299
  • #74344
  • #74540
  • #74550
  • #74581

5. Slack-facing symptoms

Slack has some isolated transport-specific issues, but comments also repeatedly intersect with the same startup/runtime load class:

  • #72808
  • #73857
  • #74011
  • #74358
  • #74590

6. Session wedge / failover / compaction fallout

These issues repeatedly show processing or running sessions wedging the control plane and then surfacing as channel silence, stale output, or delayed replies:

  • #71127
  • #72903
  • #73510
  • #74153
  • #74154
  • #74550
  • #74607
  • #73204
  • #74073
  • #72676
  • #72697
  • #74239

Maintainer response pattern seen in comments

A recurring maintainer pattern across the corpus:

  1. broad reports get narrowed into a smaller number of runtime/control-plane trackers
  2. many issues are closed as fixed on current main with commit evidence
  3. published releases often remain broken while main has partial or full fixes
  4. only the broadest starvation/stall/socket-instability trackers remain open

That pattern makes the stream look fragmented at first glance, but in aggregate it supports treating this as a release-band regression cluster.

Strong candidate umbrella / canonical trackers

If maintainers think this meta issue should instead collapse into existing umbrella issues, the strongest candidates appear to be:

  • #72338
  • #73532
  • #73655
  • #74135

Working hypothesis, stated cautiously

The most evidence-backed reading of the current corpus is:

2026.4.24 through 2026.4.26 introduced an overlapping runtime/bootstrap/control-plane regression cluster. WebSocket instability, probe timeout, Telegram churn, Slack socket disconnects, stuck sessions, delayed replies, and startup/channel failures are often downstream manifestations of that shared instability rather than isolated adapter-only bugs.

This is a mapping / synthesis hypothesis, not a claim that one exact single defect has been proven.

What would be useful next

To confirm or falsify this cluster framing, the most useful maintainer follow-up would likely be:

  • identify whether the broadest remaining failures all still reproduce on a current post-2026.4.26 build
  • determine how much is explained by:
    • runtime-deps staging/repair
    • plugin manifest/registry load churn
    • readiness/probe timing
    • startup event-loop starvation
    • stuck-session recovery gaps
  • decide whether one existing umbrella issue should own this cluster, or whether a dedicated meta tracker like this is useful

Notes

This synthesis intentionally omits private infrastructure details, local paths, hostnames, IPs, org names, and copied local logs. It is based on the public issue/comment stream and tries to connect patterns without overclaiming certainty.

extent analysis

TL;DR

Downgrade to a version prior to 2026.4.24 or wait for a fixed version to be released, as the correlated regression cluster introduced in 2026.4.24 through 2026.4.26 causes gateway startup, runtime, and control-plane stability issues.

Guidance

  • Identify if the issues still reproduce on a current post-2026.4.26 build to confirm the regression cluster.
  • Investigate the role of runtime-deps staging/repair, plugin manifest/registry load churn, readiness/probe timing, startup event-loop starvation, and stuck-session recovery gaps in the failures.
  • Consider using a dedicated meta tracker like this issue to connect patterns and follow up on the regression cluster.
  • Review the strongest candidate umbrella trackers (#72338, #73532, #73655, #74135) for potential fixes or workarounds.

Example

No specific code snippet can be provided without more context, but reviewing the runtime-deps staging and plugin manifest loading code may help identify the root cause of the regression cluster.

Notes

The provided information is based on the public issue/comment stream and may not reflect the full scope of the issue. Further investigation is needed to confirm the root cause and develop a comprehensive fix.

Recommendation

Apply a workaround by downgrading to a version prior to 2026.4.24 until a fixed version is released, as the regression cluster affects gateway startup, runtime, and control-plane stability.
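To decide whether an installation falls inside the affected band, a version string can be compared against 2026.4.24 and 2026.4.26. This is a sketch under assumptions: `parseVersion`, `compareVersions`, and `isInAffectedBand` are hypothetical helpers, and the parser only understands plain `year.month.patch` strings as they appear in this issue.

```typescript
type Version = [number, number, number];

// Parses "2026.4.25" into [2026, 4, 25]; returns undefined for other shapes.
function parseVersion(text: string): Version | undefined {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text.trim());
  if (!match) return undefined;
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i += 1) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// True for versions from 2026.4.24 through 2026.4.26 inclusive,
// the band this issue associates with the regression cluster.
function isInAffectedBand(text: string): boolean {
  const version = parseVersion(text);
  if (!version) return false;
  return (
    compareVersions(version, [2026, 4, 24]) >= 0 &&
    compareVersions(version, [2026, 4, 26]) <= 0
  );
}
```

A `false` result for a newer version does not by itself mean the problems are fixed there; the issue notes that published releases often remained broken while main carried partial or full fixes.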
