openclaw - ✅(Solved) Fix Regression after 2026.3.28: sessionStrategy behavior changed, ws-stream 500 fallback, slower Discord interaction handling [1 pull request, 1 comment, 2 participants]

GitHub stats

openclaw/openclaw#56881 · Fetched 2026-04-08 01:46:34
Comments: 1 · Participants: 2 · Timeline: 4 (cross-referenced ×3, commented ×1) · Reactions: 1

After upgrading OpenClaw from a working older build to 2026.3.28, our local setup started behaving incorrectly. Rolling back to 2026.3.24 immediately restores normal behavior.

At the moment this looks like an OpenClaw regression rather than a problem in our local reranker server or in memory-lancedb-pro alone.

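Root Cause

The local reranker endpoint is healthy, so the reranker itself is not the root cause. The related PR #55535 attributes the websocket-then-HTTP fallback for openai-codex to the embedded runner's websocket eligibility selector, which after #53702 routed openai-codex / openai-codex-responses to the generic OpenAI Responses websocket endpoint. No confirmed cause is given here for the sessionStrategy change or the slower Discord interaction handling.

Fix Action

Fix / Workaround

As a workaround for the sessionStrategy change, explicitly set "sessionStrategy": "memoryReflection" in the memory-lancedb-pro plugin config and restart. Rolling back to 2026.3.24 also restores normal behavior.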

PR fix notes

PR #55535: fix: keep openai-codex on HTTP responses transport

Description (problem / solution / changelog)

Summary

  • keep openai-codex on its existing HTTP responses path instead of routing it into the generic OpenAI websocket transport selector
  • update the websocket transport selector test to explicitly reject the openai-codex / openai-codex-responses pair

Change Type

  • Bug fix
  • New feature
  • Breaking change
  • Refactor
  • Docs
  • Test-only

Scope

This PR is intentionally narrow.

It only changes websocket transport eligibility for openai-codex on the embedded runner path.

It does not change:

  • the openai-codex provider normalization logic
  • HTTP request payload behavior for Codex
  • any image/media-understanding routing
  • any memory provider routing

Linked Issue/PR

  • Related #55523
  • Related #56826
  • Related #56881
  • Follow-up to merged PR #53702
  • This PR fixes a bug or regression

Root Cause / Regression History

openai-codex models normalize to the ChatGPT backend HTTP path (https://chatgpt.com/backend-api) and use openai-codex-responses.

After #53702, the embedded runner's websocket eligibility selector also treated the openai-codex / openai-codex-responses pair as websocket-eligible. However, the websocket connection manager still targets the generic OpenAI Responses websocket endpoint rather than a Codex-specific transport target.

In isolated upstream-main smoke testing, that caused openai-codex requests to attempt websocket first, fail with HTTP 500, and then fall back to HTTP.

This patch keeps openai-codex on the existing HTTP path until there is a verified provider-specific websocket target and end-to-end support for it.
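
The observed sequence (websocket attempted first, failure, then HTTP) can be sketched roughly as follows. This is a minimal illustration, not OpenClaw's code: `sendWithFallback`, `Sender`, and `SendResult` are hypothetical names, and the real runner's error handling may differ.

```typescript
// Hypothetical sketch: names and types are illustrative, not OpenClaw's API.
type Transport = "websocket" | "http";

interface SendResult {
  transport: Transport;
  fellBack: boolean;
  body: string;
}

type Sender = (payload: string) => Promise<string>;

// Try the websocket sender first; on any failure, retry once over HTTP.
async function sendWithFallback(
  payload: string,
  viaWebsocket: Sender,
  viaHttp: Sender,
  log: (msg: string) => void = () => {},
): Promise<SendResult> {
  try {
    const body = await viaWebsocket(payload);
    return { transport: "websocket", fellBack: false, body };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    log(`websocket connect failed (${reason}); falling back to HTTP`);
    const body = await viaHttp(payload);
    return { transport: "http", fellBack: true, body };
  }
}
```

In this shape the request still succeeds, which matches the report that openai-codex requests completed only after the fallback. The cost is the wasted websocket attempt on every request.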

Behavior Changes

Before:

  • openai-codex / openai-codex-responses entered the generic OpenAI websocket selector
  • websocket could fail and fall back to HTTP in isolated smoke tests

After:

  • only native openai / openai-responses is websocket-eligible
  • openai-codex stays on HTTP responses transport
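
A minimal sketch of the "after" behavior, assuming the selector reduces to an allow-list of provider/API pairs. `isWebsocketEligible` and `WEBSOCKET_ELIGIBLE_PAIRS` are hypothetical names chosen for illustration; the actual helper in `attempt.thread-helpers.ts` is not shown on this page and may be structured differently.

```typescript
// Hypothetical sketch of a websocket eligibility check; the names and shape
// are assumptions, not the code in attempt.thread-helpers.ts.
const WEBSOCKET_ELIGIBLE_PAIRS: ReadonlyArray<readonly [string, string]> = [
  ["openai", "openai-responses"], // the only pair the PR keeps eligible
];

// Returns true only when the exact provider/API pair is on the allow-list.
function isWebsocketEligible(provider: string, api: string): boolean {
  return WEBSOCKET_ELIGIBLE_PAIRS.some(
    ([p, a]) => p === provider && a === api,
  );
}

console.log(isWebsocketEligible("openai", "openai-responses")); // true
console.log(isWebsocketEligible("openai-codex", "openai-codex-responses")); // false
console.log(isWebsocketEligible("openai", "openai-codex-responses")); // false
```

An explicit allow-list keeps the selector small, so adding a Codex pair later is a one-line change once a provider-specific websocket target is verified.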

Regression Test Plan

Updated:

  • src/agents/pi-embedded-runner/run/attempt.spawn-workspace.websocket.test.ts

Coverage:

  • accepts the native openai / openai-responses websocket pair
  • rejects openai-codex / openai-codex-responses
  • rejects mismatched provider/API websocket pairs

Repro + Verification

Observed on merged upstream/main during isolated smoke validation of #53702:

  • openai-codex/gpt-5.4 requests succeeded, but only after websocket failed and the runner fell back to HTTP
  • gateway log showed websocket connect failure with HTTP 500 before fallback

With this patch:

  • the websocket selector no longer routes openai-codex into the generic OpenAI websocket path
  • focused websocket selector test passes locally

Tests

Passed locally:

  • corepack pnpm test -- src/agents/pi-embedded-runner/run/attempt.spawn-workspace.websocket.test.ts --reporter=verbose
  • pre-commit pnpm check passed during commit

Risks and Mitigations

Risk:

  • if openai-codex later gets a valid provider-specific websocket endpoint, this patch will keep it on HTTP until that support is added explicitly

Mitigation:

  • this preserves the existing working HTTP transport and avoids the currently observed websocket->HTTP fallback path
  • the selector remains a small, centralized surface to expand later once Codex websocket support is verified end-to-end

AI assistance

AI-assisted: drafted and implemented with Codex, then locally reviewed and tested by me.

Changed files

  • src/agents/pi-embedded-runner/run/attempt.spawn-workspace.websocket.test.ts (modified, +6/-9)
  • src/agents/pi-embedded-runner/run/attempt.thread-helpers.ts (modified, +4/-4)

Original issue text

Summary

After upgrading OpenClaw from a working older build to 2026.3.28, our local setup started behaving incorrectly. Rolling back to 2026.3.24 immediately restores normal behavior.

At the moment this looks like an OpenClaw regression rather than a problem in our local reranker server or in memory-lancedb-pro alone.

What changed after upgrading to 2026.3.28

1) memory-lancedb-pro started resolving sessionStrategy as none

After the upgrade, logs showed:

session-strategy: using none (plugin memory-reflection hooks disabled)

This disabled memory-reflection hooks unexpectedly.

As a workaround, explicitly setting:

"plugins": {
  "entries": {
    "memory-lancedb-pro": {
      "config": {
        "sessionStrategy": "memoryReflection"
      }
    }
  }
}

and restarting restored:

memory-reflection: integrated hooks registered (command:new, command:reset, after_tool_call, before_prompt_build, session_end)

So something in 2026.3.28 appears to have changed how runtime/plugin config is resolved.
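
If the workaround needs to be applied across several config files, the edit can be scripted. The sketch below is an illustration only: `pinSessionStrategy` and the `OpenClawConfigLike` type are hypothetical, and the config shape is inferred from the excerpt above rather than from OpenClaw's schema.

```typescript
// Hypothetical helper: pins sessionStrategy for one plugin entry if it is unset.
// The config shape mirrors the excerpt in this issue, not an official schema.
interface PluginEntry {
  enabled?: boolean;
  config?: Record<string, unknown>;
}

interface OpenClawConfigLike {
  plugins?: { entries?: Record<string, PluginEntry> };
}

function pinSessionStrategy(
  cfg: OpenClawConfigLike,
  pluginId: string,
  strategy: string,
): OpenClawConfigLike {
  const entries = cfg.plugins?.entries ?? {};
  const entry = entries[pluginId] ?? {};
  const config = entry.config ?? {};
  // Leave an explicit value alone; only fill in a missing one.
  if (typeof config["sessionStrategy"] === "string") return cfg;
  return {
    ...cfg,
    plugins: {
      ...cfg.plugins,
      entries: {
        ...entries,
        [pluginId]: { ...entry, config: { ...config, sessionStrategy: strategy } },
      },
    },
  };
}

const pinned = pinSessionStrategy(
  { plugins: { entries: { "memory-lancedb-pro": { enabled: true } } } },
  "memory-lancedb-pro",
  "memoryReflection",
);
console.log(JSON.stringify(pinned, null, 2));
```

The helper returns a new object instead of mutating its input, so the original config can still be diffed against the pinned one.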

2) Discord interaction handling became slower

Logs also showed:

[EventQueue] Slow listener detected: InteractionEventListener took 4254ms for event INTERACTION_CREATE

and later:

[EventQueue] Slow listener detected: InteractionEventListener took 2240ms for event INTERACTION_CREATE
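
To compare latency between 2026.3.24 and 2026.3.28, the slow-listener warnings can be pulled out of a log file. A small sketch, assuming the log lines keep the exact format quoted above; `parseSlowListeners` is a hypothetical helper, not part of OpenClaw.

```typescript
// Hypothetical log parser; the regex assumes the exact line format quoted
// in this issue.
interface SlowListener {
  listener: string;
  ms: number;
  event: string;
}

const SLOW_LISTENER =
  /\[EventQueue\] Slow listener detected: (\S+) took (\d+)ms for event (\S+)/;

// Extract listener name, duration, and event from every matching line.
function parseSlowListeners(lines: string[]): SlowListener[] {
  const out: SlowListener[] = [];
  for (const line of lines) {
    const m = SLOW_LISTENER.exec(line);
    if (m) {
      out.push({
        listener: m[1] ?? "",
        ms: Number(m[2] ?? "0"),
        event: m[3] ?? "",
      });
    }
  }
  return out;
}

const slow = parseSlowListeners([
  "[EventQueue] Slow listener detected: InteractionEventListener took 4254ms for event INTERACTION_CREATE",
  "[EventQueue] Slow listener detected: InteractionEventListener took 2240ms for event INTERACTION_CREATE",
]);
console.log(Math.max(...slow.map((s) => s.ms))); // 4254
```

Running the same parser over logs from both versions gives a direct before/after comparison of the worst-case listener time.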

3) Embedded websocket streaming started failing

Repeated logs:

[ws-stream] WebSocket connect failed ... Unexpected server response: 500
falling back to HTTP

This started appearing around the same time and may be part of the same regression.

What is NOT the problem

Reranker server is healthy

Local reranker endpoint is up and responds correctly:

  • endpoint: http://127.0.0.1:18799/v1/rerank
  • server process is running normally
  • manual POST test returns 200 OK

So the reranker itself is not the root cause.
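
The manual POST check can be repeated programmatically. The sketch below rests on assumptions: the request body shape (`model`, `query`, `documents`) and the Bearer header are guesses at a typical rerank API and are not confirmed by the issue, and `buildRerankProbe` / `probeReranker` are hypothetical helpers. The fetch function is injected so the probe can be exercised without a network.

```typescript
// Hypothetical health probe. The payload shape and auth header are assumptions
// about the local rerank server, not taken from the issue.
interface ProbeRequest {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

type FetchLike = (url: string, init: ProbeRequest) => Promise<{ status: number }>;

// Build a tiny rerank request that only needs to round-trip successfully.
function buildRerankProbe(model: string, apiKey: string): ProbeRequest {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, query: "ping", documents: ["ping", "pong"] }),
  };
}

// Resolves to true when the endpoint answers 200, false for any other status.
async function probeReranker(
  endpoint: string,
  model: string,
  apiKey: string,
  fetchImpl: FetchLike,
): Promise<boolean> {
  const res = await fetchImpl(endpoint, buildRerankProbe(model, apiKey));
  return res.status === 200;
}
```

With the values from this issue, the call would be `probeReranker("http://127.0.0.1:18799/v1/rerank", "BAAI/bge-reranker-base", "local", fetch)`, assuming a runtime that provides a global `fetch`.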

Rolling back OpenClaw fixes it

Most importantly:

  • OpenClaw 2026.3.28 → broken / degraded behavior
  • OpenClaw 2026.3.24 → works normally again

That strongly suggests an OpenClaw-side regression introduced after 2026.3.24.

Environment

  • OS: Windows 10.0.26200 x64
  • Node: 24.13.0
  • OpenClaw problematic version: 2026.3.28
  • OpenClaw known-good rollback: 2026.3.24
  • Plugin: memory-lancedb-pro (version string lost in extraction)

Relevant config excerpts

Agent defaults

"agents": {
  "defaults": {
    "memorySearch": {
      "enabled": false
    }
  }
}

memory-lancedb-pro

"plugins": {
  "entries": {
    "memory-lancedb-pro": {
      "enabled": true,
      "config": {
        "autoCapture": true,
        "autoRecall": true,
        "embedding": {
          "baseURL": "http://localhost:11434/v1",
          "model": "jina-v5-retrieval-test",
          "dimensions": 1024
        },
        "retrieval": {
          "mode": "hybrid",
          "rerank": "cross-encoder",
          "rerankProvider": "siliconflow",
          "rerankModel": "BAAI/bge-reranker-base",
          "rerankEndpoint": "http://127.0.0.1:18799/v1/rerank",
          "rerankApiKey": "local"
        },
        "autoRecallTimeoutMs": 120000
      }
    }
  }
}

Why I think this should be investigated in OpenClaw

Because the problem disappears completely on rollback to 2026.3.24, I suspect one or more regressions in 2026.3.28 related to:

  1. plugin config resolution / default propagation into plugin runtime
  2. session strategy behavior affecting plugin hook registration
  3. embedded websocket streaming (ws-stream) behavior
  4. Discord interaction handling latency in the event queue

Request

Could you help identify which change after 2026.3.24 caused:

  • memory-lancedb-pro to behave as if sessionStrategy=none
  • websocket streaming to start failing with HTTP 500 fallback
  • Discord interaction listener latency to increase noticeably

If needed, I can provide more logs and a more minimal reproduction.

Extended analysis

Fix Plan

To address the issues reported after upgrading to OpenClaw 2026.3.28, work through these steps:

  1. Explicitly set sessionStrategy for memory-lancedb-pro: Add "sessionStrategy": "memoryReflection" to the plugin's config, then restart. The issue reports that this restores hook registration:
    "plugins": {
      "entries": {
        "memory-lancedb-pro": {
          "config": {
            "sessionStrategy": "memoryReflection"
          }
        }
      }
    }
  2. Investigate Discord Interaction Latency:
    • The issue reports InteractionEventListener taking 4254ms and 2240ms for INTERACTION_CREATE on 2026.3.28. Compare the same log lines on 2026.3.24 to confirm the slowdown is version-specific.
    • Neither the issue nor PR #55535 identifies a cause or fix for this; a minimal reproduction with logs is the most useful next step.
  3. Address the Embedded Websocket 500 Fallback:
    • PR #55535 attributes this behavior for openai-codex to the websocket eligibility selector, which after #53702 routed openai-codex / openai-codex-responses to the generic OpenAI Responses websocket endpoint. The PR keeps openai-codex on the HTTP responses transport.
    • If you use openai-codex models, a build that includes PR #55535 should no longer attempt websocket for them. For other providers, the cause is not established here.
  4. Verify Plugin Config Resolution:
    • Check whether 2026.3.28 changed how default values propagate into the plugin runtime. The issue suspects this but does not confirm it.
    • Until that is resolved, set every plugin option you depend on explicitly instead of relying on defaults.

Verification

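To verify that these fixes work:

  • Restart OpenClaw with the updated configuration.
  • Check the logs for "memory-reflection: integrated hooks registered" and confirm that "session-strategy: using none" no longer appears.
  • Trigger Discord interactions and check whether "[EventQueue] Slow listener detected" warnings for INTERACTION_CREATE still appear, and with what durations.
  • Confirm the logs no longer show "[ws-stream] WebSocket connect failed ... Unexpected server response: 500". With PR #55535, openai-codex stays on HTTP by design, so the signal is the absence of the fallback message, not websocket use.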
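
The log checks above can be automated with a short scan. This is a sketch under the assumption that the log messages match the strings quoted in the issue; `checkLogs` and `LogCheck` are hypothetical names.

```typescript
// Hypothetical verification helper; it matches the literal log strings quoted
// in this issue and would need updating if the messages change.
interface LogCheck {
  hooksRegistered: boolean; // memory-reflection hooks were registered
  strategyNone: boolean; // sessionStrategy still resolved to none
  wsFallbacks: number; // count of websocket connect failures
}

function checkLogs(lines: string[]): LogCheck {
  let hooksRegistered = false;
  let strategyNone = false;
  let wsFallbacks = 0;
  for (const line of lines) {
    if (line.includes("memory-reflection: integrated hooks registered")) {
      hooksRegistered = true;
    }
    if (line.includes("session-strategy: using none")) {
      strategyNone = true;
    }
    if (line.includes("[ws-stream] WebSocket connect failed")) {
      wsFallbacks += 1;
    }
  }
  return { hooksRegistered, strategyNone, wsFallbacks };
}

// A healthy run would report hooks registered, no "none" strategy, zero fallbacks.
const result = checkLogs([
  "memory-reflection: integrated hooks registered (command:new, command:reset, after_tool_call, before_prompt_build, session_end)",
]);
console.log(result.hooksRegistered && !result.strategyNone && result.wsFallbacks === 0); // true
```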

Extra Tips

  • Regularly review OpenClaw's changelog and documentation for any changes that might affect plugin configurations or behavior.
  • Consider setting up automated tests to catch regressions early.
  • If issues persist, providing more detailed logs or a minimal reproduction environment can help in identifying the root cause more accurately.
