autograd error

4412 issues

[Bug]: Telegram forum topic loses ACP/OpenCode routing after heavy bound turn; topic recovers only after gateway restart and then fails again under load

**Title** Telegram forum topic loses ACP/OpenCode routing after heavy bound turn; topic recovers only after gateway restart and then fails again under load

**Body** I’m seeing a topic-local failure in OpenClaw’s Telegram ACP thread binding.

Environment:

* Host/runtime: OpenClaw Gateway running locally on Linux (WSL2, kernel 5.15), Node.js v22.22.1; gateway service is `systemd` managed and reported as `running` (`openclaw status`).
* OpenClaw version/channel: stable channel, app/npm latest reported as `2026.3.11` (`openclaw status`).
* Transport: Telegram bot channel enabled, using forum topics in group `-1003351905082` (Waggelgroep); issue reproduced in topic context (this thread is topic `1`).
* Telegram routing config: `channels.telegram.threadBindings.enabled=true` and `channels.telegram.threadBindings.spawnAcpSessions=true` in `~/.openclaw/openclaw.json`.
* Group/topic policy: Telegram group policy is allowlisted (`groupPolicy=allowlist`), messages from authorized sender `5558998798`; routing is topic-aware (forum thread context preserved).
* ACP/OpenCode path: ACP runtime/plugin is enabled (`acpx` enabled in config); sessions are persisted under `~/.openclaw/agents/opencode/sessions/`.
* Session evidence of topic isolation: persisted OpenCode session metadata includes topic-scoped keys (`groupId` values like `-1000000005082:topic:<id>`) and explicit `threadId`, confirming per-topic routing context instead of global group routing.
* Operational symptom: after high-volume or long bound turns in a topic, follow-up messages in the same topic stop routing to the bound ACP/OpenCode session; restarting the gateway restores routing temporarily (slash commands work), but the very next message kills it again.

**Strongest evidence from logs**

1. **Message reaches Telegram gateway path**
   * OpenClaw logs raw Telegram updates for the bound topic, including ordinary follow-up messages like `"."`.
2. **Message reaches ACP/OpenCode**
   * OpenCode loads the persistent session, accepts `POST /session/.../message`, starts `session.prompt step=0`, and resolves tools.
3. **OpenCode remains alive after the topic appears dead**
   * OpenCode continues emitting `message.part.updated`, `message.part.delta`, and tool/subagent activity after the handoff point.
4. **Telegram side wedges**
   * OpenClaw logs `typing TTL exceeded (60000ms), auto-stopping typing indicator` instead of a normal usable completion in the topic.

**Correlation with heavy turns**

This seems much more likely on complex turns, especially when sub-agent/task delegation is involved. In the logs, heavier runs create a developer subagent session and generate a denser nested event stream. I’m treating this as a correlation, not definitive proof of cause.

**Likely non-causes**

* This does **not** look like “OpenCode never launched”. There are logs proving the bound session received the message and continued processing.
* This does **not** look like simple agent-permission inheritance from the orchestrator. The developer subagent is created and proceeds with its own work; what is denied there is further `task` delegation, not the whole execution path.
* A separate Telegram chunking bug with `---` exists, but that is a different issue; in my case, the more severe failure persists even after isolating around that. Telegram channel settings support different chunking/streaming modes, so this appears distinct from the already-known delivery fragility around formatting/preview. ([OpenClaw](https://docs.openclaw.ai/channels/telegram?utm_source=chatgpt.com))

**Working theory**

This looks like a bug in the Telegram topic-bound ACP bridge/routing layer inside OpenClaw:

* inbound topic message is accepted,
* bound ACP session receives and processes it,
* but outbound propagation or topic-local routing state wedges under heavier nested event streams,
* and after that the topic may stop reaching OpenClaw at all until gateway restart.
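The log evidence in this report (ACP/OpenCode events continuing after the Telegram side logs `typing TTL exceeded`) can be checked mechanically. The following is a minimal sketch, not part of OpenClaw: it assumes the gateway and OpenCode logs have been merged into one chronologically ordered sequence of text lines, and the function name `acp_activity_after_wedge` is hypothetical. Only the marker strings are taken from the report.

```python
from typing import Iterable, Optional, Tuple

# Marker strings quoted in the issue report.
WEDGE_MARKER = "typing TTL exceeded"
ACP_ACTIVITY = ("message.part.updated", "message.part.delta")


def acp_activity_after_wedge(lines: Iterable[str]) -> Tuple[Optional[int], int]:
    """Return (index of first wedge marker, count of ACP event lines after it).

    Assumption: `lines` is a merged, time-ordered log. If the wedge marker
    never appears, the result is (None, 0).
    """
    wedge_index: Optional[int] = None
    later_events = 0
    for i, line in enumerate(lines):
        if wedge_index is None:
            # Still looking for the first typing-TTL line.
            if WEDGE_MARKER in line:
                wedge_index = i
            continue
        # After the wedge: count lines showing the session is still active.
        if any(event in line for event in ACP_ACTIVITY):
            later_events += 1
    return wedge_index, later_events
```

A non-zero second value would support the claim that the bound session stays alive while the topic appears dead; it does not by itself show where the outbound path wedges.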

unknown model architecture: 'qwen35moe' when loading imported GGUF with mmproj (vision projector)

Imported Qwen3.5-35B-A3B GGUF models fail to load when a vision projector (mmproj) file is attached. The same model loads fine for text-only (without mmproj), and loads fine with mmproj via llama.cpp's `--mmproj` flag.

Ollama version: 0.17.7

**Steps to reproduce**

1. Download a community Qwen 3.5 GGUF (e.g., from llmfan46/Qwen3.5-35B-A3B-heretic-v2-GGUF) and its mmproj file (`Qwen3.5-35B-A3B-mmproj-BF16.gguf`).
2. Create a Modelfile with these lines:
   - `FROM Qwen3.5-35B-A3B-heretic-v2-Q5_K_M.gguf`
   - `FROM Qwen3.5-35B-A3B-mmproj-BF16.gguf`
   - `TEMPLATE """{{ .Prompt }}"""`
3. `ollama create qwen3.5:test -f Modelfile` → succeeds
4. `ollama run qwen3.5:test` → fails

Also tried `ADAPTER` instead of the second `FROM`, with the same result.

**Error**

`llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'`

**Expected behavior**

The model should load with vision support, same as it does with llama.cpp:

`llama-server -m Qwen3.5-35B-A3B-heretic-v2-Q5_K_M.gguf --mmproj Qwen3.5-35B-A3B-mmproj-BF16.gguf -c 4096`

This works perfectly: text and vision are both functional.

**Notes**

- Without mmproj, the model loads fine for text (families: `['qwen35moe']`).
- With mmproj, families becomes `['qwen35moe', 'clip']` and loading fails.
- The official qwen3.5:35b works with vision because it has native `qwen35moe.vision.*` tensors embedded in the main GGUF, with no clip involved.
- PR #14517 fixed text-only loading of imported qwen35moe GGUFs, but the multimodal/clip runner path was not updated for this architecture.
- GPU: 2x RTX 5060 16GB
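To confirm which architecture each file declares before importing, the `general.architecture` metadata key can be read straight from the GGUF header. This is a minimal sketch and not Ollama or llama.cpp code: it assumes GGUF version 2 or 3 (little-endian, 64-bit counts), and the function name `read_architecture` is hypothetical. Based on the families reported above, the main file would be expected to report `qwen35moe` and the mmproj file `clip`, but that is an inference, not a verified result.

```python
import struct
from typing import Any, BinaryIO, Optional

GGUF_MAGIC = b"GGUF"

# GGUF metadata value type ids mapped to struct formats (fixed-size scalars).
_SCALARS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("unexpected end of GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    length = _read(f, "<Q")  # strings are length-prefixed with a uint64
    data = f.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of GGUF string")
    return data.decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_architecture(f: BinaryIO) -> Optional[str]:
    """Return the value of `general.architecture`, or None if it is absent."""
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    _read(f, "<I")             # format version (assumed 2 or 3)
    _read(f, "<Q")             # tensor count, unused here
    kv_count = _read(f, "<Q")  # number of metadata key/value pairs
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key == "general.architecture":
            return value
    return None
```

To use it, open each `.gguf` file in binary mode (`open(path, "rb")`) and pass the file object to `read_architecture`; only the header is read, so the call is cheap even for multi-gigabyte models.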