hermes - ✅(Solved) Fix perf(telegram): drop pointless 1s sleep + same-thread retry on thread-not-found [2 pull requests, 1 participant]

GitHub stats
NousResearch/hermes-agent#28672 (fetched 2026-05-20 04:02:37)
Comments: 0 · Participants: 1 · Timeline events: 11 · Reactions: 0
Timeline (top): labeled ×4, referenced ×4, cross-referenced ×2, closed ×1

Error Message

Step 1 cannot succeed — if Telegram just rejected the thread, the thread is genuinely gone or stale. Sleeping 1s and re-sending the same request just adds 1s of latency to every thread-not-found error before the actual fix (drop thread_id) runs. Affects every thread-not-found error. Common for cron deliveries to deleted topics and any send to a stale dm_topic_id. Low risk — the second retry path was always going to run anyway, just 1s later.

Fix Action

Fix

Drop the retried_thread_not_found first-retry step. Go straight from "thread not found" to "retry without message_thread_id":

if self._is_thread_not_found_error(send_err) and effective_thread_id is not None:
    logger.warning(
        "[%s] Thread %s not found, retrying without message_thread_id",
        self.name, effective_thread_id,
    )
    used_thread_fallback = True
    effective_thread_id = None
    thread_kwargs = {"message_thread_id": None}
    continue

PR fix notes

PR #28681: fix(telegram): address 5 post-merge audit follow-ups

Description (problem / solution / changelog)

Fixes 5 small issues filed during the post-merge salvage audit. Single PR because each fix is small and they touch related files.

Resolves

  • closes #28670 — _GATEWAY_PROVIDER_ERROR_RE false-positives on legitimate prose
  • closes #28672 — pointless 1s sleep + same-thread retry on thread-not-found
  • closes #28674 — Bot API rejection of direct_messages_topic_id had no retry path
  • closes #28676 — dead image-document branch superseded by earlier merge
  • closes #28678 — chat-scoped allowlist doesn't cover channel posts

Changes

#28670 — anchor + length-cap the provider-error sanitizer

_looks_like_gateway_provider_error now uses an anchored regex (^\s*(\W*\s*)?...) and refuses to rewrite messages over 400 chars or 4+ lines. A user asking 'what does HTTP 404 mean?' on Telegram no longer has their entire reply replaced with the provider-error template; the rewrite still fires on actual short provider error envelopes that lead with 'HTTP 503' / 'API call failed' / etc.
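
A minimal sketch of the anchoring and length cap described above. The alternation inside the pattern is an assumption: the PR notes elide the real expression with `...`, so only the two quoted prefixes ("HTTP 503"-style status lines and "API call failed") are modelled here. `looks_like_provider_error` is a hypothetical stand-in for `_looks_like_gateway_provider_error`.

```python
import re

# Hypothetical pattern: anchored at the start, optional leading punctuation,
# then one of the two prefixes quoted in the PR notes.
_PROVIDER_ERROR_RE = re.compile(
    r"^\s*(\W*\s*)?(HTTP\s+\d{3}|API call failed)",
    re.IGNORECASE,
)

MAX_CHARS = 400   # messages longer than this are never rewritten
MIN_REFUSED_LINES = 4  # messages with 4 or more lines are never rewritten


def looks_like_provider_error(text: str) -> bool:
    """Return True only for short messages that lead with a provider error."""
    if len(text) > MAX_CHARS:
        return False
    if len(text.splitlines()) >= MIN_REFUSED_LINES:
        return False
    return _PROVIDER_ERROR_RE.match(text) is not None
```

Because the pattern is anchored, prose that merely mentions a status code mid-sentence (such as "what does HTTP 404 mean?") no longer matches, while a short envelope that starts with the error still does.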

#28672 — drop the 1s sleep, keep the retry

The original code did asyncio.sleep(1) then retried with the same message_thread_id. The sleep added latency on every thread-not-found error; the retry IS sometimes useful (Telegram has a one-off thread-not-found flake mode exercised by test_send_retries_transient_thread_not_found_before_fallback), so I kept the retry but removed the sleep.
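
The merged behaviour described above (one immediate same-thread retry, then the fallback) might look roughly like the sketch below. It is not the adapter's code: `ThreadNotFound`, `send_with_thread_fallback`, and the two fake senders are hypothetical names used only to exercise the control flow.

```python
import asyncio


class ThreadNotFound(Exception):
    """Stand-in for Telegram's 'Bad Request: message thread not found'."""


async def send_with_thread_fallback(send, text, thread_id):
    """Send text and return (result, used_thread_fallback).

    `send` is a hypothetical coroutine taking (text, message_thread_id=...).
    """
    effective_thread_id = thread_id
    retried_thread_not_found = False
    used_thread_fallback = False
    while True:
        try:
            result = await send(text, message_thread_id=effective_thread_id)
            return result, used_thread_fallback
        except ThreadNotFound:
            if effective_thread_id is None:
                raise  # nothing left to strip
            if not retried_thread_not_found:
                # One immediate same-thread retry, with no asyncio.sleep(1).
                retried_thread_not_found = True
                continue
            # Second rejection: drop the thread id and send to the chat root.
            used_thread_fallback = True
            effective_thread_id = None


def make_stale_sender(calls):
    """Fake sender that rejects every send carrying a thread id."""
    async def send(text, message_thread_id=None):
        calls.append(message_thread_id)
        if message_thread_id is not None:
            raise ThreadNotFound("message thread not found")
        return "sent"
    return send


def make_flaky_sender(calls):
    """Fake sender that rejects only the first send (a one-off flake)."""
    async def send(text, message_thread_id=None):
        calls.append(message_thread_id)
        if len(calls) == 1:
            raise ThreadNotFound("message thread not found")
        return "sent"
    return send
```

With the flaky sender the message still lands in the original thread. With the stale sender the cost is one extra round-trip before the fallback, but no sleep.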

#28674 — extend DM-topic retry predicate

_should_retry_without_dm_topic_reply_anchor previously required reply_to_message_id is not None, so synthetic / resumed sends that route via direct_messages_topic_id (no anchor) had no retry path if Bot API rejected the topic id. Predicate now also fires when direct_messages_topic_id is set and the BadRequest mentions a topic/thread routing failure. The retry path already correctly strips both fields together — only the trigger needed widening.
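
As a rough illustration of the widened trigger, the sketch below assumes the routing failure is detected by substring matching on the error text. That is a guess, since the notes do not show the real check. It also assumes the original anchor path requires the same routing failure. All names other than the two Bot API field names are hypothetical.

```python
# Hypothetical substrings; the real matching logic is not shown in the PR notes.
_ROUTING_HINTS = ("topic", "thread")


def mentions_routing_failure(error_text):
    """True if the error text looks like a topic/thread routing failure."""
    lowered = error_text.lower()
    return any(hint in lowered for hint in _ROUTING_HINTS)


def should_retry_without_dm_topic_reply_anchor(
    error_text, reply_to_message_id, direct_messages_topic_id
):
    """Widened predicate: fire on a reply anchor OR a DM topic id."""
    if not mentions_routing_failure(error_text):
        return False
    # Before the change only the first condition could trigger the retry.
    return reply_to_message_id is not None or direct_messages_topic_id is not None


def strip_dm_topic_routing(send_kwargs):
    """Return a copy of the send kwargs with both routing fields removed."""
    stripped = dict(send_kwargs)
    stripped.pop("reply_to_message_id", None)
    stripped.pop("direct_messages_topic_id", None)
    return stripped
```

The stripping helper mirrors the note that both fields are removed together, so only the predicate changes between the old and new behaviour.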

#28676 — remove dead branch

Lines 4947-4960 of gateway/platforms/telegram.py checked ext in SUPPORTED_IMAGE_DOCUMENT_TYPES for .png/.jpg/.jpeg/.webp/.gif. The earlier branch at line 4896 (bd0c54d17 fix: route Telegram image documents through photo handling) already handles the exact same extension set and returns before reaching here. Replaced the dead block with a comment.

#28678 — chat-scoped auth covers channels

source.chat_type in {'group', 'forum'} extended to {'group', 'forum', 'channel'} for the chat-scoped allowlist in _is_user_authorized. Operators can now put a channel id in TELEGRAM_GROUP_ALLOWED_CHATS and channel posts get authorized correctly. Previously the only paths that worked for channels were either listing the channel id in TELEGRAM_ALLOWED_USERS (because _build_message_event synthesizes user_id = chat.id for channel posts) or GATEWAY_ALLOW_ALL_USERS=true.
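
A small sketch of the chat-scoped check after the change. It assumes `TELEGRAM_GROUP_ALLOWED_CHATS` holds a comma-separated list of chat ids, which the notes do not confirm. `parse_allowed_chats` and `is_chat_authorized` are hypothetical helpers, not functions from `gateway/run.py`.

```python
# Chat types covered by the chat-scoped allowlist after the change.
CHAT_SCOPED_TYPES = {"group", "forum", "channel"}


def parse_allowed_chats(raw):
    """Parse an assumed comma-separated env value into a set of id strings."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def is_chat_authorized(chat_type, chat_id, allowed_chats):
    """True when the chat type is chat-scoped and its id is allowlisted."""
    return chat_type in CHAT_SCOPED_TYPES and str(chat_id) in allowed_chats
```

For example, with `allowed = parse_allowed_chats("-1001234567890, -100555")`, a post from channel `-1001234567890` is authorized, while a private chat with the same id is not, because `"private"` is outside the chat-scoped set.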

Validation

scripts/run_tests.sh tests/gateway/test_telegram_thread_fallback.py -q    → 41/41
scripts/run_tests.sh tests/cron/test_scheduler.py -q                      → 127/127
scripts/run_tests.sh tests/gateway/test_telegram_thread_fallback.py tests/gateway/test_telegram_documents.py tests/gateway/test_telegram_channel_posts.py tests/gateway/test_unauthorized_dm_behavior.py tests/gateway/test_telegram_noise_filter.py tests/gateway/test_telegram_group_gating.py -q  → 144/147

The 3 failures in the broader set are pre-existing test-pollution failures that reproduce on plain main without these changes.

Changed files

  • gateway/platforms/telegram.py (modified, +51/-25)
  • gateway/run.py (modified, +42/-9)

PR #37: chore: sync with upstream main (2026-05-19)

Description (problem / solution / changelog)

Daily sync with upstream. Auto-created by cron job.

New upstream commits (2080):

  • ff0a70381 fix(web): consume bundled design system assets (#26391)
  • 070eeaae6 chore(deps): bump @babel/plugin-transform-modules-systemjs in /website
  • 43f8edbaa chore(deps): bump fast-uri from 3.1.0 to 3.1.2 in /website
  • a9c38c7c3 chore(deps): bump python-dotenv from 1.2.1 to 1.2.2
  • dffcb6ffd chore(deps): bump python-multipart from 0.0.22 to 0.0.27
  • 7f1d1248a chore(deps): bump lodash-es and langium in /website
  • c4bcc778c chore(deps): bump lodash from 4.17.23 to 4.18.1 in /website
  • 0b75d24fd chore(deps): bump follow-redirects from 1.15.11 to 1.16.0 in /website
  • fc90f1b6a chore(deps): bump dompurify from 3.3.3 to 3.4.2 in /website
  • f1254b1bc fix(cli): exit prompt_toolkit cleanly on SIGTERM/SIGHUP instead of raising KeyboardInterrupt (#28688)
  • 709e37e19 fix(dashboard): add scheduled kanban i18n strings (#28534)
  • c4981167e chore(actions)(deps): bump actions/checkout from 4.3.1 to 6.0.2
  • 7bcdced6c fix(kanban): respawn guard defers blocker_auth instead of auto-blocking (#28683)
  • b10b78320 chore(actions)(deps): bump actions/setup-python from 5.3.0 to 6.2.0
  • bbee1dd7c chore(actions)(deps): bump docker/build-push-action from 6.19.2 to 7.1.0
  • 269245740 chore(actions)(deps): bump docker/login-action from 3.7.0 to 4.1.0
  • 424f2cc6e chore(actions)(deps): bump the actions-minor-patch group across 1 directory with 2 updates
  • a3c753128 fix(telegram): address post-merge audit follow-ups (#28670, #28672, #28674, #28676, #28678)
  • 88ee58f7d fix(kanban): stale reclaim must not tick failure counter (#28680)
  • 7f253f555 fix(acp): use tempfile.gettempdir() in workspace auto-approve
  • 58591d9e3 feat: show names of user-modified skills in bundled skill sync summary
  • aedb8ac83 feat(update): syntax-validate critical files post-pull, auto-rollback on failure (#28669)
  • a0bd11d02 fix(tests): catch up 25 stale tests after recent merges (#28626)
  • 12c39830f fix(doctor): attach codex CLI hint to OpenAI Codex auth warning for #27975
  • 4039e2abb chore(release): alias xxxigm noreply for upcoming #27986 salvage (#28594)
  • 62573f44c fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes
  • d759a67c0 fix: add recovery hints to loop guard warnings
  • 87c6edc1d fix(skills): add timeout to Google OAuth urlopen calls
  • b8a9cbd18 fix: tolerate unreadable gateway JSONL transcripts
  • 663ee1486 fix(cron): allow emoji ZWJ sequences in prompts
  • ...

Changed files

  • .env.example (modified, +1/-0)
  • .github/workflows/contributor-check.yml (modified, +1/-1)
  • .github/workflows/deploy-site.yml (modified, +2/-2)
  • .github/workflows/docker-publish.yml (modified, +13/-13)
  • .github/workflows/docs-site-checks.yml (modified, +2/-2)
  • .github/workflows/history-check.yml (modified, +1/-1)
  • .github/workflows/lint.yml (modified, +4/-4)
  • .github/workflows/nix-lockfile-fix.yml (modified, +2/-2)
  • .github/workflows/nix.yml (modified, +1/-1)
  • .github/workflows/osv-scanner.yml (modified, +1/-1)
  • .github/workflows/skills-index.yml (modified, +4/-4)
  • .github/workflows/supply-chain-audit.yml (modified, +2/-2)
  • .github/workflows/tests.yml (modified, +5/-2)
  • .github/workflows/upload_to_pypi.yml (modified, +3/-3)
  • .github/workflows/uv-lockfile-check.yml (modified, +1/-1)
  • AGENTS.md (modified, +8/-6)
  • README.md (modified, +1/-1)
  • acp_adapter/auth.py (modified, +13/-2)
  • acp_adapter/edit_approval.py (modified, +9/-1)
  • acp_adapter/permissions.py (modified, +22/-2)
  • acp_adapter/server.py (modified, +55/-1)
  • acp_adapter/tools.py (modified, +178/-13)
  • agent/agent_init.py (modified, +42/-7)
  • agent/agent_runtime_helpers.py (modified, +24/-3)
  • agent/anthropic_adapter.py (modified, +148/-14)
  • agent/auxiliary_client.py (modified, +189/-11)
  • agent/azure_identity_adapter.py (added, +555/-0)
  • agent/background_review.py (modified, +12/-0)
  • agent/chat_completion_helpers.py (modified, +15/-3)
  • agent/context_compressor.py (modified, +52/-3)
  • agent/conversation_compression.py (modified, +40/-4)
  • agent/conversation_loop.py (modified, +20/-5)
  • agent/copilot_acp_client.py (modified, +4/-1)
  • agent/credential_pool.py (modified, +118/-1)
  • agent/error_classifier.py (modified, +29/-0)
  • agent/memory_manager.py (modified, +59/-5)
  • agent/prompt_builder.py (modified, +7/-2)
  • agent/redact.py (modified, +1/-0)
  • agent/shell_hooks.py (modified, +4/-1)
  • agent/skill_bundles.py (added, +410/-0)
  • agent/skill_preprocessing.py (modified, +8/-0)
  • agent/system_prompt.py (modified, +6/-2)
  • agent/tool_guardrails.py (modified, +21/-4)
  • batch_runner.py (modified, +21/-2)
  • cli-config.yaml.example (modified, +9/-0)
  • cli.py (modified, +233/-13)
  • cron/jobs.py (modified, +44/-0)
  • cron/scheduler.py (modified, +147/-17)
  • gateway/config.py (modified, +51/-7)
  • gateway/platforms/base.py (modified, +45/-11)
  • gateway/platforms/dingtalk.py (modified, +10/-0)
  • gateway/platforms/matrix.py (modified, +37/-0)
  • gateway/platforms/mattermost.py (modified, +26/-5)
  • gateway/platforms/signal.py (modified, +25/-0)
  • gateway/platforms/telegram.py (modified, +740/-112)
  • gateway/platforms/telegram_network.py (modified, +10/-0)
  • gateway/platforms/wecom.py (modified, +1/-1)
  • gateway/run.py (modified, +885/-78)
  • gateway/session.py (modified, +17/-11)
  • gateway/session_context.py (modified, +8/-0)
  • gateway/sticker_cache.py (modified, +17/-4)
  • gateway/stream_consumer.py (modified, +36/-9)
  • hermes_cli/auth.py (modified, +470/-57)
  • hermes_cli/auth_commands.py (modified, +49/-0)
  • hermes_cli/azure_detect.py (modified, +126/-20)
  • hermes_cli/bundles.py (added, +229/-0)
  • hermes_cli/commands.py (modified, +32/-5)
  • hermes_cli/config.py (modified, +17/-0)
  • hermes_cli/cron.py (modified, +9/-0)
  • hermes_cli/doctor.py (modified, +92/-12)
  • hermes_cli/gateway.py (modified, +20/-11)
  • hermes_cli/gateway_windows.py (modified, +54/-3)
  • hermes_cli/kanban.py (modified, +327/-30)
  • hermes_cli/kanban_db.py (modified, +1138/-112)
  • hermes_cli/kanban_decompose.py (modified, +45/-8)
  • hermes_cli/kanban_diagnostics.py (modified, +9/-0)
  • hermes_cli/kanban_specify.py (modified, +7/-2)
  • hermes_cli/kanban_swarm.py (added, +279/-0)
  • hermes_cli/main.py (modified, +382/-31)
  • hermes_cli/model_switch.py (modified, +1/-1)
  • hermes_cli/oneshot.py (modified, +9/-0)
  • hermes_cli/providers.py (modified, +1/-0)
  • hermes_cli/proxy/adapters/__init__.py (modified, +2/-0)
  • hermes_cli/proxy/adapters/xai.py (added, +136/-0)
  • hermes_cli/proxy/cli.py (modified, +3/-2)
  • hermes_cli/runtime_provider.py (modified, +94/-7)
  • hermes_cli/skin_engine.py (modified, +6/-1)
  • hermes_cli/uninstall.py (modified, +1/-1)
  • hermes_cli/web_server.py (modified, +146/-27)
  • hermes_constants.py (modified, +74/-1)
  • hermes_logging.py (modified, +1/-1)
  • hermes_state.py (modified, +45/-0)
  • locales/af.yaml (modified, +1/-0)
  • locales/de.yaml (modified, +1/-0)
  • locales/en.yaml (modified, +1/-0)
  • locales/es.yaml (modified, +1/-0)
  • locales/fr.yaml (modified, +1/-0)
  • locales/ga.yaml (modified, +1/-0)
  • locales/hu.yaml (modified, +1/-0)
  • locales/it.yaml (modified, +1/-0)

Original issue text

From post-merge audit of PR #28505 (#25368 salvage, report cron topic fallback).

Bug

gateway/platforms/telegram.py around line 1755 handles "thread not found" errors with a two-step retry:

  1. First attempt: await asyncio.sleep(1) then retry with the SAME message_thread_id.
  2. Second attempt (after the first inevitably fails): retry without message_thread_id.

Step 1 cannot succeed — if Telegram just rejected the thread, the thread is genuinely gone or stale. Sleeping 1s and re-sending the same request just adds 1s of latency to every thread-not-found error before the actual fix (drop thread_id) runs.

Repro path

Cron delivery to a configured topic that was deleted on the Telegram side:

  1. Cron schedules send to telegram:chat_id:stale_thread_id.
  2. Telegram rejects: Bad Request: message thread not found.
  3. Adapter sleeps 1s, retries with same stale_thread_id → fails again.
  4. Adapter retries without thread_id → succeeds.
  5. Cron observes thread_fallback metadata, marks delivery degraded.

Added latency after the initial rejection: ~1s of sleep plus 2 HTTP round-trips (steps 3 and 4), when 1 HTTP round-trip (step 4 alone) would suffice.
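
The cost of the recovery path can be put in a few lines of arithmetic. The 200 ms round-trip time below is purely an illustrative assumption, and `recovery_cost_ms` is a hypothetical helper. It counts only the time spent after Telegram's first rejection.

```python
def recovery_cost_ms(rtt_ms, sleep_ms, same_thread_retries):
    """Milliseconds from the first rejection until the fallback send returns.

    Each same-thread retry costs one sleep plus one round-trip; the final
    send without message_thread_id costs one more round-trip.
    """
    return same_thread_retries * (sleep_ms + rtt_ms) + rtt_ms


RTT_MS = 200  # assumed round-trip time, for illustration only

old_flow = recovery_cost_ms(RTT_MS, sleep_ms=1000, same_thread_retries=1)
direct_fallback = recovery_cost_ms(RTT_MS, sleep_ms=0, same_thread_retries=0)
```

Under that assumption the old flow spends 1400 ms recovering, against 200 ms when the adapter goes straight to the fallback.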

Fix

Drop the retried_thread_not_found first-retry step. Go straight from "thread not found" to "retry without message_thread_id":

if self._is_thread_not_found_error(send_err) and effective_thread_id is not None:
    logger.warning(
        "[%s] Thread %s not found, retrying without message_thread_id",
        self.name, effective_thread_id,
    )
    used_thread_fallback = True
    effective_thread_id = None
    thread_kwargs = {"message_thread_id": None}
    continue
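
To show where the snippet above sits, here is a self-contained sketch of a send loop built around it. The loop, the `ThreadNotFound` exception, and the `send` callable are assumptions made for illustration. Only the branch body follows the issue's proposal.

```python
import logging

logger = logging.getLogger("telegram_sketch")


class ThreadNotFound(Exception):
    """Stand-in for Telegram's 'Bad Request: message thread not found'."""


def send_direct_fallback(send, text, thread_id, name="telegram", max_attempts=3):
    """Send text and return (result, used_thread_fallback).

    `send` is a hypothetical callable taking (text, message_thread_id=...).
    """
    effective_thread_id = thread_id
    used_thread_fallback = False
    for _ in range(max_attempts):
        thread_kwargs = {"message_thread_id": effective_thread_id}
        try:
            return send(text, **thread_kwargs), used_thread_fallback
        except ThreadNotFound:
            if effective_thread_id is None:
                raise  # already sending without a thread; nothing to strip
            logger.warning(
                "[%s] Thread %s not found, retrying without message_thread_id",
                name, effective_thread_id,
            )
            # No sleep and no same-thread retry: go straight to the fallback.
            used_thread_fallback = True
            effective_thread_id = None
    raise RuntimeError("send attempts exhausted")
```

A sender that rejects a stale thread id is called exactly twice here: once with the thread id and once without.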

Scope

Affects every thread-not-found error. Common for cron deliveries to deleted topics and any send to a stale dm_topic_id. Low risk — the second retry path was always going to run anyway, just 1s later.
