hermes - 💡(How to fix) Fix Bundled red-team skill descriptions are sent to every fallback provider. OpenAI Cyber Abuse warning triggered.

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Code Example

grep -E 'jailbreak|GODMODE|abliterate' ~/.hermes/.skills_prompt_snapshot.json
RAW_BUFFERClick to expand / collapse

Bundled red-team skill descriptions are sent to every fallback provider

Problem

The skill manifest (.skills_prompt_snapshot.json, built by agent/prompt_builder.py) is injected into the system prompt on every turn, so it goes to whatever provider serves that turn — including third-party APIs reached via fallback.

Two default-seeded builtins carry content-moderation trigger phrases in their description: frontmatter:

SkillPathDescription
godmodeskills/red-teaming/godmodeJailbreak LLMs: Parseltongue, GODMODE, ULTRAPLINIAN.
obliteratusskills/mlops/inference/obliteratusOBLITERATUS: abliterate LLM refusals (diff-in-means).

When the agent falls back to a provider like OpenAI, those phrases ship in the system prompt. Moderation classifiers score on input, so "Jailbreak LLMs" / "abliterate LLM refusals" can read as policy-violating content with no skill ever invoked.

Why it matters

This ships by default. Any install with a moderated third-party API anywhere in its fallback chain sends these phrases to that API on every fallback turn — no user action, no skill use. The realistic downside is an account warning or suspension on the fallback provider.

Reproduce

grep -E 'jailbreak|GODMODE|abliterate' ~/.hermes/.skills_prompt_snapshot.json

Returns matches on a default install — confirming the phrases are in the system prompt sent to whichever provider handles the turn.

Possible fixes (your call)

  1. Make godmode / obliteratus opt-in instead of default-seeded. Narrowest, and questions whether red-team tooling should auto-install into every profile at all.
  2. Reword the descriptions.
  3. Strip or rewrite flagged descriptions when the target provider is a moderated third-party API. Covers future descriptions too, but most invasive.

Happy to PR whichever you prefer.

Note

This surfaced from a real OpenAI "Cyber Abuse" warning on an account using OpenAI subscription fallback. OpenAI later reversed it as incorrectly issued, and I can't 100% prove these phrases were the cause though it's the only reasonable option. This OpenAI Business subscription has only been used as a Hermes fallback. This fallback instance was the first time my OpenAI had been used in the past 3 days (confirmed). It also only used 3% of my 5 hour usage, 1% weekly. There was not many logs to comb through, and confidence is high here; However, this is still filed as a latent exposure, not a confirmed incident. The phrases shipping to a moderated API/OAuth by default is the issue regardless of whether they triggered that particular warning.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING