hermes - Fix: Skills Guard misses multi-word prompt-injection phrase variants [2 pull requests]

tools/skills_guard.py already catches several prompt-injection phrases with flexible multi-word matching, but a few related phrases still use rigid regexes that require the keywords to be directly adjacent. As a result, benign-looking extra words inserted between the keywords can cause the cooperative Skills Guard scanner to miss variants such as:

  • system prompt temporary override
  • new temporary policy
  • updated internal guidelines
  • revised hidden instructions

This is not intended as a security-boundary report. Per the project threat model, Skills Guard is a heuristic / defense-in-depth scanner. The goal is to improve scanner coverage for common multi-word prompt-injection variants.

Fix Action

Fixed

Suggested fix

Widen the affected regexes in tools/skills_guard.py so the keyword phrases allow intervening words, matching the style already used by nearby patterns such as ignore ... previous ... instructions.

Candidate patterns:

  • system\s+(?:\w+\s+)*prompt\s+(?:\w+\s+)*override
  • new\s+(?:\w+\s+)*policy
  • updated\s+(?:\w+\s+)*guidelines
  • revised\s+(?:\w+\s+)*instructions
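
To illustrate the difference, here is a minimal sketch that runs each sample variant against a rigid adjacent-word regex and against the widened candidate. The widened patterns are the ones listed above; the rigid forms are assumptions about what the current patterns might look like, not copies from tools/skills_guard.py, and the `compare` helper is hypothetical.

```python
import re

# Each entry: (assumed rigid form, widened candidate from the issue, sample variant).
# The rigid forms are illustrative guesses, not the actual patterns in skills_guard.py.
CASES = [
    (r"system\s+prompt\s+override",
     r"system\s+(?:\w+\s+)*prompt\s+(?:\w+\s+)*override",
     "system prompt temporary override"),
    (r"new\s+policy",
     r"new\s+(?:\w+\s+)*policy",
     "new temporary policy"),
    (r"updated\s+guidelines",
     r"updated\s+(?:\w+\s+)*guidelines",
     "updated internal guidelines"),
    (r"revised\s+instructions",
     r"revised\s+(?:\w+\s+)*instructions",
     "revised hidden instructions"),
]


def compare(cases):
    """Return (sample, rigid_hit, widened_hit) for every case."""
    results = []
    for rigid, widened, sample in cases:
        rigid_hit = re.search(rigid, sample, re.IGNORECASE) is not None
        widened_hit = re.search(widened, sample, re.IGNORECASE) is not None
        results.append((sample, rigid_hit, widened_hit))
    return results


for sample, rigid_hit, widened_hit in compare(CASES):
    print(f"{sample!r}: rigid={rigid_hit} widened={widened_hit}")
```

One trade-off to keep in mind: `(?:\w+\s+)*` places no limit on the number of intervening words, so a pattern like `new\s+(?:\w+\s+)*policy` can also match ordinary prose in which "new" and "policy" are far apart. A bounded quantifier such as `{0,3}` is one possible way to limit that, at the cost of missing longer variants.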

Regression coverage

Add tests under tests/tools/test_skills_guard.py that assert the scanner catches the multi-word variants above and maps them to the existing sys_prompt_override / fake_policy pattern IDs.
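
A regression test along these lines might look like the sketch below. It is self-contained and does not call the real scanner: `scan_text` and `PATTERNS` are hypothetical stand-ins, and the grouping of the three policy-style phrases under fake_policy is an assumption. Only the two pattern IDs and the sample phrases come from the issue. In the actual test file, the stand-in would be replaced by whatever entry point tools/skills_guard.py exposes.

```python
import re

# Hypothetical stand-in for the scanner: pattern ID -> widened regexes.
# The IDs are from the issue; which phrase maps to which ID is assumed.
PATTERNS = {
    "sys_prompt_override": [
        r"system\s+(?:\w+\s+)*prompt\s+(?:\w+\s+)*override",
    ],
    "fake_policy": [
        r"new\s+(?:\w+\s+)*policy",
        r"updated\s+(?:\w+\s+)*guidelines",
        r"revised\s+(?:\w+\s+)*instructions",
    ],
}


def scan_text(text):
    """Hypothetical helper: return the set of pattern IDs that match text."""
    return {
        pattern_id
        for pattern_id, regexes in PATTERNS.items()
        if any(re.search(rx, text, re.IGNORECASE) for rx in regexes)
    }


# Multi-word variants from the issue and the pattern ID each should trigger.
VARIANTS = [
    ("system prompt temporary override", "sys_prompt_override"),
    ("new temporary policy", "fake_policy"),
    ("updated internal guidelines", "fake_policy"),
    ("revised hidden instructions", "fake_policy"),
]


def test_multi_word_variants_are_flagged():
    for phrase, expected_id in VARIANTS:
        findings = scan_text(f"Please apply the {phrase} now.")
        assert expected_id in findings, phrase


def test_unrelated_text_is_not_flagged():
    assert scan_text("This skill formats Markdown tables.") == set()
```

The second test is a small guard against over-widening: it checks that text containing none of the keywords produces no findings.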
