feat: defense posture community patterns (CP-1001 — CP-1006) by ppcvote · Pull Request #1669 · NVIDIA/garak

ppcvote · 2026-04-05T06:26:26Z

Summary

Six YAML-based community patterns for assessing LLM system prompt defense posture, as discussed in #1666.

What this adds:

community_modules/contrib/defense-posture/ — 6 patterns + index + README
Each pattern includes static indicators (regex, <1ms) + behavioral criteria + calibration metadata
Based on defense pattern analysis of 1,646 unique production system prompts from 4 public datasets

Patterns

ID	Name	OWASP	Gap Rate (n=1,646)	Hardening
CP-1001	Role Boundary Defense	LLM01	92.4%	+2
CP-1002	System Prompt Data Leakage	LLM01	9.4%	+3
CP-1003	Multi-Language Bypass Resistance	LLM01	64.3%	+3
CP-1004	Social Engineering Resistance	LLM01	71.4%	+2
CP-1005	Output Weaponization Defense	LLM02	88.3%	+2
CP-1006	Indirect Injection via External Data	LLM01	97.8%	+3

Average defense score: 36/100. Only 1.1% scored A. 78.3% scored F.

Design

Each pattern supports two scoring modes in one pass:

Static (`static_indicators`): Regex patterns for <1ms hardening score. Zero cost.
Behavioral (`behavioral`): Pass/fail criteria for model inference. Returns 0.0 (defended) → 1.0 (compromised).

Data source

1,646 unique production system prompts from 4 public datasets:

jujumilk3/leaked-system-prompts (121 — ChatGPT, Claude, Grok, Perplexity, Cursor, v0)
x1xhlol/system-prompts-and-models-of-ai-tools (80 — Cursor, Windsurf, Devin, Augment)
elder-plinius/CL4R1T4S (56 — Claude, Gemini, Grok)
LouisShark/chatgpt_system_prompt (1,389 — GPT Store custom GPTs)

Scanned with prompt-defense-audit (deterministic regex, <5ms). Deduplicated by content hash.

Fully reproducible: clone the 4 dataset repos and run the scanner.

Limitations: Regex measures keyword presence, not behavioral resilience. Leaked prompts may be outdated. Selection bias possible. GPT Store prompts (84% of sample) are typically less hardened than platform-level prompts.

Calibration readiness

Each pattern includes `calibration.expected_false_refusal_delta`. The `hardening_score_contribution` fields sum to 15, enabling the "hardening score ≥ 10" threshold analysis discussed in #1666.

Ref: #1666

Six YAML-based community patterns for assessing LLM system prompt defense posture, as discussed in NVIDIA#1666. Each pattern includes: - Probe prompts with attack metadata - Static indicators (regex, <1ms) for hardening score - Behavioral pass/fail criteria for model inference scoring - Calibration metadata for false-refusal correlation - Empirical gap rates from 721 production AI applications Patterns: - CP-1001: Role Boundary Defense (41% gap rate) - CP-1002: System Prompt Data Leakage (59% gap rate) - CP-1003: Multi-Language Bypass Resistance (72% gap rate) - CP-1004: Social Engineering Resistance (82% gap rate) - CP-1005: Output Weaponization Defense (66% gap rate) - CP-1006: Indirect Injection via External Data (96% gap rate) Total hardening score: 0-15 (threshold >= 10 for "adequately hardened") Dataset: doi:10.5281/zenodo.19410475 Ref: NVIDIA#1666

github-actions · 2026-04-05T06:26:39Z

DCO Assistant Lite bot All contributors have signed the DCO ✍️ ✅

ppcvote · 2026-04-05T06:30:30Z

I have read the DCO Document and I hereby sign the DCO

ppcvote · 2026-04-05T06:31:31Z

recheck

…oduction prompts Previous data incorrectly used HTML analysis of 721 websites as proxy for system prompt defense rates. This update uses actual system prompt analysis from jujumilk3/leaked-system-prompts (n=121). Key changes: - Source: jujumilk3/leaked-system-prompts (not website HTML scans) - Sample: 121 real production system prompts (not 721 website URLs) - All gap rates updated to match actual measurements - Methodology description corrected - Limitations section added to README

ppcvote mentioned this pull request Apr 5, 2026

Proposal: Prompt defense posture detectors — assess system prompt hardening #1666

Open

github-actions bot added a commit that referenced this pull request Apr 5, 2026

@ppcvote has signed the CLA in #1669

ca68c29

This was referenced Apr 5, 2026

MCP Server Security Scanning: OWASP MCP Top 10 coverage #1639

Open

feat: Add Chain-of-Thought manipulation probes #1668

Open

ppcvote added 5 commits April 5, 2026 14:50

fix: update links to English versions for international reviewers

93d92d7

data: upgrade to n=1,646 from 4 datasets (was n=121 from 1 dataset)

68bec7f

fix: remove all residual n=121 references from YAML descriptions

6a51402

fix: mark calibration deltas as estimated, fix dataset field consistency

f55e0be

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: defense posture community patterns (CP-1001 — CP-1006)#1669

feat: defense posture community patterns (CP-1001 — CP-1006)#1669
ppcvote wants to merge 6 commits intoNVIDIA:mainfrom
ppcvote:feat/community-defense-posture-patterns

ppcvote commented Apr 5, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Apr 5, 2026 •

edited

Loading

Uh oh!

ppcvote commented Apr 5, 2026

Uh oh!

ppcvote commented Apr 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ppcvote commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Patterns

Design

Data source

Calibration readiness

Uh oh!

github-actions bot commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ppcvote commented Apr 5, 2026

Uh oh!

ppcvote commented Apr 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ppcvote commented Apr 5, 2026 •

edited

Loading

github-actions bot commented Apr 5, 2026 •

edited

Loading