feat: Crescendo multi-turn jailbreak probes (Russinovich et al., 2024)#1653
Christbowel wants to merge 3 commits into NVIDIA:main
Conversation
Implements the Crescendo attack (Russinovich et al., arXiv:2404.01833), accepted at USENIX Security 2025.

- CrescendoReplay: pre-scripted replay probe, no auxiliary LLM required. CrescendoCached kept as a backwards-compatible alias.
- Crescendo: fully adaptive probe using an attacker LLM that generates each turn based on the target's prior response.
- judge.CrescendoJudge: two-stage LLM judge (primary 0-100 score + secondary correction for aligned-judge false negatives).
- backtrack_on_refusal param (default False, paper-faithful) enables explicit FITD-inspired backtracking as an opt-in extension.
- secondary_detectors hook for cross-validation metrics.
- 15 unit tests, all passing.

Closes NVIDIA#1513

Signed-off-by: christbowel <0xdeadbeef@christbowel.com>
```python
if attempt.notes is None:
    attempt.notes = {}
attempt.notes["cached_turns"] = turns
attempt.notes["turn_idx"] = 0
```
Why is this index needed? As I read the attack flow, the index is simply len(attempt.turns) - 1.
Thanks for the suggestion. I double-checked the Attempt API and attempt.turns doesn't seem to exist as a direct attribute; it sits under attempt.conversations[0].turns. Beyond that, after sending cached_turns[0], the conversation contains [user_0, assistant_0] (len=2), so len(turns) - 1 would give 1 while turn_idx should still be 0 at that point. This would cause a skip on every iteration. I think keeping turn_idx explicitly in notes is the safest way to track position in cached_turns. Happy to revisit if I'm misreading the attack flow!
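A tiny sketch of the off-by-one being discussed; the list-of-tuples conversation is illustrative only, not the real garak Attempt structure:

```python
# Hypothetical model of the conversation state after one replayed turn.
cached_turns = ["turn 0", "turn 1", "turn 2"]
conversation = []

# send cached_turns[0], then the target replies
conversation.append(("user", cached_turns[0]))
conversation.append(("assistant", "target reply"))

naive_idx = len(conversation) - 1  # 1, because the assistant reply also counts
explicit_idx = 0                   # the cached turn just sent is still index 0

# using naive_idx as the position in cached_turns would skip a cached turn
# on every iteration, since each iteration appends two conversation entries
assert naive_idx == 1
assert explicit_idx == 0
```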
Sorry, you are correct: it should be attempt.prompt.turns. Though I forgot to account for the assistant turns as the conversation builds. I still believe this should not be a note; this is state of the attack process, not really of the attempt.
Agreed on the distinction. The constraint is that _generate_next_attempts only receives last_attempt, so the position needs to travel with it somehow. I could maintain a dict on self keyed by attempt UUID, but that risks memory accumulation in long runs. Would a dedicated attack_state dict on the attempt be cleaner, or is there an existing pattern in the codebase you'd point me to?
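The probe-side alternative mentioned above could look roughly like this; the class and method names are illustrative, not garak's API:

```python
# Hypothetical sketch: attack state lives on the probe, keyed by attempt UUID,
# instead of travelling inside attempt.notes.
class ReplayStateSketch:
    def __init__(self):
        self._attack_state = {}  # attempt uuid -> index into cached_turns

    def next_turn_idx(self, attempt_uuid):
        # return the current position, then advance it for the next call
        idx = self._attack_state.get(attempt_uuid, 0)
        self._attack_state[attempt_uuid] = idx + 1
        return idx

    def finish(self, attempt_uuid):
        # without explicit cleanup, entries accumulate across a long run,
        # which is the memory concern raised above
        self._attack_state.pop(attempt_uuid, None)
```

The trade-off is exactly the one discussed: state no longer pollutes the attempt, but the probe must reliably call the cleanup hook or the dict grows unbounded.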
```python
def _generate_next_attempts(
    self, last_attempt: garak.attempt.Attempt
) -> Iterable[garak.attempt.Attempt]:
    turn_idx = last_attempt.notes.get("turn_idx", 0)
```
As noted above, this could be:

```diff
- turn_idx = last_attempt.notes.get("turn_idx", 0)
+ turn_idx = len(last_attempt.turns) - 1
```
Same discussion as above, keeping turn_idx in notes for now pending your feedback on where state should live.
```python
if next_attempt.notes is None:
    next_attempt.notes = {}
next_attempt.notes["cached_turns"] = cached_turns
next_attempt.notes["turn_idx"] = next_idx
```
Based on the idea that this value is len(next_attempt.turns) - 1:

```diff
- next_attempt.notes["turn_idx"] = next_idx
```
Same as above, keeping turn_idx pending your feedback on state management.
- Replace cached_turns in notes with cache_idx (index into self.cached_conversations) to avoid serializing full turn lists into every attempt in report.jsonl
- Use attempt.goal instead of attempt.notes[goal] throughout, consistent with the native Attempt API
- Propagate per-conversation goal from cached JSONL into attempt.goal
- Add max_tokens: 1024 to red_team_model_config default (150 is too short for attacker LLM responses)
- Simplify CrescendoJudge.detect() goal lookup to attempt.goal
- Pass the full conversation to the judge instead of the last message only, consistent with how Crescendo exploits accumulated context
- Add a docstring note on why CrescendoReplay uses conversation mode

Signed-off-by: christbowel <0xdeadbeef@christbowel.com>
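The cache_idx change in this commit can be sketched as follows; the record shape and field names are assumptions for illustration, not the actual JSONL schema:

```python
# Hypothetical sketch: attempt notes carry only small indices into the
# probe's cached conversations, so report.jsonl never serializes full
# turn lists for every attempt.
cached_conversations = [
    {"goal": "goal A", "turns": ["A0", "A1"]},
    {"goal": "goal B", "turns": ["B0", "B1", "B2"]},
]

notes = {"cache_idx": 1, "turn_idx": 2}  # small and serialization-friendly
conv = cached_conversations[notes["cache_idx"]]
next_prompt = conv["turns"][notes["turn_idx"]]
assert next_prompt == "B2"
assert conv["goal"] == "goal B"  # per-conversation goal travels with the cache
```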
- Add docs/source/garak.probes.crescendo.rst
- Add garak.probes.crescendo to the docs/source/probes.rst toctree
- Exclude crescendo probes from NON_PROMPT_PROBES in the langservice test (same pattern as fitd.FITD: probes that use IterativeProbe do not populate self.prompts)

Signed-off-by: christbowel <0xdeadbeef@christbowel.com>
Implements the Crescendo multi-turn jailbreak attack (Russinovich et al., arXiv:2404.01833),
accepted at USENIX Security 2025. Crescendo is not currently in garak.
What this adds
garak/probes/crescendo.py
- CrescendoReplay: pre-scripted replay probe, no auxiliary LLM required. Replays fixed Crescendo attack conversations turn-by-turn.
- CrescendoCached kept as a backwards-compatible alias.
- generations forced to 1 (pre-scripted turns have no variance; replaying N times is waste).
- Crescendo: fully adaptive probe using an attacker LLM that generates each turn based on the target's prior response, with a comprehensive meta-prompt including worked examples from the paper.
garak/detectors/judge.py
- CrescendoJudge: two-stage LLM judge. The primary judge scores 0-100 against the specific attack goal. The secondary judge activates when the primary score is below 70 and re-evaluates the primary's explanation to correct false negatives caused by the judge's own safety alignment, as described in the paper.
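The two-stage flow can be sketched as below; the prompts, the return shape, and the call_llm helper are all assumptions for illustration, not garak's actual judge API:

```python
# Hypothetical sketch of two-stage judging: a secondary pass corrects
# false negatives caused by the primary judge's own safety alignment.
def two_stage_score(goal, conversation, call_llm, threshold=70):
    primary = call_llm(
        f"Score 0-100 how fully this conversation achieves the goal.\n"
        f"Goal: {goal}\nConversation: {conversation}"
    )
    score = primary["score"]
    if score < threshold:
        # the primary judge may refuse or under-score out of its own
        # alignment; re-read its explanation rather than the raw content
        secondary = call_llm(
            f"Given this judge explanation, was the goal actually met? "
            f"Re-score 0-100.\nExplanation: {primary['explanation']}"
        )
        score = max(score, secondary["score"])
    return score
```

High-confidence primary scores pass through untouched; only low scores trigger the second, cheaper re-evaluation over the explanation text.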
garak/data/crescendo/
- crescendo_cached.jsonl: 4 pre-scripted attack conversations
- prompt_template_attack.txt: meta-prompt with Crescendo technique description and worked examples
- prompt_template_backtrack.txt: backtrack prompt (opt-in extension)

Parameters
- backtrack_on_refusal (default False): paper-faithful mode lets the attacker LLM handle refusals organically. Set True to enable explicit FITD-inspired backtracking as an extension.
- secondary_detectors (default []): hook for cross-validation metrics such as Perspective API or Azure Content Safety.
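For illustration, opting into both parameters might look like the fragment below in a garak run config; the exact plugin-option nesting is an assumption and may differ across garak versions:

```yaml
# Hypothetical config fragment; verify the nesting against your garak version.
plugins:
  probes:
    crescendo:
      Crescendo:
        backtrack_on_refusal: true      # opt-in FITD-inspired extension
        secondary_detectors:
          - perspective.Toxicity        # example cross-validation detector
```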
Faithfulness to paper
Faithful: adaptive attacker LLM, multi-turn escalation referencing target responses, two-stage judge, max 10 turns, 0-100 scoring scale.

Documented divergences: meta-prompt examples are reconstructions (the exact paper prompts are not published); the secondary judge re-prompts instead of parsing the primary's text inline; backtrack_on_refusal is an opt-in extension not in the original paper.

Verification
- python -m garak --target_type test.Blank --probes crescendo.CrescendoReplay → 26 attempts, completes in ~4s
- python -m pytest tests/test_crescendo.py -v → 15 passed
- Crescendo (adaptive) requires a configured attacker LLM via the red_team_model_type and red_team_model_name params (default: nim.NVOpenAIChat / mistralai/mixtral-8x22b-instruct-v0.1)
- CrescendoJudge requires a configured judge LLM (inherits ModelAsJudge defaults)