feature: make jailbreak model-based rail fully usable in the open-source distribution #1739

@m-misiura

Did you check the docs?

  • I have read all the NeMo-Guardrails docs

Is your feature request related to a problem? Please describe.

The model-based jailbreak detection rail depends on a trained Random Forest classifier (snowflake.pkl) that is not included in the repository and does not seem to have public download or training instructions. This makes the rail effectively unavailable to open-source users.

Describe the solution you'd like

If this rail is to be kept as is, one of the following would help:

  • ship a pre-trained classifier in the repo, or provide a training script and dataset reference so the community can reproduce it
  • or explicitly document that the classifier is not included
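
To illustrate the "explicitly document" option, here is a minimal sketch of a path check that fails with an actionable message when the classifier file is absent. It is not the NeMo-Guardrails implementation: the function name `resolve_classifier_path`, the exception class, and the environment variable `JAILBREAK_CLASSIFIER_DIR` are hypothetical names chosen for this example; only the file name `snowflake.pkl` comes from the description above.

```python
import os
from pathlib import Path
from typing import Optional

# File name taken from the issue text; everything else here is hypothetical.
CLASSIFIER_FILENAME = "snowflake.pkl"


class ClassifierNotFoundError(FileNotFoundError):
    """Raised when the jailbreak classifier file cannot be located."""


def resolve_classifier_path(
    classifier_dir: Optional[str] = None,
    env_var: str = "JAILBREAK_CLASSIFIER_DIR",  # hypothetical variable name
) -> Path:
    """Return the path to the classifier file, or raise a descriptive error.

    The directory is taken from `classifier_dir` if given, otherwise from
    the environment variable named by `env_var`.
    """
    directory = classifier_dir or os.environ.get(env_var)
    if not directory:
        raise ClassifierNotFoundError(
            f"No classifier directory configured. Pass one explicitly or set "
            f"{env_var} to the directory containing {CLASSIFIER_FILENAME}."
        )

    path = Path(directory) / CLASSIFIER_FILENAME
    if not path.is_file():
        raise ClassifierNotFoundError(
            f"{CLASSIFIER_FILENAME} was not found in {directory}. The file is "
            f"not shipped with the repository, so the model-based jailbreak "
            f"rail cannot run until it is provided."
        )
    return path
```

A rail could call such a check at start-up so that users see a clear explanation rather than a bare unpickling or file error at request time.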

Describe alternatives you've considered

Re-architecting the rail so that it does not require a separate embedding model plus classifier could also be considered. I'd be happy to take this on, as it might align with the HF classifier rail that was suggested by the RH team.

cc @Pouyanpi

Additional context

No response

Labels: enhancement, status: needs triage