Fix: Minor logging uplift for debugging of prompt injection mitigation#7195
dorien-koelemeijer wants to merge 3 commits into main
Pull request overview
This PR updates the prompt-injection scanning internals to surface (via logging) whether pattern-based scanning was used as a fallback when ML-based command injection detection isn’t available or fails.
Changes:
- Add a `used_pattern_detection` flag to `DetailedScanResult` to track when pattern-based scanning was used.
- Switch the `tracing::info!` field `has_patterns` to report the fallback-path usage rather than presence of pattern matches.
- Propagate `used_pattern_detection` through intermediate scan results used to build the final explanation/logging.
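
The flag described in the changes above can be illustrated with a minimal sketch. Only the four field names come from the diff in this PR; the struct layout, `scan`, `scan_patterns`, the pattern list, and the placeholder scoring are hypothetical and are not the project's actual code.

```rust
/// Hypothetical, simplified stand-in for the PR's `DetailedScanResult`.
/// Only the field names are taken from the diff; the types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct DetailedScanResult {
    pub confidence: f32,
    pub pattern_matches: Vec<String>,
    pub ml_confidence: Option<f32>,
    /// True only when the pattern-based fallback path ran.
    pub used_pattern_detection: bool,
}

/// Hypothetical pattern scanner: returns the patterns found in `text`.
fn scan_patterns(text: &str) -> Vec<String> {
    const PATTERNS: [&str; 2] = ["rm -rf", "curl | sh"];
    PATTERNS
        .iter()
        .filter(|p| text.contains(**p))
        .map(|p| p.to_string())
        .collect()
}

/// `ml_result` is `None` when the ML classifier is unavailable or failed.
pub fn scan(text: &str, ml_result: Option<f32>) -> DetailedScanResult {
    match ml_result {
        // ML path: no pattern scan, so the flag stays false.
        Some(score) => DetailedScanResult {
            confidence: score,
            pattern_matches: Vec::new(),
            ml_confidence: Some(score),
            used_pattern_detection: false,
        },
        // Fallback path: the flag is set even if nothing matched.
        None => {
            let pattern_matches = scan_patterns(text);
            // Placeholder scoring; the real weighting is not shown in the source.
            let confidence = if pattern_matches.is_empty() { 0.0 } else { 1.0 };
            DetailedScanResult {
                confidence,
                pattern_matches,
                ml_confidence: None,
                used_pattern_detection: true,
            }
        }
    }
}
```

In this sketch, a fallback scan that finds nothing still returns `used_pattern_detection: true` with an empty `pattern_matches`, which is the distinction the renamed meaning of `has_patterns` is meant to surface in the logs.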
        confidence: max_confidence,
        pattern_matches: Vec::new(),
        ml_confidence: Some(max_confidence),
        used_pattern_detection: false,
    })
`scan_conversation` always sets `ml_confidence: Some(max_confidence)`, even if every classifier call failed (all `scan_with_classifier` results were `None`). Downstream logic then treats this as a real ML signal, and currently reduces `tool_confidence` by 10% when the value is 0.0. Track whether any classification succeeded (e.g., fold an `Option<f32>` or keep a success flag) and return `ml_confidence: None` when there were no successful results.
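
One way to realise the fold suggested in the comment above is sketched below. The helper name `max_ml_confidence` and its slice-based signature are assumptions for illustration; the real `scan_conversation` may collect its classifier results differently.

```rust
/// Returns the highest confidence among the successful classifier calls,
/// or `None` when every call failed (or there were no calls at all).
/// Hypothetical helper: each element models one `scan_with_classifier` result.
pub fn max_ml_confidence(results: &[Option<f32>]) -> Option<f32> {
    results
        .iter()
        .copied()
        .flatten() // drop the failed calls (`None`)
        .fold(None, |best: Option<f32>, score| {
            // The first success seeds the maximum; later ones may raise it.
            Some(best.map_or(score, |b| b.max(score)))
        })
}
```

With this shape, a genuine score of 0.0 comes back as `Some(0.0)`, while a run in which every call failed comes back as `None`, so downstream code can tell the two cases apart.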
    monotonic_counter.goose.security_command_classifier_enabled = if command_classifier_enabled { 1 } else { 0 },
    monotonic_counter.goose.security_prompt_classifier_enabled = if prompt_classifier_enabled { 1 } else { 0 },
Using `monotonic_counter.* = if enabled { 1 } else { 0 }` is likely to produce confusing metrics: a counter with value 0 is typically a no-op, and the name suggests a gauge. Consider logging the booleans as normal fields (e.g. `command_classifier_enabled = ...`) and emitting a separate `monotonic_counter` metric with value 1 (or separate enabled/disabled counters) so disabled configurations are still observable.
Suggested change:

    - monotonic_counter.goose.security_command_classifier_enabled = if command_classifier_enabled { 1 } else { 0 },
    - monotonic_counter.goose.security_prompt_classifier_enabled = if prompt_classifier_enabled { 1 } else { 0 },
    + monotonic_counter.goose.security_classifier_configuration_logged = 1,
    + security_command_classifier_enabled = command_classifier_enabled,
    + security_prompt_classifier_enabled = prompt_classifier_enabled,
Summary
Type of Change
AI Assistance
Testing
Local/manual testing.