Fix: Minor logging uplift for debugging of prompt injection mitigation #7195

Open
dorien-koelemeijer wants to merge 3 commits into main from fix/pattern-based-fallback

dorien-koelemeijer (Collaborator) commented Feb 13, 2026

Summary

  • Minor logging uplift so we can validate that scanning falls back to pattern-based detection when the prompt injection mitigation feature is enabled but ML-based command injection detection is unavailable or misconfigured.
  • Datadog metrics uplift

Type of Change

  • Feature
  • Bug fix
  • Refactor / Code quality
  • Performance improvement
  • Documentation
  • Tests
  • Security fix
  • Build / Release
  • Other (specify below)

AI Assistance

  • This PR was created or reviewed with AI assistance

Testing

Local/manual testing.

Copilot AI review requested due to automatic review settings February 13, 2026 00:55
Copilot AI (Contributor) left a comment

Pull request overview

This PR updates the prompt-injection scanning internals to surface (via logging) whether pattern-based scanning was used as a fallback when ML-based command injection detection isn’t available or fails.

Changes:

  • Add a used_pattern_detection flag to DetailedScanResult to track when pattern-based scanning was used.
  • Switch the tracing::info! field has_patterns to report the fallback-path usage rather than presence of pattern matches.
  • Propagate used_pattern_detection through intermediate scan results used to build the final explanation/logging.
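
The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the field names come from the diff quoted in this review, but the field types, the `scan` and `scan_patterns` functions, the pattern list, and the 0.9 fallback confidence are all assumptions made for the example.

```rust
/// Hypothetical mirror of the result struct. Field names follow the PR diff;
/// the field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct DetailedScanResult {
    pub confidence: f32,
    pub pattern_matches: Vec<String>,
    pub ml_confidence: Option<f32>,
    pub used_pattern_detection: bool,
}

/// Hypothetical stand-in for pattern-based scanning: returns the known
/// patterns that occur in `text`.
fn scan_patterns(text: &str) -> Vec<String> {
    const PATTERNS: [&str; 2] = ["rm -rf", "curl | sh"];
    PATTERNS
        .iter()
        .filter(|p| text.contains(**p))
        .map(|p| p.to_string())
        .collect()
}

/// `ml_score` is `None` when the ML classifier is unavailable or failed.
pub fn scan(text: &str, ml_score: Option<f32>) -> DetailedScanResult {
    match ml_score {
        // ML path: no pattern scan was run.
        Some(score) => DetailedScanResult {
            confidence: score,
            pattern_matches: Vec::new(),
            ml_confidence: Some(score),
            used_pattern_detection: false,
        },
        // Fallback path: the flag is set even when nothing matched.
        None => {
            let pattern_matches = scan_patterns(text);
            // Illustrative confidence values only (assumption).
            let confidence = if pattern_matches.is_empty() { 0.0 } else { 0.9 };
            DetailedScanResult {
                confidence,
                pattern_matches,
                ml_confidence: None,
                used_pattern_detection: true,
            }
        }
    }
}
```

In this sketch, `scan("ls", None)` returns `used_pattern_detection: true` with an empty `pattern_matches`, which is the case a "patterns were found" field cannot distinguish from the ML path.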

Copilot AI review requested due to automatic review settings February 17, 2026 01:00
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment on lines 243 to 247
confidence: max_confidence,
pattern_matches: Vec::new(),
ml_confidence: Some(max_confidence),
used_pattern_detection: false,
})
Copilot AI Feb 17, 2026

scan_conversation always sets ml_confidence: Some(max_confidence), even if every classifier call failed (all scan_with_classifier results were None). Downstream logic then treats this as a real ML signal (and currently reduces tool_confidence by 10% when the value is 0.0). Track whether any classification succeeded (e.g., fold an Option<f32> or keep a success flag) and return ml_confidence: None when there were no successful results.
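
One way to fold an Option<f32> as suggested here might look like the sketch below. The helper name `max_ml_confidence` and the slice-of-options input are hypothetical; the real `scan_conversation` may collect classifier results differently.

```rust
/// Returns the highest confidence among the successful classifier calls,
/// or `None` if every call failed (or there were no calls at all).
/// Hypothetical helper: each element stands for one `scan_with_classifier` result.
pub fn max_ml_confidence(results: &[Option<f32>]) -> Option<f32> {
    results
        .iter()
        .copied()
        // Drop the failed calls (`None`), keeping only real scores.
        .flatten()
        // Start from `None` so "no successes" stays distinguishable from 0.0.
        .fold(None, |best: Option<f32>, c: f32| {
            Some(best.map_or(c, |b| b.max(c)))
        })
}
```

With this shape, a genuine score of 0.0 comes back as `Some(0.0)`, while an all-failed run comes back as `None`, so downstream code can skip the ML adjustment in the second case.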

Comment on lines +80 to +81
monotonic_counter.goose.security_command_classifier_enabled = if command_classifier_enabled { 1 } else { 0 },
monotonic_counter.goose.security_prompt_classifier_enabled = if prompt_classifier_enabled { 1 } else { 0 },
Copilot AI Feb 17, 2026

Using monotonic_counter.* = if enabled { 1 } else { 0 } is likely to produce confusing metrics: a counter incremented by 0 is typically a no-op, and the name suggests a gauge. Consider logging the booleans as normal fields (e.g. command_classifier_enabled = ...) and emitting a separate monotonic_counter metric with value 1 (or separate enabled/disabled counters) so disabled configurations are still observable.

Suggested change
- monotonic_counter.goose.security_command_classifier_enabled = if command_classifier_enabled { 1 } else { 0 },
- monotonic_counter.goose.security_prompt_classifier_enabled = if prompt_classifier_enabled { 1 } else { 0 },
+ monotonic_counter.goose.security_classifier_configuration_logged = 1,
+ security_command_classifier_enabled = command_classifier_enabled,
+ security_prompt_classifier_enabled = prompt_classifier_enabled,
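
The concern about counters can be illustrated with a toy model. Everything here is hypothetical: `Counters` is not the `tracing` or Datadog API, and it simply assumes, as the comment does, that adding 0 to a counter records nothing. The second function shows the "separate enabled/disabled counters" alternative mentioned above.

```rust
use std::collections::HashMap;

/// Toy monotonic-counter registry (hypothetical, for illustration only).
#[derive(Default)]
pub struct Counters {
    totals: HashMap<&'static str, u64>,
}

impl Counters {
    /// Adds `delta` to the named counter.
    /// Assumption: an increment of 0 is a no-op and creates no series.
    pub fn add(&mut self, name: &'static str, delta: u64) {
        if delta == 0 {
            return;
        }
        *self.totals.entry(name).or_insert(0) += delta;
    }

    /// Returns the counter total, or `None` if it was never incremented.
    pub fn get(&self, name: &str) -> Option<u64> {
        self.totals.get(name).copied()
    }
}

/// Encodes the boolean as the counter value: disabled configs leave no trace.
pub fn record_as_value(c: &mut Counters, enabled: bool) {
    c.add("classifier_enabled", if enabled { 1 } else { 0 });
}

/// Uses one counter per state: disabled configs are counted too.
pub fn record_split(c: &mut Counters, enabled: bool) {
    let name = if enabled { "classifier_enabled" } else { "classifier_disabled" };
    c.add(name, 1);
}
```

Under that assumption, calling `record_as_value` with `false` leaves the registry empty, while `record_split` with `false` increments `classifier_disabled`, so the disabled configuration stays observable.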

