feat: Add secrets detection and prevention to protect credentials #116

@mensfeld

Overview

Add automatic secrets detection and prevention to stop AI agents from accidentally exposing API keys, credentials, tokens, and other sensitive data in workspace files. Scan for secrets before commits, alert on detection, and optionally block operations that would expose credentials.

Motivation

Current Problem:

  • AI agents might accidentally hardcode API keys in source code
  • Credentials could be written to .env files that get committed
  • SSH private keys could be copied to workspace
  • Database connection strings with embedded passwords could be exposed
  • No warning before pushing sensitive data

With Secrets Detection:

coi shell --block-secrets

# AI tries to write:
# API_KEY = "sk-ant-api03-..."

⚠️  BLOCKED: Potential secret detected in src/config.py
  Type: API Key (Anthropic)
  Line: 15: API_KEY = "sk-ant-api03-..."
  
  This appears to be a sensitive credential.
  Use environment variables instead: os.getenv('API_KEY')

Use Cases

1. Pre-Commit Protection

# Scan workspace before committing
coi secrets scan

# Output:
⚠️  Found 3 potential secrets in workspace:

src/config.py:15
  Type: API Key
  Pattern: sk-[a-zA-Z0-9]{48}
  Line: API_KEY = "sk-abc123..."

.env.example:5
  Type: AWS Access Key
  Pattern: AKIA[0-9A-Z]{16}
  Line: AWS_ACCESS_KEY=AKIAI...

database.yml:12
  Type: Database Password
  Line: password: "super_secret_123"

Run: coi secrets clean

2. Real-Time Protection

# Block AI from writing secrets
coi shell --block-secrets

# AI can still work, but:
# - Can't write files with secrets
# - Can't commit files with secrets
# - Gets warning to use env vars instead

3. Historical Scanning

# Scan past sessions for exposed secrets
coi secrets scan --session session-abc123

# Scan all sessions for a project
coi secrets scan --project backend-api --all-sessions

# Generate audit report
coi secrets audit --project backend-api > secrets-audit.json

4. Cleanup & Remediation

# Find and remove secrets
coi secrets clean

# Shows each secret and asks:
# Remove from file? [y/N]
# Replace with env var? [Y/n]
# Add to .gitignore? [Y/n]

# Automatic cleanup (dangerous)
coi secrets clean --auto-fix

Proposed Implementation

Detection Strategies

1. Pattern-Based Detection

var secretPatterns = []SecretPattern{
    {
        Name:     "Anthropic API Key",
        Pattern:  regexp.MustCompile(`sk-ant-[a-zA-Z0-9-]{95}`),
        Severity: "high",
    },
    {
        Name:     "OpenAI API Key",
        Pattern:  regexp.MustCompile(`sk-[a-zA-Z0-9]{48}`),
        Severity: "high",
    },
    {
        Name:     "AWS Access Key",
        Pattern:  regexp.MustCompile(`AKIA[0-9A-Z]{16}`),
        Severity: "high",
    },
    {
        Name:     "GitHub Token",
        Pattern:  regexp.MustCompile(`ghp_[a-zA-Z0-9]{36}`),
        Severity: "high",
    },
    {
        Name:     "Stripe API Key",
        Pattern:  regexp.MustCompile(`sk_live_[a-zA-Z0-9]{24}`),
        Severity: "high",
    },
    {
        // OPENSSH covers keys written by current versions of ssh-keygen.
        Name:     "Private Key",
        Pattern:  regexp.MustCompile(`-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----`),
        Severity: "critical",
    },
    {
        Name:     "Generic Secret",
        Pattern:  regexp.MustCompile(`(?i)(secret|password|token|api_?key)\s*[:=]\s*["']([^"']{8,})["']`),
        Severity: "medium",
    },
}
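
To make the pattern list concrete, here is a minimal sketch of how it might be applied line by line to produce the `file:line` findings shown in the scan output. The `Finding` type, the `scanText` function, and the two-entry `examplePatterns` slice are hypothetical names introduced for illustration; only the regular expressions come from the list above.

```go
package main

import (
	"regexp"
	"strings"
)

// SecretPattern mirrors the fields used in the pattern list above.
type SecretPattern struct {
	Name     string
	Pattern  *regexp.Regexp
	Severity string
}

// Finding is a hypothetical result type: one pattern match on one line.
type Finding struct {
	File     string
	Line     int
	Type     string
	Severity string
}

// examplePatterns is a shortened copy of the proposed list.
var examplePatterns = []SecretPattern{
	{"AWS Access Key", regexp.MustCompile(`AKIA[0-9A-Z]{16}`), "high"},
	{"Generic Secret", regexp.MustCompile(`(?i)(secret|password|token|api_?key)\s*[:=]\s*["']([^"']{8,})["']`), "medium"},
}

// scanText checks every line of text against every pattern and records
// the 1-based line number of each match.
func scanText(file, text string, patterns []SecretPattern) []Finding {
	var findings []Finding
	for i, line := range strings.Split(text, "\n") {
		for _, p := range patterns {
			if p.Pattern.MatchString(line) {
				findings = append(findings, Finding{
					File: file, Line: i + 1, Type: p.Name, Severity: p.Severity,
				})
			}
		}
	}
	return findings
}
```

A line that matches several patterns yields one finding per pattern, so a real implementation would probably want to deduplicate or keep only the most specific match.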

2. Entropy-Based Detection

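func hasHighEntropy(value string) bool {
    // High-entropy strings are likely secrets. Entropy is bounded by
    // log2(distinct characters), so exceeding 4.5 bits per character
    // requires at least 23 distinct characters in the value.
    return len(value) > 16 && calculateEntropy(value) > 4.5
}

// calculateEntropy returns the Shannon entropy of s in bits per character.
func calculateEntropy(s string) float64 {
    freq := make(map[rune]float64)
    var length float64
    for _, c := range s {
        freq[c]++
        length++ // count runes, not bytes, so the probabilities sum to 1
    }

    var entropy float64
    for _, count := range freq {
        p := count / length
        entropy -= p * math.Log2(p)
    }

    return entropy
}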
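
The entropy check needs candidate strings to run on. One possible approach, sketched below under the assumption that only quoted literals are worth testing, is to extract them with a regular expression and keep those above the threshold. `highEntropyLiterals` and the `quoted` pattern are hypothetical; `shannonEntropy` is a self-contained copy of the calculation so the block compiles on its own.

```go
package main

import (
	"math"
	"regexp"
)

// quoted matches single- or double-quoted literals of at least 17
// characters, mirroring the len(value) > 16 condition above.
var quoted = regexp.MustCompile(`["']([^"']{17,})["']`)

// shannonEntropy returns the entropy of s in bits per character.
func shannonEntropy(s string) float64 {
	freq := make(map[rune]int)
	n := 0
	for _, c := range s {
		freq[c]++
		n++
	}
	var h float64
	for _, count := range freq {
		p := float64(count) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

// highEntropyLiterals returns the quoted literals in line whose entropy
// exceeds threshold.
func highEntropyLiterals(line string, threshold float64) []string {
	var out []string
	for _, m := range quoted.FindAllStringSubmatch(line, -1) {
		if shannonEntropy(m[1]) > threshold {
			out = append(out, m[1])
		}
	}
	return out
}
```

Repetitive literals such as long runs of one character score near zero and are ignored, while random-looking tokens are returned for further checks.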

3. Integration with Existing Tools

# Use gitleaks
gitleaks detect --source /workspace --no-git

# Use trufflehog
trufflehog filesystem /workspace

# Use detect-secrets
detect-secrets scan /workspace
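
As a rough sketch of how `coi` might wrap one of these tools, the code below shells out to gitleaks and parses its JSON report. It assumes gitleaks v8 behavior: the `--report-format` and `--report-path` flags, exit code 1 when leaks are found, and a report that is a JSON array with `RuleID`, `Description`, `File`, and `StartLine` fields. The function and type names are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"os"
	"os/exec"
)

// gitleaksFinding holds the subset of report fields this sketch reads
// (assumed from the gitleaks v8 JSON report format).
type gitleaksFinding struct {
	RuleID      string `json:"RuleID"`
	Description string `json:"Description"`
	File        string `json:"File"`
	StartLine   int    `json:"StartLine"`
}

// parseGitleaksReport decodes a JSON report; an empty report means no findings.
func parseGitleaksReport(data []byte) ([]gitleaksFinding, error) {
	if len(data) == 0 {
		return nil, nil
	}
	var findings []gitleaksFinding
	if err := json.Unmarshal(data, &findings); err != nil {
		return nil, err
	}
	return findings, nil
}

// runGitleaks scans workspace without git history and returns the findings.
func runGitleaks(workspace string) ([]gitleaksFinding, error) {
	tmp, err := os.CreateTemp("", "gitleaks-*.json")
	if err != nil {
		return nil, err
	}
	tmp.Close()
	defer os.Remove(tmp.Name())

	cmd := exec.Command("gitleaks", "detect", "--source", workspace, "--no-git",
		"--report-format", "json", "--report-path", tmp.Name())
	if err := cmd.Run(); err != nil {
		// Assumption: exit code 1 means "leaks found", anything else is a failure.
		var exitErr *exec.ExitError
		if !errors.As(err, &exitErr) || exitErr.ExitCode() != 1 {
			return nil, err
		}
	}
	data, err := os.ReadFile(tmp.Name())
	if err != nil {
		return nil, err
	}
	return parseGitleaksReport(data)
}
```

Keeping the parser separate from the process call makes it possible to test the mapping into `coi` findings without gitleaks installed.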

File Monitoring

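Monitor workspace file writes in real time:

func monitorWorkspaceWrites(container string) error {
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        return err
    }
    defer watcher.Close()

    // Add watches a single directory, not its subdirectories.
    if err := watcher.Add(getWorkspacePath(container)); err != nil {
        return err
    }

    for {
        select {
        case event, ok := <-watcher.Events:
            if !ok {
                return nil
            }
            // Events arrive after the write has happened, so this can
            // alert or clean up but cannot prevent the write itself.
            if event.Op&fsnotify.Write == fsnotify.Write && hasSecrets(event.Name) {
                alertUser(event.Name)
                // Optionally quarantine or remove the file
            }
        case err := <-watcher.Errors:
            return err
        }
    }
}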

Command-Line Interface

Scanning

# Scan workspace
coi secrets scan

# Scan specific files
coi secrets scan src/config.py .env

# Scan with specific tools
coi secrets scan --tool gitleaks
coi secrets scan --tool trufflehog
coi secrets scan --tool detect-secrets

# Output formats
coi secrets scan --format table
coi secrets scan --format json
coi secrets scan --format sarif  # GitHub compatible

# Severity filtering
coi secrets scan --severity high
coi secrets scan --severity critical
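
The severity filter could be sketched as follows. This assumes `--severity high` means "high and above", which matches the `severity_threshold` comment in the configuration section and the pre-commit hook usage; the `Finding` type and `filterBySeverity` function are hypothetical.

```go
package main

import "fmt"

// Finding is a hypothetical result type used only for this sketch.
type Finding struct {
	File     string
	Line     int
	Type     string
	Severity string
}

// severityRank orders the three levels that appear in this proposal.
var severityRank = map[string]int{"medium": 1, "high": 2, "critical": 3}

// filterBySeverity keeps findings whose severity is at or above minimum.
// Findings with an unrecognized severity rank as 0 and are dropped.
func filterBySeverity(findings []Finding, minimum string) ([]Finding, error) {
	threshold, ok := severityRank[minimum]
	if !ok {
		return nil, fmt.Errorf("unknown severity %q", minimum)
	}
	var kept []Finding
	for _, f := range findings {
		if severityRank[f.Severity] >= threshold {
			kept = append(kept, f)
		}
	}
	return kept, nil
}
```

A scan command built on this would exit non-zero when the filtered list is non-empty, which is what the `|| exit 1` in the generated hook relies on.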

Prevention

# Enable real-time protection
coi shell --block-secrets

# Different modes
coi shell --warn-secrets      # Warn but don't block
coi shell --block-secrets     # Block file writes with secrets
coi shell --audit-secrets     # Log all secrets to audit trail

# Per-session scanning
coi secrets scan --session session-abc123

# Historical audit
coi secrets audit --all-sessions
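
The three modes differ only in what happens once a secret is found. A minimal sketch of that decision follows, assuming `--audit-secrets` allows the write and only records it (the proposal does not say either way). The `Mode`, `Decision`, and `decideWrite` names are hypothetical, and a single AWS pattern stands in for the full detector.

```go
package main

import "regexp"

// Mode is the protection mode selected on the coi shell command line.
type Mode int

const (
	ModeWarn  Mode = iota // --warn-secrets
	ModeBlock             // --block-secrets
	ModeAudit             // --audit-secrets
)

// Decision describes what to do with a pending file write.
type Decision struct {
	Allow bool // let the write proceed
	Warn  bool // show a warning to the user and the agent
	Log   bool // record the event in the audit trail
}

// awsKey stands in for the full pattern list in this sketch.
var awsKey = regexp.MustCompile(`AKIA[0-9A-Z]{16}`)

// decideWrite inspects content before it is written and applies the mode.
func decideWrite(mode Mode, content string) Decision {
	if !awsKey.MatchString(content) {
		return Decision{Allow: true}
	}
	switch mode {
	case ModeBlock:
		return Decision{Allow: false, Warn: true}
	case ModeAudit:
		return Decision{Allow: true, Log: true}
	default: // ModeWarn
		return Decision{Allow: true, Warn: true}
	}
}
```

Blocking only works if the content is checked before the write reaches disk, which is why this sketch takes the pending content as input rather than a file path.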

Cleanup

# Interactive cleanup
coi secrets clean

# Auto-replace with env vars
coi secrets clean --auto-fix

# Remove detected secrets
coi secrets clean --remove

# Preview changes
coi secrets clean --dry-run

Configuration

# Configure detection rules
coi secrets config --add-pattern "custom_token:[a-z0-9]{32}"
coi secrets config --ignore-file test_fixtures.py
coi secrets config --ignore-pattern "EXAMPLE_.*"

# Whitelist known false positives
coi secrets whitelist add "sk-test-123"  # Test API key
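
One way the ignore rules and whitelist might be evaluated is sketched below. It assumes ignore patterns are unanchored regular expressions, ignored files are shell globs matched against the full path or the base name, and whitelist entries are exact values. Note that Go's `path.Match` does not understand `**`, so recursive globs would need a separate matcher. `Suppressions` and `newSuppressions` are hypothetical names.

```go
package main

import (
	"path"
	"regexp"
)

// Suppressions holds the rules that mark a finding as a known false positive.
type Suppressions struct {
	ignorePatterns []*regexp.Regexp
	ignoreFiles    []string
	allowlist      map[string]bool
}

// newSuppressions compiles the configured rules and fails on an invalid regex.
func newSuppressions(patterns, files, values []string) (Suppressions, error) {
	s := Suppressions{ignoreFiles: files, allowlist: make(map[string]bool)}
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return Suppressions{}, err
		}
		s.ignorePatterns = append(s.ignorePatterns, re)
	}
	for _, v := range values {
		s.allowlist[v] = true
	}
	return s, nil
}

// Suppressed reports whether a matched value in file should be skipped.
func (s Suppressions) Suppressed(file, value string) bool {
	if s.allowlist[value] {
		return true
	}
	for _, re := range s.ignorePatterns {
		if re.MatchString(value) {
			return true
		}
	}
	for _, glob := range s.ignoreFiles {
		if ok, _ := path.Match(glob, file); ok {
			return true
		}
		if ok, _ := path.Match(glob, path.Base(file)); ok {
			return true
		}
	}
	return false
}
```

Matching whitelist entries exactly, rather than by substring, keeps a short test value from accidentally suppressing a longer real key that happens to contain it.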

Example Output

Scan Results

Secrets Scan Results
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

CRITICAL (1)
  src/keys.py:23
    Private Key: -----BEGIN RSA PRIVATE KEY-----
    Risk: Critical - Never commit private keys to source control
    
HIGH (3)
  src/config.py:15
    Anthropic API Key: sk-ant-api03-...
    Risk: High - API key with full account access
    
  .env:7
    AWS Access Key: AKIAIOSFODNN7EXAMPLE
    Risk: High - AWS credentials with potential broad access
    
  database.yml:12
    Database Password: password: "MySuperSecret123"
    Risk: High - Database credentials in plaintext

MEDIUM (1)
  test/fixtures.py:45
    Generic Secret: api_key = "test_key_12345678"
    Risk: Medium - May be test data (review manually)

SUMMARY
  Total files scanned: 142
  Secrets found: 5 (1 critical, 3 high, 1 medium)
  Files affected: 5

RECOMMENDATIONS
  1. Move all secrets to environment variables
  2. Add .env to .gitignore
  3. Rotate exposed API keys immediately
  4. Use secret management (e.g., 1Password, AWS Secrets Manager)
  
Run: coi secrets clean

Real-Time Block

❌ BLOCKED: Secret detected

File: src/config.py
Line: 15
Type: Anthropic API Key
Pattern: sk-ant-api03-xxxxxxxxxxxxx

AI attempted to write:
  API_KEY = "sk-ant-api03-xxxxxxxxx..."

This appears to be a sensitive credential.

RECOMMENDED FIXES:

1. Use environment variable:
   import os
   API_KEY = os.getenv('ANTHROPIC_API_KEY')

2. Use configuration file (not committed):
   # In config.py
   from config_local import API_KEY  # Add config_local.py to .gitignore

3. Use secrets manager:
   from secretsmanager import get_secret
   API_KEY = get_secret('anthropic_api_key')

The file was NOT written. Please fix and try again.
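
Because the block message itself is shown in the terminal and may be logged, the matched value should probably be truncated before printing, as the `sk-ant-api03-xxxxxxxxx...` line above suggests. A small sketch of such a helper follows; `redact` and `redactLine` are hypothetical names.

```go
package main

import "strings"

// redact keeps the first keep characters of secret and replaces the rest
// with "...". If keep would reveal the whole value, nothing is kept.
func redact(secret string, keep int) string {
	r := []rune(secret)
	if keep < 0 || keep >= len(r) {
		return "..."
	}
	return string(r[:keep]) + "..."
}

// redactLine replaces every occurrence of secret in line with its
// redacted form, so the surrounding code stays readable in the report.
func redactLine(line, secret string, keep int) string {
	if secret == "" {
		return line
	}
	return strings.ReplaceAll(line, secret, redact(secret, keep))
}
```

Keeping a short prefix is usually enough for the user to recognize which key was found without exposing a usable credential.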

Interactive Cleanup

Secret Cleanup Wizard
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Found: Anthropic API Key in src/config.py:15
  API_KEY = "sk-ant-api03-..."

Options:
  [1] Replace with environment variable (recommended)
  [2] Remove line entirely
  [3] Comment out with warning
  [4] Skip (keep as-is)
  [5] Whitelist (mark as false positive)

Choice: 1

✓ Replaced with: API_KEY = os.getenv('ANTHROPIC_API_KEY')
✓ Added import: import os
✓ Created .env.example with: ANTHROPIC_API_KEY=your_key_here

Next: Found AWS Access Key in .env:7
...
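
Option [1] in the wizard might be implemented roughly as below for the simple case of a Python assignment whose right-hand side is a single quoted literal. This is a sketch under that assumption: `replaceWithGetenv` and `pyAssign` are hypothetical, and anything more complex (f-strings, dictionaries, multi-line values) would be left for manual review.

```go
package main

import "regexp"

// pyAssign matches `NAME = "literal"` or `NAME = 'literal'` and captures
// everything up to and including the equals sign and its whitespace.
var pyAssign = regexp.MustCompile(`^(\s*[A-Za-z_]\w*\s*=\s*)["'][^"']*["']\s*$`)

// replaceWithGetenv rewrites a matching line to read the value from the
// environment. It returns the original line and false if the line does
// not have the simple shape this sketch understands.
func replaceWithGetenv(line, envVar string) (string, bool) {
	m := pyAssign.FindStringSubmatch(line)
	if m == nil {
		return line, false
	}
	return m[1] + "os.getenv('" + envVar + "')", true
}
```

The caller would still need to add `import os` when it is missing and write the placeholder entry to `.env.example`, as the wizard output shows.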

Implementation Phases

Phase 1: Basic Scanning (MVP)

  • Pattern-based secret detection
  • Common secret patterns (API keys, tokens, passwords)
  • coi secrets scan command
  • Integration with gitleaks or trufflehog
  • Text output with findings

Phase 2: Real-Time Protection

  • File write monitoring in containers
  • --block-secrets flag for coi shell
  • Real-time alerts when secrets detected
  • Block file writes with secrets
  • Suggest fixes (use env vars)

Phase 3: Cleanup & Remediation

  • coi secrets clean command
  • Interactive cleanup wizard
  • Auto-replace with env vars
  • Generate .env.example files
  • Whitelist management

Phase 4: Advanced Detection

  • Entropy-based detection
  • Machine learning models for secret detection
  • Context-aware detection (reduce false positives)
  • Custom pattern support
  • Integration with multiple scanning tools

Phase 5: Integration & Reporting

  • GitHub Actions integration (SARIF output)
  • Pre-commit hook generation
  • Audit trail of detected secrets
  • Rotation recommendations
  • Secret manager integration suggestions
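
For the SARIF output mentioned in Phase 5, a minimal SARIF 2.1.0 log could be built as sketched below. The mapping of critical and high to `error` and of medium to `warning` is an assumption, as are the `Finding` type, the driver name `coi-secrets`, and the function names. The message deliberately contains only the secret type, never the value.

```go
package main

import "encoding/json"

// Finding is a hypothetical result type used only for this sketch.
type Finding struct {
	File     string
	Line     int
	Type     string
	Severity string
}

type sarifLog struct {
	Schema  string     `json:"$schema"`
	Version string     `json:"version"`
	Runs    []sarifRun `json:"runs"`
}

type sarifRun struct {
	Tool    sarifTool     `json:"tool"`
	Results []sarifResult `json:"results"`
}

type sarifTool struct {
	Driver sarifDriver `json:"driver"`
}

type sarifDriver struct {
	Name string `json:"name"`
}

type sarifResult struct {
	RuleID    string          `json:"ruleId"`
	Level     string          `json:"level"`
	Message   sarifMessage    `json:"message"`
	Locations []sarifLocation `json:"locations"`
}

type sarifMessage struct {
	Text string `json:"text"`
}

type sarifLocation struct {
	PhysicalLocation sarifPhysical `json:"physicalLocation"`
}

type sarifPhysical struct {
	ArtifactLocation sarifArtifact `json:"artifactLocation"`
	Region           sarifRegion   `json:"region"`
}

type sarifArtifact struct {
	URI string `json:"uri"`
}

type sarifRegion struct {
	StartLine int `json:"startLine"`
}

// buildSARIF converts findings into a single-run SARIF log.
func buildSARIF(findings []Finding) sarifLog {
	// Non-nil slice so an empty scan encodes as [] rather than null.
	results := make([]sarifResult, 0, len(findings))
	for _, f := range findings {
		level := "warning"
		if f.Severity == "critical" || f.Severity == "high" {
			level = "error"
		}
		results = append(results, sarifResult{
			RuleID:  f.Type,
			Level:   level,
			Message: sarifMessage{Text: "Potential secret: " + f.Type},
			Locations: []sarifLocation{{PhysicalLocation: sarifPhysical{
				ArtifactLocation: sarifArtifact{URI: f.File},
				Region:           sarifRegion{StartLine: f.Line},
			}}},
		})
	}
	return sarifLog{
		Schema:  "https://json.schemastore.org/sarif-2.1.0.json",
		Version: "2.1.0",
		Runs:    []sarifRun{{Tool: sarifTool{Driver: sarifDriver{Name: "coi-secrets"}}, Results: results}},
	}
}

// encodeSARIF renders the log as indented JSON.
func encodeSARIF(log sarifLog) ([]byte, error) {
	return json.MarshalIndent(log, "", "  ")
}
```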

Configuration

# ~/.config/coi/config.toml

[secrets]
enabled = true
block_by_default = false  # Warn by default, don't block

[secrets.scan]
tools = ["gitleaks", "trufflehog"]  # Tools to use for scanning
severity_threshold = "medium"        # minimum severity to report

[secrets.patterns]
# Custom patterns
custom = [
  { name = "Company Token", pattern = "COMP_[A-Z0-9]{32}", severity = "high" }
]

[secrets.ignore]
# Ignore patterns (false positives)
patterns = [
  "EXAMPLE_.*",
  "TEST_KEY_.*",
]

# Ignore files
files = [
  "test/fixtures/*.py",
  "**/*_test.go",
]

[secrets.whitelist]
# Known safe values
values = [
  "sk-test-12345",  # Test API key
]

[secrets.auto_fix]
replace_with_env_vars = true
create_env_example = true
add_to_gitignore = true

Integration with Git Hooks

Generate pre-commit hook:

coi secrets install-hook

# Creates .git/hooks/pre-commit:
#!/bin/sh
coi secrets scan --severity high || exit 1
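
A sketch of what `coi secrets install-hook` might do internally is shown below. It assumes the hook should be created executable and that an existing `pre-commit` hook must not be overwritten silently; `hookScript` and `installHook` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// hookScript returns the hook body shown above for the given threshold.
func hookScript(minSeverity string) string {
	return fmt.Sprintf("#!/bin/sh\ncoi secrets scan --severity %s || exit 1\n", minSeverity)
}

// installHook writes .git/hooks/pre-commit under repoRoot. O_EXCL makes
// the call fail instead of replacing a hook the user already has.
func installHook(repoRoot, minSeverity string) error {
	hookPath := filepath.Join(repoRoot, ".git", "hooks", "pre-commit")
	f, err := os.OpenFile(hookPath, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o755)
	if err != nil {
		return fmt.Errorf("install pre-commit hook: %w", err)
	}
	defer f.Close()
	_, err = f.WriteString(hookScript(minSeverity))
	return err
}
```

When a hook already exists, the command could print the one-line snippet for the user to append by hand rather than editing their script.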

Or integrate with existing pre-commit framework:

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: coi-secrets-check
        name: COI Secrets Detection
        entry: coi secrets scan
        language: system
        pass_filenames: false

Secret Types Detected

API Keys

  • Anthropic (Claude)
  • OpenAI (ChatGPT)
  • Google Cloud
  • AWS
  • Azure
  • Stripe
  • Twilio
  • SendGrid
  • GitHub
  • GitLab

Credentials

  • Database connection strings
  • JDBC URLs with passwords
  • Redis connection strings
  • SMTP credentials

Keys

  • SSH private keys
  • PGP private keys
  • TLS/SSL certificate private keys
  • JWT tokens
  • Session tokens
  • OAuth tokens

Cloud Provider Secrets

  • AWS access keys
  • GCP service account keys
  • Azure connection strings
  • DigitalOcean tokens
  • Heroku API keys

Generic Patterns

  • High-entropy strings
  • Password fields
  • Secret/token fields
  • Base64-encoded credentials

Benefits

Security:

  • Prevent accidental credential exposure
  • Reduce attack surface
  • Comply with security policies
  • Protect production systems

Education:

  • Teach AI agents best practices
  • Show proper secret management
  • Guide to environment variables
  • Prevent bad habits

Compliance:

  • Meet security audit requirements
  • SOC2/ISO27001 compliance
  • Prevent data breaches
  • Maintain audit trails

Cost:

  • Prevent compromised API keys
  • Avoid key rotation costs
  • Prevent unauthorized usage
  • Reduce security incidents

Technical Considerations

Performance

Scanning strategies:

  • Incremental scan (only changed files)
  • Background scanning (don't block AI)
  • Cached results (avoid re-scanning)
  • Parallel scanning
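
The incremental and cached strategies can be combined with a content-hash cache, sketched below: a file is rescanned only when its SHA-256 digest differs from the one seen last time. `scanCache` and `needsScan` are hypothetical names, and the sketch keeps the cache in memory only.

```go
package main

import "crypto/sha256"

// scanCache remembers the digest of each file as of its last scan.
type scanCache struct {
	seen map[string][sha256.Size]byte
}

func newScanCache() *scanCache {
	return &scanCache{seen: make(map[string][sha256.Size]byte)}
}

// needsScan reports whether content differs from what was last recorded
// for path, and records the new digest when it does.
func (c *scanCache) needsScan(path string, content []byte) bool {
	sum := sha256.Sum256(content)
	if prev, ok := c.seen[path]; ok && prev == sum {
		return false
	}
	c.seen[path] = sum
	return true
}
```

The cache would need to be invalidated whenever the pattern set changes, since a file that was clean under the old rules may match a newly added pattern.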

False Positives

Reduce with:

  • Context awareness (test files, examples)
  • Entropy analysis (high randomness = likely secret)
  • Whitelist management
  • Pattern refinement

False Negatives

Improve detection:

  • Multiple scanning tools
  • Custom patterns for company-specific secrets
  • Regular expression updates
  • Community-contributed patterns

Related Issues

Integration with Secret Managers

Suggest integration with:

# After detecting secrets
⚠️  Secrets detected. Consider using a secret manager:

1Password CLI:
  op inject -i config.template.yml -o config.yml

AWS Secrets Manager:
  aws secretsmanager get-secret-value --secret-id api-key

HashiCorp Vault:
  vault kv get secret/api-key

Environment variables:
  export ANTHROPIC_API_KEY=$(cat ~/.secrets/anthropic)
  coi shell
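
On the consuming side, code that previously hardcoded the key would read it from the environment and fail fast when it is missing. A minimal Go sketch follows; the lookup function is injected so the logic can be exercised without touching the real environment, and `apiKey`, `apiKeyFromEnv`, and `errMissingKey` are hypothetical names.

```go
package main

import (
	"errors"
	"os"
)

var errMissingKey = errors.New("ANTHROPIC_API_KEY is not set")

// apiKey reads the key through lookup and treats an empty value as missing.
func apiKey(lookup func(string) (string, bool)) (string, error) {
	v, ok := lookup("ANTHROPIC_API_KEY")
	if !ok || v == "" {
		return "", errMissingKey
	}
	return v, nil
}

// apiKeyFromEnv is the production entry point, backed by os.LookupEnv.
func apiKeyFromEnv() (string, error) {
	return apiKey(os.LookupEnv)
}
```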

Open Questions

  1. Should we block or warn by default?
    • Proposal: Warn by default, configurable to block
  2. How to handle test fixtures with fake secrets?
    • Proposal: Whitelist patterns, special markers in code
  3. Should we scan container filesystem or just workspace?
    • Proposal: Workspace only (main risk), optional full scan
  4. Should we auto-rotate detected secrets?
    • Proposal: No (too risky), provide rotation instructions
  5. How to handle secrets in git history?
    • Proposal: Separate tool/command, use git-filter-repo
