feat: Add secrets detection and prevention to protect credentials #116

## Overview

Add automatic secrets detection and prevention to stop AI agents from accidentally exposing API keys, credentials, tokens, and other sensitive data in workspace files. Scan for secrets before commits, alert on detection, and optionally block operations that would expose credentials.

## Motivation

**Current Problem:**
- AI agents may hardcode API keys in source code
- Credentials can end up in `.env` files that get committed
- SSH private keys can be copied into the workspace
- Database connection strings can expose embedded passwords
- Nothing warns the user before sensitive data is pushed

**With Secrets Detection:**
```bash
coi shell --block-secrets

# AI tries to write:
# API_KEY = "sk-ant-abc123..."

⚠️  BLOCKED: Potential secret detected in src/config.py
  Type: API Key (Anthropic)
  Line: 15: API_KEY = "sk-ant-abc123..."
  
  This appears to be a sensitive credential.
  Use environment variables instead: os.getenv('API_KEY')
```

## Use Cases

### 1. Pre-Commit Protection

```bash
# Scan workspace before committing
coi secrets scan

# Output:
⚠️  Found 3 potential secrets in workspace:

src/config.py:15
  Type: API Key
  Pattern: sk-[a-zA-Z0-9]{48}
  Line: API_KEY = "sk-abc123..."

.env.example:5
  Type: AWS Access Key
  Pattern: AKIA[0-9A-Z]{16}
  Line: AWS_ACCESS_KEY=AKIAI...

database.yml:12
  Type: Database Password
  Line: password: "super_secret_123"

Run: coi secrets clean
```

### 2. Real-Time Protection

```bash
# Block AI from writing secrets
coi shell --block-secrets

# AI can still work, but:
# - Can't write files with secrets
# - Can't commit files with secrets
# - Gets warning to use env vars instead
```

### 3. Historical Scanning

```bash
# Scan past sessions for exposed secrets
coi secrets scan --session session-abc123

# Scan all sessions for a project
coi secrets scan --project backend-api --all-sessions

# Generate audit report
coi secrets audit --project backend-api > secrets-audit.json
```

### 4. Cleanup & Remediation

```bash
# Find and remove secrets
coi secrets clean

# Shows each secret and asks:
# Remove from file? [y/N]
# Replace with env var? [Y/n]
# Add to .gitignore? [Y/n]

# Automatic cleanup (dangerous: rewrites files without prompting)
coi secrets clean --auto-fix
```

## Proposed Implementation

### Detection Strategies

#### 1. Pattern-Based Detection

```go
// SecretPattern pairs a detection regex with a display name and severity.
type SecretPattern struct {
    Name     string
    Pattern  *regexp.Regexp
    Severity string
}

var secretPatterns = []SecretPattern{
    {
        Name:     "Anthropic API Key",
        Pattern:  regexp.MustCompile(`sk-ant-[a-zA-Z0-9-]{95}`),
        Severity: "high",
    },
    {
        Name:     "OpenAI API Key",
        Pattern:  regexp.MustCompile(`sk-[a-zA-Z0-9]{48}`),
        Severity: "high",
    },
    {
        Name:     "AWS Access Key",
        Pattern:  regexp.MustCompile(`AKIA[0-9A-Z]{16}`),
        Severity: "high",
    },
    {
        Name:     "GitHub Token",
        Pattern:  regexp.MustCompile(`ghp_[a-zA-Z0-9]{36}`),
        Severity: "high",
    },
    {
        Name:     "Stripe API Key",
        Pattern:  regexp.MustCompile(`sk_live_[a-zA-Z0-9]{24}`),
        Severity: "high",
    },
    {
        // OPENSSH covers the default format written by current ssh-keygen.
        Name:     "Private Key",
        Pattern:  regexp.MustCompile(`-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----`),
        Severity: "critical",
    },
    {
        Name:     "Generic Secret",
        Pattern:  regexp.MustCompile(`(?i)(secret|password|token|api_?key)\s*[:=]\s*["']([^"']{8,})["']`),
        Severity: "medium",
    },
}
```
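One way the pattern list could be applied is a line-by-line scan that records the rule name, line number, and a redacted match. The sketch below is illustrative only: `Finding`, `rule`, `redact`, and `scanText` are hypothetical names, and only two of the patterns above are copied in to keep it short.

```go
package main

import (
    "bufio"
    "regexp"
    "strings"
)

// Finding is a hypothetical result type for this sketch.
type Finding struct {
    Rule     string
    Severity string
    Line     int    // 1-based line number
    Match    string // redacted, so reports never repeat the secret
}

type rule struct {
    name     string
    severity string
    re       *regexp.Regexp
}

// Two patterns copied from the list above.
var rules = []rule{
    {"AWS Access Key", "high", regexp.MustCompile(`AKIA[0-9A-Z]{16}`)},
    {"GitHub Token", "high", regexp.MustCompile(`ghp_[a-zA-Z0-9]{36}`)},
}

// redact keeps only the first four characters of a match.
func redact(s string) string {
    if len(s) <= 4 {
        return "****"
    }
    return s[:4] + "..."
}

// scanText checks every line of text against every rule.
// bufio.Scanner stops at lines longer than 64 KiB by default, so a real
// scanner would need a larger buffer for minified files.
func scanText(text string) []Finding {
    var findings []Finding
    sc := bufio.NewScanner(strings.NewReader(text))
    for n := 1; sc.Scan(); n++ {
        for _, r := range rules {
            if m := r.re.FindString(sc.Text()); m != "" {
                findings = append(findings, Finding{r.name, r.severity, n, redact(m)})
            }
        }
    }
    return findings
}
```

Redacting at detection time means the scan report itself cannot become a second place where the credential leaks.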

#### 2. Entropy-Based Detection

```go
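import (
    "math"
    "unicode/utf8"
)

// hasHighEntropy reports whether value looks random enough to be a secret.
func hasHighEntropy(value string) bool {
    // Calculate Shannon entropy
    entropy := calculateEntropy(value)

    // High entropy strings are likely secrets
    return entropy > 4.5 && len(value) > 16
}

// calculateEntropy returns the Shannon entropy of s in bits per character.
func calculateEntropy(s string) float64 {
    freq := make(map[rune]float64)
    for _, c := range s {
        freq[c]++
    }

    var entropy float64
    // Count runes, not bytes, so the probabilities sum to 1 for non-ASCII input.
    length := float64(utf8.RuneCountInString(s))
    for _, count := range freq {
        p := count / length
        entropy -= p * math.Log2(p)
    }

    return entropy
}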
```
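The entropy check needs candidate strings to test. A minimal sketch, assuming candidates are quoted runs of 20 or more base64/URL-safe characters (the regex, the threshold parameter, and the function names are all illustrative choices, not part of the proposal). Note that the entropy of a string can never exceed log2 of its number of distinct characters, so a 4.5-bit threshold only fires on strings with at least 23 distinct characters.

```go
package main

import (
    "math"
    "regexp"
)

// candidateRe matches quoted runs of 20+ characters from the base64 and
// URL-safe alphabets. Go's regexp has no backreferences, so the opening and
// closing quote are not required to be the same character.
var candidateRe = regexp.MustCompile(`["']([A-Za-z0-9+/=_-]{20,})["']`)

// shannonEntropy returns the entropy of s in bits per character.
func shannonEntropy(s string) float64 {
    counts := make(map[rune]int)
    total := 0
    for _, c := range s {
        counts[c]++
        total++
    }
    var h float64
    for _, n := range counts {
        p := float64(n) / float64(total)
        h -= p * math.Log2(p)
    }
    return h
}

// highEntropyStrings returns the quoted candidates in line whose entropy
// exceeds threshold.
func highEntropyStrings(line string, threshold float64) []string {
    var out []string
    for _, m := range candidateRe.FindAllStringSubmatch(line, -1) {
        if shannonEntropy(m[1]) > threshold {
            out = append(out, m[1])
        }
    }
    return out
}
```

Restricting the check to quoted candidates keeps ordinary identifiers and prose out of the entropy calculation, which is where most false positives would otherwise come from.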

#### 3. Integration with Existing Tools

```bash
# Use gitleaks
gitleaks detect --source /workspace --no-git

# Use trufflehog
trufflehog filesystem /workspace

# Use detect-secrets
detect-secrets scan /workspace
```
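A thin wrapper could map the `--tool` value to one of the invocations above and check that the binary is installed before running it. This is a sketch under assumptions: `toolArgs` and `runTool` are hypothetical names, and no flags beyond those shown above are used.

```go
package main

import (
    "fmt"
    "os/exec"
)

// toolArgs mirrors the three invocations listed above.
func toolArgs(tool, workspace string) ([]string, error) {
    switch tool {
    case "gitleaks":
        return []string{"gitleaks", "detect", "--source", workspace, "--no-git"}, nil
    case "trufflehog":
        return []string{"trufflehog", "filesystem", workspace}, nil
    case "detect-secrets":
        return []string{"detect-secrets", "scan", workspace}, nil
    }
    return nil, fmt.Errorf("unsupported tool %q", tool)
}

// runTool runs the external scanner and returns its combined output.
// Some scanners signal findings through a non-zero exit status, so a
// non-nil error here does not necessarily mean the scan itself failed.
func runTool(tool, workspace string) ([]byte, error) {
    args, err := toolArgs(tool, workspace)
    if err != nil {
        return nil, err
    }
    path, err := exec.LookPath(args[0])
    if err != nil {
        return nil, fmt.Errorf("%s is not installed: %w", tool, err)
    }
    return exec.Command(path, args[1:]...).CombinedOutput()
}
```

Passing arguments as a slice instead of a shell string avoids quoting problems when the workspace path contains spaces.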

### File Monitoring

Monitor workspace file writes in real-time:

```go
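func monitorWorkspaceWrites(container string) error {
    // fsnotify wraps inotify on Linux. It reports a write after it happens
    // and does not recurse into subdirectories.
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        return err
    }
    defer watcher.Close()
    if err := watcher.Add(getWorkspacePath(container)); err != nil {
        return err
    }

    for {
        select {
        case event, ok := <-watcher.Events:
            if !ok {
                return nil
            }
            // Scan newly written file
            if event.Op&fsnotify.Write == fsnotify.Write && hasSecrets(event.Name) {
                alertUser(event.Name)
                // Optionally block/remove
            }
        case err, ok := <-watcher.Errors:
            if !ok {
                return nil
            }
            log.Printf("workspace watcher: %v", err)
        }
    }
}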
```
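A file watcher only sees a write after it has landed on disk, so truly blocking a write needs a gate in front of it. One possible shape, assuming writes can be routed through a helper: scan the content in memory, then write through a temporary file and rename. `guardedWrite` and `ErrSecretDetected` are hypothetical names, and a single pattern stands in for the full list.

```go
package main

import (
    "errors"
    "os"
    "path/filepath"
    "regexp"
)

// ErrSecretDetected is returned when content is rejected before writing.
var ErrSecretDetected = errors.New("secret detected")

// A single pattern from the proposal stands in for the full rule set.
var awsKeyRe = regexp.MustCompile(`AKIA[0-9A-Z]{16}`)

// guardedWrite refuses to write content that matches a secret pattern.
// Clean content is written to a temporary file in the target directory and
// renamed into place, so readers never observe a partially written file.
func guardedWrite(path string, content []byte, perm os.FileMode) error {
    if awsKeyRe.Match(content) {
        return ErrSecretDetected
    }
    tmp, err := os.CreateTemp(filepath.Dir(path), ".coi-*")
    if err != nil {
        return err
    }
    defer os.Remove(tmp.Name()) // fails harmlessly after a successful rename
    if _, err := tmp.Write(content); err != nil {
        tmp.Close()
        return err
    }
    if err := tmp.Close(); err != nil {
        return err
    }
    if err := os.Chmod(tmp.Name(), perm); err != nil {
        return err
    }
    return os.Rename(tmp.Name(), path)
}
```

Whether the agent's writes can actually be routed through such a helper depends on how the container is set up, which this issue leaves open.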

## Command-Line Interface

### Scanning

```bash
# Scan workspace
coi secrets scan

# Scan specific files
coi secrets scan src/config.py .env

# Scan with specific tools
coi secrets scan --tool gitleaks
coi secrets scan --tool trufflehog
coi secrets scan --tool detect-secrets

# Output formats
coi secrets scan --format table
coi secrets scan --format json
coi secrets scan --format sarif  # GitHub compatible

# Severity filtering
coi secrets scan --severity high
coi secrets scan --severity critical
```
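Severity filtering could be a simple rank comparison. The sketch below assumes `--severity` means "this level and above", which matches the `severity_threshold` comment in the configuration section but is not stated for the flag itself; `Finding` and `filterBySeverity` are hypothetical names.

```go
package main

import "fmt"

// Finding is a hypothetical result type for this sketch.
type Finding struct {
    File     string
    Line     int
    Severity string
}

// severityRank orders the three levels used in this proposal.
var severityRank = map[string]int{"medium": 1, "high": 2, "critical": 3}

// filterBySeverity keeps findings at or above the given minimum level.
func filterBySeverity(findings []Finding, minimum string) ([]Finding, error) {
    threshold, ok := severityRank[minimum]
    if !ok {
        return nil, fmt.Errorf("unknown severity %q", minimum)
    }
    var kept []Finding
    for _, f := range findings {
        if severityRank[f.Severity] >= threshold {
            kept = append(kept, f)
        }
    }
    return kept, nil
}
```

Rejecting unknown levels up front avoids a typo such as `--severity hihg` silently reporting everything.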

### Prevention

```bash
# Enable real-time protection
coi shell --block-secrets

# Different modes
coi shell --warn-secrets      # Warn but don't block
coi shell --block-secrets     # Block file writes with secrets
coi shell --audit-secrets     # Log all secrets to audit trail

# Per-session scanning
coi secrets scan --session session-abc123

# Historical audit
coi secrets audit --all-sessions
```

### Cleanup

```bash
# Interactive cleanup
coi secrets clean

# Auto-replace with env vars
coi secrets clean --auto-fix

# Remove detected secrets
coi secrets clean --remove

# Preview changes
coi secrets clean --dry-run
```

### Configuration

```bash
# Configure detection rules
coi secrets config --add-pattern "custom_token:[a-z0-9]{32}"
coi secrets config --ignore-file test_fixtures.py
coi secrets config --ignore-pattern "EXAMPLE_.*"

# Whitelist known false positives
coi secrets whitelist add "sk-test-123"  # Test API key
```
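The `--add-pattern` argument above uses a `name:regex` form. A minimal parsing sketch, assuming the name ends at the first colon so that the regex itself may contain colons (`parsePatternSpec` is a hypothetical name):

```go
package main

import (
    "fmt"
    "regexp"
    "strings"
)

// parsePatternSpec splits "name:regex" at the first colon and compiles the
// regex, so an invalid pattern is rejected when it is added rather than
// when a scan runs.
func parsePatternSpec(spec string) (string, *regexp.Regexp, error) {
    name, expr, ok := strings.Cut(spec, ":")
    if !ok || name == "" || expr == "" {
        return "", nil, fmt.Errorf("pattern %q must look like name:regex", spec)
    }
    re, err := regexp.Compile(expr)
    if err != nil {
        return "", nil, fmt.Errorf("pattern %q: %w", name, err)
    }
    return name, re, nil
}
```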

## Example Output

### Scan Results

```
Secrets Scan Results
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

CRITICAL (1)
  src/keys.py:23
    Private Key: -----BEGIN RSA PRIVATE KEY-----
    Risk: Critical - Never commit private keys to source control
    
HIGH (3)
  src/config.py:15
    Anthropic API Key: sk-ant-api03-...
    Risk: High - API key with full account access
    
  .env:7
    AWS Access Key: AKIAIOSFODNN7EXAMPLE
    Risk: High - AWS credentials with potential broad access
    
  database.yml:12
    Database Password: password: "MySuperSecret123"
    Risk: High - Database credentials in plaintext

MEDIUM (1)
  test/fixtures.py:45
    Generic Secret: api_key = "test_key_12345678"
    Risk: Medium - May be test data (review manually)

SUMMARY
  Total files scanned: 142
  Secrets found: 5 (1 critical, 3 high, 1 medium)
  Files affected: 5

RECOMMENDATIONS
  1. Move all secrets to environment variables
  2. Add .env to .gitignore
  3. Rotate exposed API keys immediately
  4. Use secret management (e.g., 1Password, AWS Secrets Manager)
  
Run: coi secrets clean
```

### Real-Time Block

```
❌ BLOCKED: Secret detected

File: src/config.py
Line: 15
Type: Anthropic API Key
Pattern: sk-ant-api03-xxxxxxxxxxxxx

AI attempted to write:
  API_KEY = "sk-ant-api03-xxxxxxxxx..."

This appears to be a sensitive credential.

RECOMMENDED FIXES:

1. Use environment variable:
   import os
   API_KEY = os.getenv('ANTHROPIC_API_KEY')

2. Use configuration file (not committed):
   # In config.py
   from config_local import API_KEY  # Add config_local.py to .gitignore

3. Use secrets manager:
   from secretsmanager import get_secret
   API_KEY = get_secret('anthropic_api_key')

The file was NOT written. Please fix and try again.
```

### Cleanup Interactive

```
Secret Cleanup Wizard
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Found: Anthropic API Key in src/config.py:15
  API_KEY = "sk-ant-api03-..."

Options:
  [1] Replace with environment variable (recommended)
  [2] Remove line entirely
  [3] Comment out with warning
  [4] Skip (keep as-is)
  [5] Whitelist (mark as false positive)

Choice: 1

✓ Replaced with: API_KEY = os.getenv('ANTHROPIC_API_KEY')
✓ Added import: import os
✓ Created .env.example with: ANTHROPIC_API_KEY=your_key_here

Next: Found AWS Access Key in .env:7
...
```
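Option 1 of the wizard amounts to a line rewrite. A rough sketch for the Python case shown above, assuming the secret sits on a simple `NAME = "value"` line; `replaceWithEnvVar` is a hypothetical name, and adding `import os` or updating `.env.example` is left out.

```go
package main

import "regexp"

// assignRe matches a simple Python assignment of a quoted literal,
// capturing the indentation and the variable name.
var assignRe = regexp.MustCompile(`^(\s*)([A-Za-z_][A-Za-z0-9_]*)\s*=\s*["'][^"']*["']\s*$`)

// replaceWithEnvVar rewrites the line to read the value from envName.
// It returns the original line and false when the line has another shape.
func replaceWithEnvVar(line, envName string) (string, bool) {
    m := assignRe.FindStringSubmatch(line)
    if m == nil {
        return line, false
    }
    return m[1] + m[2] + " = os.getenv('" + envName + "')", true
}
```

Lines that do not match are returned unchanged, which lets the wizard fall back to the other options instead of guessing at a rewrite.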

## Implementation Phases

### Phase 1: Basic Scanning (MVP)
- [ ] Pattern-based secret detection
- [ ] Common secret patterns (API keys, tokens, passwords)
- [ ] `coi secrets scan` command
- [ ] Integration with gitleaks or trufflehog
- [ ] Text output with findings

### Phase 2: Real-Time Protection
- [ ] File write monitoring in containers
- [ ] `--block-secrets` flag for `coi shell`
- [ ] Real-time alerts when secrets detected
- [ ] Block file writes with secrets
- [ ] Suggest fixes (use env vars)

### Phase 3: Cleanup & Remediation
- [ ] `coi secrets clean` command
- [ ] Interactive cleanup wizard
- [ ] Auto-replace with env vars
- [ ] Generate .env.example files
- [ ] Whitelist management

### Phase 4: Advanced Detection
- [ ] Entropy-based detection
- [ ] Machine learning models for secret detection
- [ ] Context-aware detection (reduce false positives)
- [ ] Custom pattern support
- [ ] Integration with multiple scanning tools

### Phase 5: Integration & Reporting
- [ ] GitHub Actions integration (SARIF output)
- [ ] Pre-commit hook generation
- [ ] Audit trail of detected secrets
- [ ] Rotation recommendations
- [ ] Secret manager integration suggestions

## Configuration

```toml
# ~/.config/coi/config.toml

[secrets]
enabled = true
block_by_default = false  # Warn by default, don't block

[secrets.scan]
tools = ["gitleaks", "trufflehog"]  # Tools to use for scanning
severity_threshold = "medium"        # minimum severity to report

[secrets.patterns]
# Custom patterns
custom = [
  { name = "Company Token", pattern = "COMP_[A-Z0-9]{32}", severity = "high" }
]

[secrets.ignore]
# Ignore patterns (false positives)
patterns = [
  "EXAMPLE_.*",
  "TEST_KEY_.*",
]

# Ignore files
files = [
  "test/fixtures/*.py",
  "**/*_test.go",
]

[secrets.whitelist]
# Known safe values
values = [
  "sk-test-12345",  # Test API key
]

[secrets.auto_fix]
replace_with_env_vars = true
create_env_example = true
add_to_gitignore = true
```
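The `[secrets.ignore]` and `[secrets.whitelist]` tables could be applied as a single suppression check per finding. The sketch below uses hypothetical names (`ignoreRules`, `newIgnoreRules`, `suppressed`) and the standard library's `path.Match`, which treats `**` the same as `*` and never crosses a `/`; supporting recursive globs such as `**/*_test.go` would need a different matcher.

```go
package main

import (
    "path"
    "regexp"
)

// ignoreRules holds the compiled ignore and whitelist settings.
type ignoreRules struct {
    patterns  []*regexp.Regexp
    files     []string // slash-separated globs, matched with path.Match
    whitelist map[string]bool
}

// newIgnoreRules compiles the configured patterns once, up front.
func newIgnoreRules(patterns, files, values []string) (*ignoreRules, error) {
    r := &ignoreRules{files: files, whitelist: make(map[string]bool)}
    for _, p := range patterns {
        re, err := regexp.Compile(p)
        if err != nil {
            return nil, err
        }
        r.patterns = append(r.patterns, re)
    }
    for _, v := range values {
        r.whitelist[v] = true
    }
    return r, nil
}

// suppressed reports whether a finding should be dropped from the report.
func (r *ignoreRules) suppressed(file, value string) bool {
    if r.whitelist[value] {
        return true
    }
    for _, glob := range r.files {
        if ok, _ := path.Match(glob, file); ok {
            return true
        }
    }
    for _, re := range r.patterns {
        if re.MatchString(value) {
            return true
        }
    }
    return false
}
```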

## Integration with Git Hooks

Generate pre-commit hook:

```bash
coi secrets install-hook

# Creates .git/hooks/pre-commit:
#!/bin/sh
coi secrets scan --severity high || exit 1
```

Or integrate with existing pre-commit framework:

```yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: coi-secrets-check
        name: COI Secrets Detection
        entry: coi secrets scan
        language: system
        pass_filenames: false
```
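For the `install-hook` command above, one cautious approach is to refuse to overwrite a hook that already exists. A sketch, with `installHook` and `hookScript` as hypothetical names and the script body taken from the example above:

```go
package main

import (
    "errors"
    "fmt"
    "io/fs"
    "os"
    "path/filepath"
)

// hookScript is the hook body shown above.
const hookScript = "#!/bin/sh\ncoi secrets scan --severity high || exit 1\n"

// installHook writes an executable pre-commit hook into the repository at
// repoRoot, leaving any existing hook untouched.
func installHook(repoRoot string) error {
    path := filepath.Join(repoRoot, ".git", "hooks", "pre-commit")
    _, err := os.Stat(path)
    if err == nil {
        return fmt.Errorf("%s already exists; not overwriting", path)
    }
    if !errors.Is(err, fs.ErrNotExist) {
        return err
    }
    // Git only runs hooks that have the executable bit set.
    return os.WriteFile(path, []byte(hookScript), 0o755)
}
```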

## Secret Types Detected

### API Keys
- Anthropic (Claude)
- OpenAI (ChatGPT)
- Google Cloud
- AWS
- Azure
- Stripe
- Twilio
- SendGrid
- GitHub
- GitLab

### Credentials
- Database connection strings
- JDBC URLs with passwords
- Redis connection strings
- SMTP credentials

### Keys
- SSH private keys
- PGP private keys
- TLS/SSL private keys
- JWT tokens
- Session tokens
- OAuth tokens

### Cloud Provider Secrets
- AWS access keys
- GCP service account keys
- Azure connection strings
- DigitalOcean tokens
- Heroku API keys

### Generic Patterns
- High-entropy strings
- Password fields
- Secret/token fields
- Base64-encoded credentials

## Benefits

**Security:**
- Prevent accidental credential exposure
- Reduce attack surface
- Comply with security policies
- Protect production systems

**Education:**
- Teach AI agents best practices
- Show proper secret management
- Guide to environment variables
- Prevent bad habits

**Compliance:**
- Meet security audit requirements
- SOC2/ISO27001 compliance
- Prevent data breaches
- Maintain audit trails

**Cost:**
- Prevent compromised API keys
- Avoid key rotation costs
- Prevent unauthorized usage
- Reduce security incidents

## Technical Considerations

### Performance

Scanning strategies:
- Incremental scan (only changed files)
- Background scanning (don't block AI)
- Cached results (avoid re-scanning)
- Parallel scanning

### False Positives

Reduce with:
- Context awareness (test files, examples)
- Entropy analysis (high randomness = likely secret)
- Whitelist management
- Pattern refinement

### False Negatives

Improve detection:
- Multiple scanning tools
- Custom patterns for company-specific secrets
- Regular expression updates
- Community-contributed patterns

## Related Issues

- Monitoring (#112) - Monitor secret exposure attempts
- Session management - Track which sessions exposed secrets
- Audit mode - Full audit trail of secret detections

## Integration with Secret Managers

Suggest integration with:

```bash
# After detecting secrets
⚠️  Secrets detected. Consider using a secret manager:

1Password CLI:
  op inject -i config.template.yml -o config.yml

AWS Secrets Manager:
  aws secretsmanager get-secret-value --secret-id api-key

HashiCorp Vault:
  vault kv get secret/api-key

Environment variables:
  export ANTHROPIC_API_KEY=$(cat ~/.secrets/anthropic)
  coi shell
```

## Open Questions

1. Should we block or warn by default?
   - Proposal: Warn by default, configurable to block

2. How to handle test fixtures with fake secrets?
   - Proposal: Whitelist patterns, special markers in code

3. Should we scan container filesystem or just workspace?
   - Proposal: Workspace only (main risk), optional full scan

4. Should we auto-rotate detected secrets?
   - Proposal: No (too risky), provide rotation instructions

5. How to handle secrets in git history?
   - Proposal: Separate tool/command, use git-filter-repo

## References

- [gitleaks](https://github.com/gitleaks/gitleaks) - Secret scanning tool
- [trufflehog](https://github.com/trufflesecurity/trufflehog) - Find secrets in git
- [detect-secrets](https://github.com/Yelp/detect-secrets) - Yelp's secret scanner
- [OWASP Secrets Management Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html)

