feat: Add container monitoring and observability (coi monitor)

## Overview

Add comprehensive monitoring and observability capabilities to COI, allowing users to monitor container network activity, I/O operations, resource usage, and security events from outside the container.

## Motivation

**Security & Audit:**
- Detect data exfiltration attempts in real-time
- Log all network connections for compliance/audit trails
- Alert on suspicious behavior (unexpected connections, high bandwidth, etc.)
- Understand what AI agents are actually doing

**Debugging:**
- Troubleshoot network isolation issues (see what's being blocked)
- Identify performance bottlenecks
- Verify firewall rules are working correctly

**Cost Control:**
- Monitor API usage if AI is making external calls
- Track bandwidth consumption
- Enforce rate limits

**Forensics:**
- Post-session analysis of what went wrong
- Replay session activity
- Generate audit reports

## Proposed Command: `coi monitor`

### Basic Usage

```bash
# Live monitoring dashboard (TUI)
coi monitor <container>

# JSON output for scripting/integration
coi monitor <container> --json

# Monitor all COI containers
coi monitor --all

# Auto-detect container from current workspace
coi monitor
```

### Monitoring Modes

```bash
# Specific monitoring types
coi monitor <container> --network          # Network connections only
coi monitor <container> --io               # Disk I/O only
coi monitor <container> --resources        # CPU/memory/cgroup stats
coi monitor <container> --firewall         # Firewall events only

# Combined
coi monitor <container> --network --io     # Multiple modes
```

### Alert Thresholds

```bash
# Alert on events
coi monitor <container> --alert-on-new-connections
coi monitor <container> --alert-on-firewall-block

# Threshold alerts
coi monitor <container> --bandwidth-threshold 100MB/min
coi monitor <container> --io-threshold 1000iops
coi monitor <container> --cpu-threshold 80%
```
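
Threshold flags such as `--bandwidth-threshold 100MB/min` need a small unit grammar. The sketch below normalizes a bandwidth threshold to bytes per second so it can be compared against sampled counters. It assumes decimal units (1 MB = 1,000,000 bytes) and only `/s`, `/min` and `/h` periods; the function name `bandwidth_to_bps` is hypothetical, and `--io-threshold` and `--cpu-threshold` would need their own parsers.

```bash
# Hypothetical helper: convert a "<number><unit>/<period>" bandwidth threshold
# to bytes per second. Assumes decimal units (KB = 1e3, MB = 1e6, GB = 1e9)
# and truncates the result to a whole number. Exits non-zero on bad input.
bandwidth_to_bps() {
    printf '%s\n' "$1" | awk '
    {
        # "100MB/min" -> amount "100MB", period "min"
        if (split($0, parts, "/") != 2) exit 1
        amount = parts[1]; period = parts[2]

        # Separate the leading number from the unit suffix.
        num = amount
        sub(/[A-Za-z]+$/, "", num)
        unit = substr(amount, length(num) + 1)

        if (unit == "B") mult = 1
        else if (unit == "KB") mult = 1000
        else if (unit == "MB") mult = 1000000
        else if (unit == "GB") mult = 1000000000
        else exit 1

        if (period == "s") secs = 1
        else if (period == "min") secs = 60
        else if (period == "h") secs = 3600
        else exit 1

        printf "%d\n", num * mult / secs
    }'
}

# Example: bandwidth_to_bps 100MB/min   # prints 1666666
```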

### Output & Integration

```bash
# Logging
coi monitor <container> --log-file /tmp/coi-monitor.log

# Prometheus metrics export
coi monitor <container> --export-prometheus :9090

# Output formats
coi monitor <container> --format json|table|dashboard
```

### Audit & Forensics

```bash
# Full audit mode (syscall + network tracing)
coi monitor <container> --audit

# Record session for later replay
coi monitor <container> --record-session

# Network packet capture
coi monitor <container> --pcap /tmp/traffic.pcap

# Replay recorded session
coi monitor replay <session-id>

# Show session statistics
coi monitor stats <session-id>
```

## Example Output (TUI Dashboard Mode)

```
Container: coi-abc12345-1
Uptime: 15m 32s
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

NETWORK ACTIVITY
  Active Connections: 3
  ├─ 52.84.142.12:443 (HTTPS) - api.anthropic.com ✓ ALLOWED
  ├─ 8.8.8.8:53 (DNS) ✓ ALLOWED
  └─ 192.168.1.1:80 (HTTP) ✗ BLOCKED (RFC1918)

  Bandwidth: ↓ 1.2 MB/s  ↑ 45 KB/s
  Total: ↓ 18.5 MB  ↑ 2.1 MB
  Firewall Blocks: 5 attempts

DISK I/O
  Read:  125 KB/s  (1.8 GB total)
  Write: 45 KB/s   (456 MB total)
  IOPS:  45 read, 12 write

RESOURCES
  CPU:    15% (2 cores)
  Memory: 512 MB / 2 GB (25%)
  Processes: 12

RECENT EVENTS
  [15:32:45] ✓ Connected to api.anthropic.com:443
  [15:32:42] ✗ Blocked connection to 192.168.1.1:80 (RFC1918)
  [15:32:40] ✓ DNS query: api.anthropic.com -> 52.84.142.12
  [15:32:38] ℹ File write: /workspace/output.txt (1.2 KB)
```

## Implementation Approaches

### 1. Network Monitoring

**Option A: Connection Tracking (conntrack)**
- Use `conntrack` to monitor active connections
- Filter by container IP address
- Pros: Low overhead, real-time
- Cons: `conntrack -L` only shows connections currently in the table; history requires streaming events with `conntrack -E`

**Option B: eBPF (bpftrace/bcc-tools)**
- Trace network syscalls at kernel level
- Can capture all connection attempts (even failed ones)
- Pros: Very detailed; hard to evade from inside the container, since tracing runs in the host kernel
- Cons: Requires BPF support, more complex

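**Option C: Firewalld Logs**
- Parse firewalld log entries for denied packets
- Builds on the rules already installed for network isolation
- Pros: Easy, no extra tooling
- Cons: Only shows firewall decisions, not all traffic; denied packets are logged only when firewalld's `LogDenied` setting is enabled (e.g. `firewall-cmd --set-log-denied=all`)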
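
A minimal sketch of Option C, assuming `LogDenied` is enabled so that rejected or dropped packets reach the kernel log. It keys only on the standard netfilter LOG fields (`SRC=`, `DST=`, `PROTO=`, `DPT=`) rather than on firewalld's log prefix, which varies by version and zone. The function name `blocked_for_ip` and the container IP in the usage line are illustrative.

```bash
# Sketch: report packets from one container that the firewall rejected/dropped.
# Assumes log lines carry the standard netfilter LOG fields (SRC=, DST=,
# PROTO=, DPT=). The firewalld log prefix is deliberately not matched.
blocked_for_ip() {
    awk -v ip="$1" '
    {
        src = ""; dst = ""; proto = ""; dpt = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^SRC=/)   src   = substr($i, 5)
            if ($i ~ /^DST=/)   dst   = substr($i, 5)
            if ($i ~ /^PROTO=/) proto = substr($i, 7)
            if ($i ~ /^DPT=/)   dpt   = substr($i, 5)
        }
        # Only packets that originated from the container are reported.
        if (src == ip) {
            if (dpt != "") print proto, dst ":" dpt
            else           print proto, dst
        }
    }'
}

# Usage (follows the kernel log live):
#   journalctl -k -f | blocked_for_ip 10.120.5.23
```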

**Recommended: Hybrid approach**
- Use conntrack for active connections
- Parse firewalld logs for blocks
- Optional eBPF mode for deep inspection (`--audit`)
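
For the conntrack half of the hybrid approach, `conntrack -L -s <container-ip>` already filters the table by the container's source address; what remains is reducing each entry to the fields the dashboard shows. A sketch, assuming the default `conntrack -L` line layout (protocol name first, the TCP state as the fourth field, and the original-direction tuple before the reply tuple); `conntrack_summary` is a hypothetical name.

```bash
# Sketch: turn `conntrack -L` entries into "proto dst:dport state" lines.
# Only the first dst=/dport= pair (the original direction, i.e. what the
# container connected to) is kept; the reply tuple is ignored.
conntrack_summary() {
    awk '
    {
        proto = $1; state = "-"; dst = ""; dport = ""
        # TCP entries carry a state word (e.g. ESTABLISHED) as the 4th field;
        # UDP/ICMP entries go straight to key=value pairs.
        if ($4 !~ /=/) state = $4
        for (i = 2; i <= NF; i++) {
            if (dst == "" && $i ~ /^dst=/)     dst = substr($i, 5)
            if (dport == "" && $i ~ /^dport=/) dport = substr($i, 7)
        }
        if (dst == "") next
        if (dport != "") dst = dst ":" dport
        print proto, dst, state
    }'
}

# Usage (2>/dev/null hides conntrack's "flow entries" summary line):
#   sudo conntrack -L -s 10.120.5.23 2>/dev/null | conntrack_summary
```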

### 2. I/O Monitoring

**Option A: Cgroup Stats**
- Read from `/sys/fs/cgroup/.../io.stat`
- Incus already uses cgroups
- Pros: Built-in, accurate, low overhead
- Cons: Aggregated per device for the whole container; no per-process or per-file breakdown

**Option B: eBPF (biosnoop/biolatency)**
- Trace block I/O at kernel level
- Per-request granularity with PID and process name; per-file activity needs VFS-level tracing (e.g. bcc's `filetop`)
- Pros: Very detailed
- Cons: Higher overhead

**Recommended: Cgroup stats by default, eBPF for `--audit` mode**
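
Reading cgroup stats is mostly text parsing. A sketch that totals the cgroup v2 `io.stat` counters across devices, assuming `$CGROUP_DIR` (a hypothetical variable) already points at the container's cgroup directory; `io_stat_totals` is likewise a hypothetical name.

```bash
# Sketch: sum cgroup v2 io.stat counters across all block devices.
# Each input line looks like:
#   <major>:<minor> rbytes=N wbytes=N rios=N wios=N dbytes=N dios=N
io_stat_totals() {
    awk '
    {
        # Skip the device id in $1; accumulate every key=value field.
        for (i = 2; i <= NF; i++) {
            split($i, kv, "=")
            total[kv[1]] += kv[2]
        }
    }
    END {
        # %.0f avoids integer overflow on large byte counts.
        printf "rbytes=%.0f wbytes=%.0f rios=%.0f wios=%.0f\n",
            total["rbytes"], total["wbytes"], total["rios"], total["wios"]
    }'
}

# Usage:
#   io_stat_totals < "$CGROUP_DIR/io.stat"
```

Sampling twice and dividing the differences by the interval yields the KB/s and IOPS rates shown in the dashboard.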

### 3. Resource Monitoring

**Use Incus API + Cgroups:**
- `incus info <container>` provides basic stats
- Cgroup stats for detailed metrics
- `/proc/<pid>/` for process-level data

### 4. Event Correlation

**Challenge:** Map host-level events back to containers
- Network: Match by container IP (get from Incus)
- I/O: Match by cgroup path
- Processes: Match by PID namespace

## Data Sources

| Metric | Source | Method |
|--------|--------|--------|
| Active connections | `/proc/<pid>/net/tcp` (container's network namespace), conntrack | Parse /proc or use conntrack CLI |
| Bandwidth | Host-side veth counters (`/sys/class/net/<iface>/statistics/`), iftop | Read interface counters or parse iftop |
| Firewall events | firewalld logs | Parse journalctl output |
| Disk I/O | cgroup `io.stat` | Read from sysfs |
| CPU usage | cgroup `cpu.stat` | Read from sysfs |
| Memory | cgroup `memory.current` | Read from sysfs |
| DNS queries | eBPF (optional) | Trace UDP port 53 |
| Syscalls | eBPF (optional) | Trace syscall entry points |
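
`/proc/net/tcp` is per network namespace, so the host's copy does not list the container's sockets; the container's view is at `/proc/<pid>/net/tcp` for any process inside it. A sketch of decoding that file, assuming a little-endian host (x86_64, arm64), where each IPv4 address is printed as a byte-reversed hex word while ports are plain big-endian hex. Only a few TCP state codes are named, IPv6 (`tcp6`) is not handled, and `tcp_connections` is hypothetical.

```bash
# Sketch: print "remote_ip:port STATE" for each socket in a /proc/<pid>/net/tcp dump.
tcp_connections() {
    awk '
    # Portable hex-to-decimal (no gawk-only strtonum).
    function hex2dec(h,    i, n) {
        n = 0
        h = toupper(h)
        for (i = 1; i <= length(h); i++)
            n = n * 16 + index("0123456789ABCDEF", substr(h, i, 1)) - 1
        return n
    }
    # Little-endian host: byte pairs of the address word appear reversed.
    function ipv4(h) {
        return hex2dec(substr(h, 7, 2)) "." hex2dec(substr(h, 5, 2)) "." hex2dec(substr(h, 3, 2)) "." hex2dec(substr(h, 1, 2))
    }
    BEGIN {
        state["01"] = "ESTABLISHED"; state["02"] = "SYN_SENT"
        state["06"] = "TIME_WAIT";   state["08"] = "CLOSE_WAIT"
        state["0A"] = "LISTEN"
    }
    NR > 1 {
        # $3 is rem_address ("HEXIP:HEXPORT"), $4 is the state code.
        split($3, rem, ":")
        st = ($4 in state) ? state[$4] : $4
        print ipv4(rem[1]) ":" hex2dec(rem[2]), st
    }'
}

# Usage:
#   sudo cat /proc/12345/net/tcp | tcp_connections
```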

## Implementation Phases

### Phase 1: Basic Monitoring (MVP)
- [ ] `coi monitor <container>` - simple table output
- [ ] Network: active connections from the container's `/proc/<pid>/net/tcp` + conntrack
- [ ] I/O: basic stats from cgroup
- [ ] Resources: CPU/memory from cgroup
- [ ] Parse firewalld logs for blocks
- [ ] JSON output mode (`--json`)

### Phase 2: Enhanced Output
- [ ] TUI dashboard mode (using bubbletea or similar)
- [ ] Real-time updates
- [ ] Color-coded output (green=allowed, red=blocked)
- [ ] Bandwidth calculation (bytes/sec)
- [ ] Event timeline
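
Bandwidth in bytes/sec is the difference between two readings of a byte counter divided by the interval. For network traffic the natural counters are the statistics of the container's host-side veth interface, where directions swap: what the host side receives is what the container sent. A sketch, assuming the caller passes the host-side interface name (how COI discovers it is left open); both function names are hypothetical.

```bash
# Bytes per second between two counter readings over an interval.
rate_per_sec() {
    awk -v before="$1" -v after="$2" -v secs="$3" \
        'BEGIN { printf "%.0f\n", (after - before) / secs }'
}

# Sample a host-side veth over $2 seconds. Host rx_bytes = container upload,
# host tx_bytes = container download.
veth_bandwidth() {
    iface=$1; secs=$2
    stats=/sys/class/net/$iface/statistics
    rx1=$(cat "$stats/rx_bytes"); tx1=$(cat "$stats/tx_bytes")
    sleep "$secs"
    rx2=$(cat "$stats/rx_bytes"); tx2=$(cat "$stats/tx_bytes")
    echo "down=$(rate_per_sec "$tx1" "$tx2" "$secs") up=$(rate_per_sec "$rx1" "$rx2" "$secs")"
}

# Usage:
#   veth_bandwidth vethXXXXXXXX 1
```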

### Phase 3: Alerts & Thresholds
- [ ] Alert on new connections (`--alert-on-new-connections`)
- [ ] Bandwidth thresholds (`--bandwidth-threshold`)
- [ ] Firewall block alerts
- [ ] Webhook notifications
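
`--alert-on-new-connections` could be driven by conntrack's event mode, which prints one line per new flow. A sketch that alerts once per new destination address and port, assuming the event line layout (an optional `[NEW]` tag followed by the same `key=value` tuples as `conntrack -L`); `alert_new_destinations` is hypothetical.

```bash
# Sketch: emit an alert the first time a destination ip:port is seen.
# The first dst=/dport= pair is the original direction; the "[NEW]" tag
# never matches the patterns, so it is skipped implicitly.
alert_new_destinations() {
    awk '
    {
        dst = ""; dport = ""
        for (i = 1; i <= NF; i++) {
            if (dst == "" && $i ~ /^dst=/)     dst = substr($i, 5)
            if (dport == "" && $i ~ /^dport=/) dport = substr($i, 7)
        }
        key = dst ":" dport
        if (dst != "" && !(key in seen)) {
            seen[key] = 1
            print "ALERT new destination " key
            fflush()   # deliver alerts immediately when reading a live stream
        }
    }'
}

# Usage:
#   sudo conntrack -E -e NEW -s 10.120.5.23 | alert_new_destinations
```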

### Phase 4: Audit & Forensics
- [ ] Session recording (`--record-session`)
- [ ] eBPF integration for deep inspection (`--audit`)
- [ ] Packet capture (`--pcap`)
- [ ] Session replay (`coi monitor replay`)
- [ ] Summary statistics (`coi monitor stats`)

### Phase 5: Integration
- [ ] Prometheus metrics export
- [ ] SIEM webhooks
- [ ] Grafana dashboards
- [ ] Audit log export (JSON/CSV)
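
Incus already serves instance metrics in Prometheus format at `/1.0/metrics` (see the Incus Metrics API reference), which may cover CPU and memory; COI-specific series would still need their own export. A sketch of rendering counters in the Prometheus text exposition format, where the metric names and the `container` label are hypothetical. The output could be written to a node_exporter textfile-collector directory or served by the proposed `--export-prometheus` endpoint.

```bash
# Sketch: render container counters as Prometheus text exposition lines.
# Args: container name, read bytes, write bytes, CPU usage in microseconds.
prometheus_metrics() {
    name=$1; rbytes=$2; wbytes=$3; cpu_usec=$4
    # Prometheus convention: CPU time in seconds, counters end in _total.
    cpu_seconds=$(awk -v u="$cpu_usec" 'BEGIN { printf "%.6f", u / 1000000 }')
    printf '# TYPE coi_io_read_bytes_total counter\n'
    printf 'coi_io_read_bytes_total{container="%s"} %s\n' "$name" "$rbytes"
    printf '# TYPE coi_io_write_bytes_total counter\n'
    printf 'coi_io_write_bytes_total{container="%s"} %s\n' "$name" "$wbytes"
    printf '# TYPE coi_cpu_usage_seconds_total counter\n'
    printf 'coi_cpu_usage_seconds_total{container="%s"} %s\n' "$name" "$cpu_seconds"
}

# Usage:
#   prometheus_metrics coi-abc12345-1 1500 2000 2500000
```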

## Technical Considerations

**Privileges:**
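- Network monitoring: Requires reading the container's `/proc/<pid>/net/` entries and running conntrack, which needs CAP_NET_ADMIN (usually sudo)
- Cgroup stats: Read-only access to `/sys/fs/cgroup/` (usually accessible)
- eBPF: Requires CAP_BPF plus CAP_PERFMON for tracing programs (kernel 5.8+), or CAP_SYS_ADMIN; in practice, sudo
- Firewalld logs: Requires read access to the kernel log via journalctl (root, or membership in the `systemd-journal` or `adm` group; `incus-admin` alone does not grant this)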

**Performance:**
- Cgroup stats: Negligible overhead
- Conntrack: Very low overhead
- eBPF: Low overhead but depends on trace frequency
- Packet capture: Can be heavy with high traffic

**Container Identification:**
- Get container IP from Incus API
- Match network events by source IP
- Match I/O by cgroup path (Incus sets this)
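
Rather than hard-coding a cgroup location, the container's cgroup directory can be derived from its init PID: on a cgroup v2 host, `/proc/<pid>/cgroup` holds one `0::<path>` line relative to `/sys/fs/cgroup`. A sketch, assuming Incus places each container under a cgroup component named `lxc.payload.<name>` (verify on your host), so the path is cut after that component to get container-wide stats; `cgroup_dir_from` is hypothetical.

```bash
# Sketch: map a "0::<path>" line from /proc/<pid>/cgroup to the container's
# cgroup v2 directory, truncating after the (assumed) lxc.payload.<name>
# component. Prints nothing if no such component is present.
cgroup_dir_from() {
    sed -n 's|^0::\(/.*lxc\.payload\.[^/]*\).*$|/sys/fs/cgroup\1|p'
}

# Usage:
#   cgroup_dir_from < /proc/12345/cgroup
```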

## Security & Privacy

**Concerns:**
- Packet capture could expose sensitive data
- Full syscall tracing reveals all container activity
- Logs could contain API keys or credentials

**Mitigations:**
- Audit mode (`--audit`, `--pcap`) requires explicit opt-in
- Warn users about sensitive data in logs
- Option to filter/redact credentials in output
- Secure storage for recorded sessions
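
A sketch of the redaction option as a stream filter applied before lines reach a log file. The patterns are illustrative assumptions (Anthropic-style `sk-ant-` keys, `Bearer` tokens, and `password=`/`token=`/`secret=`/`api_key=` pairs), not a complete secret scanner; `redact_secrets` is hypothetical.

```bash
# Sketch: mask likely credentials in a text stream.
# 1. sk-ant-... API keys
# 2. "Bearer <token>" (uses # as delimiter so "/" can sit in the class)
# 3. key=value pairs whose key looks like a secret; value ends at & or space
redact_secrets() {
    sed -E \
        -e 's/sk-ant-[A-Za-z0-9_-]+/sk-ant-[REDACTED]/g' \
        -e 's#(Bearer )[A-Za-z0-9._~+/-]+=*#\1[REDACTED]#g' \
        -e 's/((password|passwd|token|secret|api_key)=)[^&[:space:]]+/\1[REDACTED]/g'
}

# Usage:
#   coi monitor <container> --json | redact_secrets >> audit.log
```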

## Use Cases

1. **Security Auditor**: "I need to verify the AI didn't exfiltrate data"
   ```bash
   coi monitor <container> --log-file audit.log
   # Review all connections after session
   ```

2. **Developer**: "Why is network isolation blocking my package install?"
   ```bash
   coi monitor <container> --firewall
   # See exactly what's being blocked
   ```

3. **Compliance Officer**: "Generate audit report for AI agent session"
   ```bash
   coi monitor <container> --record-session
   coi monitor stats <session-id> --export report.pdf
   ```

4. **DevOps**: "Integrate COI metrics into our monitoring stack"
   ```bash
   coi monitor --all --export-prometheus :9090
   # Scrape with Prometheus, visualize in Grafana
   ```

## Open Questions

1. Should monitoring be opt-in or always-on?
   - Proposal: Basic stats always available via `coi monitor`, audit mode opt-in
   
2. How long to keep recorded sessions?
   - Proposal: Configurable retention policy, default 7 days

3. Should we auto-start monitoring when `--audit` flag is used with `coi shell`?
   - Proposal: Yes, `coi shell --audit` automatically enables monitoring

4. Privacy concerns with packet capture?
   - Proposal: Explicit warning, require `--i-understand-the-risks` flag
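
The proposed default retention could be applied with `find`. A sketch, assuming recorded sessions are stored as files under a single directory (COI does not define one yet); `prune_sessions` is hypothetical. Note that `-mtime +7` matches files whose age, truncated to whole days, exceeds 7.

```bash
# Sketch: delete recorded-session files older than N days (default 7),
# printing each path as it is removed.
prune_sessions() {
    dir=$1; days=${2:-7}
    find "$dir" -type f -mtime +"$days" -print -delete
}

# Usage:
#   prune_sessions /path/to/sessions 7
```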

## Related Issues

- Network isolation (#XX - if exists)
- Session management (#XX - if exists)
- Resource limits (#99)

## References

- [Incus Metrics API](https://linuxcontainers.org/incus/docs/main/metrics/)
- [cgroup v2 documentation](https://www.kernel.org/doc/Documentation/cgroup-v2.txt)
- [eBPF for container monitoring](https://ebpf.io/)
- [conntrack-tools](https://netfilter.org/projects/conntrack-tools/)
