
Development Guide

This guide covers building, testing, and contributing to s3dedup.

Building from Source

# Build binary
cargo build

# Build for release
cargo build --release

# Run in development mode
cargo run -- server --config config.json

# Format code
cargo fmt

# Run linter
cargo clippy

# Check for compilation errors without building
cargo check

Testing

Quick Start

# Run all unit tests (no external dependencies)
cargo test --lib

# Run all tests (requires PostgreSQL + S3-compatible storage)
export DATABASE_URL="postgres://postgres:postgres@localhost:5432/s3dedup_test"
export S3_ACCESS_KEY=GK0123456789abcdef01234567
export S3_SECRET_KEY=abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789
cargo test --features test-mocks

Unit Tests

Run all unit tests without external dependencies:

# Run all unit tests
cargo test --lib

# Run specific test
cargo test --lib test_hash_key_deterministic

# Run with output
cargo test --lib -- --nocapture

Integration Tests

Integration tests require external services. Run specific tests with appropriate setup:

# Run all integration tests (requires PostgreSQL + S3-compatible storage)
cargo test --features test-mocks --test integration_test

# Run specific integration test (requires PostgreSQL + S3-compatible storage)
cargo test --features test-mocks --test integration_test test_put_and_get_file -- --nocapture

PostgreSQL Lock Tests

The PostgreSQL advisory locks implementation requires a running PostgreSQL instance.

Setup PostgreSQL for Testing

Option 1: Docker (Recommended)

# Start PostgreSQL container
docker run -d \
  --name postgres-test \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=s3dedup_test \
  -p 5432:5432 \
  postgres:15

Option 2: Local PostgreSQL Installation

# Create test database (requires PostgreSQL installed locally)
psql -U postgres -c "CREATE DATABASE s3dedup_test;"

Run PostgreSQL Lock Tests

Once PostgreSQL is running:

# Set DATABASE_URL environment variable
export DATABASE_URL="postgres://postgres:postgres@localhost:5432/s3dedup_test"

# Run all PostgreSQL lock tests
cargo test --features test-mocks --test postgres_locks_test -- --nocapture

# Run specific PostgreSQL lock test
cargo test --features test-mocks --test postgres_locks_test test_exclusive_lock_mutual_exclusion -- --nocapture

# Run with debug logging
RUST_LOG=debug cargo test --features test-mocks --test postgres_locks_test -- --nocapture

PostgreSQL Lock Tests Overview

The postgres_locks_test.rs suite validates PostgreSQL advisory lock functionality:

  • test_postgres_locks_creation: Verifies lock system initialization
  • test_exclusive_lock_mutual_exclusion: Ensures exclusive locks block concurrent acquisitions
  • test_shared_locks_concurrent: Verifies multiple shared locks can coexist
  • test_exclusive_blocks_shared: Ensures exclusive locks block shared locks
  • test_different_keys_independent: Verifies different lock keys don't interfere
  • test_lock_release_on_guard_drop: Ensures locks release when guard is dropped

Lock Release Mechanism: For both memory and PostgreSQL locks, callers must invoke .release().await on the guard before it is dropped. For memory locks this explicitly drops the Tokio RwLock guard; for PostgreSQL locks it calls the PostgreSQL advisory unlock function.

Migration Tests

Run migration tests with PostgreSQL:

# Set DATABASE_URL environment variable
export DATABASE_URL="postgres://postgres:postgres@localhost:5432/s3dedup_test"

# Run all migration tests
cargo test --features test-mocks --test migration_test -- --nocapture

# Run specific migration test
cargo test --features test-mocks --test migration_test test_offline_migration_empty -- --nocapture

Cleaner Tests

Run background cleaner tests with PostgreSQL:

# Set DATABASE_URL environment variable
export DATABASE_URL="postgres://postgres:postgres@localhost:5432/s3dedup_test"

# Run all cleaner tests
cargo test --features test-mocks --test cleaner_test -- --nocapture

Full Integration Testing Setup

For complete integration testing with all services:

# Option 1: Using Docker Compose (recommended for CI/CD)
docker-compose up -d

# Run all tests (credentials are set up automatically)
export DATABASE_URL="postgres://postgres:postgres@localhost:5432/s3dedup_test"
export S3_ACCESS_KEY=GK0123456789abcdef01234567
export S3_SECRET_KEY=abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789
cargo test --features test-mocks

# Cleanup
docker-compose down

Option 2: Manual Setup with Garage

# Terminal 1: Start PostgreSQL
docker run -d \
  --name postgres-test \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=s3dedup_test \
  -p 5432:5432 \
  postgres:15

# Terminal 2: Start Garage (S3-compatible storage)
# Garage requires configuration - see docker/garage.toml
# For quick testing, use docker-compose instead

# Terminal 3: Run tests
export DATABASE_URL="postgres://postgres:postgres@localhost:5432/s3dedup_test"
export S3_ACCESS_KEY=GK0123456789abcdef01234567
export S3_SECRET_KEY=abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789
cargo test --features test-mocks

# Cleanup
docker stop postgres-test
docker rm postgres-test

Test Categories

| Test Type | Command | Dependencies | Time |
|---|---|---|---|
| Unit tests | `cargo test --lib` | None | ~1-2s |
| PostgreSQL lock tests | `cargo test --features test-mocks --test postgres_locks_test` | PostgreSQL + S3 | ~5-10s |
| Migration tests | `cargo test --features test-mocks --test migration_test` | PostgreSQL + S3 | ~10-20s |
| Integration tests | `cargo test --features test-mocks --test integration_test` | PostgreSQL + S3 | ~20-30s |
| Cleaner tests | `cargo test --features test-mocks --test cleaner_test` | PostgreSQL + S3 | ~5-10s |
| All tests | `cargo test --features test-mocks` | PostgreSQL + S3 | ~30-50s |

Environment Variables for Testing

# PostgreSQL connection (required for integration/lock/migration tests)
DATABASE_URL=postgres://postgres:postgres@localhost:5432/s3dedup_test

# S3 connection (required for integration tests)
S3_ENDPOINT=http://localhost:3900  # Default for Garage; adjust for your S3 backend
S3_ACCESS_KEY=your_access_key
S3_SECRET_KEY=your_secret_key

# Logging during tests
RUST_LOG=debug              # Show debug logs
RUST_LOG=s3dedup=debug     # Only s3dedup crate logs
RUST_BACKTRACE=1           # Enable backtraces for panics

Troubleshooting Tests

PostgreSQL Connection Refused

# Verify PostgreSQL is running
docker ps | grep postgres

# Check connection manually
psql -U postgres -h localhost -d s3dedup_test -c "SELECT 1;"

Database Already Exists Error

# Drop and recreate test database
dropdb -U postgres s3dedup_test || true
createdb -U postgres s3dedup_test

Test Hangs or Times Out

# Run with timeout
timeout 30 cargo test --features test-mocks --test postgres_locks_test test_exclusive_lock_mutual_exclusion -- --nocapture

# Check PostgreSQL locks status
psql -U postgres -d s3dedup_test -c "SELECT * FROM pg_locks WHERE locktype = 'advisory';"

Connection Pool Sizing

The POSTGRES_MAX_CONNECTIONS setting controls the maximum number of concurrent database connections from a single s3dedup instance. This single pool is shared between KV storage operations and lock management.

How to Choose Pool Size

Pool Size = (Concurrent Requests × 1.5) + Lock Overhead
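As a sketch, the formula can be expressed as a small helper; the function name is illustrative, and the overhead fraction follows the ~20% guideline given in the factors below:

```rust
/// Sketch of the sizing formula above:
/// pool_size = (concurrent requests × 1.5) + lock overhead.
/// `lock_overhead` is a fraction of the base (the guide suggests ~20%).
fn recommended_pool_size(concurrent_requests: u32, lock_overhead: f64) -> u32 {
    let base = concurrent_requests as f64 * 1.5;
    (base * (1.0 + lock_overhead)).round() as u32
}

fn main() {
    // 20 concurrent requests with 20% lock overhead → 36,
    // in line with the "≈ 35" example later in this section.
    println!("{}", recommended_pool_size(20, 0.2));
}
```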

General Guidelines

| Deployment | Concurrency | Recommended Pool Size | Notes |
|---|---|---|---|
| Low | 1-5 concurrent requests | 10 | Default; suitable for development/testing |
| Medium | 5-20 concurrent requests | 20-30 | Small production deployments |
| High | 20-100 concurrent requests | 50-100 | Large production deployments |
| Very High | 100+ concurrent requests | 100-200 | Use multiple instances with load balancing |

Factors to Consider

  1. Number of s3dedup Instances

    • If you have N instances, each needs its own pool
    • Total connections = N instances × pool_size
    • PostgreSQL must have enough capacity for all instances
    • Example: 3 instances × 30 pool_size = 90 connections needed
  2. Lock Contention

    • File operations acquire locks (1 connection per lock)
    • Concurrent uploads/downloads increase lock pressure
    • Add 20% overhead for lock operations
    • Example: 20 concurrent requests → pool_size = (20 × 1.5) + overhead ≈ 35
  3. Database Configuration

    • Check PostgreSQL max_connections setting
    • Reserve connections for maintenance, monitoring, backups
    • Example: PostgreSQL with 200 max_connections:
      • Reserve 10 for maintenance
      • If 3 s3dedup instances: (200 - 10) / 3 ≈ 63 per instance
  4. Memory Usage Per Connection

    • Each connection uses ~5-10 MB of memory
    • Pool size 50 = ~250-500 MB per instance
    • Monitor actual usage and adjust accordingly
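The capacity-splitting arithmetic from points 1 and 3 can be sketched as follows (names are illustrative, not s3dedup APIs):

```rust
/// How many connections each instance's pool may use, given the
/// database-wide budget: (max_connections - reserved) / instances.
fn per_instance_pool_size(max_connections: u32, reserved: u32, instances: u32) -> u32 {
    (max_connections - reserved) / instances
}

fn main() {
    // The worked example above: 200 max_connections, 10 reserved, 3 instances.
    println!("{}", per_instance_pool_size(200, 10, 3)); // 63
}
```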

Example Configurations

Development (1 instance, low throughput):

"postgres": {
  "pool_size": 10
}

Production (3 instances, medium throughput):

"postgres": {
  "pool_size": 30
}

With PostgreSQL max_connections = 100:

  • 3 × 30 = 90 connections (10 reserved)

High-Availability (5 instances, high throughput with PostgreSQL max_connections = 200):

"postgres": {
  "pool_size": 35
}
  • 5 × 35 = 175 connections (25 reserved for other operations)

Monitoring and Tuning

Monitor these metrics to optimize pool size:

  1. Connection Utilization: Check if connections are frequently exhausted

    SELECT count(*) FROM pg_stat_activity WHERE datname = 's3dedup';
  2. Lock Wait Times: Monitor if operations wait for available connections

  3. Memory Usage: Watch instance memory as pool size increases

Scaling Strategy:

  • Start Conservative: Begin with pool_size = 10-20
  • Monitor Usage: Track connection utilization over 1-2 weeks
  • Increase Gradually: Increment by 10-20 when you see high utilization
  • Scale Horizontally: Instead of very large pools (>100), use more instances with moderate pools

Lock Implementation Details

Memory Locks

Memory-based locks use Tokio RwLock for efficient single-instance deployments:

  • Shared Locks: Multiple readers can hold the lock simultaneously
  • Exclusive Locks: Only one writer can hold the lock
  • Release: Explicitly dropping the Tokio guard via .release().await
  • Cleanup: Lock entries are cleaned up from the HashMap when no references remain
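A minimal, synchronous sketch of this bookkeeping, using std's blocking RwLock in place of the Tokio async RwLock the real src/locks/memory.rs uses; the types and method names here are illustrative, not the actual API:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

/// Simplified lock table: one RwLock per key, shared via Arc.
#[derive(Default)]
struct MemoryLocks {
    table: Mutex<HashMap<String, Arc<RwLock<()>>>>,
}

impl MemoryLocks {
    /// Fetch or create the per-key lock entry.
    fn entry(&self, key: &str) -> Arc<RwLock<()>> {
        let mut table = self.table.lock().unwrap();
        table.entry(key.to_string()).or_default().clone()
    }

    /// Drop entries no longer held anywhere else
    /// (strong count 1 means only the table references them).
    fn cleanup(&self) {
        let mut table = self.table.lock().unwrap();
        table.retain(|_, lock| Arc::strong_count(lock) > 1);
    }
}

fn main() {
    let locks = MemoryLocks::default();
    let entry = locks.entry("bucket/object");
    {
        // Shared locks: multiple readers may hold the lock simultaneously.
        let _r1 = entry.read().unwrap();
        let _r2 = entry.read().unwrap();
    }
    // Exclusive lock: a single writer at a time.
    let _w = entry.write().unwrap();
    drop(_w);
    drop(entry);
    locks.cleanup(); // entry removed once no guard references remain
}
```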

PostgreSQL Locks

PostgreSQL advisory locks provide distributed locking for multi-instance deployments:

  • Session-Scoped: Locks are tied to a database connection
  • Shared vs Exclusive: Identical semantics to memory locks but database-enforced
  • Explicit Release: Must call .release().await to unlock before connection returns to pool
  • Key Hashing: Lock keys are hashed to 64-bit integers for PostgreSQL's lock API
  • Atomic Release: Both lock release and connection return to pool happen atomically
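The key-hashing step can be sketched with std's DefaultHasher; this is an assumption for illustration only, since the hash function s3dedup actually uses is not specified here (and DefaultHasher's output is not guaranteed stable across Rust versions, so a real implementation would pin a stable hash):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map an arbitrary lock key to the 64-bit integer PostgreSQL's advisory
/// lock API expects (pg_advisory_lock and friends take a bigint).
/// Illustrative only; not necessarily the hash s3dedup uses.
fn lock_key_to_i64(key: &str) -> i64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() as i64
}

fn main() {
    // The mapping is deterministic, so every instance hashing the same
    // key contends on the same advisory lock id.
    assert_eq!(lock_key_to_i64("bucket/object"), lock_key_to_i64("bucket/object"));
    println!("{}", lock_key_to_i64("bucket/object"));
}
```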

Lock Guard Pattern

Both implementations follow the same pattern:

// Acquire lock
let guard = lock.acquire_exclusive().await?;

// ... do work ...

// Explicitly release
guard.release().await?;  // Calls drop in memory locks, pg_advisory_unlock in PostgreSQL

This unified pattern ensures consistent behavior across both backends.

Architecture

For detailed architecture documentation, see:

Key Components

Locks Module (src/locks/)

  • mod.rs: Trait definitions and enums
  • memory.rs: In-memory lock implementation using Tokio RwLock
  • postgres.rs: PostgreSQL advisory lock implementation

Routes Module (src/routes/ft/)

  • get_file.rs: GET endpoint with shared locks
  • put_file.rs: PUT endpoint with exclusive locks
  • delete_file.rs: DELETE endpoint with exclusive locks
  • utils.rs: Shared utilities for Filetracker protocol

Storage Module (src/kvstorage/)

  • mod.rs: Trait definitions
  • sqlite.rs: SQLite backend implementation
  • postgres.rs: PostgreSQL backend implementation

Code Style

Follow Rust conventions:

# Format code
cargo fmt

# Check linter warnings
cargo clippy

# Fix common issues
cargo clippy --fix

Contributing

When making changes:

  1. Write tests for new functionality
  2. Ensure all existing tests pass: cargo test --features test-mocks
  3. Run clippy: cargo clippy
  4. Format code: cargo fmt
  5. Document public APIs with doc comments
  6. Update relevant documentation files

Performance Considerations

Lock Contention

  • Use shortest critical sections possible
  • Release locks explicitly with .release().await to avoid holding connections
  • PostgreSQL connection pool size should account for lock overhead

Database Pool Sizing

See docs/configuration.md for detailed pool sizing guidance.

Lock Performance

  • Memory locks are faster (no network round-trip)
  • PostgreSQL locks have slight overhead but enable distributed coordination
  • Choose based on deployment model (single-instance vs multi-instance HA)