Performance Tuning

Optimize sr-search-replace for maximum speed and efficiency when working with large files, extensive directory trees, and complex replacement operations.

Quick Optimization Rules

Follow these rules for immediate performance improvements:

1. Skip Binary Files

Always use --skip-binary to avoid processing binary files:

sr "pattern" "replacement" --skip-binary --recursive .

Impact: 2-3x faster on mixed directories (code + images + binaries)

2. Exclude Large Directories

Exclude unnecessary directories (node_modules, .git, build artifacts):

sr "pattern" "replacement" --exclude "node_modules" --exclude ".git" --exclude "dist" --recursive .

Impact: Avoids scanning the tens or hundreds of thousands of files these directories typically contain

3. Use --dry-run for Validation

Test patterns before large operations:

sr --count-only "pattern" --recursive . # Validate quickly
sr "pattern" "replacement" --dry-run --recursive . # Preview changes

Impact: Prevents wasted processing when the pattern is incorrect

4. Narrow the Recursion Scope

When possible, target specific directories rather than the entire filesystem:

# Slow: searches entire filesystem
sr "pattern" "replacement" --recursive /

# Fast: targets specific directory
sr "pattern" "replacement" --recursive /project/src/

Impact: 5-10x faster by reducing scope

5. Use Efficient Patterns

Avoid overly complex regex patterns:

# Slow: complex lookahead/lookbehind
sr --regex "(?<=\s)old(?=\s)" "new" *.txt

# Fast: simple word boundary
sr --regex "\bold\b" "new" *.txt

Configuration Optimization

Buffer Size Tuning

Increase the buffer size for large files:

# In code configuration
BUFFER_SIZE = 65536 # 64 KB (default: 8 KB)
MAX_BUFFER_SIZE = 268435456 # 256 MB

When to increase:

  • Processing files > 100 MB
  • Many small files in one operation
  • System has plenty of RAM (> 8 GB)

When to keep small:

  • Limited RAM (< 2 GB)
  • Many concurrent processes
  • Processing thousands of files simultaneously
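
A minimal sketch of choosing the buffer size automatically on Linux, assuming the SR_BUFFER_SIZE environment variable is honored as shown later on this page (values in bytes, thresholds illustrative):

# Use a 256 KB buffer on machines with more than 8 GB of RAM, otherwise keep 64 KB
TOTAL_RAM_KB=$(grep MemTotal /proc/meminfo | awk '{print $2}')
if [ "$TOTAL_RAM_KB" -gt $((8 * 1024 * 1024)) ]; then
  export SR_BUFFER_SIZE="262144" # 256 KB
else
  export SR_BUFFER_SIZE="65536" # 64 KB
fi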

Worker Thread Configuration

Configure parallel processing:

# Environment variables
export SR_MAX_WORKERS="4" # Number of parallel threads
export SR_WORKER_TIMEOUT="300" # Timeout per thread (seconds)

Recommended settings:

  • CPUs/Cores = 4: Use 2-3 workers
  • CPUs/Cores = 8: Use 4-6 workers
  • CPUs/Cores = 16+: Use 8-12 workers
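
These ratios can be derived automatically. A minimal sketch, assuming SR_MAX_WORKERS is read as documented above and that nproc is available:

# Use roughly half the available cores, never fewer than one worker
CORES=$(nproc)
WORKERS=$((CORES / 2))
[ "$WORKERS" -lt 1 ] && WORKERS=1
export SR_MAX_WORKERS="$WORKERS"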

Timeout Configuration

Optimize operation timeout:

export SR_TIMEOUT="600" # 10 minutes for large operations

Guidelines:

  • Large files (> 500 MB): Use 600-900 seconds
  • Recursive directory trees: Use 300-600 seconds
  • Single file replacement: Use 60-120 seconds
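
A sketch of picking the timeout from the target file's size, using the SR_TIMEOUT variable shown above (thresholds illustrative):

# 900 s for files over 500 MB, otherwise 300 s
SIZE=$(wc -c < large_file.txt)
if [ "$SIZE" -gt $((500 * 1024 * 1024)) ]; then
  export SR_TIMEOUT="900"
else
  export SR_TIMEOUT="300"
fi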

Search Patterns & Matching

Literal Strings vs Regex

Literal strings are 5-10x faster than regex:

# Slow: regex processing
sr --regex "old_text" "new_text" *.txt

# Fast: literal string matching
sr "old_text" "new_text" *.txt

Use regex only when necessary (character classes, quantifiers, anchors)

Pattern Specificity

More specific patterns process faster:

# Slow: broad pattern with many matches to process
sr "old_module" "new_module" *.py # Matches every occurrence of the name

# Fast: specific pattern with few matches
sr "from old_module import" "from new_module import" *.py # Matches only the import lines

Case-Sensitive Matching

Case-sensitive search is faster:

# Slower: case-insensitive matching requires extra processing
sr --ignore-case "error" "warning" *.log

# Faster: case-sensitive, when the exact casing in the logs is known
sr "Error" "Warning" *.log

File Processing Optimization

Target Specific File Types

Use file patterns to minimize scope:

# Process only Python files
sr "pattern" "replacement" *.py

# Process JavaScript and TypeScript
sr "pattern" "replacement" *.{js,ts}

# Process specific directory
sr "pattern" "replacement" src/*.js

Filter by File Size

Skip large binary files:

sr "pattern" "replacement" --skip-binary --recursive .

Exclude Problematic Directories

sr "pattern" "replacement" \
 --exclude "node_modules" \
 --exclude ".git" \
 --exclude "dist" \
 --exclude "build" \
 --exclude "*.min.js" \
 --recursive .

Memory & Buffer Management

Processing Large Files

For files > 100 MB:

# Increase buffer size
export SR_BUFFER_SIZE="262144" # 256 KB

# Process with adequate timeout
sr "pattern" "replacement" --timeout "600" large_file.txt

Streaming vs Loading

Configuration for different file sizes:

# For files < 50 MB: Load entire file
SR_STREAM_MODE="false"
SR_BUFFER_SIZE="65536"

# For files 50-500 MB: Stream processing
SR_STREAM_MODE="true"
SR_BUFFER_SIZE="131072"

# For files > 500 MB: Large buffer streaming
SR_STREAM_MODE="true"
SR_BUFFER_SIZE="262144"

Parallel Processing

Enable Parallel Processing

Process multiple files simultaneously:

export SR_PARALLEL="true"
export SR_MAX_PARALLEL_JOBS="4"
sr "pattern" "replacement" --recursive large_directory/

Job Distribution

Optimal job distribution by file count:

Files: 1-10 -> Use 1 worker
Files: 11-100 -> Use 2-3 workers
Files: 101-1000 -> Use 4-6 workers
Files: 1000+ -> Use 8-12 workers
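
A minimal sketch that applies these thresholds automatically, assuming SR_MAX_PARALLEL_JOBS is honored as in the previous section:

# Count the files first, then pick a worker count from the table above
FILE_COUNT=$(find . -type f | wc -l)
if [ "$FILE_COUNT" -le 10 ]; then JOBS=1
elif [ "$FILE_COUNT" -le 100 ]; then JOBS=3
elif [ "$FILE_COUNT" -le 1000 ]; then JOBS=6
else JOBS=12
fi
export SR_MAX_PARALLEL_JOBS="$JOBS"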

Monitor Parallel Performance

# Enable performance logging
export SR_DEBUG="true"
sr "pattern" "replacement" --recursive . 2>&1 | grep -i "worker\|thread\|time"

Large-Scale Operations

Batch Operations

Process large operations in batches:

# Phase 1: Small files (faster)
sr "pattern1" "replacement1" --exclude ">10M" --recursive .

# Phase 2: Large files (slower)
sr "pattern1" "replacement1" --exclude "<10M" --recursive .

Session Tracking for Large Operations

# Create session for tracking
sr "pattern1" "replacement1" --session "large_migration" --backup --recursive .
sr "pattern2" "replacement2" --session "large_migration" --backup --recursive .

Monitoring Progress

# Count total matches first
sr --count-only "pattern" --recursive . > initial_count.txt

# Show progress during operation
sr "pattern" "replacement" --dry-run --recursive . | tail -5

Benchmarking & Profiling

Basic Benchmarking

Measure operation time:

# Time the operation
time sr "pattern" "replacement" --recursive directory/

# Output shows: real, user, sys time
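
To compare two approaches on the same tree without touching any files, time dry runs side by side (a sketch using flags shown elsewhere on this page):

time sr "old_text" "new_text" --dry-run --recursive src/ # literal matching
time sr --regex "old_text" "new_text" --dry-run --recursive src/ # regex engine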

Dry-Run Performance

Test pattern performance without file writes:

# Fast validation without disk I/O
time sr "pattern" "replacement" --dry-run --recursive .

Profiling with Count-Only

# Fast pattern validation
time sr --count-only "pattern" --recursive .

Advanced Performance Tuning

Memory Caching

Enable regex pattern caching:

export SR_CACHE_PATTERNS="true"
export SR_CACHE_SIZE="1000" # Cache up to 1000 patterns

I/O Optimization

Optimize file I/O:

export SR_USE_MMAP="true" # Use memory-mapped I/O (faster for large files)
export SR_DIRECT_IO="true" # Bypass filesystem cache

CPU Affinity

Bind workers to specific CPU cores:

export SR_CPU_AFFINITY="0,1,2,3" # Use cores 0-3
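
If your build does not honor SR_CPU_AFFINITY, a similar effect can be approximated at the shell level (a sketch; taskset is Linux-only):

taskset -c 0-3 sr "pattern" "replacement" --recursive .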

Performance Troubleshooting

Slow Performance Diagnostics

Problem: Operation takes too long

# 1. Check if pattern is matching
sr --count-only "pattern" --recursive . | head -20

# 2. Check file count
find . -type f | wc -l

# 3. Check largest files
find . -type f -exec wc -c {} \; | sort -rn | head -10

# 4. Check excluded directories
find . -type d \( -name "node_modules" -o -name ".git" \) | wc -l

Bottleneck Analysis

  1. Pattern Matching Bottleneck: Use simpler patterns
  2. Disk I/O Bottleneck: Increase buffer size, use SSD
  3. CPU Bottleneck: Enable parallel processing
  4. Memory Bottleneck: Reduce buffer size, process in batches
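
A rough way to tell these apart (a sketch): compare CPU time to wall-clock time reported by time. If real greatly exceeds user + sys, the run is mostly waiting on disk; if user dominates, pattern matching is the bottleneck.

time sr "pattern" "replacement" --dry-run --recursive .
# real >> user + sys: I/O bound; user close to real: CPU / pattern bound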

Common Performance Issues

Issue: Using regex on large files

Solution: Use literal strings when possible

# Slow
sr --regex "old" "new" huge_file.txt

# Fast
sr "old" "new" huge_file.txt

Issue: Processing node_modules or .git

Solution: Exclude large directories

sr "pattern" "replacement" --exclude "node_modules" --exclude ".git" --recursive .

Issue: Many small files

Solution: Enable parallel processing

export SR_MAX_PARALLEL_JOBS="8"
sr "pattern" "replacement" --recursive .

Performance Summary Table

Optimization         Impact  Effort  Priority
Skip binary files    2-3x    Low     HIGH
Exclude directories  5-10x   Low     HIGH
Use literal strings  5-10x   Medium  HIGH
Parallel processing  2-4x    Medium  MEDIUM
Buffer tuning        1-2x    Medium  MEDIUM
Batch operations     1-1.5x  High    LOW
