Performance Tuning
Optimize sr-search-replace for maximum speed and efficiency when working with large files, extensive directory trees, and complex replacement operations.
- Quick Optimization Rules
- Configuration Optimization
- Search Patterns & Matching
- File Processing Optimization
- Memory & Buffer Management
- Parallel Processing
- Large-Scale Operations
- Benchmarking & Profiling
- Advanced Performance Tuning
- Performance Troubleshooting
Follow these rules for immediate performance improvements:
Always use --skip-binary to avoid processing binary files:
sr "pattern" "replacement" --skip-binary --recursive .Impact: 2-3x faster on mixed directories (code + images + binaries)
Exclude unnecessary directories (node_modules, .git, build artifacts):
sr "pattern" "replacement" --exclude "node_modules" --exclude ".git" --exclude "dist" --recursive .Impact: Eliminates processing of millions of files
Test patterns before large operations:
```bash
sr --count-only "pattern" --recursive .              # Validate quickly
sr "pattern" "replacement" --dry-run --recursive .   # Preview changes
```

Impact: Prevents unnecessary processing if the pattern is incorrect
When possible, target specific directories rather than the entire filesystem:
```bash
# Slow: searches the entire filesystem
sr "pattern" "replacement" --recursive /

# Fast: targets a specific directory
sr "pattern" "replacement" --recursive /project/src/
```

Impact: 5-10x faster by reducing scope
Avoid overly complex regex patterns:
```bash
# Slow: complex lookahead/lookbehind
sr --regex "(?<=\s)old(?=\s)" "new" *.txt

# Fast: simple word boundary
sr --regex "\bold\b" "new" *.txt
```

Increase the buffer size for large files:
```
# In-code configuration
BUFFER_SIZE = 65536          # 64 KB (default: 8 KB)
MAX_BUFFER_SIZE = 268435456  # 256 MB
```

When to increase (a sizing sketch follows these lists):
- Processing files > 100 MB
- Many small files in one operation
- System has plenty of RAM (> 8 GB)
When to keep small:
- Limited RAM (< 2 GB)
- Many concurrent processes
- Processing thousands of files simultaneously
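If you would rather not pick a value by hand, the buffer size can be derived from the largest file in the target tree. The snippet below is only a sketch: it assumes the SR_BUFFER_SIZE environment variable used later in this guide, and it relies on GNU find's -printf option.

```bash
# Sketch: derive SR_BUFFER_SIZE from the largest file under the target directory
largest=$(find /project/src -type f -printf '%s\n' | sort -rn | head -1)

if [ "${largest:-0}" -gt $((100 * 1024 * 1024)) ]; then
  export SR_BUFFER_SIZE="262144"   # 256 KB for very large files
else
  export SR_BUFFER_SIZE="65536"    # 64 KB is enough for typical source files
fi

sr "pattern" "replacement" --recursive /project/src
```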
Configure parallel processing:
```bash
# Environment variables
export SR_MAX_WORKERS="4"       # Number of parallel threads
export SR_WORKER_TIMEOUT="300"  # Timeout per thread (seconds)
```

Recommended settings (a helper sketch follows this list):
- CPUs/Cores = 4: Use 2-3 workers
- CPUs/Cores = 8: Use 4-6 workers
- CPUs/Cores = 16+: Use 8-12 workers
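These recommendations are roughly half the available cores, so the value can be derived instead of hard-coded. A minimal sketch, assuming the SR_MAX_WORKERS variable shown above; nproc is part of GNU coreutils.

```bash
# Sketch: use roughly half the available cores as workers (at least 1)
cores=$(nproc)
workers=$(( cores / 2 ))
[ "$workers" -lt 1 ] && workers=1

export SR_MAX_WORKERS="$workers"
sr "pattern" "replacement" --recursive .
```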
Optimize operation timeout:
```bash
export SR_TIMEOUT="600"   # 10 minutes for large operations
```

Guidelines (a sketch follows this list):
- Large files (> 500 MB): Use 600-900 seconds
- Recursive directory trees: Use 300-600 seconds
- Single file replacement: Use 60-120 seconds
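To apply these guidelines without guessing, the timeout can be scaled from the amount of data under the target directory. This is a sketch only, assuming the SR_TIMEOUT variable shown above; du -sm reports the total size in megabytes.

```bash
# Sketch: scale SR_TIMEOUT with the amount of data to process
mb=$(du -sm /project/src | cut -f1)

if [ "$mb" -gt 500 ]; then
  export SR_TIMEOUT="900"   # very large trees or files
elif [ "$mb" -gt 50 ]; then
  export SR_TIMEOUT="600"
else
  export SR_TIMEOUT="120"
fi

sr "pattern" "replacement" --recursive /project/src
```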
Literal strings are 5-10x faster than regex:
```bash
# Slow: regex processing
sr --regex "old_text" "new_text" *.txt

# Fast: literal string matching
sr "old_text" "new_text" *.txt
```

Use regex only when necessary (character classes, quantifiers, anchors).
More specific patterns process faster:
```bash
# Slow: many matches to process
sr "import" "import" *.py                                    # Matches hundreds of times

# Fast: few matches
sr "from old_module import" "from new_module import" *.py   # Matches 5 times
```

Case-sensitive search is faster:
```bash
# Slower: case-insensitive requires extra processing
sr --ignore-case "error" "warning" *.log

# Faster: case-sensitive
sr "Error" "Warning" *.log
```

Use file patterns to minimize scope:
```bash
# Process only Python files
sr "pattern" "replacement" *.py

# Process JavaScript and TypeScript
sr "pattern" "replacement" *.{js,ts}

# Process a specific directory
sr "pattern" "replacement" src/*.js
```

Skip large binary files:
sr "pattern" "replacement" --skip-binary --recursive .sr "pattern" "replacement" \
--exclude "node_modules" \
--exclude ".git" \
--exclude "dist" \
--exclude "build" \
--exclude "*.min.js" \
--recursive .For files > 100 MB:
```bash
# Increase buffer size
export SR_BUFFER_SIZE="262144"   # 256 KB

# Process with an adequate timeout
sr "pattern" "replacement" --timeout "600" large_file.txt
```

Configuration for different file sizes:
```bash
# For files < 50 MB: load the entire file
SR_STREAM_MODE="false"
SR_BUFFER_SIZE="65536"

# For files 50-500 MB: stream processing
SR_STREAM_MODE="true"
SR_BUFFER_SIZE="131072"

# For files > 500 MB: large-buffer streaming
SR_STREAM_MODE="true"
SR_BUFFER_SIZE="262144"
```
Process multiple files simultaneously:

```bash
export SR_PARALLEL="true"
export SR_MAX_PARALLEL_JOBS="4"

sr "pattern" "replacement" --recursive large_directory/
```

Optimal job distribution by file count (a wrapper sketch follows this list):
- 1-10 files: 1 worker
- 11-100 files: 2-3 workers
- 101-1000 files: 4-6 workers
- 1000+ files: 8-12 workers
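A small wrapper can apply this distribution automatically by counting the files first. This is a sketch only; it assumes the SR_PARALLEL and SR_MAX_PARALLEL_JOBS variables shown above.

```bash
# Sketch: pick a parallel job count from the number of files to process
count=$(find large_directory/ -type f | wc -l)

if   [ "$count" -le 10 ];   then jobs=1
elif [ "$count" -le 100 ];  then jobs=3
elif [ "$count" -le 1000 ]; then jobs=6
else                             jobs=8
fi

export SR_PARALLEL="true"
export SR_MAX_PARALLEL_JOBS="$jobs"
sr "pattern" "replacement" --recursive large_directory/
```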
```bash
# Enable performance logging
export SR_DEBUG="true"
sr "pattern" "replacement" --recursive . 2>&1 | grep -i "worker\|thread\|time"
```

Process large operations in batches:
```bash
# Phase 1: Small files (faster)
sr "pattern1" "replacement1" --exclude ">10M" --recursive .

# Phase 2: Large files (slower)
sr "pattern1" "replacement1" --exclude "<10M" --recursive .
```

Use sessions and backups so a long migration can be tracked and rolled back:

```bash
# Create a session for tracking
sr "pattern1" "replacement1" --session "large_migration" --backup --recursive .
sr "pattern2" "replacement2" --session "large_migration" --backup --recursive .
```

Track progress on large operations:

```bash
# Count total matches first
sr --count-only "pattern" --recursive . > initial_count.txt

# Show progress during the operation
sr "pattern" "replacement" --dry-run --recursive . | tail -5
```

Measure operation time:
```bash
# Time the operation
time sr "pattern" "replacement" --recursive directory/
```

The output shows real, user, and sys time.

Test pattern performance without file writes:
```bash
# Fast validation without disk I/O
time sr "pattern" "replacement" --dry-run --recursive .

# Fast pattern validation
time sr --count-only "pattern" --recursive .
```

Enable regex pattern caching:
```bash
export SR_CACHE_PATTERNS="true"
export SR_CACHE_SIZE="1000"   # Cache up to 1000 patterns
```

Optimize file I/O:
```bash
export SR_USE_MMAP="true"    # Use memory-mapped I/O (faster for large files)
export SR_DIRECT_IO="true"   # Bypass the filesystem cache
```

Bind workers to specific CPU cores:
```bash
export SR_CPU_AFFINITY="0,1,2,3"   # Use cores 0-3
```

Problem: Operation takes too long
```bash
# 1. Check whether the pattern is matching at all
sr --count-only "pattern" --recursive . | head -20

# 2. Check the file count
find . -type f | wc -l

# 3. Check the largest files
find . -type f -exec wc -c {} \; | sort -rn | head -10

# 4. Check for directories that should be excluded
find . -type d \( -name "node_modules" -o -name ".git" \) | wc -l
```

Likely bottlenecks (the timing sketch after this list helps tell them apart):

- Pattern Matching Bottleneck: Use simpler patterns
- Disk I/O Bottleneck: Increase buffer size, use SSD
- CPU Bottleneck: Enable parallel processing
- Memory Bottleneck: Reduce buffer size, process in batches
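One way to tell these apart is to time each stage separately: counting matches exercises only pattern matching and reads, a dry run adds the replacement work, and the real run adds the writes. The sketch below uses only flags documented in this guide; note that the last command modifies files, so keep --backup if you are not certain.

```bash
# Match and read only
time sr --count-only "pattern" --recursive .

# Match and replace, but no writes
time sr "pattern" "replacement" --dry-run --recursive .

# Full run with writes (this one modifies files)
time sr "pattern" "replacement" --backup --recursive .
```

If the first command is already slow, focus on the pattern and the excluded directories; if only the last one is slow, disk I/O is the limiter and buffer tuning or faster storage will help most.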
Issue: Using regex on large files
Solution: Use literal strings when possible
```bash
# Slow
sr --regex "old" "new" huge_file.txt

# Fast
sr "old" "new" huge_file.txt
```

Issue: Processing node_modules or .git
Solution: Exclude large directories
sr "pattern" "replacement" --exclude "node_modules" --exclude ".git" --recursive .Issue: Many small files
Solution: Enable parallel processing
```bash
export SR_MAX_PARALLEL_JOBS="8"
sr "pattern" "replacement" --recursive .
```

| Optimization | Impact | Effort | Priority |
|---|---|---|---|
| Skip binary files | 2-3x | Low | HIGH |
| Exclude directories | 5-10x | Low | HIGH |
| Use literal strings | 5-10x | Medium | HIGH |
| Parallel processing | 2-4x | Medium | MEDIUM |
| Buffer tuning | 1-2x | Medium | MEDIUM |
| Batch operations | 1-1.5x | High | LOW |
- Review Tool Configuration for detailed tuning options
- See Command Reference for all command options
- Check Troubleshooting for common issues