Performance Tuning

Optimize sr-search-replace for maximum speed and efficiency when working with large files, extensive directory trees, and complex replacement operations.

Quick Optimization Rules

Follow these rules for immediate performance improvements:

1. Skip Binary Files

Always use --skip-binary to avoid processing binary files:

sr "pattern" "replacement" --skip-binary --recursive .

Impact: 2-3x faster on mixed directories (code + images + binaries)

2. Exclude Large Directories

Exclude unnecessary directories (node_modules, .git, build artifacts):

sr "pattern" "replacement" --exclude "node_modules" --exclude ".git" --exclude "dist" --recursive .

Impact: Avoids scanning the tens or hundreds of thousands of files these directories typically contain

3. Use --dry-run for Validation

Test patterns before large operations:

sr --count-only "pattern" --recursive . # Validate quickly
sr "pattern" "replacement" --dry-run --recursive . # Preview changes

Impact: Prevents wasted processing when the pattern is incorrect

4. Narrow the Recursion Scope

When possible, target specific directories rather than the entire filesystem:

# Slow: searches entire filesystem
sr "pattern" "replacement" --recursive /

# Fast: targets specific directory
sr "pattern" "replacement" --recursive /project/src/

Impact: 5-10x faster by reducing scope

5. Use Efficient Patterns

Avoid overly complex regex patterns:

# Slow: complex lookahead/lookbehind
sr --regex "(?<=\s)old(?=\s)" "new" *.txt

# Fast: simple word boundary
sr --regex "\bold\b" "new" *.txt

Configuration Optimization

Buffer Size Tuning

Increase the buffer size for large files:

# In code configuration
BUFFER_SIZE = 65536 # 64 KB (default: 8 KB)
MAX_BUFFER_SIZE = 268435456 # 256 MB

When to increase:

  • Processing files > 100 MB
  • Many small files in one operation
  • System has plenty of RAM (> 8 GB)

When to keep small:

  • Limited RAM (< 2 GB)
  • Many concurrent processes
  • Processing thousands of files simultaneously
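
A minimal sketch of choosing the buffer size automatically on Linux, assuming the SR_BUFFER_SIZE environment variable is honored as shown later on this page (values in bytes, thresholds illustrative):

# Use a 256 KB buffer on machines with more than 8 GB of RAM, otherwise keep 64 KB
TOTAL_RAM_KB=$(grep MemTotal /proc/meminfo | awk '{print $2}')
if [ "$TOTAL_RAM_KB" -gt $((8 * 1024 * 1024)) ]; then
  export SR_BUFFER_SIZE="262144" # 256 KB
else
  export SR_BUFFER_SIZE="65536" # 64 KB
fi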

Worker Thread Configuration

Configure parallel processing:

# Environment variables
export SR_MAX_WORKERS="4" # Number of parallel threads
export SR_WORKER_TIMEOUT="300" # Timeout per thread (seconds)

Recommended settings:

  • CPUs/Cores = 4: Use 2-3 workers
  • CPUs/Cores = 8: Use 4-6 workers
  • CPUs/Cores = 16+: Use 8-12 workers
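
These ratios can be derived automatically. A minimal sketch, assuming SR_MAX_WORKERS is read as documented above and that nproc is available:

# Use roughly half the available cores, never fewer than one worker
CORES=$(nproc)
WORKERS=$((CORES / 2))
[ "$WORKERS" -lt 1 ] && WORKERS=1
export SR_MAX_WORKERS="$WORKERS"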

Timeout Configuration

Optimize operation timeout:

export SR_TIMEOUT="600" # 10 minutes for large operations

Guidelines:

  • Large files (> 500 MB): Use 600-900 seconds
  • Recursive directory trees: Use 300-600 seconds
  • Single file replacement: Use 60-120 seconds
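
A sketch of picking the timeout from the target file's size, using the SR_TIMEOUT variable shown above (thresholds illustrative):

# 900 s for files over 500 MB, otherwise 300 s
SIZE=$(wc -c < large_file.txt)
if [ "$SIZE" -gt $((500 * 1024 * 1024)) ]; then
  export SR_TIMEOUT="900"
else
  export SR_TIMEOUT="300"
fi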

Search Patterns & Matching

Literal Strings vs Regex

Literal strings are 5-10x faster than regex:

# Slow: regex processing
sr --regex "old_text" "new_text" *.txt

# Fast: literal string matching
sr "old_text" "new_text" *.txt

Use regex only when necessary (character classes, quantifiers, anchors)

Pattern Specificity

More specific patterns process faster:

# Slow: broad pattern with many matches to process
sr "old_module" "new_module" *.py # Matches every occurrence of the name

# Fast: specific pattern with few matches
sr "from old_module import" "from new_module import" *.py # Matches only the import lines

Case-Sensitive Matching

Case-sensitive search is faster:

# Slower: case-insensitive matching requires extra processing
sr --ignore-case "error" "warning" *.log

# Faster: case-sensitive, when the exact casing in the logs is known
sr "Error" "Warning" *.log

File Processing Optimization

Target Specific File Types

Use file patterns to minimize scope:

# Process only Python files
sr "pattern" "replacement" *.py

# Process JavaScript and TypeScript
sr "pattern" "replacement" *.{js,ts}

# Process specific directory
sr "pattern" "replacement" src/*.js

Filter by File Size

Skip large binary files:

sr "pattern" "replacement" --skip-binary --recursive .

Exclude Problematic Directories

sr "pattern" "replacement" \
 --exclude "node_modules" \
 --exclude ".git" \
 --exclude "dist" \
 --exclude "build" \
 --exclude "*.min.js" \
 --recursive .

Memory & Buffer Management

Processing Large Files

For files > 100 MB:

# Increase buffer size
export SR_BUFFER_SIZE="262144" # 256 KB

# Process with adequate timeout
sr "pattern" "replacement" --timeout "600" large_file.txt

Streaming vs Loading

Configuration for different file sizes:

# For files < 50 MB: Load entire file
SR_STREAM_MODE="false"
SR_BUFFER_SIZE="65536"

# For files 50-500 MB: Stream processing
SR_STREAM_MODE="true"
SR_BUFFER_SIZE="131072"

# For files > 500 MB: Large buffer streaming
SR_STREAM_MODE="true"
SR_BUFFER_SIZE="262144"

Parallel Processing

Enable Parallel Processing

Process multiple files simultaneously:

export SR_PARALLEL="true"
export SR_MAX_PARALLEL_JOBS="4"
sr "pattern" "replacement" --recursive large_directory/

Job Distribution

Optimal job distribution by file count:

Files: 1-10 -> Use 1 worker
Files: 11-100 -> Use 2-3 workers
Files: 101-1000 -> Use 4-6 workers
Files: 1000+ -> Use 8-12 workers
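
A minimal sketch that applies these thresholds automatically, assuming SR_MAX_PARALLEL_JOBS is honored as in the previous section:

# Count the files first, then pick a worker count from the table above
FILE_COUNT=$(find . -type f | wc -l)
if [ "$FILE_COUNT" -le 10 ]; then JOBS=1
elif [ "$FILE_COUNT" -le 100 ]; then JOBS=3
elif [ "$FILE_COUNT" -le 1000 ]; then JOBS=6
else JOBS=12
fi
export SR_MAX_PARALLEL_JOBS="$JOBS"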

Monitor Parallel Performance

# Enable performance logging
export SR_DEBUG="true"
sr "pattern" "replacement" --recursive . 2>&1 | grep -i "worker\|thread\|time"

Large-Scale Operations

Batch Operations

Process large operations in batches:

# Phase 1: Small files (faster)
sr "pattern1" "replacement1" --exclude ">10M" --recursive .

# Phase 2: Large files (slower)
sr "pattern1" "replacement1" --exclude "<10M" --recursive .

Session Tracking for Large Operations

# Create session for tracking
sr "pattern1" "replacement1" --session "large_migration" --backup --recursive .
sr "pattern2" "replacement2" --session "large_migration" --backup --recursive .

Monitoring Progress

# Count total matches first
sr --count-only "pattern" --recursive . > initial_count.txt

# Show progress during operation
sr "pattern" "replacement" --dry-run --recursive . | tail -5

Benchmarking & Profiling

Basic Benchmarking

Measure operation time:

# Time the operation
time sr "pattern" "replacement" --recursive directory/

# Output shows: real, user, sys time
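
To compare two approaches on the same tree without touching any files, time dry runs side by side (a sketch using flags shown elsewhere on this page):

time sr "old_text" "new_text" --dry-run --recursive src/ # literal matching
time sr --regex "old_text" "new_text" --dry-run --recursive src/ # regex engine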

Dry-Run Performance

Test pattern performance without file writes:

# Fast validation without disk I/O
time sr "pattern" "replacement" --dry-run --recursive .

Profiling with Count-Only

# Fast pattern validation
time sr --count-only "pattern" --recursive .

Advanced Performance Tuning

Memory Caching

Enable regex pattern caching:

export SR_CACHE_PATTERNS="true"
export SR_CACHE_SIZE="1000" # Cache up to 1000 patterns

I/O Optimization

Optimize file I/O:

export SR_USE_MMAP="true" # Use memory-mapped I/O (faster for large files)
export SR_DIRECT_IO="true" # Bypass filesystem cache

CPU Affinity

Bind workers to specific CPU cores:

export SR_CPU_AFFINITY="0,1,2,3" # Use cores 0-3
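
If your build does not honor SR_CPU_AFFINITY, a similar effect can be approximated at the shell level (a sketch; taskset is Linux-only):

taskset -c 0-3 sr "pattern" "replacement" --recursive .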

Performance Troubleshooting

Slow Performance Diagnostics

Problem: Operation takes too long

# 1. Check if pattern is matching
sr --count-only "pattern" --recursive . | head -20

# 2. Check file count
find . -type f | wc -l

# 3. Check largest files
find . -type f -exec wc -c {} \; | sort -rn | head -10

# 4. Check excluded directories
find . -type d \( -name "node_modules" -o -name ".git" \) | wc -l

Bottleneck Analysis

  1. Pattern Matching Bottleneck: Use simpler patterns
  2. Disk I/O Bottleneck: Increase buffer size, use SSD
  3. CPU Bottleneck: Enable parallel processing
  4. Memory Bottleneck: Reduce buffer size, process in batches
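
A rough way to tell these apart (a sketch): compare CPU time to wall-clock time reported by time. If real greatly exceeds user + sys, the run is mostly waiting on disk; if user dominates, pattern matching is the bottleneck.

time sr "pattern" "replacement" --dry-run --recursive .
# real >> user + sys: I/O bound; user close to real: CPU / pattern bound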

Common Performance Issues

Issue: Using regex on large files

Solution: Use literal strings when possible

# Slow
sr --regex "old" "new" huge_file.txt

# Fast
sr "old" "new" huge_file.txt

Issue: Processing node_modules or .git

Solution: Exclude large directories

sr "pattern" "replacement" --exclude "node_modules" --exclude ".git" --recursive .

Issue: Many small files

Solution: Enable parallel processing

export SR_MAX_PARALLEL_JOBS="8"
sr "pattern" "replacement" --recursive .

Performance Summary Table

Optimization         Impact  Effort  Priority
Skip binary files    2-3x    Low     HIGH
Exclude directories  5-10x   Low     HIGH
Use literal strings  5-10x   Medium  HIGH
Parallel processing  2-4x    Medium  MEDIUM
Buffer tuning        1-2x    Medium  MEDIUM
Batch operations     1-1.5x  High    LOW
