Problem Description
Syft consumes 3-5+ GB of RAM when generating SBOMs for large JavaScript applications (10,000+ packages). This makes Syft unusable for many real-world projects and causes OOM errors in CI/CD environments with limited memory.
Root Causes
Memory profiling (available in MEMORY_ANALYSIS.md) identified five primary issues:
- JavaScript Lock File Parsers - Load entire documents into memory without streaming (~150-300MB)
- Dependency Resolution - O(n²) complexity with extensive string operations (~70-130MB)
- Package ID Generation - Creates large string representations of metadata (~20-100MB)
- License Scanning - Initializes massive regex structures at startup (~8-15MB)
- File Indexing - Keeps entire file trees in memory (~7-15MB)
For a typical large JavaScript project, peak memory allocation is 355-760MB+, but with GC pressure and runtime overhead, this grows to 3-5GB+.
See the complete analysis: MEMORY_ANALYSIS.md
Solution: 5-Phase Optimization Plan
Phase 1: Quick Wins ✅ (PR #4585)
Goal: Reduce memory by 15-20% (100-150MB)
Status: ✅ Submitted for review
- Optimize string operations in JS parsers
- Reduce string duplication in dependency resolution
- Implement lazy license scanner initialization (see the sketch below)
Expected Impact: 100-150MB reduction
Tracking: PR #4585
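The lazy license scanner item above could look roughly like this Go sketch: a `sync.Once` defers building the regex structures until the first caller actually needs them, so runs that never perform license detection skip the startup cost. All names here (`scanner`, `getScanner`, `compilePatterns`) are illustrative, not Syft's actual API.
```go
package licenses

import (
	"regexp"
	"sync"
)

// scanner holds the compiled license-matching state that is expensive to build.
type scanner struct {
	patterns []*regexp.Regexp
}

var (
	scannerOnce    sync.Once
	defaultScanner *scanner
)

// getScanner builds the scanner the first time any caller needs it, so
// invocations that never touch license detection pay no startup cost.
func getScanner() *scanner {
	scannerOnce.Do(func() {
		defaultScanner = &scanner{patterns: compilePatterns()}
	})
	return defaultScanner
}

// compilePatterns stands in for the expensive initialization being deferred;
// a real scanner compiles a much larger set of expressions.
func compilePatterns() []*regexp.Regexp {
	return []*regexp.Regexp{
		regexp.MustCompile(`(?i)\bMIT License\b`),
		regexp.MustCompile(`(?i)\bApache License,? Version 2\.0\b`),
	}
}
```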
Phase 2: Parser Optimization 🚧
Goal: Reduce memory by 30-40% (200-300MB)
Status: Not started
- Stream JSON parsing for package-lock.json (see the sketch below)
- Optimize yarn.lock line-by-line parsing
- Implement streaming YAML parsing for pnpm-lock.yaml
Expected Impact: 200-300MB reduction
Cumulative Impact: 40-50% total reduction
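As a rough illustration of the streaming package-lock.json item above: a `json.Decoder` can walk the top-level object token by token and decode one `packages` entry at a time, so the lock file is never materialized as a single map. The type and function names below are placeholders, not the real parser.
```go
package javascript

import (
	"encoding/json"
	"fmt"
	"io"
)

// lockEntry mirrors only the fields needed from each "packages" entry of a
// (v2/v3) package-lock.json; the field list here is illustrative.
type lockEntry struct {
	Version  string `json:"version"`
	Resolved string `json:"resolved"`
	Dev      bool   `json:"dev"`
}

// streamPackageLock walks the top-level object with a json.Decoder and decodes
// one "packages" entry at a time instead of unmarshalling the whole file into
// a single map. onEntry receives each path/entry pair as it is parsed.
func streamPackageLock(r io.Reader, onEntry func(path string, e lockEntry) error) error {
	dec := json.NewDecoder(r)

	// consume the '{' that opens the document
	if _, err := dec.Token(); err != nil {
		return fmt.Errorf("reading document start: %w", err)
	}

	for dec.More() {
		keyTok, err := dec.Token()
		if err != nil {
			return err
		}
		key, _ := keyTok.(string)

		if key != "packages" {
			// skip fields we don't need (name, version, lockfileVersion, ...)
			var skip json.RawMessage
			if err := dec.Decode(&skip); err != nil {
				return err
			}
			continue
		}

		// consume the '{' that opens the "packages" object
		if _, err := dec.Token(); err != nil {
			return err
		}
		for dec.More() {
			pathTok, err := dec.Token()
			if err != nil {
				return err
			}
			path, _ := pathTok.(string)

			var entry lockEntry
			if err := dec.Decode(&entry); err != nil {
				return err
			}
			if err := onEntry(path, entry); err != nil {
				return err
			}
		}
		// consume the '}' that closes the "packages" object
		if _, err := dec.Token(); err != nil {
			return err
		}
	}
	return nil
}
```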
Phase 3: Dependency Resolution 🚧
Goal: Reduce memory by 15-25% (100-150MB)
Status: Not started
- Implement incremental resolution (see the sketch below)
- Optimize set operations throughout
- Reduce memory pressure in concurrent processing
Expected Impact: 100-150MB reduction
Cumulative Impact: 55-65% total reduction
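One way to picture the incremental-resolution and set-operation items above (a sketch under assumed names, not Syft's actual resolver): key packages by a small comparable struct instead of concatenated strings, and keep a single visited set so each node is processed once instead of re-walking the graph per package.
```go
package javascript

// pkgKey identifies a package without building composite strings; a small
// comparable struct as a map key avoids repeated fmt.Sprintf allocations.
type pkgKey struct {
	name    string
	version string
}

// resolver tracks which packages have already been processed so each
// dependency edge is recorded exactly once (incremental resolution),
// rather than re-walking the whole graph for every package.
type resolver struct {
	seen  map[pkgKey]struct{}
	edges [][2]pkgKey
}

func newResolver(capacityHint int) *resolver {
	return &resolver{
		seen:  make(map[pkgKey]struct{}, capacityHint),
		edges: make([][2]pkgKey, 0, capacityHint),
	}
}

// add records the edges for a package the first time it is visited and is a
// no-op for packages that were already resolved.
func (r *resolver) add(parent pkgKey, deps []pkgKey) {
	if _, done := r.seen[parent]; done {
		return
	}
	r.seen[parent] = struct{}{}
	for _, d := range deps {
		r.edges = append(r.edges, [2]pkgKey{parent, d})
	}
}
```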
Phase 4: ID Generation Optimization 🚧
Goal: Reduce memory by 10-20% (50-100MB)
Status: Not started
- Implement selective metadata hashing (see the sketch below)
- Optimize sorting to avoid metadata stringification
- Cache ID computations where possible
Expected Impact: 50-100MB reduction
Cumulative Impact: 65-75% total reduction
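A minimal sketch of the selective-hashing and caching items above: hash only identity-relevant fields directly into the hasher instead of stringifying the whole metadata struct, and memoize the result. The field set, the FNV choice, and the cache shape are assumptions for illustration, not the current ID implementation.
```go
package pkgid

import (
	"fmt"
	"hash/fnv"
	"io"
)

// identityFields is an illustrative subset of metadata that determines a
// package's ID; the rest of the metadata is never serialized.
type identityFields struct {
	Name     string
	Version  string
	Type     string
	Location string
}

// computeID writes the selected fields straight into the hasher, avoiding a
// large intermediate string of the full metadata.
func computeID(f identityFields) string {
	h := fnv.New64a()
	for _, field := range []string{f.Name, f.Version, f.Type, f.Location} {
		io.WriteString(h, field)
		io.WriteString(h, "\x00") // separator so adjacent fields cannot collide
	}
	return fmt.Sprintf("%016x", h.Sum64())
}

// idCache memoizes computed IDs; a real implementation would need locking if
// IDs are computed concurrently.
var idCache = map[identityFields]string{}

func cachedID(f identityFields) string {
	if id, ok := idCache[f]; ok {
		return id
	}
	id := computeID(f)
	idCache[f] = id
	return id
}
```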
Phase 5: Advanced Optimizations 🚧
Goal: Final 10-15% reduction (50-100MB)
Status: Not started
- Implement memory pooling for frequently used structures (see the sketch below)
- Add chunked processing for large datasets
- Add configuration options for memory limits
Expected Impact: 50-100MB reduction
Cumulative Impact: 75-80% total reduction
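For the pooling and chunked-processing items above, a sketch (assumed names, not existing Syft code) could combine a `sync.Pool` of reusable scan buffers with fixed-size batches so only a bounded window of records is held in memory at any time.
```go
package javascript

import (
	"bufio"
	"io"
	"sync"
)

// bufPool recycles scan buffers across parser invocations so large temporary
// slices are reused instead of adding GC pressure on every run.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 64*1024)
		return &b
	},
}

// processInChunks reads records line by line and hands them to handleChunk in
// fixed-size batches, so only chunkSize records are held in memory at a time.
func processInChunks(r io.Reader, chunkSize int, handleChunk func(lines []string) error) error {
	bufPtr := bufPool.Get().(*[]byte)
	defer bufPool.Put(bufPtr)

	scanner := bufio.NewScanner(r)
	scanner.Buffer(*bufPtr, 1024*1024) // cap any single line at 1MB

	chunk := make([]string, 0, chunkSize)
	for scanner.Scan() {
		chunk = append(chunk, scanner.Text())
		if len(chunk) == chunkSize {
			if err := handleChunk(chunk); err != nil {
				return err
			}
			chunk = chunk[:0]
		}
	}
	if len(chunk) > 0 {
		if err := handleChunk(chunk); err != nil {
			return err
		}
	}
	return scanner.Err()
}
```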
Expected Overall Impact
| Phase | Reduction | Cumulative |
|---|---|---|
| Phase 1 | 100-150MB | 15-20% (✅ Done) |
| Phase 2 | 200-300MB | 40-50% |
| Phase 3 | 100-150MB | 55-65% |
| Phase 4 | 50-100MB | 65-75% |
| Phase 5 | 50-100MB | 75-80% |
Final Goal: Reduce peak memory from 3-5+ GB to 600MB-1GB
Related PRs
- PR #4585 (Phase 1: Quick Wins)
Testing & Validation
Performance metrics to track:
- Peak memory allocation (heap profiling)
- Allocation rate (pprof)
- GC pause times and frequency
- Execution time (ensure no regression)
Test cases needed:
- Small JS projects (<100 packages)
- Medium JS projects (100-1,000 packages)
- Large JS projects (1,000-10,000 packages)
- Very large JS projects (10,000+ packages)
Benchmark commands:
```sh
# Run with memory profiling
go test -bench=. -benchmem -memprofile=mem.prof ./...

# Compare profiles
go tool pprof -base mem_before.prof mem_after.prof

# Visualize allocations
go tool pprof -http=:8080 mem.prof
```
Additional Considerations
- Configuration: Add memory limits and graceful degradation
- Metrics: Expose memory usage metrics for monitoring
- Documentation: Update docs with memory requirements
- Testing: Add regression tests for memory usage (see the benchmark sketch below)
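For the regression-test item above, a Go benchmark with `b.ReportAllocs()` produces per-operation allocation numbers that CI can track; the fixture path is a placeholder and `json.Unmarshal` merely stands in for whichever parser is under test.
```go
package javascript

import (
	"encoding/json"
	"os"
	"testing"
)

// BenchmarkParseLargeLockFile reports allocations per parse of a large fixture
// so memory regressions surface in -benchmem output and in -memprofile data.
func BenchmarkParseLargeLockFile(b *testing.B) {
	data, err := os.ReadFile("testdata/large-package-lock.json") // placeholder fixture
	if err != nil {
		b.Fatal(err)
	}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		var doc map[string]any
		// stand-in for the real parser entry point under test
		if err := json.Unmarshal(data, &doc); err != nil {
			b.Fatal(err)
		}
	}
}
```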
Motivation
Large JavaScript applications are common in modern development. Syft should be able to handle these without requiring excessive resources. These optimizations will:
- Make Syft viable for real-world projects
- Reduce CI/CD costs (smaller memory requirements)
- Prevent OOM errors in production
- Improve overall user experience
References
- Full analysis: See MEMORY_ANALYSIS.md in the codebase
- Profiling data: Available in pprof_baseline/ directories
- Original issue: High memory consumption for JS projects