This repository presents a comprehensive performance study of Sparse Matrix-Vector Multiplication (SpMV) optimized through chunk size variation and compiler flags. Conducted as part of the High-Performance Computing (HPC) module at IATIC-5, this project explores the effect of compiler optimizations and MAQAO recommendations on execution efficiency.
SpMV is a fundamental kernel in scientific computing. This study investigates how varying CHUNK_SIZE and switching between compilers (GCC, Intel oneAPI ICX, and AMD AOCC) affect execution time, memory efficiency, and thread stability.
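As a point of reference, here is a textbook CSR SpMV (y = A·x), parallelized over rows with OpenMP and with the chunk size passed to the loop schedule. It illustrates the technique under stated assumptions and is not the repository's `spmv_csr()`. The `csr_matrix` struct, its field names, the function name `spmv_csr_sketch`, and the use of `schedule(dynamic, chunk)` are all hypothetical.

```c
#include <assert.h>

/* Hypothetical CSR layout; field names are illustrative assumptions. */
typedef struct {
    int n_rows;
    const int *row_ptr;   /* n_rows + 1 offsets into col_idx / values */
    const int *col_idx;   /* column index of each stored nonzero      */
    const double *values; /* value of each stored nonzero             */
} csr_matrix;

/* y = A * x. Rows are handed to threads in blocks of `chunk` rows.
 * schedule(dynamic, chunk) is an assumed policy (schedule(static, chunk)
 * is the other common choice). Without -fopenmp the pragma is ignored
 * and the loop simply runs serially. */
void spmv_csr_sketch(const csr_matrix *A, const double *x, double *y, int chunk)
{
    assert(chunk > 0);
    #pragma omp parallel for schedule(dynamic, chunk)
    for (int i = 0; i < A->n_rows; i++) {
        double sum = 0.0;
        for (int j = A->row_ptr[i]; j < A->row_ptr[i + 1]; j++)
            sum += A->values[j] * x[A->col_idx[j]];
        y[i] = sum;
    }
}
```

With a chunk of 1200, each thread claims 1200 consecutive rows at a time. Under dynamic scheduling, smaller chunks generally balance irregular row lengths better but add scheduling overhead. Larger chunks amortize that overhead but risk leaving threads idle near the end of the loop.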
The project combines:

- Advanced compiler flags: `-O3`, `-Ofast`
- MAQAO profiling and optimization suggestions
- SIMD (AVX2) vectorization
- Loop unrolling
- Memory alignment techniques
Key results:

- Best performance: AOCC with CHUNK_SIZE = 1200.
- Worst performance: ICX with `-Ofast` on AMD hardware.
- Optimal configuration: CHUNK_SIZE = 1200 across most tests.
- MAQAO improvements: SIMD + loop unrolling reduced `spmv_csr()` execution time by ~17%.
Test environment:

- OS: Ubuntu 24.10 (kernel 6.11)
- CPU: AMD 12-core with NUMA (L1: 192 KiB, L2: 3 MiB/core, L3: 16 MiB shared)
- Compilers:
  - GCC 14.2.0
  - Intel oneAPI ICPX 2025.0.4
  - AMD AOCC 17.0.6
- Profiler: MAQAO 2.21
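With three toolchains and a NUMA machine, it can help to confirm which compiler produced a binary and how `OMP_PLACES=cores OMP_PROC_BIND=close` actually pins threads. The check below is a sketch and is not part of the repository. The helper names `compiler_family()` and `report_thread_places()` are hypothetical. The code relies only on standard predefined compiler macros and OpenMP 4.5 runtime calls (`omp_get_num_places`, `omp_get_place_num`, `omp_get_proc_bind`).

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Compiler family from predefined macros. Order matters: icx also defines
 * __clang__, and Clang-based compilers (including AOCC) also define __GNUC__. */
const char *compiler_family(void)
{
#if defined(__INTEL_LLVM_COMPILER)
    return "Intel oneAPI (icx/icpx)";
#elif defined(__clang__)
    return "Clang-based (AOCC or upstream clang)";
#elif defined(__GNUC__)
    return "GCC";
#else
    return "unknown";
#endif
}

#ifdef _OPENMP
/* Human-readable name of the binding policy taken from OMP_PROC_BIND. */
static const char *bind_name(omp_proc_bind_t b)
{
    switch (b) {
    case omp_proc_bind_false:  return "false";
    case omp_proc_bind_true:   return "true";
    case omp_proc_bind_close:  return "close";
    case omp_proc_bind_spread: return "spread";
    default:                   return "other";
    }
}
#endif

/* Print the binding policy and the place of every thread; return the team size. */
int report_thread_places(void)
{
    int team = 1;
#ifdef _OPENMP
    printf("OpenMP %d, %d places, bind policy: %s\n",
           _OPENMP, omp_get_num_places(), bind_name(omp_get_proc_bind()));
    #pragma omp parallel
    {
        #pragma omp single
        team = omp_get_num_threads();
        #pragma omp critical
        printf("  thread %2d -> place %d\n", omp_get_thread_num(), omp_get_place_num());
    }
#else
    printf("built without OpenMP\n");
#endif
    assert(team >= 1);
    return team;
}
```

Both functions can be called from a small `main`, built with each compiler, and run under the same `OMP_PLACES=cores OMP_PROC_BIND=close` environment as the benchmarks. With `close` binding and no more threads than places, consecutive thread numbers should typically land on consecutive places.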
Repository structure:

```
.
├── src/          # Source code for SpMV with OpenMP
├── maqao/        # MAQAO profiling reports (HTML)
├── results/      # Execution outputs & plots
├── README.md     # You are here
├── report.pdf    # Full HPC report
└── scripts/      # Setup, run and analysis scripts
```
Clone the repository:

```bash
git clone https://github.com/yourusername/spmv-chunksize-study.git
cd spmv-chunksize-study
```

Compile the source using different compilers:
```bash
# GCC
make CC=gcc FLAGS="-O3 -fopenmp"
# Intel oneAPI
make CC=icx FLAGS="-O3 -fopenmp"
# AMD AOCC (uses AOCC's clang driver)
make CC=clang FLAGS="-O3 -fopenmp"
```

Run benchmarks:
```bash
OMP_PLACES=cores OMP_PROC_BIND=close ./spmv_exec 1200
```

MAQAO suggested the following optimizations, which we implemented:
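- Memory alignment (32-byte boundary, as required by aligned AVX2 loads such as `_mm256_load_pd`):
  ```c
  if (posix_memalign((void **)&x, 32, sizeof(double) * n) != 0)
      abort(); /* allocation failed */
  ```
- Loop unrolling (four iterations per step):
  ```c
  for (int i = 0; i < n; i += 4) { ... }
  ```
- SIMD vectorization (AVX2):
  ```c
  __m256d x_vals = _mm256_load_pd(...);
  ```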
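The three suggestions can be combined in a single CSR row loop. The sketch below shows one possible arrangement and is not the code that produced the ~17% gain. The function names are hypothetical, and the code assumes 32-bit `int` column indices. Because CSR reads `x` indirectly through `col_idx`, the sketch gathers `x` with the AVX2 intrinsic `_mm256_i32gather_pd` rather than the contiguous `_mm256_load_pd` shown above. The `alloc_aligned_doubles()` helper mirrors the 32-byte `posix_memalign` call, and the accumulator is written with the aligned store `_mm256_store_pd`.

```c
#define _POSIX_C_SOURCE 200112L /* posix_memalign under strict -std=c11 */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#endif

/* Allocate n doubles on a 32-byte boundary (the width of one AVX2 register). */
double *alloc_aligned_doubles(size_t n)
{
    void *p = NULL;
    if (posix_memalign(&p, 32, sizeof(double) * n) != 0)
        return NULL;
    assert((uintptr_t)p % 32 == 0);
    return p;
}

#ifdef HAVE_X86
/* Dot product of one CSR row with x: 4 nonzeros per iteration (unrolled by 4
 * and vectorized), then a scalar remainder loop for the last len % 4 entries. */
__attribute__((target("avx2")))
static double row_dot_avx2(const double *val, const int *col, int len, const double *x)
{
    __m256d acc = _mm256_setzero_pd();
    int k = 0;
    for (; k + 4 <= len; k += 4) {
        __m128i idx = _mm_loadu_si128((const __m128i *)(col + k)); /* 4 x int32 indices */
        __m256d xv = _mm256_i32gather_pd(x, idx, 8);               /* x[col[k..k+3]] */
        __m256d av = _mm256_loadu_pd(val + k);                     /* row start may be unaligned */
        acc = _mm256_add_pd(acc, _mm256_mul_pd(av, xv));
    }
    _Alignas(32) double lanes[4];
    _mm256_store_pd(lanes, acc);                                   /* aligned store */
    double sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    for (; k < len; k++)
        sum += val[k] * x[col[k]];
    return sum;
}
#endif

void spmv_csr_avx2_sketch(int n_rows, const int *row_ptr, const int *col_idx,
                          const double *values, const double *x, double *y)
{
#ifdef HAVE_X86
    const int use_avx2 = __builtin_cpu_supports("avx2"); /* runtime CPU check */
#endif
    for (int i = 0; i < n_rows; i++) {
        const int start = row_ptr[i];
        const int len = row_ptr[i + 1] - start;
#ifdef HAVE_X86
        if (use_avx2) {
            y[i] = row_dot_avx2(values + start, col_idx + start, len, x);
            continue;
        }
#endif
        double sum = 0.0; /* portable scalar fallback */
        for (int k = 0; k < len; k++)
            sum += values[start + k] * x[col_idx[start + k]];
        y[i] = sum;
    }
}
```

The `target("avx2")` attribute and `__builtin_cpu_supports` are GCC/Clang extensions, which ICX and AOCC accept as Clang-based compilers. Together they let this file build without `-mavx2` and fall back to the scalar loop on CPUs without AVX2. Whether the gather beats the scalar loop depends on the matrix and the hardware, so it is worth measuring rather than assuming.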
The plots compare GFlop/s, execution time, affinity stability, and array-access efficiency for:
- CHUNK_SIZE = 600, 1200, 2000
- Compilers: GCC, ICX, AOCC
See `results/` and `report.pdf` for details.
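GFlop/s figures for SpMV are conventionally derived from the nonzero count. Each stored nonzero contributes one multiply and one add, so one call performs 2·nnz floating-point operations. The sketch below assumes that convention together with a wall-clock timer. `wall_time()` and `spmv_gflops()` are hypothetical helpers, not the repository's measurement code.

```c
#define _POSIX_C_SOURCE 199309L /* clock_gettime under strict -std=c11 */
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
#include <time.h>
#endif

/* Wall-clock time in seconds: omp_get_wtime() when built with OpenMP,
 * otherwise the POSIX monotonic clock. */
double wall_time(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* One multiply + one add per stored nonzero => 2 * nnz flops per SpMV call. */
double spmv_gflops(size_t nnz, int repetitions, double seconds)
{
    assert(repetitions > 0 && seconds > 0.0);
    return 2.0 * (double)nnz * (double)repetitions / (seconds * 1e9);
}
```

For example, 10^6 nonzeros, 100 repetitions, and 0.1 s of total runtime give 2 × 10^6 × 100 / (0.1 × 10^9) = 2.0 GFlop/s. Timing many repetitions between two `wall_time()` calls keeps timer resolution from dominating short kernels.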
If you use this project or the report in your research or academic work, please cite it as:
```bibtex
@misc{SPMV,
  title={Optimizing SpMV: A Comparative Chunk\_Size Study},
  author={Rochdi Dardor},
  institution={Université Paris-Saclay, IATIC-5},
  year={2025},
  note={Supervised by M. William Jalby}
}
```
The full source code is not publicly available in this repository for academic and licensing reasons.
However, if you are interested in the full source code or the complete PDF report, feel free to contact me directly via email:
Rochdi Dardor
Final Year Engineer @ Université Paris-Saclay
Email: rochdi.dardor@ens.uvsq.fr
This project is for academic use only. For other use cases, please contact the author.