parallel-systems-lab

This repository contains the reports and source code written for the lab of the Parallel Processing Systems course of the School of Electrical and Computer Engineering at the National Technical University of Athens.

The lab consists of 4 exercises:

Exercise 1 : Familiarization with the programming environment

Its main goal was to parallelize a serial version of Conway's Game of Life on a shared-memory architecture using the OpenMP API.
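
The parallelization above amounts to splitting the grid sweep across threads. Below is a minimal sketch (not the repository's code; the grid size `N` and the one-cell dead border are assumptions of this sketch) of one Game of Life generation with an OpenMP parallel-for; compile with `-fopenmp` to enable the pragma:

```c
#define N 8  /* interior grid size; hypothetical for this sketch */

/* Compute the next generation from `cur` into `next`.
 * Rows are split across threads; each cell only reads `cur` and
 * writes its own entry of `next`, so no synchronization is needed. */
void gol_step(int cur[N + 2][N + 2], int next[N + 2][N + 2])
{
    #pragma omp parallel for
    for (int i = 1; i <= N; i++) {
        for (int j = 1; j <= N; j++) {
            int alive = 0;
            for (int di = -1; di <= 1; di++)
                for (int dj = -1; dj <= 1; dj++)
                    if (di || dj)
                        alive += cur[i + di][j + dj];
            /* Standard rules: a live cell survives with 2 or 3 live
             * neighbors; a dead cell becomes alive with exactly 3. */
            next[i][j] = (alive == 3) || (cur[i][j] && alive == 2);
        }
    }
}
```

The double-buffering (`cur`/`next`) is what makes the loop embarrassingly parallel: no thread ever reads a cell another thread is writing.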

Exercise 2 : Algorithm Parallelization and Optimization in Shared Memory Architectures

The goal was to parallelize the K-means clustering algorithm and the Floyd-Warshall algorithm on a shared-memory architecture (a NUMA node) using the OpenMP API.

  • For the K-means clustering algorithm, I was assigned to develop two parallel versions: one with cluster arrays shared between the threads and updated with atomic operations, and one with per-thread copies of the clusters that are later reduced into a single final array.

  • Benchmarked and compared 5 different lock implementations on the K-means clustering algorithm, examining the differences in their implementations.

  • For the Floyd-Warshall algorithm, the goal was to parallelize its recursive version (more cache-friendly than the iterative one) using OpenMP tasks.

  • Benchmarked and compared the serial and parallel versions on a NUMA node and observed the tradeoffs of this architecture.

  • Benchmarked and compared 5 concurrent linked-list implementations and commented on their performance differences.

  • The Report

  • More info on /parlab-ex02
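
The two K-means accumulation strategies described above can be sketched as follows. This is a hedged illustration, not the repository's code: the 1-D points, the function names, and `K` are assumptions of the sketch. Compile with `-fopenmp`; without it the pragmas are ignored and the code runs serially:

```c
#define K 2  /* number of clusters; hypothetical for this sketch */

/* Version A: shared accumulators, updated with atomic operations. */
void accumulate_atomic(const double *pts, const int *assign,
                       double sum[K], int cnt[K], int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int c = assign[i];
        #pragma omp atomic
        sum[c] += pts[i];
        #pragma omp atomic
        cnt[c] += 1;
    }
}

/* Version B: each thread accumulates into private copies, which are
 * then reduced into the shared arrays (a critical section keeps the
 * reduction simple in this sketch). */
void accumulate_reduce(const double *pts, const int *assign,
                       double sum[K], int cnt[K], int n)
{
    #pragma omp parallel
    {
        double lsum[K] = {0};
        int lcnt[K] = {0};
        #pragma omp for nowait
        for (int i = 0; i < n; i++) {
            lsum[assign[i]] += pts[i];
            lcnt[assign[i]] += 1;
        }
        #pragma omp critical
        for (int c = 0; c < K; c++) {
            sum[c] += lsum[c];
            cnt[c] += lcnt[c];
        }
    }
}
```

The tradeoff: version A contends on every point (one atomic per update), while version B pays only one short reduction per thread at the end, at the cost of extra memory per thread.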
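
For the recursive Floyd-Warshall, the standard cache-oblivious formulation splits the distance matrix into quadrants and recurses; independent quadrant updates can be spawned as OpenMP tasks. The sketch below is an illustration under assumptions (matrix size `NV`, base-tile size `BASE`, and function names are all hypothetical), not the repository's implementation. Compile with `-fopenmp`; without it the pragmas are ignored and the recursion runs serially:

```c
#define NV 4          /* number of vertices (power of two); hypothetical */
#define LD NV         /* leading dimension of the distance matrix */
#define BASE 2        /* base-case tile size */
#define INF 1000000   /* "no edge" sentinel; sums must not overflow int */

/* Update tile A with paths that go through B (rows) and C (columns).
 * A, B, C are n-by-n tiles of the global matrix, stored with stride LD. */
static void fwr(int *A, int *B, int *C, int n)
{
    if (n <= BASE) {
        /* base case: iterative min-plus update, k in the outer loop */
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (B[i * LD + k] + C[k * LD + j] < A[i * LD + j])
                        A[i * LD + j] = B[i * LD + k] + C[k * LD + j];
        return;
    }
    int h = n / 2;
    int *A11 = A, *A12 = A + h, *A21 = A + h * LD, *A22 = A + h * LD + h;
    int *B11 = B, *B12 = B + h, *B21 = B + h * LD, *B22 = B + h * LD + h;
    int *C11 = C, *C12 = C + h, *C21 = C + h * LD, *C22 = C + h * LD + h;

    fwr(A11, B11, C11, h);
    #pragma omp task            /* A12 and A21 are independent */
    fwr(A12, B11, C12, h);
    #pragma omp task
    fwr(A21, B21, C11, h);
    #pragma omp taskwait
    fwr(A22, B21, C12, h);

    fwr(A22, B22, C22, h);
    #pragma omp task            /* again independent of each other */
    fwr(A21, B22, C21, h);
    #pragma omp task
    fwr(A12, B12, C22, h);
    #pragma omp taskwait
    fwr(A11, B12, C21, h);
}

void floyd_warshall(int *D)
{
    /* tasks must be spawned from inside a parallel region */
    #pragma omp parallel
    #pragma omp single
    fwr(D, D, D, NV);
}
```

Only two of the eight recursive calls at each level are mutually independent, which is why tasks (rather than a parallel-for) fit this algorithm.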

Exercise 3 : Algorithm Parallelization and Optimization on GPUs

The goal was to implement 4 parallel versions of the K-means algorithm on a GPU using NVIDIA's CUDA API.

  • The first version is called Naive due to its non-uniform (uncoalesced) memory accesses.

  • The second version is called Transpose because two of the arrays are transposed so that memory accesses become uniform (coalesced).

  • The third version is called Shared because it places the clusters array in the GPU's shared memory for each thread block.

  • The fourth version is called Full-Offload (All-GPU) because it avoids CPU-GPU communication inside the program's main loop and instead performs the loops entirely on the GPU (with minimal communication between host and device).

  • Thoroughly benchmarked the 4 versions and observed significant performance improvements over running the algorithm solely on the CPU.

  • As expected (and for reasons explained in the report), the best-performing version is Full-Offload.

  • Plotted the results of the benchmarks and explained the performance differences I observed between the 4 versions by exploring the GPU's and CUDA's internals.

  • The Report

  • More info on /parlab-ex03
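
The Naive-versus-Transpose distinction above comes down to array layout. With one CUDA thread per point, a point-major layout makes consecutive threads read addresses a full point apart, while the transposed (coordinate-major) layout makes them read adjacent addresses, which the GPU can coalesce into one memory transaction. The plain-C sketch below illustrates only the index math (the sizes and function names are assumptions of this sketch, not the repository's kernels):

```c
#define NPTS 4  /* number of points; hypothetical */
#define DIM 3   /* coordinates per point; hypothetical */

/* index of coordinate d of point i in each layout */
int idx_naive(int i, int d)     { return i * DIM + d; }   /* point-major */
int idx_transpose(int i, int d) { return d * NPTS + i; }  /* coord-major */

/* build the transposed array from the point-major one */
void transpose(const double *naive, double *trans)
{
    for (int i = 0; i < NPTS; i++)
        for (int d = 0; d < DIM; d++)
            trans[idx_transpose(i, d)] = naive[idx_naive(i, d)];
}
```

In the transposed layout, threads `i` and `i + 1` reading coordinate `d` touch addresses exactly one element apart, which is the access pattern that coalesces on the GPU.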

Exercise 4 : Algorithm Parallelization and Optimization on Distributed Memory Architectures

The goal was to parallelize 2 different algorithms, K-means and 2D heat transfer, assuming a distributed memory architecture and using MPI.

  • The K-means algorithm was parallelized by assigning different objects to each MPI process and communicating between processes in each iteration.

  • The 2-dimensional heat transfer problem was solved with the Jacobi method (an iterative method for solving partial differential equations): each MPI process was assigned a smaller block of the global 2D grid and communicated with the other processes when needed.

  • I highly suggest taking a look at the source code of the Jacobi MPI implementation (Link).

  • Benchmarked each version for different configurations and numbers of MPI processes and plotted the results accordingly.

  • The Report

  • More info on /parlab-ex04
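
The local computation each rank performs in the Jacobi scheme is a 5-point stencil sweep over its block, padded with one layer of ghost cells. The sketch below shows only that serial sweep (the block size `NX`/`NY` and function name are assumptions of this sketch); in the actual MPI version, each iteration would first exchange the ghost layers with neighboring ranks (e.g. via `MPI_Sendrecv`) before sweeping:

```c
#define NX 6  /* local block height; hypothetical */
#define NY 6  /* local block width; hypothetical */

/* One Jacobi sweep over the local block: each interior point becomes
 * the average of its four neighbors. Rows/columns 0 and NX+1/NY+1 are
 * the ghost layer, filled either by boundary conditions or by MPI
 * exchange with neighboring ranks. */
void jacobi_sweep(double u[NX + 2][NY + 2], double unew[NX + 2][NY + 2])
{
    for (int i = 1; i <= NX; i++)
        for (int j = 1; j <= NY; j++)
            unew[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                               + u[i][j - 1] + u[i][j + 1]);
}
```

Because each sweep reads only `u` and writes only `unew`, ranks need to communicate just the one-cell-deep borders per iteration, which is what makes the block decomposition scale.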
