|              | MPI | OpenMP | CUDA |
|--------------|-----|--------|------|
| Description  | Message-passing API | Multi-threaded API | Parallel computing platform and API (NVIDIA GPUs) |
| Compiler     | mpicc | gcc -fopenmp | nvcc (LLVM-based) |
| Headers      | #include <mpi.h> | #include <omp.h> | #include <cuda_runtime.h> |
| CFlags       | -I/usr/lib/openmpi -L/usr/lib/openmpi/…/lib -lmpi | -lgomp | -I/usr/local/cuda/include -L/usr/local/cuda/lib -lcudart |
| Memory model | Shared and distributed memory | Shared-memory multiprocessing | Shared memory |
| Concepts     | Point-to-point messaging; broadcast (1 to M); scatter/gather. Bindings for R, C++, Java, Fortran, and Python. | A compiler add-on (#pragma directives) | Designed to work with C/C++ and Fortran; well suited to massively parallel computation, e.g., fast sorting of large lists |
| Variants     | Open MPI, MPICH2 | SIMD, SIMT, SMT | |
| Pros         | More general solution; runs in cluster environments; distributed memory is less expensive | Easier to program; the same source still runs as serial code with no changes | Scattered reads; unified virtual memory; fast shared memory; full support for integer and bitwise operations |
| Cons         | Requires changing serial code into a parallel version; harder to debug; network communication can become the bottleneck | Synchronization bugs and race conditions are hard to debug; requires compiler support; shared-memory architectures only; mostly used for loop parallelization | Memory copies between host and device can hurt performance because of bandwidth and latency; thread groups should be 32+ threads (a warp) for best performance; valid C/C++ may not compile under nvcc; C++ RTTI is not supported in device code |
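To make the MPI column concrete, here is a minimal sketch of the point-to-point and broadcast concepts from the table. It is a hypothetical example (the file name and values are mine, not from the sources below); build it with mpicc and run it under mpirun.

```c
/* Minimal MPI sketch: broadcast plus point-to-point messaging.
 * Hypothetical example. Build (assuming Open MPI or MPICH2):
 *   mpicc hello_mpi.c -o hello_mpi
 * Run:
 *   mpirun -np 4 ./hello_mpi
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Broadcast (1 to M): rank 0 sends one value to every rank. */
    int value = (rank == 0) ? 42 : 0;
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Point-to-point: each non-root rank reports back to rank 0. */
    if (rank != 0) {
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        for (int src = 1; src < size; src++) {
            int who;
            MPI_Recv(&who, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 0 got ack from rank %d (value=%d)\n", who, value);
        }
    }
    MPI_Finalize();
    return 0;
}
```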
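The OpenMP column's "loop parallelization" and "no code change" points are easiest to see in a sketch. This hypothetical example compiles serially with plain gcc and in parallel with gcc -fopenmp:

```c
/* Minimal OpenMP sketch: parallelizing a loop with one pragma.
 * Hypothetical example. Build:
 *   gcc -fopenmp sum_omp.c -o sum_omp
 */
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(void) {
    const int n = 1000000;
    double sum = 0.0;

    /* Iterations are split across threads; reduction(+:sum) avoids a
     * race condition on the shared accumulator. Without -fopenmp the
     * pragma is ignored and this same loop runs serially. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++)
        sum += 1.0 / i;

#ifdef _OPENMP
    printf("threads available: %d\n", omp_get_max_threads());
#endif
    printf("harmonic sum of %d terms = %f\n", n, sum);
    return 0;
}
```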
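Finally, a minimal CUDA sketch (hypothetical, SAXPY-style) showing the host/device memory copies and the warp-sized thread blocks that the Pros and Cons rows refer to; build with nvcc.

```cuda
/* Minimal CUDA sketch: kernel launch plus host<->device copies.
 * Hypothetical example. Build (assuming the toolkit in /usr/local/cuda):
 *   nvcc saxpy.cu -o saxpy
 * The explicit cudaMemcpy calls are the host/device transfers whose
 * bandwidth and latency cost is flagged in the Cons row.
 */
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *hx = (float *)malloc(bytes);
    float *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    /* 256 threads per block: a multiple of the 32-thread warp,
     * matching the "32+ threads per group" note in the table. */
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 2.0f, dx, dy);

    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f (expected 4.0)\n", hy[0]);

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}
```

In all three sketches the computation is deliberately trivial; the interesting parts are the build lines and the parallel construct each model exposes.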
The information sources for the above table include, but are not limited to, open-mpi.org, openmp.org, nvidia.com, and wikipedia.org.