Cardellini, V., Fanfarillo, A., Filippone, S. (2014). Sparse matrix computations on clusters with GPGPUs. In Proceedings of the 2014 International Conference on High Performance Computing and Simulation (HPCS 2014) (pp. 23-30). IEEE.
Sparse matrix computations on clusters with GPGPUs
Cardellini, Valeria; Filippone, Salvatore
2014-01-01
Abstract
Hybrid nodes containing GPUs are rapidly becoming the norm in parallel machines. We have conducted some experiments regarding how to plug GPU-enabled computational kernels into PSBLAS, an MPI-based library specifically geared towards sparse matrix computations. In this paper, we present our findings on which strategies are more promising in the quest for the optimal compromise among raw performance, speedup, software maintainability, and extensibility. We consider several solutions for implementing the data exchange with the GPU, focusing on data access and transfer, and present an experimental evaluation on a cluster system with up to two GPUs per node. In particular, we compare the pinned memory and the Open MPI approaches, which are the two most widely used alternatives for multi-GPU communication in a cluster environment. We find that Open MPI turns out to be the best solution for large data transfers, while the pinned memory approach is still a good solution for small transfers between GPUs.