Hybrid nodes containing GPUs are rapidly becoming the norm in parallel machines. We have conducted experiments on how to plug GPU-enabled computational kernels into PSBLAS, an MPI-based library specifically geared towards sparse matrix computations. In this paper, we present our findings on which strategies are most promising in the quest for the best compromise among raw performance, speedup, software maintainability, and extensibility. We consider several solutions for implementing the data exchange with the GPU, focusing on data access and transfer, and present an experimental evaluation on a cluster system with up to two GPUs per node. In particular, we compare the pinned memory and the Open MPI approaches, which are the two most widely used alternatives for multi-GPU communication in a cluster environment. We find that Open MPI is the best solution for large data transfers, while the pinned memory approach remains a good solution for small transfers between GPUs.

Cardellini, V., Fanfarillo, A., Filippone, S. (2014). Sparse matrix computations on clusters with GPGPUs. In Proceedings of the 2014 International Conference on High Performance Computing and Simulation (HPCS 2014) (pp.23-30). IEEE.

Sparse matrix computations on clusters with GPGPUs

CARDELLINI, VALERIA; FILIPPONE, SALVATORE
2014-01-01

2014 International Conference on High Performance Computing and Simulation (HPCS 2014)
Bologna, Italy
2014
International relevance
Contribution
Jul-2014
2014
Sector ING-INF/05 - INFORMATION PROCESSING SYSTEMS
English
https://ieeexplore.ieee.org/document/6903665
Conference contribution
Cardellini, V; Fanfarillo, A; Filippone, S

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/93529