Barbieri, D., Cardellini, V., Filippone, S. (2010). Generalized GEMM Kernels on GPGPUs: experiments and applications. In Parallel Computing: From Multicores and GPU's to Petascale (pp. 307-314). Fairfax, VA: IOS Press. doi:10.3233/978-1-60750-530-3-307.
Generalized GEMM Kernels on GPGPUs: experiments and applications
Cardellini, Valeria; Filippone, Salvatore
2010-04-01
Abstract
General purpose computing on graphics processing units (GPGPU) is fast becoming a common feature of high performance computing centers. In this paper we discuss some implementation issues related to dense linear algebra computations on GPUs, such as the GEneral Matrix-Matrix product (GEMM), as well as other kernels sharing the same computational pattern, such as the matrix formulation of the All-Pairs Shortest-Path problem. Our CUDA implementation has shown significant performance improvements over the vendor's software on NVIDIA processing units. We review the optimization techniques that can be employed to implement such operations, and outline further development work in connected application domains.
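
To make the shared computational pattern concrete, the sketch below shows a naive generalized matrix-product kernel parameterized by a semiring: with the usual plus/times operations it computes the GEMM update C = C + A*B, while with min/plus it performs one step of the matrix formulation of All-Pairs Shortest-Path. This is a minimal illustration only, not the paper's optimized CUDA kernel; the names semiring_gemm_naive, PlusTimes, and MinPlus are hypothetical.

#include <cfloat>

// Classical arithmetic semiring: combine = *, reduce = +, identity = 0.
struct PlusTimes {
    __device__ static float combine(float a, float b) { return a * b; }
    __device__ static float reduce(float a, float b)  { return a + b; }
    __device__ static float identity()                { return 0.0f; }
};

// Tropical (min,+) semiring: combine = +, reduce = min, identity = +infinity.
// Running the kernel with this semiring relaxes all two-edge paths.
struct MinPlus {
    __device__ static float combine(float a, float b) { return a + b; }
    __device__ static float reduce(float a, float b)  { return fminf(a, b); }
    __device__ static float identity()                { return FLT_MAX; }
};

// Naive one-element-per-thread kernel over n x n row-major matrices:
// C[i][j] = reduce(C[i][j], reduce_k(combine(A[i][k], B[k][j]))).
template <typename Semiring>
__global__ void semiring_gemm_naive(int n, const float *A, const float *B, float *C)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;

    float acc = Semiring::identity();
    for (int k = 0; k < n; ++k)
        acc = Semiring::reduce(acc, Semiring::combine(A[row * n + k], B[k * n + col]));
    C[row * n + col] = Semiring::reduce(C[row * n + col], acc);
}

// Example launch (device pointers dA, dB, dC assumed already allocated):
//   dim3 block(16, 16), grid((n + 15) / 16, (n + 15) / 16);
//   semiring_gemm_naive<MinPlus><<<grid, block>>>(n, dA, dB, dC);

Repeatedly squaring the distance matrix with the min/plus variant yields all-pairs shortest-path lengths after about log2(n) such products; since only the scalar operations differ, the memory-access optimizations that speed up GEMM carry over to this kernel essentially unchanged.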


