Domain decomposition based on spatial locality is a classical data-parallel problem whose solution may improve by orders of magnitude when implemented on a GPU. Among the data structures involved in domain decomposition, uniform grids are widely used to speed up simulations in a number of fields, including computational physics and graphics. In this work, we present two commonly used approaches to generate uniform grids on GPUs and propose a new single-pass method that has several advantages over the previous ones. We also present some performance results of our CUDA implementation of a broad-phase collision detection algorithm for particles simulation, comparing the different methods. In some tests our method achieves a speedup of 2 compared to the fastest known method supporting a fixed maximum number of elements per cell, and a speedup of 7 compared with the fastest method without such a constraint.
Barbieri, D., Cardellini, V., Filippone, S. (2014). Fast uniform grid construction on GPGPUs using atomic operations. In Parallel Computing: Accelerating Computational Science and Engineering (CSE) (pp.295-304). Amsterdam : IOS Press [10.3233/978-1-61499-381-0-295].
Fast uniform grid construction on GPGPUs using atomic operations
CARDELLINI, VALERIA;FILIPPONE, SALVATORE
2014-01-01
Abstract
Domain decomposition based on spatial locality is a classical data-parallel problem whose solution may improve by orders of magnitude when implemented on a GPU. Among the data structures involved in domain decomposition, uniform grids are widely used to speed up simulations in a number of fields, including computational physics and graphics. In this work, we present two commonly used approaches to generate uniform grids on GPUs and propose a new single-pass method that has several advantages over the previous ones. We also present some performance results of our CUDA implementation of a broad-phase collision detection algorithm for particles simulation, comparing the different methods. In some tests our method achieves a speedup of 2 compared to the fastest known method supporting a fixed maximum number of elements per cell, and a speedup of 7 compared with the fastest method without such a constraint.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.