Improving the inter-block synchronization methods in CUDA

Document Type: Persian Original Article

Authors

1 Assistant Professor, Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad (FUM)

2 Computer Engineering Department, Ferdowsi University of Mashhad, Mashhad, Iran

Abstract

The CUDA programming model has no explicit support for inter-block synchronization, which degrades performance in some applications because the synchronization must be implemented in software. Both lock-based and lock-free methods have been implemented for this purpose. In lock-based synchronization, execution time grows significantly as the number of blocks increases, while lock-free methods limit the number of blocks that can be synchronized. This paper proposes two inter-block synchronization methods. The first is lock-based and reduces the effect of the block count on execution time by grouping the blocks. The second is lock-free and removes the limit on the number of blocks by arranging them in a tree hierarchy. Both methods were used for inter-block synchronization in the Smith-Waterman algorithm and the bitonic sorting algorithm. Experimental results show that the proposed lock-based method reduces synchronization time, achieving a speedup of 1.84 in the Smith-Waterman algorithm and 2.24 in the bitonic sorting algorithm. The results also show that, with a suitable number of levels in the tree hierarchy, the proposed lock-free method can synchronize any number of blocks, so the limit on the number of blocks is removed.
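
To make the grouping idea concrete, the following is a minimal sketch of how a grouped lock-based barrier might look in CUDA. It is not the authors' implementation: the names grouped_barrier, demo_kernel, run_grouped_barrier_demo, group_counters, and global_counter are hypothetical, as is the two-level counter scheme itself. The sketch assumes that the grid size is a multiple of the group size and that all blocks are resident on the GPU at the same time; otherwise the spin loop can deadlock.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical two-level counter barrier (illustrative only, not the paper's code).
// Assumes gridDim.x is a multiple of group_size and all blocks are co-resident.
// 'round' must be 1 for the first barrier call, 2 for the second, and so on.
__device__ void grouped_barrier(unsigned int *group_counters,
                                unsigned int *global_counter,
                                unsigned int group_size,
                                unsigned int round)
{
    __syncthreads();                     // the whole block has finished its phase
    if (threadIdx.x == 0) {
        __threadfence();                 // make this block's global writes visible
        unsigned int num_groups = gridDim.x / group_size;
        unsigned int g = blockIdx.x / group_size;
        // Only the blocks of one group contend on this counter.
        unsigned int arrived = atomicAdd(&group_counters[g], 1u) + 1u;
        if (arrived == group_size * round) {
            // The last block of the group reports to the global counter.
            atomicAdd(global_counter, 1u);
        }
        // Spin until every group has reported for this round.
        while (atomicAdd(global_counter, 0u) < num_groups * round) { }
        __threadfence();
    }
    __syncthreads();                     // release the other threads of the block
}

__global__ void demo_kernel(int *data, int *out,
                            unsigned int *group_counters,
                            unsigned int *global_counter,
                            unsigned int group_size)
{
    // Phase 1: each block writes one value.
    if (threadIdx.x == 0) data[blockIdx.x] = 100 + (int)blockIdx.x;
    grouped_barrier(group_counters, global_counter, group_size, 1u);
    // Phase 2: each block reads the value written by its right-hand neighbour.
    if (threadIdx.x == 0)
        out[blockIdx.x] = ((volatile int *)data)[(blockIdx.x + 1) % gridDim.x];
}

// Host-side check: returns the number of wrong results, or -1 on a CUDA error.
int run_grouped_barrier_demo()
{
    const unsigned int blocks = 8, threads = 64, group_size = 4;
    const unsigned int num_groups = blocks / group_size;
    int *data = nullptr, *out = nullptr;
    unsigned int *group_counters = nullptr, *global_counter = nullptr;

    cudaMalloc(&data, blocks * sizeof(int));
    cudaMalloc(&out, blocks * sizeof(int));
    cudaMalloc(&group_counters, num_groups * sizeof(unsigned int));
    cudaMalloc(&global_counter, sizeof(unsigned int));
    cudaMemset(group_counters, 0, num_groups * sizeof(unsigned int));
    cudaMemset(global_counter, 0, sizeof(unsigned int));

    demo_kernel<<<blocks, threads>>>(data, out, group_counters,
                                     global_counter, group_size);

    std::vector<int> host(blocks);
    // cudaMemcpy waits for the kernel and reports any earlier error.
    cudaError_t err = cudaMemcpy(host.data(), out, blocks * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(data); cudaFree(out);
    cudaFree(group_counters); cudaFree(global_counter);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (unsigned int b = 0; b < blocks; ++b)
        if (host[b] != 100 + (int)((b + 1) % blocks)) ++mismatches;
    return mismatches;
}
```

Calling run_grouped_barrier_demo() from a host main function should return 0 when every block observes its neighbour's phase-1 write after the barrier. In this sketch each block contends only with the other blocks of its group on the first counter, and only one block per group touches the global counter, which is one plausible way grouping could reduce the cost of adding blocks.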

Keywords