Improving GPU Performance via Coordinated Kernel Slicing and Multi CUDA Streaming
Time: Wednesday, July 13th, 6pm - 7pm PDT
Location: Level 2 Lobby
Description: Without an effective stream scheduling solution, GPU multi-stream computing often suffers from low resource utilization. This paper proposes sCUDA, a GPU resource management architecture that distributes thread blocks across different CUDA streams so that data block transfers overlap with task execution. Taking execution overheads into account, we also propose an effective way to compute the optimal number of CUDA streams to allocate, saving GPU resources without degrading performance. Experimental results on widely used datasets show a performance improvement of up to 2.34 times on the Rodinia benchmark suite and a custom microbenchmark.
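
The trade-off behind choosing a stream count can be illustrated with a toy cost model. The abstract does not give sCUDA's actual formula, so everything below is an assumption made for illustration: the two-stage pipeline model, the fixed per-stream overhead, the 2% tolerance, and the names `pipelined_time` and `pick_stream_count` are all hypothetical. The sketch slices the work evenly across streams, lets copies overlap with kernels, charges an overhead per stream, and returns the smallest stream count whose estimated time is close to the best one.

```python
def pipelined_time(copy_ms, kernel_ms, n_streams, overhead_ms):
    """Estimated wall time (ms) when the work is cut into n_streams equal slices.

    Assumed model, not taken from the paper: each slice is copied and then
    executed, and the copy of one slice overlaps the kernel of the previous one.
    """
    slice_copy = copy_ms / n_streams
    slice_kernel = kernel_ms / n_streams
    # The first slice fills the pipeline; every later slice costs only the
    # slower of the two stages because the other stage is hidden behind it.
    steady_state = (n_streams - 1) * max(slice_copy, slice_kernel)
    # Assumed fixed cost per stream (creation, launch, synchronization).
    return slice_copy + slice_kernel + steady_state + n_streams * overhead_ms


def pick_stream_count(copy_ms, kernel_ms, overhead_ms, max_streams=32, tolerance=0.02):
    """Return (n, time) for the smallest n within `tolerance` of the best time."""
    times = {
        n: pipelined_time(copy_ms, kernel_ms, n, overhead_ms)
        for n in range(1, max_streams + 1)
    }
    best = min(times.values())
    for n in range(1, max_streams + 1):
        if times[n] <= best * (1 + tolerance):
            return n, times[n]


if __name__ == "__main__":
    n, t = pick_stream_count(copy_ms=40.0, kernel_ms=60.0, overhead_ms=0.5)
    print(f"streams={n}, estimated time={t:.2f} ms")
```

With these made-up inputs the estimate keeps improving until about nine streams, but the gains flatten well before that, so the model settles on six. Preferring the smallest count near the optimum mirrors the abstract's goal of saving GPU resources without performance degradation.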