GTuner: Tuning DNN Computations on GPU via Graph Attention Network
Time: Thursday, July 14th, 2:37pm - 3pm PDT
Location: 3004, Level 3
Description: Improving the performance of DNN models on GPUs remains an open problem. A novel framework, GTuner, is proposed to learn jointly from the structures of computational graphs and their statistical features in order to find the optimal implementations. A graph attention network (GAT) is designed as the performance estimator. Convolutional layers propagate information through the graph, and a multi-head attention pooling module aggregates it without losing structural information. GPU code is then generated according to the optimal configurations found by GTuner. Experimental results demonstrate that the method remarkably outperforms prior art.
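To make the estimator described above more concrete, here is a minimal NumPy sketch of the two ingredients it names: a convolution-style propagation step over a graph, followed by multi-head attention pooling into a single graph-level vector. This is an illustration under stated assumptions, not the authors' implementation: the function names (`propagate`, `attention_pool`), the layer sizes, the random weights, and the toy 4-node chain graph (treated as undirected for simplicity) are all hypothetical.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def propagate(adj, feats, weight):
    # One propagation step (assumed form): average each node with its
    # neighbors, apply a linear projection, then ReLU.
    a_hat = adj + np.eye(adj.shape[0])                  # add self-loops
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)   # row-normalize
    return np.maximum(a_norm @ feats @ weight, 0.0)

def attention_pool(node_feats, queries):
    # Multi-head attention pooling (assumed form): each head scores every
    # node, and the per-head weighted sums are concatenated.
    scores = node_feats @ queries.T     # (num_nodes, num_heads)
    alpha = softmax(scores, axis=0)     # weights over nodes, one column per head
    pooled = alpha.T @ node_feats       # (num_heads, hidden)
    return pooled.reshape(-1), alpha

# Toy computational graph: 4 operators in a chain (hypothetical data).
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = rng.normal(size=(4, 8))      # 8 statistical features per node
weight = rng.normal(size=(8, 16))    # projection to 16 hidden units
queries = rng.normal(size=(2, 16))   # 2 attention heads

hidden = propagate(adj, feats, weight)
graph_vec, alpha = attention_pool(hidden, queries)
print(graph_vec.shape, alpha.sum(axis=0))
```

In a full estimator, `graph_vec` would feed a regression head that predicts the runtime of a candidate configuration. Because every head assigns a weight to every node, no node is discarded during pooling, which is one plausible reading of the "no loss of structural information" claim.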