Mentha: Enabling Sparse-Aware Computation on Systolic Arrays
Time: Tuesday, July 12th, 6pm - 7pm PDT
Location: Level 2 Lobby
Description: GEMM is a core kernel in many domains. Unfortunately, the matrices involved are often sparse, which wastes both computation and storage. We can instead exploit this sparsity to improve matrix multiplication performance. Accordingly, we propose a design that enables systolic arrays to support sparse matrix multiplication by using compression algorithms, without losing accuracy. We also reconfigure the PEs at low cost (1.16x area overhead); the design achieves 1.2-3.3x speedup on SpMM and 1.3-4.4x on SpGEMM over the baseline, and at least 9.7x over cuSPARSE. Furthermore, experimental results show roughly a 3.2x FLOP reduction on neural network workloads.
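To illustrate the idea behind sparsity-aware matrix multiplication, here is a minimal software sketch using the standard CSR (compressed sparse row) format: only nonzeros are stored and multiplied, so work scales with the number of nonzeros rather than the full matrix size. This is a generic illustration of sparse compression, not the Mentha hardware design itself; the function names are our own.

```python
def dense_to_csr(A):
    """Compress a dense matrix (list of lists) into CSR arrays:
    nonzero values, their column indices, and per-row offsets."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row's nonzeros
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x touching only stored nonzeros,
    skipping all the zero entries a dense kernel would process."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

A = [[0, 2, 0],
     [1, 0, 0],
     [0, 0, 3]]
vals, cols, ptr = dense_to_csr(A)
print(csr_matvec(vals, cols, ptr, [1, 1, 1]))  # [2, 1, 3]
```

Here a dense 3x3 multiply would perform 9 multiply-accumulates, while the CSR version performs only 3 (one per nonzero), which is the kind of FLOP reduction sparse-aware hardware aims to realize.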