Effective Zero Compression on ReRAM-based Sparse DNN Accelerators
Time: Thursday, July 14th, 11:37am - 12pm PDT
Location: 3005, Level 3
Event Type: Research Manuscript
AI/ML Design: Circuits and Architecture
Description: For DNN inference, Resistive RAM (ReRAM) crossbars have emerged as a promising building block for computing matrix multiplication in an area- and power-efficient manner. To improve inference throughput, sparse networks can be deployed on ReRAM-based accelerators. While unstructured pruning achieves both high accuracy and high sparsity, it performs poorly on the crossbar architecture because the pruned weights fall at irregular locations. We therefore propose a novel weight mapping scheme that increases the weight compression ratio by effectively clustering zero weights through filter reordering. We also introduce a weight recovery scheme that further improves accuracy, the compression ratio, or both.
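To illustrate why filter reordering can raise the compression ratio, here is a minimal sketch built on simplifying assumptions. It is not the authors' mapping or recovery scheme. Each column of the weight matrix `w` is one filter. Consecutive columns are packed into groups as wide as a crossbar (`group`), and a wordline (row) whose weights are all zero within a group is treated as skippable. The compression model, the greedy heuristic, and the function names `zero_rows_per_group` and `greedy_filter_order` are hypothetical, and tiling by crossbar height is ignored.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # w[row][col]; each column is one filter


def zero_rows_per_group(w: Matrix, order: Sequence[int], group: int) -> int:
    """Count rows that are all-zero inside each group of `group` consecutive
    columns, taking columns in `order`. Under this hypothetical model, such
    wordline segments need not be stored on a crossbar."""
    n_rows = len(w)
    total = 0
    for start in range(0, len(order), group):
        cols = order[start:start + group]
        total += sum(all(w[r][c] == 0 for c in cols) for r in range(n_rows))
    return total


def greedy_filter_order(w: Matrix, group: int) -> List[int]:
    """Hypothetical greedy heuristic: build each group by repeatedly adding the
    remaining filter whose zero positions overlap most with the zeros shared
    by the filters already in the group."""
    n_rows, n_cols = len(w), len(w[0])
    zero_sets = [frozenset(r for r in range(n_rows) if w[r][c] == 0)
                 for c in range(n_cols)]
    remaining = set(range(n_cols))
    order: List[int] = []
    while remaining:
        # Seed a new group with the sparsest remaining filter (ties -> lowest index).
        seed = max(sorted(remaining), key=lambda c: len(zero_sets[c]))
        remaining.remove(seed)
        order.append(seed)
        common = zero_sets[seed]  # rows that are zero in every filter of this group
        for _ in range(group - 1):
            if not remaining:
                break
            nxt = max(sorted(remaining), key=lambda c: len(common & zero_sets[c]))
            remaining.remove(nxt)
            order.append(nxt)
            common = common & zero_sets[nxt]
    return order


if __name__ == "__main__":
    # Toy 4x4 layer: filters 0 and 2 share zeros, as do filters 1 and 3.
    w = [[0, 1, 0, 1],
         [0, 1, 0, 1],
         [1, 0, 1, 0],
         [1, 0, 1, 0]]
    print(zero_rows_per_group(w, [0, 1, 2, 3], 2))  # 0 skippable rows
    order = greedy_filter_order(w, 2)
    print(order)                                    # [0, 2, 1, 3]
    print(zero_rows_per_group(w, order, 2))         # 4 skippable rows
```

In the toy layer, the original filter order leaves no all-zero wordline segment. Pairing filters with matching zero patterns makes half of the eight row segments skippable. Permuting filters only reorders a layer's output channels. To keep the network's function unchanged, the permutation has to be undone or applied to the next layer's input channels.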