SALO: an Efficient Spatial Accelerator Enabling Hybrid Sparse Attention Mechanisms for Long Sequences
Time: Wednesday, July 13th, 1:53pm - 2:15pm PDT
Location: 3002, Level 3
AI/ML Design: Circuits and Architecture
Description: The attention mechanism of transformers effectively extracts pertinent information from a sequence. However, the quadratic complexity of self-attention w.r.t. the sequence length incurs heavy computational and memory burdens, especially for tasks with long sequences, and existing accelerators suffer performance degradation on such tasks. To this end, we propose SALO, an efficient spatial accelerator enabling hybrid sparse attention mechanisms for long sequences. SALO consists of a PE array with diagonal connections, global PE rows and columns, peripheral circuits, and an associated dataflow. We show that SALO achieves 7.38X and 83.57X speedups over GPU and CPU implementations of LongFormer, respectively, without accuracy loss.
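The hybrid sparse attention pattern the abstract refers to (as in LongFormer) combines a sliding local window with a few globally attending positions, which reduces the number of computed attention entries from quadratic to roughly linear in the sequence length. The sketch below illustrates that mask pattern only; the function name and parameters are hypothetical, and it does not reproduce SALO's PE-array dataflow, which is a hardware realization of this access pattern.

```python
import numpy as np

def hybrid_sparse_mask(seq_len, window, global_idx):
    """Illustrative LongFormer-style mask: True marks (query, key)
    pairs whose attention score is actually computed."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo = max(0, i - window)
        hi = min(seq_len, i + window + 1)
        mask[i, lo:hi] = True   # local sliding-window attention
    for g in global_idx:
        mask[g, :] = True       # global token attends to all positions
        mask[:, g] = True       # all positions attend to the global token
    return mask

mask = hybrid_sparse_mask(seq_len=16, window=2, global_idx=[0])
# Computed entries grow as O(n * window) plus O(n) per global token,
# instead of the O(n^2) entries of dense self-attention.
print(mask.sum(), 16 * 16)
```

With a fixed window and a handful of global tokens, the masked-out entries never need to be computed, which is what makes long sequences tractable both in software and on a spatial accelerator.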