An In-Memory-Computing Architecture for Recommendation Systems
Time: Wednesday, July 13th, 11:15am - 11:37am PDT
Location: 3002, Level 3
In-memory and Near-memory Computing
Description: Typical recommendation systems (RecSys) must handle large embedding tables to suggest items to users efficiently. The memory capacity and bandwidth of conventional computing architectures usually limit RecSys performance. This work proposes an in-memory-computing architecture (iMARS) that accelerates both stages of DNN-based RecSys (filtering and ranking) in a combined fashion. iMARS leverages an emerging ferroelectric FET-based fabric that can be configured to switch between a content-addressable memory mode and a general-purpose compute-in-memory mode to support RecSys operations. Detailed circuit-level simulations and system-level evaluations show that iMARS achieves a 17X end-to-end latency improvement compared to a GPU-based solution.
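
For readers unfamiliar with the two-stage structure mentioned in the description, the following is a minimal software sketch of a filter-then-rank pipeline. It is only an analogy and not the iMARS design: the abstract does not say which distance metric or ranking model is used. The sketch assumes that filtering is a nearest-neighbor search over binary codes by Hamming distance (roughly what a content-addressable memory lookup can provide) and that a plain dot product stands in for the DNN ranker. All function names and data are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def filter_candidates(user_code, item_codes, k):
    """Filtering stage (assumed): keep the k items whose binary codes are
    nearest to the user's code. A CAM would do this search in memory;
    here it is a simple linear scan."""
    distances = [hamming(user_code, code) for code in item_codes]
    # sorted() is stable, so ties resolve to the lowest item index.
    return sorted(range(len(item_codes)), key=lambda i: distances[i])[:k]


def rank_candidates(user_vec, item_vecs, candidates):
    """Ranking stage (assumed): order candidates by dot-product score,
    a stand-in for the DNN ranking model."""
    def score(i):
        return sum(u * v for u, v in zip(user_vec, item_vecs[i]))
    return sorted(candidates, key=score, reverse=True)


def recommend(user_code, user_vec, item_codes, item_vecs, k):
    """Run filtering, then ranking, and return item indices, best first."""
    candidates = filter_candidates(user_code, item_codes, k)
    return rank_candidates(user_vec, item_vecs, candidates)


# Hypothetical toy data: four items with 4-bit codes and 2-d embeddings.
item_codes = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 1], [1, 0, 0, 0]]
item_vecs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 0.0]]
top = recommend([0, 0, 0, 0], [1.0, 0.0], item_codes, item_vecs, k=3)
```

In this toy run, filtering discards item 1 (the only code far from the user's), and ranking then orders the three survivors by score. Both stages repeatedly read the same embedding storage, which is the memory traffic the description identifies as the bottleneck on conventional hardware.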