PULP-TrainLib: Enabling On-Device Training for RISC-V Multi-Core MCUs through Performance-Driven Autotuning
Time: Tuesday, July 12th, 6pm - 7pm PDT
Location: Level 2 Lobby
Event Type: Networking Reception, Work-in-Progress Poster
Description: An open challenge in making Internet-of-Things sensor nodes "smart" and self-adaptive is enabling on-chip Deep Neural Network training on Ultra-Low-Power (ULP) microcontroller units (MCUs). We present PULP-TrainLib, a tool for deploying training tasks on RISC-V-based Parallel-ULP (PULP) MCUs. Layer by layer, and according to the tensor shapes involved, our tool automatically selects (autotunes) the fastest design from a set of tiling options and optimized floating-point matrix multiplication primitives. Results on an 8-core RISC-V MCU show that our autotuned primitives improve MAC/cycle by 2.4× compared to a "one-size-fits-all" matrix multiplication, reaching up to 4.39 MAC/cycle, 36.6× better than a commercial STM32L4 MCU.
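The layer-by-layer autotuning idea can be illustrated with a minimal host-side sketch. This is not PULP-TrainLib's code or API. The three kernels (`mm_naive`, `mm_transposed_b`, `mm_tiled`), the tile size, the synthetic input data and the layer shapes are all hypothetical stand-ins. Wall-clock time from `time.perf_counter` replaces the cycle counts one would measure on the MCU. The helper `mac_per_cycle` shows how the MAC/cycle metric is formed: the M·K·N multiply-accumulates of a matrix product divided by the elapsed cycles.

```python
import math
import time
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]
MatmulFn = Callable[[Matrix, Matrix], Matrix]


def mm_naive(A: Matrix, B: Matrix) -> Matrix:
    """Baseline i-j-k loop: C[i][j] = sum_k A[i][k] * B[k][j]."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for k in range(K):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C


def mm_transposed_b(A: Matrix, B: Matrix) -> Matrix:
    """Transpose B once so the inner loop walks both operands row-wise."""
    Bt = [list(col) for col in zip(*B)]
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def mm_tiled(A: Matrix, B: Matrix, tile: int = 4) -> Matrix:
    """Tile the output (i, j) space into tile x tile blocks (hypothetical tile size)."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, tile):
        for j0 in range(0, N, tile):
            for i in range(i0, min(i0 + tile, M)):
                for j in range(j0, min(j0 + tile, N)):
                    acc = 0.0
                    for k in range(K):
                        acc += A[i][k] * B[k][j]
                    C[i][j] = acc
    return C


# Hypothetical candidate set; a real library would list its optimized primitives here.
CANDIDATES: Dict[str, MatmulFn] = {
    "naive": mm_naive,
    "transposed_b": mm_transposed_b,
    "tiled4": mm_tiled,
}


def _allclose(X: Matrix, Y: Matrix, tol: float = 1e-6) -> bool:
    """Check that two kernels produced numerically matching outputs."""
    return all(
        math.isclose(x, y, rel_tol=tol, abs_tol=tol)
        for rx, ry in zip(X, Y)
        for x, y in zip(rx, ry)
    )


def autotune_layer(candidates: Dict[str, MatmulFn], M: int, K: int, N: int, reps: int = 3) -> str:
    """Time every candidate on an (M x K) @ (K x N) product and return the fastest one's name."""
    assert reps >= 1
    # Synthetic inputs with the layer's shapes; only the shapes matter for tuning.
    A = [[float((i + k) % 7) for k in range(K)] for i in range(M)]
    B = [[float((k * j) % 5) for j in range(N)] for k in range(K)]
    best_name, best_time, reference = "", float("inf"), None
    for name, fn in candidates.items():
        start = time.perf_counter()
        for _ in range(reps):
            C = fn(A, B)
        elapsed = (time.perf_counter() - start) / reps
        if reference is None:
            reference = C
        elif not _allclose(C, reference):
            raise ValueError(f"kernel {name!r} disagrees with the first candidate")
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


def autotune_network(layer_shapes: Sequence[Tuple[int, int, int]], candidates: Dict[str, MatmulFn]) -> List[str]:
    """Pick one kernel per layer, since each layer has its own (M, K, N) shape."""
    return [autotune_layer(candidates, M, K, N) for (M, K, N) in layer_shapes]


def mac_per_cycle(M: int, K: int, N: int, cycles: int) -> float:
    """An (M x K) @ (K x N) product performs M*K*N multiply-accumulates."""
    return (M * K * N) / cycles


if __name__ == "__main__":
    shapes = [(8, 16, 8), (32, 4, 2), (2, 64, 16)]  # hypothetical layer shapes
    for shape, choice in zip(shapes, autotune_network(shapes, CANDIDATES)):
        print(shape, "->", choice)
```

Because a network's layer shapes are fixed before training starts, a search like this can run once ahead of time, and its result is a per-layer kernel choice rather than a single global one. Exhaustively measuring a small candidate set is the simplest form of such a search. The kernel actually picked on a host in Python will not reflect what wins on an 8-core RISC-V MCU.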