MIME: Adapting a Single Neural Network for Multi-task Inference with Memory-efficient Dynamic Pruning
Time: Wednesday, July 13th, 10:52am - 11:15am PDT
Location: 3000, Level 3
ML Algorithms and Applications
Description: This work presents MIME, a memory- and energy-efficient approach to multi-task inference. In MIME, a parent task and multiple child tasks run on a single DNN. Each child task reuses the weight parameters of the parent task and additionally learns task-specific threshold parameters. As a result, different sub-networks within the DNN are activated for task-specific inference, yielding input-dependent neuronal pruning. Our analyses on a systolic-array hardware show that MIME reduces DRAM storage for task-specific parameters (weights and thresholds). In Pipelined-task-mode, MIME significantly reduces the hardware energies associated with dot-product computations and communications, and increases throughput.
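The core idea of reusing parent weights while gating activations with learned per-task thresholds can be sketched as below. This is a minimal illustrative sketch, not the authors' implementation: the names (`mime_layer`, `tau_child`) and the simple threshold-gating rule are assumptions made for illustration.

```python
import numpy as np

def mime_layer(x, W, tau):
    """One hidden layer with shared parent weights W and a task-specific
    threshold vector tau: pre-activations at or below tau are pruned to
    zero, so each task (and each input) activates a different sub-network.
    (Illustrative sketch; the gating rule is an assumption.)"""
    z = x @ W                          # dot-product with shared parent weights
    return np.where(z > tau, z, 0.0)   # thresholded, ReLU-style gating

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))             # parent-task weights, shared by all tasks
tau_child = np.abs(rng.standard_normal(16))  # hypothetical learned child-task thresholds

x = rng.standard_normal(8)
h = mime_layer(x, W, tau_child)
active = int((h > 0).sum())  # neurons that fire for this input under this task
```

Only the threshold vector `tau_child` is stored per child task; the weight matrix `W` is shared, which is the source of the DRAM-storage savings, and the zeroed neurons are the input-dependent pruning that skips dot-product work.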