AxoNN: Energy-Aware Execution of Neural Network Inference on Multi-Accelerator Heterogeneous SoCs
Time: Thursday, July 14th, 2:37pm - 3pm PDT
Location: 3005, Level 3
Event Type: Research Manuscript
SoC, Heterogeneous, and Reconfigurable Architectures
Description: The energy and latency demands of executing critical workloads on mobile heterogeneous architectures can vary with the physical system state. Many recent mobile and autonomous system-on-chips embed a diverse range of accelerators with varying power and performance characteristics, which can be exploited to achieve fine-grained trade-offs between energy and latency. We therefore model the scheduling of neural network inference at a layer-wise granularity, accounting for several optimization opportunities and transition overheads on heterogeneous systems. We evaluate our methodology through several characterizations on NVIDIA Jetson AGX Xavier SoCs and analyze our results using the Z3 theorem prover, achieving 98% accuracy across several models.
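The abstract describes choosing, per network layer, which accelerator (e.g., GPU vs. DLA) runs that layer, minimizing energy under a latency budget while paying a cost whenever execution transitions between accelerators. The sketch below illustrates that problem shape only; it is not the paper's formulation. All per-layer cost numbers are invented placeholders, and a brute-force search stands in for the Z3-based solving the paper uses.

```python
from itertools import product

# Hypothetical per-layer (energy_mJ, latency_ms) costs on two accelerators.
# These numbers are made up for illustration, not measurements from the paper.
GPU = [(5.0, 1.0), (6.0, 1.2), (4.0, 0.9)]
DLA = [(2.0, 2.5), (2.5, 3.0), (1.8, 2.2)]
TRANS_E, TRANS_L = 0.5, 0.4  # assumed overhead per accelerator transition

def cost(assign):
    """Total (energy, latency) of a layer-to-accelerator assignment."""
    energy = latency = 0.0
    for i, dev in enumerate(assign):
        e, l = (GPU if dev == "GPU" else DLA)[i]
        energy += e
        latency += l
        if i and assign[i] != assign[i - 1]:  # consecutive layers on
            energy += TRANS_E                 # different devices pay a
            latency += TRANS_L                # transition overhead
    return energy, latency

def best_schedule(budget_ms):
    """Min-energy assignment meeting the latency budget (exhaustive search)."""
    feasible = [(cost(a), a)
                for a in product(("GPU", "DLA"), repeat=len(GPU))
                if cost(a)[1] <= budget_ms]
    return min(feasible)[1] if feasible else None

print(best_schedule(8.0))  # → ('DLA', 'DLA', 'DLA')
```

With a loose budget the all-DLA schedule wins on energy; tightening the budget (e.g., `best_schedule(3.5)`) forces layers onto the faster but more power-hungry GPU. Brute force is exponential in the layer count, which is why a constraint solver such as Z3 is attractive for real networks.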