Adaptive Sparsity-Aware Cloud Offloading for Edge DNN Inference
Time: Tuesday, July 12th, 6pm - 7pm PDT
Location: Level 2 Lobby
Event Type: Networking Reception, Work-in-Progress Poster
Description: Efficient AI techniques such as pruning and quantization have advanced rapidly over the past few years. However, larger state-of-the-art models still cannot fit on resource-constrained devices. Hybrid edge-cloud execution extends the capabilities of edge AI. In this work, we observe that the activations of DNN layers are inherently sparse. Accordingly, we propose a novel sparsity-aware cloud offloading technique that maximizes inference efficiency: the offloading decision adapts to the activation sparsity in addition to the network bandwidth and the battery condition. Our sparsity-aware offloading technique reduces inference latency and energy consumption, and improves throughput on streaming tasks.
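To illustrate the idea, here is a minimal sketch of a sparsity-aware offloading decision. This is a hypothetical policy written for this summary, not the authors' actual algorithm: it assumes intermediate activations are uploaded in a sparse (value + index) encoding, so higher sparsity shrinks the transfer and makes offloading cheaper, while a low battery biases the decision toward offloading to save local compute energy. The function name, parameters, and thresholds are all illustrative assumptions.

```python
import numpy as np

def should_offload(activations, bandwidth_bps, battery_frac,
                   local_latency_s, cloud_compute_s,
                   battery_low=0.2, bytes_per_value=4):
    """Hypothetical sparsity-aware offloading rule (illustrative only).

    Offload when the estimated remote latency (sparse upload + cloud
    compute) beats finishing the inference locally, or when the battery
    is low enough that saving local compute energy takes priority.
    """
    # Count nonzero activations; only these need to be transmitted.
    nnz = int(np.count_nonzero(activations))
    # Sparse encoding: one value plus one 4-byte index per nonzero.
    upload_bytes = nnz * (bytes_per_value + 4)
    transfer_s = upload_bytes * 8 / bandwidth_bps
    remote_latency_s = transfer_s + cloud_compute_s
    if battery_frac < battery_low:
        # Low battery: prefer the cloud to spare local compute energy.
        return True
    return remote_latency_s < local_latency_s
```

With a 1 Mbps link, a 99%-sparse activation tensor uploads in well under a millisecond and the decision favors the cloud, whereas the same tensor fully dense would take longer to ship than to finish locally. A real system would also account for the energy cost of the radio during transfer, which this sketch omits.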