QuiltNet: Efficient Deep Learning Inference on Multi-Chip Accelerators Using Model Partitioning
Time: Thursday, July 14th, 4:50pm - 5:10pm PDT
Location: 3002, Level 3
AI/ML Design: System and Platform
Description: In this paper, we propose a scalable solution for accelerating DNN models on multiple devices by devising a new model partitioning technique. Our technique transforms a DNN model into layer-wise partitioned models using an autoencoder. Because the autoencoder encodes a tensor output into a smaller dimension, we can split the neural network model into multiple pieces while significantly reducing the communication overhead of pipelining them. Evaluation results on state-of-the-art deep learning models show that the proposed technique significantly improves performance and energy efficiency.
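The core idea above — inserting an autoencoder bottleneck at the partition cut so only a compressed tensor crosses the chip-to-chip link — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the layer sizes, random weights, and ReLU stand-ins for the real sub-networks are all assumptions for demonstration (in the actual technique, the encoder/decoder would be trained to minimize reconstruction error).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: the cut point emits a 1024-dim activation,
# which the bottleneck compresses to 128 dims before transfer.
ACT_DIM, CODE_DIM = 1024, 128

# Randomly initialized encoder/decoder weights (illustration only).
W_enc = rng.standard_normal((ACT_DIM, CODE_DIM)) / np.sqrt(ACT_DIM)
W_dec = rng.standard_normal((CODE_DIM, ACT_DIM)) / np.sqrt(CODE_DIM)

def device0_forward(x):
    """First partition: run its layers, then encode the boundary tensor."""
    act = np.maximum(x, 0.0)          # stand-in for the first sub-network
    return act @ W_enc                # compressed tensor sent over the link

def device1_forward(code):
    """Second partition: decode the tensor, then continue the pipeline."""
    restored = code @ W_dec           # approximate reconstruction
    return np.maximum(restored, 0.0)  # stand-in for the remaining layers

x = rng.standard_normal((1, ACT_DIM))
code = device0_forward(x)             # shape (1, 128) crosses the link
out = device1_forward(code)           # shape (1, 1024) on the next device

# Bytes moved between chips shrink by ACT_DIM / CODE_DIM, i.e. 8x here.
print(code.shape, out.shape, ACT_DIM // CODE_DIM)
```

With a pipeline of such partitions, each inter-device transfer carries the small code rather than the full activation, which is the source of the claimed communication savings.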