Neural Network Layer Assignment for Distributed Inference via Integer Programming
Time: Tuesday, July 12th, 6pm - 7pm PDT
Location: Level 2 Lobby
Event Type
Networking Reception
Work-in-Progress Poster
Description: The rising availability of networked edge devices opens new opportunities for distributed artificial intelligence. This work proposes an Integer Linear Programming (ILP) optimization scheme that assigns the layers of a neural network to the available devices in a distributed setting with heterogeneous devices representing edge, hub, and cloud, so as to minimize the overall inference latency. The assignment is optimal with respect to our latency models, which capture aspects such as the external bandwidths of the devices for device-to-device communication across the network and the pre-loading of layer weights on a device while it waits to receive the results of an earlier layer.
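To make the assignment problem concrete, the following is a minimal sketch in Python. It is not the poster's ILP formulation or latency model: it swaps in an exhaustive search over all assignments (feasible only for tiny instances) and a simplified latency model. The assumptions are that layers run sequentially, that the input originates on device 0 (the edge), that a transfer between two devices is limited by the smaller of their external bandwidths, and that each device pre-loads the weights of its assigned layers from time 0, one after another, in parallel with computation and communication. All names and numbers are hypothetical.

```python
from itertools import product


def latency(assign, compute, in_size, bandwidth, load, source=0):
    """End-to-end latency of one inference for a layer-to-device assignment.

    assign[l]     -- device index chosen for layer l
    compute[l][d] -- compute time of layer l on device d
    in_size[l]    -- size of the data fed into layer l
    bandwidth[d]  -- external bandwidth of device d
    load[l][d]    -- time to load the weights of layer l on device d
    """
    loaded = {}    # per device: time at which its weights so far are ready
    t = 0.0        # time at which the current layer's input is available
    prev = source  # assumption: the input starts on the source device
    for layer, dev in enumerate(assign):
        # Move the data only if the previous stage ran on another device.
        if prev != dev:
            t += in_size[layer] / min(bandwidth[prev], bandwidth[dev])
        # Assumed pre-loading: weights load back to back from time 0.
        loaded[dev] = loaded.get(dev, 0.0) + load[layer][dev]
        # A layer starts once both its input and its weights are ready.
        t = max(t, loaded[dev]) + compute[layer][dev]
        prev = dev
    return t


def best_assignment(compute, in_size, bandwidth, load):
    """Exhaustive search; a stand-in for an ILP solver on toy instances."""
    n_layers, n_devices = len(compute), len(bandwidth)
    candidates = product(range(n_devices), repeat=n_layers)
    return min(
        (latency(a, compute, in_size, bandwidth, load), a) for a in candidates
    )


# Hypothetical instance: 3 layers; devices 0 = edge, 1 = hub, 2 = cloud.
compute = [[4, 2, 1], [6, 3, 1], [8, 4, 1]]   # time units per layer/device
in_size = [8.0, 2.0, 1.0]                     # data units fed into each layer
bandwidth = [1.0, 2.0, 4.0]                   # data units per time unit
load = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]      # weight-loading times

best_latency, best = best_assignment(compute, in_size, bandwidth, load)
print(best_latency, best)
```

On this toy instance the search keeps the first layer on the edge device, where the large input already resides, and moves the remaining two layers to the cloud. The search space grows as devices to the power of layers, which is why an ILP solver is the more practical choice for realistic networks.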