GNNIE: GNN Inference Engine with Load-balancing and Graph-specific Caching
TimeWednesday, July 13th1:30pm - 1:53pm PDT
Location3002, Level 3
AI/ML Design: Circuits and Architecture
DescriptionGraph neural networks (GNN) inferencing involves weighting vertex feature vectors, followed by aggregating weighted vectors over a vertex neighborhood. High and variable sparsity in the input vertex feature vectors, and high sparsity and power-law degree distributions in the adjacency matrix, can lead to (a) unbalanced loads and (b) inefficient random memory accesses. GNNIE ensures load-balancing by splitting features into blocks, proposing a flexible MAC architecture, and employing load (re)distribution. GNNIE's novel caching scheme bypasses the high costs of random DRAM accesses. GNNIE shows high speedups over CPUs/GPUs; it is faster and runs a broader range of GNNs than existing accelerators.