Optimizing Parallel PREM Compilation over Nested Loop Structures
Time: Thursday, July 14th, 3:50pm - 4:10pm PDT
Location: 3004, Level 3
Event Type: Research Manuscript
Time-Critical System Design
Embedded Systems
Description: We discuss automatic parallelization of computational kernels executed according to the Predictable Execution Model (PREM). We employ data analysis and loop tiling to split the kernel execution into segments, and we schedule the computation and memory phases of those segments across cores to avoid unpredictable contention in main memory. Our main observation is that properly selecting tile sizes is key to optimizing the makespan of the kernel. We thus propose a heuristic that efficiently searches for optimized tile sizes and core assignments over deeply nested loops. We demonstrate our approach on the PolyBench-NN benchmark suite, showing that it significantly outperforms the state of the art in PREM compilation.
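
To illustrate why tile size affects the makespan under PREM, the following is a minimal sketch and not the heuristic proposed in the manuscript. It assumes a toy cost model: a three-deep, matrix-multiply-style loop nest, memory phases that are strictly serialized across cores, compute phases that run in parallel, and a fixed round-robin core assignment. The names `footprint`, `makespan`, and `best_tile`, the cost parameters, and the exhaustive search are all hypothetical choices made for illustration.

```python
from itertools import product
from math import ceil, prod


def footprint(tile):
    """Assumed data footprint of one tile of C[i][j] += A[i][k] * B[k][j]."""
    ti, tj, tk = tile
    return ti * tk + tk * tj + ti * tj


def makespan(bounds, tile, cores, mem_cost=1.0, comp_cost=1.0):
    """Toy PREM makespan: memory phases never overlap, compute phases may."""
    n_tiles = prod(ceil(n / t) for n, t in zip(bounds, tile))
    mem_time = mem_cost * footprint(tile)   # time to load one tile
    comp_time = comp_cost * prod(tile)      # one time unit per iteration
    mem_free = 0.0                          # when main memory is next free
    core_free = [0.0] * cores               # when each core is next free
    for i in range(n_tiles):
        c = i % cores                       # round-robin assignment (assumption)
        start = max(mem_free, core_free[c])
        mem_free = start + mem_time         # exclusive access to main memory
        core_free[c] = mem_free + comp_time
    return max(core_free)


def best_tile(bounds, cores, capacity):
    """Exhaustively try tile sizes that divide the bounds and fit in local memory."""
    divisors = [[t for t in range(1, n + 1) if n % t == 0] for n in bounds]
    feasible = [t for t in product(*divisors) if footprint(t) <= capacity]
    return min(feasible, key=lambda t: makespan(bounds, t, cores))
```

Calling `best_tile((4, 4, 4), cores=2, capacity=48)` returns the feasible tile with the smallest modeled makespan. In this model, small tiles pay for more memory phases in total, while large tiles leave cores idle during long serialized loads, which is the trade-off a tile-size search has to balance. Exhaustive enumeration grows quickly with loop depth, so it only serves as a baseline for small nests.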