Hardware/Software Co-design with High-Level Synthesis
Time: Monday, July 11th, 1:30pm - 5pm PDT
Location: 3005, Level 3
Description: Software has many desirable characteristics, but it is slow and energy-hungry compared to dedicated hardware. When an algorithm implemented in software cannot meet its performance or power requirements, moving the critical functions into hardware can improve the speed and energy efficiency of the system. High-Level Synthesis (HLS) takes a description of an algorithm, typically written in C or C++, and produces a synthesizable RTL description suitable for implementation. Because software source code is already close to an algorithmic description, HLS offers a practical and easy way to migrate software functions to hardware.
This tutorial will demonstrate this transformation on an example algorithm, taking it from a full software implementation to a high-performance, efficient implementation that mixes hardware and software elements. The example is a neural-network-based “wake word” detection algorithm, commonly used to “wake” a personal assistant or other device. Because a wake word algorithm must run continuously, even while the device is hibernating, power consumption is critical, especially if the device is battery powered.
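As a rough illustration of the kind of C++ an HLS tool can consume, here is a minimal sketch of one fully connected layer with a ReLU activation, the sort of building block a neural-network wake word detector might contain. It is not the tutorial's code: the function name `dense_relu`, the layer sizes, and the 8-bit quantization are hypothetical choices for illustration. It uses only standard C++; real HLS flows typically add tool-specific directives and fixed-point datatypes, which are omitted here.

```cpp
#include <cstdint>

// Hypothetical layer size, for illustration only.
constexpr int kInputs = 4;
constexpr int kOutputs = 2;

// Fully connected layer with ReLU, written in a style HLS tools generally
// accept: fixed-size arrays, compile-time loop bounds, no heap allocation,
// no recursion, and integer (quantized) arithmetic instead of floating point.
void dense_relu(const std::int8_t in[kInputs],
                const std::int8_t weights[kOutputs][kInputs],
                const std::int32_t bias[kOutputs],
                std::int32_t out[kOutputs]) {
  for (int o = 0; o < kOutputs; ++o) {
    // 32-bit accumulator so sums of 8-bit products do not overflow.
    std::int32_t acc = bias[o];
    for (int i = 0; i < kInputs; ++i) {
      acc += static_cast<std::int32_t>(in[i]) *
             static_cast<std::int32_t>(weights[o][i]);
    }
    out[o] = acc > 0 ? acc : 0;  // ReLU activation
  }
}
```

Compiled as software, this is an ordinary function that runs on a processor. Given the same source, an HLS tool can map the inner loop onto parallel multiply-accumulate hardware, with the degree of unrolling and pipelining usually set through tool directives.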
The algorithm will start as a full software implementation running on a RISC-V core. The base design will be built using the Embedded Scalable Platforms (ESP) project developed at Columbia University. ESP is an open-source platform that supports research on the design and programming of heterogeneous SoC architectures. The algorithm will be characterized for performance and power consumption when implemented in a GlobalFoundries silicon process technology.
The algorithm will be profiled to determine the functions with the largest computational load. Using HLS, these functions will be compiled into accelerators described as synthesizable RTL. ESP’s accelerator design flow will be used to create the hardware and software interfaces for each accelerator and to integrate it into the ESP design. The accelerator designs will be taken through RTL synthesis and place and route for a detailed analysis of power, performance, and area (PPA). After synthesis, the design will be optimized for power in a GlobalFoundries silicon process technology. Because creating, integrating, and optimizing an accelerator is largely automated, the tutorial will show that iterating over several design alternatives is practical. PPA metrics will be collected for the complete SoC, including the processor, memory, interconnect, and accelerators. Various architectures and their PPA metrics will be presented and compared.
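The profiling step can be sketched on a host machine as follows. This is a minimal illustration, assuming simple wall-clock timing with `std::chrono`; it does not reflect the tooling the tutorial uses on the RISC-V target, and the helper names (`time_function`, `rank_hotspots`, `amdahl_speedup`) are hypothetical. It also applies Amdahl's law, a standard estimate not named in the original description, to show why the function with the largest share of run time is the natural first candidate for an accelerator.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <vector>

struct ProfileEntry {
  std::string name;
  double total_us;  // accumulated wall-clock time in microseconds
};

// Run fn `iterations` times and record the total wall-clock time.
ProfileEntry time_function(const std::string& name,
                           const std::function<void()>& fn,
                           int iterations) {
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) fn();
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::micro> elapsed = stop - start;
  return {name, elapsed.count()};
}

// Sort entries so the most expensive function comes first.
std::vector<ProfileEntry> rank_hotspots(std::vector<ProfileEntry> entries) {
  std::sort(entries.begin(), entries.end(),
            [](const ProfileEntry& a, const ProfileEntry& b) {
              return a.total_us > b.total_us;
            });
  return entries;
}

// Amdahl's law: overall speedup when a fraction p of the run time is
// accelerated by a factor s (ignores data-transfer and invocation overhead).
double amdahl_speedup(double p, double s) {
  assert(p >= 0.0 && p <= 1.0 && s > 0.0);
  return 1.0 / ((1.0 - p) + p / s);
}
```

For example, if one function accounts for 70% of the run time and its accelerator is 10× faster, `amdahl_speedup(0.7, 10.0)` gives roughly 2.7× overall, before any accelerator overhead. This is why profiling precedes the choice of what to move into hardware.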
All sources used in the tutorial example designs will be made available as open-source code.