Performance Optimization of Embedded FPGA
Time: Wednesday, July 13th, 5pm - 6pm PDT
Location: Level 2 Exhibit Hall
Engineering Track Poster
Description: Integrating an embedded FPGA (eFPGA) into an ASIC has significant advantages: it gives the chip extra flexibility, extends its life cycle, and reduces its power. Embedded FPGAs offer a wealth of resources, such as lookup tables (LUTs), block memory, and hardened DSP blocks, but mapping applications onto them properly while achieving high performance has always been a challenge. We use an open-source AES encryption design as an example to show how to do this mapping efficiently. First, without changing the design, we achieve 177 MHz at the worst corner under worst-case conditions in a TSMC 16nm process. This version of the design requires 100 dual-port block RAMs. FPGAs also have a rich set of writable LUTs, which can be used as distributed RAM. By changing the RTL slightly, adding pragmas that hint to the synthesis tool to map the AES S-box lookup tables into distributed RAM, we reduce the number of embedded FPGA tiles required and improve performance to 313 MHz. Furthermore, we apply placement grouping constraints to guide the tool, which increases performance further to 500 MHz. In this paper, we present techniques and novel methods for optimizing the performance of embedded FPGAs so that your SoCs can leverage the benefits of eFPGAs.
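To put the three reported clock frequencies in perspective, the short Python sketch below converts each one to a clock period and a speedup over the unmodified design. Only the three MHz figures come from the description; the step labels and the helper names `period_ns` and `speedup` are illustrative assumptions, not part of the authors' flow.

```python
# Worst-case clock frequencies quoted in the description, in MHz.
# The labels are shorthand chosen for this sketch.
reported_mhz = {
    "unmodified design (block RAM)": 177.0,
    "S-boxes in distributed RAM": 313.0,
    "plus placement grouping": 500.0,
}


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def speedup(freq_mhz, baseline_mhz):
    """Ratio of a clock frequency to the baseline clock frequency."""
    return freq_mhz / baseline_mhz


baseline = reported_mhz["unmodified design (block RAM)"]
for step, mhz in reported_mhz.items():
    print(
        f"{step}: {mhz:.0f} MHz, "
        f"period {period_ns(mhz):.2f} ns, "
        f"{speedup(mhz, baseline):.2f}x baseline"
    )
```

Under these figures, the distributed-RAM mapping is roughly a 1.77x improvement over the unmodified design, and adding placement grouping brings the total to roughly 2.82x, which corresponds to the clock period shrinking from about 5.65 ns to 2.00 ns.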