A 28 nm 1.3 TFLOPS/mm² Floating-Point SRAM-Based CIM Macro
A 28 nm 1.3 TFLOPS/mm² Floating-Point SRAM-Based CIM Macro With Asynchronous Normalization and Parallel Sorting Alignment for AI-Edge Chip
Abstract:
State-of-the-art AI edge devices require floating-point (FP) multiply-accumulate (MAC) operations with high energy efficiency and inference accuracy. FP computing-in-memory (FP-CIM) has a broader range of applications than integer CIM. However, FP-CIM can incur greater power, delay, and area overheads than integer CIM because of the inherent complexity of the FP computational flow. In this article, we introduce a new method for asynchronous exponent normalization and parallel mantissa alignment. This approach allows exponents to be added and the maximum exponent sum to be found simultaneously. We also replace the traditional subtraction and shifting for mantissa alignment with a cross-structure maximum-finding method, enabling FP-CIM to be achieved with lower delay, area, and power overheads. The macro is designed in a TSMC 28 nm process, with a memory size of 6 Kb, a layout area of 0.067 mm², and an area efficiency of 1.3 TFLOPS/mm². Simulation results show that, while performing FP-MAC operations at 900 mV, the macro reaches a computational frequency of 150 MHz and an energy efficiency of 12.8 TFLOPS/W.
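For context, the conventional FP-MAC alignment flow that the abstract refers to (add the exponents, find the maximum exponent sum, then subtract and shift each mantissa product) can be sketched as a small behavioral model in Python. This is only an illustration of that baseline, not the macro's circuit or its cross-structure maximum-finding method. The names `decompose`, `fp_mac`, and `mant_bits`, as well as the 8-bit mantissa width, are assumptions made for this example and do not come from the paper.

```python
import math


def decompose(x, mant_bits=8):
    """Split x into (sign, exp, mant) so that x ~= sign * mant * 2**(exp - mant_bits).

    mant_bits is a hypothetical mantissa width chosen for illustration.
    """
    if x == 0.0:
        return 0, 0, 0
    frac, exp = math.frexp(abs(x))        # abs(x) = frac * 2**exp, 0.5 <= frac < 1
    mant = int(frac * (1 << mant_bits))   # truncate the fraction to mant_bits bits
    sign = -1 if x < 0 else 1
    return sign, exp, mant


def fp_mac(inputs, weights, mant_bits=8):
    """Behavioral model of a conventional FP dot product with mantissa alignment."""
    products = []
    for a, w in zip(inputs, weights):
        sa, ea, ma = decompose(a, mant_bits)
        sw, ew, mw = decompose(w, mant_bits)
        if ma == 0 or mw == 0:
            continue                      # zero operands contribute nothing
        # Step 1: add exponents and multiply mantissas for each input/weight pair.
        products.append((sa * sw, ea + ew, ma * mw))
    if not products:
        return 0.0

    # Step 2: find the maximum exponent sum across all products.
    e_max = max(e for _, e, _ in products)

    # Step 3: align each mantissa product by subtract-and-shift, then accumulate.
    acc = 0
    for sign, e_sum, m_prod in products:
        shift = e_max - e_sum             # the subtraction
        acc += sign * (m_prod >> shift)   # the right shift (low bits are dropped)

    # Step 4: scale the integer accumulator back to a floating-point value.
    return math.ldexp(acc, e_max - 2 * mant_bits)


if __name__ == "__main__":
    print(fp_mac([1.5, -2.0, 0.25], [2.0, 0.5, 4.0]))
```

In this model, steps 1 and 2 run one after the other, and step 3 needs one subtraction and one shift per product. According to the abstract, the proposed macro overlaps exponent addition with the maximum search and replaces the subtract-and-shift step, which is where it reports its delay, area, and power savings.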