DiP: A Scalable, Energy-Efficient Systolic Array for Matrix Multiplication Acceleration
Abstract:
Transformers are gaining increasing attention across Natural Language Processing (NLP) application domains due to their outstanding accuracy. However, these data-intensive models place significant performance demands on existing computing architectures. Systolic array architectures, adopted by commercial AI computing platforms such as Google TPUs, offer energy-efficient data reuse but incur throughput and energy penalties because their inputs and outputs must be synchronized through First-In-First-Out (FIFO) buffers. This paper proposes a novel scalable systolic array architecture featuring a Diagonal-Input and Permutated weight stationary (DiP) dataflow for matrix multiplication acceleration. The proposed architecture eliminates the synchronization FIFOs required by state-of-the-art weight stationary systolic arrays. Beyond the area, power, and energy savings achieved by eliminating these FIFOs, the DiP architecture maximizes computational resource utilization, achieving up to 50% throughput improvement over conventional weight stationary architectures. Analytical models are developed for both the weight stationary and DiP architectures, covering latency, throughput, time to full PE utilization (TFPU), and FIFO overhead. A comprehensive hardware design space exploration using a commercial 22nm technology demonstrates DiP’s scalability advantages, achieving up to a 2.02× improvement in energy efficiency per area. Furthermore, DiP outperforms TPU-like architectures on transformer workloads from widely used models, delivering energy improvements of up to 1.81× and latency improvements of up to 1.49×. At a size of 64 × 64 with 4096 PEs, DiP achieves a peak throughput of 8.192 TOPS with an energy efficiency of 9.548 TOPS/W.
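As a rough sanity check on the headline figures, the sketch below relates the PE count to the quoted peak throughput. It relies on two assumptions that the abstract does not state: each PE completes one multiply-accumulate (counted as 2 operations) per cycle, and the array runs at a 1 GHz clock. The function names are illustrative only.

```python
def peak_tops(rows: int, cols: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Peak throughput in TOPS for a rows x cols array of MAC units.

    Assumes every PE finishes one MAC per cycle (assumption, not from the abstract).
    """
    return rows * cols * ops_per_mac * clock_hz / 1e12


def implied_power_watts(tops: float, tops_per_watt: float) -> float:
    """Power implied by a throughput figure and an energy-efficiency figure."""
    return tops / tops_per_watt


ASSUMED_CLOCK_HZ = 1e9  # assumption: 1 GHz, not given in the abstract

tops = peak_tops(64, 64, ASSUMED_CLOCK_HZ)
power = implied_power_watts(tops, 9.548)  # 9.548 TOPS/W is quoted in the abstract

print(f"PEs: {64 * 64}")
print(f"Peak throughput under the assumptions: {tops:.3f} TOPS")
print(f"Implied power at peak: {power:.3f} W")
```

Under these assumptions the 4096 PEs give 8.192 TOPS, which matches the quoted peak, and the quoted efficiency implies a power draw a little under 0.86 W at peak.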