Factored Systolic Arrays Based on Radix-8 Multiplication for Machine Learning Acceleration
Abstract:
Systolic arrays (SAs) are regaining attention as the core of accelerators for machine learning workloads. This article shows that a large design space exists at the logic level despite the simple structure of SAs, and proposes two novel SAs based on factoring and radix-8 multipliers. The first factored SA (FSA) extracts the Booth encoding and the hard-multiple generation, which are common across all processing elements (PEs), out of the PEs, reducing the delay and area of the whole SA. This factoring comes at the cost of an increased number of registers; however, the reduced pipeline register requirement of radix-8 offsets this effect. The second proposed FSA compresses the interconnections further by splitting the hard-multiple addition into two steps: in the first step, carries are computed column-wise outside the PEs, and in the second step, these early-generated carries are used for the final hard-multiple addition inside the PEs. We call this design the hard-multiple carry portioned (HCP) FSA. The first proposed factored 16-bit multiplier achieves up to 15%, 13%, and 23% better delay, area, and power, respectively, than radix-4 multipliers, even when the register overhead is included. The first proposed FSA architecture improves delay, area, and power by up to 11%, 20%, and 31%, respectively, across different bitwidths compared with the conventional radix-4 SA. In addition, the HCP FSA eliminates the additional register overhead associated with the first proposed FSA by reducing interconnections, and shows further reductions of up to 11.7% in area and up to 16.7% in power with little increase in delay for various SA sizes.
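To make the factoring idea concrete, the following is a minimal behavioral sketch in Python, not the authors' hardware design. It assumes that one operand of each product is Booth-encoded once and that the multiples of the other operand, including the hard multiple 3x, are generated once, so that each PE only selects, shifts, and accumulates. The function names (`booth_radix8_digits`, `multiples`, `pe_multiply`, `factored_matmul`) and the choice of which operand is encoded are illustrative assumptions; the abstract does not specify them.

```python
def booth_radix8_digits(y, bits=16):
    """Recode a signed `bits`-wide integer into radix-8 Booth digits in [-4, 4]."""
    assert -(1 << (bits - 1)) <= y < (1 << (bits - 1))
    n_digits = -(-bits // 3)  # ceil(bits / 3)

    def bit(k):
        # Python's >> on negative ints is arithmetic, so high bits sign-extend.
        return 0 if k < 0 else (y >> k) & 1

    # Each digit looks at an overlapping 4-bit window of y.
    return [-4 * bit(3 * i + 2) + 2 * bit(3 * i + 1) + bit(3 * i) + bit(3 * i - 1)
            for i in range(n_digits)]


def multiples(x):
    """Multiples of x needed by radix-8 Booth; only 3x needs a real addition."""
    return {0: 0, 1: x, 2: x << 1, 3: x + (x << 1), 4: x << 2}


def pe_multiply(digits, mults):
    """Work left inside a PE (assumed): select a multiple, negate, shift, add."""
    acc = 0
    for i, d in enumerate(digits):
        pp = mults[abs(d)]
        acc += (pp if d >= 0 else -pp) << (3 * i)
    return acc


def factored_matmul(A, B, bits=16):
    """Matrix product where encoding and multiple generation happen once per operand."""
    enc = [[booth_radix8_digits(a, bits) for a in row] for row in A]  # outside the PEs
    mul = [[multiples(b) for b in row] for row in B]                  # outside the PEs
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(pe_multiply(enc[i][t], mul[t][j]) for t in range(k))
             for j in range(m)] for i in range(n)]


A = [[3, -7], [12345, -32768]]
B = [[5, -1], [-32768, 32767]]
reference = [[sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2)]
             for i in range(2)]
result = factored_matmul(A, B)
```

In this sketch a 16-bit operand is recoded into 6 radix-8 digits, compared with 8 digits for radix-4 Booth recoding, so each PE accumulates fewer partial products. The price is the 3x multiple, which cannot be formed by shifting alone; computing it once and reusing it is the kind of sharing the factored design exploits.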