HSA: An Efficient Sparse CNN Accelerator Based on Kernel-Aware Hybrid Pruning
Abstract:
The deployment of large-scale convolutional neural networks (CNNs) on hardware often results in reduced execution efficiency and increased hardware overhead. Prior works have addressed this challenge by employing pruning techniques to reduce parameters and computational loads. Among these techniques, fine-grained N:M pruning methods achieve high sparsity rates but at the cost of generating substantial indexing overhead and potential workload imbalance. In contrast, pattern-based pruning methods reduce indexing complexity and achieve workload balancing but suffer from low sparsity. To overcome these limitations, this article proposes a kernel-aware hybrid pruning (KAHP) method that simultaneously attains high sparsity, workload balancing, and reduced indexing overhead. Moreover, to accelerate the inference of the models pruned by the KAHP method, we design an efficient accelerator based on the systolic array architecture. Experimental results demonstrate that, compared to the N:M pruning method, the KAHP method achieves workload balancing and reduces the number of indices by up to 88.7% on ResNet-20. Compared to pattern-based pruning methods, the KAHP method achieves up to 31.8× sparsity. The proposed accelerator achieves up to 93.18 GOPS/W power efficiency and 0.721 GOPS/DSP computation efficiency.
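To make the indexing overhead of fine-grained N:M pruning concrete, the following is a minimal sketch of generic magnitude-based N:M pruning on a flat weight list. It is not the KAHP method, whose details the abstract does not give; the function name `nm_prune` and its parameters are hypothetical and chosen only for illustration. The sketch keeps the `n` largest-magnitude weights in every group of `m` and records one position index per kept weight, which is the per-weight metadata that a sparse accelerator would have to store and decode.

```python
def nm_prune(weights, n=2, m=4):
    """Generic N:M magnitude pruning (illustrative, not KAHP).

    Keeps the n largest-magnitude weights in each group of m consecutive
    weights and zeroes the rest. Returns the pruned weights and, for each
    group, the positions of the kept weights.
    """
    if len(weights) % m != 0:
        raise ValueError("number of weights must be a multiple of m")

    pruned, indices = [], []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Rank positions by magnitude and keep the top n, in position order.
        by_magnitude = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = sorted(by_magnitude[:n])
        pruned.extend(group[i] if i in keep else 0.0 for i in range(m))
        indices.append(keep)
    return pruned, indices


weights = [0.1, -0.9, 0.3, 0.05, 0.7, 0.2, -0.6, 0.0]
pruned, indices = nm_prune(weights, n=2, m=4)
index_count = sum(len(group) for group in indices)  # one index per kept weight
```

In this sketch every kept weight carries its own index, so the index count grows with the number of nonzero weights. Pattern-based schemes instead share one pattern identifier across a whole kernel, which is the trade-off between indexing cost and achievable sparsity that the abstract describes.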