TEA-SPS: A Tiny and Efficient Architecture for Softmax With Parallelism and Sparsity Adaptability
Abstract:
Transformer-based networks perform remarkably well across multiple fields, and the softmax operations within them demand a growing share of computational resources, so hardware accelerators must support softmax in Transformers. However, because algorithm and hardware are rarely co-designed, there is still room to optimize hardware architectures for softmax. TEA-SPS is therefore proposed as an algorithm and hardware co-designed architecture that improves softmax with two methods: Configurable Parallelism softmax with Sparse mask Strategy (CPSS) and Specific Piecewise Information Extractor (SPIE). CPSS supports different throughput requirements through configurable data-level parallelism and applies sparse masking to the outputs to reduce the computational load and memory access of subsequent operations. To explore the optimal solution set within the design space of CPSS parameters, SPIE is proposed to co-optimize accuracy and hardware overhead. Based on these two methods, the efficient hardware architecture of TEA-SPS is proposed. Implementation results show that at a frequency of 0.5 GHz under TSMC 90-nm technology, the peak efficiency of TEA-SPS processing 8-bit quantized data reaches 216.97 Gps/(mm² · mW), with an area of 3290.21 µm² and a power consumption of 0.7004 mW. In addition, TEA-SPS supports input sequences of arbitrary length with negligible accuracy loss compared to the quantized baseline, while achieving an average sparse rate of 68.6% on the GLUE tasks.
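To make the two ideas behind CPSS more concrete, the following is a minimal software sketch, not the paper's hardware design. It consumes the input `parallelism` elements at a time while tracking a running maximum and a rescaled running sum, so the sequence length can be arbitrary, and it then zeroes outputs that fall below a cutoff. The function name `chunked_sparse_softmax`, the `threshold` parameter and its default value are assumptions made for illustration. The abstract does not specify the masking rule, and the 8-bit quantization and the SPIE piecewise approximation are not modeled here.

```python
import math


def chunked_sparse_softmax(scores, parallelism=4, threshold=1e-2):
    """Softmax over `scores`, read `parallelism` elements per step.

    Outputs smaller than `threshold` (a hypothetical cutoff) are masked
    to 0.0. Returns (probabilities, sparse_rate).
    """
    if not scores:
        return [], 0.0

    running_max = -math.inf
    running_sum = 0.0

    # Pass 1: stream the input chunk by chunk, keeping a running maximum
    # and a sum of exponentials expressed relative to that maximum.
    for start in range(0, len(scores), parallelism):
        chunk = scores[start:start + parallelism]
        new_max = max(running_max, max(chunk))
        if running_sum > 0.0:
            # Rescale the partial sum when the maximum changes.
            running_sum *= math.exp(running_max - new_max)
        running_sum += sum(math.exp(x - new_max) for x in chunk)
        running_max = new_max

    # Pass 2: normalize each element and apply the sparse mask.
    probs = []
    for start in range(0, len(scores), parallelism):
        chunk = scores[start:start + parallelism]
        for x in chunk:
            p = math.exp(x - running_max) / running_sum
            probs.append(p if p >= threshold else 0.0)

    sparse_rate = probs.count(0.0) / len(probs)
    return probs, sparse_rate


if __name__ == "__main__":
    example = [2.0, 1.0, 0.1, -3.0, -5.0, 0.5]
    for p in (1, 2, 4):
        out, rate = chunked_sparse_softmax(example, parallelism=p)
        print(p, [round(v, 4) for v in out], round(rate, 3))
```

In this sketch the chunk width changes only how many elements are read per step, not the result, which mirrors the role of configurable data-level parallelism as a throughput knob. The masked zeros are what a downstream operation could skip.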