A Reconfigurable Processing Element for Multiple-Precision Floating/Fixed-Point HPC
Abstract:
High-performance computing (HPC) can facilitate deep neural network (DNN) training and inference. Previous works have proposed multiple-precision floating-point and fixed-point designs, but most handle only one of the two number formats. This brief proposes a novel reconfigurable processing element (PE) that supports energy-efficient multiply-accumulate (MAC) operations in both floating-point and fixed-point formats. The PE supports 9× BFloat16 (BF16), 4× half-precision (FP16), 4× TensorFloat-32 (TF32) and 1× single-precision (FP32) MAC operations in one clock cycle with 100% multiplication hardware utilization. It also supports 72× INT2, 36× INT4 and 9× INT8 dot products plus one 32-bit addend. The design is realized in a 28 nm process at a slow-corner clock frequency of 1.471 GHz. Compared with state-of-the-art (SOTA) multiple-precision PEs, the proposed work exhibits the best energy efficiency for deep learning training: 834.35 GFLOPS/W at TF32 and 1761.41 GFLOPS/W at BF16, improvements of at least 10× and 4×, respectively. The design also supports energy-efficient fixed-point computing for deep learning inference with a small hardware overhead.
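To make the fixed-point modes concrete, the following is a minimal behavioral sketch in Python of a dot product plus one 32-bit addend, using the lane counts stated in the abstract (72 for INT2, 36 for INT4, 9 for INT8). It is not the authors' hardware design. The function names, the signed two's-complement operand interpretation, and the wrap-around of the result to 32 bits are assumptions made for illustration only; the abstract does not specify signedness or overflow behavior.

```python
# Lane counts per fixed-point mode, taken from the abstract.
LANES = {2: 72, 4: 36, 8: 9}


def wrap_signed(value, bits):
    """Wrap an integer into the signed two's-complement range of `bits` bits."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


def dot_plus_addend(a, b, addend, bits):
    """Behavioral model of one fixed-point operation: sum(a[i] * b[i]) + addend.

    Assumptions (not stated in the abstract): operands are signed
    two's-complement integers, and the result wraps to 32 bits.
    """
    lanes = LANES[bits]
    if len(a) != lanes or len(b) != lanes:
        raise ValueError(f"INT{bits} mode expects {lanes} operand pairs")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    for x in (*a, *b):
        if not lo <= x <= hi:
            raise ValueError(f"{x} does not fit in a signed INT{bits}")
    total = addend + sum(x * y for x, y in zip(a, b))
    return wrap_signed(total, 32)


if __name__ == "__main__":
    # Nine INT8 products of 1 * 2, plus an addend of 10.
    print(dot_plus_addend([1] * 9, [2] * 9, 10, bits=8))  # 28
```

A model like this is mainly useful as a golden reference when checking a hardware implementation; the actual PE computes all lanes of a mode in parallel in one clock cycle rather than iterating over them.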