Long short-term memory (LSTM) networks address the shortcomings of recurrent neural networks, such as vanishing gradients and the inability to capture dependencies across distant parts of a sequence. However, state-of-the-art LSTM implementations face a computational bottleneck: multiple high-order matrix-vector multiplications (MVMs). This article presents a generalized approach to accelerating circulant MVMs (C-MVMs) and is therefore applicable to many neural networks. The proposed scheme introduces a novel low-complexity distributed arithmetic (DA) architecture for optimizing C-MVMs. Unlike conventional offset-binary-coding-based DA (OBC-DA), it is based on the separate generation and selection of partial products: only one partial product generator (PPG) is required, together with several partial product selectors (PPSs). The complexity of the PPSs is reduced by sharing minterms across Boolean expressions, and fine-grained pipelining is employed to achieve a critical path of approximately one adder delay. Implementation results show that, for a 512 × 512 LSTM layer, the proposed design occupies 74.54% less core area, consumes 68.66% less core power, and offers 2.61 times higher throughput and 3.89 times better hardware efficiency than the best existing design.
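The following is a minimal reference-model sketch in Python (an assumption for illustration only; the actual design is an RTL hardware architecture). It shows the two ideas the abstract names: a circulant MVM, where every row reuses one cyclically shifted coefficient vector, and a LUT-style DA inner product, where all partial products are generated once (one PPG) and then only selected per bit plane (the role of the PPSs). Function names such as circulant_mvm, da_partial_products, and da_inner_product are illustrative, not from the paper.

```python
import numpy as np

def circulant_mvm(c, x):
    """Circulant MVM: every row is a cyclic shift of the same coefficient
    vector c, so y[i] = sum_k c[(k - i) mod N] * x[k]."""
    N = len(c)
    return np.array([sum(c[(k - i) % N] * x[k] for k in range(N))
                     for i in range(N)])

def da_partial_products(c):
    """Precompute all 2^K subset sums of the coefficients (the DA partial
    products). In hardware this corresponds to the single PPG."""
    K = len(c)
    return [sum(c[k] for k in range(K) if (addr >> k) & 1)
            for addr in range(1 << K)]

def da_inner_product(ppg, x_bits, num_bits):
    """Bit-serial DA inner product over two's-complement inputs.
    x_bits[b] is the K-bit address formed by bit b of every input word;
    selecting ppg[addr] per bit plane plays the role of a PPS."""
    acc = 0
    for b in range(num_bits):
        term = ppg[x_bits[b]]
        sign = -1 if b == num_bits - 1 else 1  # MSB plane has negative weight
        acc += sign * term * (1 << b)
    return acc

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = rng.integers(-8, 8, size=4)        # one shared coefficient row
    x = rng.integers(-128, 128, size=4)    # 8-bit two's-complement inputs
    B = 8
    # Bit planes: address b collects bit b of every input word.
    x_bits = [sum(((int(x[k]) >> b) & 1) << k for k in range(len(x)))
              for b in range(B)]
    ppg = da_partial_products(c)
    assert da_inner_product(ppg, x_bits, B) == int(np.dot(c, x))
    print("DA inner product matches the direct dot product")
```

Because a circulant matrix has only one distinct coefficient row, the same partial-product table can serve every output, which is why a single PPG shared by several PPSs suffices in the proposed architecture.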
Software Implementation:
ModelSim
Xilinx
Low-Complexity Distributed-Arithmetic-Based
Pipelined Architecture for an LSTM Network