Abstract:
For neural network (NN) applications at the edge of AI, computing-in-memory (CIM) demonstrates promising energy efficiency. However, when the network size grows to fulfill the accuracy requirements of increasingly complicated application scenarios, significant memory consumption becomes an issue. Model pruning is a typical compression approach to this problem, but it does not fully exploit the energy-efficiency advantage of conventional CIMs, because of the dynamic distribution of sparse weights and the increased data-movement energy consumed by reading sparsity indexes from outside the chip. Therefore, we propose a vector-wise dynamic-sparsity controlling and computing in-memory structure (DS-CIM) that accomplishes both sparsity control and computation of weights in SRAM, to improve the energy efficiency of vector-wise sparse pruning models. Implemented in a 65 nm CMOS process, the proposed DS-CIM macro saves up to 50.4% of computational energy consumption in measurements, while preserving the accuracy of vector-wise pruning models. The test chip achieves 87.88% accuracy on the CIFAR-10 dataset with 4-bit inputs and weights, and an energy efficiency of 530.2 TOPS/W (normalized to 1 bit).
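For readers unfamiliar with the pruning scheme the abstract refers to, below is a minimal sketch of vector-wise pruning. It assumes the common formulation in which each weight row is split into fixed-length sub-vectors and the vectors with the smallest L2 norms are zeroed as whole units; the function name, vector length, and keep ratio are illustrative choices, not taken from the paper.

```python
import numpy as np

def vector_wise_prune(weights, vector_len=8, keep_ratio=0.5):
    """Zero out entire weight sub-vectors with the smallest L2 norms.

    weights: 2D array (out_channels, in_channels); illustrative only.
    vector_len: number of consecutive weights grouped into one vector.
    keep_ratio: fraction of vectors kept per row.
    """
    out_ch, in_ch = weights.shape
    assert in_ch % vector_len == 0, "in_ch must be divisible by vector_len"
    pruned = weights.copy()
    n_vec = in_ch // vector_len
    n_keep = max(1, int(n_vec * keep_ratio))
    for r in range(out_ch):
        vecs = pruned[r].reshape(n_vec, vector_len).copy()
        norms = np.linalg.norm(vecs, axis=1)
        # Drop the vectors with the smallest norms, keeping n_keep per row
        drop = np.argsort(norms)[: n_vec - n_keep]
        vecs[drop] = 0.0
        pruned[r] = vecs.reshape(-1)
    return pruned

# Example: 4 output channels, 16 inputs, keep half of the 8-wide vectors
w = np.random.randn(4, 16).astype(np.float32)
w_sparse = vector_wise_prune(w, vector_len=8, keep_ratio=0.5)
print((w_sparse == 0).mean())  # ~0.5 sparsity
```

Because entire sub-vectors are either kept or dropped, only one index per vector (rather than per weight) must be stored; the paper's contribution is to manage these indexes and the sparse computation inside the SRAM macro instead of fetching them from off-chip.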