A Model of Dual Fabry-Perot Etalon-Based External-Cavity Tunable Laser Us...
Internal motion within pulsating pure-quartic soliton molecules in a fibe...
Enhanced light emission of germanium light-emitting-diode on 150 mm germa...
The Fabrication of GaN Nanostructures Using Cost-Effective Methods for Ap...
Negative-to-Positive Tunnel Magnetoresistance in van der Waals Fe3GeTe2/C...
Quantum Light Source Based on Semiconductor Quantum Dots: A Review
A High-Reliability RF MEMS Metal-Contact Switch Based on Al-Sc Alloy
Development of a Mode-Locked Fiber Laser Utilizing a Niobium Diselenide S...
Development of Multiple Fano-Resonance-Based All-Dielectric Metastructure...
Traffic Vibration Signal Analysis of DAS Fiber Optic Cables with Differen...
官方微信
友情链接

A 12.1 TOPS/W Quantized Network Acceleration Processor With Effective-Weight-Based Convolution and Error-Compensation-Based Prediction

2021-12-30

 

Author(s): Mo, HY (Mo, Huiyu); Zhu, WP (Zhu, Wenping); Hu, WJ (Hu, Wenjing); Li, Q (Li, Qiang); Li, A (Li, Ang); Yin, SY (Yin, Shouyi); Wei, SJ (Wei, Shaojun); Liu, LB (Liu, Leibo)

Source: IEEE JOURNAL OF SOLID-STATE CIRCUITS DOI: 10.1109/JSSC.2021.3113569 Early Access Date: OCT 2021

Abstract: In this article, a quantized network acceleration processor (QNAP) is proposed to efficiently accelerate CNN processing by eliminating most unessential operations based on algorithm-hardware co-optimizations. First, an effective-weight-based convolution (EWC) is proposed to distinguish a group of effective weights (EWs) to replace the other unique weights. Therefore, the input activations corresponding to the same EW can be accumulated first and then multiplied by the EW to reduce amounts of multiplication operations, which is efficiently supported by the dedicated process elements in QNAP. The experimental results show that energy efficiency is improved by 1.59x-3.20x compared with different UCNN implementations. Second, an error-compensation-based prediction (ECP) method adopts trained compensated values to replace partly unimportant partial sums to further reduce potentially redundant addition operations caused by the ReLU function. Compared with SnaPEA and Pred on AlexNet, 1.23x and 1.75x higher energy efficiencies (TOPS/W) are achieved by ECP, respectively, with marginal accuracy loss. Third, the residual pipeline mode is proposed to efficiently implement residual blocks with a 1.5x lower memory footprint, a 1.18x lower power consumption, and a 13.15% higher hardware utilization on average than existing works. Finally, the QNAP processor is fabricated in the TSMC 28-nm CMOS process with a core area of 1.9 mm(2). Benchmarked with AlexNet, VGGNet, GoogLeNet, and ResNet on ImageNet at 470 MHz and 0.9 V, the processor achieves 117.4 frames per second with 131.6-mW power consumption on average, which outperforms the state-of-the-art processors by 1.77x-24.20x in energy efficiency.

Accession Number: WOS:000732394400001

ISSN: 0018-9200

eISSN: 1558-173X

Full Text: https://ieeexplore.ieee.org/document/9570111



关于我们
下载视频观看
联系方式
通信地址

北京市海淀区清华东路甲35号(林大北路中段) 北京912信箱 (100083)

电话

010-82304210/010-82305052(传真)

E-mail

semi@semi.ac.cn

交通地图
版权所有 中国科学院半导体研究所

备案号:京ICP备05085259-1号 京公网安备110402500052 中国科学院半导体所声明