
Multi-distribution noise quantisation: an extreme compression scheme for transformer according to parameter distribution

2022-01-27

 

Author(s): Yu, ZY (Yu, Zaiyang); Li, S (Li, Shuang); Sun, LJ (Sun, Linjun); Liu, L (Liu, Liang); Wang, HN (Wang, Haining)

Source: CONNECTION SCIENCE DOI: 10.1080/09540091.2021.2024510 Early Access Date: JAN 2022

Abstract: With the development of deep learning, neural networks are widely used in various fields, and the improved model performance also brings a considerable number of parameters and computations. Model quantisation is a technique that converts floating-point computation into low-bit fixed-point computation, which can effectively reduce model computation intensity, parameter size, and memory consumption, but often brings a considerable loss of accuracy. This paper mainly addresses the problem that the distribution of parameters becomes too concentrated during quantisation-aware training (QAT). In the QAT process, we use a piecewise function to collect statistics of the parameter distributions and, based on the statistical results, simulate the effect of quantisation noise in each round of training. Experimental results show that by quantising the Transformer network, we lose little precision and significantly reduce the storage cost of the model; compared with a full-precision LSTM network, our model achieves higher accuracy at a similar storage cost. Meanwhile, compared with other quantisation methods on language-modelling tasks, our approach is more accurate. We validated the effectiveness of our policy on the WikiText-103 and Penn Treebank datasets. The experiments show that our method drastically compresses the storage cost while maintaining high model performance.
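The abstract describes simulating quantisation noise during QAT based on piecewise statistics of the parameter distribution. The record does not give the exact procedure, so the following is only a minimal sketch of generic fake-quantisation with distribution-based clipping in PyTorch; the percentile clipping range, the 4-bit width, and the straight-through estimator are illustrative assumptions, not the authors' method.

import torch

def fake_quantise(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    # Derive a clipping range from simple percentile statistics of the
    # weight distribution (a stand-in for the paper's piecewise function).
    lo = torch.quantile(w, 0.01)
    hi = torch.quantile(w, 0.99)
    scale = (hi - lo) / (2 ** num_bits - 1)
    # Clip, snap to the nearest low-bit level, and map back to float,
    # which injects the simulated quantisation noise into the forward pass.
    q = torch.round((torch.clamp(w, lo, hi) - lo) / scale) * scale + lo
    # Straight-through estimator: the forward pass sees the quantised
    # weights, the backward pass treats the rounding as the identity.
    return w + (q - w).detach()

# Usage: apply the simulated quantisation to a layer's weights in each
# training step, then optimise the full-precision weights as usual.
layer = torch.nn.Linear(16, 16)
x = torch.randn(8, 16)
y = torch.nn.functional.linear(x, fake_quantise(layer.weight), layer.bias)
loss = y.pow(2).mean()
loss.backward()  # gradients flow back to the full-precision weights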

Accession Number: WOS:000743426400001

ISSN: 0954-0091

eISSN: 1360-0494

Full Text: https://www.tandfonline.com/doi/full/10.1080/09540091.2021.2024510


