Journal of Graphics ›› 2025, Vol. 46 ›› Issue (4): 709-718. DOI: 10.11996/JG.j.2095-302X.2025040709

• Image Processing and Computer Vision •

A post-training quantization method for lightweight CNNs

YANG Jie1, LI Cong1, HU Qinghao2, CHEN Xianda1, WANG Yunpeng1, LIU Xiaojing1

  1. State Grid Jinan Power Supply Company, Jinan Shandong 250012, China
    2. The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2024-10-05 Revised: 2024-12-13 Online: 2025-08-30 Published: 2025-08-11
  • Contact: HU Qinghao
  • About author:

    YANG Jie (1989-), senior engineer, master's degree. His main research interests include equipment operation and inspection. E-mail: 18753137902@139.com

  • Supported by:
    The Science and Technology Project of State Grid Shandong Electric Power Company(52060122000Q)

Abstract:

Current post-training quantization methods can achieve near-lossless quantization at high bit-widths; for lightweight convolutional neural networks (CNNs), however, the quantization error remains non-negligible, especially at low bit-widths (below 4 bits). To address this, a post-training quantization method for lightweight CNNs, called block-level BatchNorm learning (BBL), was proposed. Unlike current post-training quantization methods, which merge the batch normalization layers into the preceding convolutions, this method retained the batch normalization layers on a per-block basis and learned both the quantized model parameters and the batch normalization parameters from a block-level feature-map reconstruction loss, while also updating the mean and variance statistics of the batch normalization layers. In this simple and effective manner, the method mitigated the distribution shift caused by low-bit quantization of lightweight CNNs. Furthermore, to reduce overfitting of post-training quantization to the calibration dataset, a block-level data augmentation scheme was constructed that prevented different model blocks from learning from the same batch of calibration data. Extensive experiments on the ImageNet dataset demonstrated that, compared with current post-training quantization algorithms, BBL improved accuracy by up to 7.72 percentage points and effectively reduced the quantization error caused by low-bit post-training quantization of lightweight CNNs.
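To make the two core ingredients of the abstract concrete, the following is a minimal, self-contained sketch of (a) symmetric uniform low-bit quantization and (b) a block-level feature-map reconstruction loss computed on a calibration batch. All names (`quantize`, `block_output`, `reconstruction_loss`) and the toy one-layer "block" are illustrative assumptions, not the paper's actual implementation, which operates on full CNN blocks with retained BatchNorm layers.

```python
def quantize(weights, bits):
    """Symmetric uniform quantization of a weight vector to `bits` bits.

    The scale maps the largest-magnitude weight onto the integer grid
    [-(2^(bits-1) - 1), 2^(bits-1) - 1]; each weight is rounded to the
    nearest grid point and mapped back to floating point.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4 bits
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) * scale for w in weights]

def block_output(weights, x):
    """Toy 'block': a dot product standing in for conv + BN + activation."""
    return sum(w * xi for w, xi in zip(weights, x))

def reconstruction_loss(weights, q_weights, calib_batch):
    """Block-level MSE between full-precision and quantized block outputs,
    averaged over a batch of calibration inputs."""
    errs = [(block_output(weights, x) - block_output(q_weights, x)) ** 2
            for x in calib_batch]
    return sum(errs) / len(errs)

# Hypothetical full-precision weights and a tiny calibration batch.
weights = [0.31, -0.52, 0.87, -0.14]
calib = [[1.0, 0.5, -0.3, 0.8], [0.2, -1.1, 0.4, 0.6]]

w4 = quantize(weights, 4)   # 4-bit grid: small reconstruction error
w2 = quantize(weights, 2)   # 2-bit grid: much larger reconstruction error
```

In a BBL-style setup, this reconstruction loss would be minimized per block with respect to the quantized weights and the retained BatchNorm parameters, rather than merely measured as it is here; the sketch only shows why lower bit-widths drive the block-level error up.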

Key words: deep neural network compression, post-training quantization, low-bit quantization, lightweight convolutional neural networks, lightweight intelligence
