Lightweight human pose estimation algorithm by integrating CA and BiFPN

doi:10.11996/JG.j.2095-302X.2023050868

Abstract

Existing heatmap-based human pose estimation networks are complex, computationally demanding, and difficult to deploy on embedded platforms and UAV mobile platforms. To address these problems, a lightweight, heatmap-free human pose estimation network was proposed based on YOLOv5s6-Pose-ti-lite. First, the backbone network was replaced with GhostNet, which produced more effective feature information with fewer computing resources, speeding up detection and alleviating network redundancy. Second, a lightweight coordinate attention (CA) module was integrated into the backbone network to embed the positional information of human keypoints in the image into the channel attention, thus enhancing feature extraction. Third, a weighted bidirectional feature pyramid network (BiFPN) module was introduced to strengthen the feature fusion ability of the model and balance feature information across different scales. Finally, the CIoU loss function was replaced with Wise-IoU (WIoU) to improve the performance of the model on human keypoint regression. The results demonstrated that on the COCO2017 human keypoint dataset, the optimized network model reduced the number of parameters by 26.2% and the computational cost by 30.0%, while increasing the average precision by 1.7 percentage points and the average recall by 2.7 percentage points. The improved model could also meet real-time requirements, verifying the feasibility and effectiveness of the proposed model.
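The weighted feature fusion attributed to BiFPN can be illustrated with a minimal sketch. It assumes the fast normalized fusion rule described in the EfficientDet paper, O = Σ wᵢ·Iᵢ / (ε + Σ wⱼ) with wᵢ ≥ 0, and it is not the authors' implementation. The function name `fast_normalized_fusion`, the flattened list representation of feature maps, and the toy values are hypothetical choices made for illustration; the inputs are assumed to have already been resized to a common shape.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-sized feature maps with non-negative, normalized weights.

    features: list of flattened feature maps (lists of floats, equal length)
    weights:  one learnable scalar per feature map (illustrative values here)
    """
    if len(features) != len(weights):
        raise ValueError("one weight per input feature map is required")
    # ReLU keeps every weight non-negative, as in fast normalized fusion.
    clipped = [max(w, 0.0) for w in weights]
    # eps avoids division by zero when all weights are clipped to zero.
    norm = eps + sum(clipped)
    # zip(*features) walks the maps position by position.
    return [
        sum(w * x for w, x in zip(clipped, values)) / norm
        for values in zip(*features)
    ]


# Toy example: two 4-element "feature maps" from different pyramid levels.
shallow = [1.0, 1.0, 1.0, 1.0]
deep = [3.0, 3.0, 3.0, 3.0]
fused = fast_normalized_fusion([shallow, deep], weights=[1.0, 1.0])
print(fused)
```

With equal weights the fused map sits close to the element-wise mean of the inputs; during training the weights would be learned so that more informative scales contribute more.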

Key words: human pose estimation, lightweight, coordinate attention, weighted bidirectional feature pyramid network, loss function

CLC Number: TP391

PI Jun, NIU Hou-xing, GAO Zhi-yun. Lightweight human pose estimation algorithm by integrating CA and BiFPN[J]. Journal of Graphics, 2023, 44(5): 868-878.

Figures/Tables 12

Method	Backbone	Input size	Params (M)	GMACS	AP (%)	AP50 (%)	AP75 (%)	AP^L (%)	AR (%)
Lightweight OpenPose	-	368×368	4.1	18.0	42.8	-	-	-	-
EfficientHRNet-H₂	EfficientNetB2	448×448	10.3	15.4	52.9	80.5	-	-	-
EfficientHRNet-H₃	EfficientNetB3	416×416	6.9	8.4	44.8	76.7	-	-	-
EfficientHRNet-H₄	EfficientNetB4	384×384	3.7	4.2	35.7	69.6	-	-	-
baseline	Darknet_csp-d53-s	640×640	12.6	8.6	54.0	81.1	58.7	65.5	59.7
Ours-EIoU	Darknet_csp-d53-s	640×640	9.3	6.1	55.0	82.2	58.4	70.0	61.9
Ours-WIoU	Darknet_csp-d53-s	640×640	9.3	6.1	55.8	82.8	59.9	69.4	62.4

Method	GhostNet	CA	BiFPN	EIoU	WIoU
①	-	-	-	-	-
②	√	-	-	-	-
③	√	√	-	-	-
④	√	√	√	-	-
⑤	√	√	√	√	-
⑥	√	√	√	-	√

Method	Params (M)	GMACS	AP (%)	AP50 (%)	AR (%)
①	12.6	8.7	54.0	81.1	59.7
②	9.0	5.8	52.7	80.0	58.1
③	9.1	5.8	54.3	81.0	59.5
④	9.3	6.1	54.1	81.5	61.0
⑤	9.3	6.1	55.0	82.2	61.9
⑥	9.3	6.1	55.8	82.8	62.4
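
The headline figures in the abstract can be checked against rows ① and ⑥ of the table above. The following is a small sketch of that arithmetic, using only the rounded table entries; the helper name `relative_reduction` and the dictionary keys are hypothetical names introduced for illustration.

```python
def relative_reduction(baseline, optimized):
    """Return the percentage by which `optimized` is smaller than `baseline`."""
    return (baseline - optimized) / baseline * 100.0


# Rounded values copied from rows 1 (baseline) and 6 (GhostNet + CA + BiFPN + WIoU).
row_1 = {"params_m": 12.6, "gmacs": 8.7, "ap": 54.0, "ar": 59.7}
row_6 = {"params_m": 9.3, "gmacs": 6.1, "ap": 55.8, "ar": 62.4}

# Model size and compute are compared as relative reductions.
params_drop = relative_reduction(row_1["params_m"], row_6["params_m"])
gmacs_drop = relative_reduction(row_1["gmacs"], row_6["gmacs"])

# AP and AR are already percentages, so their change is in percentage points.
ap_gain = row_6["ap"] - row_1["ap"]
ar_gain = row_6["ar"] - row_1["ar"]

print(f"Params reduced by {params_drop:.1f}%")
print(f"GMACs reduced by {gmacs_drop:.1f}%")
print(f"AP gain: {ap_gain:.1f} points, AR gain: {ar_gain:.1f} points")
```

With these rounded entries the script gives about 26.2% fewer parameters, 29.9% fewer GMACs, a 1.8-point AP gain, and a 2.7-point AR gain. The small gaps to the 30.0% and 1.7 points stated in the abstract are presumably due to rounding in the published table values.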