Lightweight human pose estimation algorithm by integrating CA and BiFPN

doi:10.11996/JG.j.2095-302X.2023050868

Journal of Graphics ›› 2023, Vol. 44 ›› Issue (5): 868-878.

• Image Processing and Computer Vision •

PI Jun, NIU Hou-xing, GAO Zhi-yun

School of Transportation Science and Engineering, Civil Aviation University of China, Tianjin 300300, China

Received: 2023-05-31 Accepted: 2023-08-03 Online: 2023-10-31 Published: 2023-10-31
Contact: GAO Zhi-yun (1993-), female, lecturer, Ph.D. Her main research interests cover image processing and pattern recognition. E-mail: zhiyungao@163.com
About author: PI Jun (1973-), male, associate professor, Ph.D. His main research interests cover object detection, image processing and pattern recognition. E-mail: jpi@cauc.edu.cn
Supported by: China Association of Transport Education Research 2022-2024 Education Science Research Project (JT2022YB325)

Abstract:

To address the high complexity, heavy computational demand, and difficulty of deployment on embedded and UAV mobile platforms that characterize existing heatmap-based human pose estimation networks, a lightweight, heatmap-free human pose estimation network based on YOLOv5s6-Pose-ti-lite was proposed. The backbone was replaced with GhostNet so that more effective feature information is produced with fewer computing resources, which speeds up detection and reduces network redundancy. A lightweight coordinate attention (CA) module was integrated into the backbone to aggregate the positional information of human keypoints into the channel dimension, strengthening feature extraction. A weighted bidirectional feature pyramid network (BiFPN) was introduced to improve feature fusion and balance feature information across scales. Finally, the CIoU loss function was replaced with Wise-IoU (WIoU) to further improve the regression of human keypoints. On the COCO2017 human keypoint dataset, the optimized model reduced the parameter count by 26.2% and the computational cost by 30.0%, while average precision increased by 1.7 percentage points and average recall by 2.7 percentage points. The model meets real-time requirements, which verifies its feasibility and effectiveness.

Key words: human pose estimation, lightweight, coordinate attention, weighted bidirectional feature pyramid network, loss function

CLC Number: TP391

Cite this article: PI Jun, NIU Hou-xing, GAO Zhi-yun. Lightweight human pose estimation algorithm by integrating CA and BiFPN[J]. Journal of Graphics, 2023, 44(5): 868-878.

Figures/Tables 12

Fig. 1 YOLOv5s6-Pose-ti-lite network structure

Fig. 2 Ghost module
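
To make the saving behind the Ghost module in Fig. 2 concrete, the sketch below counts weights for an ordinary convolution and for a Ghost module, following the cost analysis given in the GhostNet paper: a primary convolution produces only `c_out / s` intrinsic feature maps, and cheap `d x d` depthwise operations generate the remaining "ghost" maps. The function names and the example sizes (64 input channels, 128 output channels, 3 x 3 kernels) are illustrative assumptions, not the configuration used in this article, and biases and normalization parameters are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of an ordinary k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in: int, c_out: int, k: int, s: int = 2, d: int = 3) -> int:
    """Weights of a Ghost module that outputs c_out feature maps (bias ignored).

    s is the ratio of total maps to intrinsic maps; d is the kernel size of
    the cheap depthwise operation. This sketch assumes c_out is divisible by s.
    """
    if c_out % s != 0:
        raise ValueError("c_out must be divisible by s in this sketch")
    intrinsic = c_out // s
    primary = c_in * intrinsic * k * k    # ordinary conv producing intrinsic maps
    cheap = (s - 1) * intrinsic * d * d   # depthwise ops producing ghost maps
    return primary + cheap


# Illustrative sizes only (assumed, not taken from the article).
ordinary = conv_params(64, 128, 3)
ghost = ghost_params(64, 128, 3)
ratio = ordinary / ghost
print(ordinary, ghost, round(ratio, 2))
```

With `s = 2` the ratio comes out slightly below 2, which is consistent with the paper's observation that the compression ratio is approximately `s` when the depthwise kernels are small relative to the number of input channels.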

Fig. 3 Ghost Bottleneck module

Fig. 4 Coordinate attention module
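
The idea behind the coordinate attention module in Fig. 4 can be illustrated with its two non-learned steps: pooling each channel along the width and along the height separately, then re-weighting every position by a row gate and a column gate. The sketch below is a deliberately reduced version under stated assumptions: the learned 1 x 1 convolutions, normalization, and channel reduction of the published module are omitted (replaced by identity), and the names `directional_pool` and `coordinate_gate` are hypothetical.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def directional_pool(x: FeatureMap) -> Tuple[List[List[float]], List[List[float]]]:
    """Average-pool each channel along the width and along the height."""
    z_h = [[sum(row) / len(row) for row in ch] for ch in x]        # C x H
    z_w = [[sum(col) / len(col) for col in zip(*ch)] for ch in x]  # C x W
    return z_h, z_w


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def coordinate_gate(x: FeatureMap) -> FeatureMap:
    """Scale x[c][i][j] by a row gate and a column gate.

    Assumption: the learned transforms between pooling and the sigmoid are
    replaced by identity, so this only shows the data flow.
    """
    z_h, z_w = directional_pool(x)
    return [
        [
            [x[c][i][j] * sigmoid(z_h[c][i]) * sigmoid(z_w[c][j])
             for j in range(len(x[c][i]))]
            for i in range(len(x[c]))
        ]
        for c in range(len(x))
    ]


x = [[[1.0, 3.0], [5.0, 7.0]]]  # one channel, 2 x 2
z_h, z_w = directional_pool(x)
y = coordinate_gate(x)
```

Because the two gates are indexed by row and by column, each output position keeps information about where along both axes the response came from, which is the property the abstract relies on for locating keypoints.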

Fig. 5 Comparison of different feature pyramid structures ((a) FPN; (b) PANet; (c) BiFPN)
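
What distinguishes BiFPN in Fig. 5(c) from plain summation is that each input to a fusion node carries a learnable non-negative weight. A minimal sketch of the fast normalized fusion rule from the EfficientDet paper, O = sum_i(w_i * I_i) / (eps + sum_i w_i), is given below. Features are represented as flat lists of equal length for simplicity; the function name is hypothetical, and whether this article uses exactly this fusion variant is an assumption.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-length feature vectors with non-negative normalized weights."""
    if len(features) != len(weights):
        raise ValueError("one weight per input feature is required")
    w = [max(0.0, wi) for wi in weights]  # ReLU keeps every weight >= 0
    total = eps + sum(w)                  # eps avoids division by zero
    length = len(features[0])
    return [
        sum(wi * f[k] for wi, f in zip(w, features)) / total
        for k in range(length)
    ]


# Two inputs with equal weights reduce to a plain average when eps is 0.
fused_equal = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], eps=0.0)
# A negative weight is clipped to zero, so only the second input contributes.
fused_clipped = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [-1.0, 2.0], eps=0.0)
```

In a trained network the weights are parameters updated by backpropagation, which lets the model decide how much each resolution contributes at each fusion node.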

Fig. 6 Schematic diagram of the WIoU
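
The quantities sketched in Fig. 6 can be combined as in the Wise-IoU paper's v1 formulation: the IoU loss is multiplied by a distance factor exp(((x - x_gt)^2 + (y - y_gt)^2) / (W_g^2 + H_g^2)), where (x, y) and (x_gt, y_gt) are the box centres and W_g, H_g are the width and height of the smallest enclosing box. The plain-Python sketch below assumes v1; this page does not state which WIoU version the article adopts. Boxes are `(x1, y1, x2, y2)` tuples and the function names are hypothetical. In an autograd framework the denominator would be detached from the gradient, which has no effect on the value computed here.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def wiou_v1(pred: Box, target: Box) -> float:
    """WIoU v1 loss: distance-based factor times the IoU loss."""
    l_iou = 1.0 - iou(pred, target)
    # Box centres.
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_t, cy_t = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    # Width and height of the smallest box enclosing both boxes.
    w_g = max(pred[2], target[2]) - min(pred[0], target[0])
    h_g = max(pred[3], target[3]) - min(pred[1], target[1])
    r = math.exp(((cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2) / (w_g ** 2 + h_g ** 2))
    return r * l_iou


loss_same = wiou_v1((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
loss_shifted = wiou_v1((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

The factor is always at least 1, so the loss is amplified for predictions whose centre is far from the target and left unchanged when the centres coincide.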

Fig. 7 Improved lightweight YOLOv5s6-Pose-ti-lite network structure

Table 1 Comparison of methods on the COCO2017 human keypoint dataset

Method	Backbone	Input size	Params (M)	GMACs	AP (%)	AP50 (%)	AP75 (%)	AP^L (%)	AR (%)
Lightweight OpenPose	-	368×368	4.1	18.0	42.8	-	-	-	-
EfficientHRNet-H₂	EfficientNetB2	448×448	10.3	15.4	52.9	80.5	-	-	-
EfficientHRNet-H₃	EfficientNetB3	416×416	6.9	8.4	44.8	76.7	-	-	-
EfficientHRNet-H₄	EfficientNetB4	384×384	3.7	4.2	35.7	69.6	-	-	-
baseline	Darknet_csp-d53-s	640×640	12.6	8.6	54.0	81.1	58.7	65.5	59.7
Ours-EIoU	Darknet_csp-d53-s	640×640	9.3	6.1	55.0	82.2	58.4	70.0	61.9
Ours-WIoU	Darknet_csp-d53-s	640×640	9.3	6.1	55.8	82.8	59.9	69.4	62.4

Fig. 8 Visual comparison of lightweight human pose estimation methods ((a) Lightweight OpenPose; (b) EfficientHRNet-H2; (c) EfficientHRNet-H3; (d) EfficientHRNet-H4; (e) Ours (WIoU))

Fig. 9 Pose estimation results on the COCO2017 human keypoint dataset ((a) Dense crowd; (b) Occlusion by obstacles; (c) Low-light environment; (d) Overhead view)

Table 2 Ablation experiment design

Method	GhostNet	CA	BiFPN	EIoU	WIoU
①	-	-	-	-	-
②	√	-	-	-	-
③	√	√	-	-	-
④	√	√	√	-	-
⑤	√	√	√	√	-
⑥	√	√	√	-	√

Table 3 Ablation experiment results

Method	Params (M)	GMACs	AP (%)	AP50 (%)	AR (%)
①	12.6	8.7	54.0	81.1	59.7
②	9.0	5.8	52.7	80.0	58.1
③	9.1	5.8	54.3	81.0	59.5
④	9.3	6.1	54.1	81.5	61.0
⑤	9.3	6.1	55.0	82.2	61.9
⑥	9.3	6.1	55.8	82.8	62.4
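
The headline reductions quoted in the abstract can be checked against Table 3 with a few lines of arithmetic. The sketch below recomputes the relative drop in parameters and GMACs and the gain in AR between configuration ① and configuration ⑥, and lists the AP change contributed by each ablation step. The dictionary layout and the `PARENT` mapping (which configuration each row is compared against, read off Table 2) are assumptions made for illustration; because the table values are rounded, the recomputed percentages can differ from the quoted ones in the last digit.

```python
ROWS = {
    1: {"params": 12.6, "gmacs": 8.7, "ap": 54.0, "ap50": 81.1, "ar": 59.7},
    2: {"params": 9.0, "gmacs": 5.8, "ap": 52.7, "ap50": 80.0, "ar": 58.1},
    3: {"params": 9.1, "gmacs": 5.8, "ap": 54.3, "ap50": 81.0, "ar": 59.5},
    4: {"params": 9.3, "gmacs": 6.1, "ap": 54.1, "ap50": 81.5, "ar": 61.0},
    5: {"params": 9.3, "gmacs": 6.1, "ap": 55.0, "ap50": 82.2, "ar": 61.9},
    6: {"params": 9.3, "gmacs": 6.1, "ap": 55.8, "ap50": 82.8, "ar": 62.4},
}

# Configuration each row builds on, per Table 2 (5 and 6 both extend 4).
PARENT = {2: 1, 3: 2, 4: 3, 5: 4, 6: 4}


def relative_reduction(before: float, after: float) -> float:
    """Percentage decrease from before to after."""
    return (before - after) / before * 100.0


params_cut = relative_reduction(ROWS[1]["params"], ROWS[6]["params"])
gmacs_cut = relative_reduction(ROWS[1]["gmacs"], ROWS[6]["gmacs"])
ar_gain = ROWS[6]["ar"] - ROWS[1]["ar"]

# AP change (percentage points) attributable to each added component.
ap_delta = {m: round(ROWS[m]["ap"] - ROWS[p]["ap"], 1) for m, p in PARENT.items()}

print(f"params -{params_cut:.1f}%, GMACs -{gmacs_cut:.1f}%, AR +{ar_gain:.1f}")
print(ap_delta)
```

Read this way, the table shows that the GhostNet backbone alone lowers AP, and that the attention module and the loss replacement account for most of the recovery.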

