Journal of Graphics ›› 2024, Vol. 45 ›› Issue (3): 516-527. DOI: 10.11996/JG.j.2095-302X.2024030516
• Computer Graphics and Virtual Reality •
Lightweight human pose estimation algorithm combined with coordinate Transformer
HUANG Youwen, LIN Zhiqin, ZHANG Jin, CHEN Junkuan
Received: 2023-11-17
Accepted: 2024-02-24
Online: 2024-06-30
Published: 2024-06-11
About author: HUANG Youwen (1982-), associate professor, Ph.D. His main research interests cover computer vision, natural language processing, and machine learning. E-mail: ywhuang@jxust.edu.cn
HUANG Youwen, LIN Zhiqin, ZHANG Jin, CHEN Junkuan. Lightweight human pose estimation algorithm combined with coordinate Transformer[J]. Journal of Graphics, 2024, 45(3): 516-527.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2024030516
Method | Input size | Params/MB | FLOPs/G | AP/% | AP50/% | AP75/% | APL/% | AR/% |
---|---|---|---|---|---|---|---|---|
Lightweight OpenPose | 368×368 | 4.1 | 18.0 | 42.8 | - | - | - | - |
EfficientHRNet-H1 | 480×480 | 16.0 | 28.4 | 59.2 | 82.6 | 64.0 | 67.2 | 64.7 |
EfficientHRNet-H2 | 448×448 | 10.3 | 15.4 | 52.9 | 80.5 | 59.1 | 61.9 | 59.3 |
EfficientHRNet-H3 | 416×416 | 6.9 | 8.4 | 44.8 | 76.7 | 48.3 | 52.3 | 52.4 |
EfficientHRNet-H4 | 384×384 | 3.7 | 4.2 | 35.7 | 69.6 | 33.7 | 44.3 | 42.9 |
YOLOv5s6-Pose-ti-lite | 640×640 | 12.6 | 8.6 | 54.9 | 82.2 | 59.9 | 66.6 | 61.8 |
Baseline | 640×640 | 15.1 | 10.2 | 56.7 | 83.7 | 61.3 | 71.1 | 63.7 |
Ours | 640×640 | 13.4 | 9.6 | 59.2 | 85.3 | 63.3 | 73.2 | 66.3 |
Table 1 Comparison of lightweight bottom-up methods on the COCO2017 dataset
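The AP, AP50, AP75, APL, and AR columns follow the COCO keypoint protocol: each predicted pose is scored against the ground truth with object keypoint similarity (OKS), and precision/recall are averaged over OKS thresholds from 0.50 to 0.95 (AP50 and AP75 fix the threshold at 0.50 and 0.75; APL restricts to large instances). As a minimal illustration of the OKS term only, the sketch below uses the standard COCO per-keypoint constants; the helper name is ours, and the full evaluation is normally run with pycocotools rather than hand-rolled code.

```python
import numpy as np

# Standard COCO per-keypoint sigmas (nose, eyes, ears, shoulders, elbows,
# wrists, hips, knees, ankles).
COCO_SIGMAS = np.array([0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079,
                        0.072, 0.072, 0.062, 0.062, 0.107, 0.107, 0.087,
                        0.087, 0.089, 0.089])

def object_keypoint_similarity(pred, gt, visibility, area):
    """OKS between one predicted and one ground-truth pose.

    pred, gt: (17, 2) arrays of (x, y); visibility: (17,) with v > 0 for
    labelled keypoints; area: ground-truth instance area in pixels.
    """
    d2 = np.sum((pred - gt) ** 2, axis=1)          # squared keypoint distances
    var = (2.0 * COCO_SIGMAS) ** 2                 # per-keypoint tolerance
    e = d2 / (2.0 * area * var + np.spacing(1))    # normalised error
    labelled = visibility > 0
    return float(np.exp(-e)[labelled].mean()) if labelled.any() else 0.0
```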
Fig. 9 Comparison of visual results of lightweight bottom-up multi-person pose estimation methods ((a) EfficientHRNet-H3; (b) EfficientHRNet-H2; (c) EfficientHRNet-H1; (d) YOLOv5s6-Pose; (e) Ours)
Method | Params/MB | FLOPs/G | AP/% |
---|---|---|---|
YOLOv5s6-Pose | 15.1 | 10.2 | 56.7 |
YOLOv5s6-Pose+SCConv | 12.3 | 8.3 | 53.6 |
Backbone+SCConv | 14.0 | 9.4 | 55.3 |
Neck+SCConv | 13.3 | 9.0 | 56.4 |
Table 2 Experimental comparison of the SCConv module at different stages on the COCO2017 dataset
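The Params/MB and FLOPs/G columns in Tables 1-2 describe model size and the cost of one forward pass. A hedged sketch of how such figures are commonly obtained with the third-party thop profiler is shown below; the function name is illustrative, and counting conventions (MACs vs. FLOPs) and input resolution both affect the result, so numbers obtained this way will not necessarily match the tables.

```python
import torch
from thop import profile  # pip install thop

def report_complexity(model, img_size=640):
    """Print parameter count and multiply-accumulate cost for one
    forward pass at the given square input resolution."""
    model.eval()
    dummy = torch.randn(1, 3, img_size, img_size)
    with torch.no_grad():
        macs, params = profile(model, inputs=(dummy,), verbose=False)
    # Some papers report MACs directly as "FLOPs"; others double them.
    print(f"Params: {params / 1e6:.1f} M   MACs: {macs / 1e9:.1f} G")
```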
Method | Params/MB | FLOPs/G | AP/% |
---|---|---|---|
YOLOv5s6-Pose | 15.1 | 10.2 | 56.7 |
YOLOv5s6-Pose+CA | 15.1 | 10.2 | 56.9 |
YOLOv5s6-Pose+Swin Transformer | 15.6 | 12.5 | 57.2 |
YOLOv5s6-Pose+CT | 15.2 | 10.8 | 58.6 |
Table 3 Experimental comparison of different attention modules in the model on the COCO2017 dataset
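The CA row in Table 3 refers to coordinate attention [37]. For orientation, a simplified PyTorch sketch of that standard block is given below; the published module uses a hard-swish activation (ReLU is substituted here), the class name is ours, and it should not be read as the authors' CT module.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Simplified coordinate attention block (Hou et al., CVPR 2021).

    Pools the feature map separately along height and width, encodes the two
    direction-aware descriptors with a shared 1x1 conv, then re-weights the
    input with per-direction sigmoid gates.
    """
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                      # N x C x H x 1
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # N x C x W x 1
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = self.conv_h(y_h).sigmoid()                        # N x C x H x 1
        a_w = self.conv_w(y_w.permute(0, 1, 3, 2)).sigmoid()    # N x C x 1 x W
        return x * a_h * a_w
```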
Module | Experiment 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SCConv | - | √ | - | - | - | - | √ | √ | √ | √ | √ | √ |
CT | - | - | √ | - | - | - | - | - | √ | - | √ | √ |
UFPA | - | - | - | √ | - | √ | - | √ | - | √ | √ | √ |
MPDIoU | - | - | - | - | √ | √ | √ | - | - | √ | - | √ |
Table 4 Ablation experimental design
Experiment | Params/MB | FLOPs/G | AP/% | AP50/% | AR/% |
---|---|---|---|---|---|
1 (Baseline) | 15.1 | 10.2 | 56.7 | 83.7 | 63.7 |
2 (SCConv) | 13.3 | 9.0 | 56.4 | 83.1 | 63.5 |
3 (CT) | 15.2 | 10.8 | 58.6 | 84.8 | 65.6 |
4 (UFPA) | 15.1 | 10.2 | 57.4 | 83.9 | 64.5 |
5 (MPDIoU) | 15.1 | 10.2 | 57.2 | 83.9 | 64.3 |
6 (UFPA+MPDIoU) | 15.1 | 10.2 | 57.8 | 84.2 | 64.9 |
7 (SCConv+MPDIoU) | 13.3 | 9.0 | 56.8 | 83.7 | 63.8 |
8 (SCConv+UFPA) | 13.3 | 9.0 | 57.1 | 83.8 | 64.1 |
9 (SCConv+CT) | 13.4 | 9.6 | 58.2 | 84.6 | 65.4 |
10 (SCConv+UFPA+MPDIoU) | 13.3 | 9.0 | 57.4 | 84.0 | 64.4 |
11 (SCConv+CT+UFPA) | 13.4 | 9.6 | 58.8 | 85.1 | 65.8 |
12 (SCConv+CT+UFPA+MPDIoU) | 13.4 | 9.6 | 59.2 | 85.3 | 66.3 |
Table 5 Comparison of ablation experiment results
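Experiments 5-7, 10, and 12 in Table 5 replace the baseline's box-regression loss with MPDIoU [23]. As described in that paper, MPDIoU penalises the plain IoU by the squared distances between the two boxes' top-left and bottom-right corners, normalised by the squared diagonal of the input image, and the loss is 1 - MPDIoU. A minimal PyTorch sketch under that formulation follows; the function name is ours and this is an illustration of the published loss, not the authors' training code.

```python
import torch

def mpdiou_loss(pred, target, img_w, img_h, eps=1e-7):
    """MPDIoU bounding-box loss; pred/target are (N, 4) boxes as (x1, y1, x2, y2)."""
    # Intersection and union for the plain IoU term.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared corner distances, normalised by the image diagonal.
    diag2 = img_w ** 2 + img_h ** 2
    d1 = ((pred[:, :2] - target[:, :2]) ** 2).sum(dim=1)   # top-left corners
    d2 = ((pred[:, 2:] - target[:, 2:]) ** 2).sum(dim=1)   # bottom-right corners

    mpdiou = iou - d1 / diag2 - d2 / diag2
    return (1.0 - mpdiou).mean()
```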
References
[1] FENG J, ZHENG J L. A comparative study of human pose estimation based on convolution and Transformer[J]. Software Engineering, 2023, 26(3): 18-24 (in Chinese).
[2] CAI X Q, HUO Y Q, LI F J, et al. Human pose estimation and similarity calculation for Tai Chi learning[J]. Journal of Graphics, 2022, 43(4): 695-706 (in Chinese).
[3] CAI M M, HUANG J F, LIN X, et al. Acquisition method of specific motion frame based on human attitude estimation and clustering[J]. Journal of Graphics, 2022, 43(1): 44-52 (in Chinese).
[4] FAN Y H, WANG Y Z, YAN X F, et al. Face recognition-driven low-light image enhancement[J]. Journal of Graphics, 2022, 43(6): 1170-1181 (in Chinese).
[5] ZHAO X C, HU A M, HE W. Fall detection based on convolutional neural network and XGBoost[J]. Laser & Optoelectronics Progress, 2020, 57(16): 161024 (in Chinese).
[6] LU J, YANG T F, ZHAO B, et al. Review of deep learning-based human pose estimation[J]. Laser & Optoelectronics Progress, 2021, 58(24): 69-88 (in Chinese).
[7] TOSHEV A, SZEGEDY C. DeepPose: human pose estimation via deep neural networks[C]// 2014 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2014: 1653-1660.
[8] WEI S H, RAMAKRISHNA V, KANADE T, et al. Convolutional pose machines[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 4724-4732.
[9] NEWELL A, YANG K Y, DENG J. Stacked hourglass networks for human pose estimation[EB/OL]. (2016-10-11) [2023-08-07]. https://link.springer.com/content/pdf/10.1007/978-3-319-46484-8_29.pdf.
[10] REN H P, WANG W M, WEI D J, et al. Human pose estimation based on high-resolution net[J]. Journal of Graphics, 2021, 42(3): 432-438 (in Chinese).
[11] FANG H S, XIE S Q, TAI Y W, et al. RMPE: regional multi-person pose estimation[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2353-2362.
[12] CHEN Y L, WANG Z C, PENG Y X, et al. Cascaded pyramid network for multi-person pose estimation[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7103-7112.
[13] SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 5686-5696.
[14] ZENG W X, MA Y, LI W G. A survey of lightweight two-dimensional human skeleton key point detection algorithms[J]. Science Technology and Engineering, 2022, 22(16): 6377-6392 (in Chinese).
[15] CAO Z, SIMON T, WEI S H, et al. Realtime multi-person 2D pose estimation using part affinity fields[EB/OL]. (2016-11-24) [2023-08-07]. http://arxiv.org/abs/1611.08050.
[16] OSOKIN D. Real-time 2D multi-person pose estimation on CPU: lightweight OpenPose[EB/OL]. (2018-11-29) [2023-07-07]. http://arxiv.org/abs/1811.12004.
[17] CHENG B W, XIAO B, WANG J D, et al. HigherHRNet: scale-aware representation learning for bottom-up human pose estimation[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 5385-5394.
[18] GENG Z G, SUN K, XIAO B, et al. Bottom-up human pose estimation via disentangled keypoint regression[EB/OL]. (2021-04-06) [2023-07-07]. http://arxiv.org/abs/2104.02300.
[19] MAJI D, NAGORI S, MATHEW M, et al. YOLO-pose: enhancing YOLO for multi person pose estimation using object keypoint similarity loss[EB/OL]. (2022-04-14) [2023-08-12]. http://arxiv.org/abs/2204.06806.
[20] LI J N, WANG Y W, ZHANG S L. PolarPose: single-stage multi-person pose estimation in polar coordinates[J]. IEEE Transactions on Image Processing, 2023, 32: 1108-1119.
[21] LI J F, WEN Y, HE L H. SCConv: spatial and channel reconstruction convolution for feature redundancy[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 6153-6162.
[22] WANG C, ZHOU Y H, ZHANG F, et al. Unbiased feature position alignment for human pose estimation[J]. Neurocomputing, 2023, 537: 152-163.
[23] MA S L, XU Y. MPDIoU: a loss for efficient and accurate bounding box regression[EB/OL]. (2023-07-14) [2023-08-17]. http://arxiv.org/abs/2307.07662.
[24] HOWARD A G, ZHU M L, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. (2017-04-17) [2023-08-17]. http://arxiv.org/abs/1704.04861.
[25] NEFF C, SHETH A, FURGURSON S, et al. EfficientHRNet: efficient and scalable high-resolution networks for real-time multi-person 2D human pose estimation[J]. Journal of Real-Time Image Processing, 2021, 18(4): 1037-1049.
[26] TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 10778-10787.
[27] WANG C Y, MARK LIAO H Y, WU Y H, et al. CSPNet: a new backbone that can enhance learning capability of CNN[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2020: 1571-1580.
[28] PIAO Y R, JIANG Y Y, ZHANG M, et al. PANet: patch-aware network for light field salient object detection[J]. IEEE Transactions on Cybernetics, 2023, 53(1): 379-391.
[29] WANG X L, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7794-7803.
[30] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7132-7141.
[31] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 3-19.
[32] CAO Y, XU J R, WEI F Y, et al. GCNet: non-local networks meet squeeze-excitation networks and beyond[C]// 2019 IEEE/CVF International Conference on Computer Vision Workshops. New York: IEEE Press, 2019: 1971-1980.
[33] LIU H J, LIU F Q, FAN X Y, et al. Polarized self-attention: towards high-quality pixel-wise mapping[J]. Neurocomputing, 2022, 506: 158-167.
[34] ZHU L, WANG X J, KE Z H, et al. BiFormer: vision transformer with bi-level routing attention[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 10323-10333.
[35] LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 9992-10002.
[36] TOLSTIKHIN I, HOULSBY N, KOLESNIKOV A, et al. MLP-Mixer: an all-MLP architecture for vision[EB/OL]. (2021-05-04) [2023-07-23]. http://arxiv.org/abs/2105.01601.
[37] HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 13708-13717.
[38] ZHENG Z H, WANG P, REN D W, et al. Enhancing geometric factors in model learning and inference for object detection and instance segmentation[J]. IEEE Transactions on Cybernetics, 2022, 52(8): 8574-8586.
[39] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[J]. Lecture Notes in Computer Science, 2014, 8693: 740-755.