
Journal of Graphics (图学学报) ›› 2023, Vol. 44 ›› Issue (5): 868-878. DOI: 10.11996/JG.j.2095-302X.2023050868

• Image Processing and Computer Vision •

Lightweight human pose estimation algorithm integrating CA-BiFPN

PI Jun, NIU Hou-xing, GAO Zhi-yun

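  1. School of Transportation Science and Engineering, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2023-05-31 Accepted: 2023-08-03 Published: 2023-10-31 Released online: 2023-10-31
  • Corresponding author: GAO Zhi-yun (1993-), female, lecturer, Ph.D. Her main research interests are image processing and pattern recognition. E-mail: zhiyungao@163.com
  • About the author: PI Jun (1973-), male, associate professor, Ph.D. His main research interests are object detection, image processing, and pattern recognition. E-mail: jpi@cauc.edu.cn
  • Funding:
    2022-2024 Education Science Research Project of the China Association of Transport Education Research (JT2022YB325)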

Lightweight human pose estimation algorithm by integrating CA and BiFPN

PI Jun, NIU Hou-xing, GAO Zhi-yun

  1. School of Transportation Science and Engineering, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2023-05-31 Accepted: 2023-08-03 Online: 2023-10-31 Published: 2023-10-31
  • Contact: GAO Zhi-yun (1993-), lecturer, Ph.D. Her main research interests cover image processing and pattern recognition. E-mail: zhiyungao@163.com
  • About the author: PI Jun (1973-), associate professor, Ph.D. His main research interests cover object detection, image processing, and pattern recognition. E-mail: jpi@cauc.edu.cn
  • Supported by:
    China Association of Transport Education Research 2022-2024 Education Science Research Project (JT2022YB325)

Abstract (translated from the Chinese):

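Existing heatmap-based human pose estimation networks suffer from high model complexity and heavy computing demands, and are difficult to deploy on embedded platforms and UAV mobile platforms. To address these problems, a lightweight, heatmap-free human pose estimation network based on YOLOv5s6-Pose-ti-lite is proposed. The backbone network is replaced with GhostNet, which aims to output more effective feature information with fewer computing resources, increase detection speed, and reduce network redundancy. A lightweight coordinate attention (CA) module is incorporated into the backbone to aggregate the position information of human keypoints in the image onto the channels, strengthening feature extraction. A weighted bidirectional feature pyramid network is introduced to improve the model's feature fusion and to balance feature information across scales. Finally, the CIoU loss function is replaced with Wise-IoU (WIoU) to further improve the regression of human keypoints. Results on the COCO2017 human keypoint dataset show that the optimized model reduces the parameter count by 26.2% and the computational cost by 30.0%, while raising average precision by 1.7 percentage points and average recall by 2.7 percentage points. The model meets real-time requirements, which verifies the feasibility and effectiveness of the proposed approach.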
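To make the GhostNet motivation concrete, the following is a minimal sketch of how a Ghost module trades an ordinary convolution for a smaller primary convolution plus cheap depthwise operations. The function names, the ratio `s=2`, the cheap-operation kernel size `d=3`, and the example layer sizes are illustrative assumptions, not values taken from the paper.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, s=2, d=3):
    """Weights in a Ghost module (bias ignored).

    s: ratio of output channels to intrinsic channels (assumed to divide c_out)
    d: kernel size of the cheap depthwise operations
    """
    intrinsic = c_out // s                     # channels from the primary convolution
    primary = conv_params(c_in, intrinsic, k)  # ordinary convolution, fewer outputs
    cheap = intrinsic * (s - 1) * d * d        # depthwise ops that generate "ghost" maps
    return primary + cheap


# Hypothetical layer: 128 input channels, 256 output channels, 1 x 1 kernel.
standard = conv_params(128, 256, 1)
ghost = ghost_params(128, 256, 1)
print(standard, ghost, round(standard / ghost, 2))
```

With these assumed settings the Ghost variant needs roughly half the weights of the standard layer, which is the kind of saving the reported parameter reduction relies on; the actual reduction in the proposed network depends on its full architecture.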

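Keywords (translated from the Chinese): human pose estimation, lightweight, coordinate attention, weighted bidirectional feature pyramid network, loss function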

Abstract:

To address the problems of existing heatmap-based human pose estimation networks, such as high model complexity, heavy computing requirements, and difficulty of deployment on embedded platforms and UAV mobile platforms, a lightweight, heatmap-free human pose estimation network based on YOLOv5s6-Pose-ti-lite was proposed. The backbone network was replaced with GhostNet, which aimed to output more effective feature information with fewer computing resources, speeding up detection and reducing network redundancy. Within the backbone network, a lightweight coordinate attention (CA) module was integrated to aggregate the position information of human keypoints in the image onto the channels, thus enhancing feature extraction. A weighted bidirectional feature pyramid network (BiFPN) was introduced to enhance the feature fusion ability of the model and to balance feature information across different scales. Finally, the CIoU loss function was replaced with Wise-IoU (WIoU) to further improve human keypoint regression. The results demonstrated that, on the COCO2017 human keypoint dataset, the optimized model reduced the number of parameters by 26.2% and the computational cost by 30.0%, while increasing average precision by 1.7 percentage points and average recall by 2.7 percentage points. The model met real-time requirements, verifying the feasibility and effectiveness of the proposed approach.
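The weighted fusion behind BiFPN can be illustrated with a small sketch of fast normalized fusion, in which each input feature map receives a learnable non-negative weight and the weights are normalized by their sum. This is a simplified illustration on flat Python lists, assuming the inputs were already resized to a common shape; the function name and the sample values are hypothetical, and the paper's actual implementation may differ.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors with non-negative, normalized weights.

    features: list of equally sized lists of floats (already resized to match)
    weights:  one scalar per feature map; learnable in a real network
    eps:      small constant that keeps the division numerically stable
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")
    clipped = [max(w, 0.0) for w in weights]  # ReLU keeps each weight >= 0
    total = sum(clipped) + eps
    fused = [0.0] * len(features[0])
    for feat, w in zip(features, clipped):
        for i, value in enumerate(feat):
            fused[i] += (w / total) * value
    return fused


# Two hypothetical feature vectors from different pyramid levels.
shallow = [1.0, 2.0]
deep = [3.0, 4.0]
print(fast_normalized_fusion([shallow, deep], [1.0, 1.0]))
```

Equal weights give an approximately even blend of the two inputs, while a weight driven to zero or below removes that input from the fusion, which is how the network can learn to balance scales.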

Key words: human pose estimation, lightweight, coordinate attention, weighted bidirectional feature pyramid network, loss function

CLC number: