
Journal of Graphics ›› 2024, Vol. 45 ›› Issue (3): 516-527. DOI: 10.11996/JG.j.2095-302X.2024030516

• Computer Graphics and Virtual Reality •

Lightweight human pose estimation algorithm combined with coordinate Transformer

HUANG Youwen, LIN Zhiqin, ZHANG Jin, CHEN Junkuan

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  • Received: 2023-11-17  Accepted: 2024-02-24  Online: 2024-06-30  Published: 2024-06-11
  • About author:

    HUANG Youwen (1982-), associate professor, Ph.D. His main research interests include computer vision, natural language processing, and machine learning. E-mail: ywhuang@jxust.edu.cn

  • Supported by:
    Jiangxi Provincial Department of Education(GJJ180443)

Abstract:

To address the large model size, high computational cost, and limited suitability for edge devices of most existing bottom-up human pose estimation algorithms, this study proposed YOLOv5s6-Pose-CT, a lightweight multi-person pose estimation network built on YOLOv5s6-Pose. To reduce feature redundancy in both the spatial and channel dimensions, spatial and channel reconstruction convolution was introduced into the neck network. Meanwhile, a coordinate Transformer was incorporated into the backbone network to strengthen long-range dependency modeling while preserving efficient local feature extraction. Furthermore, unbiased feature position alignment was employed to correct feature misalignment during multi-scale fusion. Finally, the bounding-box regression loss was redefined using the MPDIoU (minimum point distance-based IoU) loss function. Experimental results on the COCO 2017 dataset demonstrated that, compared with the mainstream lightweight network EfficientHRNet-H1, the optimized model reduced the number of parameters by 16.2% and the computation by 66.1% while maintaining comparable accuracy. Moreover, compared with the baseline, the proposed model reduced parameters and computation by 11.2% and 5.8%, respectively, and improved average detection accuracy by 2.5% and recall by 2.6%.
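
For readers unfamiliar with the MPDIoU term above: it augments plain IoU with penalties on the distances between the corresponding top-left and bottom-right corners of the predicted and ground-truth boxes, normalized by the input image size. The following is a minimal PyTorch-style sketch of such a loss, following the published MPDIoU formulation; the (x1, y1, x2, y2) box layout and the function and argument names are illustrative assumptions, not the authors' implementation.

    # Hedged sketch of an MPDIoU (minimum point distance-based IoU) box loss.
    # pred, target: (N, 4) boxes as (x1, y1, x2, y2); names are illustrative.
    import torch

    def mpdiou_loss(pred, target, img_w, img_h, eps=1e-7):
        # Intersection area
        ix1 = torch.max(pred[:, 0], target[:, 0])
        iy1 = torch.max(pred[:, 1], target[:, 1])
        ix2 = torch.min(pred[:, 2], target[:, 2])
        iy2 = torch.min(pred[:, 3], target[:, 3])
        inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

        # Union area and plain IoU
        area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
        area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
        iou = inter / (area_p + area_t - inter + eps)

        # Squared distances between corresponding top-left and bottom-right corners
        d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
        d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2

        # MPDIoU = IoU - d1/(w^2 + h^2) - d2/(w^2 + h^2); loss = 1 - MPDIoU
        norm = img_w ** 2 + img_h ** 2
        mpdiou = iou - d1 / norm - d2 / norm
        return (1.0 - mpdiou).mean()
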

Key words: human pose estimation, lightweight, coordinate Transformer, unbiased feature position alignment, loss function
