Journal of Graphics ›› 2025, Vol. 46 ›› Issue (4): 837-846.DOI: 10.11996/JG.j.2095-302X.2025040837
• Computer Graphics and Virtual Reality •

Adaptive two-hand reconstruction network for monocular visible light environments

LIAO Guoqiong1,2, HUANG Longjie1, LI Qingxin2, GU Yong3, LI Haibo1,4
Received: 2024-10-12 | Revised: 2025-02-18 | Online: 2025-08-30 | Published: 2025-08-11

First author: LIAO Guoqiong (1969-), professor, Ph.D. His main research interests cover human-computer interaction. E-mail: liaoguoqiong@163.com
LIAO Guoqiong, HUANG Longjie, LI Qingxin, GU Yong, LI Haibo. Adaptive two-hand reconstruction network for monocular visible light environments[J]. Journal of Graphics, 2025, 46(4): 837-846.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2025040837
Table 1 Comparison of different modules in the adaptive hand reconstruction network

| Flip left hand | Two-hand feature interactor | MPVPE (Single) | MPVPE (Two) | MPVPE (All) | MRRPE |
|---|---|---|---|---|---|
| | | 13.23 | 14.05 | 13.68 | 36.14 |
| √ | | 12.86 | 13.75 | 13.28 | 33.78 |
| √ | √ | 9.77 | 12.39 | 11.80 | 26.03 |
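The MPVPE and MRRPE columns report, presumably, the mean per-vertex position error of the recovered hand meshes and the mean relative-root position error between the two hands, both in millimetres. A minimal sketch of how such metrics are typically computed (the function names and the root-alignment assumption are ours, not from the paper):

```python
import math

def mpvpe(pred_verts, gt_verts):
    """Mean per-vertex position error: average Euclidean distance (mm)
    between predicted and ground-truth mesh vertices, assuming both
    meshes are already aligned at the hand root joint."""
    assert len(pred_verts) == len(gt_verts)
    return sum(math.dist(p, g) for p, g in zip(pred_verts, gt_verts)) / len(pred_verts)

def mrrpe(pred_roots, gt_roots):
    """Mean relative-root position error: error (mm) of the right-hand
    root position expressed relative to the left-hand root, i.e. how
    well the two hands are placed relative to each other.
    Each argument is a (left_root, right_root) pair of 3D points."""
    (pl, pr), (gl, gr) = pred_roots, gt_roots
    pred_rel = [r - l for r, l in zip(pr, pl)]
    gt_rel = [r - l for r, l in zip(gr, gl)]
    return math.dist(pred_rel, gt_rel)
```

For example, a prediction whose relative right-hand root is off by (3, 4, 0) mm yields an MRRPE of 5 mm.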
Table 2 Comparison of parameter quantities between the multi-scale residual backbone network and ResNet50

| Network | Box IoU | Parameters/M |
|---|---|---|
| ResNet50 | 86.35 | 25.60 |
| Multi-scale residual backbone network | 84.98 | 8.63 |
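The Box IoU column in Table 2 presumably measures hand-detection quality as the intersection-over-union between predicted and ground-truth bounding boxes. A minimal sketch of the standard computation for axis-aligned boxes (the `(x1, y1, x2, y2)` corner convention is our assumption):

```python
def box_iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as
    (x1, y1, x2, y2) corners with x1 <= x2 and y1 <= y2."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A table value of 86.35 would then be this ratio averaged over the test set, expressed as a percentage.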
Table 3 Comparison of adaptive fusion mechanisms for the two-hand feature interactor

| Fused features adapted to each hand | Adaptive fusion of interaction and single-hand features | MPVPE (Single) | MPVPE (Two) | MPVPE (All) | MRRPE |
|---|---|---|---|---|---|
| | | 12.67 | 13.65 | 13.57 | 30.28 |
| √ | | 11.23 | 12.98 | 13.01 | 29.24 |
| √ | √ | 9.77 | 12.39 | 11.80 | 26.03 |
Table 4 Relationship between the number of stacked attention layers, hand estimation error, and parameter quantity in TFormer

| Attention layers | Model parameters/M | MPVPE (Single) | MPVPE (Two) | MPVPE (All) | MRRPE |
|---|---|---|---|---|---|
| 4 | 58.91 | 10.03 | 12.72 | 12.31 | 26.90 |
| 6 | 65.20 | 9.77 | 12.39 | 11.80 | 26.03 |
| 8 | 73.41 | 9.73 | 12.31 | 11.74 | 26.01 |
| 12 | 81.69 | 9.68 | 12.28 | 11.71 | 25.98 |
Table 5 Error comparison on the HIC dataset/mm

| Method | MPVPE (Single) | MPVPE (Two) | MPVPE (All) | MRRPE |
|---|---|---|---|---|
| EANet | 29.18 | 32.66 | 30.68 | 76.82 |
| Keypoint | 46.96 | 42.39 | 45.12 | 127.31 |
| InterWild | 15.53 | 15.98 | 15.83 | 30.39 |
| IntagHand | - | 50.13 | - | - |
| AHRNet | 15.12 | 15.59 | 15.32 | 30.02 |
Table 6 Error comparison on the InterHand2.6M test set/mm

| Method | MPVPE (Single) | MPVPE (Two) | MPVPE (All) | MRRPE |
|---|---|---|---|---|
| EANet | 8.61 | 10.23 | 9.72 | 31.29 |
| Keypoint | 12.16 | 15.01 | 13.54 | 32.96 |
| InterWild | 10.09 | 12.46 | 11.91 | 27.71 |
| IntagHand | - | 9.48 | - | - |
| AHRNet | 9.77 | 12.39 | 11.80 | 26.03 |
Table 7 Comparison of the impact of introducing real-world datasets for training on laboratory scenarios

| IH2.6M | COCO | MPVPE (Single) | MPVPE (Two) | MPVPE (All) | MRRPE |
|---|---|---|---|---|---|
| √ | | 8.53 | 10.77 | 10.25 | 25.33 |
| √ | √ | 9.77 | 12.39 | 11.80 | 26.03 |
[1] | BI C Y, LIU Y. A survey of video human action recognition based on deep learning[J]. Journal of Graphics, 2023, 44(4): 625-639 (in Chinese). |
[2] | HUANG Y W, LIN Z Q, ZHANG J, et al. Lightweight human pose estimation algorithm combined with coordinate Transformer[J]. Journal of Graphics, 2024, 45(3): 516-527 (in Chinese). |
[3] | HAO S, ZHAO X S, MA X, et al. Multi-class defect target detection method for transmission lines based on TR-YOLOv5[J]. Journal of Graphics, 2023, 44(4): 667-676 (in Chinese). |
[4] | CHEN L J, LIN S Y, XIE Y S, et al. MVHM: a large-scale multi-view hand mesh benchmark for accurate 3D hand pose estimation[C]// 2021 IEEE Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2021: 836-845. |
[5] | KHALEGHI L, SEPAS-MOGHADDAM A, MARSHALL J, et al. Multiview video-based 3-D hand pose estimation[J]. IEEE Transactions on Artificial Intelligence, 2023, 4(4): 896-909. |
[6] |
薛皓玮, 王美丽. 融合生物力学约束与多模态数据的手部重建[J]. 图学学报, 2023, 44(4): 794-800.
DOI |
XUE H W, WANG M L. Hand reconstruction incorporating biomechanical constraints and multi-modal data[J]. Journal of Graphics, 2023, 44(4): 794-800 (in Chinese). | |
[7] | REHG J M, KANADE T. DigitEyes: vision-based hand tracking for human-computer interaction[C]// 1994 IEEE Workshop on Motion of Non-rigid and Articulated Objects. New York: IEEE Press, 1994: 16-22. |
[8] | STENGER B, THAYANANTHAN A, TORR P H S, et al. Model-based hand tracking using a hierarchical Bayesian filter[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(9): 1372-1384. |
[9] | CAO Z, RADOSAVOVIC I, KANAZAWA A, et al. Reconstructing hand-object interactions in the wild[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 12397-12406. |
[10] | GRADY P, TANG C C, TWIGG C D, et al. ContactOpt: optimizing contact to improve grasps[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 1471-1481. |
[11] | LIU S W, JIANG H W, XU J R, et al. Semi-supervised 3D hand-object poses estimation with interactions in time[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 14682-14692. |
[12] | CAI Y J, GE L H, CAI J F, et al. Weakly-supervised 3D hand pose estimation from monocular RGB images[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 678-694. |
[13] | ZIMMERMANN C, BROX T. Learning to estimate 3D hand pose from single RGB images[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 4913-4921. |
[14] | ROMERO J, TZIONAS D, BLACK M J. Embodied hands: modeling and capturing hands and bodies together[J]. ACM Transactions on Graphics, 2017, 36(6): 245. |
[15] | BOUKHAYMA A, DE BEM R, TORR P H S. 3D hand shape and pose from images in the wild[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 10835-10844. |
[16] | GU J X, WANG Z H, KUEN J, et al. Recent advances in convolutional neural networks[J]. Pattern Recognition, 2018, 77: 354-377. |
[17] | ZHANG B W, WANG Y G, DENG X M, et al. Interacting two-hand 3D pose and shape reconstruction from single color image[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 11334-11343. |
[18] | REN Z, YUAN J S, MENG J J, et al. Robust part-based hand gesture recognition using Kinect sensor[J]. IEEE Transactions on Multimedia, 2013, 15(5): 1110-1120. |
[19] | MUELLER F, BERNARD F, SOTNYCHENKO O, et al. GANerated hands for real-time 3D hand tracking from monocular RGB[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 49-59. |
[20] | DIBRA E, WOLF T, OZTIRELI C, et al. How to refine 3D hand pose estimation from unlabelled depth data?[C]//2017 International Conference on 3D Vision (3DV). New York: IEEE Press, 2017: 135-144. |
[21] | LI M C, AN L, ZHANG H W, et al. Interacting attention graph for single image two-hand reconstruction[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 2751-2760. |
[22] | PARK J, JUNG D S, MOON G, et al. Extract-and-adaptation network for 3D interacting hand mesh recovery[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 4202-4211. |
[23] | ESHRATIFAR A E, ESMAILI A, PEDRAM M. BottleNet: a deep learning architecture for intelligent mobile cloud computing services[C]// 2019 IEEE/ACM International Symposium on Low Power Electronics and Design. New York: IEEE Press, 2019: 1-6. |
[24] | LIN F Q, WILHELM C, MARTINEZ T. Two-hand global 3D pose estimation using monocular RGB[C]// 2021 IEEE Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2021: 2372-2380. |
[25] | HUANG H B, ZHOU X Q, CAO J, et al. Vision transformer with super token sampling[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 22690-22699. |
[26] | MOON G, YU S, WEN H, et al. InterHand2.6M: a dataset and baseline for 3D interacting hand pose estimation from a single RGB image[EB/OL]. [2024-06-07]. https://dblp.org/rec/journals/corr/abs-2008-09309.html. |
[27] | LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]// The 13th European Conference on Computer Vision. Cham: Springer, 2014: 740-755. |
[28] | TZIONAS D, BALLAN L, SRIKANTHA A, et al. Capturing hands in action using discriminative salient points and physics simulation[J]. International Journal of Computer Vision, 2016, 118(2): 172-193. |
[29] | HAMPALI S, SARKAR S D, RAD M, et al. Keypoint transformer: solving joint identification in challenging hands and object interactions for accurate 3D pose estimation[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 11080-11090. |
[30] | MOON G. Bringing inputs to shared domains for 3D interacting hands recovery in the wild[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 17028-17037. |