Journal of Graphics ›› 2025, Vol. 46 ›› Issue (4): 837-846. DOI: 10.11996/JG.j.2095-302X.2025040837
LIAO Guoqiong1,2, HUANG Longjie1, LI Qingxin2, GU Yong3, LI Haibo1,4
Received:
2024-10-12
Revised:
2025-02-18
Published:
2025-08-30
Online:
2025-08-11
First author:
LIAO Guoqiong (1969-), male, professor, Ph.D. His main research interest is human-computer interaction. E-mail: liaoguoqiong@163.com
Abstract:
Accurate reconstruction of two-hand meshes is a crucial step toward natural human-computer interaction, yet the task remains highly challenging due to mutual occlusion between the hands, the difficulty of collecting interacting-hand datasets outdoors, and interference from complex lighting environments. Most existing work performs well only in laboratory-like settings with little environmental interference, while reconstruction quality in complex lighting scenes remains poor. To address these problems, an adaptive hand reconstruction network for monocular visible-light environments is proposed. Introducing single-hand detection boxes and weakly supervising the model with a 2D dataset of complex lighting scenes enable it to generalize to such scenes; the designed two-hand feature interactor effectively establishes long-range dependencies between left- and right-hand features, alleviating the lack of interaction cues in single-hand detection boxes; and an adaptive fusion strategy is designed to effectively merge interaction features with single-hand features, enhancing the model's robustness. Experimental results show that the proposed method achieves the best performance on the HIC dataset, which contains multiple complex lighting scenes.
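The two-hand feature interactor described in the abstract is an attention-based module (TFormer; its depth is ablated in Table 4) that lets the features of each hand attend to those of the other. Below is a minimal sketch of such cross-hand attention, assuming PyTorch; the module and variable names (TwoHandFeatureInteractor, feat_r, feat_l) and the weight-sharing choice are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TwoHandFeatureInteractor(nn.Module):
    """Sketch: exchange information between per-hand features via cross-attention."""
    def __init__(self, dim=256, heads=8, layers=6):
        super().__init__()
        self.attns = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(layers)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(layers))

    def forward(self, feat_r, feat_l):
        # feat_r, feat_l: (B, N, dim) token features from the right/left hand crops
        for attn, norm in zip(self.attns, self.norms):
            # each hand queries the other, establishing long-range cross-hand dependencies
            r2l, _ = attn(feat_r, feat_l, feat_l)
            l2r, _ = attn(feat_l, feat_r, feat_r)
            feat_r = norm(feat_r + r2l)
            feat_l = norm(feat_l + l2r)
        return feat_r, feat_l

# Example: two 7x7 feature maps flattened to 49 tokens of width 256
interactor = TwoHandFeatureInteractor()
fr, fl = interactor(torch.randn(2, 49, 256), torch.randn(2, 49, 256))
```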
LIAO Guoqiong, HUANG Longjie, LI Qingxin, GU Yong, LI Haibo. Adaptive two-hand reconstruction network for monocular visible light environments[J]. Journal of Graphics, 2025, 46(4): 837-846.
Fig. 1 Mesh reconstruction by existing methods during two-hand interaction ((a) Input image; (b) Modeling results)
Table 1 Comparison of different modules in the adaptive hand reconstruction network

| Flip left hand | Two-hand feature interactor | MPVPE (Single) | MPVPE (Two) | MPVPE (All) | MRRPE |
|---|---|---|---|---|---|
|  |  | 13.23 | 14.05 | 13.68 | 36.14 |
| √ |  | 12.86 | 13.75 | 13.28 | 33.78 |
| √ | √ | 9.77 | 12.39 | 11.80 | 26.03 |
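For reading the tables: MPVPE and MRRPE are the standard metrics of the interacting-hands literature, reported in mm. Their usual definitions, assumed here since the page does not restate them, are the mean per-vertex position error of the recovered mesh after root alignment, and the mean relative-root position error, i.e., the error of the left-hand root position expressed relative to the right-hand root:

```latex
% Usual definitions (an assumption; alignment details may differ in the paper):
% V_i, \hat{V}_i: ground-truth and predicted mesh vertices after root alignment;
% r_L, r_R: ground-truth left/right root (wrist) positions; \hat{r}_L, \hat{r}_R: predictions.
\mathrm{MPVPE} = \frac{1}{N} \sum_{i=1}^{N} \bigl\| \hat{V}_i - V_i \bigr\|_2, \qquad
\mathrm{MRRPE} = \bigl\| (\hat{r}_L - \hat{r}_R) - (r_L - r_R) \bigr\|_2
```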
Fig. 6 Comparison of hand flip strategies ((a) Original image; (b) With the flip strategy; (c) Without the flip strategy)
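The flip strategy compared in Fig. 6 follows a common trick: mirror the left-hand crop horizontally so that a single right-hand network serves both hands, then mirror the predicted mesh back by negating its x coordinates. A minimal sketch under that assumption (function and variable names are illustrative, not the authors' code):

```python
import torch

def reconstruct_both_hands(model, right_crop, left_crop):
    # right_crop, left_crop: (B, 3, H, W) single-hand image crops
    left_as_right = torch.flip(left_crop, dims=[-1])      # horizontal mirror
    verts_r = model(right_crop)                           # (B, V, 3) mesh vertices
    verts_l = model(left_as_right)
    # un-mirror: negate x to restore left-hand chirality
    verts_l = verts_l * verts_l.new_tensor([-1.0, 1.0, 1.0])
    return verts_r, verts_l
```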
Table 2 Parameter count comparison between the multi-scale residual backbone and ResNet50

| Network | Box IoU | Params/M |
|---|---|---|
| ResNet50 | 86.35 | 25.60 |
| Multi-scale residual backbone | 84.98 | 8.63 |
Table 3 Comparison of adaptive fusion mechanisms in the two-hand feature interactor

| Adapt fused features to each hand | Adaptive fusion of interaction and single-hand features | MPVPE (Single) | MPVPE (Two) | MPVPE (All) | MRRPE |
|---|---|---|---|---|---|
|  |  | 12.67 | 13.65 | 13.57 | 30.28 |
| √ |  | 11.23 | 12.98 | 13.01 | 29.24 |
| √ | √ | 9.77 | 12.39 | 11.80 | 26.03 |
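The two switches ablated in Table 3 suggest a two-step gated design: the fused interaction feature is first adapted to each hand, and is then blended with that hand's single-hand feature through learned weights rather than plain addition. One plausible sketch of such adaptive fusion (layer choices and names are assumptions, not the paper's implementation):

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Sketch: per-channel gate deciding how much interaction feature to inject."""
    def __init__(self, dim=256):
        super().__init__()
        self.adapt = nn.Linear(dim, dim)  # adapt the shared interaction feature to one hand
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, single_feat, inter_feat):
        # single_feat, inter_feat: (B, N, dim)
        inter_feat = self.adapt(inter_feat)
        g = self.gate(torch.cat([single_feat, inter_feat], dim=-1))
        return g * inter_feat + (1.0 - g) * single_feat
```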
Table 4 Relationship between the number of stacked attention layers, hand estimation error, and parameter count in TFormer

| Attention layers | Params/M | MPVPE (Single) | MPVPE (Two) | MPVPE (All) | MRRPE |
|---|---|---|---|---|---|
| 4 | 58.91 | 10.03 | 12.72 | 12.31 | 26.90 |
| 6 | 65.20 | 9.77 | 12.39 | 11.80 | 26.03 |
| 8 | 73.41 | 9.73 | 12.31 | 11.74 | 26.01 |
| 12 | 81.69 | 9.68 | 12.28 | 11.71 | 25.98 |
Table 5 Error comparison on the HIC dataset/mm

| Method | MPVPE (Single) | MPVPE (Two) | MPVPE (All) | MRRPE |
|---|---|---|---|---|
| EANet | 29.18 | 32.66 | 30.68 | 76.82 |
| Keypoint | 46.96 | 42.39 | 45.12 | 127.31 |
| InterWild | 15.53 | 15.98 | 15.83 | 30.39 |
| IntagHand | - | 50.13 | - | - |
| AHRNet | 15.12 | 15.59 | 15.32 | 30.02 |
Table 6 Error comparison on the InterHand2.6M test set/mm

| Method | MPVPE (Single) | MPVPE (Two) | MPVPE (All) | MRRPE |
|---|---|---|---|---|
| EANet | 8.61 | 10.23 | 9.72 | 31.29 |
| Keypoint | 12.16 | 15.01 | 13.54 | 32.96 |
| InterWild | 10.09 | 12.46 | 11.91 | 27.71 |
| IntagHand | - | 9.48 | - | - |
| AHRNet | 9.77 | 12.39 | 11.80 | 26.03 |
Table 7 Impact of introducing a real-world dataset into training on laboratory-scene performance

| IH2.6M | COCO | MPVPE (Single) | MPVPE (Two) | MPVPE (All) | MRRPE |
|---|---|---|---|---|---|
| √ |  | 8.53 | 10.77 | 10.25 | 25.33 |
| √ | √ | 9.77 | 12.39 | 11.80 | 26.03 |
Fig. 8 Comparison of hand modeling in real-world scenarios ((a) Severe occlusion; (b) Complex hand poses)
Fig. 9 Comparison of rendered hand meshes against complex backgrounds ((a) Original image; (b) Ours; (c) EANet; (d) IntagHand)
[1] BI C Y, LIU Y. A survey of video human action recognition based on deep learning[J]. Journal of Graphics, 2023, 44(4): 625-639 (in Chinese).
[2] HUANG Y W, LIN Z Q, ZHANG J, et al. Lightweight human pose estimation algorithm combined with coordinate Transformer[J]. Journal of Graphics, 2024, 45(3): 516-527 (in Chinese).
[3] HAO S, ZHAO X S, MA X, et al. Multi-class defect target detection method for transmission lines based on TR-YOLOv5[J]. Journal of Graphics, 2023, 44(4): 667-676 (in Chinese).
[4] CHEN L J, LIN S Y, XIE Y S, et al. MVHM: a large-scale multi-view hand mesh benchmark for accurate 3D hand pose estimation[C]// 2021 IEEE Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2021: 836-845.
[5] KHALEGHI L, SEPAS-MOGHADDAM A, MARSHALL J, et al. Multiview video-based 3-D hand pose estimation[J]. IEEE Transactions on Artificial Intelligence, 2023, 4(4): 896-909.
[6] XUE H W, WANG M L. Hand reconstruction incorporating biomechanical constraints and multi-modal data[J]. Journal of Graphics, 2023, 44(4): 794-800 (in Chinese).
[7] REHG J M, KANADE T. DigitEyes: vision-based hand tracking for human-computer interaction[C]// 1994 IEEE Workshop on Motion of Non-rigid and Articulated Objects. New York: IEEE Press, 1994: 16-22.
[8] STENGER B, THAYANANTHAN A, TORR P H S, et al. Model-based hand tracking using a hierarchical Bayesian filter[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(9): 1372-1384.
[9] CAO Z, RADOSAVOVIC I, KANAZAWA A, et al. Reconstructing hand-object interactions in the wild[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 12397-12406.
[10] GRADY P, TANG C C, TWIGG C D, et al. ContactOpt: optimizing contact to improve grasps[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 1471-1481.
[11] LIU S W, JIANG H W, XU J R, et al. Semi-supervised 3D hand-object poses estimation with interactions in time[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 14682-14692.
[12] CAI Y J, GE L H, CAI J F, et al. Weakly-supervised 3D hand pose estimation from monocular RGB images[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 678-694.
[13] ZIMMERMANN C, BROX T. Learning to estimate 3D hand pose from single RGB images[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 4913-4921.
[14] ROMERO J, TZIONAS D, BLACK M J. Embodied hands: modeling and capturing hands and bodies together[J]. ACM Transactions on Graphics, 2017, 36(6): 245.
[15] BOUKHAYMA A, DE BEM R, TORR P H S. 3D hand shape and pose from images in the wild[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 10835-10844.
[16] GU J X, WANG Z H, KUEN J, et al. Recent advances in convolutional neural networks[J]. Pattern Recognition, 2018, 77: 354-377.
[17] ZHANG B W, WANG Y G, DENG X M, et al. Interacting two-hand 3D pose and shape reconstruction from single color image[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 11334-11343.
[18] REN Z, YUAN J S, MENG J J, et al. Robust part-based hand gesture recognition using Kinect sensor[J]. IEEE Transactions on Multimedia, 2013, 15(5): 1110-1120.
[19] MUELLER F, BERNARD F, SOTNYCHENKO O, et al. GANerated hands for real-time 3D hand tracking from monocular RGB[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 49-59.
[20] DIBRA E, WOLF T, OZTIRELI C, et al. How to refine 3D hand pose estimation from unlabelled depth data?[C]// 2017 International Conference on 3D Vision (3DV). New York: IEEE Press, 2017: 135-144.
[21] LI M C, AN L, ZHANG H W, et al. Interacting attention graph for single image two-hand reconstruction[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 2751-2760.
[22] PARK J, JUNG D S, MOON G, et al. Extract-and-adaptation network for 3D interacting hand mesh recovery[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 4202-4211.
[23] ESHRATIFAR A E, ESMAILI A, PEDRAM M. BottleNet: a deep learning architecture for intelligent mobile cloud computing services[C]// 2019 IEEE/ACM International Symposium on Low Power Electronics and Design. New York: IEEE Press, 2019: 1-6.
[24] LIN F Q, WILHELM C, MARTINEZ T. Two-hand global 3D pose estimation using monocular RGB[C]// 2021 IEEE Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2021: 2372-2380.
[25] HUANG H B, ZHOU X Q, CAO J, et al. Vision transformer with super token sampling[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 22690-22699.
[26] MOON G, YU S, WEN H, et al. InterHand2.6M: a dataset and baseline for 3D interacting hand pose estimation from a single RGB image[EB/OL]. [2024-06-07]. https://dblp.org/rec/journals/corr/abs-2008-09309.html.
[27] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]// The 13th European Conference on Computer Vision. Cham: Springer, 2014: 740-755.
[28] TZIONAS D, BALLAN L, SRIKANTHA A, et al. Capturing hands in action using discriminative salient points and physics simulation[J]. International Journal of Computer Vision, 2016, 118(2): 172-193.
[29] HAMPALI S, SARKAR S D, RAD M, et al. Keypoint transformer: solving joint identification in challenging hands and object interactions for accurate 3D pose estimation[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 11080-11090.
[30] MOON G. Bringing inputs to shared domains for 3D interacting hands recovery in the wild[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 17028-17037.