Journal of Graphics, 2025, Vol. 46, Issue (2): 393-401. DOI: 10.11996/JG.j.2095-302X.2025020393
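Received: 2024-07-05
Accepted: 2024-11-27
Published: 2025-04-30
Online: 2025-04-24
Corresponding author: WANG Kangkan (1988-), male, associate professor, Ph.D. His main research interests cover computer vision, virtual reality and 3D reconstruction. E-mail: wangkangkan@njust.edu.cn
First author: FANG Chenghao (1999-), male, master's student. His main research interests cover computer graphics, computer vision and 3D reconstruction. E-mail: 121106022661@njust.edu.cn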
FANG Chenghao, WANG Kangkan
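Abstract:
With limited labeled samples, 3D human pose and shape estimation from single-view point clouds has long suffered from low estimation accuracy and weak generalization. Existing methods typically optimize the model by fine-tuning, but fine-tuning on new samples greatly increases runtime complexity and does not fundamentally improve the model's generalization ability. To address these problems, a semi-supervised method for 3D human pose and shape estimation is proposed, which uses a large amount of unlabeled human point cloud data to improve estimation accuracy and generalization when labeled data are limited. Specifically, weak and strong augmentations are first applied to the unlabeled data, and the 3D parametric human model is estimated for both augmented samples. The accuracy of the pseudo-label predicted from the weakly augmented sample is then assessed, and the prediction for the strongly augmented sample is constrained following the idea of consistency regularization. Iterating this procedure gradually improves pseudo-label quality and increases the number of pseudo-labels used for training, which in turn improves the estimation accuracy of the model. Extensive quantitative and qualitative experiments on several public datasets show that the algorithm improves the accuracy of 3D human pose and shape estimation with limited labeled samples and strengthens the generalization of the model.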
FANG Chenghao, WANG Kangkan. 3D human pose and shape estimation from single-view point clouds with semi-supervised learning[J]. Journal of Graphics, 2025, 46(2): 393-401.
Fig. 1 Framework of the proposed semi-supervised method for 3D human pose and shape estimation from single-view point clouds
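The consistency step summarized in the abstract (predict from a weakly and a strongly augmented copy of the same unlabeled point cloud, keep only pseudo-labels judged accurate, and pull the strong prediction toward the weak one) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `masked_consistency_loss`, the per-sample score `pseudo_error`, and the squared-error loss are all assumptions, since this page does not specify how pseudo-label accuracy is measured or which loss is used.

```python
import numpy as np

def masked_consistency_loss(pred_weak, pred_strong, pseudo_error, threshold):
    """Consistency loss over the unlabeled samples whose pseudo-label is accepted.

    pred_weak    : (B, P) body-model parameters predicted from weakly augmented clouds;
                   these act as pseudo-labels and would be held constant in training.
    pred_strong  : (B, P) parameters predicted from strongly augmented clouds.
    pseudo_error : (B,) hypothetical accuracy score per pseudo-label (lower is better).
    threshold    : pseudo-labels with a score below this value are accepted.
    """
    mask = pseudo_error < threshold                      # accepted pseudo-labels
    per_sample = np.mean((pred_strong - pred_weak) ** 2, axis=1)
    n_kept = int(mask.sum())
    # Average only over accepted samples; contribute nothing if none pass.
    loss = float(per_sample[mask].mean()) if n_kept > 0 else 0.0
    utilisation = n_kept / len(mask)                     # fraction of pseudo-labels used
    return loss, utilisation
```

In a full training loop this term would be added to the supervised loss on the labeled samples, and the fraction returned as `utilisation` corresponds to the pseudo-label utilisation reported in Table 3.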
Dataset | No augmentation | Random translation | Average down-sampling | Random noise | Uneven density | Random drop | Multiple random drops | Average down-sampling + noise | Multiple random drops + noise |
---|---|---|---|---|---|---|---|---|---|
CAPE | 21.32 | 22.03 | 24.75 | 28.34 | 36.33 | 43.01 | 51.07 | 30.04 | 60.82 |
SURREAL | 23.07 | 23.68 | 26.16 | 29.04 | 37.84 | 42.63 | 56.47 | 30.94 | 57.91 |
Kungfu | 22.20 | 23.11 | 27.19 | 28.51 | 30.84 | 37.57 | 47.33 | 35.20 | 54.96 |
Table 1 Human model estimation errors for different point cloud augmentation methods (mm)
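Several of the augmentations compared in Table 1 can be sketched with NumPy as follows. All magnitudes (`max_shift`, `keep_ratio`, `sigma`, `radius`, `n_drops`) are hypothetical placeholders, and the down-sampling is implemented here as uniform random subsampling, which may differ from the paper's average down-sampling. Which operations the authors pair as weak and strong augmentation is not stated on this page; the pairing below is only an assumption suggested by the error ordering in Table 1.

```python
import numpy as np

def random_translation(points, rng, max_shift=0.05):
    """Shift the whole cloud by one random offset (the mildest case in Table 1)."""
    return points + rng.uniform(-max_shift, max_shift, size=(1, 3))

def downsample(points, rng, keep_ratio=0.5):
    """Keep a random subset of the points (assumed stand-in for average down-sampling)."""
    n_keep = max(1, int(round(len(points) * keep_ratio)))
    idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[idx]

def random_noise(points, rng, sigma=0.01):
    """Add independent Gaussian jitter to every coordinate."""
    return points + rng.normal(0.0, sigma, size=points.shape)

def random_drop(points, rng, radius=0.2):
    """Remove one block: all points within `radius` of a randomly chosen point."""
    center = points[rng.integers(len(points))]
    keep = np.linalg.norm(points - center, axis=1) > radius
    # Never return an empty cloud.
    return points[keep] if keep.any() else points

def weak_augment(points, rng):
    """Assumed weak augmentation: random translation only."""
    return random_translation(points, rng)

def strong_augment(points, rng, n_drops=3):
    """Assumed strong augmentation: multiple random drops followed by noise."""
    out = points
    for _ in range(n_drops):
        out = random_drop(out, rng)
    return random_noise(out, rng)
```

A seeded generator such as `np.random.default_rng(0)` keeps the augmentations reproducible across runs.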
Fig. 3 Heat maps of the reconstruction error for different augmented samples ((a) Random translation; (b) Average down-sampling; (c) Random noise; (d) Uneven density; (e) Random drop; (f) Multiple random drops; (g) Average down-sampling with random noise; (h) Multiple random drops with random noise)
Frame index | Ours | MAVE | Chamfer distance |
---|---|---|---|
0 | 0.43 | 0.00 | 0.03 |
10 | 1.75 | 24.03 | 0.64 |
20 | 3.35 | 47.55 | 2.01 |
30 | 5.18 | 62.34 | 15.92 |
40 | 6.97 | 105.70 | 49.42 |
Table 2 Errors computed by different pseudo-label evaluation methods
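Chamfer distance, one of the scores compared in Table 2, measures how well two point sets agree without requiring point correspondences. The sketch below uses one common convention (mean unsquared nearest-neighbor distance, summed over both directions); the exact variant and units used in the paper are not given on this page, so the function name `chamfer_distance` and the `bidirectional` flag are illustrative assumptions.

```python
import numpy as np

def chamfer_distance(a, b, bidirectional=True):
    """Chamfer distance between point sets a (N, 3) and b (M, 3).

    Uses mean unsquared nearest-neighbor distances. With bidirectional=False
    only the a -> b term is returned, which can suit a partial single-view
    cloud `a` compared against a complete model surface `b`.
    """
    # Pairwise Euclidean distances via broadcasting: shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    a_to_b = d.min(axis=1).mean()
    if not bidirectional:
        return float(a_to_b)
    b_to_a = d.min(axis=0).mean()
    return float(a_to_b + b_to_a)
```

The dense (N, M) distance matrix is fine for a few thousand points; for larger clouds a spatial index would be the usual replacement.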
Fig. 4 Alignment of human point clouds in consecutive frames with the SMPL model (the top and bottom rows show two different views)
Threshold | CAPE (mm) | SURREAL (mm) | DFAUST (mm) | Pseudo-label utilisation (%) |
---|---|---|---|---|
Fixed 3.5 | 27.46 | 37.03 | 41.10 | 56 |
Fixed 2.0 | 23.92 | 27.79 | 29.93 | 11 |
Fixed 1.5 | 22.74 | 28.37 | 31.90 | 5 |
Dynamic threshold | 21.83 | 24.79 | 23.58 | 32 |
Table 3 Reconstruction error on each synthetic dataset and pseudo-label utilisation for models trained with different thresholds
Method | CAPE | SURREAL | DFAUST |
---|---|---|---|
Point-based HMR[3] | 44.18 | 49.98 | 47.01 |
Ref. [13] | 25.51 | 29.33 | 28.35 |
IPNet[11] | 30.56 | 34.52 | 37.76 |
Ours | 22.83 | 24.79 | 23.57 |
Table 4 Reconstruction errors of different methods on the synthetic datasets (mm)
Fig. 5 Heat maps of the reconstruction errors of different methods on the synthetic datasets ((a) Input point cloud; (b) Point-based HMR[3]; (c) Ref. [13]; (d) IPNet[11]; (e) Ours)
Method | Crouching | Kungfu | Girl |
---|---|---|---|
IPNet[11] | 47.55 | 64.18 | 52.79 |
PTF[12] | 40.82 | 53.66 | 44.07 |
Ref. [13] | 28.33 | 30.93 | 31.49 |
Ours | 26.30 | 28.62 | 28.14 |
Table 5 Reconstruction errors of different methods on the real datasets (mm)
Fig. 6 Reconstruction and alignment results of different methods on the real datasets ((a) Input point cloud; (b) IPNet[11]; (c) PTF[12]; (d) Ref. [13]; (e) Ours)
[1] | LOPER M, MAHMOOD N, ROMERO J, et al. SMPL: a skinned multi-person linear model[J]. Seminal Graphics Papers: Pushing the Boundaries, 2023, 2: 88. |
[2] | BOGO F, KANAZAWA A, LASSNER C, et al. Keep it SMPL: automatic estimation of 3D human pose and shape from a single image[C]// The 14th European Conference on Computer Vision. Cham: Springer, 2016: 561-578. |
[3] | KANAZAWA A, BLACK M J, JACOBS D W, et al. End-to-end recovery of human shape and pose[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7122-7131. |
[4] | KOLOTOUROS N, PAVLAKOS G, BLACK M J, et al. Learning to reconstruct 3D human pose and shape via model-fitting in the loop[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 2252-2261. |
[5] | ZHANG H W, TIAN Y T, ZHOU X C, et al. PyMAF: 3D human pose and shape regression with pyramidal mesh alignment feedback loop[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 11426-11436. |
[6] | RONG Y, SHIRATORI T, JOO H. FrankMocap: a monocular 3D whole-body pose estimation system via regression and integration[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 1749-1759. |
[7] | JIANG H Y, CAI J F, ZHENG J M. Skeleton-aware 3D human shape reconstruction from point clouds[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 5430-5440. |
[8] | JANG H, KIM M, BAE J, et al. Dynamic mesh recovery from partial point cloud sequence[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 15028-15038. |
[9] | GUO K W, XU F, YU T, et al. Real-time geometry, albedo, and motion reconstruction using a single RGB-D camera[J]. ACM Transactions on Graphics, 2017, 36(3): 32. |
[10] | SAITO S, HUANG Z, NATSUME R, et al. PIFu: pixel-aligned implicit function for high-resolution clothed human digitization[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 2304-2314. |
[11] | BHATNAGAR B L, SMINCHISESCU C, THEOBALT C, et al. Combining implicit function learning and parametric models for 3D human reconstruction[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 311-329. |
[12] | WANG S F, GEIGER A, TANG S Y. Locally aware piecewise transformation fields for 3D human mesh registration[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 7635-7644. |
[13] | WANG K K, ZHENG H Y, ZHANG G F, et al. Parametric model estimation for 3D clothed humans from point clouds[C]// 2021 IEEE International Symposium on Mixed and Augmented Reality. New York: IEEE Press, 2021: 156-165. |
[14] | LIU G Z, RONG Y, SHENG L. VoteHMR: occlusion-aware voting network for robust 3D human mesh recovery from partial point clouds[C]// The 29th ACM International Conference on Multimedia. New York: ACM, 2021: 955-964. |
[15] | ZHANG X M, FANG X Y, WANG L B, et al. Human body reconstruction based on improved piecewise hinge transformation[J]. Journal of Graphics, 2020, 41(1): 108-115 (in Chinese). |
[16] | HAN K, PANG Z Q, WANG L, et al. High identification 3D human body model reconstruction method based on the depth scanner[J]. Journal of Graphics, 2015, 36(4): 503-510 (in Chinese). |
[17] | CAI Z A, PAN L, WEI C, et al. PointHPS: cascaded 3D human pose and shape estimation from point clouds[EB/OL]. (2023-08-28) [2024-10-25]. http://arxiv.org/pdf/2308.14492.pdf. |
[18] | SOHN K, BERTHELOT D, LI C L, et al. FixMatch: simplifying semi-supervised learning with consistency and confidence[C]// The 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 51. |
[19] | LEE D H. Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks[EB/OL]. [2024-05-05]. https://www.researchgate.net/publication/280581078_Pseudo-Label_The_Simple_and_Efficient_Semi-Supervised_Learning_Method_for_Deep_Neural_Networks. |
[20] | BACHMAN P, ALSHARIF O, PRECUP D. Learning with pseudo-ensembles[C]// The 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 3365-3373. |
[21] | QI C R, SU H, MO K C, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 77-85. |
[22] | QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[C]// The 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 5105-5114. |
[23] | LORENSEN W E, CLINE H E. Marching cubes: a high resolution 3D surface construction algorithm[J]. Computer Graphics, 1987, 21(4): 347-353. |
[24] | CHIBANE J, ALLDIECK T, PONS-MOLL G. Implicit functions in feature space for 3D shape reconstruction and completion[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 6968-6979. |
[25] | KINGMA D P, REZENDE D J, MOHAMED S, et al. Semi-supervised learning with deep generative models[C]// The 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 3581-3589. |
[26] | ZHU X J, GHAHRAMANI Z B. Learning from labeled and unlabeled data with label propagation[R]. Technical Report CMU-CALD-02-107, 2002. |
[27] | GRANDVALET Y, BENGIO Y. Semi-supervised learning by entropy minimization[C]// The 17th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2004: 529-536. |
[28] | BERTHELOT D, CARLINI N, GOODFELLOW I, et al. MixMatch: a holistic approach to semi-supervised learning[C]// The 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 454. |
[29] | BERTHELOT D, CARLINI N, CUBUK E D, et al. ReMixMatch: semi-supervised learning with distribution alignment and augmentation anchoring[EB/OL]. (2019-11-21) [2024-10-25]. http://arxiv.org/pdf/1911.09785.pdf. |
[30] | HUANG C, CAO Z J, WANG Y B, et al. MetaSets: meta-learning on point sets for generalizable representations[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 8859-8868. |
[31] | MA Q L, YANG J L, RANJAN A, et al. Learning to dress 3D people in generative clothing[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 6468-6477. |
[32] | VAROL G, ROMERO J, MARTIN X, et al. Learning from synthetic humans[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 4627-4635. |
[33] | BOGO F, ROMERO J, PONS-MOLL G, et al. Dynamic FAUST: Registering human bodies in motion[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5573-5582. |
[34] | WANG K K, ZHANG G F, ZHENG H Y, et al. Learning dense correspondences for non-rigid point clouds with two-stage regression[J]. IEEE Transactions on Image Processing, 2021, 30: 8468-8482. |
[35] | GUO K W, XU F, WANG Y G, et al. Robust non-rigid motion tracking and surface reconstruction using l0 regularization[C]// 2015 IEEE International Conference on Computer Vision. New York: IEEE Press, 2015: 3083-3091. |
[36] | LI J F, BIAN S Y, XU C, et al. HybrIK-X: hybrid analytical-neural inverse kinematics for whole-body mesh recovery[EB/OL]. (2023-04-12) [2024-10-25]. http://arxiv.org/pdf/2304.05690.pdf. |
[37] | LIN J, ZENG A L, WANG H Q, et al. One-stage 3D whole-body mesh recovery with component aware transformer[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 21159-21168. |