3D human pose and shape estimation from single-view point clouds with semi-supervised learning

Journal of Graphics, 2025, Vol. 46, Issue (2): 393-401. DOI: 10.11996/JG.j.2095-302X.2025020393

• Computer Graphics and Virtual Reality •

FANG Chenghao, WANG Kangkan

Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology, Nanjing Jiangsu 210094, China

Received: 2024-07-05 Accepted: 2024-11-27 Published: 2025-04-30 Online: 2025-04-24
Corresponding author: WANG Kangkan (1988-), male, associate professor, Ph.D. His main research interests cover computer vision, virtual reality and 3D reconstruction. E-mail: wangkangkan@njust.edu.cn
First author: FANG Chenghao (1999-), male, master student. His main research interests cover computer graphics, computer vision and 3D reconstruction. E-mail: 121106022661@njust.edu.cn
Supported by:
The National Natural Science Foundation of China (62472224); The Fundamental Research Funds for the Central Universities (NJ2023032); The Open Project Program of the State Key Laboratory of CAD&CG of Zhejiang University (A2311); The Open Project Program of the State Key Laboratory of Novel Software Technology of Nanjing University (KFKT2024B37)

Abstract

With only a limited number of labeled samples, 3D human pose and shape estimation from single-view point clouds suffers from low estimation accuracy and weak generalization. Existing methods typically fine-tune the model on new samples, but this fine-tuning step significantly increases runtime complexity without fundamentally improving generalization. To address these issues, a semi-supervised learning-based method is proposed for 3D human pose and shape estimation, which uses a large amount of unlabeled human point clouds to improve estimation accuracy and generalization when labeled data are limited. Specifically, weak and strong augmentations are first applied to the unlabeled data, and the 3D parametric human model is estimated for both augmented samples. The accuracy of the pseudo-labels predicted from the weakly augmented samples is then evaluated, and the predictions for the strongly augmented samples are constrained based on consistency regularization. This procedure is applied iteratively to gradually refine pseudo-label quality and increase the number of pseudo-labels used for training, thereby improving the model's estimation accuracy. Extensive quantitative and qualitative experiments on several public datasets demonstrate that the proposed method improves the accuracy of 3D human pose and shape estimation with limited labeled samples and strengthens the generalization of the model.

Key words: 3D human pose and shape estimation, single-view point clouds, semi-supervised learning, pseudo-label, point cloud data augmentation
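
The unlabeled-data step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a FixMatch-style setup [18] in which the prediction on the weakly augmented sample serves as a fixed pseudo-label for the strongly augmented sample, and in which each pseudo-label has a scalar error score (lower is better) that is compared with a threshold. The function name `unsupervised_consistency_loss`, the squared-error form of the loss and the parameter dimension of 82 (72 pose plus 10 shape values, as in SMPL) are assumptions made for illustration.

```python
import numpy as np

def unsupervised_consistency_loss(pred_weak, pred_strong, pseudo_label_error, threshold):
    """Masked consistency loss between weak and strong predictions.

    pred_weak, pred_strong: (B, D) body-model parameters predicted for the
        weakly and strongly augmented versions of the same unlabeled clouds.
    pseudo_label_error: (B,) hypothetical quality score of each weak
        prediction (lower means a more trustworthy pseudo-label).
    Returns (loss, fraction of pseudo-labels that were used).
    """
    mask = pseudo_label_error < threshold      # keep only trusted pseudo-labels
    if not mask.any():
        return 0.0, 0.0
    # In training, the weak prediction would be treated as a constant target.
    per_sample = np.mean((pred_strong[mask] - pred_weak[mask]) ** 2, axis=1)
    return float(per_sample.mean()), float(mask.mean())

# Toy usage with random numbers standing in for network outputs.
rng = np.random.default_rng(0)
pred_weak = rng.normal(size=(4, 82))
pred_strong = pred_weak + 0.1                  # strong view deviates slightly
errors = np.array([1.0, 5.0, 2.0, 9.0])
loss, used = unsupervised_consistency_loss(pred_weak, pred_strong, errors, threshold=3.5)
```

In a full training loop this term would be added to the supervised loss on the labeled samples, and the threshold would control how many pseudo-labels take part in each iteration.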

CLC number: TP391

FANG Chenghao, WANG Kangkan. 3D human pose and shape estimation from single-view point clouds with semi-supervised learning[J]. Journal of Graphics, 2025, 46(2): 393-401.

Figures/Tables 11

Fig. 1 The framework for 3D human pose and shape estimation from single-view point clouds with semi-supervised learning

Fig. 2 Examples of various human point cloud augmentation methods

Table 1 Human model estimation errors for different point cloud augmentation methods/mm

Dataset	No augmentation	Random translation	Average down-sampling	Random noise	Uneven density	Random drop	Multiple random drops	Average down-sampling + noise	Multiple random drops + noise
CAPE	21.32	22.03	24.75	28.34	36.33	43.01	51.07	30.04	60.82
SURREAL	23.07	23.68	26.16	29.04	37.84	42.63	56.47	30.94	57.91
Kungfu	22.20	23.11	27.19	28.51	30.84	37.57	47.33	35.20	54.96
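
To make the augmentation names in Table 1 more concrete, the following NumPy sketch shows one plausible reading of four of them: random translation, average down-sampling, random noise and random drop. The page does not give the authors' definitions or parameter values, so the function names, the default magnitudes (`max_shift`, `keep_ratio`, `sigma`, `radius`), the interpretation of average down-sampling as uniform random subsampling, and the interpretation of random drop as removing a ball around a random point are all assumptions.

```python
import numpy as np

def random_translation(points, rng, max_shift=0.05):
    """Shift the whole cloud by one random offset (assumed magnitude)."""
    return points + rng.uniform(-max_shift, max_shift, size=(1, 3))

def average_downsample(points, rng, keep_ratio=0.5):
    """Keep a uniformly random subset of the points (assumed interpretation)."""
    n_keep = max(1, int(len(points) * keep_ratio))
    idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[idx]

def random_noise(points, rng, sigma=0.01):
    """Add independent Gaussian jitter to every coordinate."""
    return points + rng.normal(scale=sigma, size=points.shape)

def random_drop(points, rng, radius=0.2):
    """Remove all points inside a ball around one randomly chosen point."""
    center = points[rng.integers(len(points))]
    keep = np.linalg.norm(points - center, axis=1) > radius
    return points[keep]

# Toy usage on a synthetic cloud; a real input would be a depth-camera scan.
rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(2048, 3))
translated = random_translation(cloud, rng)
down = average_downsample(cloud, rng)
# "Multiple random drops + noise" composed from the primitives above.
strong = random_noise(random_drop(random_drop(cloud, rng), rng), rng)
```

Under these assumptions a mild transform such as `random_translation` would be a candidate weak augmentation and a composition such as `strong` a candidate strong augmentation; which ones the authors actually assign to each role is not stated on this page.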

Fig. 3 Heat map of the reconstruction error for different augmented samples ((a) Random translation; (b) Average down-sampling; (c) Random noise; (d) Uneven density; (e) Random drop; (f) Multiple random drops; (g) Average down-sampling with random noise; (h) Multiple random drops with random noise)

Table 2 The errors calculated by different evaluation methods for pseudo-labels

Frame index	Ours	MAVE	Chamfer distance
0	0.43	0.00	0.03
10	1.75	24.03	0.64
20	3.35	47.55	2.01
30	5.18	62.34	15.92
40	6.97	105.70	49.42
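
Of the three measures in Table 2, the Chamfer distance is a standard quantity that can be computed between an input point cloud and the vertices of an estimated model without known correspondences. The sketch below shows one common variant, the symmetric mean of squared nearest-neighbor distances. Which variant and which units the authors used is not stated here, so this is an assumption, and the brute-force pairwise computation is only suitable for small clouds.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Uses squared Euclidean distances and averages over each set;
    other conventions (unsquared, summed) also appear in the literature.
    """
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # Nearest neighbor in b for each point of a, and vice versa.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# Toy usage: two points and a copy shifted by one unit along z.
a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = a + np.array([0.0, 0.0, 1.0])
same = chamfer_distance(a, a)
shifted = chamfer_distance(a, b)
```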

Fig. 4 Alignment results of the human body point cloud in successive frames with the SMPL model (two views are shown in the top and bottom rows)

Table 3 Reconstruction error on the synthetic datasets and pseudo-label utilization for models trained with different thresholds

Threshold	CAPE/mm	SURREAL/mm	DFAUST/mm	Pseudo-label utilization/%
Fixed 3.5	27.46	37.03	41.10	56
Fixed 2.0	23.92	27.79	29.93	11
Fixed 1.5	22.74	28.37	31.90	5
Dynamic threshold	21.83	24.79	23.58	32
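
The trade-off in Table 3 between a strict threshold (few but cleaner pseudo-labels) and a loose one (many but noisier pseudo-labels) can be illustrated with a short sketch. The rule behind the dynamic threshold is not described on this page, so `linear_threshold` below is a purely hypothetical schedule that relaxes the threshold linearly during training. Its endpoints 1.5 and 3.5 are borrowed from the fixed settings in the table only to have concrete numbers, and `utilization` assumes a pseudo-label is used when its error score is below the threshold.

```python
import numpy as np

def utilization(scores, threshold):
    """Fraction of pseudo-labels whose error score is below the threshold."""
    return float(np.mean(np.asarray(scores) < threshold))

def linear_threshold(step, total_steps, start=1.5, end=3.5):
    """Hypothetical schedule: move linearly from `start` to `end`."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)

# Toy usage: the same scores admit more pseudo-labels as the threshold relaxes.
scores = np.array([1.0, 2.0, 3.0, 4.0])
early = utilization(scores, linear_threshold(0, 10))    # threshold 1.5
late = utilization(scores, linear_threshold(10, 10))    # threshold 3.5
```

A schedule of this kind starts with a small set of reliable pseudo-labels and admits more as the model improves, which matches the abstract's description of gradually increasing the number of pseudo-labels, but the authors' actual rule may differ.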

Table 4 Reconstruction errors of different methods on various synthetic datasets/mm

Method	CAPE	SURREAL	DFAUST
Point-based HMR[3]	44.18	49.98	47.01
Ref. [13]	25.51	29.33	28.35
IPNet[11]	30.56	34.52	37.76
Ours	22.83	24.79	23.57
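
As a reading aid for Table 4, the relative error reduction of the proposed method over the strongest baseline, Ref. [13], can be worked out directly from the table values. The snippet below only performs this arithmetic; the helper name `relative_reduction` is introduced here for illustration.

```python
# Reconstruction errors in mm, copied from Table 4.
errors_mm = {
    "Ref. [13]": {"CAPE": 25.51, "SURREAL": 29.33, "DFAUST": 28.35},
    "Ours": {"CAPE": 22.83, "SURREAL": 24.79, "DFAUST": 23.57},
}

def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline

reductions = {
    name: round(relative_reduction(errors_mm["Ref. [13]"][name], errors_mm["Ours"][name]), 1)
    for name in errors_mm["Ours"]
}
```

The reductions come out at roughly 10.5% on CAPE, 15.5% on SURREAL and 16.9% on DFAUST.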

Fig. 5 Heat map of reconstruction errors of different methods on synthetic datasets ((a) Input point cloud; (b) Point-based HMR[3]; (c) Ref. [13]; (d) IPNet[11]; (e) Ours)

Table 5 Reconstruction errors of different methods on various real datasets/mm

Method	Crouching	Kungfu	Girl
IPNet[11]	47.55	64.18	52.79
PTF[12]	40.82	53.66	44.07
Ref. [13]	28.33	30.93	31.49
Ours	26.30	28.62	28.14

Fig. 6 Reconstruction and alignment results of different methods on the real datasets ((a) Input point cloud; (b) IPNet[11]; (c) PTF[12]; (d) Ref. [13]; (e) Ours)

References 37

[1]	LOPER M, MAHMOOD N, ROMERO J, et al. SMPL: a skinned multi-person linear model[J]. Seminal Graphics Papers: Pushing the Boundaries, 2023, 2: 88.
[2]	BOGO F, KANAZAWA A, LASSNER C, et al. Keep it SMPL: automatic estimation of 3D human pose and shape from a single image[C]// The 14th European Conference on Computer Vision. Cham: Springer, 2016: 561-578.
[3]	KANAZAWA A, BLACK M J, JACOBS D W, et al. End-to-end recovery of human shape and pose[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7122-7131.
[4]	KOLOTOUROS N, PAVLAKOS G, BLACK M J, et al. Learning to reconstruct 3D human pose and shape via model-fitting in the loop[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 2252-2261.
[5]	ZHANG H W, TIAN Y T, ZHOU X C, et al. PyMAF: 3D human pose and shape regression with pyramidal mesh alignment feedback loop[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 11426-11436.
[6]	RONG Y, SHIRATORI T, JOO H. FrankMocap: a monocular 3D whole-body pose estimation system via regression and integration[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 1749-1759.
[7]	JIANG H Y, CAI J F, ZHENG J M. Skeleton-aware 3D human shape reconstruction from point clouds[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 5430-5440.
[8]	JANG H, KIM M, BAE J, et al. Dynamic mesh recovery from partial point cloud sequence[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 15028-15038.
[9]	GUO K W, XU F, YU T, et al. Real-time geometry, albedo, and motion reconstruction using a single RGB-D camera[J]. ACM Transactions on Graphics, 2017, 36(3): 32.
[10]	SAITO S, HUANG Z, NATSUME R, et al. PIFu: pixel-aligned implicit function for high-resolution clothed human digitization[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 2304-2314.
[11]	BHATNAGAR B L, SMINCHISESCU C, THEOBALT C, et al. Combining implicit function learning and parametric models for 3D human reconstruction[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 311-329.
[12]	WANG S F, GEIGER A, TANG S Y. Locally aware piecewise transformation fields for 3D human mesh registration[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 7635-7644.
[13]	WANG K K, ZHENG H Y, ZHANG G F, et al. Parametric model estimation for 3D clothed humans from point clouds[C]// 2021 IEEE International Symposium on Mixed and Augmented Reality. New York: IEEE Press, 2021: 156-165.
[14]	LIU G Z, RONG Y, SHENG L. VoteHMR: occlusion-aware voting network for robust 3D human mesh recovery from partial point clouds[C]// The 29th ACM International Conference on Multimedia. New York: ACM, 2021: 955-964.
[15]	ZHANG X M, FANG X Y, WANG L B, et al. Human body reconstruction based on improved piecewise hinge transformation[J]. Journal of Graphics, 2020, 41(1): 108-115 (in Chinese).
[16]	HAN K, PANG Z Q, WANG L, et al. High identification 3D human body model reconstruction method based on the depth scanner[J]. Journal of Graphics, 2015, 36(4): 503-510 (in Chinese).
[17]	CAI Z A, PAN L, WEI C, et al. PointHPS: cascaded 3D human pose and shape estimation from point clouds[EB/OL]. (2023-08-28) [2024-10-25]. http://arxiv.org/pdf/2308.14492.pdf.
[18]	SOHN K, BERTHELOT D, LI C L, et al. FixMatch: simplifying semi-supervised learning with consistency and confidence[C]// The 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 51.
[19]	LEE D H. Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks[EB/OL]. [2024-05-05]. https://www.researchgate.net/publication/280581078_Pseudo-Label_The_Simple_and_Efficient_Semi-Supervised_Learning_Method_for_Deep_Neural_Networks.
[20]	BACHMAN P, ALSHARIF O, PRECUP D. Learning with pseudo-ensembles[C]// The 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 3365-3373.
[21]	QI C R, SU H, MO K C, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 77-85.
[22]	QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[C]// The 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 5105-5114.
[23]	LORENSEN W E, CLINE H E. Marching cubes: a high resolution 3D surface construction algorithm[J]. Computer Graphics, 1987, 21(4): 347-353.
[24]	CHIBANE J, ALLDIECK T, PONS-MOLL G. Implicit functions in feature space for 3D shape reconstruction and completion[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 6968-6979.
[25]	KINGMA D P, REZENDE D J, MOHAMED S, et al. Semi-supervised learning with deep generative models[C]// The 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 3581-3589.
[26]	ZHU X J, GHAHRAMANI Z B. Learning from labeled and unlabeled data with label propagation[R]. Technical Report CMU-CALD-02-107, 2002.
[27]	GRANDVALET Y, BENGIO Y. Semi-supervised learning by entropy minimization[C]// The 17th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2004: 529-536.
[28]	BERTHELOT D, CARLINI N, GOODFELLOW I, et al. MixMatch: a holistic approach to semi-supervised learning[C]// The 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 454.
[29]	BERTHELOT D, CARLINI N, CUBUK E D, et al. ReMixMatch: semi-supervised learning with distribution alignment and augmentation anchoring[EB/OL]. (2019-11-21) [2024-10-25]. http://arxiv.org/pdf/1911.09785.pdf.
[30]	HUANG C, CAO Z J, WANG Y B, et al. MetaSets: meta-learning on point sets for generalizable representations[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 8859-8868.
[31]	MA Q L, YANG J L, RANJAN A, et al. Learning to dress 3D people in generative clothing[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 6468-6477.
[32]	VAROL G, ROMERO J, MARTIN X, et al. Learning from synthetic humans[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 4627-4635.
[33]	BOGO F, ROMERO J, PONS-MOLL G, et al. Dynamic FAUST: registering human bodies in motion[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5573-5582.
[34]	WANG K K, ZHANG G F, ZHENG H Y, et al. Learning dense correspondences for non-rigid point clouds with two-stage regression[J]. IEEE Transactions on Image Processing, 2021, 30: 8468-8482.
[35]	GUO K W, XU F, WANG Y G, et al. Robust non-rigid motion tracking and surface reconstruction using l0 regularization[C]// 2015 IEEE International Conference on Computer Vision. New York: IEEE Press, 2015: 3083-3091.
[36]	LI J F, BIAN S Y, XU C, et al. HybrIK-X: hybrid analytical-neural inverse kinematics for whole-body mesh recovery[EB/OL]. (2023-04-12) [2024-10-25]. http://arxiv.org/pdf/2304.05690.pdf.
[37]	LIN J, ZENG A L, WANG H Q, et al. One-stage 3D whole-body mesh recovery with component aware transformer[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 21159-21168.

