Journal of Graphics ›› 2023, Vol. 44 ›› Issue (6): 1218-1226.DOI: 10.11996/JG.j.2095-302X.2023061218
• Computer Graphics and Virtual Reality •
WANG Ji1, WANG Sen1, JIANG Zhi-wen1, XIE Zhi-feng1,2, LI Meng-tian1,2
Received: 2023-06-29
Accepted: 2023-09-07
Online: 2023-12-31
Published: 2023-12-17
Contact: XIE Zhi-feng (1982-), associate professor, Ph.D. His main research interests cover graphic image processing, computer vision, etc.
About author: WANG Ji (1999-), master student. Her main research interests cover computer vision and computer graphics. E-mail: wang_ji357@shu.edu.cn
WANG Ji, WANG Sen, JIANG Zhi-wen, XIE Zhi-feng, LI Meng-tian. Zero-shot text-driven avatar generation based on depth-conditioned diffusion model[J]. Journal of Graphics, 2023, 44(6): 1218-1226.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023061218
Fig. 2 Method comparison ((a) DreamFusion[8]; (b) SJC[24]; (c) AvatarCLIP[6]; (d) The final result of this paper; (e) The results of the first stage of this paper; (f) The geometric result of this paper)
Table 1 User survey scores of different methods on the avatar generation task

Method | Consistency (↑) | Geometry quality (↑) | Texture quality (↑)
---|---|---|---
DreamFusion[8] | 2.8 | 3.2 | 4.2
SJC[24] | 2.9 | 3.6 | 4.4
AvatarCLIP[6] | 4.2 | 4.1 | 3.2
Ours | 4.6 | 4.2 | 4.6
Average | 3.6 | 3.8 | 4.1
Table 2 Comparison of CLIP scores of different methods

Method | CLIP score (↑)
---|---
DreamFusion[8] | 31.03
SJC[24] | 31.59
AvatarCLIP[6] | 32.18
Ours | 32.37
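The CLIP score reported above measures text-to-3D consistency as the similarity between a rendered image and the text prompt in CLIP's joint embedding space. A minimal sketch of the metric, assuming the embeddings have already been extracted by a CLIP image and text encoder (the scaling by 100 follows common practice; the paper does not spell out its exact variant):

```python
import numpy as np

def clip_score(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """Cosine similarity between CLIP image and text embeddings, scaled by 100."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return float(100.0 * np.dot(image_emb, text_emb))

# Toy usage with random stand-in vectors; real embeddings come from a CLIP encoder.
rng = np.random.default_rng(0)
img_emb, txt_emb = rng.normal(size=512), rng.normal(size=512)
print(round(clip_score(img_emb, txt_emb), 2))
```

In practice the score is averaged over multiple rendered views of the generated avatar, so a higher value indicates that the geometry and texture stay faithful to the prompt from all angles.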
Table 3 Comparison of generation time of different methods

Method | Generation time (↓)
---|---
DreamFusion[8] | 52 min
SJC[24] | 30 min
AvatarCLIP[6] | 1 h 40 min
Ours | 1 h 15 min
Fig. 6 Iterative texture refinement process ((a) The updating process of the texture guide map under different iterations; (b) The rendering results of the optimized model in the corresponding period)
[1] | REED S, AKATA Z, YAN X C, et al. Generative adversarial text to image synthesis[EB/OL]. [2023-01-23]. https://arxiv.org/pdf/1605.05396.pdf. |
[2] | CHEN X, JIANG T J, SONG J, et al. gDNA: towards generative detailed neural avatars[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 20427-2043. |
[3] | HONG F Z, CHEN Z X, LAN Y S, et al. EVA3D: compositional 3D human generation from 2D image collections[EB/OL]. [2023-01-23]. https://arxiv.org/abs/2210.04888. |
[4] | RAMESH A, PAVLOV M, GOH G, et al. Zero-shot text-to-image generation[EB/OL]. [2023-01-20]. https://arxiv.org/abs/2102.12092v1. |
[5] | SAHARIA C, CHAN W, SAXENA S, et al. Photorealistic text-to-image diffusion models with deep language understanding[EB/OL]. [2023-01-23]. https://arxiv.org/abs/2205.11487. |
[6] | ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 10684-10695. |
[7] | HONG F Z, ZHANG M Y, PAN L A, et al. AvatarCLIP: zero-shot text-driven generation and animation of 3D avatars[J]. ACM Transactions on Graphics, 2022, 41(4): 1-19. |
[8] | POOLE B, JAIN A, BARRON J T, et al. DreamFusion: text-to-3D using 2D diffusion[EB/OL]. [2023-01-23]. https://www.aminer.cn/pub/63365e7f90e50fcafd1a3612/. |
[9] | WANG P, LIU L J, LIU Y, et al. NeuS: learning neural implicit surfaces by volume rendering for multi-view reconstruction[EB/OL]. [2023-01-23]. https://arxiv.org/abs/2106.10689. |
[10] | LOPER M, MAHMOOD N, ROMERO J, et al. SMPL: a skinned multi-person linear model[J]. ACM Transactions on Graphics, 2015, 34(6): 248:1-248:16. |
[11] | CAI X Q, HUO Y Q, LI F J, et al. Human pose estimation and similarity calculation for Tai Chi learning[J]. Journal of Graphics, 2022, 43(4): 695-706 (in Chinese). |
[12] | WANG Y P, ZENG Y, LI S H, et al. A Transformer-based 3D human pose estimation method[J]. Journal of Graphics, 2023, 44(1): 139-145 (in Chinese). |
[13] | ZHANG X M, FANG X Y, WANG L B, et al. Human body reconstruction based on improved piecewise hinge transformation[J]. Journal of Graphics, 2020, 41(1): 108-115 (in Chinese). |
[14] | BHATNAGAR B L, SMINCHISESCU C, THEOBALT C, et al. Combining implicit function learning and parametric models for 3D human reconstruction[C]// European Conference on Computer Vision. Cham: Springer, 2020: 311-329. |
[15] | MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: representing scenes as neural radiance fields for view synthesis[C]// European Conference on Computer Vision. Cham: Springer, 2020: 405-421. |
[16] | GROPP A, YARIV L, HAIM N, et al. Implicit geometric regularization for learning shapes[C]// The 37th International Conference on Machine Learning. New York: ACM, 2020: 3789-3799. |
[17] | GRIGOREV A, ISKAKOV K, IANINA A, et al. StylePeople: a generative model of fullbody human avatars[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 5151-5160. |
[18] | ABDAL R, QIN Y, WONKA P. Image2StyleGAN: how to embed images into the StyleGAN latent space?[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 4432-4441. |
[19] | CORONA E, PUMAROLA A, ALENYA G, et al. SMPLicit: topology-aware generative model for clothed people[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 11875-11885. |
[20] | JAIN A, MILDENHALL B, BARRON J T, et al. Zero-shot text-guided object generation with dream fields[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 857-866. |
[21] | KHALID N M, XIE T H, BELILOVSKY E, et al. CLIP-Mesh: generating textured meshes from text using pretrained image-text models[C]// SA'22: SIGGRAPH Asia 2022 Conference Papers. New York: ACM, 2022: 1-8. |
[22] | LEE H H, CHANG A X. Understanding pure clip guidance for voxel grid NeRF models[EB/OL]. (2022-09-30) [2023-02-22]. https://arxiv.org/abs/2209.15172. |
[23] | LIN C H, GAO J, TANG L, et al. Magic3D: high resolution text-to-3D content creation[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 300-309. |
[24] | WANG H C, DU X D, LI J H, et al. Score Jacobian chaining: lifting pretrained 2D diffusion models for 3D generation[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 12619-12629. |
[25] | Stability AI. Stable diffusion[EB/OL]. (2022-08-22) [2023- 02-22]. https://stability.ai/blog/stable-diffusion-public-release. |
[26] | WANG Y Q, SKOROKHODOV I, WONKA P. HF-NeuS: improved surface reconstruction using high-frequency details[EB/OL]. [2023-01-23]. https://arxiv.org/abs/2206.07850. |
[27] | KINGMA D P, BA J. Adam: a method for stochastic optimization[EB/OL]. [2023-02-12]. https://arxiv.org/pdf/1412.6980.pdf. |