Journal of Graphics ›› 2024, Vol. 45 ›› Issue (4): 814-826. DOI: 10.11996/JG.j.2095-302X.2024040814
ZHU Baoxu, LIU Mandan, ZHANG Wenting, XIE Lizhi
Received: 2024-03-14
Accepted: 2024-06-24
Published: 2024-08-31
Online: 2024-09-03
Corresponding author: LIU Mandan (1973-), professor, Ph.D. Her main research interests cover big data intelligent analysis and processing, intelligent computing and its applications. E-mail: liumandan@ecust.edu.cn
First author: ZHU Baoxu (2000-), master's student. His main research interests cover computer vision and virtual human generation. E-mail: Y30221036@mail.ecust.edu.cn
Abstract: Most existing research on face texture generation focuses on low-resolution textures. To address this, image-to-image translation is applied to high-resolution texture map generation, and a full-process method centered on an image translation network is proposed for generating 1024×1024 texture maps. The method generates textures quickly and efficiently while effectively alleviating the low resolution of generated face UV textures. In the image translation network, the generator is built from a convolutional neural network backbone with an embedded statistical texture learning network (STLNet) and soft adaptive layer-instance normalization (Soft-AdaLIN); multi-scale discrimination guides the generation of high-resolution texture images; finally, color transformation and Poisson blending complete the texture correction. Tests on randomly sampled, face-normalized images from the FFHQ dataset, quantitative evaluation on a series of metrics, and qualitative and quantitative comparisons with recent related methods verify the advantages of the proposed full-process method for generating 1024×1024 face UV texture images.
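The texture-correction stage named in the abstract (color transformation followed by Poisson blending [16]) admits a compact implementation. The sketch below is illustrative only: it assumes a Reinhard-style Lab statistics match for the color step, which the full paper may implement differently, and uses OpenCV's seamlessClone for the Poisson step; the helper names are ours, not the paper's.

```python
import cv2
import numpy as np

def match_color_lab(src_bgr, ref_bgr):
    """Shift the Lab mean/std of src toward ref (Reinhard-style transfer).

    Assumption: the paper's color transformation step may differ; this is
    the common statistics-matching variant.
    """
    src = cv2.cvtColor(src_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref = cv2.cvtColor(ref_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    s_mean, s_std = src.mean((0, 1)), src.std((0, 1)) + 1e-6
    r_mean, r_std = ref.mean((0, 1)), ref.std((0, 1))
    out = (src - s_mean) / s_std * r_std + r_mean
    return cv2.cvtColor(np.clip(out, 0, 255).astype(np.uint8), cv2.COLOR_LAB2BGR)

def correct_texture(generated_face, template, mask):
    """Color-match the template to the generated face region, then
    Poisson-blend the face into it (cf. Fig. 1 (e)-(h); Poisson editing [16])."""
    template = match_color_lab(template, generated_face)
    ys, xs = np.nonzero(mask)
    center = (int(xs.mean()), int(ys.mean()))  # place src at the mask centroid
    return cv2.seamlessClone(generated_face, template, mask, center,
                             cv2.NORMAL_CLONE)
```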
Citation: ZHU Baoxu, LIU Mandan, ZHANG Wenting, XIE Lizhi. Full process generation method of high-resolution face texture map[J]. Journal of Graphics, 2024, 45(4): 814-826.
Fig. 1 Schematic diagram of the entire process for generating a facial UV texture map ((a) Input facial image; (b) Template texture map; (c) Keypoint-mapped image; (d) Image generated by the translation network; (e) Visibility mask; (f) Image after applying the visibility mask; (g) Template texture map after color transformation; (h) Final texture map)
Fig. 2 Schematic diagram of keypoint detection and mapping for the original image ((a) Input facial image; (b) Image after keypoint detection; (c) Template texture map; (d) Template texture map after keypoint detection; (e) Intermediate image during mapping; (f) Keypoint-mapped image)
Fig. 3 Schematic diagram of keypoint detection and mapping for the target image ((a) Input facial image; (b) Image after keypoint detection; (c) Template texture map; (d) Template texture map after keypoint detection; (e) Intermediate image during mapping; (f) Keypoint-mapped image)
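Figs. 2 and 3 describe landmark detection on the input face and on the template texture, followed by a mapping between the two point sets. A minimal sketch of this stage, assuming MediaPipe Face Mesh [18] as the detector and a piecewise affine warp from scikit-image (the paper's exact warping scheme is not specified on this page); `uv_landmarks` stands for the landmark coordinates detected on the template texture map, as in Fig. 2 (c)-(d):

```python
import cv2
import mediapipe as mp
import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def detect_landmarks(image_bgr):
    """Return (N, 2) pixel (x, y) coordinates of MediaPipe Face Mesh landmarks."""
    h, w = image_bgr.shape[:2]
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True) as fm:
        res = fm.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    assert res.multi_face_landmarks, "no face detected"
    lm = res.multi_face_landmarks[0].landmark
    return np.array([[p.x * w, p.y * h] for p in lm])

def map_face_to_uv(face_bgr, uv_landmarks, uv_size=1024):
    """Warp the input face onto the UV layout via a piecewise affine
    transform between corresponding landmark sets (Figs. 2-3, steps (e)-(f))."""
    src_pts = detect_landmarks(face_bgr)
    tform = PiecewiseAffineTransform()
    # warp() treats the transform as a map from output (UV) coordinates
    # back to input-image coordinates, hence estimate(uv -> image).
    tform.estimate(uv_landmarks, src_pts)
    # Pixels outside the landmark hull are filled with 0; output is float [0, 1].
    return warp(face_bgr, tform, output_shape=(uv_size, uv_size))
```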
Fig. 8 Overall network structure ((a) Input image; (b) Image generated by the translation network (1024×1024); (b1) Generated image downsampled to 512×512; (b2) Generated image downsampled to 256×256; (c) Target image of the translation network (1024×1024); (c1) Target image downsampled to 512×512; (c2) Target image downsampled to 256×256)
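The multi-scale discrimination of Fig. 8 evaluates real and generated textures at 1024×1024, 512×512, and 256×256. A minimal PyTorch sketch in the style of pix2pixHD [22], not the paper's exact discriminator:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """A small PatchGAN judging one scale; outputs a grid of patch logits."""
    def __init__(self, in_ch=3, ndf=64):
        super().__init__()
        layers, ch = [], in_ch
        for out_ch in (ndf, ndf * 2, ndf * 4):
            layers += [nn.Conv2d(ch, out_ch, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = out_ch
        layers.append(nn.Conv2d(ch, 1, 4, stride=1, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class MultiScaleDiscriminator(nn.Module):
    """One PatchGAN per scale; average pooling produces the 1024/512/256
    inputs shown as panels (b)-(b2) and (c)-(c2) of Fig. 8."""
    def __init__(self, num_scales=3):
        super().__init__()
        self.discs = nn.ModuleList(PatchDiscriminator() for _ in range(num_scales))
        self.down = nn.AvgPool2d(3, stride=2, padding=1, count_include_pad=False)

    def forward(self, x):
        outs = []
        for disc in self.discs:
            outs.append(disc(x))
            x = self.down(x)  # halve resolution for the next discriminator
        return outs
```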
Fig. 9 Visualization of the full UV texture generation process ((a) Input facial images; (b) Keypoint-mapped images; (c) Images generated by the translation network; (d) Final texture maps)
Fig. 10 3D visualization of UV texture maps for different faces ((a) Input facial images; (b) 3D visualization of the frontal face; (c) 3D visualization of the left side of the face; (d) 3D visualization of the right side of the face)
Fig. 11 Results with and without STLNet and Soft-AdaLIN ((a1)-(a3) Without Soft-AdaLIN or STLNet; (b1)-(b3) With Soft-AdaLIN; (c1)-(c3) With STLNet; (d1)-(d3) With both Soft-AdaLIN and STLNet)
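Soft-AdaLIN extends the AdaLIN of U-GAT-IT [21], which mixes instance- and layer-normalized statistics with a learned per-channel ratio; the "soft" blending of input statistics is detailed only in the full paper, so the sketch below covers the base AdaLIN operation only:

```python
import torch
import torch.nn as nn

class AdaLIN(nn.Module):
    """Adaptive layer-instance normalization (U-GAT-IT [21]): mix IN and LN
    statistics with a learned ratio rho, then apply style-provided gamma/beta."""
    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.rho = nn.Parameter(torch.full((1, num_features, 1, 1), 0.9))

    def forward(self, x, gamma, beta):
        # Instance norm: statistics per sample, per channel (over H, W)
        in_mean = x.mean(dim=(2, 3), keepdim=True)
        in_var = x.var(dim=(2, 3), keepdim=True)
        out_in = (x - in_mean) / torch.sqrt(in_var + self.eps)
        # Layer norm: statistics per sample (over C, H, W)
        ln_mean = x.mean(dim=(1, 2, 3), keepdim=True)
        ln_var = x.var(dim=(1, 2, 3), keepdim=True)
        out_ln = (x - ln_mean) / torch.sqrt(ln_var + self.eps)
        rho = self.rho.clamp(0.0, 1.0)
        out = rho * out_in + (1.0 - rho) * out_ln
        # gamma/beta come from the style/feature branch, shape (B, C, 1, 1)
        return out * gamma + beta
```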
Fig. 12 Pairs of rotated faces and their UV textures ((a) Input face (head turned left); (b) Texture map (head turned left); (c) Input face (head turned right); (d) Texture map (head turned right))
Table 1 Quantitative comparison of networks with and without STLNet and Soft-AdaLIN

| Metric | Network (a) (without Soft-AdaLIN and STLNet) | Network (b) (with Soft-AdaLIN) | Network (c) (with STLNet) | Network (d) (with Soft-AdaLIN and STLNet) |
| --- | --- | --- | --- | --- |
| PSNR | 34.4603 | 34.6102 | 34.2314 | 34.6875 |
| SSIM (source) | 0.9097 | 0.9119 | 0.9114 | 0.9127 |
| SSIM (target) | 0.9736 | 0.9749 | 0.9749 | 0.9757 |
| LPIPS | 0.0793 | 0.0822 | 0.0800 | 0.0757 |
Table 2 Quantitative comparison of the average PSNR, SSIM, and LPIPS of each method

| Metric | OSTeC | FFHQ-UV | Ours |
| --- | --- | --- | --- |
| PSNR | 12.5770 | 12.2748 | 12.5867 |
| SSIM | 0.7305 | 0.6934 | 0.7486 |
| LPIPS | 0.7436 | 0.7058 | 0.6928 |
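The PSNR, SSIM [25-26], and LPIPS [27] values in Tables 1 and 2 can be reproduced with standard packages. A minimal sketch assuming scikit-image and the lpips PyPI package; the paper's exact evaluation protocol (image pairing, color space, crops) is not given on this page:

```python
import lpips
import numpy as np
import torch
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

_lpips_net = lpips.LPIPS(net='alex')  # AlexNet backbone, as in [27]

def evaluate_pair(pred_rgb, target_rgb):
    """pred_rgb / target_rgb: uint8 HxWx3 arrays of the same size."""
    psnr = peak_signal_noise_ratio(target_rgb, pred_rgb, data_range=255)
    ssim = structural_similarity(target_rgb, pred_rgb, channel_axis=2,
                                 data_range=255)

    def to_tensor(img):  # HWC uint8 -> 1x3xHxW float in [-1, 1]
        t = torch.from_numpy(img).permute(2, 0, 1).float() / 127.5 - 1.0
        return t.unsqueeze(0)

    with torch.no_grad():
        dist = _lpips_net(to_tensor(pred_rgb), to_tensor(target_rgb)).item()
    return {'PSNR': psnr, 'SSIM': ssim, 'LPIPS': dist}
```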
Fig. 13 Stages from the input face to the final face UV texture ((a) Input facial image; (b) Keypoint-mapped image; (c) Texture map generated by the translation network; (d) Final face UV texture map)
Fig. 14 Visual comparison with Normalized Avatar Synthesis and FFHQ-UV ((a) Input facial images; (b) Reference [28]; (c) Reference [14]; (d) Ours; (e) Corresponding texture images)
Fig. 15 Visual comparison with OSTeC ((a) Input facial images; (b) Textures generated by OSTeC; (c) Textures generated by the proposed method)
References
[1] YUE F R. Research on single-view 3D face reconstruction model based on deep learning[D]. Chengdu: University of Electronic Science and Technology of China, 2022 (in Chinese).
[2] PENG Z F. Research on realistic 3D face texture reconstruction based on single image[D]. Guangzhou: South China University of Technology, 2022 (in Chinese).
[3] HAN Y H. Single image based 3D face reconstruction[D]. Chongqing: Southwest University, 2022 (in Chinese).
[4] BLANZ V, VETTER T. A morphable model for the synthesis of 3D faces[M]// Seminal Graphics Papers: Pushing the Boundaries, Volume 2. New York: ACM, 2023: 157-164.
[5] HUBER P, HU G S, TENA R, et al. A multiresolution 3D morphable face model and fitting framework[C]// The 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Setúbal: SCITEPRESS, 2016, 5: 79-86.
[6] LI T Y, BOLKART T, BLACK M J, et al. Learning a model of facial shape and expression from 4D scans[J]. ACM Transactions on Graphics, 2017, 36(6): 1-17.
[7] DENG J K, CHENG S Y, XUE N N, et al. UV-GAN: adversarial facial UV map completion for pose-invariant face recognition[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7093-7102.
[8] YIN X N, HUANG D, FU Z H, et al. Weakly-supervised photo-realistic texture generation for 3D face reconstruction[C]// 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition. New York: IEEE Press, 2023: 1-8.
[9] YANG M X, GUO J W, CHENG Z L, et al. Self-supervised re-renderable facial albedo reconstruction from single image[EB/OL]. [2024-01-18]. http://arxiv.org/abs/2111.08282.
[10] LIU Y, FAN Y Y, GUO Z, et al. Single face image-based panoramic texture map generation[J]. Journal of Image and Graphics, 2022, 27(2): 602-613 (in Chinese).
[11] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[12] GECER B, PLOUMPIS S, KOTSIA I, et al. GANFIT: generative adversarial network fitting for high fidelity 3D face reconstruction[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 1155-1164.
[13] GECER B, DENG J K, ZAFEIRIOU S. OSTeC: one-shot texture completion[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 7624-7634.
[14] BAI H R, KANG D, ZHANG H X, et al. FFHQ-UV: normalized facial UV-texture dataset for 3D face reconstruction[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 362-371.
[15] KARRAS T, LAINE S, AITTALA M, et al. Analyzing and improving the image quality of StyleGAN[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 8110-8119.
[16] PÉREZ P, GANGNET M, BLAKE A. Poisson image editing[M]// Seminal Graphics Papers: Pushing the Boundaries, Volume 2. New York: ACM, 2023: 577-582.
[17] KARRAS T, LAINE S, AILA T M. A style-based generator architecture for generative adversarial networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 4401-4410.
[18] KARTYNNIK Y, ABLAVATSKI A, GRISHCHENKO I, et al. Real-time facial surface geometry from monocular video on mobile GPUs[EB/OL]. [2024-01-18]. http://arxiv.org/abs/1907.06724.
[19] ZHU L Y, JI D Y, ZHU S P, et al. Learning statistical texture for semantic segmentation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 12537-12546.
[20] HARALICK R M, SHANMUGAM K, DINSTEIN I. Textural features for image classification[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1973, SMC-3(6): 610-621.
[21] KIM J, KIM M, KANG H, et al. U-GAT-IT: unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation[EB/OL]. [2024-01-18]. http://arxiv.org/abs/1907.10830.
[22] WANG T C, LIU M Y, ZHU J Y, et al. High-resolution image synthesis and semantic manipulation with conditional GANs[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 8798-8807.
[23] ISOLA P, ZHU J Y, ZHOU T H, et al. Image-to-image translation with conditional adversarial networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5967-5976.
[24] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2024-01-18]. http://arxiv.org/abs/1409.1556.
[25] HORÉ A, ZIOU D. Image quality metrics: PSNR vs. SSIM[C]// 2010 20th International Conference on Pattern Recognition. New York: IEEE Press, 2010: 2366-2369.
[26] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
[27] ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 586-595.
[28] LUO H W, NAGANO K, KUNG H W, et al. Normalized avatar synthesis using StyleGAN and perceptual refinement[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 11657-11667.