Journal of Graphics ›› 2024, Vol. 45 ›› Issue (4): 814-826. DOI: 10.11996/JG.j.2095-302X.2024040814
ZHU Baoxu, LIU Mandan, ZHANG Wenting, XIE Lizhi
Received: 2024-03-14
Accepted: 2024-06-24
Published: 2024-08-31
Online: 2024-09-03
Corresponding author: LIU Mandan (1973-), professor, Ph.D. Her main research interests cover big data intelligent analysis and processing, intelligent computing and its applications. E-mail: liumandan@ecust.edu.cn
First author: ZHU Baoxu (2000-), master's student. His main research interests cover computer vision and virtual human generation. E-mail: Y30221036@mail.ecust.edu.cn
Abstract: Most existing research on face texture generation focuses on low-resolution textures. To address this, image-to-image translation is applied to high-resolution texture map generation, and a full-process method centered on an image translation network is proposed for generating 1024×1024 texture maps. The method generates textures quickly and efficiently while effectively alleviating the low resolution of generated face UV textures. In the image translation network, the generator is built from a convolutional neural network backbone with an embedded statistical texture learning network (STLNet) and soft adaptive layer-instance normalization (Soft-AdaLIN); multi-scale discrimination guides the generation of high-resolution texture images; finally, color transformation and Poisson blending complete the texture correction. Tests on randomly sampled, face-normalized images from the FFHQ dataset, quantitative evaluation on a series of metrics, and qualitative and quantitative comparisons with recent related methods verify the advantages of the proposed full-process method for generating 1024×1024 face UV texture images.
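The texture-correction stage named in the abstract (color transformation followed by Poisson blending [16]) admits a compact implementation. The sketch below is illustrative only: it assumes a Reinhard-style Lab statistics match for the color step, which the full paper may implement differently, and uses OpenCV's seamlessClone for the Poisson step; the helper names are ours, not the paper's.

```python
import cv2
import numpy as np

def match_color_lab(src_bgr, ref_bgr):
    """Shift the Lab mean/std of src toward ref (Reinhard-style transfer).

    Assumption: the paper's color transformation step may differ; this is
    the common statistics-matching variant.
    """
    src = cv2.cvtColor(src_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref = cv2.cvtColor(ref_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    s_mean, s_std = src.mean((0, 1)), src.std((0, 1)) + 1e-6
    r_mean, r_std = ref.mean((0, 1)), ref.std((0, 1))
    out = (src - s_mean) / s_std * r_std + r_mean
    return cv2.cvtColor(np.clip(out, 0, 255).astype(np.uint8), cv2.COLOR_LAB2BGR)

def correct_texture(generated_face, template, mask):
    """Color-match the template to the generated face region, then
    Poisson-blend the face into it (cf. Fig. 1 (e)-(h); Poisson editing [16])."""
    template = match_color_lab(template, generated_face)
    ys, xs = np.nonzero(mask)
    center = (int(xs.mean()), int(ys.mean()))  # place src at the mask centroid
    return cv2.seamlessClone(generated_face, template, mask, center,
                             cv2.NORMAL_CLONE)
```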
Citation: ZHU Baoxu, LIU Mandan, ZHANG Wenting, XIE Lizhi. Full process generation method of high-resolution face texture map[J]. Journal of Graphics, 2024, 45(4): 814-826.
Fig. 1 Schematic diagram of the entire process for generating a facial UV texture map ((a) Input facial image; (b) Template texture map; (c) Keypoint-mapped image; (d) Image generated by the translation network; (e) Visibility mask; (f) Image after applying the visibility mask; (g) Template texture map after color transformation; (h) Final texture map)
Fig. 2 Schematic diagram of keypoint detection and mapping for the original image ((a) Input facial image; (b) Image after keypoint detection; (c) Template texture map; (d) Template texture map after keypoint detection; (e) Intermediate image during mapping; (f) Keypoint-mapped image)
Fig. 3 Schematic diagram of keypoint detection and mapping for the target image ((a) Input facial image; (b) Image after keypoint detection; (c) Template texture map; (d) Template texture map after keypoint detection; (e) Intermediate image during mapping; (f) Keypoint-mapped image)
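Figs. 2 and 3 describe landmark detection on the input face and on the template texture, followed by a mapping between the two point sets. A minimal sketch of this stage, assuming MediaPipe Face Mesh [18] as the detector and a piecewise affine warp from scikit-image (the paper's exact warping scheme is not specified on this page); `uv_landmarks` stands for the landmark coordinates detected on the template texture map, as in Fig. 2 (c)-(d):

```python
import cv2
import mediapipe as mp
import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def detect_landmarks(image_bgr):
    """Return (N, 2) pixel (x, y) coordinates of MediaPipe Face Mesh landmarks."""
    h, w = image_bgr.shape[:2]
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True) as fm:
        res = fm.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    assert res.multi_face_landmarks, "no face detected"
    lm = res.multi_face_landmarks[0].landmark
    return np.array([[p.x * w, p.y * h] for p in lm])

def map_face_to_uv(face_bgr, uv_landmarks, uv_size=1024):
    """Warp the input face onto the UV layout via a piecewise affine
    transform between corresponding landmark sets (Figs. 2-3, steps (e)-(f))."""
    src_pts = detect_landmarks(face_bgr)
    tform = PiecewiseAffineTransform()
    # warp() treats the transform as a map from output (UV) coordinates
    # back to input-image coordinates, hence estimate(uv -> image).
    tform.estimate(uv_landmarks, src_pts)
    # Pixels outside the landmark hull are filled with 0; output is float [0, 1].
    return warp(face_bgr, tform, output_shape=(uv_size, uv_size))
```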
Fig. 8 Overall network structure ((a) Input image; (b) Image generated by the translation network (1024×1024); (b1) Generated image downsampled to 512×512; (b2) Generated image downsampled to 256×256; (c) Target image of the translation network (1024×1024); (c1) Target image downsampled to 512×512; (c2) Target image downsampled to 256×256)
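The multi-scale discrimination of Fig. 8 evaluates real and generated textures at 1024×1024, 512×512, and 256×256. A minimal PyTorch sketch in the style of pix2pixHD [22], not the paper's exact discriminator:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """A small PatchGAN judging one scale; outputs a grid of patch logits."""
    def __init__(self, in_ch=3, ndf=64):
        super().__init__()
        layers, ch = [], in_ch
        for out_ch in (ndf, ndf * 2, ndf * 4):
            layers += [nn.Conv2d(ch, out_ch, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = out_ch
        layers.append(nn.Conv2d(ch, 1, 4, stride=1, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class MultiScaleDiscriminator(nn.Module):
    """One PatchGAN per scale; average pooling produces the 1024/512/256
    inputs shown as panels (b)-(b2) and (c)-(c2) of Fig. 8."""
    def __init__(self, num_scales=3):
        super().__init__()
        self.discs = nn.ModuleList(PatchDiscriminator() for _ in range(num_scales))
        self.down = nn.AvgPool2d(3, stride=2, padding=1, count_include_pad=False)

    def forward(self, x):
        outs = []
        for disc in self.discs:
            outs.append(disc(x))
            x = self.down(x)  # halve resolution for the next discriminator
        return outs
```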
Fig. 9 Visualization of the full UV texture generation process ((a) Input facial images; (b) Keypoint-mapped images; (c) Images generated by the translation network; (d) Final texture maps)
Fig. 10 3D visualization of UV texture maps for different faces ((a) Input facial images; (b) 3D visualization of the frontal face; (c) 3D visualization of the left side of the face; (d) 3D visualization of the right side of the face)
Fig. 11 Results with and without STLNet and Soft-AdaLIN ((a1)-(a3) Without Soft-AdaLIN or STLNet; (b1)-(b3) With Soft-AdaLIN; (c1)-(c3) With STLNet; (d1)-(d3) With both Soft-AdaLIN and STLNet)
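Soft-AdaLIN extends the AdaLIN of U-GAT-IT [21], which mixes instance- and layer-normalized statistics with a learned per-channel ratio; the "soft" blending of input statistics is detailed only in the full paper, so the sketch below covers the base AdaLIN operation only:

```python
import torch
import torch.nn as nn

class AdaLIN(nn.Module):
    """Adaptive layer-instance normalization (U-GAT-IT [21]): mix IN and LN
    statistics with a learned ratio rho, then apply style-provided gamma/beta."""
    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.rho = nn.Parameter(torch.full((1, num_features, 1, 1), 0.9))

    def forward(self, x, gamma, beta):
        # Instance norm: statistics per sample, per channel (over H, W)
        in_mean = x.mean(dim=(2, 3), keepdim=True)
        in_var = x.var(dim=(2, 3), keepdim=True)
        out_in = (x - in_mean) / torch.sqrt(in_var + self.eps)
        # Layer norm: statistics per sample (over C, H, W)
        ln_mean = x.mean(dim=(1, 2, 3), keepdim=True)
        ln_var = x.var(dim=(1, 2, 3), keepdim=True)
        out_ln = (x - ln_mean) / torch.sqrt(ln_var + self.eps)
        rho = self.rho.clamp(0.0, 1.0)
        out = rho * out_in + (1.0 - rho) * out_ln
        # gamma/beta come from the style/feature branch, shape (B, C, 1, 1)
        return out * gamma + beta
```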
Fig. 12 Pairs of rotated faces and their UV textures ((a) Input face (head turned left); (b) Texture map (head turned left); (c) Input face (head turned right); (d) Texture map (head turned right))
Table 1 Quantitative comparison of networks with and without STLNet and Soft-AdaLIN

| Metric | Network (a) (without Soft-AdaLIN and STLNet) | Network (b) (with Soft-AdaLIN) | Network (c) (with STLNet) | Network (d) (with Soft-AdaLIN and STLNet) |
| --- | --- | --- | --- | --- |
| PSNR | 34.4603 | 34.6102 | 34.2314 | 34.6875 |
| SSIM (source) | 0.9097 | 0.9119 | 0.9114 | 0.9127 |
| SSIM (target) | 0.9736 | 0.9749 | 0.9749 | 0.9757 |
| LPIPS | 0.0793 | 0.0822 | 0.0800 | 0.0757 |
Table 2 Quantitative comparison of the average PSNR, SSIM, and LPIPS of each method

| Metric | OSTeC | FFHQ-UV | Ours |
| --- | --- | --- | --- |
| PSNR | 12.5770 | 12.2748 | 12.5867 |
| SSIM | 0.7305 | 0.6934 | 0.7486 |
| LPIPS | 0.7436 | 0.7058 | 0.6928 |
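The PSNR, SSIM [25-26], and LPIPS [27] values in Tables 1 and 2 can be reproduced with standard packages. A minimal sketch assuming scikit-image and the lpips PyPI package; the paper's exact evaluation protocol (image pairing, color space, crops) is not given on this page:

```python
import lpips
import numpy as np
import torch
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

_lpips_net = lpips.LPIPS(net='alex')  # AlexNet backbone, as in [27]

def evaluate_pair(pred_rgb, target_rgb):
    """pred_rgb / target_rgb: uint8 HxWx3 arrays of the same size."""
    psnr = peak_signal_noise_ratio(target_rgb, pred_rgb, data_range=255)
    ssim = structural_similarity(target_rgb, pred_rgb, channel_axis=2,
                                 data_range=255)

    def to_tensor(img):  # HWC uint8 -> 1x3xHxW float in [-1, 1]
        t = torch.from_numpy(img).permute(2, 0, 1).float() / 127.5 - 1.0
        return t.unsqueeze(0)

    with torch.no_grad():
        dist = _lpips_net(to_tensor(pred_rgb), to_tensor(target_rgb)).item()
    return {'PSNR': psnr, 'SSIM': ssim, 'LPIPS': dist}
```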
Fig. 13 Stages from the input face to the final face UV texture ((a) Input facial image; (b) Keypoint-mapped image; (c) Texture map generated by the translation network; (d) Final face UV texture map)
Fig. 14 Visual comparison with Normalized Avatar Synthesis and FFHQ-UV ((a) Input facial images; (b) Reference [28]; (c) Reference [14]; (d) Ours; (e) Corresponding texture images)
Fig. 15 Visual comparison with OSTeC ((a) Input facial images; (b) Textures generated by OSTeC; (c) Textures generated by the proposed method)
References
[1] YUE F R. Research on single-view 3D face reconstruction model based on deep learning[D]. Chengdu: University of Electronic Science and Technology of China, 2022 (in Chinese).
[2] PENG Z F. Research on realistic 3D face texture reconstruction based on single image[D]. Guangzhou: South China University of Technology, 2022 (in Chinese).
[3] HAN Y H. Single image based 3D face reconstruction[D]. Chongqing: Southwest University, 2022 (in Chinese).
[4] BLANZ V, VETTER T. A morphable model for the synthesis of 3D faces[M]// Seminal Graphics Papers: Pushing the Boundaries, Volume 2. New York: ACM, 2023: 157-164.
[5] HUBER P, HU G S, TENA R, et al. A multiresolution 3D morphable face model and fitting framework[C]// The 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Setúbal: SCITEPRESS, 2016, 5: 79-86.
[6] LI T Y, BOLKART T, BLACK M J, et al. Learning a model of facial shape and expression from 4D scans[J]. ACM Transactions on Graphics, 2017, 36(6): 1-17.
[7] DENG J K, CHENG S Y, XUE N N, et al. UV-GAN: adversarial facial UV map completion for pose-invariant face recognition[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7093-7102.
[8] YIN X N, HUANG D, FU Z H, et al. Weakly-supervised photo-realistic texture generation for 3D face reconstruction[C]// 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition. New York: IEEE Press, 2023: 1-8.
[9] YANG M X, GUO J W, CHENG Z L, et al. Self-supervised re-renderable facial albedo reconstruction from single image[EB/OL]. [2024-01-18]. http://arxiv.org/abs/2111.08282.
[10] LIU Y, FAN Y Y, GUO Z, et al. Single face image-based panoramic texture map generation[J]. Journal of Image and Graphics, 2022, 27(2): 602-613 (in Chinese).
[11] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[12] GECER B, PLOUMPIS S, KOTSIA I, et al. GANFIT: generative adversarial network fitting for high fidelity 3D face reconstruction[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 1155-1164.
[13] GECER B, DENG J K, ZAFEIRIOU S. OSTeC: one-shot texture completion[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 7624-7634.
[14] BAI H R, KANG D, ZHANG H X, et al. FFHQ-UV: normalized facial UV-texture dataset for 3D face reconstruction[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 362-371.
[15] KARRAS T, LAINE S, AITTALA M, et al. Analyzing and improving the image quality of StyleGAN[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 8110-8119.
[16] PÉREZ P, GANGNET M, BLAKE A. Poisson image editing[M]// Seminal Graphics Papers: Pushing the Boundaries, Volume 2. New York: ACM, 2023: 577-582.
[17] KARRAS T, LAINE S, AILA T M. A style-based generator architecture for generative adversarial networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 4401-4410.
[18] KARTYNNIK Y, ABLAVATSKI A, GRISHCHENKO I, et al. Real-time facial surface geometry from monocular video on mobile GPUs[EB/OL]. [2024-01-18]. http://arxiv.org/abs/1907.06724.
[19] ZHU L Y, JI D Y, ZHU S P, et al. Learning statistical texture for semantic segmentation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 12537-12546.
[20] HARALICK R M, SHANMUGAM K, DINSTEIN I. Textural features for image classification[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1973, SMC-3(6): 610-621.
[21] KIM J, KIM M, KANG H, et al. U-GAT-IT: unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation[EB/OL]. [2024-01-18]. http://arxiv.org/abs/1907.10830.
[22] WANG T C, LIU M Y, ZHU J Y, et al. High-resolution image synthesis and semantic manipulation with conditional GANs[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 8798-8807.
[23] ISOLA P, ZHU J Y, ZHOU T H, et al. Image-to-image translation with conditional adversarial networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5967-5976.
[24] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2024-01-18]. http://arxiv.org/abs/1409.1556.
[25] HORÉ A, ZIOU D. Image quality metrics: PSNR vs. SSIM[C]// 2010 20th International Conference on Pattern Recognition. New York: IEEE Press, 2010: 2366-2369.
[26] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
[27] ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 586-595.
[28] LUO H W, NAGANO K, KUNG H W, et al. Normalized avatar synthesis using StyleGAN and perceptual refinement[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 11657-11667.