Journal of Graphics ›› 2024, Vol. 45 ›› Issue (4): 814-826. DOI: 10.11996/JG.j.2095-302X.2024040814
• Computer Graphics and Virtual Reality •
ZHU Baoxu, LIU Mandan, ZHANG Wenting, XIE Lizhi
Received: 2024-03-14
Accepted: 2024-06-24
Online: 2024-08-31
Published: 2024-09-03
Contact: LIU Mandan
About author: ZHU Baoxu (2000-), master student. His main research interests include computer vision and virtual human generation. E-mail: Y30221036@mail.ecust.edu.cn
ZHU Baoxu, LIU Mandan, ZHANG Wenting, XIE Lizhi. Full process generation method of high-resolution face texture map[J]. Journal of Graphics, 2024, 45(4): 814-826.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2024040814
Fig. 1 Schematic diagram of the entire process for generating the facial UV texture map ((a) Input facial image; (b) Template texture map; (c) Keypoint-mapped image; (d) Image generated by the translation network; (e) Visibility mask; (f) Image after applying the visibility mask; (g) Template texture map after color transformation; (h) Final texture map)
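The eight stages in Fig. 1 compose as a single pipeline: keypoint mapping, translation, visibility masking, color transfer on the template, and blending. As a minimal sketch only (every function body here is a hypothetical placeholder, not the authors' method — the real system uses a keypoint detector and a trained translation network), the data flow can be expressed as:

```python
import numpy as np

# Placeholder stand-ins for the stages in Fig. 1; each operates on
# H×W×3 float arrays in [0, 1]. Real implementations would use the
# paper's keypoint mapping, translation network, and color transform.
def map_keypoints(face, template):           # (a)+(b) -> (c)
    return 0.5 * (face + template)           # placeholder warp

def translation_network(mapped):             # (c) -> (d)
    return np.clip(mapped, 0.0, 1.0)         # placeholder generator

def visibility_mask(face):                   # (a) -> (e)
    # Mark pixels bright enough to be considered visible.
    return (face.mean(axis=-1, keepdims=True) > 0.1).astype(face.dtype)

def color_transfer(template, face):          # (b)+(a) -> (g)
    # Match the template's global mean/std to the face's statistics.
    t = (template - template.mean()) / (template.std() + 1e-8)
    return np.clip(t * face.std() + face.mean(), 0.0, 1.0)

def generate_uv_texture(face, template):
    mapped    = map_keypoints(face, template)   # (c)
    generated = translation_network(mapped)     # (d)
    mask      = visibility_mask(face)           # (e)
    visible   = generated * mask                # (f)
    base      = color_transfer(template, face)  # (g)
    return visible + base * (1.0 - mask)        # (h) final texture

face = np.random.default_rng(0).random((64, 64, 3))
template = np.full((64, 64, 3), 0.5)
texture = generate_uv_texture(face, template)
```

The blending step is the key structural point: generated content fills the visible region, while the color-corrected template fills everything the mask marks as invisible.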
Fig. 2 Schematic diagram of keypoint detection and mapping for the original image ((a) Input facial image; (b) Image after keypoint detection; (c) Template texture map; (d) Template texture map after keypoint detection; (e) Image during the mapping process; (f) Keypoint-mapped image)
Fig. 3 Schematic diagram of keypoint detection and mapping for the target map ((a) Input facial image; (b) Image after keypoint detection; (c) Template texture map; (d) Template texture map after keypoint detection; (e) Image during the mapping process; (f) Keypoint-mapped image)
Fig. 8 Overall network structure ((a) Input image; (b) Image generated by the translation network (1024×1024); (b1) Generated image downsampled to 512×512; (b2) Generated image downsampled to 256×256; (c) Translation network target image (1024×1024); (c1) Target image downsampled to 512×512; (c2) Target image downsampled to 256×256)
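Fig. 8 compares the generated and target images at three resolutions (1024, 512, 256), i.e. each image is turned into a small pyramid before comparison. As a hedged sketch (the downsampling operator is an assumption — average pooling is used here for simplicity, not necessarily the paper's choice), the pyramid construction looks like:

```python
import numpy as np

def downsample2x(img):
    # 2×2 average pooling; assumes H and W are even.
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def multiscale_pyramid(img, levels=3):
    # (b), (b1), (b2) in Fig. 8: full resolution plus two halved copies.
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid

img = np.ones((8, 8, 3))          # small stand-in for a 1024×1024 image
pyr = multiscale_pyramid(img)
```

Losses computed on each pyramid level push the generator to match the target both in fine texture (full resolution) and coarse structure (downsampled copies).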
Fig. 9 Visualization of the whole UV texture generation process ((a) Input facial images; (b) Keypoint-mapped images; (c) Images generated by the translation network; (d) Final texture maps)
Fig. 10 3D visualization of UV texture map of different faces ((a) Input facial images; (b) Frontal face 3D visualization; (c) 3D visualization of the left side of the face; (d) 3D visualization of the right side of the face)
Fig. 11 Schematic comparison of networks with and without STLNet and Soft-AdaLIN ((a1)-(a3) Without Soft-AdaLIN and STLNet; (b1)-(b3) With Soft-AdaLIN; (c1)-(c3) With STLNet; (d1)-(d3) With Soft-AdaLIN and STLNet)
Fig. 12 Pairs of deflected faces and their UV texture maps ((a) Input face (head turned left); (b) Texture map (head turned left); (c) Input face (head turned right); (d) Texture map (head turned right))
Network | (a) Without Soft-AdaLIN and STLNet | (b) With Soft-AdaLIN | (c) With STLNet | (d) With Soft-AdaLIN and STLNet |
---|---|---|---|---|
PSNR | 34.4603 | 34.6102 | 34.2314 | 34.6875 |
SSIM (source) | 0.9097 | 0.9119 | 0.9114 | 0.9127 |
SSIM (target) | 0.9736 | 0.9749 | 0.9749 | 0.9757 |
LPIPS | 0.0793 | 0.0822 | 0.0800 | 0.0757 |
Table 1 Quantitative comparison of networks with and without STLNet and Soft-AdaLIN
Method | OSTeC | FFHQ-UV | Ours |
---|---|---|---|
PSNR | 12.5770 | 12.2748 | 12.5867 |
SSIM | 0.7305 | 0.6934 | 0.7486 |
LPIPS | 0.7436 | 0.7058 | 0.6928 |
Table 2 Quantitative comparison of average PSNR, SSIM and LPIPS of each method
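Tables 1 and 2 report PSNR, SSIM, and LPIPS. PSNR is the simplest of the three to state exactly: 10·log10(MAX²/MSE) between two images of equal shape. A minimal reference implementation (not the authors' evaluation code) is:

```python
import numpy as np

def psnr(img_a, img_b, max_val=255.0):
    # Peak signal-to-noise ratio: 10 * log10(MAX^2 / MSE).
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.zeros((16, 16))
b = np.full((16, 16), 10.0)   # constant offset of 10 -> MSE = 100
value = psnr(a, b)
```

SSIM and LPIPS are structural and learned perceptual metrics respectively; in practice they are usually computed with library implementations (e.g. scikit-image's `structural_similarity`) rather than written by hand.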
Fig. 13 Diagram of each stage from inputting a face to generating the final face UV texture ((a) Input facial image; (b) Keypoint-mapped image; (c) Texture map generated by the translation network; (d) Final face UV texture map)
Fig. 14 Visual comparison with Normalized Avatar Synthesis and FFHQ-UV ((a) Input facial images; (b) Reference [28]; (c) Reference [14]; (d) Ours; (e) Corresponding texture image)
[1] | YUE F R. Research on single-view 3D face reconstruction model based on deep learning[D]. Chengdu: University of Electronic Science and Technology of China, 2022 (in Chinese). |
[2] | PENG Z F. Research on realistic 3D face texture reconstruction based on single image[D]. Guangzhou: South China University of Technology, 2022 (in Chinese). |
[3] | HAN Y H. Single image based 3D face reconstruction[D]. Chongqing: Southwest University, 2022 (in Chinese). |
[4] | BLANZ V, VETTER T. A morphable model for the synthesis of 3D faces[M]// Seminal Graphics Papers: Pushing the Boundaries, Volume 2. New York: ACM, 2023: 157-164. |
[5] | HUBER P, HU G S, TENA R, et al. A multiresolution 3D morphable face model and fitting framework[C]// The 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Setúbal: SCITEPRESS - Science and Technology Publications, 2016, 5: 79-86. |
[6] | LI T Y, BOLKART T, BLACK M J, et al. Learning a model of facial shape and expression from 4D scans[J]. ACM Transactions on Graphics, 2017, 36(6): 1-17. |
[7] | DENG J K, CHENG S Y, XUE N N, et al. UV-GAN: adversarial facial UV map completion for pose-invariant face recognition[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7093-7102. |
[8] | YIN X N, HUANG D, FU Z H, et al. Weakly-supervised photo-realistic texture generation for 3D face reconstruction[C]// 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition. New York: IEEE Press, 2023: 1-8. |
[9] | YANG M X, GUO J W, CHENG Z L, et al. Self-supervised re-renderable facial albedo reconstruction from single image[EB/OL]. [2024-01-18]. http://arxiv.org/abs/2111.08282. |
[10] | LIU Y, FAN Y Y, GUO Z, et al. Single face image-based panoramic texture map generation[J]. Journal of Image and Graphics, 2022, 27(2): 602-613 (in Chinese). |
[11] | KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. |
[12] | GECER B, PLOUMPIS S, KOTSIA I, et al. GANFIT: generative adversarial network fitting for high fidelity 3D face reconstruction[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 1155-1164. |
[13] | GECER B, DENG J K, ZAFEIRIOU S. OSTeC: one-shot texture completion[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 7624-7634. |
[14] | BAI H R, KANG D, ZHANG H X, et al. FFHQ-UV: normalized facial UV-texture dataset for 3D face reconstruction[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 362-371. |
[15] | KARRAS T, LAINE S, AITTALA M, et al. Analyzing and improving the image quality of StyleGAN[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 8110-8119. |
[16] | PÉREZ P, GANGNET M, BLAKE A. Poisson image editing[M]// Seminal Graphics Papers: Pushing the Boundaries, Volume 2. New York: ACM, 2023: 577-582. |
[17] | KARRAS T, LAINE S, AILA T M. A style-based generator architecture for generative adversarial networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 4401-4410. |
[18] | KARTYNNIK Y, ABLAVATSKI A, GRISHCHENKO I, et al. Real-time facial surface geometry from monocular video on mobile GPUs[EB/OL]. [2024-01-18]. http://arxiv.org/abs/1907.06724. |
[19] | ZHU L Y, JI D Y, ZHU S P, et al. Learning statistical texture for semantic segmentation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 12537-12546. |
[20] | HARALICK R M, SHANMUGAM K, DINSTEIN I. Textural features for image classification[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1973, SMC-3(6): 610-621. |
[21] | KIM J, KIM M, KANG H, et al. U-GAT-IT: unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation[EB/OL]. [2024-01-18]. http://arxiv.org/abs/1907.10830. |
[22] | WANG T C, LIU M Y, ZHU J Y, et al. High-resolution image synthesis and semantic manipulation with conditional GANs[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 8798-8807. |
[23] | ISOLA P, ZHU J Y, ZHOU T H, et al. Image-to-image translation with conditional adversarial networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5967-5976. |
[24] | SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2024-01-18]. http://arxiv.org/abs/1409.1556. |
[25] | HORÉ A, ZIOU D. Image quality metrics: PSNR vs. SSIM[C]// 2010 20th International Conference on Pattern Recognition. New York: IEEE Press, 2010: 2366-2369. |
[26] | WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society, 2004, 13(4): 600-612. |
[27] | ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 586-595. |
[28] | LUO H W, NAGANO K, KUNG H W, et al. Normalized avatar synthesis using StyleGAN and perceptual refinement[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 11657-11667. |