Journal of Graphics, 2023, Vol. 44, Issue (5): 966-977. DOI: 10.11996/JG.j.2095-302X.2023050966
CHEN Peng1, JIANG Hao2, XIANG Wei1
Received: 2023-02-07
Accepted: 2023-06-15
Online: 2023-10-31
Published: 2023-10-31
Contact: XIANG Wei (1991-), lecturer, Ph.D. His main research interests cover intelligent design, etc. E-mail: wxiang@zju.edu.cn
About author: CHEN Peng (1996-), master's student. His main research interests cover artificial intelligence and digital image processing. E-mail: chen_peng2023@163.com
Abstract:
In recent years, three-dimensional (3D) displays have attracted growing attention for their superior immersive experience, but the scarcity of 3D content has limited their development. Two-dimensional (2D)-to-3D conversion is a promising and effective way to obtain such content; it requires adding extra depth information to 2D content. However, existing depth estimation methods cannot meet the requirements of 2D-to-3D conversion due to their instability. To this end, a stereoscopic image rendering system was proposed that converts a monocular image into a pair of stereoscopic images for 3D display while taking human perception into account. As the core step of the system, a depth optimization algorithm considering human perception (DOCHP) was proposed: it takes a semantic segmentation map as input and generates an optimized depth map by accounting for human perception, including attention and depth perception, thereby enhancing the stereoscopic effect of the resulting image pair. Experimental results show that stereoscopic images generated from the depth maps optimized by the system give users a strong 3D effect. This demonstrates the necessity of considering human perceptual characteristics in stereoscopic image production and will support the popularization and application of glasses-free stereoscopic images.
CHEN Peng, JIANG Hao, XIANG Wei. Stereoscopic image generation considering human perception[J]. Journal of Graphics, 2023, 44(5): 966-977.
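The abstract describes the final rendering step as converting a monocular image plus an optimized depth map into a stereo pair. As a minimal, hedged sketch of that step (not the authors' implementation: the function name, the `max_disparity` parameter, and the convention that larger depth values are nearer are all assumptions), DIBR-style view synthesis can be written as:

```python
import numpy as np

def depth_to_stereo(image, depth, max_disparity=16):
    """Synthesize a left/right view pair by shifting each pixel
    horizontally in proportion to its depth (a DIBR-style sketch)."""
    h, w = depth.shape
    # Normalize depth to [0, 1]; larger values are assumed nearer,
    # so nearer pixels receive larger disparity.
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    disparity = (d * max_disparity).astype(int)

    left = np.zeros_like(image)   # expects an H x W x 3 array
    right = np.zeros_like(image)
    for y in range(h):
        for x in range(w):
            s = disparity[y, x] // 2
            left[y, min(w - 1, x + s)] = image[y, x]
            right[y, max(0, x - s)] = image[y, x]
    # Forward warping leaves disocclusion holes; a real pipeline would
    # inpaint them, and the disparity range must stay within viewer
    # comfort limits (cf. the just-noticeable depth difference in [5, 35]).
    return left, right
```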
Table 1 The features used in the stage of attention calculation
| Category | Feature | Description |
|---|---|---|
| High-level features | Recognition rate | Type and accuracy of each segment as recognized by the model |
| | Saliency map | Grayscale map showing the attention distribution (brighter regions indicate higher attention) |
| Low-level features | Color | Sum of the color contrasts between a segment and all other segments |
| | Layout | Distance between the center of a segment and the center of the image |
| | Depth | Average depth of a segment in the depth map |
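Table 1 only lists the features; how they might be fused into a single per-segment attention score (the combined map of Fig. 5(e)) can be sketched as below. This is an illustrative assumption, not the paper's calibrated model: the dictionary keys, the min-max normalization, and the weights are all hypothetical.

```python
import numpy as np

def attention_scores(segments, weights=(0.4, 0.3, 0.3)):
    """Fuse high-level, low-level, and depth scores into one
    attention score per segment (illustrative linear combination)."""
    def normalize(values):
        v = np.asarray(values, dtype=float)
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    # Each segment carries the per-feature scores of Table 1, e.g.
    # {"recognition": 0.9, "saliency": 0.7, "color": 0.4,
    #  "layout": 0.6, "depth": 0.5}
    high = normalize([s["recognition"] + s["saliency"] for s in segments])
    low = normalize([s["color"] + s["layout"] for s in segments])
    depth = normalize([s["depth"] for s in segments])
    w_high, w_low, w_depth = weights
    return w_high * high + w_low * low + w_depth * depth
```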
Fig. 4 Depth optimization ((a) Example of a depth conflict; (b) Depth distribution of the two giraffes in the example (not real data, but provided for explanation); (c) Examples of the two operations)
Fig. 5 Visualization of scores ((a) Original color image; (b) Scores based on high-level features; (c) Scores based on low-level features; (d) Scores based on depth information; (e) Attention scores combining all three)
Fig. 6 Results before and after the stage of iterative adjustment ((a) 2D image; (b) Before iterative adjustment; (c) After iterative adjustment)
Fig. 7 Comparison of depth maps ((a) Original color image; (b) Estimated depth maps generated by [31]; (c) Enhanced depth maps optimized by [35]; (d) Manual depth maps; (e) Depth maps optimized by DOCHP)
Table 2 Model-based analysis
| Model | P(M) | P(M\|data) | BF₁₀ |
|---|---|---|---|
| Null model (subject, image) | 0.333 | 5.932×10⁻¹¹ | 1.000 |
| Generation method | 0.333 | 0.510 | 8.598×10⁹ |
| Generation method + image | 0.333 | 0.490 | 8.260×10⁹ |
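The posterior probabilities in Table 2 follow directly from the Bayes factors: with equal prior model probabilities P(M) = 1/3, each P(M|data) is the model's BF₁₀ divided by the sum over all three models. A short check in Python (values taken from Table 2):

```python
# Reproduce P(M|data) in Table 2 from the Bayes factors (BF10),
# assuming equal prior model probabilities P(M) = 1/3.
bf10 = {
    "null (subject, image)": 1.0,
    "generation method": 8.598e9,
    "generation method + image": 8.260e9,
}
total = sum(bf10.values())
for model, bf in bf10.items():
    print(f"{model}: P(M|data) = {bf / total:.3g}")
# -> 5.93e-11, 0.51, and 0.49, matching Table 2
```

Both non-null models are overwhelmingly favored over the null (BF₁₀ on the order of 10⁹-10¹⁰), while the data barely distinguish between the two (0.510 vs. 0.490).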
Table 3 Model summary
| Variable | Level | Mean | SD | 95% CI (lower) | 95% CI (upper) |
|---|---|---|---|---|---|
| Generation method | Proposed | 0.404 | 0.102 | 0.198 | 0.609 |
| | Manual | 0.484 | 0.104 | 0.276 | 0.692 |
| | Estimated | -0.610 | 0.105 | -0.820 | -0.403 |
| | Traditional optimization | -0.278 | 0.101 | -0.484 | -0.080 |
| Image | | -0.328 | 0.166 | -0.661 | -0.002 |
| | | 0.240 | 0.162 | -0.080 | 0.564 |
| | | -0.307 | 0.163 | -0.634 | 0.010 |
| | | 0.239 | 0.163 | -0.089 | 0.563 |
| | | -0.205 | 0.161 | -0.530 | 0.117 |
| | | 0.513 | 0.169 | 0.181 | 0.852 |
| | | 0.314 | 0.165 | -0.014 | 0.638 |
| | | -0.329 | 0.164 | -0.670 | -0.014 |
| | | -0.108 | 0.162 | -0.433 | 0.211 |
| | | -0.028 | 0.161 | -0.352 | 0.283 |
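The rows of Table 3 are standard posterior summaries. Assuming they were computed from posterior samples in the usual way (the paper does not show this step), each row is just the sample mean, standard deviation, and central 95% interval:

```python
import numpy as np

def posterior_summary(samples):
    """Mean, SD, and central 95% credible interval of posterior samples."""
    s = np.asarray(samples, dtype=float)
    lo, hi = np.percentile(s, [2.5, 97.5])
    return s.mean(), s.std(ddof=1), lo, hi
```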
References
[1] LI Y L, XU R F. No-reference image quality assessment of DIBR-synthesized images based on statistical characteristics[J]. Laser & Optoelectronics Progress, 2022, 59(8): 228-236. (in Chinese)
[2] PATIL S, CHARLES P. Review on 2D-to-3D image and video conversion methods[C]// 2015 International Conference on Computing Communication Control and Automation. New York: IEEE Press, 2015: 728-732.
[3] SHA H, LIU Y. Review on deep learning based prediction of image intrinsic properties[J]. Journal of Graphics, 2021, 42(3): 385-397. (in Chinese)
[4] LI Z Q, SNAVELY N. MegaDepth: learning single-view depth prediction from Internet photos[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 2041-2050.
[5] DE SILVA D V S X, FERNANDO W A C, WORRALL S T, et al. Just noticeable difference in depth model for stereoscopic 3D displays[C]// 2010 IEEE International Conference on Multimedia and Expo. New York: IEEE Press, 2010: 1219-1224.
[6] TERZIĆ K, HANSARD M. Methods for reducing visual discomfort in stereoscopic 3D: a review[J]. Signal Processing: Image Communication, 2016, 47: 402-416.
[7] MU Q, ZHANG H, HE Z Q, et al. Scale adaptive target tracking algorithm based on depth estimation and feature fusion[J]. Journal of Graphics, 2021, 42(4): 563-571. (in Chinese)
[8] JI P, WANG L H, LI D X, et al. An automatic 2D to 3D conversion algorithm using multi-depth cues[C]// 2012 International Conference on Audio, Language and Image Processing. New York: IEEE Press, 2012: 546-550.
[9] KONRAD J, WANG M, ISHWAR P. 2D-to-3D image conversion by learning depth from examples[C]// 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2012: 16-22.
[10] LI L X. Research on key technologies of 2D video to binocular 3D video conversion[D]. Nanjing: Southeast University, 2021. (in Chinese)
[11] HUANG J, WANG C, LIU Y, et al. The progress of monocular depth estimation technology[J]. Journal of Image and Graphics, 2019, 24(12): 2081-2097. (in Chinese)
[12] LADICKÝ L, SHI J B, POLLEFEYS M. Pulling things out of perspective[C]// 2014 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2014: 89-96.
[13] SHI J P, TAO X, XU L, et al. Break Ames room illusion: depth from general single images[J]. ACM Transactions on Graphics, 2015, 34(6): 225:1-225:11.
[14] ZHOU W, SALZMANN M, HE X M, et al. Indoor scene structure analysis for single image depth estimation[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 614-622.
[15] KARSCH K, LIU C, KANG S B. Depth transfer: depth extraction from video using non-parametric sampling[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(11): 2144-2158.
[16] KIM S, PARK K, SOHN K, et al. Unified depth prediction and intrinsic image decomposition from a single image via joint convolutional neural fields[C]// European Conference on Computer Vision. Cham: Springer, 2016: 143-159.
[17] ZHANG X N. Deep learning based monocular depth estimation[D]. Shijiazhuang: Hebei Normal University, 2020. (in Chinese)
[18] GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]// 2012 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2012: 3354-3361.
[19] SAXENA A, SUN M, NG A Y. Learning 3-D scene structure from a single still image[C]// 2007 IEEE 11th International Conference on Computer Vision. New York: IEEE Press, 2007: 1-8.
[20] LIN L X, HUANG G H, CHEN Y J, et al. Efficient and high-quality monocular depth estimation via gated multi-scale network[J]. IEEE Access, 2020, 8: 7709-7718.
[21] FU J W, LIANG J, WANG Z Y. Monocular depth estimation based on multi-scale graph convolution networks[J]. IEEE Access, 2019, 8: 997-1009.
[22] KUZNIETSOV Y, STÜCKLER J, LEIBE B. Semi-supervised deep learning for monocular depth map prediction[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2215-2223.
[23] GOLDMAN M, HASSNER T, AVIDAN S. Learn stereo, infer mono: Siamese networks for self-supervised, monocular, depth estimation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2020: 2886-2895.
[24] CHEN W F, FU Z, YANG D W, et al. Single-image depth perception in the wild[C]// The 30th International Conference on Neural Information Processing Systems. New York: ACM, 2016: 730-738.
[25] JUNG Y J, SOHN H, LEE S I, et al. Visual comfort improvement in stereoscopic 3D displays using perceptually plausible assessment metric of visual comfort[J]. IEEE Transactions on Consumer Electronics, 2014, 60(1): 1-9.
[26] OH C, HAM B, CHOI S, et al. Visual fatigue relaxation for stereoscopic video via nonlinear disparity remapping[J]. IEEE Transactions on Broadcasting, 2015, 61(2): 142-153.
[27] LEI J J, ZHANG C C, FANG Y M, et al. Depth sensation enhancement for multiple virtual view rendering[J]. IEEE Transactions on Multimedia, 2015, 17(4): 457-469.
[28] ISLAM M B, WONG L K, LOW K L, et al. Aesthetics-driven stereoscopic 3-D image recomposition with depth adaptation[J]. IEEE Transactions on Multimedia, 2018, 20(11): 2964-2979.
[29] SHAO F, LIN W C, LIN W S, et al. QoE-guided warping for stereoscopic image retargeting[J]. IEEE Transactions on Image Processing, 2017, 26(10): 4790-4805.
[30] CHUN M M. Contextual cueing of visual attention[J]. Trends in Cognitive Sciences, 2000, 4(5): 170-178.
[31] HOU Q B, CHENG M M, HU X W, et al. Deeply supervised salient object detection with short connections[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5300-5309.
[32] HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2980-2988.
[33] YAN Q, XU L, SHI J P, et al. Hierarchical saliency detection[C]// 2013 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2013: 1155-1162.
[34] HIBBARD P B, HAINES A E, HORNSEY R L. Magnitude, precision, and realism of depth perception in stereoscopic vision[J]. Cognitive Research: Principles and Implications, 2017, 2: 25:1-25:11.
[35] JUNG S W, KO S J. Depth sensation enhancement using the just noticeable depth difference[J]. IEEE Transactions on Image Processing, 2012, 21(8): 3624-3637.
[36] KAY M, NELSON G L, HEKLER E B. Researcher-centered design of statistics: why Bayesian statistics better fit the culture and incentives of HCI[C]// 2016 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2016: 4521-4532.
[37] MOREY R D, ROMEIJN J W, ROUDER J N. The philosophy of Bayes factors and the quantification of statistical evidence[J]. Journal of Mathematical Psychology, 2016, 72: 6-18.
[38] WAGENMAKERS E J, MARSMAN M, JAMIL T, et al. Bayesian inference for psychology. Part I: Theoretical advantages and practical ramifications[J]. Psychonomic Bulletin & Review, 2018, 25(1): 35-57.
[39] SCHÖNBRODT F D, WAGENMAKERS E J. Bayes factor design analysis: planning for compelling evidence[J]. Psychonomic Bulletin & Review, 2018, 25(1): 128-142.
[40] ROUDER J N, MOREY R D, VERHAGEN J, et al. Bayesian analysis of factorial designs[J]. Psychological Methods, 2017, 22(2): 304-321.
|||||