Journal of Graphics ›› 2023, Vol. 44 ›› Issue (5): 966-977.DOI: 10.11996/JG.j.2095-302X.2023050966
• Image Processing and Computer Vision •
CHEN Peng1, JIANG Hao2, XIANG Wei1
Received: 2023-02-07
Accepted: 2023-06-15
Online: 2023-10-31
Published: 2023-10-31
Contact: XIANG Wei (1991-), lecturer, Ph.D. His main research interests cover intelligent design, etc.
About author: CHEN Peng (1996-), master student. His main research interests cover artificial intelligence and digital image processing. E-mail: chen_peng2023@163.com
CHEN Peng, JIANG Hao, XIANG Wei. Stereoscopic image generation considering human perception[J]. Journal of Graphics, 2023, 44(5): 966-977.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023050966
| Category | Feature |
|---|---|
| High-level features | Recognition: the type and confidence of each segment as identified by the model |
| High-level features | Saliency map: a grayscale map of the attention distribution (brighter regions indicate higher concentration) |
| Low-level features | Color: the sum of color contrasts between a segment and all other segments |
| Low-level features | Layout: the distance between the segment center and the image center |
| Low-level features | Depth: the average depth of the segment in the depth map |
Table 1 The features used in the stage of attention calculation
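The features above are merged into a single per-segment attention score (Fig. 5 visualizes the high-level, low-level, and depth scores separately and combined). A minimal sketch of one way to combine them, assuming min-max normalization and equal weights; both assumptions are illustrative and are not the paper's actual coefficients:

```python
# Illustrative sketch of the attention-calculation stage described in
# Table 1: each segment receives high-level scores (recognition,
# saliency), low-level scores (color, layout), and a depth score,
# which are combined into one attention score per segment.
# Equal weights and min-max normalization are assumptions here.

def normalize(values):
    """Scale a list of scores to [0, 1] via min-max normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def attention_scores(high, low, depth, weights=(1/3, 1/3, 1/3)):
    """Combine normalized high-level, low-level, and depth scores."""
    h, l, d = normalize(high), normalize(low), normalize(depth)
    wh, wl, wd = weights
    return [wh * a + wl * b + wd * c for a, b, c in zip(h, l, d)]

# Three hypothetical segments (e.g., two foreground animals and the
# background); the input values are made up for illustration.
scores = attention_scores(high=[0.9, 0.7, 0.1],
                          low=[0.8, 0.6, 0.2],
                          depth=[0.5, 0.9, 0.1])
```

With these made-up inputs the first segment receives the highest attention score and the background the lowest, which is the qualitative behavior Fig. 5 illustrates.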
Fig. 4 Depth optimization ((a) Example of a depth conflict; (b) Depth distribution of two giraffes in the example (not real data, but provided for explanation); (c) Examples of two operations)
Fig. 5 Visualization of scores ((a) Original color image; (b) Scores based on high-level features; (c) Scores based on low-level features; (d) Scores based on depth information; (e) Attention scores combining all three scores)
Fig. 7 Comparison of depth maps ((a) Original color image; (b) Estimated depth maps generated by [31]; (c) Enhanced depth maps optimized by [35]; (d) Manual depth maps; (e) Depth maps optimized by DOCHP)
| Model | P(M) | P(M\|data) | BF10 |
|---|---|---|---|
| Null model (participant, image) | 0.333 | 5.932×10⁻¹¹ | 1.000 |
| Generation method | 0.333 | 0.510 | 8.598×10⁹ |
| Generation method, image | 0.333 | 0.490 | 8.260×10⁹ |
Table 2 Model-based analysis
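The Bayes factors in Table 2 can be recovered from the listed probabilities: with equal prior model probabilities P(M) = 1/3, the prior terms cancel and BF10 for each model against the null reduces to the ratio of posterior model probabilities. A quick sanity check with the table's own values:

```python
# Posterior model probabilities P(M|data), copied from Table 2.
p_null = 5.932e-11           # Null model (participant, image)
p_method = 0.510             # generation method
p_method_image = 0.490       # generation method + image

# With equal priors P(M) = 1/3, BF10 = P(M1|data) / P(M0|data).
bf_method = p_method / p_null
bf_method_image = p_method_image / p_null

print(f"{bf_method:.3e}")        # ≈ 8.6e9, matching the table's 8.598×10⁹
print(f"{bf_method_image:.3e}")  # ≈ 8.3e9, matching the table's 8.260×10⁹
```

The tiny discrepancy in the last digits comes from the rounding of P(M|data) in the table.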
| Variable | Level | Mean | Variance | 95% CI (lower) | 95% CI (upper) |
|---|---|---|---|---|---|
| Generation method | Proposed method | 0.404 | 0.102 | 0.198 | 0.609 |
| | Manual | 0.484 | 0.104 | 0.276 | 0.692 |
| | Estimated | -0.610 | 0.105 | -0.820 | -0.403 |
| | Traditional optimization | -0.278 | 0.101 | -0.484 | -0.080 |
| Image | | -0.328 | 0.166 | -0.661 | -0.002 |
| | | 0.240 | 0.162 | -0.080 | 0.564 |
| | | -0.307 | 0.163 | -0.634 | 0.010 |
| | | 0.239 | 0.163 | -0.089 | 0.563 |
| | | -0.205 | 0.161 | -0.530 | 0.117 |
| | | 0.513 | 0.169 | 0.181 | 0.852 |
| | | 0.314 | 0.165 | -0.014 | 0.638 |
| | | -0.329 | 0.164 | -0.670 | -0.014 |
| | | -0.108 | 0.162 | -0.433 | 0.211 |
| | | -0.028 | 0.161 | -0.352 | 0.283 |
Table 3 Model summary
[1] LI Y L, XU R F. No-reference image quality assessment of DIBR-synthesized images based on statistical characteristics[J]. Laser & Optoelectronics Progress, 2022, 59(8): 228-236. (in Chinese)
[2] PATIL S, CHARLES P. Review on 2D-to-3D image and video conversion methods[C]// 2015 International Conference on Computing Communication Control and Automation. New York: IEEE Press, 2015: 728-732.
[3] SHA H, LIU Y. Review on deep learning based prediction of image intrinsic properties[J]. Journal of Graphics, 2021, 42(3): 385-397. (in Chinese)
[4] LI Z Q, SNAVELY N. MegaDepth: learning single-view depth prediction from Internet photos[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 2041-2050.
[5] DE SILVA D V S X, FERNANDO W A C, WORRALL S T, et al. Just noticeable difference in depth model for stereoscopic 3D displays[C]// 2010 IEEE International Conference on Multimedia and Expo. New York: IEEE Press, 2010: 1219-1224.
[6] TERZIĆ K, HANSARD M. Methods for reducing visual discomfort in stereoscopic 3D: a review[J]. Signal Processing: Image Communication, 2016, 47: 402-416.
[7] MU Q, ZHANG H, HE Z Q, et al. Scale adaptive target tracking algorithm based on depth estimation and feature fusion[J]. Journal of Graphics, 2021, 42(4): 563-571. (in Chinese)
[8] JI P, WANG L H, LI D X, et al. An automatic 2D to 3D conversion algorithm using multi-depth cues[C]// 2012 International Conference on Audio, Language and Image Processing. New York: IEEE Press, 2012: 546-550.
[9] KONRAD J, WANG M, ISHWAR P. 2D-to-3D image conversion by learning depth from examples[C]// 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2012: 16-22.
[10] LI L X. Research on key technologies of 2D video to binocular 3D video conversion[D]. Nanjing: Southeast University, 2021. (in Chinese)
[11] HUANG J, WANG C, LIU Y, et al. The progress of monocular depth estimation technology[J]. Journal of Image and Graphics, 2019, 24(12): 2081-2097. (in Chinese)
[12] LADICKÝ L, SHI J B, POLLEFEYS M. Pulling things out of perspective[C]// 2014 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2014: 89-96.
[13] SHI J P, TAO X, XU L, et al. Break Ames room illusion: depth from general single images[J]. ACM Transactions on Graphics, 2015, 34(6): 225:1-225:11.
[14] ZHOU W, SALZMANN M, HE X M, et al. Indoor scene structure analysis for single image depth estimation[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 614-622.
[15] KARSCH K, LIU C, KANG S B. Depth transfer: depth extraction from video using non-parametric sampling[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(11): 2144-2158.
[16] KIM S, PARK K, SOHN K, et al. Unified depth prediction and intrinsic image decomposition from a single image via joint convolutional neural fields[C]// European Conference on Computer Vision. Cham: Springer, 2016: 143-159.
[17] ZHANG X N. Deep learning based monocular depth estimation[D]. Shijiazhuang: Hebei Normal University, 2020. (in Chinese)
[18] GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]// 2012 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2012: 3354-3361.
[19] SAXENA A, SUN M, NG A Y. Learning 3-D scene structure from a single still image[C]// 2007 IEEE 11th International Conference on Computer Vision. New York: IEEE Press, 2007: 1-8.
[20] LIN L X, HUANG G H, CHEN Y J, et al. Efficient and high-quality monocular depth estimation via gated multi-scale network[J]. IEEE Access, 2020, 8: 7709-7718.
[21] FU J W, LIANG J, WANG Z Y. Monocular depth estimation based on multi-scale graph convolution networks[J]. IEEE Access, 2019, 8: 997-1009.
[22] KUZNIETSOV Y, STÜCKLER J, LEIBE B. Semi-supervised deep learning for monocular depth map prediction[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2215-2223.
[23] GOLDMAN M, HASSNER T, AVIDAN S. Learn stereo, infer mono: Siamese networks for self-supervised, monocular, depth estimation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2020: 2886-2895.
[24] CHEN W F, FU Z, YANG D W, et al. Single-image depth perception in the wild[C]// The 30th International Conference on Neural Information Processing Systems. New York: ACM, 2016: 730-738.
[25] JUNG Y J, SOHN H, LEE S I, et al. Visual comfort improvement in stereoscopic 3D displays using perceptually plausible assessment metric of visual comfort[J]. IEEE Transactions on Consumer Electronics, 2014, 60(1): 1-9.
[26] OH C, HAM B, CHOI S, et al. Visual fatigue relaxation for stereoscopic video via nonlinear disparity remapping[J]. IEEE Transactions on Broadcasting, 2015, 61(2): 142-153.
[27] LEI J J, ZHANG C C, FANG Y M, et al. Depth sensation enhancement for multiple virtual view rendering[J]. IEEE Transactions on Multimedia, 2015, 17(4): 457-469.
[28] ISLAM M B, WONG L-K, LOW K-L, et al. Aesthetics-driven stereoscopic 3-D image recomposition with depth adaptation[J]. IEEE Transactions on Multimedia, 2018, 20(11): 2964-2979.
[29] SHAO F, LIN W C, LIN W S, et al. QoE-guided warping for stereoscopic image retargeting[J]. IEEE Transactions on Image Processing, 2017, 26(10): 4790-4805.
[30] CHUN M M. Contextual cueing of visual attention[J]. Trends in Cognitive Sciences, 2000, 4(5): 170-178.
[31] HOU Q B, CHENG M M, HU X W, et al. Deeply supervised salient object detection with short connections[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5300-5309.
[32] HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2980-2988.
[33] YAN Q, XU L, SHI J P, et al. Hierarchical saliency detection[C]// 2013 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2013: 1155-1162.
[34] HIBBARD P B, HAINES A E, HORNSEY R L. Magnitude, precision, and realism of depth perception in stereoscopic vision[J]. Cognitive Research: Principles and Implications, 2017, 2: 25:1-25:11.
[35] JUNG S W, KO S J. Depth sensation enhancement using the just noticeable depth difference[J]. IEEE Transactions on Image Processing, 2012, 21(8): 3624-3637.
[36] KAY M, NELSON G L, HEKLER E B. Researcher-centered design of statistics: why Bayesian statistics better fit the culture and incentives of HCI[C]// 2016 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2016: 4521-4532.
[37] MOREY R D, ROMEIJN J W, ROUDER J N. The philosophy of Bayes factors and the quantification of statistical evidence[J]. Journal of Mathematical Psychology, 2016, 72: 6-18.
[38] WAGENMAKERS E J, MARSMAN M, JAMIL T, et al. Bayesian inference for psychology. Part I: Theoretical advantages and practical ramifications[J]. Psychonomic Bulletin & Review, 2018, 25(1): 35-57.
[39] SCHÖNBRODT F D, WAGENMAKERS E J. Bayes factor design analysis: planning for compelling evidence[J]. Psychonomic Bulletin & Review, 2018, 25(1): 128-142.
[40] ROUDER J N, MOREY R D, VERHAGEN J, et al. Bayesian analysis of factorial designs[J]. Psychological Methods, 2017, 22(2): 304-321.