Journal of Graphics, 2023, Vol. 44, Issue (5): 966-977. DOI: 10.11996/JG.j.2095-302X.2023050966
CHEN Peng1, JIANG Hao2, XIANG Wei1
Received: 2023-02-07
Accepted: 2023-06-15
Online: 2023-10-31
Published: 2023-10-31
Contact: XIANG Wei (1991-), lecturer, Ph.D. His main research interests cover intelligent design, etc. E-mail: wxiang@zju.edu.cn
About author: CHEN Peng (1996-), master's student. His main research interests cover artificial intelligence and digital image processing. E-mail: chen_peng2023@163.com
Abstract:
In recent years, three-dimensional (3D) displays have attracted growing attention for their superior immersive experience, but the scarcity of 3D content has limited their development. Two-dimensional (2D)-to-3D conversion is a promising and effective way to obtain such content; it requires adding extra depth information to 2D content. However, existing depth estimation methods cannot meet the requirements of 2D-to-3D conversion due to their instability. To this end, a stereoscopic image rendering system was proposed that converts a monocular image into a pair of stereoscopic images for 3D display while taking human perception into account. As the core step of the system, a depth optimization algorithm considering human perception (DOCHP) was proposed: it takes a semantic segmentation map as input and generates an optimized depth map by accounting for human perception, including attention and depth perception, thereby enhancing the stereoscopic effect of the resulting image pair. Experimental results show that stereoscopic images generated from the depth maps optimized by the system give users a strong 3D effect. This demonstrates the necessity of considering human perceptual characteristics in stereoscopic image production and will support the popularization and application of glasses-free stereoscopic images.
CHEN Peng, JIANG Hao, XIANG Wei. Stereoscopic image generation considering human perception[J]. Journal of Graphics, 2023, 44(5): 966-977.
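The abstract describes the final rendering step as converting a monocular image plus an optimized depth map into a stereo pair. As a minimal, hedged sketch of that step (not the authors' implementation: the function name, the `max_disparity` parameter, and the convention that larger depth values are nearer are all assumptions), DIBR-style view synthesis can be written as:

```python
import numpy as np

def depth_to_stereo(image, depth, max_disparity=16):
    """Synthesize a left/right view pair by shifting each pixel
    horizontally in proportion to its depth (a DIBR-style sketch)."""
    h, w = depth.shape
    # Normalize depth to [0, 1]; larger values are assumed nearer,
    # so nearer pixels receive larger disparity.
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    disparity = (d * max_disparity).astype(int)

    left = np.zeros_like(image)   # expects an H x W x 3 array
    right = np.zeros_like(image)
    for y in range(h):
        for x in range(w):
            s = disparity[y, x] // 2
            left[y, min(w - 1, x + s)] = image[y, x]
            right[y, max(0, x - s)] = image[y, x]
    # Forward warping leaves disocclusion holes; a real pipeline would
    # inpaint them, and the disparity range must stay within viewer
    # comfort limits (cf. the just-noticeable depth difference in [5, 35]).
    return left, right
```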
Table 1 The features used in the stage of attention calculation
| Category | Feature | Description |
|---|---|---|
| High-level features | Recognition rate | Type and accuracy of each segment as recognized by the model |
| | Saliency map | Grayscale map showing the attention distribution (brighter regions indicate higher attention) |
| Low-level features | Color | Sum of the color contrasts between a segment and all other segments |
| | Layout | Distance between the center of a segment and the center of the image |
| | Depth | Average depth of a segment in the depth map |
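Table 1 only lists the features; how they might be fused into a single per-segment attention score (the combined map of Fig. 5(e)) can be sketched as below. This is an illustrative assumption, not the paper's calibrated model: the dictionary keys, the min-max normalization, and the weights are all hypothetical.

```python
import numpy as np

def attention_scores(segments, weights=(0.4, 0.3, 0.3)):
    """Fuse high-level, low-level, and depth scores into one
    attention score per segment (illustrative linear combination)."""
    def normalize(values):
        v = np.asarray(values, dtype=float)
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    # Each segment carries the per-feature scores of Table 1, e.g.
    # {"recognition": 0.9, "saliency": 0.7, "color": 0.4,
    #  "layout": 0.6, "depth": 0.5}
    high = normalize([s["recognition"] + s["saliency"] for s in segments])
    low = normalize([s["color"] + s["layout"] for s in segments])
    depth = normalize([s["depth"] for s in segments])
    w_high, w_low, w_depth = weights
    return w_high * high + w_low * low + w_depth * depth
```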
Fig. 4 Depth optimization ((a) Example of a depth conflict; (b) Depth distribution of the two giraffes in the example (not real data, but provided for explanation); (c) Examples of the two operations)
Fig. 5 Visualization of scores ((a) Original color image; (b) Scores based on high-level features; (c) Scores based on low-level features; (d) Scores based on depth information; (e) Attention scores combining all three)
Fig. 6 Results before and after the stage of iterative adjustment ((a) 2D image; (b) Before iterative adjustment; (c) After iterative adjustment)
Fig. 7 Comparison of depth maps ((a) Original color image; (b) Estimated depth maps generated by [31]; (c) Enhanced depth maps optimized by [35]; (d) Manual depth maps; (e) Depth maps optimized by DOCHP)
Table 2 Model-based analysis
| Model | P(M) | P(M\|data) | BF₁₀ |
|---|---|---|---|
| Null model (subject, image) | 0.333 | 5.932×10⁻¹¹ | 1.000 |
| Generation method | 0.333 | 0.510 | 8.598×10⁹ |
| Generation method + image | 0.333 | 0.490 | 8.260×10⁹ |
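The posterior probabilities in Table 2 follow directly from the Bayes factors: with equal prior model probabilities P(M) = 1/3, each P(M|data) is the model's BF₁₀ divided by the sum over all three models. A short check in Python (values taken from Table 2):

```python
# Reproduce P(M|data) in Table 2 from the Bayes factors (BF10),
# assuming equal prior model probabilities P(M) = 1/3.
bf10 = {
    "null (subject, image)": 1.0,
    "generation method": 8.598e9,
    "generation method + image": 8.260e9,
}
total = sum(bf10.values())
for model, bf in bf10.items():
    print(f"{model}: P(M|data) = {bf / total:.3g}")
# -> 5.93e-11, 0.51, and 0.49, matching Table 2
```

Both non-null models are overwhelmingly favored over the null (BF₁₀ on the order of 10⁹-10¹⁰), while the data barely distinguish between the two (0.510 vs. 0.490).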
Table 3 Model summary
| Variable | Level | Mean | SD | 95% CI (lower) | 95% CI (upper) |
|---|---|---|---|---|---|
| Generation method | Proposed | 0.404 | 0.102 | 0.198 | 0.609 |
| | Manual | 0.484 | 0.104 | 0.276 | 0.692 |
| | Estimated | -0.610 | 0.105 | -0.820 | -0.403 |
| | Traditional optimization | -0.278 | 0.101 | -0.484 | -0.080 |
| Image | | -0.328 | 0.166 | -0.661 | -0.002 |
| | | 0.240 | 0.162 | -0.080 | 0.564 |
| | | -0.307 | 0.163 | -0.634 | 0.010 |
| | | 0.239 | 0.163 | -0.089 | 0.563 |
| | | -0.205 | 0.161 | -0.530 | 0.117 |
| | | 0.513 | 0.169 | 0.181 | 0.852 |
| | | 0.314 | 0.165 | -0.014 | 0.638 |
| | | -0.329 | 0.164 | -0.670 | -0.014 |
| | | -0.108 | 0.162 | -0.433 | 0.211 |
| | | -0.028 | 0.161 | -0.352 | 0.283 |
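The rows of Table 3 are standard posterior summaries. Assuming they were computed from posterior samples in the usual way (the paper does not show this step), each row is just the sample mean, standard deviation, and central 95% interval:

```python
import numpy as np

def posterior_summary(samples):
    """Mean, SD, and central 95% credible interval of posterior samples."""
    s = np.asarray(samples, dtype=float)
    lo, hi = np.percentile(s, [2.5, 97.5])
    return s.mean(), s.std(ddof=1), lo, hi
```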
References
[1] LI Y L, XU R F. No-reference image quality assessment of DIBR-synthesized images based on statistical characteristics[J]. Laser & Optoelectronics Progress, 2022, 59(8): 228-236. (in Chinese)
[2] PATIL S, CHARLES P. Review on 2D-to-3D image and video conversion methods[C]// 2015 International Conference on Computing Communication Control and Automation. New York: IEEE Press, 2015: 728-732.
[3] SHA H, LIU Y. Review on deep learning based prediction of image intrinsic properties[J]. Journal of Graphics, 2021, 42(3): 385-397. (in Chinese)
[4] LI Z Q, SNAVELY N. MegaDepth: learning single-view depth prediction from Internet photos[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 2041-2050.
[5] DE SILVA D V S X, FERNANDO W A C, WORRALL S T, et al. Just noticeable difference in depth model for stereoscopic 3D displays[C]// 2010 IEEE International Conference on Multimedia and Expo. New York: IEEE Press, 2010: 1219-1224.
[6] TERZIĆ K, HANSARD M. Methods for reducing visual discomfort in stereoscopic 3D: a review[J]. Signal Processing: Image Communication, 2016, 47: 402-416.
[7] MU Q, ZHANG H, HE Z Q, et al. Scale adaptive target tracking algorithm based on depth estimation and feature fusion[J]. Journal of Graphics, 2021, 42(4): 563-571. (in Chinese)
[8] JI P, WANG L H, LI D X, et al. An automatic 2D to 3D conversion algorithm using multi-depth cues[C]// 2012 International Conference on Audio, Language and Image Processing. New York: IEEE Press, 2012: 546-550.
[9] KONRAD J, WANG M, ISHWAR P. 2D-to-3D image conversion by learning depth from examples[C]// 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2012: 16-22.
[10] LI L X. Research on key technologies of 2D video to binocular 3D video conversion[D]. Nanjing: Southeast University, 2021. (in Chinese)
[11] HUANG J, WANG C, LIU Y, et al. The progress of monocular depth estimation technology[J]. Journal of Image and Graphics, 2019, 24(12): 2081-2097. (in Chinese)
[12] LADICKÝ L, SHI J B, POLLEFEYS M. Pulling things out of perspective[C]// 2014 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2014: 89-96.
[13] SHI J P, TAO X, XU L, et al. Break Ames room illusion: depth from general single images[J]. ACM Transactions on Graphics, 2015, 34(6): 225:1-225:11.
[14] ZHOU W, SALZMANN M, HE X M, et al. Indoor scene structure analysis for single image depth estimation[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 614-622.
[15] KARSCH K, LIU C, KANG S B. Depth transfer: depth extraction from video using non-parametric sampling[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(11): 2144-2158.
[16] KIM S, PARK K, SOHN K, et al. Unified depth prediction and intrinsic image decomposition from a single image via joint convolutional neural fields[C]// European Conference on Computer Vision. Cham: Springer, 2016: 143-159.
[17] ZHANG X N. Deep learning based monocular depth estimation[D]. Shijiazhuang: Hebei Normal University, 2020. (in Chinese)
[18] GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]// 2012 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2012: 3354-3361.
[19] SAXENA A, SUN M, NG A Y. Learning 3-D scene structure from a single still image[C]// 2007 IEEE 11th International Conference on Computer Vision. New York: IEEE Press, 2007: 1-8.
[20] LIN L X, HUANG G H, CHEN Y J, et al. Efficient and high-quality monocular depth estimation via gated multi-scale network[J]. IEEE Access, 2020, 8: 7709-7718.
[21] FU J W, LIANG J, WANG Z Y. Monocular depth estimation based on multi-scale graph convolution networks[J]. IEEE Access, 2019, 8: 997-1009.
[22] KUZNIETSOV Y, STÜCKLER J, LEIBE B. Semi-supervised deep learning for monocular depth map prediction[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2215-2223.
[23] GOLDMAN M, HASSNER T, AVIDAN S. Learn stereo, infer mono: Siamese networks for self-supervised, monocular, depth estimation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2020: 2886-2895.
[24] CHEN W F, FU Z, YANG D W, et al. Single-image depth perception in the wild[C]// The 30th International Conference on Neural Information Processing Systems. New York: ACM, 2016: 730-738.
[25] JUNG Y J, SOHN H, LEE S I, et al. Visual comfort improvement in stereoscopic 3D displays using perceptually plausible assessment metric of visual comfort[J]. IEEE Transactions on Consumer Electronics, 2014, 60(1): 1-9.
[26] OH C, HAM B, CHOI S, et al. Visual fatigue relaxation for stereoscopic video via nonlinear disparity remapping[J]. IEEE Transactions on Broadcasting, 2015, 61(2): 142-153.
[27] LEI J J, ZHANG C C, FANG Y M, et al. Depth sensation enhancement for multiple virtual view rendering[J]. IEEE Transactions on Multimedia, 2015, 17(4): 457-469.
[28] ISLAM M B, WONG L K, LOW K L, et al. Aesthetics-driven stereoscopic 3-D image recomposition with depth adaptation[J]. IEEE Transactions on Multimedia, 2018, 20(11): 2964-2979.
[29] SHAO F, LIN W C, LIN W S, et al. QoE-guided warping for stereoscopic image retargeting[J]. IEEE Transactions on Image Processing, 2017, 26(10): 4790-4805.
[30] CHUN M M. Contextual cueing of visual attention[J]. Trends in Cognitive Sciences, 2000, 4(5): 170-178.
[31] HOU Q B, CHENG M M, HU X W, et al. Deeply supervised salient object detection with short connections[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5300-5309.
[32] HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2980-2988.
[33] YAN Q, XU L, SHI J P, et al. Hierarchical saliency detection[C]// 2013 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2013: 1155-1162.
[34] HIBBARD P B, HAINES A E, HORNSEY R L. Magnitude, precision, and realism of depth perception in stereoscopic vision[J]. Cognitive Research: Principles and Implications, 2017, 2: 25:1-25:11.
[35] JUNG S W, KO S J. Depth sensation enhancement using the just noticeable depth difference[J]. IEEE Transactions on Image Processing, 2012, 21(8): 3624-3637.
[36] KAY M, NELSON G L, HEKLER E B. Researcher-centered design of statistics: why Bayesian statistics better fit the culture and incentives of HCI[C]// 2016 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2016: 4521-4532.
[37] MOREY R D, ROMEIJN J W, ROUDER J N. The philosophy of Bayes factors and the quantification of statistical evidence[J]. Journal of Mathematical Psychology, 2016, 72: 6-18.
[38] WAGENMAKERS E J, MARSMAN M, JAMIL T, et al. Bayesian inference for psychology. Part I: Theoretical advantages and practical ramifications[J]. Psychonomic Bulletin & Review, 2018, 25(1): 35-57.
[39] SCHÖNBRODT F D, WAGENMAKERS E J. Bayes factor design analysis: planning for compelling evidence[J]. Psychonomic Bulletin & Review, 2018, 25(1): 128-142.
[40] ROUDER J N, MOREY R D, VERHAGEN J, et al. Bayesian analysis of factorial designs[J]. Psychological Methods, 2017, 22(2): 304-321.
|||||