Journal of Graphics ›› 2025, Vol. 46 ›› Issue (6): 1316-1326. DOI: 10.11996/JG.j.2095-302X.2025061316
Corresponding author: LU Peng (1978-), male, associate professor, Ph.D. His main research interests cover computer graphics and computer vision. E-mail: lupeng@bupt.edu.cn
Received:2025-02-12
Accepted:2025-04-25
Published:2025-12-30
Online:2025-12-27
First author: CAO Lujing (2000-), master's student. Her main research interests cover computer vision and video colorization. E-mail: Una@bupt.edu.cn
Abstract:
Guiding video colorization with multiple reference images is an efficient way to incorporate user intent, and it copes better with scene changes within a video. To address how to reasonably allocate the colors of the reference images during colorization, keep the results faithful to the user-provided references, and maintain color naturalness and temporal consistency, a video colorization method based on multiple reference images was proposed. First, a reference-image feature extraction and recommendation module was designed: a convolutional neural network (CNN) extracts features from multiple reference images and computes their semantic similarity to the grayscale frame to be colorized, on which basis color information is recommended for the grayscale frame. Next, a temporal color module introduces a constrained attention mechanism that uses the color information of the previous frame to provide color suggestions for the current frame, ensuring natural color transitions and temporal consistency. Then, a color fusion network fuses the colors recommended by the reference images with the temporal color features, resolving conflicts among the multiple color sources and producing a coordinated, consistent color representation. Finally, a decoder module decodes the fused color information into the final colorized video frames. Experimental results show that the method performs strongly on several public datasets; in particular, under complex scene changes the generated videos exhibit significant improvements in visual quality, smoothness of color transitions, and overall consistency, demonstrating broad application potential for video colorization.
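The abstract describes a four-stage architecture (reference recommendation, temporal color, fusion, decoding). As an illustration of the first stage only, below is a minimal PyTorch-style sketch of similarity-weighted color recommendation; the class name `ReferenceRecommender`, the encoder layout, and the whole-image weighting are expository assumptions, not the authors' implementation, which presumably matches features at a finer granularity than whole images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReferenceRecommender(nn.Module):
    """Hedged sketch: score each reference image against the grayscale
    frame by cosine similarity of CNN features, then blend the
    references with similarity-derived weights."""
    def __init__(self, feat_dim=256):
        super().__init__()
        # Shared CNN encoder for the grayscale frame and the references.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, gray_frame, references):
        # gray_frame: (B, 1, H, W); references: (B, K, 3, H, W)
        B, K = references.shape[:2]
        g = self.encoder(gray_frame.repeat(1, 3, 1, 1)).flatten(1)  # (B, D)
        r = self.encoder(references.flatten(0, 1)).flatten(1)       # (B*K, D)
        r = r.view(B, K, -1)
        # Semantic similarity between the frame and every reference.
        sim = F.cosine_similarity(g.unsqueeze(1), r, dim=-1)        # (B, K)
        weights = sim.softmax(dim=1)
        # Recommend colors as a similarity-weighted blend of references.
        recommended = (weights[..., None, None, None] * references).sum(1)
        return recommended, weights
```

In the full pipeline, the output of such a recommendation stage would be combined with the previous frame's colors by the temporal color module and the color fusion network before decoding.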
CAO Lujing, LU Peng. A video colorization method based on multiple reference images[J]. Journal of Graphics, 2025, 46(6): 1316-1326.
Fig. 5 Typical scenarios and examples of lighting conditions ((a) Typical scenes under different themes; (b) Samples under different lighting conditions)
Table 1 Comparison with the state-of-the-art methods
| Method | PSNR | LPIPS | FID | SSIM | CF | tOF | tLP |
|---|---|---|---|---|---|---|---|
| Ref. [ ] | 20.90 | 0.192 | 28.88 | 0.855 | 82.43 | 0.6822 | 0.7291 |
| Ref. [ ] | 18.51 | 0.354 | 33.45 | 0.610 | 83.06 | 0.1765 | 0.7488 |
| Ref. [ ] | 17.69 | 0.494 | 227.50 | 0.547 | 74.29 | 1.0483 | 7.2652 |
| Ref. [ ] | 27.71 | 0.143 | 47.60 | 0.896 | 84.24 | 0.1581 | 0.6880 |
| Ours | 30.27 | 0.071 | 24.75 | 0.894 | 86.11 | 0.0692 | 0.5354 |
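For readers interpreting Table 1: PSNR and SSIM [32] are full-reference fidelity metrics (higher is better), LPIPS [33] and FID [34] are perceptual/distributional distances (lower is better), CF [35] quantifies colorfulness, and tOF/tLP measure temporal consistency (lower is better). As a small, hedged example, per-frame PSNR and SSIM can be computed with scikit-image (≥ 0.19 assumed); this is generic library usage, not the authors' evaluation script.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def frame_psnr_ssim(pred, gt):
    """Per-frame PSNR/SSIM for uint8 RGB frames of shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)
    return psnr, ssim

# Video-level scores are typically the mean over all frames:
# scores = [frame_psnr_ssim(p, g) for p, g in zip(pred_frames, gt_frames)]
```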
Fig. 6 Experimental comparison and qualitative analysis of different methods in multiple types of video clips ((a) Reference image; (b) Grayscale video frames; (c) Results of reference [24]; (d) Results of reference [37]; (e) Results of reference [9]; (f) Results of reference [10]; (g) Ours)
Table 2 Dunn test results
| Method | Ref. [ ] | Ref. [ ] | Ref. [ ] | Ref. [ ] | Ours |
|---|---|---|---|---|---|
| Ref. [ ] | - | 1.48×10⁻³ | 1.37×10⁻¹⁰ | 6.98×10⁻⁶ | 7.56×10⁻³¹ |
| Ref. [ ] | - | - | 4.79×10⁻²⁵ | 2.01×10⁻¹⁷ | 1.89×10⁻⁵³ |
| Ref. [ ] | - | - | - | 7.20×10⁻¹ | 6.25×10⁻⁶ |
| Ref. [ ] | - | - | - | - | 1.18×10⁻¹⁰ |
| Ours | - | - | - | - | - |
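Table 2 reports pairwise p-values from a Dunn post-hoc test, which is typically run after a Kruskal-Wallis omnibus test on per-participant or per-clip ratings. Below is a minimal sketch of how such p-values could be produced with the scikit-posthocs package; the data layout, column names, toy scores, and the Bonferroni adjustment are assumptions, not details taken from the paper's user study.

```python
import pandas as pd
import scikit_posthocs as sp
from scipy.stats import kruskal

# Hypothetical layout: one rating per row, grouped by method name.
df = pd.DataFrame({
    "method": ["ref_a"] * 3 + ["ref_b"] * 3 + ["ours"] * 3,
    "score":  [3.1, 2.8, 3.0,  2.2, 2.5, 2.4,  4.6, 4.4, 4.7],
})

# Omnibus test first: do the methods' score distributions differ at all?
groups = [g["score"].values for _, g in df.groupby("method")]
print(kruskal(*groups))

# Pairwise Dunn-test p-value matrix (the form shown in Table 2).
pvals = sp.posthoc_dunn(df, val_col="score", group_col="method",
                        p_adjust="bonferroni")
print(pvals)
```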
Table 3 Multi-reference image video colorization model ablation experiments
| Method | PSNR | LPIPS | FID | SSIM |
|---|---|---|---|---|
| w/o temporal color feature recommendation module | 19.99 | 0.290 | 70.51 | 0.877 |
| w/o color fusion network | 19.48 | 0.325 | 84.35 | 0.880 |
| Ours (full model) | 28.62 | 0.084 | 27.11 | 0.894 |
Table 4 Loss function ablation experiment
| L1 loss | LS-GAN loss | Cycle-consistency loss | TV regularization loss | PSNR | LPIPS | FID | SSIM |
|---|---|---|---|---|---|---|---|
|  | √ | √ | √ | 12.92 | 0.560 | 86.69 | 0.335 |
| √ |  | √ | √ | 14.17 | 0.459 | 59.72 | 0.342 |
| √ | √ |  | √ | 15.27 | 0.294 | 49.58 | 0.347 |
| √ | √ | √ |  | 15.86 | 0.221 | 47.52 | 0.347 |
| √ | √ | √ | √ | 28.62 | 0.084 | 27.11 | 0.894 |
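Table 4 ablates four training losses, with each of the first four rows omitting one term. As a hedged illustration of how such a composite objective is commonly assembled, the sketch below combines an L1 reconstruction term, an LS-GAN adversarial term [28], a cycle-consistency term [29], and a total-variation (TV) smoothness term. The weights, the `d_fake`/`pred_cycle` interfaces, and the exact form of the cycle term are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def tv_loss(x):
    """Total-variation regularizer: penalizes abrupt spatial color changes
    in a (B, C, H, W) tensor."""
    dh = (x[..., 1:, :] - x[..., :-1, :]).abs().mean()
    dw = (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
    return dh + dw

def generator_loss(pred, target, d_fake, pred_cycle,
                   w_l1=1.0, w_gan=0.1, w_cyc=1.0, w_tv=1e-4):
    """Weighted sum of the four terms ablated in Table 4 (weights are
    illustrative assumptions)."""
    loss_l1 = F.l1_loss(pred, target)
    # LS-GAN generator term: push the discriminator's score on the
    # colorized (fake) frames toward 1.
    loss_gan = F.mse_loss(d_fake, torch.ones_like(d_fake))
    # Cycle consistency (one plausible form): a re-colorized result
    # should agree with the first-pass prediction.
    loss_cyc = F.l1_loss(pred_cycle, pred.detach())
    loss_tv = tv_loss(pred)
    return w_l1 * loss_l1 + w_gan * loss_gan + w_cyc * loss_cyc + w_tv * loss_tv
```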
[1] WU Y Z, WANG X T, LI Y, et al. Towards vivid and diverse image colorization with generative color prior[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 14357-14366.
[2] PAN X G, ZHAN X H, DAI B, et al. Exploiting deep generative prior for versatile image restoration and manipulation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(11): 7474-7489.
[3] CHENG Z Z, YANG Q X, SHENG B. Colorization using neural network ensemble[J]. IEEE Transactions on Image Processing, 2017, 26(11): 5491-5505.
[4] REINHARD E, ADHIKHMIN M, GOOCH B, et al. Color transfer between images[J]. IEEE Computer Graphics and Applications, 2001, 21(5): 34-41.
[5] IRONY R, COHEN-OR D, LISCHINSKI D. Colorization by example[C]// The 16th Eurographics Conference on Rendering Techniques. Aire-la-Ville: Eurographics Association Press, 2005: 201-210.
[6] LEI C Y, CHEN Q F. Fully automatic video colorization with self-regularization and diversity[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 3748-3756.
[7] LIU Y M, COHEN M, UYTTENDAELE M, et al. AutoStyle: automatic style transfer from image collections to users' images[J]. Computer Graphics Forum, 2014, 33(4): 21-31.
[8] KHAN A, JIANG L, LI W, et al. Fast color transfer from multiple images[J]. Applied Mathematics-A Journal of Chinese Universities, 2017, 32(2): 183-200.
[9] WANG H Z, ZHAI D M, LIU X M, et al. Unsupervised deep exemplar colorization via pyramid dual non-local attention[J]. IEEE Transactions on Image Processing, 2023, 32: 4114-4127.
[10] YANG Y X, PAN J S, PENG Z Z, et al. BiSTNet: semantic image prior guided bidirectional temporal feature fusion for deep exemplar-based video colorization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(8): 5612-5624.
[11] SINGH A, CHANANI A, KARNICK H. Video colorization using CNNs and keyframes extraction: an application in saving bandwidth[C]// The 4th International Conference on Computer Vision and Image Processing. Cham: Springer, 2020: 190-198.
[12] MEYER S, CORNILLÈRE V, DJELOUAH A, et al. Deep video color propagation[EB/OL]. [2024-08-12]. http://bmvc2018.org/contents/papers/0521.pdf.
[13] YAO C H, CHANG C Y, CHIEN S Y. Occlusion-aware video temporal consistency[C]// The 25th ACM International Conference on Multimedia. New York: ACM, 2017: 777-785.
[14] BONNEEL N, TOMPKIN J, SUNKAVALLI K, et al. Blind video temporal consistency[J]. ACM Transactions on Graphics (TOG), 2015, 34(6): 196.
[15] VONDRICK C, SHRIVASTAVA A, FATHI A, et al. Tracking emerges by colorizing videos[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 402-419.
[16] ILG E, MAYER N, SAIKIA T, et al. FlowNet 2.0: evolution of optical flow estimation with deep networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 1647-1655.
[17] JAMPANI V, GADDE R, GEHLER P V. Video propagation networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 3154-3164.
[18] PAUL S, BHATTACHARYA S, GUPTA S. Spatiotemporal colorization of video using 3D steerable pyramids[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2017, 27(8): 1605-1619.
[19] WU R Z, LIN H J, QI X J, et al. Memory selection network for video propagation[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 175-190.
[20] EILERTSEN G, MANTIUK R K, UNGER J. Single-frame regularization for temporally stable CNNs[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 11168-11177.
[21] LAI W S, HUANG J B, WANG O, et al. Learning blind video temporal consistency[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 179-195.
[22] SHI X J, CHEN Z R, WANG H, et al. Convolutional LSTM network: a machine learning approach for precipitation nowcasting[C]// The 29th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 802-810.
[23] LEI C Y, XING Y Z, CHEN Q S. Blind video temporal consistency via deep video prior[C]// The 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 92.
[24] ZHANG B, HE M M, LIAO J, et al. Deep exemplar-based video colorization[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 8044-8053.
[25] ZHAO Y Z, PO L M, LIU K C, et al. SVCNet: scribble-based video colorization network with temporal aggregation[J]. IEEE Transactions on Image Processing, 2023, 32: 4443-4458.
[26] HORN B K P, SCHUNCK B G. Determining optical flow[J]. Artificial Intelligence, 1981, 17(1/3): 185-203.
[27] ZHANG H, XU T, LI H S, et al. StackGAN++: realistic image synthesis with stacked generative adversarial networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1947-1962.
[28] MAO X D, LI Q, XIE H R, et al. Least squares generative adversarial networks[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2813-2821.
[29] WANG X L, JABRI A, EFROS A A. Learning correspondence from the cycle-consistency of time[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 2561-2571.
[30] SANGKLOY P, LU J W, FANG C, et al. Scribbler: controlling deep image synthesis with sketch and color[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 6836-6845.
[31] MARSZALEK M, LAPTEV I, SCHMID C. Actions in context[C]// 2009 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2009: 2929-2936.
[32] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
[33] ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 586-595.
[34] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a Nash equilibrium[EB/OL]. [2024-08-12]. https://arxiv.org/abs/1706.08500v1.
[35] HASLER D, SUESSTRUNK S E. Measuring colorfulness in natural images[C]// 2003 Human Vision and Electronic Imaging VIII. Bellingham: SPIE, 2003: 87-95.
[36] CHU M Y, THUEREY N. Data-driven synthesis of smoke flows with CNN-based feature descriptors[J]. ACM Transactions on Graphics (TOG), 2017, 36(4): 69.
[37] LU P, YU J B, PENG X J, et al. Gray2ColorNet: transfer more colors from reference image[C]// The 28th ACM International Conference on Multimedia. New York: ACM, 2020: 3210-3218.