
Journal of Graphics ›› 2025, Vol. 46 ›› Issue (6): 1316-1326. DOI: 10.11996/JG.j.2095-302X.2025061316

• Image Processing and Computer Vision •


A video colorization method based on multiple reference images

CAO Lujing, LU Peng

  1. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2025-02-12  Accepted: 2025-04-25  Published: 2025-12-30  Online: 2025-12-27
  • Corresponding author: LU Peng (1978-), associate professor, PhD. His main research interests cover computer graphics and computer vision. E-mail: lupeng@bupt.edu.cn
  • First author: CAO Lujing (2000-), master's student. Her main research interests cover computer vision and video colorization. E-mail: Una@bupt.edu.cn


Abstract:

Guiding video colorization with multiple reference images provides an efficient means of conveying user intent and copes better with scene changes in videos. However, challenges remain in sensibly allocating color information from the reference images, in ensuring that the colorized result stays faithful to the user's references, and in maintaining color naturalness and temporal consistency. To address these challenges, a video colorization method based on multiple reference images was proposed. First, a reference image feature extraction and recommendation module was designed: a convolutional neural network (CNN) extracted features from the multiple reference images and computed their semantic similarity to each grayscale frame, on the basis of which color information was recommended for that frame. Next, a temporal color module introduced a constrained attention mechanism that used the color information of the previous frame to suggest colors for the current frame, ensuring natural color transitions and temporal consistency. Then, a color fusion network fused the colors recommended by the reference images with the temporal color features, resolving conflicts among the multiple color sources and producing a coherent color representation. Finally, a decoder module decoded the fused color information into the final colorized video frames. Experimental results demonstrated that the proposed method performed well on several public datasets, especially under complex scene transitions: the generated videos showed marked improvements in visual quality, smoothness of color transitions, and overall consistency, demonstrating the method's broad application potential in video colorization.
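
To make the recommendation step concrete, the following is a minimal PyTorch sketch of how semantic-similarity-based color recommendation from multiple references could work, as one reading of the abstract. It is not the authors' implementation: the function name, the Lab-space color representation, the softmax temperature, and all tensor shapes are assumptions made for illustration.

# Hypothetical sketch of similarity-based color recommendation from
# multiple reference images (illustrative; not the paper's code).
# Assumes a shared CNN backbone produced the features, and colors are
# in Lab space, so only the ab channels are predicted for each L frame.
import torch
import torch.nn.functional as F

def recommend_colors(frame_feat, ref_feats, ref_ab, temperature=0.01):
    """Recommend ab color channels for one grayscale frame.

    frame_feat: (C, H, W)    CNN features of the grayscale frame
    ref_feats:  (N, C, H, W) CNN features of N reference images
    ref_ab:     (N, 2, H, W) ab channels of the N reference images
    Returns:    (2, H, W) recommended ab map and (H, W) confidence map
    """
    C, H, W = frame_feat.shape
    N = ref_feats.shape[0]

    # Flatten spatial dimensions and L2-normalize features so that
    # dot products become cosine (semantic) similarities.
    q = F.normalize(frame_feat.reshape(C, H * W), dim=0)        # (C, HW)
    k = F.normalize(ref_feats.reshape(N, C, H * W), dim=1)      # (N, C, HW)
    k = k.permute(1, 0, 2).reshape(C, N * H * W)                # (C, N*HW)
    v = ref_ab.reshape(N, 2, H * W).permute(1, 0, 2).reshape(2, N * H * W)

    # Similarity between every frame location and every location in
    # every reference; softmax with a small temperature yields a sharp
    # attention over candidate colors across all N references.
    sim = q.t() @ k                                             # (HW, N*HW)
    attn = F.softmax(sim / temperature, dim=1)

    # Each frame pixel receives a similarity-weighted blend of reference
    # colors; the maximum similarity serves as a per-pixel confidence.
    ab = (attn @ v.t()).t().reshape(2, H, W)
    conf = sim.max(dim=1).values.reshape(H, W)
    return ab, conf

Under this reading, a per-pixel confidence map of this kind is the sort of signal the color fusion network could weigh against the temporally propagated colors from the previous frame when resolving conflicts among the multiple color sources.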

Key words: video colorization, multiple reference images, constrained attention mechanism, temporal consistency, color fusion network
