
Journal of Graphics ›› 2025, Vol. 46 ›› Issue (6): 1316-1326. DOI: 10.11996/JG.j.2095-302X.2025061316

• Image Processing and Computer Vision •

A video colorization method based on multiple reference images

CAO Lujing, LU Peng

  1. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2025-02-12  Accepted: 2025-04-25  Online: 2025-12-30  Published: 2025-12-27
  • Contact: LU Peng
  • About author:

    CAO Lujing (2000-), master's student. Her main research interests include computer vision and video colorization. E-mail: Una@bupt.edu.cn

Abstract:

Using multiple reference images to guide video colorization provides an efficient means of conveying user intent and handles scene changes in videos better than a single reference. However, challenges remain in allocating color information from the reference images, in ensuring that the colorized result faithfully matches the user's references, and in maintaining color naturalness and temporal consistency. To address these challenges, a video colorization method based on multiple reference images was proposed. First, a reference image feature extraction and recommendation module was designed: convolutional neural networks extracted features from the multiple reference images, computed their semantic similarity to each grayscale video frame, and recommended color information to that frame according to this similarity. Next, a temporal color module was introduced, in which a constrained attention mechanism used color information from the previous frame to guide the colorization of the current frame, ensuring natural color transitions and temporal consistency. Then, a color fusion network fused the colors recommended from the reference images with the temporal color features, resolving conflicts among colors from multiple sources and producing a consistent color representation. Finally, a decoder module decoded the fused color information into the final color video frames. Experimental results demonstrated that the proposed method performed well on several public datasets, especially in handling complex scene transitions: the generated videos showed clear improvements in visual quality, smoothness of color transitions, and overall consistency, indicating strong potential for practical video colorization.
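To make the pipeline described above concrete, the sketch below implements one plausible reading of its three core steps in PyTorch: similarity-based color recommendation from multiple references, a locally constrained attention that propagates color from the previous frame, and a gated fusion of the two color sources. The tiny encoder, the k×k window form of the attention constraint, the gating fusion, and all shapes and names are illustrative assumptions; the abstract does not specify the paper's actual architectures.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReferenceRecommender(nn.Module):
    """Score each reference against a grayscale frame by cosine similarity
    of pooled CNN features, then blend reference colors by those scores.
    The small encoder is a stand-in for the paper's (unspecified) backbone."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, gray, ref_grays, ref_colors):
        # gray: (1,1,H,W) luminance; ref_grays: (R,1,H,W); ref_colors: (R,2,H,W) ab
        q = self.encoder(gray).flatten(1)        # (1, D)
        k = self.encoder(ref_grays).flatten(1)   # (R, D)
        sim = F.cosine_similarity(q, k)          # (R,) semantic similarity
        w = F.softmax(sim, dim=0)                # per-reference recommendation weights
        rec = (w.view(-1, 1, 1, 1) * ref_colors).sum(0, keepdim=True)
        return rec, w                            # (1,2,H,W), (R,)


def constrained_attention(curr_feat, prev_feat, prev_color, win: int = 7):
    """One plausible reading of the 'constrained attention': each pixel of the
    current frame attends only to a win x win neighbourhood of the previous
    frame, so color propagates locally and transitions stay smooth."""
    B, C, H, W = curr_feat.shape
    pad = win // 2
    # Gather win*win candidate keys/values around each spatial position.
    keys = F.unfold(prev_feat, win, padding=pad).view(B, C, win * win, H * W)
    vals = F.unfold(prev_color, win, padding=pad).view(B, 2, win * win, H * W)
    q = curr_feat.view(B, C, 1, H * W)
    attn = F.softmax((q * keys).sum(1, keepdim=True) / C ** 0.5, dim=2)
    return (attn * vals).sum(2).view(B, 2, H, W)  # temporally propagated ab


class ColorFusion(nn.Module):
    """Stand-in for the color fusion network: a learned per-pixel gate
    arbitrates between reference-recommended and temporal colors."""

    def __init__(self):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, rec_color, temp_color):
        g = self.gate(torch.cat([rec_color, temp_color], dim=1))
        return g * rec_color + (1 - g) * temp_color  # fused ab channels


if __name__ == "__main__":
    H = W = 64
    rec_net, fuse = ReferenceRecommender(), ColorFusion()
    gray = torch.rand(1, 1, H, W)                        # current grayscale frame
    refs_l, refs_ab = torch.rand(3, 1, H, W), torch.rand(3, 2, H, W)
    rec, weights = rec_net(gray, refs_l, refs_ab)        # recommended colors
    feat = gray.expand(1, 8, H, W)                       # toy per-pixel features
    temp = constrained_attention(feat, feat, rec)        # first frame: self-reference
    print(fuse(rec, temp).shape)                         # torch.Size([1, 2, 64, 64])
```

In a full system, the fused ab channels would be passed to the decoder module together with the luminance to produce the final color frame; the soft similarity-weighted blend above could equally be a hard top-1 reference selection, which the abstract leaves open.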

Key words: video colorization, multiple reference images, constrained attention mechanism, temporal consistency, color fusion network
