Multi-focus image fusion based on 3D manifold fitting and frequency division-guided attention mechanism

doi:10.11996/JG.j.2095-302X.2026020351

Abstract

Multi-focus image fusion integrates multiple images of the same scene, each focused on a different region, into a single all-in-focus image that retains both sharp detail and complete structural information. It is widely used in consumer electronics, medical imaging, and satellite remote sensing. Deep learning-based fusion methods commonly suffer from information loss, artifacts, scarce datasets, and high time and space overhead. To address these problems, a fusion model based on three-dimensional (3D) manifold fitting and a frequency division-guided attention mechanism was proposed. The model adopted a feature decomposition-fusion-reconstruction paradigm. In the encoding stage, background structure and detail information were identified and separated, which reduced the loss of structural information and the introduction of artifacts. 3D manifold fitting was used to extract the features common to the multi-focus source images, reducing the model's dependence on data volume and lowering time and space overhead. In the feature fusion stage, the frequency division-guided attention mechanism characterized the high-frequency details and low-frequency background of the images and fused cross-frequency features with adaptive weights, alleviating problems such as blurred complex textures and missing details. To preserve both the global visual quality and the local detail of the fused image, multiple loss constraints were combined into a weighted composite loss function. On the public benchmark test sets Lytro and MFFW, the proposed method achieved the best score on all six commonly used evaluation metrics, which supports its effectiveness.

Keywords: multi-focus image fusion, manifold fitting, feature extraction, cross-attention, frequency domain
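
The abstract describes splitting images into low-frequency background and high-frequency detail, then fusing the bands with adaptive weights. The following is a minimal sketch of that general idea only, not the authors' network: the box-filter band split, the sigmoid weighting on detail magnitude (a hand-written stand-in for learned attention), and all function and parameter names are assumptions made for illustration. Images are assumed to be grayscale 2D lists with intensities in [0, 1].

```python
import math


def box_blur(img, radius=1):
    """Low-pass filter: mean over a (2r+1)x(2r+1) window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out


def split_bands(img, radius=1):
    """Return (low, high), where high is the residual img - low."""
    low = box_blur(img, radius)
    high = [[p - l for p, l in zip(prow, lrow)] for prow, lrow in zip(img, low)]
    return low, high


def fuse(img_a, img_b, radius=1, sharpness=10.0):
    """Fuse two same-sized images band by band (illustrative rule, not the paper's)."""
    low_a, high_a = split_bands(img_a, radius)
    low_b, high_b = split_bands(img_b, radius)
    h, w = len(img_a), len(img_a[0])
    fused = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Weight for image A grows with its local detail magnitude.
            z = sharpness * (abs(high_a[y][x]) - abs(high_b[y][x]))
            z = max(-50.0, min(50.0, z))  # keep exp() within range
            w_a = 1.0 / (1.0 + math.exp(-z))
            low = 0.5 * (low_a[y][x] + low_b[y][x])  # average the backgrounds
            high = w_a * high_a[y][x] + (1.0 - w_a) * high_b[y][x]
            fused[y][x] = low + high
    return fused


if __name__ == "__main__":
    sharp = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    blurry = box_blur(sharp)
    for row in fuse(sharp, blurry):
        print([round(v, 3) for v in row])
```

In this sketch the low-frequency bands are simply averaged, while the high-frequency weight leans toward whichever source has stronger local detail; a learned attention module would replace both hand-written rules.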

ZHANG Zhou, WANG Zeyu, SONG Haiyu, LI Wei, GE Mingyu, WANG Jiayu, WANG Wenqi. Multi-focus image fusion based on 3D manifold fitting and frequency division-guided attention mechanism[J]. Journal of Graphics, 2026, 47(2): 351-359.

Figures/Tables: 8

References: 24

[1]	LI Y, WU X J. Image fusion based on sparse representation using Shannon entropy weighting[J]. Acta Automatica Sinica, 2014, 40(8): 1819-1835 (in Chinese).
[2]	ZHAO L B, ZHANG X L, HUANG B, et al. MFANet: multi-feature aggregation network for multi-focus image fusion[C]// ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing. New York: IEEE Press, 2025: 1-5.
[3]	HU X Y, JIANG J J, LIU X M, et al. ZMFF: zero-shot multi-focus image fusion[J]. Information Fusion, 2023, 92: 127-138.
[4]	LIU J Y, LI S T, LIU H B, et al. A lightweight pixel-level unified image fusion network[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(12): 18120-18132.
[5]	BAI H W, ZHAO Z X, ZHANG J S, et al. ReFusion: learning image fusion from reconstruction with learnable loss via meta-learning[J]. International Journal of Computer Vision, 2025, 133(5): 2547-2567.
[6]	WANG Z Y, LI X F, ZHAO L B, et al. When multi-focus image fusion networks meet traditional edge-preservation technology[J]. International Journal of Computer Vision, 2023, 131(10): 2529-2552.
[7]	LIU Y, CHEN X, PENG H, et al. Multi-focus image fusion with a deep convolutional neural network[J]. Information Fusion, 2017, 36: 191-207.
[8]	AMIN-NAJI M, AGHAGOLZADEH A, EZOJI M. Ensemble of CNN for multi-focus image fusion[J]. Information Fusion, 2019, 51: 201-214.
[9]	LI J X, GUO X B, LU G M, et al. DRPL: deep regression pair learning for multi-focus image fusion[J]. IEEE Transactions on Image Processing, 2020, 29: 4816-4831.
[10]	PAN S Y, LIU L Q. MSFAFuse: SAR and optical image fusion model based on multi-scale feature information and attention mechanism[J]. Journal of Graphics, 2025, 46(2): 300-311 (in Chinese).
[11]	ZHAO Z X, BAI H W, ZHANG J S, et al. CDDFuse: correlation-driven dual-branch feature decomposition for multi-modality image fusion[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 5906-5916.
[12]	ZHAO Z X, XU S, ZHANG C X, et al. DIDFuse: deep image decomposition for infrared and visible image fusion[EB/OL]. [2025-03-22]. https://www.ijcai.org/proceedings/2020/135.
[13]	DENG X, DRAGOTTI P L. Deep convolutional neural network for multi-modal image restoration and fusion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3333-3348.
[14]	HORÉ A, ZIOU D. Image quality metrics: PSNR vs. SSIM[C]// The 20th International Conference on Pattern Recognition. New York: IEEE Press, 2010: 2366-2369.
[15]	ZHANG J C, LIAO Q M, LIU S J, et al. Real-MFF: a large realistic multi-focus image dataset with ground truth[J]. Pattern Recognition Letters, 2020, 138: 370-377.
[16]	NEJATI M, SAMAVI S, SHIRANI S. Multi-focus image fusion using dictionary-based sparse representation[J]. Information Fusion, 2015, 25: 72-84.
[17]	XU S, WEI X L, ZHANG C X, et al. MFFW: a new dataset for multi-focus image fusion[EB/OL]. [2025-12-04]. https://arxiv.org/abs/2002.04780.pdf.
[18]	QU G H, ZHANG D L, YAN P F. Information measure for performance of image fusion[J]. Electronics Letters, 2002, 38(7): 313-315.
[19]	WANG Q, SHEN Y, JIN J. Performance evaluation of image fusion techniques[M]//STATHAKI T. Image Fusion: Algorithms and Applications. Amsterdam: Academic Press, 2008: 469-492.
[20]	WILLIAMS S. Pearson’s correlation coefficient[J]. The New Zealand Medical Journal, 1996, 109(1015): 38.
[21]	LIANG P W, JIANG J J, LIU X M, et al. Fusion from decomposition: a self-supervised decomposition approach for image fusion[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 719-735.
[22]	JUNG H, KIM Y, JANG H, et al. Unsupervised deep image fusion with structure tensor representations[J]. IEEE Transactions on Image Processing, 2020, 29: 3845-3858.
[23]	LI M N, PEI R H, ZHENG T Y, et al. FusionDiff: multi-focus image fusion using denoising diffusion probabilistic models[J]. Expert Systems with Applications, 2024, 238: 121664.
[24]	XU H, MA J Y, JIANG J, et al. U2Fusion: a unified unsupervised image fusion network[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(1): 502-518.

Method	Q_MI	Q_NICE	VIF_P	PSNR	CORR	SSIM
Cu-Net [13]	0.7335	0.8207	0.6794	26.0353	0.9700	0.8624
DeFusion [21]	0.8322	0.8248	0.7344	29.0627	0.9814	0.9125
DIF-Net [22]	0.8486	0.8250	0.7081	26.5232	0.9825	0.9117
FusionDiff [23]	0.9010	0.8282	0.7334	26.8721	0.9770	0.8930
U2Fusion [24]	0.7966	0.8230	0.7246	25.8632	0.9749	0.8743
Proposed method	0.9495	0.8511	0.7939	32.6819	0.9885	0.9325
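
Two of the columns above, PSNR and CORR, have simple closed-form definitions. The following is a minimal sketch of the textbook formulas, assuming 8-bit images flattened to lists of pixel values; the exact evaluation protocol behind the table (which reference image is used, and how scores are averaged over the source images) is not stated in this section, so the function names and the `peak` default are assumptions.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length pixel lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant image")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    reference = [10, 50, 90, 130, 170, 210]
    fused = [12, 49, 91, 128, 171, 208]
    print("PSNR (dB):", psnr(reference, fused))
    print("CORR:", pearson(reference, fused))
```

Both metrics are higher-is-better, which matches how the table is read: a larger PSNR means a smaller mean squared error, and a CORR closer to 1 means a stronger linear relationship between the fused image and the image it is compared against.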

Method	Q_MI	Q_NICE	VIF_P	PSNR	CORR	SSIM
Cu-Net [13]	0.6737	0.8166	0.5777	24.0207	0.9541	0.8225
DeFusion [21]	0.7445	0.8106	0.6444	24.3661	0.9685	0.8681
DIF-Net [22]	0.7739	0.8104	0.6144	23.9084	0.9686	0.8468
FusionDiff [23]	0.8053	0.8112	0.6583	23.2556	0.9648	0.8429
U2Fusion [24]	0.7467	0.8188	0.6179	24.0728	0.9605	0.8303
Proposed method	0.8774	0.8312	0.7083	28.8263	0.9752	0.8727

Experiment	Q_MI	Q_NICE	VIF_P	PSNR	CORR	SSIM
Without 3D manifold fitting module	0.6323	0.5326	0.6858	25.3731	0.8327	0.7923
Without frequency division-guided attention module	0.7047	0.6427	0.4527	25.4274	0.8422	0.8104
Full model	0.9495	0.8511	0.7939	32.6819	0.9885	0.9325
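
The ablation rows above are easiest to read as drops relative to the full model. The short sketch below does that subtraction with the values copied from the table; the variable and function names are illustrative and are not part of the paper.

```python
# Values copied from the ablation table; all six metrics are higher-is-better.
METRICS = ["Q_MI", "Q_NICE", "VIF_P", "PSNR", "CORR", "SSIM"]
FULL = [0.9495, 0.8511, 0.7939, 32.6819, 0.9885, 0.9325]
ABLATIONS = {
    "without 3D manifold fitting": [0.6323, 0.5326, 0.6858, 25.3731, 0.8327, 0.7923],
    "without frequency division-guided attention": [0.7047, 0.6427, 0.4527, 25.4274, 0.8422, 0.8104],
}


def drops(variant):
    """Absolute drop of each metric relative to the full model, rounded to 4 places."""
    return {m: round(f - v, 4) for m, f, v in zip(METRICS, FULL, ABLATIONS[variant])}


if __name__ == "__main__":
    for name in ABLATIONS:
        print(name, drops(name))
```

Every metric falls in both ablated variants. Removing either module costs a little over 7 dB of PSNR; removing the attention module costs the most on VIF_P, while removing the manifold fitting module costs the most on Q_MI and Q_NICE.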