Multi-focus image fusion based on 3D manifold fitting and frequency division-guided attention mechanism

doi:10.11996/JG.j.2095-302X.2026020351

Abstract

Multi-focus image fusion integrates multiple images of the same scene that are focused on different regions into a single all-in-focus image with sharp details and complete structural information. It is widely used in consumer electronics, medical imaging, and satellite remote sensing. To address the information loss, artifacts, insufficient training data, and high time and space overhead that are common in deep learning-based image fusion methods, a novel fusion model based on three-dimensional (3D) manifold fitting and a frequency division-guided attention mechanism was proposed. The model adopted a feature decomposition-fusion-reconstruction paradigm. In the encoding stage, background structures and detail information were identified and separated, which reduced both the loss of structural information and the introduction of artifacts. 3D manifold fitting was employed to extract the features shared by the multi-focus images, thereby reducing the model’s dependence on large datasets and lowering its time and space overhead. In the feature fusion stage, the frequency division-guided attention mechanism was introduced to characterize the high-frequency details and low-frequency backgrounds of the images, enabling adaptive weighted fusion of cross-frequency-domain features and alleviating problems such as blurred complex textures and missing details. Furthermore, to preserve both global visual quality and local detail in the fused image, a weighted composite loss function was designed by combining multiple loss constraints. Experimental results on the public benchmark datasets Lytro and MFFW demonstrated that the proposed method achieved state-of-the-art performance on six commonly used evaluation metrics, verifying its effectiveness.
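
The abstract does not specify how the frequency division-guided attention mechanism is built, so the following is only a minimal sketch of the underlying idea: split each source image into a low-frequency background and a high-frequency detail band, then fuse the bands with adaptive weights. The box filter, the local-activity weights, and the names `box_blur`, `split_frequencies`, and `fuse` are illustrative assumptions, not the authors' learned attention module.

```python
import numpy as np

def box_blur(img, k=5):
    """Low-pass filter: mean over a k x k window, with edge padding (k odd)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def split_frequencies(img, k=5):
    """Return (low, high) with low + high == img."""
    low = box_blur(img, k)
    high = img - low
    return low, high

def fuse(img_a, img_b, k=5):
    """Fuse two grayscale images band by band (hand-crafted stand-in for attention)."""
    low_a, high_a = split_frequencies(img_a, k)
    low_b, high_b = split_frequencies(img_b, k)
    # Assumption: local mean of |high| serves as a focus measure per pixel.
    act_a = box_blur(np.abs(high_a), k)
    act_b = box_blur(np.abs(high_b), k)
    w = act_a / (act_a + act_b + 1e-8)      # weight of image A in [0, 1]
    high = w * high_a + (1.0 - w) * high_b  # adaptive weighting of details
    low = 0.5 * (low_a + low_b)             # plain average of backgrounds
    return low + high
```

In this sketch the weights come from a fixed focus measure; in a learned model they would be predicted by the network instead.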

Key words: multi-focus image fusion, manifold fitting, feature extraction, cross-attention, frequency domain

ZHANG Zhou, WANG Zeyu, SONG Haiyu, LI Wei, GE Mingyu, WANG Jiayu, WANG Wenqi. Multi-focus image fusion based on 3D manifold fitting and frequency division-guided attention mechanism[J]. Journal of Graphics, 2026, 47(2): 351-359.

Figures/Tables 8

References 24

[1]	LI Y, WU X J. Image fusion based on sparse representation using Shannon entropy weighting[J]. Acta Automatica Sinica, 2014, 40(8): 1819-1835 (in Chinese).
[2]	ZHAO L B, ZHANG X L, HUANG B, et al. MFANet: multi-feature aggregation network for multi-focus image fusion[C]// ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing. New York: IEEE Press, 2025: 1-5.
[3]	HU X Y, JIANG J J, LIU X M, et al. ZMFF: zero-shot multi-focus image fusion[J]. Information Fusion, 2023, 92: 127-138.
[4]	LIU J Y, LI S T, LIU H B, et al. A lightweight pixel-level unified image fusion network[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(12): 18120-18132.
[5]	BAI H W, ZHAO Z X, ZHANG J S, et al. ReFusion: learning image fusion from reconstruction with learnable loss via meta-learning[J]. International Journal of Computer Vision, 2025, 133(5): 2547-2567.
[6]	WANG Z Y, LI X F, ZHAO L B, et al. When multi-focus image fusion networks meet traditional edge-preservation technology[J]. International Journal of Computer Vision, 2023, 131(10): 2529-2552.
[7]	LIU Y, CHEN X, PENG H, et al. Multi-focus image fusion with a deep convolutional neural network[J]. Information Fusion, 2017, 36: 191-207.
[8]	AMIN-NAJI M, AGHAGOLZADEH A, EZOJI M. Ensemble of CNN for multi-focus image fusion[J]. Information Fusion, 2019, 51: 201-214.
[9]	LI J X, GUO X B, LU G M, et al. DRPL: deep regression pair learning for multi-focus image fusion[J]. IEEE Transactions on Image Processing, 2020, 29: 4816-4831.
[10]	PAN S Y, LIU L Q. MSFAFuse: SAR and optical image fusion model based on multi-scale feature information and attention mechanism[J]. Journal of Graphics, 2025, 46(2): 300-311 (in Chinese).
[11]	ZHAO Z X, BAI H W, ZHANG J S, et al. CDDFuse: correlation-driven dual-branch feature decomposition for multi-modality image fusion[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 5906-5916.
[12]	ZHAO Z X, XU S, ZHANG C X, et al. DIDFuse: deep image decomposition for infrared and visible image fusion[EB/OL]. [2025-03-22]. https://www.ijcai.org/proceedings/2020/135.
[13]	DENG X, DRAGOTTI P L. Deep convolutional neural network for multi-modal image restoration and fusion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3333-3348.
[14]	HORÉ A, ZIOU D. Image quality metrics: PSNR vs. SSIM[C]// The 20th International Conference on Pattern Recognition. New York: IEEE Press, 2010: 2366-2369.
[15]	ZHANG J C, LIAO Q M, LIU S J, et al. Real-MFF: a large realistic multi-focus image dataset with ground truth[J]. Pattern Recognition Letters, 2020, 138: 370-377.
[16]	NEJATI M, SAMAVI S, SHIRANI S. Multi-focus image fusion using dictionary-based sparse representation[J]. Information Fusion, 2015, 25: 72-84.
[17]	XU S, WEI X L, ZHANG C X, et al. MFFW: a new dataset for multi-focus image fusion[EB/OL]. [2025-12-04]. https://arxiv.org/abs/2002.04780.pdf.
[18]	QU G H, ZHANG D L, YAN P F. Information measure for performance of image fusion[J]. Electronics Letters, 2002, 38(7): 313-315.
[19]	WANG Q, SHEN Y, JIN J. Performance evaluation of image fusion techniques[M]//STATHAKI T. Image Fusion: Algorithms and Applications. Amsterdam: Academic Press, 2008: 469-492.
[20]	WILLIAMS S. Pearson’s correlation coefficient[J]. The New Zealand Medical Journal, 1996, 109(1015): 38.
[21]	LIANG P W, JIANG J J, LIU X M, et al. Fusion from decomposition: a self-supervised decomposition approach for image fusion[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 719-735.
[22]	JUNG H, KIM Y, JANG H, et al. Unsupervised deep image fusion with structure tensor representations[J]. IEEE Transactions on Image Processing, 2020, 29: 3845-3858.
[23]	LI M N, PEI R H, ZHENG T Y, et al. FusionDiff: multi-focus image fusion using denoising diffusion probabilistic models[J]. Expert Systems with Applications, 2024, 238: 121664.
[24]	XU H, MA J Y, JIANG J, et al. U2Fusion: a unified unsupervised image fusion network[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(1): 502-518.

Method	Q_MI	Q_NICE	VIF_P	PSNR	CORR	SSIM
Cu-Net [13]	0.7335	0.8207	0.6794	26.0353	0.9700	0.8624
DeFusion [21]	0.8322	0.8248	0.7344	29.0627	0.9814	0.9125
DIF-Net [22]	0.8486	0.8250	0.7081	26.5232	0.9825	0.9117
FusionDiff [23]	0.9010	0.8282	0.7334	26.8721	0.9770	0.8930
U2Fusion [24]	0.7966	0.8230	0.7246	25.8632	0.9749	0.8743
Proposed method	0.9495	0.8511	0.7939	32.6819	0.9885	0.9325
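
Two of the columns above, PSNR and CORR, have simple closed-form definitions. The sketch below uses the textbook formulas for peak signal-to-noise ratio and the Pearson correlation coefficient [20]. It assumes 8-bit images (peak value 255) and a single reference image; how the paper chooses the reference and averages over source images is not stated here, and the function names `psnr` and `pearson_corr` are hypothetical.

```python
import numpy as np

def psnr(reference, fused, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    ref = np.asarray(reference, dtype=np.float64)
    fus = np.asarray(fused, dtype=np.float64)
    mse = np.mean((ref - fus) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

def pearson_corr(x, y):
    """Pearson correlation coefficient between two images (flattened)."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    xc = x - x.mean()
    yc = y - y.mean()
    # Undefined (division by zero) if either image is constant.
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))
```

Higher values are better for both metrics; a correlation of 1 means the fused image is a perfect linear function of the reference.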

Method	Q_MI	Q_NICE	VIF_P	PSNR	CORR	SSIM
Cu-Net [13]	0.6737	0.8166	0.5777	24.0207	0.9541	0.8225
DeFusion [21]	0.7445	0.8106	0.6444	24.3661	0.9685	0.8681
DIF-Net [22]	0.7739	0.8104	0.6144	23.9084	0.9686	0.8468
FusionDiff [23]	0.8053	0.8112	0.6583	23.2556	0.9648	0.8429
U2Fusion [24]	0.7467	0.8188	0.6179	24.0728	0.9605	0.8303
Proposed method	0.8774	0.8312	0.7083	28.8263	0.9752	0.8727

Experiment	Q_MI	Q_NICE	VIF_P	PSNR	CORR	SSIM
Without 3D manifold fitting module	0.6323	0.5326	0.6858	25.3731	0.8327	0.7923
Without frequency division-guided attention module	0.7047	0.6427	0.4527	25.4274	0.8422	0.8104
Full model	0.9495	0.8511	0.7939	32.6819	0.9885	0.9325
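
To read the ablation table more directly, the drop of each metric relative to the full model can be computed per variant. The numbers in the sketch below are copied from the table above; the dictionary layout and the helper name `metric_drops` are illustrative assumptions only.

```python
METRICS = ["Q_MI", "Q_NICE", "VIF_P", "PSNR", "CORR", "SSIM"]

# Values copied from the ablation table.
FULL_MODEL = [0.9495, 0.8511, 0.7939, 32.6819, 0.9885, 0.9325]
ABLATIONS = {
    "without 3D manifold fitting": [0.6323, 0.5326, 0.6858, 25.3731, 0.8327, 0.7923],
    "without frequency division-guided attention": [0.7047, 0.6427, 0.4527, 25.4274, 0.8422, 0.8104],
}

def metric_drops(full, variant):
    """Per-metric difference (full model minus ablated variant), rounded to 4 places."""
    return {m: round(f - v, 4) for m, f, v in zip(METRICS, full, variant)}

DROPS = {name: metric_drops(FULL_MODEL, vals) for name, vals in ABLATIONS.items()}

if __name__ == "__main__":
    for name, drops in DROPS.items():
        print(name, drops)
```

For example, removing either module lowers PSNR by more than 7 dB (32.6819 versus 25.3731 and 25.4274).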