图学学报 ›› 2026, Vol. 47 ›› Issue (2): 351-359. DOI: 10.11996/JG.j.2095-302X.2026020351

• 图像处理与计算机视觉 •

基于三维流形拟合与分频引导注意力机制的多聚焦图像融合

张宙, 王泽宇, 宋海玉, 李威, 葛鸣宇, 王嘉宇, 王文琦

  1. 大连民族大学计算机科学与工程学院,辽宁 大连 116600
  • 收稿日期:2025-05-22 接受日期:2025-12-04 出版日期:2026-04-30 发布日期:2026-05-20
  • 通讯作者:王泽宇,E-mail:20231578@dlnu.edu.cn
  • 基金资助:
    辽宁省自然科学基金计划(2024-BS-028);省级大学生创新训练项目(202412026124);国家级大学生创新训练项目(202512026029)

Multi-focus image fusion based on 3D manifold fitting and frequency division-guided attention mechanism

ZHANG Zhou, WANG Zeyu, SONG Haiyu, LI Wei, GE Mingyu, WANG Jiayu, WANG Wenqi

  1. College of Computer Science and Engineering, Dalian Nationalities University, Dalian, Liaoning 116600, China
  • Received:2025-05-22 Accepted:2025-12-04 Published:2026-04-30 Online:2026-05-20
  • Contact: WANG Zeyu,E-mail:20231578@dlnu.edu.cn
  • Supported by:
    Natural Science Foundation of Liaoning Province Program (2024-BS-028); Provincial College Students’ Innovation Training Project (202412026124); National College Students’ Innovation Training Project (202512026029)

摘要:

多聚焦图像融合是一种将同一场景下不同聚焦区域的多幅图像整合,以生成一幅同时具备清晰细节与完整结构信息的全聚焦图像的技术,在消费电子、医学成像和卫星遥感等领域应用广泛。针对基于深度学习的图像融合方法普遍存在的信息丢失、伪影、数据集匮乏以及时空开销大等问题,提出了基于三维流形拟合与分频引导注意力机制的融合模型。该模型采用特征分解-融合-重构的新范式,在编码阶段识别并分离背景结构与细节信息,从而减少结构信息损耗与伪影引入;利用三维流形拟合提取多聚焦图像的共性特征,降低模型对数据量的依赖,减少时空开销;在特征融合阶段,引入分频引导注意力机制,刻画图像高频细节与低频背景,实现跨频域特征的自适应加权融合,缓解复杂纹理模糊、细节缺失等问题。同时,为了保障融合图像的全局视觉质量与局部细节质量,整合多种损失约束,设计了加权复合损失函数。在公开经典测试集Lytro和MFFW上的实验结果表明,该方法在6项常用评价指标上均取得最优,验证了其有效性。

关键词: 多聚焦图像融合, 流形拟合, 特征提取, 交叉注意力, 频域

Abstract:

Multi-focus image fusion integrates multiple images of the same scene, each focused on a different region, into a single all-in-focus image that retains both sharp details and complete structural information. It is widely used in consumer electronics, medical imaging, and satellite remote sensing. To address issues common to deep learning-based image fusion methods, namely information loss, artifacts, scarce datasets, and high time and space overhead, a fusion model based on three-dimensional (3D) manifold fitting and a frequency division-guided attention mechanism was proposed. The model adopted a new feature decomposition-fusion-reconstruction paradigm. In the encoding stage, background structures and detail information were identified and separated, reducing the loss of structural information and the introduction of artifacts. 3D manifold fitting was employed to extract the features common to the multi-focus images, thereby reducing the model’s dependence on large datasets and lowering time and space overhead. In the feature fusion stage, a frequency division-guided attention mechanism was introduced to characterize the high-frequency details and low-frequency backgrounds of the images, enabling adaptive weighted fusion of features across frequency bands and alleviating problems such as blurred complex textures and missing details. Furthermore, to preserve both the global visual quality and the local detail of the fused image, a weighted composite loss function was designed by integrating multiple loss constraints. Experimental results on the public benchmark test sets Lytro and MFFW showed that the proposed method achieved the best results on all six commonly used evaluation metrics, confirming its effectiveness.

Key words: multi-focus image fusion, manifold fitting, feature extraction, cross-attention, frequency domain
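
The idea of splitting images into frequency bands and fusing them with adaptive weights can be illustrated with a minimal, hand-crafted sketch. This is not the learned attention model described in the abstract: it assumes grayscale images stored as nested lists with values in [0, 1], uses a box blur as a stand-in low-pass filter, and uses local high-frequency energy as a stand-in focus measure. The function names (`box_blur`, `split_bands`, `focus_weights`, `fuse`) and the parameters `radius` and `sharpness` are hypothetical and chosen only for illustration.

```python
from math import exp


def box_blur(img, radius=1):
    """Low-pass filter: mean over a (2r+1) x (2r+1) window, truncated at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def split_bands(img, radius=1):
    """Split an image into a low-frequency band and a high-frequency residual."""
    low = box_blur(img, radius)
    high = [[p - l for p, l in zip(row_p, row_l)] for row_p, row_l in zip(img, low)]
    return low, high


def focus_weights(high_a, high_b, radius=1, sharpness=10.0):
    """Per-pixel weight for image A: a two-way softmax over local detail energy."""
    energy_a = box_blur([[v * v for v in row] for row in high_a], radius)
    energy_b = box_blur([[v * v for v in row] for row in high_b], radius)
    weights = []
    for row_a, row_b in zip(energy_a, energy_b):
        row = []
        for ea, eb in zip(row_a, row_b):
            # Clamp the logit so exp() cannot overflow on unusual inputs.
            z = max(-50.0, min(50.0, sharpness * (ea - eb)))
            row.append(1.0 / (1.0 + exp(-z)))
        weights.append(row)
    return weights


def fuse(img_a, img_b, radius=1, sharpness=10.0):
    """Average the low bands; blend the high bands with the focus weights."""
    low_a, high_a = split_bands(img_a, radius)
    low_b, high_b = split_bands(img_b, radius)
    w_a = focus_weights(high_a, high_b, radius, sharpness)
    h, w = len(img_a), len(img_a[0])
    return [
        [
            0.5 * (low_a[y][x] + low_b[y][x])
            + w_a[y][x] * high_a[y][x]
            + (1.0 - w_a[y][x]) * high_b[y][x]
            for x in range(w)
        ]
        for y in range(h)
    ]
```

Calling `fuse(img_a, img_b)` on two equally sized images returns a fused image of the same size. Where one input has more local detail, its high-frequency band receives a weight above 0.5. In a learned model, these fixed rules would be replaced by trained attention weights.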
