
Journal of Graphics ›› 2024, Vol. 45 ›› Issue (1): 65-77. DOI: 10.11996/JG.j.2095-302X.2024010065

• Image Processing and Computer Vision •

Deep multimodal medical image fusion network based on high-low frequency feature decomposition

WANG Xinyu1,2, LIU Hui1,2, ZHU Jicheng1,2, SHENG Yurui3, ZHANG Caiming2,4

  1. School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, Shandong 250014, China
    2. Shandong Key Laboratory of Digital Media Technology, Jinan, Shandong 250014, China
    3. The First Affiliated Hospital of Shandong First Medical University, Jinan, Shandong 250014, China
    4. School of Software, Shandong University, Jinan, Shandong 250014, China
  • Received: 2023-07-20 Accepted: 2023-09-20 Online: 2024-02-29 Published: 2024-02-29
  • Contact: LIU Hui (1978-), professor, Ph.D. Her main research interests include data mining and visualization. E-mail: liuh_lh@sdufe.edu.cn
  • About author:

    WANG Xinyu (1999-), master student. Her main research interest is multimodal data fusion. E-mail: wangxy@mail.sdufe.edu.cn

  • Supported by:
    National Natural Science Foundation of China (62072274); National Natural Science Foundation of China (U22A2033); The Central Guidance on Local Science and Technology Development Project (YDZX2022009); Mount Taishan Scholar Distinguished Expert Plan of Shandong Province (tstp20221137)

Abstract:

Multimodal medical image fusion aims to enhance the interpretability and applicability of medical images in clinical settings by leveraging the correlations and complementary information across different imaging modalities. However, existing manually designed models often fail to extract critical target features effectively, resulting in blurred fusion images and loss of textural detail. To address this, a deep multimodal medical image fusion network based on high-low frequency feature decomposition was proposed. The approach incorporated channel attention and spatial attention mechanisms into the fusion process, enabling a finer-grained fusion of high-low frequency features while preserving both the global structure and local textural details. Firstly, the high-frequency features of the two modal images were extracted with the pre-trained VGG-19 model, and their low-frequency features were extracted through downsampling, forming intermediate features between the high and low frequencies. Secondly, a residual attention network was embedded in the feature fusion module to sequentially infer attention maps along the independent channel and spatial dimensions; these maps were then employed to guide the adaptive optimization of the input feature maps. Finally, the reconstruction module fused the high-low frequency features and output the fusion image. Experimental results on both the Harvard open dataset and a self-built abdominal dataset demonstrated that, compared to the source images, the fusion images produced by the proposed method achieved improvements of 8.29% in peak signal-to-noise ratio, 85.07% in structural similarity, 65.67% in correlation coefficient, 46.76% in feature mutual information, and 80.89% in visual fidelity.
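As a rough illustration of the pipeline sketched in the abstract, the following PyTorch snippet (not the authors' released code) shows one plausible way to obtain high-frequency features from a pre-trained VGG-19, low-frequency features by downsampling, and a channel-then-spatial attention block in the spirit of the described residual attention module. The specific choices here (the relu2_2 cutoff, 4x average pooling, the reduction ratio, and the CBAM-style attention form) are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch, assuming PyTorch and torchvision; NOT the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights


class HighLowDecomposer(nn.Module):
    """Split one modality into high-frequency (texture) and low-frequency (structure) features."""
    def __init__(self):
        super().__init__()
        # Shallow, frozen VGG-19 layers (through relu2_2) as the high-frequency extractor.
        self.vgg = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:9].eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)

    def forward(self, x):
        # VGG-19 expects 3 channels; replicate grayscale medical slices.
        x3 = x.repeat(1, 3, 1, 1) if x.shape[1] == 1 else x
        high = self.vgg(x3)  # detail / texture features
        # Low-frequency branch: downsample by average pooling, then restore spatial size.
        low = F.interpolate(F.avg_pool2d(x, 4), size=x.shape[-2:],
                            mode="bilinear", align_corners=False)
        return high, low


class ChannelSpatialAttention(nn.Module):
    """Sequential channel attention followed by spatial attention over fused feature maps."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, f):
        b, c, _, _ = f.shape
        # Channel attention from global average- and max-pooled descriptors.
        avg = self.mlp(f.mean(dim=(2, 3)))
        mx = self.mlp(f.amax(dim=(2, 3)))
        f = f * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from per-pixel channel statistics.
        s = torch.cat([f.mean(1, keepdim=True), f.amax(1, keepdim=True)], dim=1)
        return f * torch.sigmoid(self.spatial(s))
```

A complete network along the lines of the abstract would additionally fuse the two modalities' high- and low-frequency features through such an attention block and a reconstruction decoder; those components are omitted from this sketch.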

Key words: multimodal medical image fusion, pre-trained model, deep learning, high-low frequency feature extraction, residual attention network
