
Journal of Graphics ›› 2024, Vol. 45 ›› Issue (5): 941-956. DOI: 10.11996/JG.j.2095-302X.2024050941

• Image Processing and Computer Vision •

Research on multi-scale remote sensing image change detection using Swin Transformer

LIU Li1,2, ZHANG Qifan1,2,3, BAI Yuang1,2, HUANG Kaiye1,2

  1. Department of Computer Science, North China Electric Power University, Baoding, Hebei 071051, China
    2. Hebei Key Laboratory of Knowledge Computing for Energy & Power, Baoding, Hebei 071051, China
    3. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100080, China
  • Received: 2024-05-28  Revised: 2024-08-06  Online: 2024-10-31  Published: 2024-10-31
  • About author:

    LIU Li (1978-), associate professor, Ph.D. Her main research interests include artificial intelligence and computer vision. E-mail: liuli@ncepu.edu.cn

  • Supported by:
    Hebei Province Graduate Student Innovation Ability Training Funding Project (CXZZSS2024163); Key Research and Development Projects in Hebei Province (20310103D)

Abstract:

Due to the complexity of terrain information and the diversity of change detection data, it is difficult to ensure adequate and effective feature extraction from remote sensing images, which lowers the reliability of change detection results. Although convolutional neural networks (CNNs) are widely applied to remote sensing change detection because they extract semantic features effectively, the inherent locality of the convolution operation limits the receptive field, making it difficult to capture global spatiotemporal information and thus to model long-range dependencies in the feature space. To capture long-distance semantic dependencies and extract deep global semantic features, a multi-scale feature fusion network based on the Swin Transformer, SwinChangeNet, was designed. Firstly, SwinChangeNet employed a siamese multi-stage Swin Transformer feature encoder for long-range context modeling. Secondly, a feature difference extraction module was introduced into the encoder to compute multi-level feature differences between the pre-change and post-change images at different scales, and the resulting multi-scale feature maps were fused through an adaptive fusion layer. Finally, residual connections and a channel attention mechanism were introduced to decode the fused features into a complete and accurate change map. Compared with seven classic and state-of-the-art change detection methods on two publicly available datasets, CDD and CD-Data_GZ, the proposed model achieved the best performance on both. On the CDD dataset, the F1 score increased by 1.11% and the accuracy by 2.38% over the second-best model; on the CD-Data_GZ dataset, the F1 score, accuracy, and recall increased by 4.78%, 4.32%, and 4.09%, respectively, a significant improvement. These comparative results demonstrated the superior detection performance of the proposed model, and ablation experiments further validated the stability and effectiveness of each improved module. In conclusion, the proposed model addressed the task of remote sensing image change detection by introducing the Swin Transformer structure, enabling the network to encode local and global features of remote sensing images more effectively and to produce more accurate detection results, while converging efficiently on datasets with a wide variety of land features.
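For readers who want a concrete picture of the pipeline summarized above, the following is a minimal PyTorch sketch of a SwinChangeNet-style forward pass: siamese encoding, per-scale feature differencing, adaptive fusion, and a decoder with a residual connection and channel attention. It is an illustration only, not the authors' implementation: the class names, channel widths, the absolute-difference operator in the difference module, and the SE-style channel attention are all assumptions, since this page gives no implementation details, and the stub encoder merely stands in for the four-stage Swin Transformer backbone.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (an assumed variant)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # global average pooling
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)                            # re-weight channels


class StubSwinEncoder(nn.Module):
    """Stand-in for the four-stage Swin Transformer backbone (shapes only)."""
    def __init__(self, chans=(96, 192, 384, 768)):
        super().__init__()
        stages, in_c = [], 3
        for c in chans:
            stages.append(nn.Sequential(
                nn.Conv2d(in_c, c, 3, stride=2, padding=1), nn.ReLU(inplace=True)))
            in_c = c
        self.stages = nn.ModuleList(stages)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats                                       # one feature map per scale


class SwinChangeNetSketch(nn.Module):
    def __init__(self, encoder, stage_chans=(96, 192, 384, 768), width=256):
        super().__init__()
        # Siamese encoding: the SAME (weight-shared) encoder processes both
        # the pre-change and post-change images.
        self.encoder = encoder
        # Feature difference extraction at every scale (assumed: conv on |f1 - f2|).
        self.diff = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c, width, 3, padding=1), nn.ReLU(inplace=True))
            for c in stage_chans)
        # Adaptive fusion: align scales, then learn a 1x1 mixing convolution.
        self.fuse = nn.Conv2d(width * len(stage_chans), width, 1)
        # Decoder: residual block + channel attention + 2-class change head.
        self.res = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1))
        self.attn = ChannelAttention(width)
        self.head = nn.Conv2d(width, 2, 1)

    def forward(self, img_t1, img_t2):
        f1, f2 = self.encoder(img_t1), self.encoder(img_t2)
        diffs = [d(torch.abs(a - b)) for d, a, b in zip(self.diff, f1, f2)]
        size = diffs[0].shape[-2:]                         # finest feature scale
        fused = self.fuse(torch.cat(
            [F.interpolate(d, size=size, mode="bilinear", align_corners=False)
             for d in diffs], dim=1))
        out = self.attn(fused + self.res(fused))           # residual + attention
        # Upsample logits back to the input resolution for the change map.
        return F.interpolate(self.head(out), size=img_t1.shape[-2:],
                             mode="bilinear", align_corners=False)


if __name__ == "__main__":
    net = SwinChangeNetSketch(StubSwinEncoder())
    t1, t2 = torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256)
    print(net(t1, t2).shape)                               # torch.Size([1, 2, 256, 256])
```

One design note on this sketch: computing differences at every encoder stage and fusing them at the finest resolution is one plausible reading of "multi-level feature differences" plus "adaptive fusion layer"; the paper itself should be consulted for the exact fusion rule and channel configuration.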

Key words: change detection, siamese network, Swin Transformer, multi-scale feature fusion, attention mechanism, feature difference extraction
