
图学学报 ›› 2024, Vol. 45 ›› Issue (5): 930-940. DOI: 10.11996/JG.j.2095-302X.2024050930

• 图像处理与计算机视觉 •

融合改进Transformer的车辆部件检测方法

翟永杰, 李佳蔚, 陈年昊, 王乾铭, 王新颖

  1. 华北电力大学自动化系,河北 保定 071003
  • 收稿日期:2024-05-30 修回日期:2024-07-23 出版日期:2024-10-31 发布日期:2024-10-31
  • 通讯作者:王乾铭(1995-),男,讲师,博士。主要研究方向为电力视觉、输电线路巡检和视觉知识推理。E-mail:qianmingwang@ncepu.edu.cn
  • 第一作者:翟永杰(1972-),男,教授,博士。主要研究方向为电力视觉。E-mail:zhaiyongjie@ncepu.edu.cn
  • 基金资助:
    国家自然科学基金项目(62373151);河北省自然科学基金面上项目(F2023502010);中央高校基本科研业务费专项资金项目(2023JC006);中央高校基本科研业务费专项资金项目(2024MS136)

Vehicle parts detection method integrating an improved Transformer

ZHAI Yongjie, LI Jiawei, CHEN Nianhao, WANG Qianming, WANG Xinying

  1. Department of Automation, North China Electric Power University, Baoding, Hebei 071003, China
  • Received: 2024-05-30 Revised: 2024-07-23 Published: 2024-10-31 Online: 2024-10-31
  • Contact: WANG Qianming (1995-), lecturer, Ph.D. His main research interests include electric power vision, transmission line inspection, and visual knowledge reasoning. E-mail: qianmingwang@ncepu.edu.cn
  • First author: ZHAI Yongjie (1972-), professor, Ph.D. His main research interest is electric power vision. E-mail: zhaiyongjie@ncepu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62373151); General Program of the Natural Science Foundation of Hebei Province (F2023502010); Fundamental Research Funds for the Central Universities (2023JC006); Fundamental Research Funds for the Central Universities (2024MS136)

摘要:

为有效解决车辆部件检测中模型由于特征提取不充分以及候选框未能充分利用导致的错检、漏检等问题,提出了融合改进Transformer的车辆部件检测方法。首先将多头自注意力和双层路由注意力结合,提出了关键区域多头自注意力(KR-MHSA);然后将基线模型(Mask R-CNN)中ResNet的最后一层与KR-MHSA进行残差融合,提升了模型的基础特征提取能力;最后通过改进的Swin Transformer对模型生成的候选框进行特征学习,使模型更好地理解不同候选框之间的差异和相似性。实验在构建的59类车辆部件数据集上进行,对比实验结果证明,本文模型在检测和分割效果上均优于其他先进实例分割模型。相较于基线模型,检测准确率提高了4.47%,分割准确率提高了4.4%,有效地解决了车辆部件检测中特征提取不足和候选框未充分利用导致的错检、漏检和实例分割精度较低的问题,使保险公司能够更准确、更高效地更换损坏的部件,提高索赔效率。

关键词: 车辆部件, 深度学习, 实例分割, Mask R-CNN, 特征提取, 多头自注意力, 双层路由注意力

Abstract:

To effectively address the false and missed detections caused by insufficient feature extraction and inadequate utilization of candidate boxes in vehicle component detection models, a vehicle component detection method integrating an improved Transformer was proposed. First, multi-head self-attention and bi-level routing attention were combined to form a key region multi-head self-attention (KR-MHSA) mechanism. Second, the final layer of ResNet in the baseline model (Mask R-CNN) was fused with KR-MHSA through a residual connection, enhancing the basic feature extraction capability of the model. Finally, an improved Swin Transformer was employed for feature learning on the candidate boxes generated by the model, enabling the model to better capture the differences and similarities between candidate boxes. Experiments conducted on a constructed dataset of 59 vehicle component categories demonstrated that the proposed model outperformed other state-of-the-art instance segmentation models in both detection and segmentation performance. Compared with the baseline model, detection accuracy improved by 4.47% and segmentation accuracy by 4.4%. These improvements effectively resolved the false detections, missed detections, and low instance segmentation accuracy caused by insufficient feature extraction and underutilized candidate boxes, enabling insurance companies to replace damaged parts more accurately and efficiently and thereby improving claims processing efficiency.
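
Below is a minimal, illustrative PyTorch sketch of the first two steps summarized above, not the authors' released implementation. The KRMHSA module combines ordinary multi-head self-attention with a bi-level-routing-style step: each query region is first routed to its top-k most relevant key regions via coarse region descriptors, and fine-grained attention is then computed only within that reduced token set. The ResidualKRFusion module fuses this attention output with a last-stage backbone feature map (e.g., the ResNet C5 output of a Mask R-CNN backbone) through a residual connection. All module and variable names, the region size, and the top-k value are assumptions made for illustration only.

import torch
import torch.nn as nn


class KRMHSA(nn.Module):
    """Key-region multi-head self-attention (illustrative sketch, not the paper's code)."""

    def __init__(self, dim, num_heads=8, region_size=7, topk=4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.region_size, self.topk = region_size, topk
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, C, H, W); H and W are assumed divisible by region_size.
        B, C, H, W = x.shape
        r = self.region_size
        nh, nw = H // r, W // r                       # regions per axis
        nr, t = nh * nw, r * r                        # number of regions, tokens per region

        # Group tokens by region: (B, nr, t, C)
        tokens = x.view(B, C, nh, r, nw, r).permute(0, 2, 4, 3, 5, 1).reshape(B, nr, t, C)
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)   # each (B, nr, t, C)

        # Coarse routing: mean-pooled region descriptors pick the top-k key regions.
        affinity = q.mean(2) @ k.mean(2).transpose(1, 2)        # (B, nr, nr)
        topk_idx = affinity.topk(self.topk, dim=-1).indices     # (B, nr, topk)

        # Gather keys/values of the selected key regions for every query region.
        idx = topk_idx[..., None, None].expand(-1, -1, -1, t, C)
        k_sel = torch.gather(k.unsqueeze(1).expand(-1, nr, -1, -1, -1), 2, idx)
        v_sel = torch.gather(v.unsqueeze(1).expand(-1, nr, -1, -1, -1), 2, idx)
        k_sel = k_sel.reshape(B, nr, self.topk * t, C)
        v_sel = v_sel.reshape(B, nr, self.topk * t, C)

        # Fine-grained multi-head attention restricted to the routed key set.
        def heads(z):  # (B, nr, n, C) -> (B, nr, num_heads, n, head_dim)
            return z.reshape(B, nr, z.shape[2], self.num_heads, self.head_dim).transpose(2, 3)

        qh, kh, vh = heads(q), heads(k_sel), heads(v_sel)
        attn = (qh @ kh.transpose(-2, -1)) / self.head_dim ** 0.5
        out = (attn.softmax(dim=-1) @ vh).transpose(2, 3).reshape(B, nr, t, C)
        out = self.proj(out)

        # Scatter regions back to a (B, C, H, W) feature map.
        return out.reshape(B, nh, nw, r, r, C).permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)


class ResidualKRFusion(nn.Module):
    """Residual fusion of KR-MHSA with a backbone stage output (e.g. ResNet C5)."""

    def __init__(self, dim, **attn_kwargs):
        super().__init__()
        self.attn = KRMHSA(dim, **attn_kwargs)
        self.norm = nn.GroupNorm(32, dim)             # assumes dim is divisible by 32

    def forward(self, c5):
        # Identity path keeps the original convolutional features; attention adds context.
        return c5 + self.attn(self.norm(c5))


if __name__ == "__main__":
    c5 = torch.randn(1, 256, 28, 28)                  # stand-in for a ResNet stage output
    fused = ResidualKRFusion(256, num_heads=8, region_size=7, topk=4)(c5)
    print(fused.shape)                                # torch.Size([1, 256, 28, 28])

The identity path leaves the original convolutional features untouched, so the routed attention only adds global context on top of the backbone output, which is the intent of the residual fusion described above.

For the final step, the abstract states that an improved Swin Transformer learns features of the candidate boxes so that the model better captures the differences and similarities between proposals. The paper's exact design is not given here, so the following sketch uses a generic Transformer encoder over globally pooled RoI features as a hedged stand-in: each proposal attends to all other proposals, and the refined context is added back onto the spatial RoI features. The module name, pooling choice, and layer count are assumptions.

import torch
import torch.nn as nn


class ProposalRelationEncoder(nn.Module):
    """Self-attention across pooled proposal features (illustrative stand-in)."""

    def __init__(self, dim=256, num_heads=8, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, roi_feats):
        # roi_feats: (num_proposals, C, 7, 7), e.g. pooled by RoIAlign.
        pooled = roi_feats.mean(dim=(2, 3)).unsqueeze(0)   # (1, num_proposals, C)
        refined = self.encoder(pooled).squeeze(0)          # each proposal attends to the others
        # Broadcast the refined context back onto the spatial RoI features (residual).
        return roi_feats + refined[:, :, None, None]


if __name__ == "__main__":
    rois = torch.randn(100, 256, 7, 7)                     # 100 candidate boxes
    print(ProposalRelationEncoder(256)(rois).shape)        # torch.Size([100, 256, 7, 7])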

Key words: vehicle parts, deep learning, instance segmentation, Mask R-CNN, feature extraction, multi-head self-attention, bi-level routing attention

中图分类号: