
图学学报 ›› 2024, Vol. 45 ›› Issue (5): 930-940. DOI: 10.11996/JG.j.2095-302X.2024050930

• 图像处理与计算机视觉 •

融合改进Transformer的车辆部件检测方法

翟永杰, 李佳蔚, 陈年昊, 王乾铭, 王新颖

  1. 华北电力大学自动化系,河北 保定 071003
  • 收稿日期:2024-05-30 修回日期:2024-07-23 出版日期:2024-10-31 发布日期:2024-10-31
  • 通讯作者:王乾铭(1995-),男,讲师,博士。主要研究方向为电力视觉、输电线路巡检和视觉知识推理。E-mail:qianmingwang@ncepu.edu.cn
  • 第一作者:翟永杰(1972-),男,教授,博士。主要研究方向为电力视觉。E-mail:zhaiyongjie@ncepu.edu.cn
  • 基金资助:
    国家自然科学基金项目(62373151);河北省自然科学基金面上项目(F2023502010);中央高校基本科研业务费专项资金项目(2023JC006);中央高校基本科研业务费专项资金项目(2024MS136)

Vehicle parts detection method integrating an improved Transformer

ZHAI Yongjie, LI Jiawei, CHEN Nianhao, WANG Qianming, WANG Xinying

  1. Department of Automation, North China Electric Power University, Baoding, Hebei 071003, China
  • Received: 2024-05-30 Revised: 2024-07-23 Published: 2024-10-31 Online: 2024-10-31
  • Contact: WANG Qianming (1995-), lecturer, Ph.D. His main research interests include electric power vision, transmission line inspection, and visual knowledge reasoning. E-mail: qianmingwang@ncepu.edu.cn
  • First author: ZHAI Yongjie (1972-), professor, Ph.D. His main research interest is electric power vision. E-mail: zhaiyongjie@ncepu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62373151); General Program of the Natural Science Foundation of Hebei Province (F2023502010); Fundamental Research Funds for the Central Universities (2023JC006); Fundamental Research Funds for the Central Universities (2024MS136)

摘要:

为有效解决车辆部件检测中模型由于特征提取不充分以及候选框未能充分利用导致的错检、漏检等问题,提出了融合改进Transformer的车辆部件检测方法。首先将多头自注意力和双层路由注意力结合,提出了关键区域多头自注意力(KR-MHSA);然后将基线模型(Mask R-CNN)中ResNet的最后一层与KR-MHSA进行残差融合,提升了模型的基础特征提取能力;最后通过改进的Swin Transformer对模型生成的候选框进行特征学习,使模型更好地理解不同候选框之间的差异和相似性。实验在构建的59类车辆部件数据集上进行,对比实验结果证明,本文模型在检测和分割效果上均优于其他先进实例分割模型。相较于基线模型,检测准确率提高了4.47%,分割准确率提高了4.4%,有效地解决了车辆部件检测中特征提取不足和候选框未充分利用导致的错检、漏检和实例分割精度较低的问题,使保险公司能够更准确、更高效地更换损坏的部件,提高索赔效率。

关键词: 车辆部件, 深度学习, 实例分割, Mask R-CNN, 特征提取, 多头自注意力, 双层路由注意力

Abstract:

To effectively address the false and missed detections caused by insufficient feature extraction and inadequate utilization of candidate boxes in vehicle component detection models, a vehicle component detection method integrating an improved Transformer was proposed. First, multi-head self-attention and bi-level routing attention were combined to form a key region multi-head self-attention (KR-MHSA) mechanism. Second, the final layer of ResNet in the baseline model (Mask R-CNN) was fused with KR-MHSA through a residual connection, enhancing the basic feature extraction capability of the model. Finally, an improved Swin Transformer was employed for feature learning on the candidate boxes generated by the model, enabling the model to better capture the differences and similarities between candidate boxes. Experiments conducted on a constructed dataset of 59 vehicle component categories demonstrated that the proposed model outperformed other state-of-the-art instance segmentation models in both detection and segmentation performance. Compared with the baseline model, detection accuracy improved by 4.47% and segmentation accuracy by 4.4%. These improvements effectively resolved the false detections, missed detections, and low instance segmentation accuracy caused by insufficient feature extraction and underutilized candidate boxes, enabling insurance companies to replace damaged parts more accurately and efficiently and thereby improving claims processing efficiency.
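
Below is a minimal, illustrative PyTorch sketch of the first two steps summarized above, not the authors' released implementation. The KRMHSA module combines ordinary multi-head self-attention with a bi-level-routing-style step: each query region is first routed to its top-k most relevant key regions via coarse region descriptors, and fine-grained attention is then computed only within that reduced token set. The ResidualKRFusion module fuses this attention output with a last-stage backbone feature map (e.g., the ResNet C5 output of a Mask R-CNN backbone) through a residual connection. All module and variable names, the region size, and the top-k value are assumptions made for illustration only.

import torch
import torch.nn as nn


class KRMHSA(nn.Module):
    """Key-region multi-head self-attention (illustrative sketch, not the paper's code)."""

    def __init__(self, dim, num_heads=8, region_size=7, topk=4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.region_size, self.topk = region_size, topk
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, C, H, W); H and W are assumed divisible by region_size.
        B, C, H, W = x.shape
        r = self.region_size
        nh, nw = H // r, W // r                       # regions per axis
        nr, t = nh * nw, r * r                        # number of regions, tokens per region

        # Group tokens by region: (B, nr, t, C)
        tokens = x.view(B, C, nh, r, nw, r).permute(0, 2, 4, 3, 5, 1).reshape(B, nr, t, C)
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)   # each (B, nr, t, C)

        # Coarse routing: mean-pooled region descriptors pick the top-k key regions.
        affinity = q.mean(2) @ k.mean(2).transpose(1, 2)        # (B, nr, nr)
        topk_idx = affinity.topk(self.topk, dim=-1).indices     # (B, nr, topk)

        # Gather keys/values of the selected key regions for every query region.
        idx = topk_idx[..., None, None].expand(-1, -1, -1, t, C)
        k_sel = torch.gather(k.unsqueeze(1).expand(-1, nr, -1, -1, -1), 2, idx)
        v_sel = torch.gather(v.unsqueeze(1).expand(-1, nr, -1, -1, -1), 2, idx)
        k_sel = k_sel.reshape(B, nr, self.topk * t, C)
        v_sel = v_sel.reshape(B, nr, self.topk * t, C)

        # Fine-grained multi-head attention restricted to the routed key set.
        def heads(z):  # (B, nr, n, C) -> (B, nr, num_heads, n, head_dim)
            return z.reshape(B, nr, z.shape[2], self.num_heads, self.head_dim).transpose(2, 3)

        qh, kh, vh = heads(q), heads(k_sel), heads(v_sel)
        attn = (qh @ kh.transpose(-2, -1)) / self.head_dim ** 0.5
        out = (attn.softmax(dim=-1) @ vh).transpose(2, 3).reshape(B, nr, t, C)
        out = self.proj(out)

        # Scatter regions back to a (B, C, H, W) feature map.
        return out.reshape(B, nh, nw, r, r, C).permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)


class ResidualKRFusion(nn.Module):
    """Residual fusion of KR-MHSA with a backbone stage output (e.g. ResNet C5)."""

    def __init__(self, dim, **attn_kwargs):
        super().__init__()
        self.attn = KRMHSA(dim, **attn_kwargs)
        self.norm = nn.GroupNorm(32, dim)             # assumes dim is divisible by 32

    def forward(self, c5):
        # Identity path keeps the original convolutional features; attention adds context.
        return c5 + self.attn(self.norm(c5))


if __name__ == "__main__":
    c5 = torch.randn(1, 256, 28, 28)                  # stand-in for a ResNet stage output
    fused = ResidualKRFusion(256, num_heads=8, region_size=7, topk=4)(c5)
    print(fused.shape)                                # torch.Size([1, 256, 28, 28])

The identity path leaves the original convolutional features untouched, so the routed attention only adds global context on top of the backbone output, which is the intent of the residual fusion described above.

For the final step, the abstract states that an improved Swin Transformer learns features of the candidate boxes so that the model better captures the differences and similarities between proposals. The paper's exact design is not given here, so the following sketch uses a generic Transformer encoder over globally pooled RoI features as a hedged stand-in: each proposal attends to all other proposals, and the refined context is added back onto the spatial RoI features. The module name, pooling choice, and layer count are assumptions.

import torch
import torch.nn as nn


class ProposalRelationEncoder(nn.Module):
    """Self-attention across pooled proposal features (illustrative stand-in)."""

    def __init__(self, dim=256, num_heads=8, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, roi_feats):
        # roi_feats: (num_proposals, C, 7, 7), e.g. pooled by RoIAlign.
        pooled = roi_feats.mean(dim=(2, 3)).unsqueeze(0)   # (1, num_proposals, C)
        refined = self.encoder(pooled).squeeze(0)          # each proposal attends to the others
        # Broadcast the refined context back onto the spatial RoI features (residual).
        return roi_feats + refined[:, :, None, None]


if __name__ == "__main__":
    rois = torch.randn(100, 256, 7, 7)                     # 100 candidate boxes
    print(ProposalRelationEncoder(256)(rois).shape)        # torch.Size([100, 256, 7, 7])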

Key words: vehicle parts, deep learning, instance segmentation, Mask R-CNN, feature extraction, multi-head self-attention, bi-level routing attention

中图分类号: