
Journal of Graphics (图学学报), 2024, Vol. 45, Issue (4): 779-790. DOI: 10.11996/JG.j.2095-302X.2024040779

• Image Processing and Computer Vision •


Improved YOLO object detection algorithm for traffic signs

ZHAO Lei, LI Dong, FANG Jiandong, CAO Qi

  1. School of Information Engineering, Inner Mongolia University of Technology, Hohhot, Inner Mongolia 010050, China
  • Received: 2024-04-18 Accepted: 2024-06-13 Published: 2024-08-31 Online: 2024-09-03
  • Corresponding author: LI Dong (1984-), associate professor, Ph.D. His main research interests include computer vision, information processing, and intelligent control. E-mail: lidong@imut.edu.cn
  • First author: ZHAO Lei (1999-), master's student. His main research interests include computer vision, information processing, and intelligent control. E-mail: zhaolei990323@163.com
  • Supported by:
    Natural Science Foundation Project of Inner Mongolia Autonomous Region (2022QN06004)


Abstract:

To address the low recognition accuracy and frequent detection errors of current algorithms on traffic signs, a traffic sign detection method based on an optimized YOLOv5 was proposed. In the Backbone, the Conv convolutions were replaced with the re-parameterization module DBB (Diverse Branch Block), whose convolution branches of different scales yield receptive fields of various sizes and features of different complexities, enriching the feature space. The SE (Squeeze-and-Excitation) attention mechanism was also introduced to enhance the significant features of the feature maps and suppress redundant ones, improving the detection performance of the network. In the Neck, a new SLA Neck was designed to aggregate feature maps from different layers, effectively preventing the loss of small-object feature information. It reduces the number of parameters and the amount of computation while fusing feature information from different levels, capturing more context and detail, separating out background information so that the model focuses on target regions, and improving performance on objects of different sizes for precise localization. The fused features were then upsampled and a small-object detection layer was added to enhance shallow feature information. In the Head, IoU-Aware query selection was introduced: the IoU score was incorporated into the objective function of the classification branch, with the IoU between the predicted box and the ground truth (GT) used as the label for category prediction.
This imposes a consistency constraint on the classification and localization of positive samples, improves the matching mechanism of the model, and reduces false and missed detections. In addition, the CIoU loss was replaced with the SIoU loss, which also takes into account the direction between the ground-truth box and the predicted box, to speed up convergence and improve inference capability. Experimental results on the TT100K dataset showed that, compared with YOLOv5m, the proposed method reduced computation by 3.3% and the number of parameters by 34.8%, while improving mAP and mAP@50:95 by 13.8% and 10.4%, respectively. These results demonstrate that the model improves detection accuracy while reducing the number of parameters and the model size, which gives it practical application value.
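DBB is described as a re-parameterization module: several parallel branches are trained, then folded into a single convolution for inference. The sketch below illustrates only that folding principle on one channel, assuming a simplified block with two branches (3×3 conv + BN and 1×1 conv + BN); the full DBB also contains 1×1→K×K and 1×1→average-pooling branches that are not modeled here. All function names, weights and BN statistics are illustrative assumptions, not the authors' code.

```python
import math
import random

def conv2d_same(x, k, bias=0.0):
    """Single-channel 2-D cross-correlation with zero padding (output size == input size)."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            s = bias
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    if 0 <= r < h and 0 <= c < w:
                        s += k[a][b] * x[r][c]
            row.append(s)
        out.append(row)
    return out

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BN (running statistics) on a single-channel map."""
    scale = gamma / math.sqrt(var + eps)
    return [[(v - mean) * scale + beta for v in row] for row in y]

def fuse_conv_bn(k, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into the preceding bias-free conv; returns (kernel, bias)."""
    scale = gamma / math.sqrt(var + eps)
    return [[v * scale for v in row] for row in k], beta - mean * scale

def pad_1x1_to_3x3(k):
    """Put a 1x1 kernel at the centre of a zero 3x3 kernel (same output with padding 1)."""
    return [[0.0, 0.0, 0.0], [0.0, k[0][0], 0.0], [0.0, 0.0, 0.0]]

random.seed(0)
x = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(5)]
k3 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
k1 = [[random.uniform(-1, 1)]]
bn3 = (1.2, 0.1, 0.05, 0.9)   # (gamma, beta, running_mean, running_var), illustrative values
bn1 = (0.8, -0.2, -0.1, 1.1)

# Training-time form: two parallel branches whose outputs are summed.
y3 = batch_norm(conv2d_same(x, k3), *bn3)
y1 = batch_norm(conv2d_same(x, k1), *bn1)
multi = [[a + b for a, b in zip(r3, r1)] for r3, r1 in zip(y3, y1)]

# Inference-time form: one 3x3 conv with the merged kernel and bias.
fk3, fb3 = fuse_conv_bn(k3, *bn3)
fk1, fb1 = fuse_conv_bn(pad_1x1_to_3x3(k1), *bn1)
merged_k = [[a + b for a, b in zip(r3, r1)] for r3, r1 in zip(fk3, fk1)]
fused = conv2d_same(x, merged_k, fb3 + fb1)

max_err = max(abs(a - b) for ra, rb in zip(multi, fused) for a, b in zip(ra, rb))
print(f"max |multi-branch - fused| = {max_err:.2e}")
```

Because inference-mode BN is an affine map and convolution is linear, each branch collapses to a (kernel, bias) pair and the pairs simply add, so the trained multi-branch block costs no more than one 3×3 convolution at inference time.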
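The SE attention mentioned in the abstract reweights channels in three steps: squeeze (global average pooling), excitation (a bottleneck of two fully connected layers ending in a sigmoid) and scale. A minimal pure-Python sketch follows, assuming a [C][H][W] nested-list feature map, a reduction ratio r = 2 and hypothetical bias-free weights; the abstract does not give the reduction ratio or where SE is inserted in the Backbone.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(w, v):
    """Dense layer without bias: w is a list of rows."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def se_block(fmap, w1, w2):
    """Squeeze-and-Excitation on a feature map shaped [C][H][W].

    w1: (C/r) x C reduction weights; w2: C x (C/r) expansion weights (hypothetical).
    Returns the reweighted map and the per-channel attention weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one weight in (0, 1) per channel.
    hidden = [max(0.0, v) for v in matvec(w1, z)]
    s = [sigmoid(v) for v in matvec(w2, hidden)]
    # Scale: multiply every value of channel c by s[c].
    out = [[[v * s[c] for v in row] for row in ch] for c, ch in enumerate(fmap)]
    return out, s

# C = 4 channels of 2x2, reduction ratio r = 2 (illustrative values).
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[-1.0, 0.0], [1.0, 2.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
w1 = [[0.5, -0.2, 0.1, 0.0], [-0.3, 0.4, 0.2, 0.1]]
w2 = [[1.0, -0.5], [0.2, 0.3], [-0.7, 0.6], [0.4, 0.4]]

out, weights = se_block(fmap, w1, w2)
print("channel weights:", [round(w, 3) for w in weights])
```

Channels whose weight is near 1 are kept almost unchanged while those near 0 are suppressed, which is the "enhance significant features, suppress redundant ones" behavior the abstract describes.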
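The abstract does not state the stride of the added small-object detection layer. A common choice in YOLOv5 variants is an extra P2 head at stride 4 alongside the default P3/P4/P5 heads (strides 8, 16, 32). Assuming that configuration, a 640×640 input and YOLOv5's 3 anchors per cell, the arithmetic sketch below shows how grid resolution and prediction count change; the helper names and the 16-pixel sign size are hypothetical.

```python
def grid_sizes(img_size, strides):
    """Side length of the prediction grid for each detection stride."""
    return {s: img_size // s for s in strides}

def cells_covered(obj_px, stride):
    """Approximate number of grid cells an object spans along one axis."""
    return obj_px / stride

def predictions(img_size, strides, anchors_per_cell=3):
    """Total anchor-based predictions across all heads."""
    return sum((img_size // s) ** 2 * anchors_per_cell for s in strides)

default_strides = [8, 16, 32]      # YOLOv5 P3, P4, P5 heads
with_p2 = [4] + default_strides    # ASSUMPTION: the added small-object head is a stride-4 P2 head

sizes = grid_sizes(640, with_p2)
for s in with_p2:
    print(f"stride {s:>2}: {sizes[s]}x{sizes[s]} grid, "
          f"a 16-px sign spans ~{cells_covered(16, s):.1f} cells per side")
print("predictions without P2:", predictions(640, default_strides))
print("predictions with P2:   ", predictions(640, with_p2))
```

Under these assumptions a 16-pixel sign spans about four cells per side on the P2 grid instead of two on P3, at the cost of roughly four times as many raw predictions (102000 vs 25200).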
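In IoU-aware query selection (introduced in RT-DETR), the classification target for a positive sample is its IoU with the matched ground truth rather than a hard 1, so a high class score is only rewarded when localization is also good. The sketch below builds that soft target, assuming (x1, y1, x2, y2) boxes, a single positive sample and plain binary cross-entropy (RT-DETR itself uses a varifocal-style loss); all function names are hypothetical.

```python
import math

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_aware_target(pred_box, gt_box, gt_class, num_classes):
    """Soft label: IoU(pred, GT) at the GT class index, 0 for every other class."""
    target = [0.0] * num_classes
    target[gt_class] = box_iou(pred_box, gt_box)
    return target

def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over classes (soft targets allowed)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)

pred_box, gt_box = (1.0, 1.0, 3.0, 3.0), (0.0, 0.0, 2.0, 2.0)
target = iou_aware_target(pred_box, gt_box, gt_class=2, num_classes=4)
print(target)  # IoU = 1/7 at index 2, zeros elsewhere

# An over-confident score for a poorly localized box is penalized more
# than a score that matches the box's IoU.
print(bce([0.1, 0.1, 0.9, 0.1], target), bce([0.1, 0.1, 1 / 7, 0.1], target))
```

Since BCE with a soft target t is minimized at p = t, the classifier is pushed to output a score that tracks localization quality, which is the classification/localization consistency the abstract refers to.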
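SIoU extends the IoU term with angle, distance and shape costs, L = 1 - IoU + (Δ + Ω)/2, where the angle cost Λ modulates the distance cost Δ through γ = 2 - Λ. A minimal pure-Python sketch for one box pair follows, assuming the formulation commonly used in YOLO code bases (shape-cost exponent θ = 4) and (x1, y1, x2, y2) boxes; the function name is hypothetical and this is not the authors' implementation.

```python
import math

def siou_loss(pred, gt, theta=4.0, eps=1e-9):
    """SIoU loss for one box pair; boxes are (x1, y1, x2, y2)."""
    # IoU term
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    w1, h1 = pred[2] - pred[0], pred[3] - pred[1]
    w2, h2 = gt[2] - gt[0], gt[3] - gt[1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)

    # Centre offsets and the smallest enclosing box
    dx = (gt[0] + gt[2] - pred[0] - pred[2]) / 2
    dy = (gt[1] + gt[3] - pred[1] - pred[3]) / 2
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])

    # Angle cost: Lambda = 1 - 2 sin^2(arcsin(|dy| / sigma) - pi/4)
    sigma = math.hypot(dx, dy)
    sin_alpha = min(1.0, abs(dy) / sigma) if sigma > 0 else 0.0
    angle_cost = 1 - 2 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

    # Distance cost, scaled by gamma = 2 - Lambda
    gamma = 2 - angle_cost
    rho_x = (dx / (cw + eps)) ** 2
    rho_y = (dy / (ch + eps)) ** 2
    distance_cost = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost: penalizes width/height mismatch
    omega_w = abs(w1 - w2) / (max(w1, w2) + eps)
    omega_h = abs(h1 - h2) / (max(h1, h2) + eps)
    shape_cost = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (distance_cost + shape_cost) / 2

gt = (0.0, 0.0, 2.0, 2.0)
print(siou_loss(gt, gt))                   # ~0 for a perfect prediction
print(siou_loss((1.0, 0.0, 3.0, 2.0), gt)) # horizontally shifted box
```

Compared with CIoU, the extra angle term steers the predicted centre toward the nearest axis of the ground-truth centre first, which is the directional information the abstract credits for faster convergence.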

Key words: traffic sign detection, YOLOv5, reparameterization, attention mechanisms, SLA
