
Journal of Graphics, 2023, Vol. 44, Issue (6): 1104-1111. DOI: 10.11996/JG.j.2095-302X.2023061104

• Image Processing and Computer Vision •

A YOLOv8-based road scene object detection method with bi-level routing attention

魏陈浩1, 杨睿1, 刘振丙1, 蓝如师1, 孙希延2, 罗笑南2

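  1. Guangxi Key Laboratory of Image and Graphic Intelligent Processing (Guilin University of Electronic Technology), Guilin, Guangxi 541004
    2. National Local Joint Engineering Research Center of Satellite Navigation and Location Service (Guilin University of Electronic Technology), Guilin, Guangxi 541004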
  • Received: 2023-06-29  Accepted: 2023-09-16  Publication date: 2023-12-31  Released online: 2023-12-17
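  • Corresponding author: 蓝如师 (1986-), male, professor, Ph.D. His main research interests include artificial intelligence, image processing, and medical information processing. E-mail: rslan2016@163.com
  • About the author:

    魏陈浩 (1999-), male, master's student. His main research interests are object detection and deep learning. E-mail: chwei529@163.com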

YOLOv8 with bi-level routing attention for road scene object detection

WEI Chen-hao1, YANG Rui1, LIU Zhen-bing1, LAN Ru-shi1, SUN Xi-yan2, LUO Xiao-nan2

  1. Guangxi Key Laboratory of Image and Graphic Intelligent Processing (Guilin University of Electronic Technology), Guilin, Guangxi 541004, China
    2. National Local Joint Engineering Research Center of Satellite Navigation and Location Service (Guilin University of Electronic Technology), Guilin, Guangxi 541004, China
  • Received: 2023-06-29  Accepted: 2023-09-16  Online: 2023-12-31  Published: 2023-12-17
  • Contact: LAN Ru-shi (1986-), professor, Ph.D. His main research interests cover artificial intelligence, image processing, and medical information processing. E-mail: rslan2016@163.com
  • About the author:

    WEI Chen-hao (1999-), master's student. His main research interests cover object detection and deep learning. E-mail: chwei529@163.com

Abstract:

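As the number of motor vehicles keeps growing, road traffic environments are becoming more complex. In particular, lighting changes and cluttered backgrounds interfere with the accuracy and precision of object detection algorithms, and objects of widely varying shapes in road scenes further complicate the detection task. To address these problems, a method named YOLOv8n_T is proposed. Building on YOLOv8, it first constructs a D_C2f block based on deformable convolution in the backbone network, which strengthens the feature extraction network's learning of object features against complex backgrounds and adapts it better to the complex, variable appearance of road objects. Second, it adds a bi-level routing attention module, which discards irrelevant regions in a query-adaptive manner and keeps the most relevant ones. Finally, it adds a small-object detection layer for small objects on the road such as pedestrians and traffic lights. Experiments show that YOLOv8n_T effectively improves object detection accuracy in road scenes: on the BDD100K dataset, its average precision is 6.8 percentage points higher than that of the original YOLOv8n and 11.2 percentage points higher than that of YOLOv5n.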
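The D_C2f block is described only as being built on deformable convolution; its internal layout is not given here. As a minimal sketch of the underlying operation (a DCNv1-style 3×3 deformable convolution), the Python below shifts each kernel tap's sampling position by a fractional offset and reads the feature map with bilinear interpolation. Assumptions: a single-channel map, stride 1, zero padding, and offsets passed in directly rather than predicted by the extra convolution layer a real deformable convolution uses. The names `bilinear` and `deform_conv3x3` are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]
Offsets = List[List[List[Tuple[float, float]]]]  # [row][col][tap] -> (dy, dx)


def bilinear(img: Grid, y: float, x: float) -> float:
    """Sample img at fractional (y, x); positions outside the map count as zero."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Weight falls off linearly with distance from each integer neighbour.
                value += (1.0 - abs(y - yy)) * (1.0 - abs(x - xx)) * img[yy][xx]
    return value


def deform_conv3x3(img: Grid, kernel: Grid, offsets: Offsets) -> Grid:
    """3x3 deformable convolution: stride 1, zero padding, single channel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            tap = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    # Regular grid position plus this tap's offset
                    # (predicted by a conv layer in a real DCN, supplied here).
                    dy, dx = offsets[i][j][tap]
                    acc += kernel[di + 1][dj + 1] * bilinear(img, i + di + dy, j + dj + dx)
                    tap += 1
            out[i][j] = acc
    return out


if __name__ == "__main__":
    img = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    # Every tap sampled half a pixel to the right.
    half_right = [[[(0.0, 0.5)] * 9 for _ in range(3)] for _ in range(3)]
    print(deform_conv3x3(img, identity, half_right)[0])  # [1.5, 2.5, 1.5]
```

With all offsets set to zero the function reduces to an ordinary 3×3 convolution (cross-correlation, as in deep learning frameworks) with zero padding; nonzero offsets let each output position sample irregular, data-dependent locations, which is what allows the receptive field to follow objects of irregular shape.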

Keywords: deformable convolution, road scene, object detection, YOLO, attention mechanism

Abstract:

As the number of motor vehicles continues to grow, road traffic environments have become increasingly complex. Lighting variations and cluttered backgrounds in particular can degrade the accuracy and precision of object detection algorithms, and the diverse shapes of objects in road scenes pose further challenges to the detection task. To address these challenges, a method named YOLOv8n_T was proposed. Building on YOLOv8, it incorporated into the backbone network a D_C2f block based on deformable convolution, which enhanced feature learning for objects against complex backgrounds and made the network better suited to the diverse and variable appearance of road objects. Furthermore, a bi-level routing attention module was added, which filtered out irrelevant regions in a query-adaptive manner and retained only the most relevant ones. Finally, a small-object detection layer was added for small objects on the road, such as pedestrians and traffic lights. Experimental results demonstrated that YOLOv8n_T effectively improved object detection precision in road scenes: on the BDD100K dataset, its average precision was 6.8 percentage points higher than that of the original YOLOv8n and 11.2 percentage points higher than that of YOLOv5n.
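Bi-level routing attention (BRA) was introduced with the BiFormer backbone, and the module named here is presumably that design. The sketch below shows its two levels on toy data: (1) region-to-region routing, in which each region keeps only its top-k most related regions, and (2) ordinary softmax attention between tokens, restricted to the tokens gathered from those routed regions. Assumptions made for brevity: tokens are already grouped into regions, the query/key/value projections are identities (real implementations learn them), region descriptors are mean-pooled tokens, there is a single head, and BiFormer's local context enhancement branch is omitted. All function names are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vec(vectors: List[Vec]) -> Vec:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bi_level_routing_attention(regions: List[List[Vec]], topk: int) -> List[List[Vec]]:
    """regions[r][t] is token t of region r; Q/K/V projections are identities (assumption)."""
    dim = len(regions[0][0])
    scale = 1.0 / math.sqrt(dim)
    # Region-level queries/keys: the mean of the tokens in each region.
    descriptors = [mean_vec(region) for region in regions]
    outputs: List[List[Vec]] = []
    for r, region in enumerate(regions):
        # Level 1: score every region against region r and keep the top-k.
        affinity = [dot(descriptors[r], d) for d in descriptors]
        routed = sorted(range(len(regions)), key=lambda j: affinity[j], reverse=True)[:topk]
        # Gather keys/values (identical here) from the routed regions only.
        gathered = [tok for j in routed for tok in regions[j]]
        # Level 2: token-to-token attention over the gathered tokens.
        new_region: List[Vec] = []
        for query in region:
            weights = softmax([dot(query, key) * scale for key in gathered])
            new_region.append(
                [sum(w * v[c] for w, v in zip(weights, gathered)) for c in range(dim)]
            )
        outputs.append(new_region)
    return outputs


if __name__ == "__main__":
    regions = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
    print(bi_level_routing_attention(regions, topk=1))  # each region attends only to itself
    print(bi_level_routing_attention(regions, topk=2))  # full attention over both regions
```

With `topk` equal to the number of regions this reduces to full self-attention; smaller values shrink the set of keys and values each query must score, which is where the savings over dense attention come from.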

Key words: deformable convolution, road scene, object detection, YOLO, attention mechanism
