
Journal of Graphics ›› 2023, Vol. 44 ›› Issue (1): 16-25. DOI: 10.11996/JG.j.2095-302X.2023010016

• Image Processing and Computer Vision •

Mask detection algorithm based on YOLOv5 integrating attention mechanism

LI Xiao-bo1, LI Yang-gui1,2, GUO Ning1, FAN Zhen1

  1. Department of Computer Technology and Applications, Qinghai University, Xining, Qinghai 810016, China
    2. State Key Laboratory of Plateau Ecology and Agriculture, Qinghai University, Xining, Qinghai 810016, China
  • Received: 2022-06-05 Revised: 2022-08-03 Online: 2023-10-31 Published: 2023-02-16
  • Contact: LI Yang-gui
  • About author: LI Xiao-bo (1998-), master's student. His main research interests include object detection and image processing. E-mail: 1336441422@qq.com
  • Supported by:
    National Natural Science Foundation of China (61962051); Independent Project of the State Key Laboratory of Plateau Ecology and Agriculture (2021-ZZ-2)

Abstract:

Wearing masks correctly during the COVID-19 pandemic can effectively prevent the spread of the virus. To address the difficulty of detection in public places, where crowds are dense and detection targets are small, a mask-wearing detection algorithm was proposed that builds on the YOLOv5s model and introduces attention mechanisms to fuse multi-scale attention weights. Four attention mechanisms were separately introduced into the backbone network of the YOLOv5s model to suppress irrelevant information, strengthen the representational power of the feature maps, and improve the model's ability to detect small-scale targets. Experimental results show that introducing the convolutional block attention module (CBAM) raised the mAP by 6.9 percentage points over the original network, the largest improvement among the four attention mechanisms, while the normalization-based attention module (NAM) required the fewest parameters at the cost of only a small loss in mAP. Through comparative experiments, the GIoU loss function was selected to compute the bounding box regression loss, further improving localization accuracy; the final model raised the mAP by 8.5 percentage points over the original network. Detection results of the improved model in different scenarios demonstrate the accuracy and practicality of the algorithm for small target detection.

Key words: mask detection, YOLOv5, attention mechanism, feature fusion, small target detection
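
The attention modules named in the abstract (CBAM and NAM) are standard published designs rather than contributions of this paper. For reference, below is a minimal PyTorch sketch of a CBAM-style block of the kind that can be inserted into the YOLOv5s backbone; the class names, reduction ratio, and kernel size are illustrative defaults, not the authors' configuration.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel attention: global avg/max pooling -> shared MLP -> sigmoid gate."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))   # (B, C, 1, 1)
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))    # (B, C, 1, 1)
        return torch.sigmoid(avg + mx)


class SpatialAttention(nn.Module):
    """Spatial attention: channel-wise avg/max maps -> 7x7 conv -> sigmoid gate."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)                  # (B, 1, H, W)
        mx, _ = torch.max(x, dim=1, keepdim=True)                 # (B, 1, H, W)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class CBAM(nn.Module):
    """CBAM block: refine features with channel attention, then spatial attention."""

    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)   # re-weight channels
        x = x * self.sa(x)   # re-weight spatial locations
        return x


if __name__ == "__main__":
    feat = torch.randn(1, 256, 20, 20)   # a hypothetical backbone feature map
    print(CBAM(256)(feat).shape)         # torch.Size([1, 256, 20, 20])
```

Gating the backbone feature maps in this way suppresses background responses while keeping the tensor shape unchanged, which is why such a block can be dropped into the YOLOv5s backbone without altering the detection head.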
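The abstract also reports that the GIoU loss was selected for bounding-box regression. The sketch below implements the standard GIoU loss for corner-format boxes; the function name, box format, and example values are assumptions for illustration and are not taken from the paper.

```python
import torch


def giou_loss(pred, target, eps=1e-7):
    """GIoU loss for boxes in (x1, y1, x2, y2) format; pred and target have shape (N, 4)."""
    # Intersection rectangle
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    # Union area and IoU
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter + eps
    iou = inter / union

    # Smallest enclosing box C; the GIoU term penalizes the empty part of C
    cx1 = torch.min(pred[:, 0], target[:, 0])
    cy1 = torch.min(pred[:, 1], target[:, 1])
    cx2 = torch.max(pred[:, 2], target[:, 2])
    cy2 = torch.max(pred[:, 3], target[:, 3])
    c_area = (cx2 - cx1) * (cy2 - cy1) + eps

    giou = iou - (c_area - union) / c_area
    return (1.0 - giou).mean()


if __name__ == "__main__":
    p = torch.tensor([[0.0, 0.0, 2.0, 2.0]])
    t = torch.tensor([[1.0, 1.0, 3.0, 3.0]])
    print(giou_loss(p, t))   # approximately 1.08 for these partially overlapping boxes
```

Unlike the plain IoU loss, GIoU still provides a gradient when the predicted and ground-truth boxes do not overlap, which is the usual motivation for choosing it for localization of small targets.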

CLC number: