Mask detection algorithm based on YOLOv5 integrating attention mechanism

doi:10.11996/JG.j.2095-302X.2023010016

Abstract

Wearing masks correctly during the COVID-19 pandemic can effectively prevent the spread of the virus. To address the detection challenges posed by dense crowds and small targets in public places, a mask-wearing detection algorithm based on the YOLOv5s model and integrating an attention mechanism was proposed. Four attention mechanisms were introduced separately into the backbone network of the YOLOv5s model to suppress irrelevant information, enhance the expressive ability of the feature maps, and improve the model's detection ability for small-scale targets. Experimental results show that introducing the convolutional block attention module (CBAM) increased the mAP by 6.9 percentage points over the original network, the greatest improvement among the four attention mechanisms. The normalization-based attention module (NAM) also performed well: it added the fewest parameters of the four while giving up only a small amount of mAP. Through comparative experiments, the GIoU loss function was selected to calculate the bounding box regression loss, which further improved localization accuracy and raised the mAP to 8.5 percentage points above the original network. The detection results of the improved model in different scenarios demonstrate the accuracy and practicability of the algorithm for small target detection.

Key words: mask detection, YOLOv5, attention mechanism, feature fusion, small target detection

CLC Number: TP391

LI Xiao-bo, LI Yang-gui, GUO Ning, FAN Zhen. Mask detection algorithm based on YOLOv5 integrating attention mechanism[J]. Journal of Graphics, 2023, 44(1): 16-25.

Figures/Tables 18

Fig. 1 YOLOv5s network structure diagram

Fig. 2 Mosaic data augmentation

Fig. 3 The fusion feature process of FPN+PAN structure

Fig. 4 The introduction position of attention module

Fig. 5 The structure of SE module

Fig. 6 The structure of CBAM module

Fig. 7 The process of CAM and SAM

Fig. 8 The structure of CA module

Fig. 9 The structure of CAM and SAM in NAM module

Fig. 10 The symbolic meaning in GIoU loss formula ((a) A, B, C; (b) A∩B; (c) A∪B; (d) C−(A∪B))
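The symbols in Fig. 10 can be tied together with a short sketch. It assumes the standard GIoU definition from Rezatofighi et al., GIoU = IoU − |C−(A∪B)| / |C|, where A and B are the predicted and ground-truth boxes and C is the smallest axis-aligned box enclosing both. The function names and the corner-format box convention are illustrative assumptions, not the authors' implementation.

```python
def giou(box_a, box_b):
    """GIoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # A ∩ B: overlap rectangle, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # A ∪ B
    union = area_a + area_b - inter
    iou = inter / union

    # C: smallest box enclosing both A and B.
    area_c = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    # Penalize the part of C that is covered by neither A nor B.
    return iou - (area_c - union) / area_c


def giou_loss(box_a, box_b):
    """Bounding box regression loss: 0 for identical boxes, up to 2 when far apart."""
    return 1.0 - giou(box_a, box_b)


if __name__ == "__main__":
    print(giou((0, 0, 2, 2), (1, 1, 3, 3)))       # partially overlapping boxes
    print(giou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes still get a gradient signal
```

Unlike plain IoU, which is zero for every pair of disjoint boxes, the enclosing-box term keeps the loss sensitive to how far apart the boxes are.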

Fig. 11 Example pictures ((a) Example of dataset pictures; (b) Example of annotation pictures)

Table 1 The contents of the YOLO format annotation file

No.	Class	Center_x	Center_y	Width	Height
0 (bottom left)	0	0.208065	0.594470	0.129032	0.253456
1 (top left)	1	0.301613	0.395161	0.103226	0.168203
2 (top right)	0	0.662097	0.364055	0.111290	0.161290
3 (bottom right)	1	0.789516	0.597926	0.104839	0.191244
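To illustrate how the rows of Table 1 are read, the sketch below parses one annotation line and converts it to pixel corner coordinates. It assumes the usual YOLO convention that the center and size values are normalized by the image width and height. The function names and the 640 x 480 image size are hypothetical, since the source does not give the image dimensions.

```python
def parse_yolo_line(line):
    """Split 'class cx cy w h' into an int class id and four normalized floats."""
    parts = line.split()
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:5])
    return class_id, cx, cy, w, h


def yolo_to_pixel_box(cx, cy, w, h, img_w, img_h):
    """Convert a normalized center/size box to pixel corners (x1, y1, x2, y2)."""
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return x1, y1, x2, y2


if __name__ == "__main__":
    # First row of Table 1; the image size is an assumed value for illustration.
    class_id, cx, cy, w, h = parse_yolo_line("0 0.208065 0.594470 0.129032 0.253456")
    print(class_id, yolo_to_pixel_box(cx, cy, w, h, img_w=640, img_h=480))
```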

Fig. 12 Location and size distribution of anchor box ((a) Location distribution; (b) Size distribution)

Table 2 The experimental environment configuration

Parameter	Configuration
Operating system	Ubuntu 20.04.2 LTS
CPU	Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70 GHz
GPU	GeForce GTX 1080 Ti
Programming language	Python 3.8.13
Deep learning framework	PyTorch 1.9.1
Acceleration environment	CUDA 11.1 + cuDNN 8.3.3

Table 3 The performance comparison of four attention mechanisms

Algorithm	Class	P	R	AP/mAP (%)	Parameters
YOLOv5s	nomask	0.759	0.693	73.6	7,015,519
YOLOv5s	mask	0.810	0.792	80.7	7,015,519
YOLOv5s	all	0.785	0.743	77.1	7,015,519
YOLOv5s-SE	nomask	0.841	0.753	80.9	7,222,879
YOLOv5s-SE	mask	0.845	0.820	85.0	7,222,879
YOLOv5s-SE	all	0.843	0.786	82.9	7,222,879
YOLOv5s-CBAM	nomask	0.920	0.727	81.9	7,222,977
YOLOv5s-CBAM	mask	0.915	0.791	86.2	7,222,977
YOLOv5s-CBAM	all	0.917	0.759	84.0	7,222,977
YOLOv5s-CA	nomask	0.905	0.737	81.4	7,215,759
YOLOv5s-CA	mask	0.898	0.802	85.8	7,215,759
YOLOv5s-CA	all	0.901	0.770	83.6	7,215,759
YOLOv5s-NAM	nomask	0.883	0.743	81.1	7,191,135
YOLOv5s-NAM	mask	0.886	0.822	86.5	7,191,135
YOLOv5s-NAM	all	0.885	0.783	83.8	7,191,135
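The comparison that Table 3 supports can be reproduced with simple arithmetic. The sketch below takes the "all" mAP and parameter count of each model from the table and computes the mAP gain and the extra parameters relative to the YOLOv5s baseline. The dictionary layout and function name are illustrative assumptions.

```python
# "all" mAP (%) and parameter counts copied from Table 3.
TABLE_3 = {
    "YOLOv5s":      {"mAP": 77.1, "params": 7_015_519},
    "YOLOv5s-SE":   {"mAP": 82.9, "params": 7_222_879},
    "YOLOv5s-CBAM": {"mAP": 84.0, "params": 7_222_977},
    "YOLOv5s-CA":   {"mAP": 83.6, "params": 7_215_759},
    "YOLOv5s-NAM":  {"mAP": 83.8, "params": 7_191_135},
}


def compare_to_baseline(results, baseline="YOLOv5s"):
    """Return mAP gain (percentage points) and added parameters per model."""
    base = results[baseline]
    comparison = {}
    for name, row in results.items():
        if name == baseline:
            continue
        comparison[name] = {
            "mAP_gain": round(row["mAP"] - base["mAP"], 1),
            "extra_params": row["params"] - base["params"],
        }
    return comparison


if __name__ == "__main__":
    for name, row in compare_to_baseline(TABLE_3).items():
        print(f"{name}: +{row['mAP_gain']} mAP points, +{row['extra_params']} parameters")
```

Read this way, CBAM gives the largest mAP gain, while NAM adds the fewest parameters for a gain only 0.2 points lower.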

Fig. 13 The P-R curves of different algorithms ((a) YOLOv5s; (b) YOLOv5s-SE; (c) YOLOv5s-CBAM; (d) YOLOv5s-CA; (e) YOLOv5s-NAM)

Fig. 14 Comparison of detection effects of different algorithms ((a) Sparse targets; (b) Dense targets; (c) More dense targets; (d) Very dense targets)

Table 4 Influence of CIoU and GIoU on algorithm results

Loss	AP, mask (%)	AP, nomask (%)	mAP (%)
CIoU	86.2	81.9	84.0
GIoU	87.9	83.3	85.6
