Detection of dress code violations based on improved YOLOv5s

Journal of Graphics, 2024, Vol. 45, Issue (3): 433-445. DOI: 10.11996/JG.j.2095-302X.2024030433

• Image Processing and Computer Vision •

LI Yuehua, ZHONG Xin, YAO Zhangyan, HU Bin

School of Information Science and Technology, Nantong University, Nantong Jiangsu 226000, China

Received: 2023-10-20 Accepted: 2024-01-30 Published: 2024-06-30 Online: 2024-06-06
Corresponding author: HU Bin (1985-), lecturer, Ph.D. His main research interest is computer vision. E-mail: hubin@ntu.edu.cn
First author: LI Yuehua (1977-), associate professor, master. His main research interests cover embedded systems, internet of things and intelligent control technology. E-mail: lyh@ntu.edu.cn
Supported by:
National Natural Science Foundation of China (62072259); National Natural Science Foundation of China Young Scientists Fund (62102199)

Abstract:

To address non-compliant attire among catering kitchen staff, where existing algorithms show low detection accuracy and are prone to false and missed detections against complex backgrounds, this paper proposes YOLOv5s-ESW, an improved attire compliance detection algorithm based on YOLOv5s. First, a novel multi-scale attention mechanism is introduced into the backbone network to improve the C3 module, strengthening the network’s feature extraction capability. Second, in the neck network, the spatial and channel reconstruction convolution module (SCConv) replaces the original convolution module (Conv), reducing parameter redundancy while improving accuracy. Finally, in the prediction part, the WIoU loss function replaces the CIoU loss function, improving the model’s generalization capability and accelerating convergence. The improved algorithm was evaluated on a self-built dataset of catering kitchen staff attire. The results show that the improved model raises mean average precision (mAP) by 4.1 percentage points (from 0.851 to 0.892) and reduces the parameter count by 11.4% (from 7.36 M to 6.52 M). The model thus improves detection accuracy while reducing network complexity, meeting the requirements of attire compliance detection for catering kitchen staff.

Key words: dress code detection, attention mechanism, convolution, loss function, YOLOv5s-ESW algorithm
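
To make the loss-function change in the abstract more concrete, the following is a minimal sketch of an IoU-based regression loss in the Wise-IoU family. It assumes the v1 form described in the Wise-IoU reference [26], in which the IoU loss is scaled by a distance-based factor computed from the box centres and the smallest enclosing box. This page does not state which WIoU variant the authors use, so the function names, the `(x1, y1, x2, y2)` box format, and the choice of v1 are all assumptions made for illustration, not the paper's implementation.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def wiou_v1_loss(pred, target):
    """Sketch of a WIoU v1-style loss: exp(centre distance term) * (1 - IoU).

    Assumption: the scaling factor uses the squared centre distance divided by
    the squared diagonal of the smallest box enclosing both boxes.
    """
    l_iou = 1.0 - iou(pred, target)

    # Centre points of the predicted and ground-truth boxes.
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tx, ty = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2

    # Width and height of the smallest enclosing box.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    denom = wg ** 2 + hg ** 2
    if denom == 0:
        return l_iou  # degenerate boxes: fall back to the plain IoU loss

    r_wiou = math.exp(((px - tx) ** 2 + (py - ty) ** 2) / denom)
    return r_wiou * l_iou


# Hypothetical boxes, chosen only to exercise the functions.
print(wiou_v1_loss((0, 0, 2, 2), (0, 0, 2, 2)))
print(wiou_v1_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

In a training framework the enclosing-box term would normally be detached from the gradient computation, as the reference describes; this plain-Python sketch has no gradients, so that detail is omitted here.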

CLC number: TP391

LI Yuehua, ZHONG Xin, YAO Zhangyan, HU Bin. Detection of dress code violations based on improved YOLOv5s[J]. Journal of Graphics, 2024, 45(3): 433-445.

Figures/Tables 22

Fig. 1 YOLOv5s network structure diagram

Fig. 2 Distribution of detection boxes

Fig. 3 Structure of C3EMA

Fig. 4 Overall structure of EMA attention

Fig. 5 Overall structure of the SCConv module

Fig. 6 Structure of the SRU module

Fig. 7 Structure of the CRU module

Fig. 8 Structure of the improved YOLOv5s

Fig. 9 Schematic of the WIoU parameters

Table 1 Experimental environment configuration

Item	Version
Operating System	Windows 11
CPU	Intel i7-10870H
GPU	NVIDIA RTX 2080
Python	3.7.12
PyTorch	1.9.0
CUDA	12.0
cuDNN	8.5.0

Table 2 Dataset categories and annotation counts

Category	Count
Hat	4 361
No hat	6 451
Mask	5 711
No mask	6 674
Cloth	5 436
No cloth	7 583

Fig. 10 Samples from the detection dataset ((a) No mask and no hat; (b) No mask and no chef uniform; (c) No mask, no chef uniform, and no hat; (d) No mask; (e) No dress code violations)

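Table 3 Style requirements for the self-built attire dataset

Detection target	Style
Chef uniform	Chef uniform (long or short sleeves): white, blue
Chef hat	Chef hat (tall), kitchen worker cap (low): white
Mask	Disposable non-woven mask: blue, white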

Table 4 Confusion matrix of classification results

Actual	Predicted positive	Predicted negative
Positive	TP (true positive)	FN (false negative)
Negative	FP (false positive)	TN (true negative)
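
As a brief illustration of how the Precision and Recall columns in the later tables relate to the confusion matrix in Table 4, the sketch below applies the standard definitions Precision = TP / (TP + FP) and Recall = TP / (TP + FN). The function name and the example counts are hypothetical and are not taken from the paper.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts.

    Standard definitions:
      precision = TP / (TP + FP)
      recall    = TP / (TP + FN)
    A zero denominator yields 0.0 rather than raising an error.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical counts for one class, for illustration only.
precision, recall = precision_recall(tp=90, fp=10, fn=15)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

TN does not appear in either formula, which is why these two metrics suit detection tasks where true negatives (background regions) are not counted.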

Fig. 11 Model training loss curves ((a) Classification loss; (b) Confidence loss; (c) Localization loss)

Table 5 Comparison with other attention mechanisms

Method	Precision	Recall	mAP	Parameters/M
YOLOv5s	0.849	0.855	0.851	7.36
YOLOv5s+C3CBAM	0.857	0.854	0.846	7.39
YOLOv5s+C3ECA	0.851	0.853	0.845	7.45
YOLOv5s+C3CA	0.865	0.857	0.857	7.41
YOLOv5s+C3EMA	0.869	0.870	0.867	7.38

Table 6 Ablation experiments of YOLOv5s-ESW on the self-built attire dataset

Method	Precision	Recall	mAP	Parameters/M	FPS
YOLOv5s	0.849	0.855	0.851	7.36	65.3
YOLOv5s+C3EMA	0.869	0.870	0.867	7.38	67.7
YOLOv5s+SCConv	0.863	0.869	0.862	5.76	78.3
YOLOv5s+WIoU	0.864	0.867	0.865	7.36	66.5
YOLOv5s+C3EMA+SCConv	0.881	0.885	0.875	6.55	68.6
YOLOv5s-ESW (Ours)	0.893	0.897	0.892	6.52	70.1
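
The headline figures quoted in the abstract can be checked against the first and last rows of Table 6. The sketch below is only an arithmetic check under the assumption that the 4.1% accuracy gain is an absolute difference in mAP (percentage points) and that the 11.4% parameter reduction is relative to the baseline; the function and variable names are illustrative.

```python
def compare_to_baseline(baseline, improved):
    """Return (mAP gain in points, parameter reduction in %, FPS gain)."""
    map_gain_points = (improved["mAP"] - baseline["mAP"]) * 100
    param_reduction_pct = (
        (baseline["params_m"] - improved["params_m"]) / baseline["params_m"] * 100
    )
    fps_gain = improved["fps"] - baseline["fps"]
    return map_gain_points, param_reduction_pct, fps_gain


# Values copied from Table 6 (self-built attire dataset).
yolov5s = {"mAP": 0.851, "params_m": 7.36, "fps": 65.3}
yolov5s_esw = {"mAP": 0.892, "params_m": 6.52, "fps": 70.1}

map_gain, param_cut, fps_gain = compare_to_baseline(yolov5s, yolov5s_esw)
print(round(map_gain, 1), round(param_cut, 1), round(fps_gain, 1))
# prints: 4.1 11.4 4.8
```

Under these assumptions the table reproduces both abstract figures, and it also shows a 4.8 FPS gain over the baseline.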

Table 7 Ablation experiments of YOLOv5s-ESW on the PASCAL VOC2012 dataset

Method	Precision	Recall	mAP	Parameters/M	FPS
YOLOv5s	0.724	0.736	0.728	7.11	61.7
YOLOv5s+C3EMA	0.740	0.745	0.741	7.19	58.3
YOLOv5s+SCConv	0.731	0.738	0.733	5.53	69.7
YOLOv5s+WIoU	0.738	0.743	0.740	7.15	60.2
YOLOv5s+C3EMA+SCConv	0.745	0.748	0.746	6.37	65.9
YOLOv5s-ESW (Ours)	0.761	0.764	0.763	6.31	66.3

Table 8 Comparison of different object detection algorithms on the self-built attire dataset

Model	AP0.5 (Hat)	AP0.5 (Mask)	AP0.5 (Cloth)	Precision	Recall	mAP	Parameters/M	FPS
Faster R-CNN	0.653	0.642	0.726	0.673	0.681	0.601	40.21	11.8
SSD	0.702	0.714	0.709	0.708	0.721	0.713	26.15	52.7
YOLOv5s	0.842	0.851	0.856	0.849	0.855	0.851	7.36	65.3
Ref. [29]	0.824	0.859	0.871	0.851	0.856	0.847	3.52	90.6
Ref. [30]	0.848	0.851	0.863	0.854	0.858	0.843	10.31	57.8
YOLOv5s-ESW (Ours)	0.891	0.887	0.897	0.893	0.897	0.892	6.52	70.1

Table 9 Comparison of different object detection algorithms on the PASCAL VOC2012 dataset

Algorithm	Input size	Backbone	mAP	FPS
Faster R-CNN	-	VGG	0.732	18.7
SSD	300	VGG	0.768	19.0
EfficientDet-D0	512	EfficientNet-B0	0.711	23.1
Ref. [32]	416	Darknet53-tiny	0.671	70.0
YOLOv4-tiny	416	CSPDarknet53s	0.663	108.0
YOLOv5s	640	CSPDarknet53	0.728	61.7
SD-YOLO	600	CSPDarknet53	0.692	78.0
YOLO-DAW	640	CSPDarknet53	0.686	58.9
YOLOv5s-ESW (Ours)	640	CSPDarknet53	0.763	66.3

Fig. 12 Detection results of the original YOLOv5s ((a) Missed detection, example 1; (b) Lower confidence; (c) Missed detection, example 2; (d) Missed detection, example 3)

Fig. 13 Detection results of the proposed YOLOv5s-ESW algorithm ((a) Missed detection resolved, example 1; (b) Higher confidence; (c) Missed detection resolved, example 2; (d) Missed detection resolved, example 3)

References 35

[1]	ZHANG W H. Optimization of local government’s intelligent supervision of catering industry[D]. Shanghai: Shanghai Normal University, 2021 (in Chinese).
[2]	XIA Z P, ZHANG Z B, GUAN T M. Influence of the “Sunshine Kitchen Project” on the standardized operation of catering industry[J]. Shanghai Journal of Preventive Medicine, 2021, 33(4): 340-344 (in Chinese).
[3]	WEI H K. Improve food safety supervision and escort “fireworks” in midsummer[J]. China Food Safety Magazine, 2023(24): 9 (in Chinese).
[4]	FAN H L. Theoretical basis and system establishment of China food safety intelligent supervision in the perspective of Internet of Things[J]. IEEE Access, 2019, 7: 71686-71695.
[5]	XU D G, WANG L, LI F. Review of typical object detection algorithms for deep learning[J]. Computer Engineering and Applications, 2021, 57(8): 10-25 (in Chinese).
[6]	GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// 2014 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2014: 580-587.
[7]	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[8]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]// European Conference on Computer Vision. Cham: Springer, 2016: 21-37.
[9]	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 779-788.
[10]	CHEN G, ZHANG P J, GONG D D, et al. Research on safety clothing detection method for surveillance video of thermal power plant[J]. Journal of Graphics, 2023, 44(2): 291-297 (in Chinese).
[11]	LIU X Y, ZHANG B F, FU Y, et al. Detection on normalization of operating personnel dressing at contaminated sites based on deep learning[J]. Journal of Safety Science and Technology, 2020, 16(7): 169-175 (in Chinese).
[12]	CHEN Z H, ZHANG F, LIU H B, et al. Real-time detection algorithm of helmet and reflective vest based on improved YOLOv5[J]. Journal of Real-Time Image Processing, 2023, 20(1): 4.
[13]	LIN Q X, CHEN C, YAN Y F, et al. A two-stage detection algorithm for workwear compliance in power construction scenarios based on feature guidance[J]. Zhejiang Electric Power, 2022, 41(4): 44-50 (in Chinese).
[14]	OUYANG D L, HE S, ZHANG G Z, et al. Efficient multi-scale attention module with cross-spatial learning[C]// ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing. New York: IEEE Press, 2023: 1-5.
[15]	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7132-7141.
[16]	HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 13708-13717.
[17]	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// European Conference on Computer Vision. Cham: Springer, 2018: 3-19.
[18]	DONGYOON H, YOUNGJOON Y, BEOMYOUNGE K, et al. Learning features with parameter-free layers[EB/OL]. [2023-05-08]. https://doi.org/10.48550/arXiv.2202.02777.
[19]	HAN S, MAO H Z, DALLY W J. Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding[EB/OL]. [2023-04-04]. https://arxiv.org/abs/1510.00149.
[20]	DENTON E, ZAREMBA W, BRUNA J, et al. Exploiting linear structure within convolutional networks for efficient evaluation[J]. Advances in Neural Information Processing Systems, 2014, 2(January): 1269-1277.
[21]	HINTON G, VINYALS O, DEAN J, et al. Distilling the knowledge in a neural network[EB/OL]. [2023-03-04]. https://arxiv.org/abs/1503.02531.
[22]	LI J F, WEN Y, HE L H. SCConv: spatial and channel reconstruction convolution for feature redundancy[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 6153-6162.
[23]	WU Y, HE K. Group normalization[C]// European Conference on Computer Vision. Cham: Springer, 2018: 3-19.
[24]	LI X, WANG W H, HU X L, et al. Selective kernel networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 510-519.
[25]	ZHANG Y F, REN W Q, ZHANG Z, et al. Focal and efficient IOU loss for accurate bounding box regression[J]. Neurocomputing, 2022, 506: 146-157.
[26]	TONG Z, CHEN Y, XU Z, et al. Wise-IoU: bounding box regression loss with dynamic focusing mechanism[EB/OL]. (2023-08-08) [2023-08-12]. https://arxiv.org/abs/2301.10051.
[27]	SHETRY S. Application of convolutional neural network for image classification on Pascal VOC challenge 2012 dataset[EB/OL]. (2016-01-01) [2023-01-13]. https://arXiv.org/abs/:1607.03785.
[28]	WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[EB/OL]. (2019-10-08) [2023-01-11]. https://arxiv.org/abs/1910.03151vl.
[29]	SUN J C, YANG S H, GONG F Y, et al. Pavement crack detection in complex background based on improved YOLOv5[J]. China Sciencepaper, 2023, 18(7): 779-785 (in Chinese).
[30]	QI Z Z, XU Y X. Research on helmet wearing detection of improved YOLOv5s algorithm[J]. Computer Engineering and Applications, 2023, 59(14): 176-183 (in Chinese).
[31]	TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 10778-10787.
[32]	ZHANG X H, JING M K, YUAN Y W, et al. Tomato seedling classification detection using improved YOLOv3-Tiny[J]. Transactions of the Chinese Society of Agricultural Engineering, 2022, 38(1): 221-229 (in Chinese).
[33]	BOCHKOVSKIY A, WANG C Y, LIAO H M. Yolov4: optimal speed and accuracy of object detection[EB/OL]. [2023-01-23]. https://doi.org/10.48550/arXiv.2004.10934.
[34]	MENG C X, WANG Z N, SHI L, et al. Improved railroad foreign object intrusion detection algorithm for YOLOv5s[J/OL]. Journal of Chinese Computer Systems, 1-10. [2023-08-28]. http://kns.cnki.net/kcms/detail/21.1106.TP.20230217.1616.007.html (in Chinese).
[35]	YIN Z W, SHAO J Y, ZHANG N. YOLO-DAW: object detection model based on dual attention mechanism within windows[J]. Journal of Southeast University: Natural Science Edition, 2023, 53(4): 718-724 (in Chinese).
