Journal of Graphics ›› 2023, Vol. 44 ›› Issue (1): 33-40. DOI: 10.11996/JG.j.2095-302X.2023010033
Cross modality person re-identification based on residual enhanced attention

SHAO Wen-bin, LIU Yu-jie, SUN Xiao-rui, LI Zong-min
Received: 2022-04-26
Revised: 2022-06-13
Online: 2023-10-31
Published: 2023-02-16
Contact: LIU Yu-jie
About author: SHAO Wen-bin (1998-), master student. His main research interests cover person re-identification and object detection. E-mail: wbShao@s.upc.edu.cn

Abstract:
Cross-modality person re-identification faces two main problems: ① the modality discrepancy between infrared and visible images caused by their different imaging mechanisms, and ② the intra-class variation caused by insufficient identity discriminability of image features. To address these problems, a cross-modality person re-identification method based on residual enhanced attention is proposed to improve both the modality invariance and the identity discriminability of pedestrian features. First, a dual-path convolutional neural network with modality-independent parameters in the shallow layers and shared parameters in the deep layers is designed as the backbone. Then, the global-weakening problem of existing attention mechanisms is analyzed, and a residual enhanced attention method is designed to solve it and improve attention performance; the improved attention is applied to the channel dimension in the shallow layers and to spatial positions in the deep layers, strengthening the model's ability to eliminate modality discrepancy and to discriminate pedestrian identities. Experiments on the SYSU-MM01 and RegDB datasets demonstrate the competitiveness of the method, and extensive comparison experiments further confirm its effectiveness.
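The central mechanism lends itself to a short illustration. Below is a minimal PyTorch sketch of residual-enhanced attention as described in the abstract: the attended features are added back onto the input (x + x·w) rather than replacing it (x·w), which is how the global-weakening problem is countered. The module names, the SE/CBAM-style gating, and the hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class ResidualChannelAttention(nn.Module):
    """SE-style channel attention with residual enhancement (assumed form).

    Applied in the shallow layers per the abstract; a 'raw' attention
    would return x * w, the residual variant returns x + x * w.
    """
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)  # per-channel gate in (0, 1)
        return x + x * w  # residual enhancement keeps the un-attended global signal


class ResidualSpatialAttention(nn.Module):
    """CBAM-style spatial attention with the same residual enhancement,
    applied in the deep layers per the abstract."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Summarize channels by mean and max, then predict a per-position gate.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.max(dim=1, keepdim=True).values], dim=1)
        w = torch.sigmoid(self.conv(s))  # (B, 1, H, W)
        return x + x * w
```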
SHAO Wen-bin, LIU Yu-jie, SUN Xiao-rui, LI Zong-min. Cross modality person re-identification based on residual enhanced attention[J]. Journal of Graphics, 2023, 44(1): 33-40.
Fig. 2 Dual-stream network ((a) Parameter-independent dual-stream network; (b) Dual-stream network used in this paper)
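For the backbone of Fig. 2(b), the following is a hedged sketch of a two-stream network whose shallow stage is modality-specific while the deep stages share parameters. The use of ResNet-50 and the split after layer1 are assumptions for illustration (the page cites ResNet [19] and ImageNet pretraining [24]); the exact split point is not specified here.

```python
import torch.nn as nn
from torchvision.models import resnet50


class DualStreamBackbone(nn.Module):
    """Fig. 2(b): independent shallow stems, shared deep stages (a sketch)."""
    def __init__(self):
        super().__init__()

        def make_stem() -> nn.Sequential:
            r = resnet50(weights=None)  # load ImageNet weights in practice (cf. [24])
            return nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool, r.layer1)

        self.visible_stem = make_stem()   # parameters independent per modality
        self.infrared_stem = make_stem()
        r = resnet50(weights=None)
        self.shared = nn.Sequential(r.layer2, r.layer3, r.layer4)  # shared deep layers

    def forward(self, x, modality: str):
        stem = self.visible_stem if modality == "visible" else self.infrared_stem
        return self.shared(stem(x))  # (B, 2048, H/32, W/32) feature map
```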
| Method | Year | All-search (%) | | | | | | | | Indoor-search (%) | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | Single-shot | | | | Multi-shot | | | | Single-shot | | | | Multi-shot | | | |
| | | R-1 | R-10 | R-20 | mAP | R-1 | R-10 | R-20 | mAP | R-1 | R-10 | R-20 | mAP | R-1 | R-10 | R-20 | mAP |
| HOG | 2005 | 2.76 | 18.30 | 31.90 | 4.24 | 3.82 | 22.80 | 37.60 | 2.16 | 3.22 | 24.70 | 44.50 | 7.25 | 4.75 | 29.20 | 49.40 | 3.51 |
| LOMO | 2015 | 3.64 | 23.20 | 37.30 | 4.53 | 4.70 | 28.20 | 43.10 | 2.28 | 5.75 | 34.40 | 54.90 | 10.20 | 7.36 | 40.40 | 60.30 | 5.64 |
| Zero-Padding | 2017 | 14.80 | 54.10 | 71.30 | 15.90 | 19.10 | 61.40 | 78.40 | 10.90 | 20.60 | 68.40 | 85.80 | 26.90 | 24.40 | 75.90 | 91.30 | 18.60 |
| D-HSME | 2019 | 20.68 | 62.74 | 77.95 | 23.12 | - | - | - | - | - | - | - | - | - | - | - | - |
| cmGAN | 2018 | 27.00 | 67.50 | 80.60 | 27.80 | 31.50 | 72.70 | 85.00 | 22.30 | 31.60 | 77.20 | 89.20 | 42.20 | 37.00 | 80.90 | 92.10 | 32.80 |
| eBDTR | 2020 | 27.82 | 67.34 | 81.34 | 28.42 | - | - | - | - | - | - | - | - | - | - | - | - |
| D2RL | 2019 | 28.90 | 70.60 | 82.40 | 29.20 | - | - | - | - | - | - | - | - | - | - | - | - |
| JSIA-ReID | 2020 | 38.10 | 80.70 | 89.90 | 36.90 | 45.10 | 85.70 | 93.80 | 29.50 | 43.80 | 86.20 | 94.20 | 52.90 | 52.70 | 91.10 | 96.40 | 42.70 |
| AlignGAN | 2019 | 42.40 | 85.00 | 93.70 | 40.70 | 51.50 | 89.40 | 95.70 | 33.90 | 45.90 | 87.60 | 94.40 | 54.30 | 57.10 | 92.70 | 97.40 | 45.30 |
| X modality | 2020 | 49.92 | 89.79 | 95.96 | 50.73 | - | - | - | - | - | - | - | - | - | - | - | - |
| MACE | 2020 | 51.64 | 87.25 | 94.44 | 50.11 | - | - | - | - | 57.35 | 93.02 | 97.47 | 64.79 | - | - | - | - |
| DDAG | 2020 | 54.75 | 90.39 | 95.81 | 53.02 | - | - | - | - | 61.02 | 94.06 | 98.41 | 67.98 | - | - | - | - |
| SIM | 2020 | 56.93 | - | - | 60.88 | - | - | - | - | - | - | - | - | - | - | - | - |
| NFS | 2021 | 56.91 | 91.34 | 96.52 | 55.45 | 63.51 | 94.42 | 97.81 | 48.56 | 62.79 | 96.53 | 99.07 | 69.79 | 70.03 | 97.70 | 99.51 | 61.45 |
| CICL | 2021 | 57.20 | 94.30 | 98.40 | 59.30 | 60.70 | 95.20 | 98.60 | 52.60 | 66.60 | 98.80 | 99.70 | 74.70 | 73.80 | 99.40 | 99.90 | 68.30 |
| cm-SSFT | 2020 | 61.60 | 89.20 | 93.90 | 63.20 | 63.40 | 91.20 | 95.70 | 62.00 | 70.50 | 94.90 | 97.70 | 72.60 | 73.00 | 96.30 | 99.10 | 72.40 |
| CANet | 2021 | 69.88 | 95.71 | 98.46 | 66.89 | - | - | - | - | 76.26 | 97.88 | 99.49 | 80.37 | - | - | - | - |
| Ours | 2022 | 68.53 | 95.80 | 97.87 | 66.27 | 66.81 | 95.63 | 98.17 | 62.80 | 74.65 | 97.02 | 99.21 | 79.33 | 78.91 | 97.53 | 99.14 | 75.27 |
Table 1 Comparison with state-of-the-art methods on the SYSU-MM01 dataset
| Method | Rank-1 | mAP |
|---|---|---|
| Baseline | 58.17 | 56.69 |
| Raw-SpaceAttention | 65.67 | 62.50 |
| Raw-ChannelAttention | 63.43 | 62.41 |
| Raw-DualAttention | 69.53 | 67.52 |
| Residual-SpaceAttention | 68.34 | 66.57 |
| Residual-ChannelAttention | 67.80 | 66.23 |
| Residual-DualAttention | 68.53 | 66.27 |
Table 3 Ablation study of residual enhanced attention on SYSU-MM01 (single-shot & all-search, %)
| Method | Rank-1 | mAP |
|---|---|---|
| Single-stream network with shared parameters | 54.33 | 51.19 |
| Dual-stream network with fully independent parameters | 19.50 | 44.70 |
| Network structure in this paper | 58.17 | 56.69 |
Table 4 Ablation study of network structure on SYSU-MM01 (single-shot & all-search, %)
Fig. 5 Spatial attention response maps ((a) Spatial attention based on residual enhancement; (b) Existing spatial attention)
[1] YE M, SHEN J, LIN G, et al. Deep learning for person re-identification: a survey and outlook[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(6): 2872-2893.
[2] YANG W J, WANG W M, WANG Q Y, et al. Image retrieval method based on perceptual hash algorithm and bag of visual words[J]. Journal of Graphics, 2019, 40(3): 519-524 (in Chinese).
[3] ZHAO H, TIAN M, SUN S, et al. Spindle net: person re-identification with human body region guided feature decomposition and fusion[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 1077-1085.
[4] GE Y X, ZHU F, CHEN D P, et al. Self-paced contrastive learning with hybrid memory for domain adaptive object re-id[EB/OL]. [2021-12-08]. https://blog.csdn.net/NGUever15/article/details/120556059.
[5] CHEN H, WANG Y, LAGADEC B. Joint generative and contrastive learning for unsupervised person re-identification[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 2004-2013.
[6] WU A, ZHENG W, YU H, et al. RGB-infrared cross-modality person re-identification[C]//2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 5380-5389.
[7] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//2018 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7132-7141.
[8] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//2018 European Conference on Computer Vision. Cham: Springer International Publishing, 2018: 3-19.
[9] DAI P Y, JI R R, WANG H, et al. Cross-modality person re-identification with generative adversarial training[EB/OL]. [2021-12-08]. https://www.ijcai.org/Proceedings/2018/0094.pdf.
[10] WANG Z X, WANG Z, ZHENG Y. Learning to reduce dual-level discrepancy for infrared-visible person re-identification[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 618-626.
[11] WANG G, ZHANG T, CHENG J, et al. RGB-infrared cross-modality person re-identification via joint pixel and feature alignment[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 3623-3632.
[12] ZHAO Z, LIU B, CHU Q, et al. Joint color-irrelevant consistency learning and identity-aware modality adaptation for visible-infrared cross modality person re-identification[C]//The AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 3520-3528.
[13] YE M, LAN X, WANG Z, et al. Bi-directional center-constrained top-ranking for visible thermal person re-identification[J]. IEEE Transactions on Information Forensics and Security, 2020, 15: 407-419.
[14] HAO Y, WANG N, LI J, et al. HSME: hypersphere manifold embedding for visible thermal person re-identification[C]//The AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2019: 8385-8392.
[15] JIA M X, ZHAI Y P, LU S J, et al. A similarity inference metric for RGB-infrared cross-modality person re-identification[EB/OL]. [2021-12-08]. https://arxiv.org/abs/2007.01504.
[16] LI D, WEI X, HONG X, et al. Infrared-visible cross-modal person re-identification with an X modality[C]//The AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 4610-4617.
[17] YE M, SHEN J, CRANDALL D, et al. Dynamic dual-attentive aggregation learning for visible-infrared person re-identification[C]//2020 European Conference on Computer Vision. Cham: Springer International Publishing, 2020: 229-247.
[18] CHEN Y, WAN L, LI Z, et al. Neural feature search for RGB-infrared person re-identification[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 587-597.
[19] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[EB/OL]. [2021-12-08]. https://blog.csdn.net/toda666/article/details/80384915.
[20] LUO H, GU Y, LIAO X, et al. Bag of tricks and a strong baseline for deep person re-identification[EB/OL]. [2021-12-08]. https://openaccess.thecvf.com/content_CVPRW_2019/papers/TRMTMCT/Luo_Bag_of_Tricks_and_a_Strong_Baseline_for_Deep_Person_CVPRW_2019_paper.pdf.
[21] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2005: 886-893.
[22] NGUYEN D T, HONG H G, KIM K W, et al. Person recognition system based on a combination of body images from visible light and thermal cameras[J]. Sensors, 2017, 17(3): E605.
[23] WANG G, ZHANG T, YANG Y, et al. Cross-modality paired-images generation for RGB-infrared person re-identification[C]//The AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 12144-12151.
[24] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2009: 248-255.
[25] KINGMA D, BA J. Adam: a method for stochastic optimization[EB/OL]. [2021-12-08]. https://arxiv.org/abs/1412.6980.
[26] YE M, LAN X, LENG Q, et al. Cross-modality person re-identification via modality-aware collaborative ensemble learning[J]. IEEE Transactions on Image Processing, 2020, 29: 9387-9399.
[27] LU Y, WU Y, LIU B, et al. Cross-modality person re-identification with shared-specific feature transfer[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 13376-13386.
[28] YE M, RUAN W J, DU B, et al. Channel augmented joint learning for visible-infrared recognition[C]//2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 13567-13576.
[29] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[J]. International Journal of Computer Vision, 2020, 128(2): 336-359.