Journal of Graphics ›› 2023, Vol. 44 ›› Issue (1): 33-40. DOI: 10.11996/JG.j.2095-302X.2023010033
Cross modality person re-identification based on residual enhanced attention

SHAO Wen-bin, LIU Yu-jie, SUN Xiao-rui, LI Zong-min
Received: 2022-04-26
Revised: 2022-06-13
Online: 2023-10-31
Published: 2023-02-16
Contact: LIU Yu-jie
About author: SHAO Wen-bin (1998-), master student. His main research interests cover person re-identification and object detection. E-mail: wbShao@s.upc.edu.cn

Abstract:
Cross-modality person re-identification faces two main problems: ① the modality discrepancy between infrared and visible images caused by their different imaging mechanisms, and ② the intra-class variation caused by insufficient identity discriminability of image features. To address these problems, a cross-modality person re-identification method based on residual enhanced attention is proposed to improve both the modality invariance and the identity discriminability of pedestrian features. First, a dual-path convolutional neural network with modality-independent parameters in the shallow layers and shared parameters in the deep layers is designed as the backbone. Then, the global-weakening problem of existing attention mechanisms is analyzed, and a residual enhanced attention method is designed to solve it and improve attention performance; the improved attention is applied to the channel dimension in the shallow layers and to spatial positions in the deep layers, strengthening the model's ability to eliminate modality discrepancy and to discriminate pedestrian identities. Experiments on the SYSU-MM01 and RegDB datasets demonstrate the competitiveness of the method, and extensive comparison experiments further confirm its effectiveness.
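The central mechanism lends itself to a short illustration. Below is a minimal PyTorch sketch of residual-enhanced attention as described in the abstract: the attended features are added back onto the input (x + x·w) rather than replacing it (x·w), which is how the global-weakening problem is countered. The module names, the SE/CBAM-style gating, and the hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class ResidualChannelAttention(nn.Module):
    """SE-style channel attention with residual enhancement (assumed form).

    Applied in the shallow layers per the abstract; a 'raw' attention
    would return x * w, the residual variant returns x + x * w.
    """
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)  # per-channel gate in (0, 1)
        return x + x * w  # residual enhancement keeps the un-attended global signal


class ResidualSpatialAttention(nn.Module):
    """CBAM-style spatial attention with the same residual enhancement,
    applied in the deep layers per the abstract."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Summarize channels by mean and max, then predict a per-position gate.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.max(dim=1, keepdim=True).values], dim=1)
        w = torch.sigmoid(self.conv(s))  # (B, 1, H, W)
        return x + x * w
```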
SHAO Wen-bin, LIU Yu-jie, SUN Xiao-rui, LI Zong-min. Cross modality person re-identification based on residual enhanced attention[J]. Journal of Graphics, 2023, 44(1): 33-40.
Fig. 2 Dual-stream network ((a) Parameter-independent dual-stream network; (b) Dual-stream network used in this paper)
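For the backbone of Fig. 2(b), the following is a hedged sketch of a two-stream network whose shallow stage is modality-specific while the deep stages share parameters. The use of ResNet-50 and the split after layer1 are assumptions for illustration (the page cites ResNet [19] and ImageNet pretraining [24]); the exact split point is not specified here.

```python
import torch.nn as nn
from torchvision.models import resnet50


class DualStreamBackbone(nn.Module):
    """Fig. 2(b): independent shallow stems, shared deep stages (a sketch)."""
    def __init__(self):
        super().__init__()

        def make_stem() -> nn.Sequential:
            r = resnet50(weights=None)  # load ImageNet weights in practice (cf. [24])
            return nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool, r.layer1)

        self.visible_stem = make_stem()   # parameters independent per modality
        self.infrared_stem = make_stem()
        r = resnet50(weights=None)
        self.shared = nn.Sequential(r.layer2, r.layer3, r.layer4)  # shared deep layers

    def forward(self, x, modality: str):
        stem = self.visible_stem if modality == "visible" else self.infrared_stem
        return self.shared(stem(x))  # (B, 2048, H/32, W/32) feature map
```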
| Method | Year | All-search (%) | | | | | | | | Indoor-search (%) | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | Single-shot | | | | Multi-shot | | | | Single-shot | | | | Multi-shot | | | |
| | | R-1 | R-10 | R-20 | mAP | R-1 | R-10 | R-20 | mAP | R-1 | R-10 | R-20 | mAP | R-1 | R-10 | R-20 | mAP |
| HOG | 2005 | 2.76 | 18.30 | 31.90 | 4.24 | 3.82 | 22.80 | 37.60 | 2.16 | 3.22 | 24.70 | 44.50 | 7.25 | 4.75 | 29.20 | 49.40 | 3.51 |
| LOMO | 2015 | 3.64 | 23.20 | 37.30 | 4.53 | 4.70 | 28.20 | 43.10 | 2.28 | 5.75 | 34.40 | 54.90 | 10.20 | 7.36 | 40.40 | 60.30 | 5.64 |
| Zero-Padding | 2017 | 14.80 | 54.10 | 71.30 | 15.90 | 19.10 | 61.40 | 78.40 | 10.90 | 20.60 | 68.40 | 85.80 | 26.90 | 24.40 | 75.90 | 91.30 | 18.60 |
| D-HSME | 2019 | 20.68 | 62.74 | 77.95 | 23.12 | - | - | - | - | - | - | - | - | - | - | - | - |
| cmGAN | 2018 | 27.00 | 67.50 | 80.60 | 27.80 | 31.50 | 72.70 | 85.00 | 22.30 | 31.60 | 77.20 | 89.20 | 42.20 | 37.00 | 80.90 | 92.10 | 32.80 |
| eBDTR | 2020 | 27.82 | 67.34 | 81.34 | 28.42 | - | - | - | - | - | - | - | - | - | - | - | - |
| D2RL | 2019 | 28.90 | 70.60 | 82.40 | 29.20 | - | - | - | - | - | - | - | - | - | - | - | - |
| JSIA-ReID | 2020 | 38.10 | 80.70 | 89.90 | 36.90 | 45.10 | 85.70 | 93.80 | 29.50 | 43.80 | 86.20 | 94.20 | 52.90 | 52.70 | 91.10 | 96.40 | 42.70 |
| AlignGAN | 2019 | 42.40 | 85.00 | 93.70 | 40.70 | 51.50 | 89.40 | 95.70 | 33.90 | 45.90 | 87.60 | 94.40 | 54.30 | 57.10 | 92.70 | 97.40 | 45.30 |
| X modality | 2020 | 49.92 | 89.79 | 95.96 | 50.73 | - | - | - | - | - | - | - | - | - | - | - | - |
| MACE | 2020 | 51.64 | 87.25 | 94.44 | 50.11 | - | - | - | - | 57.35 | 93.02 | 97.47 | 64.79 | - | - | - | - |
| DDAG | 2020 | 54.75 | 90.39 | 95.81 | 53.02 | - | - | - | - | 61.02 | 94.06 | 98.41 | 67.98 | - | - | - | - |
| SIM | 2020 | 56.93 | - | - | 60.88 | - | - | - | - | - | - | - | - | - | - | - | - |
| NFS | 2021 | 56.91 | 91.34 | 96.52 | 55.45 | 63.51 | 94.42 | 97.81 | 48.56 | 62.79 | 96.53 | 99.07 | 69.79 | 70.03 | 97.70 | 99.51 | 61.45 |
| CICL | 2021 | 57.20 | 94.30 | 98.40 | 59.30 | 60.70 | 95.20 | 98.60 | 52.60 | 66.60 | 98.80 | 99.70 | 74.70 | 73.80 | 99.40 | 99.90 | 68.30 |
| cm-SSFT | 2020 | 61.60 | 89.20 | 93.90 | 63.20 | 63.40 | 91.20 | 95.70 | 62.00 | 70.50 | 94.90 | 97.70 | 72.60 | 73.00 | 96.30 | 99.10 | 72.40 |
| CANet | 2021 | 69.88 | 95.71 | 98.46 | 66.89 | - | - | - | - | 76.26 | 97.88 | 99.49 | 80.37 | - | - | - | - |
| Ours | 2022 | 68.53 | 95.80 | 97.87 | 66.27 | 66.81 | 95.63 | 98.17 | 62.80 | 74.65 | 97.02 | 99.21 | 79.33 | 78.91 | 97.53 | 99.14 | 75.27 |
Table 1 Comparison with state-of-the-art methods on the SYSU-MM01 dataset
| Method | Rank-1 | mAP |
|---|---|---|
| Baseline | 58.17 | 56.69 |
| Raw-SpaceAttention | 65.67 | 62.50 |
| Raw-ChannelAttention | 63.43 | 62.41 |
| Raw-DualAttention | 69.53 | 67.52 |
| Residual-SpaceAttention | 68.34 | 66.57 |
| Residual-ChannelAttention | 67.80 | 66.23 |
| Residual-DualAttention | 68.53 | 66.27 |
Table 3 Ablation study of residual enhanced attention on SYSU-MM01 (single-shot & all-search, %)
| Method | Rank-1 | mAP |
|---|---|---|
| Single-stream network with shared parameters | 54.33 | 51.19 |
| Dual-stream network with fully independent parameters | 19.50 | 44.70 |
| Network structure in this paper | 58.17 | 56.69 |
Table 4 Ablation study of network structure on SYSU-MM01 (single-shot & all-search, %)
Fig. 5 Spatial attention response maps ((a) Spatial attention based on residual enhancement; (b) Existing spatial attention)
[1] YE M, SHEN J, LIN G, et al. Deep learning for person re-identification: a survey and outlook[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(6): 2872-2893.
[2] YANG W J, WANG W M, WANG Q Y, et al. Image retrieval method based on perceptual hash algorithm and bag of visual words[J]. Journal of Graphics, 2019, 40(3): 519-524 (in Chinese).
[3] ZHAO H, TIAN M, SUN S, et al. Spindle net: person re-identification with human body region guided feature decomposition and fusion[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 1077-1085.
[4] GE Y X, ZHU F, CHEN D P, et al. Self-paced contrastive learning with hybrid memory for domain adaptive object re-id[EB/OL]. [2021-12-08]. https://blog.csdn.net/NGUever15/article/details/120556059.
[5] CHEN H, WANG Y, LAGADEC B. Joint generative and contrastive learning for unsupervised person re-identification[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 2004-2013.
[6] WU A, ZHENG W, YU H, et al. RGB-infrared cross-modality person re-identification[C]//2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 5380-5389.
[7] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//2018 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7132-7141.
[8] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//2018 European Conference on Computer Vision. Cham: Springer International Publishing, 2018: 3-19.
[9] DAI P Y, JI R R, WANG H, et al. Cross-modality person re-identification with generative adversarial training[EB/OL]. [2021-12-08]. https://www.ijcai.org/Proceedings/2018/0094.pdf.
[10] WANG Z X, WANG Z, ZHENG Y. Learning to reduce dual-level discrepancy for infrared-visible person re-identification[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 618-626.
[11] WANG G, ZHANG T, CHENG J, et al. RGB-infrared cross-modality person re-identification via joint pixel and feature alignment[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 3623-3632.
[12] ZHAO Z, LIU B, CHU Q, et al. Joint color-irrelevant consistency learning and identity-aware modality adaptation for visible-infrared cross modality person re-identification[C]//The AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 3520-3528.
[13] YE M, LAN X, WANG Z, et al. Bi-directional center-constrained top-ranking for visible thermal person re-identification[J]. IEEE Transactions on Information Forensics and Security, 2020, 15: 407-419.
[14] HAO Y, WANG N, LI J, et al. HSME: hypersphere manifold embedding for visible thermal person re-identification[C]//The AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2019: 8385-8392.
[15] JIA M X, ZHAI Y P, LU S J, et al. A similarity inference metric for RGB-infrared cross-modality person re-identification[EB/OL]. [2021-12-08]. https://arxiv.org/abs/2007.01504.
[16] LI D, WEI X, HONG X, et al. Infrared-visible cross-modal person re-identification with an X modality[C]//The AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 4610-4617.
[17] YE M, SHEN J, CRANDALL D, et al. Dynamic dual-attentive aggregation learning for visible-infrared person re-identification[C]//2020 European Conference on Computer Vision. Cham: Springer International Publishing, 2020: 229-247.
[18] CHEN Y, WAN L, LI Z, et al. Neural feature search for RGB-infrared person re-identification[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 587-597.
[19] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[EB/OL]. [2021-12-08]. https://blog.csdn.net/toda666/article/details/80384915.
[20] LUO H, GU Y, LIAO X, et al. Bag of tricks and a strong baseline for deep person re-identification[EB/OL]. [2021-12-08]. https://openaccess.thecvf.com/content_CVPRW_2019/papers/TRMTMCT/Luo_Bag_of_Tricks_and_a_Strong_Baseline_for_Deep_Person_CVPRW_2019_paper.pdf.
[21] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2005: 886-893.
[22] NGUYEN D T, HONG H G, KIM K W, et al. Person recognition system based on a combination of body images from visible light and thermal cameras[J]. Sensors, 2017, 17(3): E605.
[23] WANG G, ZHANG T, YANG Y, et al. Cross-modality paired-images generation for RGB-infrared person re-identification[C]//The AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 12144-12151.
[24] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2009: 248-255.
[25] KINGMA D, BA J. Adam: a method for stochastic optimization[EB/OL]. [2021-12-08]. https://arxiv.org/abs/1412.6980.
[26] YE M, LAN X, LENG Q, et al. Cross-modality person re-identification via modality-aware collaborative ensemble learning[J]. IEEE Transactions on Image Processing, 2020, 29: 9387-9399.
[27] LU Y, WU Y, LIU B, et al. Cross-modality person re-identification with shared-specific feature transfer[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 13376-13386.
[28] YE M, RUAN W J, DU B, et al. Channel augmented joint learning for visible-infrared recognition[C]//2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 13567-13576.
[29] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[J]. International Journal of Computer Vision, 2020, 128(2): 336-359.