
Journal of Graphics ›› 2024, Vol. 45 ›› Issue (4): 670-682. DOI: 10.11996/JG.j.2095-302X.2024040670

• Image Processing and Computer Vision •

A network based on the homogeneous middle modality for cross-modality person re-identification

LUO Zhihui1, HU Haitao1,2, MA Xiaofeng1, CHENG Wengang1,2

  1. School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
    2. Engineering Research Center of Intelligent Computing for Complex Energy Systems, Ministry of Education, Baoding, Hebei 071003, China
  • Received: 2024-03-07 Accepted: 2024-06-20 Published: 2024-08-31 Online: 2024-09-03
  • Contact: CHENG Wengang (1977-), associate professor, Ph.D. His main research interest is multimedia information processing. E-mail: wgcheng@ncepu.edu.cn
  • First author: LUO Zhihui (1999-), master's student. His main research interest is cross-modality person re-identification. E-mail: zhluo@ncepu.edu.cn
  • Supported by:
    National Key R&D Program of China (2023YFB3812100); Project of the Education Management Information Center, Ministry of Education (MOE-CIEM-20240013)

Abstract:

Visible-infrared cross-modality person re-identification (VI-ReID) aims to retrieve and match visible and infrared images of the same person captured by different cameras. Besides the intra-modality discrepancies caused by factors such as viewpoint, pose, and scale variations, which also affect single-modality person re-identification, the modality discrepancy between visible and infrared images is the main challenge of VI-ReID. Existing methods usually perform joint feature learning on the two modalities to reduce the modality discrepancy, while ignoring the essential differences in the imaging mechanisms of cross-modality images. To address this, this paper narrowed the discrepancy between the modalities by jointly generating an intermediate modality from both of them, and optimized feature embedding learning on a standard vision transformer (ViT) network through the fusion of local and global features. A feature fusion network based on the homogeneous middle modality (H-modality) was proposed for VI-ReID. First, an H-modality generator was designed: a parameter-sharing encoder-decoder constrained by a distribution consistency loss that draws the generated images closer in feature space. By jointly generating H-modality images from visible and infrared images, the images of all three modalities were projected into a unified feature space and jointly constrained, thereby reducing the discrepancy between the visible and infrared modalities and achieving image-level alignment. On this basis, an H-modality-based Transformer method for VI-ReID was proposed, in which ViT extracts global features and an additional local branch enhances the network's local perception capability. In global feature extraction, a head enrich module was introduced to push the multiple heads of the class token in the last Transformer block to aggregate diverse patterns. The method fused global and local features, improving the model's discriminative ability. The effect of each improvement was investigated through ablation experiments, in which different combinations of the sliding window, H-modality, local feature, and global feature enhancements were applied to the baseline ViT model; each improvement led to performance gains, demonstrating the effectiveness of the proposed design. The proposed method achieved rank-1/mAP of 67.68%/64.37% and 86.16%/79.11% on the SYSU-MM01 and RegDB datasets, respectively, outperforming most state-of-the-art methods. Overall, the H-modality effectively reduces the modality discrepancy between visible and infrared images, and the feature fusion network yields more discriminative features.
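
One way to picture the image-level alignment step described above is the following minimal PyTorch-style sketch of a parameter-sharing encoder-decoder generator with a simple distribution consistency loss. All names (HModalityGenerator, distribution_consistency_loss), the layer sizes, and the exact loss form are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of the H-modality generator: a single
# parameter-sharing encoder-decoder maps both a visible and an infrared
# image into one homogeneous middle modality, so the two generated
# images land in a shared image space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HModalityGenerator(nn.Module):
    def __init__(self, channels: int = 3, hidden: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The same weights process both modalities (parameter sharing).
        return self.decoder(self.encoder(x))

def distribution_consistency_loss(h_vis: torch.Tensor, h_ir: torch.Tensor) -> torch.Tensor:
    # One plausible form of a distribution consistency loss: pull the
    # channel-wise statistics of the two generated images together.
    mu_v, mu_i = h_vis.mean(dim=(2, 3)), h_ir.mean(dim=(2, 3))
    sd_v, sd_i = h_vis.std(dim=(2, 3)), h_ir.std(dim=(2, 3))
    return F.mse_loss(mu_v, mu_i) + F.mse_loss(sd_v, sd_i)

# Usage: infrared frames are single-channel, so replicate to three
# channels before the shared generator; the generated H-modality images
# are then fed to the backbone together with the original two modalities.
gen = HModalityGenerator()
vis = torch.randn(2, 3, 256, 128)                    # visible RGB batch
ir = torch.randn(2, 1, 256, 128).repeat(1, 3, 1, 1)  # IR replicated to 3 channels
h_vis, h_ir = gen(vis), gen(ir)
loss = distribution_consistency_loss(h_vis, h_ir)
```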
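
The local branch and the global-local fusion can likewise be sketched as stripe pooling over the ViT patch tokens followed by concatenation with the class token. The 16×8 patch grid, the four stripes, and the name fuse_global_local are assumptions made for illustration, not details taken from the paper.

```python
# Hypothetical sketch of a local branch over ViT outputs: split the
# patch tokens into horizontal stripes, average-pool each stripe into a
# part-level feature, and concatenate with the global class token.
import torch

def fuse_global_local(tokens: torch.Tensor, grid_h: int = 16, grid_w: int = 8,
                      num_stripes: int = 4) -> torch.Tensor:
    # tokens: (B, 1 + grid_h * grid_w, D) ViT output, class token first.
    b, _, d = tokens.shape
    cls, patches = tokens[:, 0], tokens[:, 1:]
    patches = patches.view(b, grid_h, grid_w, d)
    stripes = patches.chunk(num_stripes, dim=1)      # split along image height
    locals_ = [s.mean(dim=(1, 2)) for s in stripes]  # one (B, D) feature per stripe
    return torch.cat([cls] + locals_, dim=1)         # (B, (1 + num_stripes) * D)

# Usage with ViT-Base-like shapes:
tokens = torch.randn(2, 1 + 16 * 8, 768)
fused = fuse_global_local(tokens)                    # (2, 5 * 768)
```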
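
Finally, one plausible reading of the head enrich module is a regularizer that penalizes pairwise similarity between the per-head slices of the last block's class token, pushing different heads to aggregate different patterns. The formulation below (head_diversity_loss) is an assumed illustration; the paper's exact objective may differ.

```python
# Hypothetical head-diversity regularizer: the mean absolute pairwise
# cosine similarity between head slices of the class token, to be added
# to the ReID objective so that heads learn diverse patterns.
import torch
import torch.nn.functional as F

def head_diversity_loss(cls_token: torch.Tensor, num_heads: int) -> torch.Tensor:
    # cls_token: (B, D) class token of the last block, D = num_heads * head_dim.
    b, d = cls_token.shape
    heads = F.normalize(cls_token.view(b, num_heads, d // num_heads), dim=-1)
    sim = torch.matmul(heads, heads.transpose(1, 2))          # (B, H, H) cosine similarities
    off_diag = sim - torch.eye(num_heads, device=sim.device)  # zero out self-similarity
    return off_diag.abs().sum(dim=(1, 2)).mean() / (num_heads * (num_heads - 1))

# Usage with a ViT-Base-like class token (12 heads, 768 dimensions):
cls = torch.randn(4, 768)
reg = head_diversity_loss(cls, num_heads=12)
```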

Key words: person re-identification, cross-modality, Transformer, middle modality, feature fusion

CLC number: