Journal of Graphics ›› 2026, Vol. 47 ›› Issue (2): 275-285. DOI: 10.11996/JG.j.2095-302X.2026020275
• Image Processing and Computer Vision •
ZHOU Tenglong, YANG Wenjie, YIN Shaohua, YU Yuanlong
Received: 2025-10-18
Accepted: 2025-12-05
Online: 2026-04-30
Published: 2026-05-20
Contact: YANG Wenjie
ZHOU Tenglong, YANG Wenjie, YIN Shaohua, YU Yuanlong. Text-to-image person re-identification based on multi-granularity color learning[J]. Journal of Graphics, 2026, 47(2): 275-285.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2026020275
| Model | Venue | CUHK-PEDES R@1/% | CUHK-PEDES R@5/% | CUHK-PEDES R@10/% | CUHK-PEDES mAP/% | ICFG-PEDES R@1/% | ICFG-PEDES R@5/% | ICFG-PEDES R@10/% | ICFG-PEDES mAP/% | RSTPReid R@1/% | RSTPReid R@5/% | RSTPReid R@10/% | RSTPReid mAP/% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IRRA[ | CVPR23 | 73.38 | 89.93 | 93.71 | 66.13 | 63.46 | 80.25 | 85.82 | 38.06 | 63.46 | 80.25 | 85.82 | 38.06 |
| PLOT[ | ECCV24 | 75.28 | 90.42 | 94.12 | ─ | 65.76 | 81.39 | 86.73 | ─ | 65.76 | 81.39 | 86.73 | ─ |
| RaSa[ | IJCAI23 | 76.51 | 90.29 | 94.25 | 69.38 | 65.28 | 80.40 | 85.12 | 41.29 | 65.28 | 80.40 | 85.12 | 41.29 |
| APTM[ | MM23 | 76.53 | 90.04 | 94.15 | 66.91 | 68.51 | 82.99 | 87.56 | 41.22 | 68.51 | 82.99 | 87.56 | 41.22 |
| CFAM[ | CVPR24 | 75.60 | 90.53 | 94.36 | 67.27 | 65.38 | 81.17 | 86.35 | 39.42 | 62.45 | 83.50 | 91.10 | 49.50 |
| CADA[ | TMM24 | 78.37 | 91.57 | 94.58 | 68.87 | 67.81 | 82.34 | 87.14 | 39.85 | 69.60 | 86.75 | 92.40 | 52.74 |
| ICL[ | CVPR25 | 77.91 | 90.27 | 94.14 | 69.13 | 69.02 | 82.45 | 87.36 | 41.21 | 70.55 | 85.95 | 91.65 | 53.68 |
| MGCL | Ours | 78.68 | 91.08 | 94.75 | 70.16 | 70.31 | 83.58 | 87.86 | 44.78 | 70.84 | 87.88 | 93.70 | 55.32 |
Table 1 Performance comparison between MGCL and state-of-the-art methods
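
The R@K and mAP columns in Table 1 are standard retrieval metrics. As a minimal sketch, assuming a precomputed text-to-image similarity matrix and integer identity labels, they could be computed as below. The function name `rank_k_and_map` and its arguments are illustrative and not taken from the paper, whose exact evaluation protocol is not given on this page.

```python
import numpy as np

def rank_k_and_map(sim, query_ids, gallery_ids, ks=(1, 5, 10)):
    """Return ({k: R@k in %}, mAP in %) for a (queries x gallery) score matrix."""
    sim = np.asarray(sim, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)

    # Sort gallery images per query, most similar first.
    order = np.argsort(-sim, axis=1)
    # matches[i, j] is True when the j-th ranked image shares the query identity.
    matches = gallery_ids[order] == query_ids[:, None]

    # Assumption: queries without any correct gallery image are ignored.
    matches = matches[matches.any(axis=1)]

    # R@k: share of queries with at least one correct image in the top k.
    recalls = {k: 100.0 * matches[:, :k].any(axis=1).mean() for k in ks}

    # Average precision: mean of precision values at each correct rank.
    hits_so_far = np.cumsum(matches, axis=1)
    ranks = np.arange(1, matches.shape[1] + 1)
    precision_at_hits = (hits_so_far / ranks) * matches
    ap = precision_at_hits.sum(axis=1) / matches.sum(axis=1)
    return recalls, 100.0 * ap.mean()

# Toy example: 2 text queries, 3 gallery images.
toy_sim = [[0.9, 0.1, 0.5],
           [0.8, 0.3, 0.2]]
toy_recalls, toy_map = rank_k_and_map(toy_sim, [0, 1], [0, 1, 1])
```

In the toy example the first query ranks its only match first, while the second query finds its matches at ranks 2 and 3, so R@1 is 50% and R@5 is 100%.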
| No. | Components enabled (of Bsl, CCM, CPMC, CRD) | CUHK-PEDES R@1/% | CUHK-PEDES mAP/% | ICFG-PEDES R@1/% | ICFG-PEDES mAP/% | RSTPReid R@1/% | RSTPReid mAP/% |
|---|---|---|---|---|---|---|---|
| 1 | √ | 76.52 | 68.72 | 68.02 | 43.28 | 68.50 | 53.16 |
| 2 | √ √ | 77.54 | 69.40 | 69.10 | 43.96 | 69.75 | 54.22 |
| 3 | √ √ | 77.68 | 69.34 | 69.24 | 43.98 | 69.96 | 54.50 |
| 4 | √ √ | 77.47 | 69.28 | 69.07 | 43.58 | 69.62 | 54.37 |
| 5 | √ √ √ √ | 78.68 | 70.16 | 70.31 | 44.78 | 70.84 | 55.32 |
Table 2 Ablation study on the key components of MGCL
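
To make the ablation easier to read, the following sketch computes the R@1 gain of each configuration over row 1. The values are copied from Table 2. Rows are keyed by their number only, and the names `ablation_r1` and `gain_over_row1` are illustrative.

```python
# R@1 (%) per ablation row, copied from Table 2.
ablation_r1 = {
    1: {"CUHK-PEDES": 76.52, "ICFG-PEDES": 68.02, "RSTPReid": 68.50},
    2: {"CUHK-PEDES": 77.54, "ICFG-PEDES": 69.10, "RSTPReid": 69.75},
    3: {"CUHK-PEDES": 77.68, "ICFG-PEDES": 69.24, "RSTPReid": 69.96},
    4: {"CUHK-PEDES": 77.47, "ICFG-PEDES": 69.07, "RSTPReid": 69.62},
    5: {"CUHK-PEDES": 78.68, "ICFG-PEDES": 70.31, "RSTPReid": 70.84},
}

def gain_over_row1(row):
    """R@1 gain in percentage points of `row` relative to row 1."""
    base = ablation_r1[1]
    return {name: round(value - base[name], 2)
            for name, value in ablation_r1[row].items()}

gains = {row: gain_over_row1(row) for row in ablation_r1}
```

With all four components enabled (row 5), the R@1 gain over row 1 is 2.16, 2.29, and 2.34 percentage points on CUHK-PEDES, ICFG-PEDES, and RSTPReid respectively.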
| Model | Color jitter | R@1/% | R@5/% | R@10/% | mAP/% |
|---|---|---|---|---|---|
| MGCL | w/o | 70.84 | 87.88 | 93.70 | 55.32 |
| MGCL | w | 45.05 | 71.30 | 79.55 | 26.26 |
| Bsl | w/o | 68.50 | 86.35 | 91.45 | 53.16 |
| Bsl | w | 41.90 | 67.55 | 76.90 | 24.23 |
Table 3 Ablation study on color jitter
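
The size of the degradation under color jitter can be read off Table 3 with a short calculation. This is only a sketch over the reported numbers. The names `table3` and `jitter_drop` are illustrative, and the way the jitter was applied is not described on this page.

```python
# R@1 and mAP (%) with ("w") and without ("w/o") color jitter, copied from Table 3.
table3 = {
    "MGCL": {"w/o": {"R@1": 70.84, "mAP": 55.32}, "w": {"R@1": 45.05, "mAP": 26.26}},
    "Bsl": {"w/o": {"R@1": 68.50, "mAP": 53.16}, "w": {"R@1": 41.90, "mAP": 24.23}},
}

def jitter_drop(model):
    """Absolute drop in percentage points when color jitter is applied."""
    clean, jittered = table3[model]["w/o"], table3[model]["w"]
    return {metric: round(clean[metric] - jittered[metric], 2) for metric in clean}

drops = {model: jitter_drop(model) for model in table3}
```

The R@1 drop is 25.79 points for MGCL and 26.60 points for Bsl.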
| Model | Parameters/M | Computation/GFLOPs | Inference time/ms |
|---|---|---|---|
| IRRA[ | 194.5 | 13.0 | 13.4 |
| RaSa[ | 210.2 | 58.1 | 19.8 |
| MGCL | 285.6 | 63.5 | 22.5 |
Table 4 Model complexity and inference efficiency
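
The cost of MGCL relative to the other two models follows from the figures in Table 4. The sketch below computes these ratios. The dictionary keys and the helper `relative_cost` are illustrative names.

```python
# Parameters (M), computation (GFLOPs), and inference time (ms), copied from Table 4.
costs = {
    "IRRA": {"params_M": 194.5, "gflops": 13.0, "time_ms": 13.4},
    "RaSa": {"params_M": 210.2, "gflops": 58.1, "time_ms": 19.8},
    "MGCL": {"params_M": 285.6, "gflops": 63.5, "time_ms": 22.5},
}

def relative_cost(model, reference):
    """Ratio of each cost figure of `model` to that of `reference`."""
    return {key: round(costs[model][key] / costs[reference][key], 2)
            for key in costs[model]}

vs_irra = relative_cost("MGCL", "IRRA")
vs_rasa = relative_cost("MGCL", "RaSa")
```

By these figures MGCL takes about 1.68 times the inference time of IRRA and about 1.14 times that of RaSa.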
| [1] | GENG Y, TAN H C, LI J H, et al. Visual information accumulation network for person re-identification[J]. Journal of Graphics, 2022, 43(6): 1193-1200 (in Chinese). |
| [2] | ZHANG Y P, WANG H Y, ZHANG J, et al. One-shot video-based person re-identification based on neighborhood center iteration strategy[J]. Journal of Software, 2021, 32(12): 4025-4035 (in Chinese). |
| [3] | YANG W J, WANG W M, WANG Q Y, et al. Image retrieval method based on perceptual hash algorithm and bag of visual words[J]. Journal of Graphics, 2019, 40(3): 519-524 (in Chinese). |
| [4] | LI S, XIAO T, LI H S, et al. Person search with natural language description[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 1970-1979. |
| [5] | SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2014-09-14) [2025-08-18]. https://arxiv.org/abs/1409.1556. |
| [6] | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778. |
| [7] | GRAVES A. Long short-term memory[M]//GRAVES A. Supervised Sequence Labelling with Recurrent Neural Networks. Heidelberg: Springer, 2012: 37-45. |
| [8] | DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Albuquerque: ACL, 2019: 4171-4186. |
| [9] | LI J N, SELVARAJU R R, GOTMARE A D, et al. Align before fuse: vision and language representation learning with momentum distillation[C]// The 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 742. |
| [10] | LI J N, LI D X, XIONG C M, et al. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation[J/OL]. [2025-08-17]. https://proceedings.mlr.press/v162/li22n.html. |
| [11] | ZHANG Y, LU H C. Deep cross-modal projection learning for image-text matching[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 686-701. |
| [12] | ZHENG Z D, ZHENG L, GARRETT M, et al. Dual-path convolutional image-text embeddings with instance loss[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2020, 16(2): 51. |
| [13] | JIANG D, YE M. Cross-modal implicit relation reasoning and aligning for text-to-image person retrieval[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 2787-2797. |
| [14] | BAI Y, CAO M, GAO D M, et al. RaSa: relation and sensitivity aware representation learning for text-based person search[EB/OL]. [2025-08-17]. https://dl.acm.org/doi/10.24963/ijcai.2023/62. |
| [15] | YANG S Y, ZHOU Y N, ZHENG Z D, et al. Towards unified text-based person retrieval: a large-scale multi-attribute and language search benchmark[C]// The 31st ACM International Conference on Multimedia. New York: ACM, 2023: 4492-4501. |
| [16] | PARK J, KIM D, JEONG B, et al. PLOT: text-based person search with part slot attention for corresponding part discovery[C]// The 18th European Conference on Computer Vision. Cham: Springer, 2025: 474-490. |
| [17] | LOCATELLO F, WEISSENBORN D, UNTERTHINER T, et al. Object-centric learning with slot attention[C]// The 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 967. |
| [18] | SWAIN M J, BALLARD D H. Color indexing[J]. International Journal of Computer Vision, 1991, 7(1): 11-32. |
| [19] | STRICKER M A, ORENGO M. Similarity of color images[C]// SPIE 2420, Storage and Retrieval for Image and Video Databases III. Bellingham: SPIE, 1995: 381-392. |
| [20] | ZHANG R, ISOLA P, EFROS A A. Colorful image colorization[C]// The 14th European Conference on Computer Vision. Cham: Springer, 2016: 649-666. |
| [21] | KANG X Y, YANG T, OUYANG W Q, et al. DDColor: towards photo-realistic image colorization via dual decoders[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 328-338. |
| [22] | GOMEZ-VILLA A, HERNÁNDEZ-CÁMARA P, BUTT M A, et al. Color names in vision-language models[EB/OL]. [2025-09-26]. https://arxiv.org/abs/2509.22524. |
| [23] | BAI J Z, BAI S, CHU Y F, et al. Qwen technical report[EB/OL]. [2025-09-28]. https://arxiv.org/abs/2309.16609. |
| [24] | WANG W H, BAO H B, HUANG S H, et al. MiniLMv2: multi-head self-attention relation distillation for compressing pretrained transformers[C]// Findings of the Association for Computational Linguistics. Albuquerque: ACL, 2021: 2140-2151. |
| [25] | ZUO J L, ZHOU H Y, NIE Y, et al. UFineBench: towards text-based person retrieval with ultra-fine granularity[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 22010-22019. |
| [26] | LIN D X, PENG Y X, MENG J K, et al. Cross-modal adaptive dual association for text-to-image person retrieval[J]. IEEE Transactions on Multimedia, 2024, 26: 6609-6620. |
| [27] | QIN Y, CHEN C, FU Z H, et al. Human-centered interactive learning via MLLMs for text-to-image person re-identification[C]// 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2025: 14390-14399. |
| [28] | SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 618-626. |