| [1] |
GENG Y, TAN H C, LI J H, et al. Visual information accumulation network for person re-identification[J]. Journal of Graphics, 2022, 43(6): 1193-1200 (in Chinese).
|
| [2] |
ZHANG Y P, WANG H Y, ZHANG J, et al. One-shot video-based person re-identification based on neighborhood center iteration strategy[J]. Journal of Software, 2021, 32(12): 4025-4035 (in Chinese).
|
| [3] |
YANG W J, WANG W M, WANG Q Y, et al. Image retrieval method based on perceptual hash algorithm and bag of visual words[J]. Journal of Graphics, 2019, 40(3): 519-524 (in Chinese).
|
| [4] |
LI S, XIAO T, LI H S, et al. Person search with natural language description[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 1970-1979.
|
| [5] |
SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2014-09-14) [2025-08-18]. https://arxiv.org/abs/1409.1556.
|
| [6] |
HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
|
| [7] |
GRAVES A. Long short-term memory[M]//GRAVES A. Supervised Sequence Labelling with Recurrent Neural Networks. Heidelberg: Springer, 2012: 37-45.
|
| [8] |
DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Albuquerque: ACL, 2019: 4171-4186.
|
| [9] |
LI J N, SELVARAJU R R, GOTMARE A D, et al. Align before fuse: vision and language representation learning with momentum distillation[C]// The 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 742.
|
| [10] |
LI J N, LI D X, XIONG C M, et al. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation[J/OL]. [2025-08-17]. https://proceedings.mlr.press/v162/li22n.html.
|
| [11] |
ZHANG Y, LU H C. Deep cross-modal projection learning for image-text matching[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 686-701.
|
| [12] |
ZHENG Z D, ZHENG L, GARRETT M, et al. Dual-path convolutional image-text embeddings with instance loss[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2020, 16(2): 51.
|
| [13] |
JIANG D, YE M. Cross-modal implicit relation reasoning and aligning for text-to-image person retrieval[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 2787-2797.
|
| [14] |
BAI Y, CAO M, GAO D M, et al. RaSa: relation and sensitivity aware representation learning for text-based person search[EB/OL]. [2025-08-17]. https://dl.acm.org/doi/10.24963/ijcai.2023/62.
|
| [15] |
YANG S Y, ZHOU Y N, ZHENG Z D, et al. Towards unified text-based person retrieval: a large-scale multi-attribute and language search benchmark[C]// The 31st ACM International Conference on Multimedia. New York: ACM, 2023: 4492-4501.
|
| [16] |
PARK J, KIM D, JEONG B, et al. PLOT: text-based person search with part slot attention for corresponding part discovery[C]// The 18th European Conference on Computer Vision. Cham: Springer, 2025: 474-490.
|
| [17] |
LOCATELLO F, WEISSENBORN D, UNTERTHINER T, et al. Object-centric learning with slot attention[C]// The 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 967.
|
| [18] |
SWAIN M J, BALLARD D H. Color indexing[J]. International Journal of Computer Vision, 1991, 7(1): 11-32.
|
| [19] |
STRICKER M A, ORENGO M. Similarity of color images[C]// SPIE 2420, Storage and Retrieval for Image and Video Databases III. Bellingham: SPIE, 1995: 381-392.
|
| [20] |
ZHANG R, ISOLA P, EFROS A A. Colorful image colorization[C]// The 14th European Conference on Computer Vision. Cham: Springer, 2016: 649-666.
|
| [21] |
KANG X Y, YANG T, OUYANG W Q, et al. DDColor: towards photo-realistic image colorization via dual decoders[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 328-338.
|
| [22] |
GOMEZ-VILLA A, HERNÁNDEZ-CÁMARA P, BUTT M A, et al. Color names in vision-language models[EB/OL]. [2025-09-26]. https://arxiv.org/abs/2509.22524.
|
| [23] |
BAI J Z, BAI S, CHU Y F, et al. Qwen technical report[EB/OL]. [2025-09-28]. https://arxiv.org/abs/2309.16609.
|
| [24] |
WANG W H, BAO H B, HUANG S H, et al. MiniLMv2: multi-head self-attention relation distillation for compressing pretrained transformers[C]// Findings of the Association for Computational Linguistics. Albuquerque: ACL, 2021: 2140-2151.
|
| [25] |
ZUO J L, ZHOU H Y, NIE Y, et al. UFineBench: towards text-based person retrieval with ultra-fine granularity[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 22010-22019.
|
| [26] |
LIN D X, PENG Y X, MENG J K, et al. Cross-modal adaptive dual association for text-to-image person retrieval[J]. IEEE Transactions on Multimedia, 2024, 26: 6609-6620.
|
| [27] |
QIN Y, CHEN C, FU Z H, et al. Human-centered interactive learning via MLLMs for text-to-image person re-identification[C]// 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2025: 14390-14399.
|
| [28] |
SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 618-626.
|