
Journal of Graphics, 2025, Vol. 46, Issue (6): 1281-1291. DOI: 10.11996/JG.j.2095-302X.2025061281

• Image Processing and Computer Vision •

Unsupervised cycle-consistent learning with dynamic memory-augmented for unmanned aerial vehicle videos tracking

XIAO Kai (1, 2), YUAN Ling (1, 2), CHU Jun (1, 2)

  1. Jiangxi Provincial Key Laboratory of Image Processing and Pattern Recognition, Nanchang Hangkong University, Nanchang Jiangxi 330063, China
  2. School of Software Engineering, Nanchang Hangkong University, Nanchang Jiangxi 330063, China
  • Received: 2025-03-06 Accepted: 2025-05-07 Published: 2025-12-30 Online: 2025-12-27
  • Corresponding author: CHU Jun (1967-), female, professor, Ph.D. Her main research interests cover computer vision, image processing, and deep learning. E-mail: chuj@nchu.edu.cn
  • First author: XIAO Kai (1999-), master's student. His main research interests cover object tracking and unsupervised learning. E-mail: xiaok9900@163.com
  • Supported by:
    Innovative Special Fund for Graduate Students in Jiangxi Province (YC2023-S747)

Abstract:

Unmanned aerial vehicle (UAV) video datasets are costly to collect, and existing ones are limited in scale and quality and cover only a narrow range of scenarios. Moreover, existing unsupervised object trackers are usually designed for general-purpose datasets and struggle to learn reliable information from complex UAV scenes. To address these issues, an unsupervised UAV object tracking model based on temporal cycle consistency and dynamic memory augmentation was proposed. First, salient object detection was introduced for unlabeled object discovery and combined with unsupervised optical flow, and dynamic programming based on image entropy was incorporated to improve the quality of the pseudo-labels. Second, a weight was defined for each frame in the video, and these weights were used for single-frame training so that the information in every frame was exploited more fully. Finally, inspired by long short-term memory (LSTM) networks, the memory queue was turned into a dynamic memory queue. A self-attention branch was designed as the gating mechanism of the queue, controlling what it remembers and forgets, so that changes in target features over long time spans were learned without increasing the queue length. The proposed method achieved 68% accuracy on UAV datasets, outperforming other unsupervised trackers and matching the performance of typical supervised trackers. On general-scene datasets it performed comparably to other unsupervised trackers, reaching 77% accuracy. Experimental results on both UAV and general-scene datasets showed that the method notably improved performance in scenarios with fast motion and large scale variation.
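
The abstract states that dynamic programming based on image entropy refines the pseudo-labels, but it does not give the formulation. The Python sketch below shows one generic way dynamic programming can clean up per-frame box proposals, assuming each frame has a few candidate boxes (for example from saliency and optical-flow proposals) that each carry a quality score. A Viterbi-style pass picks one box per frame so that the summed score is high while consecutive boxes overlap well. The functions `shannon_entropy`, `iou`, and `select_track`, the `smooth` penalty weight, and the idea of folding region entropy into the candidate scores are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (bits) of a list of 8-bit intensities.

    Hypothetical helper: one possible ingredient of a candidate's score.
    """
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_track(candidates, scores, smooth=1.0):
    """Pick one candidate per frame maximizing
    sum(scores) - smooth * sum(1 - IoU(consecutive boxes)).

    candidates[t] is a list of boxes for frame t and scores[t][k] is the
    (assumed) quality score of candidates[t][k]. Returns one index per frame.
    """
    best = [list(scores[0])]           # best[t][k]: best total ending at (t, k)
    back = [[-1] * len(candidates[0])]  # back-pointers for the traceback
    for t in range(1, len(candidates)):
        row, ptr = [], []
        for k, box in enumerate(candidates[t]):
            # Score of arriving at box k from every candidate of frame t - 1.
            trans = [best[t - 1][j] - smooth * (1.0 - iou(prev, box))
                     for j, prev in enumerate(candidates[t - 1])]
            j_best = max(range(len(trans)), key=trans.__getitem__)
            row.append(trans[j_best] + scores[t][k])
            ptr.append(j_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    k = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [k]
    for t in range(len(candidates) - 1, 0, -1):
        k = back[t][k]
        path.append(k)
    return path[::-1]
```

For T frames with K candidates each, the pass costs O(T·K²). With `smooth=0.0` it reduces to picking each frame's highest-scoring box independently; raising `smooth` trades per-frame score for temporal consistency, which is what keeps a pseudo-label from jumping to a distractor in a single frame.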
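
The abstract does not say how the per-frame training weights are computed. A common signal in cycle-consistent tracking is the forward-backward error: track the target forward through a clip, track it back to the first frame, and measure how far the returned position drifts from the start. The sketch below assumes such an error is available for each frame, maps the errors to normalized weights with an exponential kernel, and uses the weights in a per-frame weighted loss. `cycle_errors`, `frame_weights`, `weighted_loss`, and the `sigma` bandwidth are hypothetical names and choices.

```python
import math


def cycle_errors(start_center, returned_centers):
    """Distance between the initial target center and the center recovered
    after tracking forward to each frame and then back (forward-backward error)."""
    return [math.dist(start_center, c) for c in returned_centers]


def frame_weights(errors, sigma=5.0):
    """Map cycle errors to weights that sum to 1 (assumed exponential kernel):
    frames whose cycle closes tightly get larger weights."""
    raw = [math.exp(-e / sigma) for e in errors]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_loss(per_frame_losses, weights):
    """Weighted sum of single-frame losses."""
    return sum(w * l for w, l in zip(weights, per_frame_losses))
```

Because the weights are normalized, a clip in which every frame has the same loss yields that same loss regardless of the weights, while unreliable frames (large cycle error) contribute less to the gradient.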
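
The dynamic memory queue described in the abstract keeps a fixed length while a self-attention branch gates what is remembered and forgotten. The sketch below is a parameter-free stand-in written under stated assumptions: the learned self-attention branch is replaced by cosine-similarity attention of the current feature over the stored slots, and the gate is a hypothetical sigmoid of how poorly the attention read-out explains the current feature. For brevity the forget and input gates are coupled (GRU-style, forget = 1 - input) rather than separate as in an LSTM. `GatedMemoryQueue` and its `temperature`, `threshold`, and `sharpness` parameters are illustrative names, not the authors' implementation.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cos(a, b, eps=1e-12):
    return _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)) + eps)


def _softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


class GatedMemoryQueue:
    """Fixed-length feature memory with a sigmoid write gate (illustrative sketch)."""

    def __init__(self, capacity, temperature=0.1, threshold=0.2, sharpness=10.0):
        self.capacity = capacity
        self.temperature = temperature  # softmax temperature for attention
        self.threshold = threshold      # novelty at which the gate equals 0.5
        self.sharpness = sharpness      # slope of the sigmoid gate
        self.slots = []

    def update(self, feat):
        """Write `feat` into memory; return the gate used (1.0 while filling)."""
        if len(self.slots) < self.capacity:
            self.slots.append(list(feat))
            return 1.0
        # Attention of the current feature over the stored slots.
        sims = [_cos(feat, m) for m in self.slots]
        attn = _softmax([s / self.temperature for s in sims])
        # Attention read-out: how well the memory already explains `feat`.
        readout = [sum(a * m[d] for a, m in zip(attn, self.slots))
                   for d in range(len(feat))]
        novelty = 1.0 - _cos(feat, readout)
        gate = 1.0 / (1.0 + math.exp(-self.sharpness * (novelty - self.threshold)))
        # Coupled forget/input gate: blend into the least-attended slot,
        # so the queue length never grows.
        j = min(range(len(attn)), key=attn.__getitem__)
        self.slots[j] = [(1.0 - gate) * old + gate * new
                         for old, new in zip(self.slots[j], feat)]
        return gate
```

A familiar feature yields a small gate, so the least-relevant slot is only nudged, while a genuinely new appearance yields a gate near 1 and largely overwrites it. Memory size stays at `capacity` no matter how many frames are processed, which is the property the abstract emphasizes.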

Key words: object tracking, unmanned aerial vehicle, unsupervised learning, attention mechanism, Siamese network
