
图学学报

• Computer Graphics and Virtual Reality •

一种面向无人机视频的多尺度摘要的设计与实现

  

  (1. 中国石化销售有限公司华南分公司,广东 广州 510000;
    2. 中国科学院大学计算机科学与技术学院,北京 100190;
    3. 中国科学院软件研究所人机交互北京市重点实验室,北京 100190)
  • 出版日期:2020-04-30 发布日期:2020-05-15
  • Funding:
    National Natural Science Foundation of China project (2018YFC0809303)

Design and implementation of a multi-scale summarization for unmanned aerial vehicle videos

  (1. South China Branch of Sinopec Sales Co., Ltd., Guangzhou, Guangdong 510000, China;
    2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100190, China;
    3. Beijing Key Laboratory of Human-Computer Interaction, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China)
  • Online: 2020-04-30 Published: 2020-05-15

摘要: 无人机视频是利用无人机航拍得到的一类重要的视频资源,被广泛运用于地面目标的监测。但是,无人机视频的视野辽阔、不具有目标针对性的拍摄特点,使其存在大量时空冗余,传统的视频交互手段显得十分低效。为此,提出了一种面向无人机视频的多尺度螺旋摘要。首先,基于 YOLOv3 算法,训练能检测无人机视角的行人、车辆等目标的模型。然后,提出了基于关键帧的视频目标检测算法,根据改进后的基于颜色特征的关键帧提取算法提取涵盖视频关键信息的关键帧,并将检测模型应用于关键帧,高效获取整个视频的目标检测结果。之后,从关键帧中提取相应的关键区域,作为摘要的呈现单元,并以螺旋的形式从内向外地将摘要单元逐一呈现,辅以基于关键帧的视频定位和尺度缩放功能。最后,开发了草图注释、目标分布螺旋、双螺旋播放等新颖的交互工具,满足用户的潜在需求,共同实现面向无人机视频的高效交互。

关键词: 无人机, 视频摘要, 视频目标检测, 小目标检测, 螺旋摘要, 视频交互

Abstract: Unmanned aerial vehicle (UAV) videos, an important video resource captured by aerial photography, are widely used for ground target monitoring. However, because UAV videos cover a wide field of view and are not shot with specific targets in mind, they contain a large amount of spatio-temporal redundancy, which makes traditional video interaction methods inefficient. To address this problem, a multi-scale spiral summarization for UAV videos is proposed. First, a detection model based on the YOLOv3 algorithm is trained to detect small targets, such as pedestrians and vehicles, from the UAV's perspective. Then, a key-frame-based video object detection algorithm is proposed: key frames covering the key information of the video are extracted with an improved color-feature-based key-frame extraction algorithm, and the detection model is applied to these key frames to efficiently obtain detection results for the whole video. Next, the key regions are extracted from the key frames as the display units of the summary and are presented one by one in a spiral from the inside out, supported by key-frame-based video positioning and scale zooming. Finally, novel interaction tools, including sketch annotation, an object distribution spiral, and a double-spiral player, are developed to meet users' potential needs and jointly enable efficient interaction with UAV videos.
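
To make two of the steps in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a generic color-histogram-difference rule as a stand-in for the "improved color-feature-based key-frame extraction algorithm" (whose details are not given here), and an Archimedean spiral as one plausible way to place summary units from the inside out. All function names, the `threshold`, `bins`, `spacing` and `turn_gap` parameters, and the toy frame representation (a list of RGB tuples) are hypothetical. The YOLOv3 detection step is not reproduced.

```python
import math


def color_histogram(frame, bins=4):
    """Normalized joint RGB histogram of a frame.

    `frame` is assumed to be a list of (r, g, b) pixels with values in 0..255.
    """
    hist = [0.0] * (bins ** 3)
    for r, g, b in frame:
        # Map each channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    total = float(max(len(frame), 1))
    return [h / total for h in hist]


def histogram_distance(h1, h2):
    """Half the L1 distance between two normalized histograms, in [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def select_key_frames(frames, threshold=0.3):
    """Return indices of key frames.

    Frame 0 is always kept; a later frame becomes a key frame when its color
    histogram differs from that of the last key frame by more than `threshold`
    (a hypothetical tuning parameter).
    """
    if not frames:
        return []
    keys = [0]
    reference = color_histogram(frames[0])
    for i in range(1, len(frames)):
        hist = color_histogram(frames[i])
        if histogram_distance(reference, hist) > threshold:
            keys.append(i)
            reference = hist
    return keys


def spiral_positions(n, spacing=1.0, turn_gap=1.0):
    """Centers of n summary units on an Archimedean spiral, inside out.

    The spiral is r = b * theta with b = turn_gap / (2 * pi). Arc length is
    approximated by s ~ b * theta**2 / 2, so unit i is placed at the angle
    where s = i * spacing. The approximation is loose near the center.
    """
    b = turn_gap / (2.0 * math.pi)
    points = []
    for i in range(n):
        theta = math.sqrt(2.0 * i * spacing / b)
        r = b * theta
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


# Toy usage: three uniformly red frames followed by two uniformly blue frames.
red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
keys = select_key_frames([red, red, red, blue, blue])
layout = spiral_positions(len(keys))
print(keys)  # [0, 3]
```

In a real pipeline the detector would be run only on the frames in `keys`, and each detected key region would be drawn at the corresponding position in `layout`; both of those steps are left out of this sketch.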

Key words: unmanned aerial vehicle, video summarization, video object detection, small object detection, spiral summarization, video interaction