
Journal of Graphics, 2025, Vol. 46, Issue (4): 739-745. DOI: 10.11996/JG.j.2095-302X.2025040739

• Image Processing and Computer Vision •

Intelligent depiction to illumination and shadow: robust video shadow extraction based on SAM

CHEN Dong, LI Changlong, DU Zhenlong, SONG Shuang, LI Xiaoli

  1. College of Computer and Information Engineering (College of Artificial Intelligence), Nanjing Tech University, Nanjing, Jiangsu 211816, China
  • Received: 2024-08-30 Revised: 2025-01-05 Published: 2025-08-30 Online: 2025-08-11
  • Corresponding author: DU Zhenlong (1971-), professor, Ph.D. His main research interests cover computer graphics and computer vision. E-mail: duzhl-cad@163.com
  • First author: CHEN Dong (1978-), lecturer, master. His main research interests cover computer graphics and computer vision. E-mail: chendong@njtech.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62202221); National Natural Science Foundation of China (61672279)

Abstract:

Traditional methods handle the complex, dynamically changing shadows caused by lighting variations and object occlusions poorly, which leads to low accuracy and robustness in shadow detection. To address this, a video shadow detection method based on the Segment Anything Model (SAM) is proposed. The SAM decoder is fine-tuned to better adapt it to shadow detection, and the fine-tuned SAM is used to extract shadow regions in key frames. The XMem model, which combines sensory memory, short-term memory, and long-term memory, is then introduced to integrate information from preceding and following frames, thereby optimizing and stabilizing the video shadow detection results. Experimental results on the ViSha dataset show that, compared with traditional methods, the proposed method reduces the mean absolute error (MAE) by approximately 31.8% and improves the intersection over union (IoU) by approximately 19.7%. Both qualitative and quantitative results indicate that the method improves the accuracy of video shadow detection and exhibits good robustness.

Key words: video shadow detection, semantic segmentation, video object segmentation (VOS), SAM, XMem
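
The abstract reports results as mean absolute error and intersection over union. As a reading aid, the following is a minimal sketch of how these two metrics are commonly computed for a single shadow mask, assuming predictions are probability maps in [0, 1] and ground truth is a binary mask. The function names, the 0.5 threshold, the empty-mask convention, and the toy arrays are assumptions for illustration only; they are not taken from the paper and do not reproduce its evaluation protocol or its numbers.

```python
import numpy as np


def mean_absolute_error(pred, gt):
    """Mean absolute per-pixel difference between prediction and ground truth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return float(np.mean(np.abs(pred - gt)))


def intersection_over_union(pred, gt, threshold=0.5):
    """IoU of the thresholded prediction against a binary ground-truth mask."""
    pred_mask = np.asarray(pred) >= threshold  # assumed binarization threshold
    gt_mask = np.asarray(gt) >= 0.5
    union = np.logical_or(pred_mask, gt_mask).sum()
    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    return float(intersection / union)


def relative_change(baseline, new):
    """Signed relative change of `new` with respect to `baseline`."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Toy 2x4 frame, invented purely for illustration.
    gt = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
    pred = np.array([[0.1, 0.6, 0.9, 0.8],
                     [0.0, 0.2, 0.7, 0.4]])
    print(f"MAE = {mean_absolute_error(pred, gt):.4f}")
    print(f"IoU = {intersection_over_union(pred, gt):.2f}")
    # Toy numbers: an error dropping from 0.10 to 0.08 is a -20% relative change.
    print(f"relative change = {relative_change(0.10, 0.08):+.1%}")
```

Percentages such as "reduced by approximately 31.8%" are relative changes of this kind, measured against the baseline's score rather than as absolute differences. For a video, per-frame values would typically be aggregated over all frames; the exact aggregation used in the paper is not stated in the abstract.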
