Intelligent depiction to illumination and shadow: robust video shadow extraction based on SAM

doi:10.11996/JG.j.2095-302X.2025040739

Journal of Graphics, 2025, Vol. 46, Issue (4): 739-745.

• Image Processing and Computer Vision •

CHEN Dong, LI Changlong, DU Zhenlong, SONG Shuang, LI Xiaoli

College of Computer and Information Engineering (College of Artificial Intelligence), Nanjing Tech University, Nanjing Jiangsu 211816, China

Received: 2024-08-30 Revised: 2025-01-05 Published: 2025-08-30 Online: 2025-08-11
Corresponding author: DU Zhenlong (1971-), male, professor, Ph.D. His main research interests cover computer graphics and computer vision. E-mail: duzhl-cad@163.com
First author: CHEN Dong (1978-), male, lecturer, master. His main research interests cover computer graphics and computer vision. E-mail: chendong@njtech.edu.cn
Supported by:
National Natural Science Foundation of China (62202221); National Natural Science Foundation of China (61672279)

Abstract

Traditional methods struggle with the complex, dynamically changing shadows caused by lighting variations and object occlusions, which lowers the accuracy and robustness of shadow detection. To address this problem, a video shadow detection method based on the Segment Anything Model (SAM) is proposed. The SAM decoder is fine-tuned to better adapt it to shadow detection, and the fine-tuned SAM is used to extract shadow regions in key frames. The XMem model, which combines sensory memory, short-term memory, and long-term memory, is then introduced to integrate information from preceding and subsequent frames, thereby optimizing and stabilizing the video shadow detection results. Experimental results on the ViSha dataset show that, compared with traditional methods, the proposed method reduces the mean absolute error (MAE) by approximately 31.8% and improves the intersection over union (IoU) by approximately 19.7%. Both qualitative and quantitative results indicate that the proposed method not only improves the accuracy of video shadow detection but also exhibits good robustness.

Key words: video shadow detection, semantic segmentation, video object segmentation (VOS), SAM, XMem
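
The flow described in the abstract (segment shadows on key frames, then carry them through the remaining frames with a memory that has sensory, short-term, and long-term parts) can be sketched as simple bookkeeping. This is a toy illustration under stated assumptions, not the authors' code and not the XMem implementation: `ShadowMemory`, `detect_video_shadows`, the `segment_key_frame` and `propagate` callables, the capacity value, and the oldest-first consolidation rule are all hypothetical stand-ins.

```python
from collections import deque
from typing import Callable, List, Optional, Sequence, Set

Frame = List[List[int]]  # grayscale frame as nested lists (assumption)
Mask = List[List[int]]   # binary shadow mask, 1 = shadow


class ShadowMemory:
    """Toy three-part memory; hypothetical, for illustration only."""

    def __init__(self, short_term_capacity: int = 3) -> None:
        self.sensory: Optional[Mask] = None  # most recent mask only
        self.short_term: deque = deque()     # a few recent masks
        self.long_term: List[Mask] = []      # older, consolidated masks
        self.capacity = short_term_capacity

    def update(self, mask: Mask) -> None:
        self.sensory = mask
        self.short_term.append(mask)
        # Assumption: when short-term memory overflows, the oldest entry is
        # moved to long-term memory. Real systems consolidate more carefully.
        while len(self.short_term) > self.capacity:
            self.long_term.append(self.short_term.popleft())


def detect_video_shadows(
    frames: Sequence[Frame],
    key_frames: Set[int],
    segment_key_frame: Callable[[Frame], Mask],
    propagate: Callable[[Frame, ShadowMemory], Mask],
    memory: Optional[ShadowMemory] = None,
) -> List[Mask]:
    """Segment key frames directly and propagate masks to all other frames."""
    if memory is None:
        memory = ShadowMemory()
    masks: List[Mask] = []
    for index, frame in enumerate(frames):
        if index in key_frames or memory.sensory is None:
            # Stand-in for the fine-tuned segmenter applied to a key frame.
            mask = segment_key_frame(frame)
        else:
            # Stand-in for memory-based propagation from earlier frames.
            mask = propagate(frame, memory)
        memory.update(mask)
        masks.append(mask)
    return masks
```

In practice the two callables would wrap the fine-tuned segmentation model and the propagation model; here they can be any functions with matching signatures, which keeps the control flow testable without model weights.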

CLC number: TP391.41

CHEN Dong, LI Changlong, DU Zhenlong, SONG Shuang, LI Xiaoli. Intelligent depiction to illumination and shadow: robust video shadow extraction based on SAM[J]. Journal of Graphics, 2025, 46(4): 739-745.

Figures/Tables (8)

Fig. 1 Video shadow extraction network proposed in this paper

Fig. 2 Comparison between fine-tuned SAM and original SAM ((a) Input image; (b) Shadow detection by the original SAM; (c) Shadow detection by the fine-tuned SAM)

Fig. 3 XMem framework diagram

Table 1 Comparison of experimental results

Method	Task	MAE↓	F-measure↑	IoU↑	BER↓	Temporal-level↑
FPN	SP	0.048	0.712	0.513	19.52	74.31
PSPNet	SP	0.054	0.651	0.476	19.83	76.62
DSS	SOD	0.049	0.703	0.503	19.84	75.04
MGA	SOD	0.065	0.601	0.398	25.75	77.81
PDBM	VOS	0.066	0.623	0.465	19.77	80.03
COSNet	VOS	0.039	0.707	0.512	20.51	78.31
DSD	ISD	0.046	0.701	0.516	19.89	74.65
FSD	ISD	0.058	0.682	0.491	20.58	74.87
TVSD	VSD	0.033	0.760	0.583	17.71	78.25
SC-Cor	VSD	0.042	0.769	0.615	13.61	81.46
STICT	VSD	0.046	0.702	0.640	16.60	79.61
SCOTCH	VSD	0.029	0.793	0.672	9.07	80.31
Ours	VSD	0.020	0.821	0.698	11.24	82.21
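
Several columns of Table 1 are standard segmentation metrics. As an aid to reading the table, the following minimal sketch computes MAE, IoU, and BER for a single pair of binary masks using their common definitions. The paper's exact evaluation protocol (binarization threshold, averaging over frames, the weighting used for F-measure) is not stated on this page, so the function name `shadow_metrics` and the handling of empty masks are assumptions.

```python
from typing import Dict, List

Mask = List[List[int]]  # binary mask: 1 = shadow, 0 = non-shadow


def shadow_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Common definitions of MAE, IoU and BER for one binary mask pair."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
            else:
                tn += 1

    total = tp + fp + fn + tn
    # For binary masks |p - t| equals 1 exactly on misclassified pixels.
    mae = (fp + fn) / total

    union = tp + fp + fn
    iou = tp / union if union else 1.0  # assumption: empty vs empty counts as 1

    # BER averages the error rates on shadow and non-shadow pixels (percent).
    shadow_acc = tp / (tp + fn) if (tp + fn) else 1.0
    non_shadow_acc = tn / (tn + fp) if (tn + fp) else 1.0
    ber = 100.0 * (1.0 - 0.5 * (shadow_acc + non_shadow_acc))

    return {"MAE": mae, "IoU": iou, "BER": ber}
```

Lower is better for MAE and BER and higher is better for IoU, which matches the arrows in the table header.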

Fig. 4 Comparison of video shadow detection results produced by different methods ((a) Input frame; (b) PSPNet; (c) DSSt; (d) COSNet; (e) TVSD; (f) STICT; (g) SC-Cor; (h) Ours; (i) Ground truth)

Fig. 5 Shadow masks generated by XMem in multiple scenarios

Table 2 Performance comparison with VSD methods

Method	Model size/MB	Computational complexity	Inference time/min
TVSD	243.32	158.89	32.4
STICT	104.68	40.99	13.5
SC-Cor	232.63	218.40	21.8
SCOTCH	211.79	122.46	9.2
Ours	93.73	16.32	7.8
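
The relative savings implied by Table 2 follow from simple arithmetic on the listed values. The sketch below computes the percentage reduction of the proposed method against a chosen baseline. The dictionary only restates the table rows, the helper names are hypothetical, and the complexity column is treated as an opaque number because its unit is not given on this page.

```python
from typing import Dict, Tuple

# Rows of Table 2: (model size in MB, computational complexity, inference time in min).
# "STICT" is spelled as in Table 1; "Ours" is the proposed method.
TABLE_2: Dict[str, Tuple[float, float, float]] = {
    "TVSD": (243.32, 158.89, 32.4),
    "STICT": (104.68, 40.99, 13.5),
    "SC-Cor": (232.63, 218.40, 21.8),
    "SCOTCH": (211.79, 122.46, 9.2),
    "Ours": (93.73, 16.32, 7.8),
}


def percent_reduction(baseline: float, ours: float) -> float:
    """Relative reduction of `ours` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline


def reductions_vs(baseline_name: str) -> Tuple[float, float, float]:
    """Percentage reductions in size, complexity and time against one baseline."""
    baseline = TABLE_2[baseline_name]
    ours = TABLE_2["Ours"]
    size, complexity, time = (
        percent_reduction(b, o) for b, o in zip(baseline, ours)
    )
    return size, complexity, time
```

For example, `reductions_vs("SCOTCH")` gives roughly 55.7% for model size, 86.7% for computational complexity, and 15.2% for inference time.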

Fig. 6 Dynamic shadow detection in video ((a, c) Original video frames; (b, d) Output shadow masks)
