Intelligent depiction to illumination and shadow: robust video shadow extraction based on SAM

doi:10.11996/JG.j.2095-302X.2025040739

Abstract

A video shadow detection method based on the Segment Anything Model (SAM) is proposed to address the low accuracy and poor robustness of traditional methods when handling complex, dynamic shadows caused by lighting variations and object occlusions. The SAM decoder is fine-tuned to better adapt it to shadow detection, and SAM’s accurate segmentation ability is leveraged to extract shadow regions in key frames. The XMem model, which incorporates sensory memory, short-term memory, and long-term memory, is then introduced to integrate information from adjacent frames, thereby refining and stabilizing the shadow detection results. Experimental results show that the proposed method reduces the mean absolute error (MAE) by approximately 31.8% and improves the intersection over union (IoU) by about 19.7% compared to traditional approaches. Both qualitative and quantitative evaluations indicate that the proposed method not only improves the accuracy of video shadow detection but also exhibits superior robustness.
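
The two metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the predicted shadow map and the ground-truth mask are same-sized 2-D grids with values in [0, 1], and that a prediction counts as shadow at a threshold of 0.5. The function names, the threshold, and the toy data are illustrative assumptions, not taken from the paper.

```python
def mae(pred, gt):
    """Mean absolute error between two same-sized 2-D maps with values in [0, 1]."""
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    return total / count


def iou(pred, gt, threshold=0.5):
    """Intersection over union of the shadow regions after thresholding both maps."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            p_on = p >= threshold  # assumed binarization rule
            g_on = g >= threshold
            inter += p_on and g_on
            union += p_on or g_on
    # Two empty masks are treated as a perfect match (a convention chosen here).
    return inter / union if union else 1.0


# Toy 2x3 example (made-up values, for illustration only).
pred = [[0.9, 0.8, 0.1],
        [0.7, 0.2, 0.0]]
gt = [[1, 1, 0],
      [0, 0, 0]]

print(f"MAE = {mae(pred, gt):.3f}, IoU = {iou(pred, gt):.3f}")
```

Lower MAE and higher IoU both indicate a prediction closer to the ground truth, which is why the abstract reports a reduction in the former and an increase in the latter.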

Key words: video shadow detection, semantic segmentation, VOS, SAM, XMem

CLC Number:

TP391.41

CHEN Dong, LI Changlong, DU Zhenlong, SONG Shuang, LI Xiaoli. Intelligent depiction to illumination and shadow: robust video shadow extraction based on SAM[J]. Journal of Graphics, 2025, 46(4): 739-745.

Figures/Tables 8

Method	Task	MAE↓	F-measure↑	IoU↑	BER↓	Temporal-level↑
FPN	SP	0.048	0.712	0.513	19.52	74.31
PSPNet	SP	0.054	0.651	0.476	19.83	76.62
DSS	SOD	0.049	0.703	0.503	19.84	75.04
MGA	SOD	0.065	0.601	0.398	25.75	77.81
PDBM	VOS	0.066	0.623	0.465	19.77	80.03
COSNet	VOS	0.039	0.707	0.512	20.51	78.31
DSD	ISD	0.046	0.701	0.516	19.89	74.65
FSD	ISD	0.058	0.682	0.491	20.58	74.87
TVSD	VSD	0.033	0.760	0.583	17.71	78.25
SC-Cor	VSD	0.042	0.769	0.615	13.61	81.46
STICT	VSD	0.046	0.702	0.640	16.60	79.61
SCOTCH	VSD	0.029	0.793	0.672	9.07	80.31
Ours	VSD	0.020	0.821	0.698	11.24	82.21
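
As a rough cross-check of the relative gains quoted in the abstract, the following sketch computes the percentage change in MAE and IoU of the proposed method against each VSD baseline, using the values from the table above. Which baseline the abstract's figures refer to is not stated in this excerpt, so the pairing here is an assumption; the helper name `relative_change` is illustrative.

```python
def relative_change(baseline, ours):
    """Signed percentage change of `ours` relative to `baseline`."""
    return (ours - baseline) / baseline * 100.0


# (MAE, IoU) pairs copied from the VSD rows of the comparison table.
baselines = {
    "TVSD": (0.033, 0.583),
    "SC-Cor": (0.042, 0.615),
    "STICT": (0.046, 0.640),
    "SCOTCH": (0.029, 0.672),
}
ours = (0.020, 0.698)  # the proposed method's row

for name, (mae_b, iou_b) in baselines.items():
    d_mae = relative_change(mae_b, ours[0])  # negative means lower error
    d_iou = relative_change(iou_b, ours[1])  # positive means higher overlap
    print(f"{name}: MAE {d_mae:+.1f}%, IoU {d_iou:+.1f}%")
```

Under this reading, the IoU gain over TVSD comes out at roughly 19.7%, matching the abstract's figure, while the MAE reduction relative to SCOTCH is about 31%.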

Method	Model size (MB)	Computational complexity	Inference time (min)
TVSD	243.32	158.89	32.4
STICT	104.68	40.99	13.5
SC-Cor	232.63	218.40	21.8
SCOTCH	211.79	122.46	9.2
Ours	93.73	16.32	7.8