
Journal of Graphics ›› 2025, Vol. 46 ›› Issue (5): 969-979. DOI: 10.11996/JG.j.2095-302X.2025050969

• Image Processing and Computer Vision •

SAM2-based multi-objective automatic segmentation method for laparoscopic surgery

LIU Cheng1,2, ZHANG Jiayi1,2,3, YUAN Feng1,2, ZHANG Rui1,2,3, GAO Xin2,3

  1. School of Biomedical Engineering (Suzhou), Department of Life Sciences and Medicine, University of Science and Technology of China, Suzhou, Jiangsu 215163, China
  2. Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, Jiangsu 215163, China
  3. Jinan Guoke Medical Engineering and Technology Development Co., Ltd., Jinan, Shandong 250101, China
  • Received: 2025-06-26  Accepted: 2025-08-12  Published: 2025-10-30  Online: 2025-09-10
  • Corresponding author: GAO Xin (1975-), male, researcher, Ph.D. His main research interests cover intelligent computing-based precision medicine, surgical navigation and robotics, and low-dose cone-beam CT imaging. E-mail: xingaosam@163.com
  • First author: LIU Cheng (2001-), male, master's student. His main research interest is surgical navigation. E-mail: 1011948636@qq.com
  • Supported by: National Natural Science Foundation of China (82372052, 82402373); Natural Science Foundation of Shandong Province (ZR2022QF071, ZR2022QF099); Taishan Industrial Experts Program (tscx202312131)

Abstract:

Automatic segmentation of laparoscopic surgical scenes is a critical foundation for enabling surgical robots to perform autonomous operations. However, this task still faces three major challenges: surgical targets exhibit highly similar textures and blurred boundaries, making similar targets difficult to segment precisely; significant scale differences, from sub-millimeter sutures to centimeter-scale organ tissue, limit the accuracy of synchronous multi-target segmentation; and intraoperative interferences such as motion artifacts and smoke occlusion further undermine the robustness of complete multi-target segmentation. To address these challenges, a multi-objective automatic segmentation method for laparoscopic surgery (SAM2-MSNet) based on the visual large model SAM2 was proposed. The network employed a LoRA+ fine-tuning strategy to optimize SAM2's image encoder, enabling efficient adaptation to the texture features of laparoscopic images. A cross-scale feature synchronous extraction module was designed to achieve accurate segmentation of multi-scale targets. Furthermore, a global perception module of feature relationships was constructed to enhance robustness against interferences such as motion artifacts and smoke occlusion. In addition, a pseudo-label-assisted supervision mechanism driven by histograms of oriented gradients (HOG) significantly improved the accuracy of target edge segmentation. Experimental results demonstrated that SAM2-MSNet achieved a mean intersection over union (mIoU) of 70.2%/69.6% and a mean Dice coefficient (mDice) of 78.5%/75.0% on the Endovis2018 and AutoLaparo datasets, respectively. With an inference speed comparable to that of SAM2-UNet (23 vs. 25 frames per second), its segmentation accuracy was improved by 3.0%/6.7% in mIoU and 2.8%/6.8% in mDice. This work enabled high-precision, fully automatic segmentation of laparoscopic surgical scenes, providing a robust technical foundation for the autonomous operation of surgical robots.
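The abstract names a pseudo-label supervision signal derived from histograms of oriented gradients (HOG). As a minimal sketch of that general idea, and not the authors' implementation, the snippet below turns the HOG response of a frame into a coarse edge pseudo-label; the function name, HOG parameters, and threshold are illustrative assumptions.

```python
import numpy as np
from skimage import color, io
from skimage.feature import hog

def hog_edge_pseudo_label(image_path, threshold=0.2):
    """Illustrative sketch: derive an edge-emphasis pseudo-label from the
    HOG response of a frame. The parameters here are assumptions for
    demonstration, not the settings used in SAM2-MSNet."""
    rgb = io.imread(image_path)
    gray = color.rgb2gray(rgb)

    # The HOG visualization image highlights local gradient-orientation
    # energy, which is strongest along instrument and tissue boundaries.
    _, hog_map = hog(
        gray,
        orientations=9,
        pixels_per_cell=(8, 8),
        cells_per_block=(2, 2),
        visualize=True,
    )

    # Normalize to [0, 1] and binarize into a coarse edge pseudo-label that
    # could serve as an auxiliary supervision target alongside manual masks.
    span = hog_map.max() - hog_map.min()
    hog_map = (hog_map - hog_map.min()) / (span + 1e-8)
    return (hog_map > threshold).astype(np.uint8)
```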

Key words: laparoscopic surgical scene segmentation, visual large model, synchronous extraction of cross-scale features, global perception of feature relationships, pseudo-label assisted supervision
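For reference, the reported mIoU and mDice follow the standard per-class definitions. The sketch below shows one generic way to compute such scores from predicted and ground-truth label maps; the class indexing and background handling are assumptions, not the paper's evaluation protocol.

```python
import numpy as np

def miou_mdice(pred, gt, num_classes):
    """Generic sketch of mean IoU and mean Dice over foreground classes,
    averaged across classes present in either the prediction or the labels."""
    ious, dices = [], []
    for c in range(1, num_classes):  # skip background class 0 (assumption)
        p, g = (pred == c), (gt == c)
        if not p.any() and not g.any():
            continue  # class absent from both maps; leave it out of the mean
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        ious.append(inter / union if union else 0.0)
        denom = p.sum() + g.sum()
        dices.append(2 * inter / denom if denom else 0.0)
    return float(np.mean(ious)), float(np.mean(dices))

# Example with random label maps (5 classes including background).
pred = np.random.randint(0, 5, (256, 256))
gt = np.random.randint(0, 5, (256, 256))
print(miou_mdice(pred, gt, num_classes=5))
```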
