
Journal of Graphics, 2025, Vol. 46, Issue 5: 969-979. DOI: 10.11996/JG.j.2095-302X.2025050969

• Image Processing and Computer Vision •

SAM2-based multi-objective automatic segmentation method for laparoscopic surgery

LIU Cheng1,2, ZHANG Jiayi1,2,3, YUAN Feng1,2, ZHANG Rui1,2,3, GAO Xin2,3

  1. School of Biomedical Engineering (Suzhou), Department of Life Sciences and Medicine, University of Science and Technology of China, Suzhou, Jiangsu 215163, China
    2. Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, Jiangsu 215163, China
    3. Jinan Guoke Medical Engineering and Technology Development Co., Ltd., Jinan, Shandong 250101, China
  • Received: 2025-06-26; Accepted: 2025-08-12; Online: 2025-10-30; Published: 2025-09-10
  • Contact: GAO Xin
  • About author:

    LIU Cheng (2001-), master's student. His main research interest covers surgical navigation. E-mail: 1011948636@qq.com

  • Supported by:
    National Natural Science Foundation of China (82372052, 82402373); Science Foundation of Shandong (ZR2022QF071, ZR2022QF099); Taishan Industrial Experts Program (tscx202312131)

Abstract:

Automatic segmentation of laparoscopic surgical scenes is critical for enabling surgical robots to perform autonomous operations. The task faces three major challenges: surgical targets with highly similar textures and blurred boundaries, which make accurate segmentation difficult; large scale differences, which hinder the synchronous segmentation of multiple targets; and intraoperative interference, such as motion artifacts and smoke occlusion, which degrades segmentation completeness. To address these challenges, a multi-objective automatic segmentation method for laparoscopic surgery (SAM2-MSNet), built on the large vision model SAM2, was proposed. The network employed a LoRA+ fine-tuning strategy to optimize SAM2's image encoder, enabling efficient adaptation to the texture features of laparoscopic images. A cross-scale feature synchronous extraction module was designed to segment multi-scale targets accurately, and a global feature-relationship perception module was constructed to strengthen robustness against interference such as motion artifacts and smoke occlusion. In addition, a pseudo-label-assisted supervision mechanism driven by histograms of oriented gradients (HOG) markedly improved the accuracy of target edge segmentation. Experimental results showed that SAM2-MSNet achieved a mean intersection over union (mIoU) of 70.2%/69.6% and a mean Dice coefficient (mDice) of 78.5%/75.0% on the Endovis2018 and AutoLaparo datasets, respectively. With inference speed comparable to that of SAM2-UNet (23 vs. 25 frames per second), segmentation accuracy improved by 3.0%/6.7% (mIoU) and 2.8%/6.8% (mDice). This work enables high-precision automatic segmentation of laparoscopic surgical scenes, providing a solid technical foundation for the autonomous operation of surgical robots.
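To make the fine-tuning strategy concrete, the sketch below shows one plausible LoRA+-style adaptation of a frozen encoder projection in PyTorch: a trainable low-rank update B·A is added to a frozen linear layer, and the B matrix is trained with a larger learning rate than A, as LoRA+ prescribes. The names (LoRALinear, loraplus_param_groups), the rank, and the 16x learning-rate ratio are illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Wraps a frozen linear layer with a trainable low-rank update: W x + (B A) x.
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # pretrained weights stay frozen
            self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scale = alpha / rank

        def forward(self, x):
            return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale

    def loraplus_param_groups(model: nn.Module, lr: float = 1e-4, lr_ratio: float = 16.0):
        # LoRA+ assigns the B matrices a larger learning rate than the A matrices.
        a_params = [p for n, p in model.named_parameters() if "lora_A" in n]
        b_params = [p for n, p in model.named_parameters() if "lora_B" in n]
        return [{"params": a_params, "lr": lr},
                {"params": b_params, "lr": lr * lr_ratio}]

    # Example: adapt one 768-dimensional attention projection and build the optimizer.
    layer = LoRALinear(nn.Linear(768, 768), rank=8)
    optimizer = torch.optim.AdamW(loraplus_param_groups(layer), weight_decay=0.01)

Because only the low-rank matrices receive gradients, this keeps the number of trainable encoder parameters small while still adapting SAM2's features to laparoscopic textures.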
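Likewise, the following is a minimal sketch of how HOG features could drive edge pseudo-labels, assuming scikit-image; the cell size, the above-median thresholding rule, and the function name are hypothetical reading aids, not the paper's published settings.

    import numpy as np
    from skimage.color import rgb2gray
    from skimage.feature import hog

    def hog_edge_pseudo_label(frame: np.ndarray, cell: int = 8) -> np.ndarray:
        # Per-cell HOG energy, thresholded into a coarse binary edge map.
        gray = rgb2gray(frame)
        features = hog(gray, orientations=9, pixels_per_cell=(cell, cell),
                       cells_per_block=(1, 1), feature_vector=False)
        # features has shape (n_cells_row, n_cells_col, 1, 1, 9); sum the orientation bins
        energy = features.reshape(features.shape[0], features.shape[1], -1).sum(axis=-1)
        # cells with above-median gradient energy become positive pseudo-labels
        mask = (energy > np.median(energy)).astype(np.float32)
        # expand each cell back to pixel resolution (output may be slightly smaller
        # than the frame if its sides are not multiples of the cell size)
        return np.kron(mask, np.ones((cell, cell), dtype=np.float32))

    # Example: a pseudo-label map for a synthetic 256x256 RGB frame.
    label = hog_edge_pseudo_label(np.random.rand(256, 256, 3))

A map of this kind can serve as an auxiliary supervision signal that concentrates the loss near high-gradient regions, which is one way boundary-focused pseudo-labels sharpen edge segmentation.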

Key words: laparoscopic surgical scene segmentation, large vision model, cross-scale feature synchronous extraction, global feature-relationship perception, pseudo-label-assisted supervision
