Journal of Graphics ›› 2024, Vol. 45 ›› Issue (4): 659-669.DOI: 10.11996/JG.j.2095-302X.2024040659
• Image Processing and Computer Vision •
ZHANG Xinyu1,2, ZHANG Jiayi1,2,3, GAO Xin2,3

Received: 2024-03-08  Accepted: 2024-05-08  Online: 2024-08-31  Published: 2024-09-02
Contact: GAO Xin
About the first author: ZHANG Xinyu (1998-), master student. His main research interest covers surgical navigation. E-mail: 798091761@qq.com
ZHANG Xinyu, ZHANG Jiayi, GAO Xin. ASC-Net: fast segmentation network for surgical instruments and organs in laparoscopic video[J]. Journal of Graphics, 2024, 45(4): 659-669.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2024040659
Fig. 2 Attention perceptron block ((a) Overall architecture; (b) Multi-head channel attention; (c) Multilayer conv perceptron)
Fig. 3 Spatial channel block ((a) Overall architecture; (b) Atrous spatial paralleling block; (c) Channel fusion block)
Model | ASPB | CFB | mDice/% | mIoU/%
---|---|---|---|---
Baseline | × | × | 48.86 | 39.86
Baseline | √ | × | 70.12 | 65.47
Baseline | × | √ | 68.28 | 58.13
Baseline | √ | √ | 71.39 | 66.59

Table 1 Validation experiment results for the ASPB and CFB of SCB on EndoVis2018
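The mDice and mIoU columns throughout these tables follow the standard overlap definitions, averaged over classes. As a minimal illustrative sketch (not the authors' evaluation code), the per-mask metrics can be computed as:

```python
def dice_iou(pred, gt):
    """Dice and IoU for two flat binary masks (lists of 0/1 of equal length)."""
    inter = sum(p & g for p, g in zip(pred, gt))  # |P ∩ G|
    total = sum(pred) + sum(gt)                   # |P| + |G|
    union = total - inter                         # |P ∪ G|
    dice = 2 * inter / total if total else 1.0    # empty-vs-empty counts as perfect
    iou = inter / union if union else 1.0
    return dice, iou

def mean_metric(per_class_scores):
    """mDice / mIoU: plain average of the per-class scores."""
    return sum(per_class_scores) / len(per_class_scores)
```

For example, pred = [1, 1, 0, 0] against gt = [1, 0, 1, 0] yields Dice 0.5 and IoU 1/3, matching the usual 2|P∩G|/(|P|+|G|) and |P∩G|/|P∪G| formulas.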
Model | mDice | mIoU
---|---|---
Baseline | 48.86 | 39.86
Baseline+SE | 67.94 | 54.63
Baseline+ASPP | 70.04 | 58.46
Baseline+DAM | 70.21 | 62.88
Baseline+SCB | 71.39 | 66.59

Table 2 Comparative experiment results for SCB on EndoVis2018 (unit: %)
Model | MHCA | MCP | mDice/% | mIoU/%
---|---|---|---|---
Baseline | × | × | 48.86 | 39.86
Baseline | √ | × | 71.21 | 62.48
Baseline | × | √ | 52.54 | 43.92
Baseline | √ | √ | 73.31 | 65.15

Table 3 Validation experiment results for the MHCA and MCP of APB on EndoVis2018
Model | APB | SCB | mDice/% | mIoU/%
---|---|---|---|---
Baseline | × | × | 48.86 | 39.86
Baseline | √ | × | 73.31 | 65.15
Baseline | × | √ | 71.39 | 66.59
Baseline | √ | √ | 90.64 | 86.40

Table 4 Ablation experiment results for SCB and APB on EndoVis2018
Model | Metric | Mean/% | Surgical instruments/% | Organs/% | mIT/ms (GPU) | FLOPs/G | Parameters/M
---|---|---|---|---|---|---|---
UNet | mDice | 43.95 | 35.55 | 63.54 | 13.84 | 61.90 | 31.01
 | mIoU | 34.83 | 27.84 | 51.12 | | |
TernausNet | mDice | 48.86 | 38.34 | 73.42 | 26.58 | 24.76 | 32.15
 | mIoU | 39.86 | 29.47 | 64.11 | | |
RAUNet | mDice | 68.18 | 58.31 | 91.21 | 37.83 | 31.61 | 22.14
 | mIoU | 59.16 | 47.74 | 85.80 | | |
BARNet | mDice | 70.10 | 62.01 | 89.01 | 52.52 | - | -
 | mIoU | 59.92 | 50.47 | 81.97 | | |
DeepLabv3+ | mDice | 70.69 | 62.39 | 90.05 | 39.47 | 35.59 | 21.95
 | mIoU | 60.94 | 51.38 | 83.26 | | |
MFC | mDice | 56.40 | 44.84 | 83.35 | 78.63 | 149.84 | 49.89
 | mIoU | 50.04 | 38.19 | 77.68 | | |
SRBNet | mDice | 71.90 | 64.20 | 89.86 | 38.40 | - | -
 | mIoU | 62.19 | 53.19 | 83.19 | | |
ASC-Net | mDice | 90.64 | 89.87 | 92.42 | 16.73 | 17.35 | 32.94
 | mIoU | 86.40 | 85.70 | 88.05 | | |

Table 5 Segmentation performance of each method on EndoVis2018
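The mIT column reports the mean inference time per frame on GPU. As a rough, framework-agnostic sketch of how such a figure is typically measured (the `model_fn` callable is hypothetical; on a real GPU a device synchronize is required before each timestamp, which this CPU-only sketch omits):

```python
import time

def mean_inference_time_ms(model_fn, inputs, warmup=3, repeats=10):
    """Estimate mean wall-clock inference time in milliseconds per input.

    A few warmup passes are discarded so one-time setup costs (allocation,
    kernel compilation) do not inflate the average.
    """
    for x in inputs[:warmup]:
        model_fn(x)  # warmup passes, not timed
    start = time.perf_counter()
    n = 0
    for _ in range(repeats):
        for x in inputs:
            model_fn(x)
            n += 1
    elapsed = time.perf_counter() - start
    return elapsed / n * 1000.0  # ms per inference
```

With a GPU framework, `model_fn` would wrap the forward pass and a synchronize call, since asynchronous kernel launches otherwise return before the computation finishes.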
Model | Metric | Mean/% | Surgical instruments/% | Organs/% | mIT/ms (GPU) | FLOPs/G | Parameters/M
---|---|---|---|---|---|---|---
UNet | Dice | 74.75 | 85.28 | 64.21 | 11.44 | 48.38 | 31.01
 | IoU | 69.37 | 83.30 | 55.43 | | |
TernausNet | Dice | 80.55 | 87.93 | 73.17 | 23.15 | 18.75 | 32.15
 | IoU | 76.40 | 85.08 | 67.72 | | |
RAUNet | Dice | 81.17 | 88.07 | 74.27 | 34.58 | 26.24 | 22.14
 | IoU | 77.35 | 83.03 | 71.67 | | |
DeepLabv3+ | Dice | 83.03 | 90.49 | 75.56 | 38.37 | 27.66 | 21.95
 | IoU | 79.97 | 88.01 | 71.93 | | |
MFC | Dice | 81.78 | 89.38 | 74.18 | 62.58 | 136.71 | 49.89
 | IoU | 76.55 | 86.67 | 66.43 | | |
ASC-Net | Dice | 93.72 | 96.68 | 90.76 | 16.41 | 14.02 | 32.94
 | IoU | 89.43 | 93.56 | 85.29 | | |

Table 6 Segmentation performance of each method on AutoLaparo
Model | Metric | Shaft | Clasper | Wrist | Ultrasound probe | Clip | Suturing needle | Suture thread | Kidney parenchyma | Kidney capsule | Intestine | Mean
---|---|---|---|---|---|---|---|---|---|---|---|---
UNet | Dice | 93.81 | 57.94 | 60.90 | 35.74 | 0.28 | 0.00 | 0.18 | 76.95 | 28.47 | 85.20 | 43.95
 | IoU | 88.35 | 40.79 | 43.78 | 21.76 | 0.14 | 0.00 | 0.09 | 62.54 | 16.60 | 74.22 | 34.83
TernausNet | Dice | 94.53 | 62.38 | 61.64 | 37.51 | 12.07 | 0.00 | 0.24 | 81.14 | 51.83 | 87.29 | 48.86
 | IoU | 89.86 | 45.57 | 45.63 | 24.41 | 0.68 | 0.00 | 0.14 | 73.29 | 39.87 | 79.16 | 39.86
RAUNet | Dice | 95.07 | 70.72 | 76.06 | 52.35 | 68.66 | 0.00 | 45.29 | 95.90 | 79.41 | 98.31 | 68.18
 | IoU | 93.09 | 57.68 | 63.97 | 35.85 | 53.77 | 0.00 | 29.82 | 92.42 | 68.65 | 96.33 | 59.16
BARNet | Dice | 96.36 | 75.56 | 80.17 | 50.37 | 74.29 | 0.32 | 56.95 | 95.31 | 73.13 | 98.59 | 70.10
 | IoU | 92.97 | 60.72 | 66.90 | 33.66 | 59.10 | 0.16 | 39.81 | 91.05 | 57.65 | 97.21 | 59.92
DeepLabv3+ | Dice | 96.67 | 73.58 | 81.43 | 49.79 | 82.54 | 0.00 | 52.73 | 95.07 | 76.44 | 98.64 | 70.69
 | IoU | 93.56 | 58.20 | 68.68 | 33.14 | 70.27 | 0.00 | 35.80 | 90.06 | 61.87 | 97.31 | 60.94
MFC | Dice | 96.24 | 72.70 | 80.85 | 49.71 | 14.41 | 0.00 | 0.00 | 89.41 | 74.75 | 85.89 | 56.40
 | IoU | 93.37 | 58.94 | 68.59 | 34.78 | 11.69 | 0.00 | 0.00 | 88.98 | 60.53 | 83.54 | 50.04
SRBNet | Dice | 96.86 | 76.64 | 82.37 | 49.62 | 78.83 | 0.00 | 65.09 | 95.22 | 75.15 | 99.25 | 71.90
 | IoU | 93.91 | 62.12 | 70.02 | 32.99 | 65.06 | 0.00 | 48.24 | 90.88 | 60.19 | 98.51 | 62.19
ASC-Net | Dice | 91.58 | 85.32 | 79.45 | 94.45 | 94.04 | 96.61 | 87.66 | 92.33 | 92.53 | 92.40 | 90.64
 | IoU | 89.12 | 81.85 | 71.67 | 92.35 | 90.49 | 90.24 | 84.14 | 88.72 | 85.66 | 89.77 | 86.40

Table 7 Multi-category segmentation performance of each method on EndoVis2018 (unit: %; shaft through suture thread are surgical-instrument classes, kidney parenchyma through intestine are organ-tissue classes)
Fig. 4 Segmentation results of each method on EndoVis2018 ((a) Small-scale targets; (b) Blood contamination; (c) Highlight reflection; (d) Specular reflection)
[1] LI C Y, BAI J, ZHENG L. A U-Net based contour enhanced attention for medical image segmentation[J]. Journal of Graphics, 2022, 43(2): 273-278 (in Chinese).
[2] ZHANG L Y, ZHAO H R, HE W, et al. Knee cysts detection algorithm based on Mask R-CNN integrating global-local attention module[J]. Journal of Graphics, 2023, 44(6): 1183-1190 (in Chinese).
[3] GARCÍA-PERAZA-HERRERA L C, LI W Q, FIDON L, et al. ToolNet: holistically-nested real-time segmentation of robotic surgical tools[C]// 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems. New York: IEEE Press, 2017: 5717-5722.
[4] KAMRUL HASAN S M, LINTE C A. U-NetPlus: a modified encoder-decoder U-Net architecture for semantic and instance segmentation of surgical instruments from laparoscopic images[C]// 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society. New York: IEEE Press, 2019: 7205-7211.
[5] QIN F B, LIN S, LI Y M, et al. Towards better surgical instrument segmentation in endoscopic vision: multi-angle feature aggregation and contour supervision[J]. IEEE Robotics and Automation Letters, 2020, 5(4): 6639-6646.
[6] YANG L, GU Y G, BIAN G B, et al. An attention-guided network for surgical instrument segmentation from endoscopic images[J]. Computers in Biology and Medicine, 2022, 151(Pt A): 106216.
[7] JIN Y M, CHENG K Y, DOU Q, et al. Incorporating temporal prior from motion flow for instrument segmentation in minimally invasive surgery video[C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2019: 440-448.
[8] KURMANN T, MÁRQUEZ-NEILA P, ALLAN M, et al. Mask then classify: multi-instance segmentation for surgical instruments[J]. International Journal of Computer Assisted Radiology and Surgery, 2021, 16(7): 1227-1236.
[9] NI Z L, ZHOU X H, WANG G N, et al. SurgiNet: pyramid attention aggregation and class-wise self-distillation for surgical instrument segmentation[J]. Medical Image Analysis, 2022, 76: 102310.
[10] SHAN F M, WANG M W, LI M. Multi-scale convolutional neural network incorporating attention mechanism for intestinal polyp segmentation[J]. Journal of Graphics, 2023, 44(1): 50-58 (in Chinese).
[11] LU Q, SHAO H Z, ZHANG Y L. Dynamic balanced multi-scale feature fusion for colorectal polyp segmentation[J]. Journal of Graphics, 2023, 44(2): 225-232 (in Chinese).
[12] GIBSON E, ROBU M R, THOMPSON S, et al. Deep residual networks for automatic segmentation of laparoscopic videos of the liver[C]// Medical Imaging 2017: Image-Guided Procedures, Robotic Interventions, and Modeling. Bellingham: SPIE, 2017: 423-428.
[13] NI Z L, BIAN G B, LI Z, et al. Space squeeze reasoning and low-rank bilinear feature fusion for surgical image segmentation[J]. IEEE Journal of Biomedical and Health Informatics, 2022, 26(7): 3209-3217.
[14] ALLAN M, KONDO S, BODENSTEDT S, et al. 2018 robotic scene segmentation challenge[EB/OL]. [2024-01-18]. http://arxiv.org/abs/2001.11190.
[15] HASAN S M K, SIMON R A, LINTE C A. Inpainting surgical occlusion from laparoscopic video sequences for robot-assisted interventions[J]. Journal of Medical Imaging, 2023, 10(4): 045002.
[16] IGLOVIKOV V, SEFERBEKOV S, BUSLAEV A, et al. TernausNetV2: fully convolutional network for instance segmentation[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2018: 228-2284.
[17] HENDRYCKS D, GIMPEL K. Gaussian error linear units (GELUs)[EB/OL]. [2024-01-18]. http://arxiv.org/abs/1606.08415.
[18] KUMAR R L, KAKARLA J, ISUNURI B V, et al. Multi-class brain tumor classification using residual network and global average pooling[J]. Multimedia Tools and Applications, 2021, 80(9): 13429-13438.
[19] WANG Z Y, LU B, LONG Y H, et al. AutoLaparo: a new dataset of integrated multi-tasks for image-guided surgical automation in laparoscopic hysterectomy[C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2022: 486-496.
[20] LENG Z Q, TAN M X, LIU C X, et al. PolyLoss: a polynomial expansion perspective of classification loss functions[EB/OL]. [2024-01-18]. http://arxiv.org/abs/2204.12511.
[21] ZHONG Z L, LIN Z Q, BIDART R, et al. Squeeze-and-attention networks for semantic segmentation[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 13062-13071.
[22] CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]// Computer Vision - ECCV 2018: 15th European Conference. New York: ACM, 2018: 833-851.
[23] FU J, LIU J, TIAN H J, et al. Dual attention network for scene segmentation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 3141-3149.
[24] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[M]// Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015: 234-241.
[25] NI Z L, BIAN G B, ZHOU X H, et al. RAUNet: residual attention U-Net for semantic segmentation of cataract surgical instruments[C]// International Conference on Neural Information Processing. Cham: Springer, 2019: 139-149.
[26] NI Z L, BIAN G B, WANG G N, et al. BARNet: bilinear attention network with adaptive receptive fields for surgical instrument segmentation[EB/OL]. [2024-01-18]. http://arxiv.org/abs/2001.07093.
[27] ZHAO X K, HAYASHI Y, ODA M, et al. Masked frequency consistency for domain-adaptive semantic segmentation of laparoscopic images[C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2023: 663-673.