Journal of Graphics ›› 2025, Vol. 46 ›› Issue (2): 322-331. DOI: 10.11996/JG.j.2095-302X.2025020322
ZHANG Tiansheng1, ZHU Minfeng2, REN Yiwen3, WANG Chenhan3, ZHANG Lidong3, ZHANG Wei4, CHEN Wei1

Received: 2024-10-28
Accepted: 2024-12-13
Published: 2025-04-30
Online: 2025-04-24

Corresponding author: ZHU Minfeng (1993-), male, researcher, Ph.D. His main research interests cover artificial intelligence and visual analytics. E-mail: minfeng_zhu@zju.edu.cn
First author: ZHANG Tiansheng (1998-), male, master student. His main research interest covers computer vision. E-mail: 22221302@zju.edu.cn
Abstract:
The lack of publicly available traditional Chinese realistic painting (gongbi) datasets with pixel-level annotations has severely hindered the development of image segmentation in this domain. Gongbi paintings challenge segmentation in two characteristic ways: objects share similar colors and textures with the background, and the graded-wash rendering technique blurs object boundaries. The emergence of the Segment Anything Model (SAM) opens new possibilities for addressing these challenges. Although SAM shows remarkable segmentation ability and zero-shot generalization on natural images, on gongbi paintings it is insensitive to the depicted objects and confuses foreground with background. To address these problems, we first constructed SegTCRP, a flower-and-bird themed gongbi painting dataset of 403 images covering five foreground categories. We then fine-tuned SAM with LoRA to adapt it to the characteristics of gongbi paintings. Furthermore, we propose BPA-SAM, a box prompt augmentation method that uses a U-Net to generate auxiliary point prompts within the box prompt according to selection strategies, alleviating SAM's foreground-background confusion. Experiments show that BPA-SAM improves segmentation accuracy over the original SAM by 7.1% under box prompting, laying a foundation for applying SAM to image segmentation of gongbi paintings.
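The box prompt augmentation summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' released code: `unet_prob` is assumed to be a U-Net foreground probability map for the whole image, the 5/15/25% threshold is read here as a confidence margin on that map, and `predictor` is assumed to follow the `segment_anything` `SamPredictor.predict` interface (XYXY box; point labels 1 for foreground, 0 for background).

```python
import numpy as np

def augment_box_prompt(unet_prob, box, thresh=0.15, n_fg=2, n_bg=0):
    """Generate auxiliary point prompts inside a box prompt (random strategy).

    unet_prob: (H, W) U-Net foreground probability map in [0, 1]
    box:       (x0, y0, x1, y1) box prompt in pixel coordinates
    thresh:    assumed confidence margin; pixels above 1 - thresh are
               foreground candidates, pixels below thresh background ones
    """
    x0, y0, x1, y1 = box
    region = unet_prob[y0:y1, x0:x1]
    points, labels = [], []
    for n, label, cand in [(n_fg, 1, region >= 1.0 - thresh),
                           (n_bg, 0, region <= thresh)]:
        ys, xs = np.nonzero(cand)
        if n == 0 or len(ys) == 0:
            continue
        idx = np.random.choice(len(ys), size=min(n, len(ys)), replace=False)
        points += [(x0 + x, y0 + y) for x, y in zip(xs[idx], ys[idx])]
        labels += [label] * len(idx)
    return np.asarray(points, dtype=float), np.asarray(labels, dtype=int)

# The box and the generated auxiliary points go to SAM together.
points, labels = augment_box_prompt(unet_prob, box, thresh=0.15, n_fg=2)
masks, scores, _ = predictor.predict(
    point_coords=points, point_labels=labels,
    box=np.asarray(box, dtype=float), multimask_output=False)
```

The random strategy shown here is the baseline; Tables 3 and 4 below compare it with the maximum entropy and farthest distance strategies.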
ZHANG Tiansheng, ZHU Minfeng, REN Yiwen, WANG Chenhan, ZHANG Lidong, ZHANG Wei, CHEN Wei. BPA-SAM: box prompt augmented SAM for traditional Chinese realistic painting[J]. Journal of Graphics, 2025, 46(2): 322-331.
Table 1 Number of instances per foreground category in SegTCRP

Category | Count
---|---
Flower | 1294
Bird | 435
Fish | 31
Insect | 214
Seal | 1299
Total | 3273
Fig. 3 Experimental results of seven algorithms on SegTCRP ((a) Input images; (b) U-Net; (c) FCN; (d) PSPNet; (e) DeepLabV3+; (f) SegFormer; (g) SAM; (h) SAM-LoRA; (i) Ground truth)
Table 2 Comparison of segmentation accuracy (%) of different models on SegTCRP

Model | Flower | Bird | Insect | Seal | Average
---|---|---|---|---|---
U-Net | 65.57 | 80.13 | 65.10 | 82.26 | 73.27
FCN | 54.23 | 57.93 | 58.06 | 76.46 | 61.82
PSPNet | 53.91 | 69.92 | 57.68 | 60.49 | 60.51
DeepLabV3+ | 64.11 | 74.16 | 72.59 | 76.43 | 71.82
SegFormer | 65.83 | 76.24 | 73.41 | 77.52 | 73.25
SAM | 83.51 | 83.26 | 89.44 | 87.52 | 85.93
SAM-LoRA | 90.38 | 86.52 | 92.32 | 95.10 | 91.08
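The segmentation accuracy reported in Tables 2-4 appears to be the Dice similarity coefficient (Table 5 reports DSC explicitly); this reading is an assumption rather than something the excerpt confirms. For reference, a minimal DSC implementation for binary masks is:

```python
import numpy as np

def dice_score(pred, gt, eps=1e-7):
    """Dice similarity coefficient (DSC) between two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
```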
Table 3 Comparison of segmentation accuracy (%) of different point prompt generation strategies in BPA-SAM

Threshold/% | Foreground points | Background points | Random | Maximum entropy | Farthest distance
---|---|---|---|---|---
5 | 2 | 0 | 92.65 | 92.73 | 92.83
5 | 0 | 2 | 91.69 | 92.57 | 92.81
5 | 2 | 2 | 91.74 | 92.37 | 92.46
15 | 2 | 0 | 92.66 | 92.91 | 92.85
15 | 0 | 2 | 91.06 | 92.55 | 92.70
15 | 2 | 2 | 92.72 | 92.17 | 92.31
25 | 2 | 0 | 92.42 | 92.64 | 92.48
25 | 0 | 2 | 91.67 | 92.58 | 92.74
25 | 2 | 2 | 91.93 | 92.03 | 91.81
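The three point selection strategies compared in Table 3 can be sketched as follows. This follows one plausible reading of the strategy names (maximum entropy picks the most uncertain candidate pixels; farthest distance picks pixels deepest inside the U-Net foreground region) and is an assumption rather than the authors' exact procedure; `scipy` is used for the distance transform.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def binary_entropy(p, eps=1e-7):
    """Per-pixel entropy of a foreground probability map."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def pick_max_entropy(prob, cand_mask, k=2):
    """Pick the k candidate pixels where the U-Net is least certain."""
    ent = np.where(cand_mask, binary_entropy(prob), -np.inf)
    flat = np.argsort(ent, axis=None)[::-1][:k]
    ys, xs = np.unravel_index(flat, ent.shape)
    return np.stack([xs, ys], axis=1)  # (x, y) pairs

def pick_farthest(cand_mask, k=2, suppress=10):
    """Pick k pixels farthest from the candidate region's boundary."""
    dist = distance_transform_edt(cand_mask)
    pts = []
    for _ in range(k):
        y, x = np.unravel_index(np.argmax(dist), dist.shape)
        pts.append((x, y))
        # Zero out a small window so later picks avoid near-duplicates.
        dist[max(0, y - suppress):y + suppress,
             max(0, x - suppress):x + suppress] = 0
    return np.asarray(pts)
```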
Fig. 4 Point selection results of the three strategies at the 15% threshold ((a) Random strategy; (b) Maximum entropy strategy; (c) Farthest distance strategy)
Fig. 5 Segmentation results of BPA-SAM under the three selection strategies with two foreground points at the 15% threshold ((a) Input images; (b) SAM-LoRA; (c) Random selection; (d) Maximum entropy selection; (e) Farthest distance selection; (f) Ground truth)
Fig. 6 Selection results of the farthest distance strategy under different thresholds ((a) 5% threshold; (b) 15% threshold; (c) 25% threshold)
Table 4 Comparison of segmentation accuracy (%) of point prompt generation strategies on the original SAM

Threshold/% | Foreground points | Background points | Random | Maximum entropy | Farthest distance
---|---|---|---|---|---
5 | 2 | 0 | 87.63 | 86.98 | 86.31
5 | 0 | 2 | 81.96 | 79.28 | 80.39
5 | 2 | 2 | 86.33 | 85.04 | 85.47
15 | 2 | 0 | 85.34 | 86.95 | 86.87
15 | 0 | 2 | 81.02 | 78.76 | 82.63
15 | 2 | 2 | 83.98 | 85.77 | 85.41
25 | 2 | 0 | 86.06 | 86.53 | 85.76
25 | 0 | 2 | 82.62 | 79.48 | 82.69
25 | 2 | 2 | 84.64 | 85.82 | 84.49
Table 5 Comparison of segmentation accuracy (%) of different fine-tuning methods

Method | DSC
---|---
Image encoder | 91.08
Mask decoder | 86.37
Image encoder + mask decoder | 89.67
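Table 5 suggests that applying LoRA to the image encoder alone works best. A minimal sketch of a LoRA-wrapped linear layer in PyTorch is shown below; the rank, scaling, and the choice of which projections inside SAM's ViT image encoder to wrap are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = Wx + (alpha/r)BAx."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)
```

In use, one would replace, for example, the query and value projections in each transformer block of SAM's image encoder with `LoRALinear` wrappers and train only the `A`/`B` matrices, keeping the number of trainable parameters small.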
Table 6 Comparison of segmentation accuracy (%) of the three strategies with multiple foreground prompt points

Threshold/% | Foreground points | Random | Maximum entropy | Farthest distance
---|---|---|---|---
5 | 4 | 91.19 | 91.49 | 91.27
5 | 6 | 90.55 | 90.98 | 90.51
5 | 8 | 89.94 | 90.96 | 90.23
15 | 4 | 91.72 | 91.94 | 91.12
15 | 6 | 90.35 | 90.27 | 90.33
15 | 8 | 89.06 | 90.12 | 90.14