Journal of Graphics ›› 2025, Vol. 46 ›› Issue (2): 322-331. DOI: 10.11996/JG.j.2095-302X.2025020322
ZHANG Tiansheng1, ZHU Minfeng2, REN Yiwen3, WANG Chenhan3, ZHANG Lidong3, ZHANG Wei4, CHEN Wei1

Received: 2024-10-28
Accepted: 2024-12-13
Published: 2025-04-30
Online: 2025-04-24

Corresponding author: ZHU Minfeng (1993-), male, researcher, Ph.D. His main research interests cover artificial intelligence and visual analytics. E-mail: minfeng_zhu@zju.edu.cn
First author: ZHANG Tiansheng (1998-), male, master student. His main research interest covers computer vision. E-mail: 22221302@zju.edu.cn
Abstract:
The lack of publicly available traditional Chinese realistic painting (gongbi) datasets with pixel-level annotations has severely hindered the development of image segmentation in this domain. Gongbi paintings challenge segmentation in two characteristic ways: objects share similar colors and textures with the background, and the graded-wash rendering technique blurs object boundaries. The emergence of the Segment Anything Model (SAM) opens new possibilities for addressing these challenges. Although SAM shows remarkable segmentation ability and zero-shot generalization on natural images, on gongbi paintings it is insensitive to the depicted objects and confuses foreground with background. To address these problems, we first constructed SegTCRP, a flower-and-bird themed gongbi painting dataset of 403 images covering five foreground categories. We then fine-tuned SAM with LoRA to adapt it to the characteristics of gongbi paintings. Furthermore, we propose BPA-SAM, a box prompt augmentation method that uses a U-Net to generate auxiliary point prompts within the box prompt according to selection strategies, alleviating SAM's foreground-background confusion. Experiments show that BPA-SAM improves segmentation accuracy over the original SAM by 7.1% under box prompting, laying a foundation for applying SAM to image segmentation of gongbi paintings.
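The box prompt augmentation summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' released code: `unet_prob` is assumed to be a U-Net foreground probability map for the whole image, the 5/15/25% threshold is read here as a confidence margin on that map, and `predictor` is assumed to follow the `segment_anything` `SamPredictor.predict` interface (XYXY box; point labels 1 for foreground, 0 for background).

```python
import numpy as np

def augment_box_prompt(unet_prob, box, thresh=0.15, n_fg=2, n_bg=0):
    """Generate auxiliary point prompts inside a box prompt (random strategy).

    unet_prob: (H, W) U-Net foreground probability map in [0, 1]
    box:       (x0, y0, x1, y1) box prompt in pixel coordinates
    thresh:    assumed confidence margin; pixels above 1 - thresh are
               foreground candidates, pixels below thresh background ones
    """
    x0, y0, x1, y1 = box
    region = unet_prob[y0:y1, x0:x1]
    points, labels = [], []
    for n, label, cand in [(n_fg, 1, region >= 1.0 - thresh),
                           (n_bg, 0, region <= thresh)]:
        ys, xs = np.nonzero(cand)
        if n == 0 or len(ys) == 0:
            continue
        idx = np.random.choice(len(ys), size=min(n, len(ys)), replace=False)
        points += [(x0 + x, y0 + y) for x, y in zip(xs[idx], ys[idx])]
        labels += [label] * len(idx)
    return np.asarray(points, dtype=float), np.asarray(labels, dtype=int)

# The box and the generated auxiliary points go to SAM together.
points, labels = augment_box_prompt(unet_prob, box, thresh=0.15, n_fg=2)
masks, scores, _ = predictor.predict(
    point_coords=points, point_labels=labels,
    box=np.asarray(box, dtype=float), multimask_output=False)
```

The random strategy shown here is the baseline; Tables 3 and 4 below compare it with the maximum entropy and farthest distance strategies.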
ZHANG Tiansheng, ZHU Minfeng, REN Yiwen, WANG Chenhan, ZHANG Lidong, ZHANG Wei, CHEN Wei. BPA-SAM: box prompt augmented SAM for traditional Chinese realistic painting[J]. Journal of Graphics, 2025, 46(2): 322-331.
Table 1 Number of instances per foreground category in SegTCRP

Category | Count
---|---
Flower | 1294
Bird | 435
Fish | 31
Insect | 214
Seal | 1299
Total | 3273
Fig. 3 Experimental results of seven algorithms on SegTCRP ((a) Input images; (b) U-Net; (c) FCN; (d) PSPNet; (e) DeepLabV3+; (f) SegFormer; (g) SAM; (h) SAM-LoRA; (i) Ground truth)
Table 2 Comparison of segmentation accuracy (%) of different models on SegTCRP

Model | Flower | Bird | Insect | Seal | Average
---|---|---|---|---|---
U-Net | 65.57 | 80.13 | 65.10 | 82.26 | 73.27
FCN | 54.23 | 57.93 | 58.06 | 76.46 | 61.82
PSPNet | 53.91 | 69.92 | 57.68 | 60.49 | 60.51
DeepLabV3+ | 64.11 | 74.16 | 72.59 | 76.43 | 71.82
SegFormer | 65.83 | 76.24 | 73.41 | 77.52 | 73.25
SAM | 83.51 | 83.26 | 89.44 | 87.52 | 85.93
SAM-LoRA | 90.38 | 86.52 | 92.32 | 95.10 | 91.08
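The segmentation accuracy reported in Tables 2-4 appears to be the Dice similarity coefficient (Table 5 reports DSC explicitly); this reading is an assumption rather than something the excerpt confirms. For reference, a minimal DSC implementation for binary masks is:

```python
import numpy as np

def dice_score(pred, gt, eps=1e-7):
    """Dice similarity coefficient (DSC) between two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
```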
Table 3 Comparison of segmentation accuracy (%) of different point prompt generation strategies in BPA-SAM

Threshold/% | Foreground points | Background points | Random | Maximum entropy | Farthest distance
---|---|---|---|---|---
5 | 2 | 0 | 92.65 | 92.73 | 92.83
5 | 0 | 2 | 91.69 | 92.57 | 92.81
5 | 2 | 2 | 91.74 | 92.37 | 92.46
15 | 2 | 0 | 92.66 | 92.91 | 92.85
15 | 0 | 2 | 91.06 | 92.55 | 92.70
15 | 2 | 2 | 92.72 | 92.17 | 92.31
25 | 2 | 0 | 92.42 | 92.64 | 92.48
25 | 0 | 2 | 91.67 | 92.58 | 92.74
25 | 2 | 2 | 91.93 | 92.03 | 91.81
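The three point selection strategies compared in Table 3 can be sketched as follows. This follows one plausible reading of the strategy names (maximum entropy picks the most uncertain candidate pixels; farthest distance picks pixels deepest inside the U-Net foreground region) and is an assumption rather than the authors' exact procedure; `scipy` is used for the distance transform.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def binary_entropy(p, eps=1e-7):
    """Per-pixel entropy of a foreground probability map."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def pick_max_entropy(prob, cand_mask, k=2):
    """Pick the k candidate pixels where the U-Net is least certain."""
    ent = np.where(cand_mask, binary_entropy(prob), -np.inf)
    flat = np.argsort(ent, axis=None)[::-1][:k]
    ys, xs = np.unravel_index(flat, ent.shape)
    return np.stack([xs, ys], axis=1)  # (x, y) pairs

def pick_farthest(cand_mask, k=2, suppress=10):
    """Pick k pixels farthest from the candidate region's boundary."""
    dist = distance_transform_edt(cand_mask)
    pts = []
    for _ in range(k):
        y, x = np.unravel_index(np.argmax(dist), dist.shape)
        pts.append((x, y))
        # Zero out a small window so later picks avoid near-duplicates.
        dist[max(0, y - suppress):y + suppress,
             max(0, x - suppress):x + suppress] = 0
    return np.asarray(pts)
```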
Fig. 4 Point selection results of the three strategies at the 15% threshold ((a) Random strategy; (b) Maximum entropy strategy; (c) Farthest distance strategy)
Fig. 5 Segmentation results of BPA-SAM under the three selection strategies with two foreground points at the 15% threshold ((a) Input images; (b) SAM-LoRA; (c) Random selection; (d) Maximum entropy selection; (e) Farthest distance selection; (f) Ground truth)
Fig. 6 Selection results of the farthest distance strategy under different thresholds ((a) 5% threshold; (b) 15% threshold; (c) 25% threshold)
Table 4 Comparison of segmentation accuracy (%) of point prompt generation strategies on the original SAM

Threshold/% | Foreground points | Background points | Random | Maximum entropy | Farthest distance
---|---|---|---|---|---
5 | 2 | 0 | 87.63 | 86.98 | 86.31
5 | 0 | 2 | 81.96 | 79.28 | 80.39
5 | 2 | 2 | 86.33 | 85.04 | 85.47
15 | 2 | 0 | 85.34 | 86.95 | 86.87
15 | 0 | 2 | 81.02 | 78.76 | 82.63
15 | 2 | 2 | 83.98 | 85.77 | 85.41
25 | 2 | 0 | 86.06 | 86.53 | 85.76
25 | 0 | 2 | 82.62 | 79.48 | 82.69
25 | 2 | 2 | 84.64 | 85.82 | 84.49
Table 5 Comparison of segmentation accuracy (%) of different fine-tuning methods

Method | DSC
---|---
Image encoder | 91.08
Mask decoder | 86.37
Image encoder + mask decoder | 89.67
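Table 5 suggests that applying LoRA to the image encoder alone works best. A minimal sketch of a LoRA-wrapped linear layer in PyTorch is shown below; the rank, scaling, and the choice of which projections inside SAM's ViT image encoder to wrap are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = Wx + (alpha/r)BAx."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)
```

In use, one would replace, for example, the query and value projections in each transformer block of SAM's image encoder with `LoRALinear` wrappers and train only the `A`/`B` matrices, keeping the number of trainable parameters small.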
Table 6 Comparison of segmentation accuracy (%) of the three strategies with multiple foreground prompt points

Threshold/% | Foreground points | Random | Maximum entropy | Farthest distance
---|---|---|---|---
5 | 4 | 91.19 | 91.49 | 91.27
5 | 6 | 90.55 | 90.98 | 90.51
5 | 8 | 89.94 | 90.96 | 90.23
15 | 4 | 91.72 | 91.94 | 91.12
15 | 6 | 90.35 | 90.27 | 90.33
15 | 8 | 89.06 | 90.12 | 90.14