Journal of Graphics ›› 2025, Vol. 46 ›› Issue (2): 322-331.DOI: 10.11996/JG.j.2095-302X.2025020322
• Computer Graphics and Virtual Reality •
BPA-SAM: box prompt augmented SAM for traditional Chinese realistic painting
ZHANG Tiansheng1, ZHU Minfeng2, REN Yiwen3, WANG Chenhan3, ZHANG Lidong3, ZHANG Wei4, CHEN Wei1
Received: 2024-10-28
Accepted: 2024-12-13
Online: 2025-04-30
Published: 2025-04-24
Contact: ZHU Minfeng
About author: ZHANG Tiansheng (1998-), master student. His main research interest is computer vision. E-mail: 22221302@zju.edu.cn
ZHANG Tiansheng, ZHU Minfeng, REN Yiwen, WANG Chenhan, ZHANG Lidong, ZHANG Wei, CHEN Wei. BPA-SAM: box prompt augmented SAM for traditional Chinese realistic painting[J]. Journal of Graphics, 2025, 46(2): 322-331.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2025020322
| Category | Instances |
| --- | --- |
| Flower | 1294 |
| Bird | 435 |
| Fish | 31 |
| Insect | 214 |
| Seal | 1299 |
| Total | 3273 |
Table 1 Number of foreground category instances in SegTCRP
Fig. 3 Experimental results of seven algorithms on SegTCRP ((a) Input images; (b) U-Net; (c) FCN; (d) PSPNet; (e) DeepLabV3+; (f) SegFormer; (g) SAM; (h) SAM-LoRA; (i) Ground truth)
| Model | Flower | Bird | Insect | Seal | Average |
| --- | --- | --- | --- | --- | --- |
| U-Net | 65.57 | 80.13 | 65.10 | 82.26 | 73.27 |
| FCN | 54.23 | 57.93 | 58.06 | 76.46 | 61.82 |
| PSPNet | 53.91 | 69.92 | 57.68 | 60.49 | 60.51 |
| DeepLabV3+ | 64.11 | 74.16 | 72.59 | 76.43 | 71.82 |
| SegFormer | 65.83 | 76.24 | 73.41 | 77.52 | 73.25 |
| SAM | 83.51 | 83.26 | 89.44 | 87.52 | 85.93 |
| SAM-LoRA | 90.38 | 86.52 | 92.32 | 95.10 | 91.08 |
Table 2 Comparison of segmentation accuracy of different models on SegTCRP/%
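The SAM-LoRA row in Table 2 is SAM fine-tuned with low-rank adaptation [20] on its image encoder. As a minimal sketch of the mechanism (the rank r, scaling alpha, and choice of wrapped layers here are illustrative assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B(A(x))."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                                 # freeze pretrained weights
        self.lora_a = nn.Linear(base.in_features, r, bias=False)    # down-projection A
        self.lora_b = nn.Linear(r, base.out_features, bias=False)   # up-projection B
        nn.init.zeros_(self.lora_b.weight)                          # update starts at zero

        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```

In ViT-style encoders such wrappers are typically applied to the query and value projections of each attention block, so only the small A/B matrices are trained while the pretrained backbone stays frozen.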
| Threshold/% | Foreground points | Background points | Random | Maximum entropy | Farthest distance |
| --- | --- | --- | --- | --- | --- |
| 5 | 2 | 0 | 92.65 | 92.73 | 92.83 |
| 5 | 0 | 2 | 91.69 | 92.57 | 92.81 |
| 5 | 2 | 2 | 91.74 | 92.37 | 92.46 |
| 15 | 2 | 0 | 92.66 | 92.91 | 92.85 |
| 15 | 0 | 2 | 91.06 | 92.55 | 92.70 |
| 15 | 2 | 2 | 92.72 | 92.17 | 92.31 |
| 25 | 2 | 0 | 92.42 | 92.64 | 92.48 |
| 25 | 0 | 2 | 91.67 | 92.58 | 92.74 |
| 25 | 2 | 2 | 91.93 | 92.03 | 91.81 |
Table 3 Comparison of segmentation accuracy of different point prompt generation strategies in BPA-SAM
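Table 3 compares three ways of turning an initial prediction into auxiliary point prompts. A sketch of the three selectors, assuming candidates are pixels drawn from the threshold band around the initial mask (the function name and candidate construction are hypothetical; only the three strategies mirror the table columns):

```python
import numpy as np

def select_prompt_points(prob: np.ndarray, candidates: np.ndarray,
                         k: int, strategy: str) -> np.ndarray:
    """Pick k prompt points (row, col) from candidate pixels.

    prob       -- per-pixel foreground probability map, shape (H, W)
    candidates -- (N, 2) array of candidate coordinates
    """
    rng = np.random.default_rng(0)
    if strategy == "random":
        return candidates[rng.choice(len(candidates), size=k, replace=False)]
    if strategy == "max_entropy":
        p = prob[candidates[:, 0], candidates[:, 1]].clip(1e-6, 1 - 1e-6)
        entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))  # binary entropy
        return candidates[np.argsort(entropy)[-k:]]           # most uncertain pixels
    if strategy == "farthest":
        picked = [candidates[rng.integers(len(candidates))]]
        while len(picked) < k:                                # greedy farthest-point sampling
            dists = np.min(
                [np.linalg.norm(candidates - p, axis=1) for p in picked], axis=0)
            picked.append(candidates[int(np.argmax(dists))])
        return np.stack(picked)
    raise ValueError(f"unknown strategy: {strategy}")
```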
Fig. 4 Selection results of the three selection strategies at a 15% threshold ((a) Random strategy; (b) Maximum entropy strategy; (c) Farthest distance strategy)
Fig. 5 Segmentation results of BPA-SAM under the three selection strategies with two foreground points at a 15% threshold ((a) Input images; (b) SAM-LoRA; (c) Random selection; (d) Maximum entropy selection; (e) Farthest distance selection; (f) Ground truth)
Fig. 6 Selection results of the farthest distance strategy under different thresholds ((a) 5% threshold; (b) 15% threshold; (c) 25% threshold)
| Threshold/% | Foreground points | Background points | Random | Maximum entropy | Farthest distance |
| --- | --- | --- | --- | --- | --- |
| 5 | 2 | 0 | 87.63 | 86.98 | 86.31 |
| 5 | 0 | 2 | 81.96 | 79.28 | 80.39 |
| 5 | 2 | 2 | 86.33 | 85.04 | 85.47 |
| 15 | 2 | 0 | 85.34 | 86.95 | 86.87 |
| 15 | 0 | 2 | 81.02 | 78.76 | 82.63 |
| 15 | 2 | 2 | 83.98 | 85.77 | 85.41 |
| 25 | 2 | 0 | 86.06 | 86.53 | 85.76 |
| 25 | 0 | 2 | 82.62 | 79.48 | 82.69 |
| 25 | 2 | 2 | 84.64 | 85.82 | 84.49 |
Table 4 Comparison of segmentation accuracy of point prompt generation strategies on SAM
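Table 4 applies the same strategies to the unadapted SAM. For context, feeding a box prompt plus augmented points through the official segment-anything predictor looks roughly as follows (checkpoint path, image, and coordinates are placeholders):

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Backbone choice and checkpoint path are placeholders.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for a painting crop
predictor.set_image(image)

# Box prompt augmented with points: label 1 = foreground, 0 = background.
point_coords = np.array([[240, 180], [260, 210], [120, 60], [400, 300]])
point_labels = np.array([1, 1, 0, 0])
box = np.array([100, 50, 420, 330])  # x0, y0, x1, y1

masks, scores, _ = predictor.predict(
    point_coords=point_coords, point_labels=point_labels,
    box=box, multimask_output=False)
```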
| Method | DSC |
| --- | --- |
| Image encoder | 91.08 |
| Mask decoder | 86.37 |
| Image encoder + mask decoder | 89.67 |
Table 5 Comparison of segmentation accuracy of different fine-tuning methods/%
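Table 5 reports DSC in percent. For reference, the standard Dice similarity coefficient over binary masks (the textbook definition, not code from the paper):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient (DSC) between two binary masks, as a percentage."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return 100.0 * (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)
```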
| Threshold/% | Foreground points | Random | Maximum entropy | Farthest distance |
| --- | --- | --- | --- | --- |
| 5 | 4 | 91.19 | 91.49 | 91.27 |
| 5 | 6 | 90.55 | 90.98 | 90.51 |
| 5 | 8 | 89.94 | 90.96 | 90.23 |
| 15 | 4 | 91.72 | 91.94 | 91.12 |
| 15 | 6 | 90.35 | 90.27 | 90.33 |
| 15 | 8 | 89.06 | 90.12 | 90.14 |
Table 6 Comparison of segmentation accuracy of three strategies for multiple foreground prompt points
[1] ZHANG W, KAM-KWAI W, CHEN Y T, et al. ScrollTimes: tracing the provenance of paintings as a window into history[J]. IEEE Transactions on Visualization and Computer Graphics, 2024, 30(6): 2981-2994.
[2] SHI H Z, XU D, HE K J, et al. Contrastive learning for a single historical painting's blind super-resolution[J]. Visual Informatics, 2021, 5(4): 81-88.
[3] LI M, WANG Y, XU Y Q. Computing for Chinese cultural heritage[J]. Visual Informatics, 2022, 6(1): 1-13.
[4] HUANG L L, PENG J F, ZHANG R M, et al. Learning deep representations for semantic image parsing: a comprehensive overview[J]. Frontiers of Computer Science, 2018, 12(5): 840-857.
[5] KIRILLOV A, MINTUN E, RAVI N, et al. Segment anything[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 3992-4003.
[6] WANG X L, ZHANG X S, CAO Y, et al. SegGPT: towards segmenting everything in context[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 1130-1140.
[7] WU J D, JI W, LIU Y P, et al. Medical SAM adapter: adapting segment anything model for medical image segmentation[EB/OL]. [2024-04-19]. https://arxiv.org/abs/2304.12620.
[8] CHENG J L, YE J, DENG Z Y, et al. SAM-Med2D[EB/OL]. [2024-04-19]. https://arxiv.org/abs/2308.16184.
[9] ZHANG K D, LIU D. Customized segment anything model for medical image segmentation[EB/OL]. [2024-04-19]. https://arxiv.org/abs/2304.13785.
[10] SULTAN R I, LI C Y, ZHU H, et al. GeoSAM: fine-tuning SAM with sparse and dense visual prompting for automated segmentation of mobility infrastructure[EB/OL]. [2024-04-19]. https://arxiv.org/abs/2311.11319.
[11] CHEN K Y, LIU C Y, CHEN H, et al. RSPrompter: learning to prompt for remote sensing instance segmentation based on visual foundation model[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 4701117.
[12] WANG C M, WANG R J. Image-based color ink diffusion rendering[J]. IEEE Transactions on Visualization and Computer Graphics, 2007, 13(2): 235-246.
[13] CHEN X J, LIU Q H, CHEN Y H, et al. ColorNetVis: an interactive color network analysis system for exploring the color composition of traditional Chinese painting[J]. IEEE Transactions on Visualization and Computer Graphics, 2024, 30(6): 2916-2928.
[14] COHEN N, NEWMAN Y, SHAMIR A. Semantic segmentation in art paintings[J]. Computer Graphics Forum, 2022, 41(2): 261-275.
[15] JIANG S Q, HUANG Q M, YE Q X, et al. An effective method to detect and categorize digitized traditional Chinese paintings[J]. Pattern Recognition Letters, 2006, 27(7): 734-746.
[16] SUN M J, ZHANG D, WANG Z, et al. Monte Carlo convex hull model for classification of traditional Chinese paintings[J]. Neurocomputing, 2016, 171: 788-797.
[17] WANG Z, LU D Y, ZHANG D, et al. Fake modern Chinese painting identification based on spectral-spatial feature fusion on hyperspectral image[J]. Multidimensional Systems and Signal Processing, 2016, 27(4): 1031-1044.
[18] REN T H, LIU S L, ZENG A L, et al. Grounded SAM: assembling open-world models for diverse visual tasks[EB/OL]. [2024-04-19]. https://arxiv.org/abs/2401.14159.
[19] WANG H X, VASU P K A, FAGHRI F, et al. SAM-CLIP: merging vision foundation models towards semantic and spatial understanding[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2024: 3635-3647.
[20] HU E J, SHEN Y L, WALLIS P, et al. LoRA: low-rank adaptation of large language models[EB/OL]. [2024-04-19]. https://arxiv.org/abs/2106.09685.
[21] MAO Y R, GE Y H, FAN Y J, et al. A survey on LoRA of large language models[J]. Frontiers of Computer Science, 2025, 19(7): 197605.
[22] GAO F, NIE J, HUANG L, et al. Traditional Chinese painting classification based on painting techniques[J]. Chinese Journal of Computers, 2017, 40(12): 2871-2882 (in Chinese).
[23] SHENG J C, LI Y Z. Learning artistic objects for improved classification of Chinese paintings[J]. Journal of Image and Graphics, 2018, 23(8): 1193-1206 (in Chinese).
[24] HU Q Y, ZHOU W L, PENG X L, et al. DRANet: a semantic segmentation network for Chinese landscape paintings[J]. Digital Signal Processing, 2024, 147: 104427.
[25] MA J, HE Y T, LI F F, et al. Segment anything in medical images[J]. Nature Communications, 2024, 15(1): 654.
[26] CHEN T R, ZHU L Y, DING C T, et al. SAM fails to segment anything? SAM-Adapter: adapting SAM in underperformed scenes: camouflage, shadow, medical image segmentation, and more[EB/OL]. [2024-04-19]. https://arxiv.org/pdf/2304.09148.
[27] JULKA S, GRANITZER M. Knowledge distillation with segment anything (SAM) model for planetary geological mapping[C]// The 9th International Conference on Machine Learning, Optimization, and Data Science. Cham: Springer, 2023: 68-77.
[28] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]// The 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2015: 234-241.
[29] DAI H X, MA C, LIU Z L, et al. SAMAug: point prompt augmentation for segment anything model[EB/OL]. [2024-04-19]. https://arxiv.org/pdf/2307.01187.
[30] TANG F, DONG W M, MENG Y P, et al. Animated construction of Chinese brush paintings[J]. IEEE Transactions on Visualization and Computer Graphics, 2018, 24(12): 3019-3031.
[31] LAI Y C, CHEN B A, CHEN K W, et al. Data-driven NPR illustrations of natural flows in Chinese painting[J]. IEEE Transactions on Visualization and Computer Graphics, 2017, 23(12): 2535-2549.
[32] CHEN Y T, ZHANG W, TAN S W, et al. Visualization comparison of historical figures cohorts[J]. Journal of Graphics, 2023, 44(6): 1227-1238 (in Chinese).
[33] WANG S J, FENG Y C J, ZHU H, et al. TCPVis: visual analysis system of traditional Chinese painting school based on six principles of Chinese painting[J]. Journal of Graphics, 2024, 45(1): 209-218 (in Chinese).
[34] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2024-04-19]. https://arxiv.org/abs/2010.11929.
[35] XIE E Z, WANG W H, YU Z D, et al. SegFormer: simple and efficient design for semantic segmentation with transformers[EB/OL]. [2024-08-28]. https://proceedings.neurips.cc/paper_files/paper/2021/file/64f1f27bf1b4ec22924fd0acb550c235-Supplemental.pdf.
[36] EVERINGHAM M, ESLAMI S M A, VAN GOOL L, et al. The PASCAL visual object classes challenge: a retrospective[J]. International Journal of Computer Vision, 2015, 111(1): 98-136.
[37] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 3431-3440.
[38] ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 6230-6239.
[39] CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 833-841.