CLIP-based semantic offset transferable attacks on 3D point clouds

doi:10.11996/JG.j.2095-302X.2025030588

Abstract

Deep learning-based 3D point cloud understanding has received increasing attention in applications such as autonomous driving, robotics, and surveillance, and studying adversarial attacks on point cloud deep learning models helps to evaluate and improve their adversarial robustness. However, most existing attack methods target the white-box setting: the adversarial samples they generate achieve very low success rates in transfer attacks on black-box models with unknown parameters and are easily defended against. These methods optimize only in geometric space to mislead a specific classifier and fail to change the deep intrinsic semantic structure of the point cloud data, which limits their transferability across classifiers. To address these issues, the proposed algorithm leverages the rich semantic comprehension capability of large multimodal models to incorporate the semantic information of point clouds into the attack, so that the adversarial samples diverge significantly from the original semantic attributes and transferability is enhanced. In addition, because current adversarial samples with high transferability often exhibit insufficient imperceptibility, the algorithm integrates this semantic adversarial attack into the spectral domain, balancing transferability and imperceptibility. Extensive evaluations demonstrate that the 3D CLIP-based semantic offset attack (3DCLAT) significantly improves the transferability of adversarial samples and is more robust to defense methods.
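
As a rough illustration of what perturbing a point cloud "in the spectral domain" can look like, the sketch below applies a graph Fourier transform built from a k-nearest-neighbour graph, adds an offset to the spectral coefficients, and transforms back. This is not the authors' implementation: the abstract does not specify how the spectral representation is built, so the kNN graph, the unweighted Laplacian, the value of `k`, and the function names `knn_graph_laplacian` and `spectral_perturb` are all assumptions made for illustration. The semantic (CLIP-based) objective that would drive the choice of offset is omitted.

```python
import numpy as np


def knn_graph_laplacian(points, k=10):
    """Combinatorial Laplacian L = D - A of a symmetrized, unweighted kNN graph.

    Assumption: an unweighted kNN graph; the paper may use a different graph.
    """
    n = points.shape[0]
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist2, axis=1)[:, :k]
    adjacency = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adjacency[rows, neighbours.ravel()] = 1.0
    adjacency = np.maximum(adjacency, adjacency.T)  # make the graph undirected
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def spectral_perturb(points, delta_hat, k=10):
    """Add an offset `delta_hat` (shape (n, 3)) to the graph spectral coefficients."""
    laplacian = knn_graph_laplacian(points, k)
    # eigh returns eigenvalues in ascending order; the eigenvector columns
    # form an orthonormal basis, ordered from low to high graph frequency.
    _, basis = np.linalg.eigh(laplacian)
    coeffs = basis.T @ points            # forward graph Fourier transform
    return basis @ (coeffs + delta_hat)  # inverse transform to xyz coordinates


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(64, 3))
    # Hypothetical offset touching only the 8 lowest-frequency coefficients.
    offset = np.zeros((64, 3))
    offset[:8] = 0.01 * rng.normal(size=(8, 3))
    adversarial = spectral_perturb(cloud, offset, k=8)
    print(np.linalg.norm(adversarial - cloud), np.linalg.norm(offset))
```

Because the eigenvector basis is orthonormal, the size of the change in xyz coordinates (Frobenius norm) equals the size of the spectral offset, so bounding the offset bounds the geometric distortion. In an attack, the offset would be optimized against a loss rather than drawn at random as in this demo.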

Key words: CLIP, point cloud, adversarial attack, attack transferability, spectral domains

MA Yang, HUANG Lujie, PENG Weilong, WU Zhize, TANG Keke, FANG Meie. CLIP-based semantic offset transferable attacks on 3D point clouds[J]. Journal of Graphics, 2025, 46(3): 588-601.

Figures/Tables 13

Victim model	Attack method	ModelNet40				ShapeNet Part
Victim model	Attack method	PointNet	PointNet++	PointConv	DGCNN	PointNet	PointNet++	PointConv	DGCNN
PointNet	3D-Adv	100.00	5.01	2.06	4.22	100.00	4.12	1.33	3.18
	AdvPC	100.00	30.40	13.60	14.80	100.00	29.80	13.10	14.20
	AOF	99.90	57.20	36.30	29.40	100.00	53.50	33.20	25.50
	Mani-ADV	100.00	65.30	40.10	30.60	100.00	63.40	35.80	26.90
	Ours	100.00	67.40	43.20	35.10	100.00	64.60	40.80	30.50
PointNet++	3D-Adv	1.54	100.00	4.77	6.49	1.28	100.00	3.41	5.22
	AdvPC	4.81	100.00	28.20	18.90	2.65	100.00	23.50	16.90
	AOF	7.89	99.60	48.40	33.30	6.62	100.00	45.20	29.30
	Mani-ADV	17.70	100.00	59.30	75.10	15.40	100.00	57.90	74.80
	Ours	19.80	100.00	63.40	80.50	17.70	100.00	60.20	77.30
PointConv	3D-Adv	1.45	6.58	100.00	3.02	1.32	5.35	100.00	2.33
	AdvPC	5.13	34.20	100.00	18.00	4.67	33.20	100.00	17.50
	AOF	6.85	50.20	99.90	25.50	6.54	49.30	100.00	25.30
	Mani-ADV	16.90	57.50	100.00	29.40	14.30	50.20	100.00	26.40
	Ours	19.20	61.90	100.00	34.10	16.70	55.60	100.00	30.20
DGCNN	3D-Adv	0.91	6.63	5.21	100.00	0.72	6.54	5.02	100.00
	AdvPC	7.44	60.00	44.50	93.70	7.03	53.50	43.30	90.30
	AOF	14.00	69.60	58.40	96.70	12.60	66.20	55.70	93.70
	Mani-ADV	48.30	71.60	60.20	100.00	21.00	68.60	57.90	100.00
	Ours	55.20	77.90	62.80	98.80	50.40	73.80	59.10	100.00

Category	Airplane	Car	Cone	Piano	Person	Flower pot	Guitar	Cup	Bed	Table	Bathtub
Control	44.03	43.77	44.27	40.43	43.05	37.85	43.57	36.50	40.84	42.14	38.12
Ours	71.83	65.33	58.36	61.58	66.81	63.79	64.75	47.95	53.36	54.57	47.26

Victim model	Attack method	ModelNet40				ShapeNet Part
Victim model	Attack method	No defense	SRS	SOR	DUP-Net	No defense	SRS	SOR	DUP-Net
PointNet	3D-Adv	100.0	37.4	18.4	9.62	100.0	39.5	15.6	8.73
	AdvPC	93.7	89.6	53.6	23.1	92.7	89.2	53.1	24.8
	AOF	96.7	99.7	94.2	75.4	95.4	99.0	93.5	77.5
	Mani-ADV	93.8	93.5	87.0	72.3	92.7	88.7	85.5	74.1
	Ours	100.0	98.4	93.5	83.8	100.0	98.8	92.9	85.3
PointNet++	3D-Adv	100.0	65.9	27.3	22.9	100.0	60.2	25.7	20.2
	AdvPC	100.0	86.8	79.0	72.1	99.5	85.1	81.9	70.6
	AOF	96.7	92.8	91.0	88.2	97.6	88.2	85.8	81.2
	Mani-ADV	95.0	94.1	90.7	87.3	94.4	89.6	86.7	83.0
	Ours	100.0	94.6	92.4	89.1	100.0	92.9	90.1	82.5
PointConv	3D-Adv	100.0	42.4	48.0	37.3	100.0	45.4	50.4	41.3
	AdvPC	93.7	80.5	94.0	88.1	98.1	77.6	92.1	84.8
	AOF	96.7	90.3	96.3	96.0	94.2	90.5	95.5	95.2
	Mani-ADV	98.4	92.1	97.2	96.3	97.9	93.1	96.7	95.0
	Ours	100.0	95.0	95.6	96.5	100.0	94.3	95.4	95.3
DGCNN	3D-Adv	100.0	29.5	23.7	23.2	100.0	25.2	17.7	15.9
	AdvPC	93.7	65.4	68.5	62.1	96.1	62.5	59.7	57.2
	AOF	96.7	75.8	79.3	76.0	98.5	73.3	76.8	72.7
	Mani-ADV	97.5	84.7	80.1	79.7	98.2	75.4	78.8	75.9
	Ours	98.8	84.9	88.5	81.1	99.8	80.8	84.2	77.5