Journal of Graphics ›› 2023, Vol. 44 ›› Issue (1): 112-119.DOI: 10.11996/JG.j.2095-302X.2023010112
• Computer Graphics and Virtual Reality •
PointMLP-FD: a point cloud classification model based on multi-level adaptive downsampling

LIANG Ao1,2,3,4, LI Zhi-han1,2,3,4, HUA Hai-yang1,2
Received: 2022-05-09
Revised: 2022-08-19
Online: 2023-10-31
Published: 2023-02-16
Contact: HUA Hai-yang
About author: LIANG Ao (1998-), master student. His main research interests include LiDAR-based target detection and point cloud processing. E-mail: liangao@sia.cn
LIANG Ao, LI Zhi-han, HUA Hai-yang. PointMLP-FD: a point cloud classification model based on multi-level adaptive downsampling[J]. Journal of Graphics, 2023, 44(1): 112-119.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023010112
Fig. 2 Model structure of PointMLP-FD. The backbone is the same as PointMLP; Class ATT is the proposed adaptive downsampling module, a shallow MLP that takes shallow point-cloud features as input. Two branch networks are added to compute their losses separately, and these losses participate in training together with the network's final classification error
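To make the branch design concrete, the following is a minimal PyTorch sketch of an adaptive downsampling head of this kind: a shallow MLP scores each point from its shallow features, the top-scoring points are kept, and each branch's per-point logits feed an auxiliary loss alongside the final classification loss. The class name `ClassATT`, the layer widths, and the loss weight are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassATT(nn.Module):
    """Hypothetical adaptive downsampling head: a shallow MLP scores each
    point from its shallow features, then the top-k points are kept."""
    def __init__(self, in_dim: int, num_classes: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_classes),  # per-point class logits
        )

    def forward(self, feats: torch.Tensor, k: int):
        # feats: (B, N, C) shallow point features
        logits = self.mlp(feats)            # (B, N, num_classes)
        score = logits.max(dim=-1).values   # per-point confidence
        idx = score.topk(k, dim=1).indices  # keep the k highest-scoring points
        sampled = torch.gather(
            feats, 1, idx.unsqueeze(-1).expand(-1, -1, feats.size(-1)))
        # sampled goes to the next backbone stage; logits feed the branch loss
        return sampled, logits

def total_loss(final_logits, branch_logits_list, labels, w_branch=0.3):
    """Branch losses join the final classification loss (weight illustrative)."""
    loss = F.cross_entropy(final_logits, labels)
    for bl in branch_logits_list:
        # pool per-point logits to an object-level prediction for the branch loss
        loss = loss + w_branch * F.cross_entropy(bl.mean(dim=1), labels)
    return loss
```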
| Method | Overall Acc (%) | Mean Acc (%) | Param (M) |
|---|---|---|---|
| PointNet | 63.0 | 58.1 | - |
| SpiderCNN | 68.2 | 63.4 | - |
| PointNet++ | 77.9 | 75.4 | 1.41 |
| DGCNN | 78.1 | 73.6 | - |
| PointCNN | 78.5 | 75.1 | - |
| GBNet | 80.5 | 77.8 | 8.39 |
| PRA-Net | 82.1 | 79.1 | - |
| Point-TnT | 83.5 | 81.0 | - |
| Point-BERT | 83.1 | - | 20.8 |
| PointMLP (SOTA) | 85.4±0.3 | 83.9±0.5 | 12.6 |
| PointMLP-elite | 83.8±0.6 | 81.8±0.8 | 0.68 |
| PointMLP-FD (Ours) | 85.15 | 83.64 | 0.77 |
Table 1 Experimental results
| Method | Bag | Bin | Box | Cabinet | Chair | Desk | Display | Door | Shelf | Table | Bed | Pillow | Sink | Sofa | Toilet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PointMLP-elite | 0.59 | 0.89 | 0.60 | 0.81 | 0.94 | 0.78 | 0.88 | 0.90 | 0.87 | 0.71 | 0.84 | 0.77 | 0.78 | 0.92 | 0.82 |
| Ours | 0.70 | 0.87 | 0.62 | 0.85 | 0.93 | 0.79 | 0.88 | 0.94 | 0.83 | 0.73 | 0.87 | 0.84 | 0.85 | 0.94 | 0.85 |
Table 2 Results for each category in the ScanObjectNN dataset
Fig. 6 Visualization of the sampling results: FPS (top) and Class ATT (bottom) downsampling. The red boxes mark the results of both sampling methods on the background of the first Chair sample ((a) Chair; (b) Table)
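For reference, the FPS baseline in the top row is plain farthest point sampling; a minimal sketch of the textbook algorithm (not the paper's code):

```python
import torch

def farthest_point_sampling(xyz: torch.Tensor, k: int) -> torch.Tensor:
    """Plain FPS: xyz is (N, 3) point coordinates; returns indices of
    k points chosen to maximize spatial spread."""
    n = xyz.size(0)
    idx = torch.zeros(k, dtype=torch.long)
    dist = torch.full((n,), float("inf"))
    farthest = torch.randint(n, (1,)).item()  # arbitrary start point
    for i in range(k):
        idx[i] = farthest
        # update each point's distance to its nearest selected point
        d = ((xyz - xyz[farthest]) ** 2).sum(dim=1)
        dist = torch.minimum(dist, d)
        farthest = dist.argmax().item()  # next pick: farthest from all picks
    return idx
```

Because FPS only maximizes spatial coverage, it samples background points as readily as object points, which is the behavior the red boxes highlight; the learned Class ATT scores instead let the network concentrate samples on the object.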
| Method | Overall Acc (%) | Avg Acc (%) | Param (M) |
|---|---|---|---|
| PointMLP-elite | 83.8±0.6 | 81.8±0.8 | 0.68 |
| PointMLP-eliteʹ | 82.41 | 79.57 | 0.31 |
| PointMLP-FD-se (Ours) | 84.04 | 82.27 | 0.33 |
Table 3 Experimental results of the extracted PointMLP-FD-se compared with PointMLP-elite
Fig. 7 The downsampling results of the network after training with max pooling and average pooling, respectively ((a) Max pooling; (b) Average pooling)
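A hedged sketch of the two pooling choices compared in Fig. 7, assuming the pooling is what reduces a branch's per-point logits to a single object-level prediction (the function name and this placement are illustrative assumptions):

```python
import torch

def branch_prediction(logits: torch.Tensor, mode: str = "max") -> torch.Tensor:
    """Reduce per-point class logits (B, N, num_classes) to object-level
    predictions (B, num_classes) by pooling over the point dimension."""
    if mode == "max":
        return logits.max(dim=1).values  # max pooling over points
    return logits.mean(dim=1)            # average pooling over points
```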
[1] RAJ T, HASHIM F H, HUDDIN A B, et al. A survey on LiDAR scanning mechanisms[J]. Electronics, 2020, 9(5): 741.
[2] MEHENDALE N, NEOGE S. Review on LiDAR technology[EB/OL]. [2022-01-12]. https://www.researchgate.net/publication/342154967_Review_on_Lidar_Technology.
[3] FEI B, YANG W D, CHEN W M, et al. Comprehensive review of deep learning-based 3D point cloud completion processing and analysis[EB/OL]. [2022-01-12]. https://arxiv.org/abs/2203.03311.
[4] ZAMANAKOS G, TSOCHATZIDIS L, AMANATIADIS A, et al. A comprehensive survey of LIDAR-based 3D object detection methods with deep learning for autonomous driving[J]. Computers & Graphics, 2021, 99: 153-181.
[5] QIAN R, LAI X, LI X R. 3D object detection for autonomous driving: a survey[EB/OL]. [2022-01-12]. https://arxiv.org/abs/2106.10823.
[6] RORIZ R, CABRAL J, GOMES T. Automotive LiDAR technology: a survey[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(7): 6282-6297.
[7] LV D K, YING X X, CUI Y J, et al. Research on the technology of LIDAR data processing[C]//2017 First International Conference on Electronics Instrumentation & Information Systems. New York: IEEE Press, 2017: 1-5.
[8] BELLO S A, YU S S, WANG C, et al. Review: deep learning on 3D point clouds[J]. Remote Sensing, 2020, 12(11): 1729.
[9] SU H, MAJI S, KALOGERAKIS E, et al. Multi-view convolutional neural networks for 3D shape recognition[C]//2015 IEEE International Conference on Computer Vision. New York: IEEE Press, 2015: 945-953.
[10] BAI S, BAI X, ZHOU Z C, et al. GIFT: a real-time and scalable 3D shape search engine[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 5023-5032.
[11] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//The 25th International Conference on Neural Information Processing Systems - Volume 1. New York: ACM, 2012: 1097-1105.
[12] SU H, MAJI S, KALOGERAKIS E, et al. Multi-view convolutional neural networks for 3D shape recognition[C]//2015 IEEE International Conference on Computer Vision. New York: IEEE Press, 2015: 945-953.
[13] QI C R, SU H, NIEßNER M, et al. Volumetric and multi-view CNNs for object classification on 3D data[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 5648-5656.
[14] MATURANA D, SCHERER S. 3D convolutional neural networks for landing zone detection from LiDAR[C]//2015 IEEE International Conference on Robotics and Automation. New York: IEEE Press, 2015: 3471-3478.
[15] MATURANA D, SCHERER S. VoxNet: a 3D convolutional neural network for real-time object recognition[C]//2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. New York: IEEE Press, 2015: 922-928.
[16] WU Z R, SONG S R, KHOSLA A, et al. 3D ShapeNets: a deep representation for volumetric shapes[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1912-1920.
[17] WANG C, CHENG M, SOHEL F, et al. NormalNet: a voxel-based CNN for 3D object classification and retrieval[J]. Neurocomputing, 2019, 323: 139-147.
[18] RIEGLER G, ULUSOY A O, GEIGER A. OctNet: learning deep 3D representations at high resolutions[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 6620-6629.
[19] TATARCHENKO M, DOSOVITSKIY A, BROX T. Octree generating networks: efficient convolutional architectures for high-resolution 3D outputs[C]//2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2107-2115.
[20] CHARLES R Q, HAO S, MO K C, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 77-85.
[21] QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[C]//The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 5105-5114.
[22] LI J X, CHEN B M, LEE G H. SO-Net: self-organizing network for point cloud analysis[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 9397-9406.
[23] ZHAO H S, JIANG L, FU C W, et al. PointWeb: enhancing local neighborhood features for point cloud processing[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 5560-5568.
[24] ZHANG W X, XIAO C X. PCAN: 3D attention map learning using contextual information for point cloud based retrieval[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 12428-12437.
[25] KLOKOV R, LEMPITSKY V. Escape from cells: deep Kd-networks for the recognition of 3D point cloud models[C]//2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 863-872.
[26] WANG C, SAMARI B, SIDDIQI K. Local spectral graph convolution for point set feature learning[EB/OL]. [2022-01-12]. https://arxiv.org/abs/1803.05827.
[27] ZHAO H S, JIANG L, JIA J Y, et al. Point transformer[C]//2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 16239-16248.
[28] ENGEL N, BELAGIANNIS V, DIETMAYER K. Point transformer[EB/OL]. [2022-01-12]. https://arxiv.org/abs/2011.00931.
[29] GUO M H, CAI J X, LIU Z N, et al. PCT: point cloud transformer[J]. Computational Visual Media, 2021, 7(2): 187-199.
[30] PANG Y T, WANG W X, TAY F E H, et al. Masked autoencoders for point cloud self-supervised learning[EB/OL]. [2022-01-12]. https://arxiv.org/abs/2203.06604.
[31] ZHANG Y F, HU Q Y, XU G Q, et al. Not all points are equal: learning highly efficient point-based detectors for 3D LiDAR point clouds[EB/OL]. [2022-01-12]. https://arxiv.org/abs/2203.11139.
[32] WU Z R, SONG S R, KHOSLA A, et al. 3D ShapeNets: a deep representation for volumetric shapes[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1912-1920.
[33] UY M A, PHAM Q H, HUA B S, et al. Revisiting point cloud classification: a new benchmark dataset and classification model on real-world data[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 1588-1597.
[34] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[35] XU Y F, FAN T Q, XU M Y, et al. SpiderCNN: deep learning on point sets with parameterized convolutional filters[M]//Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 90-105.
[36] SHI S S, WANG X G, LI H S. PointRCNN: 3D object proposal generation and detection from point cloud[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 770-779.
[37] WU W X, QI Z A, LI F X. PointConv: deep convolutional networks on 3D point clouds[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 9613-9622.
[38] QIU S, ANWAR S, BARNES N. Geometric back-projection network for point cloud classification[J]. IEEE Transactions on Multimedia, 2022, 24: 1943-1955.
[39] CHENG S L, CHEN X W, HE X W, et al. PRA-Net: point relation-aware network for 3D point cloud analysis[J]. IEEE Transactions on Image Processing, 2021, 30: 4436-4448.
[40] BERG A, OSKARSSON M, O'CONNOR M. Points to patches: enabling the use of self-attention for 3D shape recognition[EB/OL]. [2022-01-12]. https://arxiv.org/abs/2204.03957.
[41] YU X, TANG L, RAO Y, et al. Point-BERT: pre-training 3D point cloud transformers with masked point modeling[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 19313-19322.