Journal of Graphics, 2025, Vol. 46, Issue (3): 558-567. DOI: 10.11996/JG.j.2095-302X.2025030558
Image Processing and Computer Vision
SUN Hao1, XIE Tao1, HE Long2, GUO Wenzhong3, YU Yongfang2, WU Qijun2, WANG Jianwei2, DONG Hui4,5
Received: 2024-08-01; Accepted: 2025-01-22; Online: 2025-06-30; Published: 2025-06-13
First author: SUN Hao (1986-), professor, Ph.D. His main research interests are sensors and artificial intelligence. E-mail: sunnice@hit.edu.cn
SUN Hao, XIE Tao, HE Long, GUO Wenzhong, YU Yongfang, WU Qijun, WANG Jianwei, DONG Hui. Research on multimodal text-visual large model for robotic terrain perception algorithm[J]. Journal of Graphics, 2025, 46(3): 558-567.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2025030558
Network | Supervision | IoU/% | Usable mask samples | Usable mask ratio/% |
---|---|---|---|---|
SERNet-Former [27] | Supervised | 98.82 | - | - |
Panoptic DeepLab [28] | Supervised | 98.88 | - | - |
SAC [29] | Self-supervised | 90.41 | - | - |
RPT [30] | Self-supervised | 89.20 | - | - |
Proposed algorithm | Training-free | 90.14 | 3446 | 76.58 |
Table 1 Evaluation results on the Cityscapes dataset under different supervision settings
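Table 1 reports a single-class IoU together with a usable-mask ratio. Assuming both follow the usual definitions (IoU = |P ∩ G| / |P ∪ G| for a predicted mask P and a ground-truth mask G; ratio = usable samples / total samples), the short sketch below computes them. The helper names `binary_iou` and `usable_mask_ratio` are illustrative and are not the authors' code. The total of 4500 samples is not stated in this excerpt; it is inferred from the table, because 4500 is the only integer total for which 3446 usable masks round to 76.58%.

```python
import numpy as np


def binary_iou(pred, target) -> float:
    """Intersection over union of two binary masks, in percent."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty: count as perfect agreement by convention.
        return 100.0
    inter = np.logical_and(pred, target).sum()
    return float(100.0 * inter / union)


def usable_mask_ratio(num_usable: int, num_total: int) -> float:
    """Share of samples for which a usable mask was produced, in percent."""
    return 100.0 * num_usable / num_total


if __name__ == "__main__":
    gt = np.zeros((4, 4), dtype=bool)
    gt[:, :2] = True      # 8 ground-truth terrain pixels
    pred = np.zeros((4, 4), dtype=bool)
    pred[:, :3] = True    # 12 predicted pixels, 8 of them correct
    print(round(binary_iou(pred, gt), 2))           # 8 / 12 -> 66.67
    print(round(usable_mask_ratio(3446, 4500), 2))  # 76.58 (total of 4500 is inferred)
```

Reporting the usable-mask ratio alongside IoU matters for a training-free method: the IoU only describes the samples for which a mask was actually produced.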
SLIC | Dice | MIoU/% | Training loss |
---|---|---|---|
× | × | 91.42 | 0.9304 |
√ | × | 93.78 | 0.8201 |
√ | √ | 96.34 | 0.6665 |
Table 2 Ablation experiment
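Table 2 shows that adding a Dice term lowers the training loss from 0.8201 to 0.6665 and raises MIoU from 93.78% to 96.34%. The excerpt does not give the exact loss formulation, so the sketch below assumes the common soft Dice loss, 1 - (2Σpg + eps) / (Σp + Σg + eps), added to a binary cross-entropy base loss. The function names, the weight `dice_weight` and the smoothing constant `eps` are hypothetical choices for illustration.

```python
import numpy as np


def soft_dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss 1 - (2*sum(p*g) + eps) / (sum(p) + sum(g) + eps).

    probs:  predicted foreground probabilities in [0, 1], any shape
    target: ground-truth mask of the same shape with values in {0, 1}
    eps:    assumed smoothing constant that keeps empty masks finite
    """
    p = np.asarray(probs, dtype=np.float64).ravel()
    g = np.asarray(target, dtype=np.float64).ravel()
    intersection = np.sum(p * g)
    denominator = np.sum(p) + np.sum(g)
    return float(1.0 - (2.0 * intersection + eps) / (denominator + eps))


def bce_dice_loss(probs, target, dice_weight=1.0, eps=1e-6):
    """Binary cross-entropy plus a weighted Dice term (the weighting is an assumption)."""
    p = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    g = np.asarray(target, dtype=np.float64)
    bce = -np.mean(g * np.log(p) + (1.0 - g) * np.log(1.0 - p))
    return float(bce + dice_weight * soft_dice_loss(probs, target, eps))


if __name__ == "__main__":
    gt = np.array([[1, 1, 0, 0]] * 2, dtype=np.float64)
    good = np.array([[0.9, 0.8, 0.1, 0.2]] * 2)
    bad = 1.0 - good
    print(soft_dice_loss(good, gt) < soft_dice_loss(bad, gt))  # True
    print(bce_dice_loss(good, gt) < bce_dice_loss(bad, gt))    # True
```

The Dice term is computed over the whole mask rather than per pixel. This makes it less sensitive to the imbalance between small terrain regions and large background areas, which is one common reason to pair it with cross-entropy.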
Fig. 8 Outdoor robot-dog experiment and terrain segmentation predictions ((a) Robot dog and experimental environment; (b) Original terrain image; (c) Terrain segmentation results)
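Table 2 also credits SLIC superpixels [24] with raising MIoU from 91.42% to 93.78%. This excerpt does not describe how the superpixels enter the pipeline. One common use, assumed here, is to snap a coarse terrain mask, such as the one in Fig. 8(c), to superpixel boundaries by majority vote. The function `refine_with_superpixels` and its `threshold` parameter are hypothetical. In practice the label map would typically come from `skimage.segmentation.slic(image, n_segments=..., compactness=...)`; the sketch takes it as an input so that it depends only on NumPy.

```python
import numpy as np


def refine_with_superpixels(mask, segments, threshold=0.5):
    """Snap a binary mask to superpixel boundaries by majority vote.

    mask:      boolean H x W terrain mask (e.g. from a foundation model)
    segments:  non-negative integer H x W superpixel label map (e.g. from SLIC)
    threshold: fraction of foreground pixels a superpixel needs to be kept
    """
    mask = np.asarray(mask).astype(bool)
    segments = np.asarray(segments)
    labels = segments.ravel()
    # Foreground pixel count and total pixel count for every superpixel label.
    fg = np.bincount(labels, weights=mask.ravel().astype(np.float64))
    total = np.bincount(labels)
    keep = np.zeros_like(fg, dtype=bool)
    present = total > 0
    keep[present] = fg[present] / total[present] >= threshold
    # Broadcast each superpixel's decision back to its pixels.
    return keep[segments]


if __name__ == "__main__":
    segs = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])
    noisy = np.array([[1, 1, 1, 0], [1, 0, 0, 0]], dtype=bool)
    print(refine_with_superpixels(noisy, segs).astype(int).tolist())
    # [[1, 1, 0, 0], [1, 1, 0, 0]]
```

Majority voting removes isolated false-positive pixels and aligns mask edges with color boundaries. The trade-off is that any superpixel straddling two terrain types is assigned entirely to one of them.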
[1] GUPTA A, SAVARESE S, GANGULI S, et al. Embodied intelligence via learning and evolution[J]. Nature Communications, 2021, 12(1): 5721.
[2] MATHUR P, PANDIAN K S. Terrain classification for traversability analysis for autonomous robot navigation in unknown natural terrain[J]. International Journal of Engineering Science and Technology, 2012, 4(1): 38-49.
[3] LEE J, HWANGBO J, WELLHAUSEN L, et al. Learning quadrupedal locomotion over challenging terrain[J]. Science Robotics, 2020, 5(47): eabc5986.
[4] WELLHAUSEN L, DOSOVITSKIY A, RANFTL R, et al. Where should I walk? Predicting terrain properties from images via self-supervised learning[J]. IEEE Robotics and Automation Letters, 2019, 4(2): 1509-1516.
[5] LONG Y X, LI X Q, CAI W Z, et al. Discuss before moving: visual language navigation via multi-expert discussions[C]// 2024 IEEE International Conference on Robotics and Automation. New York: IEEE Press, 2024: 17380-17387.
[6] ZHANG H, RONG X W, LI Y B, et al. Terrain recognition and path planning for quadruped robot[J]. Robot, 2015, 37(5): 546-556 (in Chinese).
[7] FANKHAUSER P, BJELONIC M, BELLICOSO C D, et al. Robust rough-terrain locomotion with a quadrupedal robot[C]// 2018 IEEE International Conference on Robotics and Automation. New York: IEEE Press, 2018: 5761-5768.
[8] JENELTEN F, MIKI T, VIJAYAN A E, et al. Perceptive locomotion in rough terrain: online foothold optimization[J]. IEEE Robotics and Automation Letters, 2020, 5(4): 5370-5376.
[9] KUROBE A, NAKAJIMA Y, KITANI K, et al. Audio-visual self-supervised terrain type recognition for ground mobile platforms[J]. IEEE Access, 2021, 9: 29970-29979.
[10] CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 833-851.
[11] ZHAO D, DAI Z P, LI S Q, et al. Perception and scene modeling of complex terrain information in patrol and exploration tasks[J]. Spacecraft Engineering, 2019, 28(5): 32-38 (in Chinese).
[12] ZHANG M L, WANG Z, LI M H, et al. Perception and representation of roaming terrain for a hexapod robot based on foot positions[J]. Journal of Mechanical Engineering, 2021, 57(19): 48-60 (in Chinese).
[13] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848.
[14] HOWARD A, SERAJI H. Vision-based terrain characterization and traversability assessment[J]. Journal of Robotic Systems, 2001, 18(10): 577-587.
[15] KINGRY N, JUNG M, DERSE E, et al. Vision-based terrain classification and solar irradiance mapping for solar-powered robotics[C]// 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. New York: IEEE Press, 2018: 5834-5840.
[16] ZHANG G M, TAO H, LU F F, et al. Domain adaptive urban scene semantic segmentation based on dual-source discriminator[J]. Journal of Graphics, 2023, 44(5): 907-917 (in Chinese).
[17] WANG Z R, ZENG X, YAN Z Y, et al. AIR-PolSAR-Seg: a large-scale data set for terrain segmentation in complex-scene PolSAR images[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2022, 15: 3830-3841.
[18] CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes dataset for semantic urban scene understanding[C]// The IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 3213-3223.
[19] XU P, DING L, LI Z Y, et al. Learning physical characteristics like animals for legged robots[J]. National Science Review, 2023, 10(5): nwad045.
[20] LI M H, ZHANG M L, ZHANG J H, et al. Free gait planning for a hexapod robot based on reinforcement learning[J]. Journal of Mechanical Engineering, 2019, 55(5): 36-44 (in Chinese).
[21] KIRILLOV A, MINTUN E, RAVI N, et al. Segment anything[C]// The IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 3992-4003.
[22] WU J Y, JING J, HE Y F, et al. Traffic anomaly event analysis method for highway scenes based on multimodal large language models[J]. Journal of Graphics, 2024, 45(6): 1266-1276 (in Chinese).
[23] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[EB/OL]. [2024-05-31]. https://dblp.uni-trier.de/db/conf/icml/icml2021.html#RadfordKHRGASAM21.
[24] ACHANTA R, SHAJI A, SMITH K, et al. SLIC superpixels compared to state-of-the-art superpixel methods[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(11): 2274-2282.
[25] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]// The 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2015: 234-241.
[26] MINAEE S, BOYKOV Y, PORIKLI F, et al. Image segmentation using deep learning: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(7): 3523-3542.
[27] ERISEN S. SERNet-Former: segmentation by efficient-ResNet with attention-boosting gates and attention-fusion networks[C]// IEEE International Conference on Computer Vision and Machine Intelligence. New York: IEEE Press, 2024: 1-6.
[28] CHENG B W, COLLINS M D, ZHU Y K, et al. Panoptic-DeepLab: a simple, strong, and fast baseline for bottom-up panoptic segmentation[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 12472-12482.
[29] ARASLANOV N, ROTH S. Self-supervised augmentation consistency for adapting semantic segmentation[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 15379-15389.
[30] ZHANG Y H, QIU Z F, YAO T, et al. Transferring and regularizing prediction for semantic segmentation[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 9618-9627.