Journal of Graphics ›› 2023, Vol. 44 ›› Issue (3): 531-539.DOI: 10.11996/JG.j.2095-302X.2023030531
Received: 2022-10-05
Accepted: 2023-02-22
Online: 2023-06-30
Published: 2023-06-30
About author: WU Wen-huan (1985-), associate professor, Ph.D. His main research interests include computer vision and image processing. E-mail: wuwenhuan5@163.com
WU Wen-huan, ZHANG Hao-kun. Semantic segmentation with fusion of spatial criss-cross and channel multi-head attention[J]. Journal of Graphics, 2023, 44(3): 531-539.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023030531
| Method | Backbone | SCCAM | CAM | mIoU (%) |
|---|---|---|---|---|
| Baseline | ResNet-50 | - | - | 67.5 |
| Ours | ResNet-50 | - | √ | 80.0 |
| Ours | ResNet-50 | √ | - | 78.8 |
| Ours | ResNet-50 | √ | √ | 81.6 |

Table 1 Ablation study on the Cityscapes validation set
Fig. 4 Visualized comparison of segmentation results of SCCAM and CAM ((a), (e) Original images; (b), (f) Ground truth; (c), (g) Results with SCCAM or CAM removed, respectively; (d), (h) Results with SCCAM and CAM used together)
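Table 1 ablates the two attention modules separately. As a rough illustration only, not the authors' exact SCCAM/CAM implementation (whose details are given in the paper), a NumPy sketch of criss-cross spatial attention in the style of CCNet [11] and a grouped ("multi-head") channel-attention variant might look like the following; query/key/value projections are omitted for brevity:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def criss_cross_attention(feat):
    """Spatial criss-cross attention over a (C, H, W) feature map.

    Each position attends only to the H + W - 1 positions in its own
    row and column, as in CCNet [11].
    """
    C, H, W = feat.shape
    out = np.zeros_like(feat)
    for i in range(H):
        for j in range(W):
            # gather the criss-cross neighbourhood: full column + row, self once
            idx = [(r, j) for r in range(H)] + [(i, c) for c in range(W) if c != j]
            keys = np.stack([feat[:, r, c] for r, c in idx])   # (H+W-1, C)
            attn = softmax(keys @ feat[:, i, j])               # (H+W-1,)
            out[:, i, j] = attn @ keys                         # convex combination
    return out

def channel_multihead_attention(feat, heads=2):
    """Channel attention split into `heads` groups: within each group,
    channels attend to channels via affinities of flattened spatial maps.
    Assumes C is divisible by `heads`.
    """
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)
    out = np.empty_like(x)
    d = C // heads
    for h in range(heads):
        g = x[h * d:(h + 1) * d]          # (d, HW): one head's channels
        attn = softmax(g @ g.T)           # (d, d) channel affinities
        out[h * d:(h + 1) * d] = attn @ g
    return out.reshape(C, H, W)
```

Because each output is a softmax-weighted (convex) combination of input features, outputs stay within the input's value range; the real modules add learned projections and a residual connection around each branch.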
| Method | Backbone | FPS | mIoU (%) |
|---|---|---|---|
| Baseline | ResNet-50 | 0.95 | 67.5 |
| EncNet[12] | ResNet-50 | 1.04 | 74.2 |
| NLNet[9] | ResNet-50 | 0.82 | 77.0 |
| SETR-MLA[16] | ViT-L | 0.23 | 77.3 |
| DNLNet[10] | ResNet-50 | 0.81 | 78.6 |
| OCNet[22] | ResNet-50 | 1.08 | 79.3 |
| DANet[13] | ResNet-50 | 0.84 | 80.0 |
| Ours | ResNet-50 | 0.95 | 81.6 |

Table 2 Results of different methods with the same experimental setup on the Cityscapes validation set
Fig. 5 Results of image segmentation ((a) Original image; (b) Ground truth; (c) Baseline; (d) EncNet[12]; (e) NLNet[9]; (f) SETR-MLA[16]; (g) DNLNet[10]; (h) OCNet[22]; (i) DANet[13]; (j) Ours)
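Tables 2 and 5 report FPS alongside mIoU. A minimal sketch of how such a throughput figure is typically measured (a generic pattern, not the paper's benchmarking code; `infer` and `sample` are placeholder names for a model's forward pass and one input batch):

```python
import time

def measure_fps(infer, sample, warmup=3, runs=20):
    """Average frames per second of an inference callable.

    Warm-up iterations are executed first and excluded from the timing,
    so one-time costs (allocation, caching, JIT) do not skew the result.
    """
    for _ in range(warmup):
        infer(sample)
    start = time.perf_counter()
    for _ in range(runs):
        infer(sample)
    elapsed = time.perf_counter() - start
    return runs / elapsed
```

On GPU frameworks an explicit device synchronization is also needed before reading the clock, since kernel launches are asynchronous.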
| Method | mIoU | Road | Sidewalk | Building | Wall | Fence | Pole | Traffic Light | Traffic Sign | Vegetation |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | 67.5 | 97.8 | 83.0 | 92.0 | 34.4 | 58.7 | 66.0 | 73.6 | 80.2 | 92.1 |
| EncNet[12] | 74.2 | 97.8 | 83.2 | 92.3 | 45.4 | 58.9 | 64.5 | 71.1 | 78.1 | 91.9 |
| NLNet[9] | 77.0 | 98.0 | 84.7 | 93.0 | 58.1 | 61.1 | 65.8 | 73.3 | 79.4 | 92.4 |
| SETR-MLA[16] | 77.3 | 98.2 | 85.3 | 92.2 | 63.7 | 64.4 | 53.0 | 63.3 | 73.4 | 91.8 |
| DNLNet[10] | 78.6 | 98.2 | 85.4 | 93.2 | 61.0 | 62.5 | 66.3 | 72.8 | 79.9 | 92.6 |
| OCNet[22] | 79.3 | 98.2 | 85.6 | 93.0 | 61.4 | 62.6 | 66.0 | 73.4 | 80.2 | 92.7 |
| DANet[13] | 80.0 | 98.3 | 85.8 | 93.1 | 62.0 | 63.5 | 66.7 | 73.3 | 80.7 | 92.8 |
| Ours | 81.6 | 98.3 | 86.1 | 93.4 | 60.6 | 65.6 | 69.7 | 75.0 | 82.2 | 93.1 |

| Method | Terrain | Sky | Person | Rider | Car | Truck | Bus | Train | Motorcycle | Bicycle |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | 59.1 | 94.6 | 82.4 | 62.4 | 91.6 | 17.3 | 33.4 | 35.0 | 51.3 | 77.9 |
| EncNet[12] | 61.3 | 94.4 | 80.8 | 62.3 | 94.6 | 64.3 | 84.9 | 60.2 | 47.8 | 76.6 |
| NLNet[9] | 62.7 | 94.7 | 82.7 | 62.1 | 95.4 | 68.0 | 85.7 | 76.9 | 51.7 | 78.2 |
| SETR-MLA[16] | 65.8 | 94.2 | 78.4 | 58.6 | 94.4 | 82.0 | 89.5 | 81.4 | 65.2 | 73.6 |
| DNLNet[10] | 64.7 | 95.1 | 83.3 | 65.4 | 95.6 | 73.9 | 85.2 | 69.7 | 69.5 | 79.0 |
| OCNet[22] | 65.3 | 95.1 | 83.2 | 64.6 | 95.5 | 80.7 | 87.0 | 76.1 | 67.4 | 78.8 |
| DANet[13] | 64.9 | 95.0 | 83.3 | 64.8 | 95.7 | 83.4 | 88.2 | 82.1 | 67.1 | 78.6 |
| Ours | 65.7 | 95.2 | 84.5 | 67.0 | 95.7 | 86.4 | 91.7 | 86.9 | 72.2 | 80.2 |

Table 3 Results of different methods on the Cityscapes validation set for each category (%)
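The per-category scores in Table 3 are IoU values, and mIoU is their unweighted mean over the 19 Cityscapes classes. A standard way to compute them from predicted and ground-truth label maps (a generic sketch, not the paper's evaluation code):

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes):
    """Accumulate a (num_classes, num_classes) confusion matrix.

    Rows index ground-truth labels, columns index predictions;
    labels outside [0, num_classes) (e.g. an ignore label) are skipped.
    """
    gt, pred = gt.ravel(), pred.ravel()
    mask = (gt >= 0) & (gt < num_classes)
    return np.bincount(num_classes * gt[mask] + pred[mask],
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)

def per_class_iou(conf):
    """IoU per class: intersection / union, from a confusion matrix."""
    inter = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    return inter / np.maximum(union, 1)

# mIoU is then per_class_iou(conf).mean(), with the matrix usually
# accumulated over the whole validation set before dividing.
```

Accumulating one confusion matrix over the entire split, rather than averaging per-image IoUs, is the convention used by the Cityscapes benchmark.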
| Method | Backbone | mIoU (%) |
|---|---|---|
| Baseline | ResNet-50 | 42.4 |
| EncNet[12] | ResNet-50 | 42.7 |
| DANet[13] | ResNet-50 | 42.8 |
| OCNet[22] | ResNet-50 | 42.9 |
| DNLNet[10] | ResNet-50 | 43.0 |
| NLNet[9] | ResNet-50 | 43.1 |
| Ours | ResNet-50 | 43.8 |

Table 4 Cross validation
| Method | Backbone | FPS | mIoU (%) |
|---|---|---|---|
| Baseline | ResNet-50 | 13.89 | 52.8 |
| EncNet[12] | ResNet-50 | 14.29 | 72.7 |
| OCNet[22] | ResNet-50 | 15.10 | 73.3 |
| DNLNet[10] | ResNet-50 | 12.40 | 73.7 |
| NLNet[9] | ResNet-50 | 12.52 | 74.0 |
| DANet[13] | ResNet-50 | 12.76 | 74.3 |
| SETR-MLA[16] | ViT-L | 3.56 | 79.7 |
| Ours | ResNet-50 | 13.43 | 78.2 |

Table 5 Generalization performance test
[1] | FENG D, HAASE-SCHÜTZ C, ROSENBAUM L, et al. Deep multi-modal object detection and semantic segmentation for autonomous driving: datasets, methods, and challenges[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 22(3): 1341-1360. |
[2] | CHEN X, WILLIAMS B M, VALLABHANENI S R, et al. Learning active contour models for medical image segmentation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 11632-11640. |
[3] | ZHENG Z, ZHONG Y F, WANG J J, et al. Foreground-aware relation network for geospatial object segmentation in high spatial resolution remote sensing imagery[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 4096-4105. |
[4] | LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]// 2015 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 3431-3440. |
[5] | CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(4): 834-848. |
[6] | ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2881-2890. |
[7] | RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2015: 234-241. |
[8] | BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. |
[9] | WANG X L, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7794-7803. |
[10] | YIN M H, YAO Z L, CAO Y, et al. Disentangled non-local neural networks[EB/OL]. [2022-09-08]. https://arxiv.org/pdf/2006.06668.pdf. |
[11] | HUANG Z L, WANG X G, HUANG L C, et al. CCNet: criss-cross attention for semantic segmentation[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 603-612. |
[12] | ZHANG H, DANA K, SHI J P, et al. Context encoding for semantic segmentation[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7151-7160. |
[13] | FU J, LIU J, TIAN H J, et al. Dual attention network for scene segmentation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 3146-3154. |
[14] | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale[EB/OL]. [2022-08-22]. https://arxiv.org/abs/2010.11929. |
[15] | STRUDEL R, GARCIA R, LAPTEV I, et al. Segmenter: transformer for semantic segmentation[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 7262-7272. |
[16] | ZHENG S X, LU J C, ZHAO H S, et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 6881-6890. |
[17] | LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 10012-10022. |
[18] | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778. |
[19] | HE T, ZHANG Z, ZHANG H, et al. Bag of tricks for image classification with convolutional neural networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 558-567. |
[20] | GUO M H, XU T X, LIU J J, et al. Attention mechanisms in computer vision: a survey[J]. Computational Visual Media, 2022, 8(3): 331-368. |
[21] | MMSEGMENTATION CONTRIBUTORS. OpenMMLab semantic segmentation toolbox and benchmark[EB/OL]. [2022-08-15]. https://github.com/open-mmlab/mmsegmentation. |
[22] | YUAN Y H, HUANG L, GUO J Y, et al. OCNet: object context for semantic segmentation[J]. International Journal of Computer Vision, 2021, 129(8): 2375-2398. |