Journal of Graphics, 2023, Vol. 44, Issue (5): 907-917. DOI: 10.11996/JG.j.2095-302X.2023050907
• Image Processing and Computer Vision •
ZHANG Gui-mei, TAO Hui, LU Fei-fei, PENG Kun
Received: 2023-04-27
Accepted: 2023-08-07
Online: 2023-10-31
Published: 2023-10-31
About author: ZHANG Gui-mei (1970-), Professor, Ph.D. Her main research interests cover image processing and computer vision. E-mail: guimei.zh@163.com
ZHANG Gui-mei, TAO Hui, LU Fei-fei, PENG Kun. Domain adaptive urban scene semantic segmentation based on dual-source discriminator[J]. Journal of Graphics, 2023, 44(5): 907-917.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023050907
| Image No. | Dataset | Color histogram | SSIM |
|---|---|---|---|
| Image 1 | DS-T | 0.2479 | 0.1178 |
| Image 1 | DS'-T | 0.3757 | 0.1579 |
| Image 2 | DS-T | 0.1883 | 0.0618 |
| Image 2 | DS'-T | 0.2950 | 0.3124 |
| Image 3 | DS-T | 0.2148 | 0.0786 |
| Image 3 | DS'-T | 0.2943 | 0.1509 |
Table 1 Comparison of experimental results before and after style translation
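Table 1 compares translated and untranslated source images against the target domain using color-histogram similarity and SSIM. The paper's exact metric implementations are not given here; a minimal pure-Python sketch of plausible versions (Pearson correlation between intensity histograms, and a global single-window SSIM rather than the usual sliding-window form) might look like:

```python
# Hypothetical sketch; the paper does not specify its metric implementations.
# Both functions take equal-length flat lists of grayscale pixel values.

def hist_correlation(a, b, bins=32, max_val=255):
    """Pearson correlation between the intensity histograms of two images."""
    def hist(img):
        h = [0] * bins
        for v in img:
            h[min(v * bins // (max_val + 1), bins - 1)] += 1
        return h
    ha, hb = hist(a), hist(b)
    ma, mb = sum(ha) / bins, sum(hb) / bins
    num = sum((x - ma) * (y - mb) for x, y in zip(ha, hb))
    den = (sum((x - ma) ** 2 for x in ha) *
           sum((y - mb) ** 2 for y in hb)) ** 0.5
    return num / den if den else 0.0

def global_ssim(a, b, max_val=255):
    """SSIM over the whole image as one window (no sliding windows)."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2  # standard constants
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
```

Production code would normally use a windowed SSIM (e.g. `skimage.metrics.structural_similarity`) over 2-D arrays; the sketch above only illustrates the two quantities being compared in the table.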
| Domain adaptation dataset | Method | mIoU (%) |
|---|---|---|
| GTA5→Cityscapes | AT(S+T) | 35.0 |
| GTA5→Cityscapes | AT(S'+T) | 42.0 |
| SYNTHIA→Cityscapes | AT(S+T) | 37.6 |
| SYNTHIA→Cityscapes | AT(S'+T) | 44.6 |
Table 2 Comparison of segmentation accuracy before and after style translation
| Class | AdaptSegNet | AdvEnt | CLAN | Cycada | SEGL | Ours |
|---|---|---|---|---|---|---|
| Road | 87.3 | 86.9 | 88.0 | 87.3 | 2.1 | 92.4 |
| Sidewalk | 29.8 | 28.7 | 30.6 | 33.5 | 53.9 | 54.5 |
| Building | 78.6 | 78.7 | 79.2 | 77.9 | 81.4 | 83.2 |
| Wall | 21.1 | 28.5 | 23.4 | 20.9 | 27.3 | 30.8 |
| Fence | 18.2 | 25.2 | 20.5 | 17.9 | 25.1 | 24.8 |
| Pole | 22.5 | 17.1 | 26.1 | - | 33.2 | 34.0 |
| Light | 21.5 | 20.3 | 23.0 | 33.4 | 38.8 | 39.1 |
| Sign | 11.0 | 10.9 | 14.8 | 19.7 | 23.0 | 24.5 |
| Vegetation | 79.7 | 80.0 | 81.6 | 83.2 | 83.5 | 84.1 |
| Terrain | 29.6 | 26.4 | 34.5 | - | 34.1 | 34.9 |
| Sky | 71.3 | 70.2 | 72.0 | 70.1 | 70.7 | 78.7 |
| Person | 46.8 | 47.1 | 45.8 | 43.3 | 58.5 | 51.9 |
| Rider | 6.5 | 8.4 | 7.9 | - | 29.4 | 19.2 |
| Car | 80.1 | 81.5 | 80.5 | 77.4 | 84.2 | 84.3 |
| Truck | 23.0 | 26.0 | 26.6 | - | 27.8 | 28.3 |
| Bus | 26.9 | 17.2 | 29.9 | 22.5 | 34.8 | 38.3 |
| Train | 0.01 | 18.9 | 0.01 | 3.4 | 4.8 | 3.6 |
| Motorbike | 10.6 | 11.7 | 10.7 | 11.3 | 25.1 | 13.2 |
| Bike | 0.3 | 1.6 | 0.0 | 12.9 | 19.4 | 20.4 |
| mIoU | 35.0 | 36.1 | 36.6 | 37.2 | 44.8 | 45.8 |
Table 3 Comparison with typical domain adaptive segmentation methods on GTA5→Cityscapes (%)
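The mIoU scores reported in Tables 2-6 are the mean of per-class intersection-over-union. As a reference for how such a score is conventionally computed from flattened prediction and ground-truth label maps (a generic sketch, not the authors' evaluation code):

```python
# Generic mIoU evaluation sketch; not the authors' code.

def miou(pred, gt, num_classes):
    """Mean IoU over classes, skipping classes absent from both maps."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:  # class appears in prediction or ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

Benchmarks such as Cityscapes accumulate these counts in a confusion matrix over the whole validation set before dividing, rather than averaging per-image.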
| Class | AdaptSegNet | AdvEnt | CLAN | Cycada | SEGL | Ours |
|---|---|---|---|---|---|---|
| Road | 78.9 | 67.9 | 80.4 | 84.4 | 83.2 | 78.9 |
| Sidewalk | 29.2 | 29.4 | 30.7 | 29.6 | 40.6 | 31.8 |
| Building | 75.5 | 71.9 | 74.7 | 74.1 | 80.3 | 78.8 |
| Light | 0.1 | 0.6 | 1.4 | 12.6 | 7.9 | 9.1 |
| Sign | 4.8 | 2.6 | 8.0 | 14.3 | 11.2 | 8.7 |
| Vegetation | 72.6 | 74.9 | 77.1 | 79.2 | 79.4 | 79.4 |
| Sky | 76.7 | 74.9 | 79.0 | 80.8 | 84.6 | 74.1 |
| Person | 43.4 | 35.4 | 46.5 | 44.9 | 54.1 | 45.0 |
| Rider | 8.8 | 9.6 | 8.9 | 7.9 | 20.9 | 18.1 |
| Car | 71.1 | 67.8 | 73.8 | 73.6 | 73.4 | 72.4 |
| Bus | 16.0 | 21.4 | 18.2 | 21.4 | 33.2 | 14.6 |
| Motorbike | 3.6 | 4.1 | 2.2 | 3.4 | 18.1 | 15.1 |
| Bike | 8.4 | 15.5 | 9.9 | 27.2 | 27.3 | 37.9 |
| mIoU | 37.6 | 36.6 | 39.3 | 41.6 | 47.2 | 48.5 |
Table 4 Comparison with typical domain adaptive segmentation methods on SYNTHIA→Cityscapes (%)
| Method | Style translation | Dual-source discriminator | Class-balance factor | mIoU (%) |
|---|---|---|---|---|
| Adversarial learning | √ | - | - | 42.0 |
| Adversarial learning | √ | √ | - | 43.2 |
| Self-training + adversarial learning | √ | - | - | 43.3 |
| Self-training + adversarial learning | √ | √ | - | 44.9 |
| Self-training + adversarial learning | √ | √ | √ | 45.8 |
Table 5 GTA5→Cityscapes cross-domain segmentation comparison experiment
| Method | Style translation | Dual-source discriminator | Class-balance factor | mIoU (%) |
|---|---|---|---|---|
| Adversarial learning | √ | - | - | 44.6 |
| Adversarial learning | √ | √ | - | 45.8 |
| Self-training + adversarial learning | √ | - | - | 46.2 |
| Self-training + adversarial learning | √ | √ | - | 46.9 |
| Self-training + adversarial learning | √ | √ | √ | 48.5 |
Table 6 SYNTHIA→Cityscapes cross-domain segmentation comparison experiment
| Method | α | β | mIoU (%) |
|---|---|---|---|
| Standard | 1 | 0 | 35.0 |
| Standard | 0 | 1 | 41.0 |
| Dual-source discriminator (S' as intermediate bridge) | 0.1 | 0.9 | 41.8 |
| Dual-source discriminator (S' as intermediate bridge) | 0.9 | 0.1 | 41.6 |
| Dual-source discriminator (S' as intermediate bridge) | 0.5 | 0.5 | 42.0 |
| Dual-source discriminator (S as intermediate bridge) | 0.5 | 0.5 | 41.7 |
Table 7 Comparison experiment on segmentation weighting coefficients
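Table 7 sweeps the coefficients α and β that weight the dual-source discriminator's two adversarial terms (original source S vs. style-translated source S'). The paper's exact loss is defined in its method section and not reproduced here; a heavily simplified sketch of such a weighted combination, with illustrative function names that are assumptions rather than the authors' API, could be:

```python
import math

# Hypothetical sketch of weighting two adversarial alignment terms,
# as swept in Table 7. Names and structure are illustrative only.

def adv_loss(d_scores):
    """Non-saturating adversarial loss on discriminator outputs in (0, 1]."""
    return -sum(math.log(s) for s in d_scores) / len(d_scores)

def dual_source_adv_loss(d_scores_s, d_scores_s2, alpha=0.5, beta=0.5):
    """Blend the S->T and S'->T adversarial terms with weights alpha, beta."""
    return alpha * adv_loss(d_scores_s) + beta * adv_loss(d_scores_s2)
```

Setting `alpha=1, beta=0` (or the reverse) recovers the single-source rows of Table 7; the balanced `alpha=beta=0.5` setting corresponds to its best-performing configuration.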
[1] QING C, YU J, XIAO C B, et al. Deep convolutional neural network for semantic image segmentation[J]. Journal of Image and Graphics, 2020, 25(6): 1069-1090. (in Chinese)
[2] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.
[3] FAN C N, LIU P, XIAO T, et al. A review of deep domain adaptation: general situation and complex situation[J]. Acta Automatica Sinica, 2021, 47(3): 515-548. (in Chinese)
[4] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[5] SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1-9.
[6] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[7] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[EB/OL]. [2023-01-19]. http://de.arxiv.org/pdf/1411.4038.
[8] BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495.
[9] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848.
[10] TSAI Y H, HUNG W C, SCHULTER S, et al. Learning to adapt structured output space for semantic segmentation[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7472-7481.
[11] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// Neural Information Processing Systems. Cambridge: MIT Press, 2014: 2672-2680.
[12] GONG R, LI W, CHEN Y H, et al. DLOW: domain flow for adaptation and generalization[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 2472-2481.
[13] ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2242-2251.
[14] MATHUR A, ISOPOUSSU A, KAWSAR F, et al. FlexAdapt: flexible cycle-consistent adversarial domain adaptation[C]// 2019 18th IEEE International Conference on Machine Learning and Applications. New York: IEEE Press, 2020: 896-901.
[15] LI Y S, YUAN L, VASCONCELOS N. Bidirectional learning for domain adaptation of semantic segmentation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 6929-6938.
[16] VU T H, JAIN H, BUCHER M, et al. ADVENT: adversarial entropy minimization for domain adaptation in semantic segmentation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 2512-2521.
[17] HUANG J X, LU S J, GUAN D Y, et al. Contextual-relation consistent domain adaptation for semantic segmentation[M]// Computer Vision - ECCV 2020. Cham: Springer International Publishing, 2020: 705-722.
[18] WANG Y C, WANG H C, SHEN Y J, et al. Semi-supervised semantic segmentation using unreliable pseudo-labels[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 4238-4247.
[19] SHAO W B, LIU Y J, SUN X R, et al. Cross modality person re-identification based on residual enhanced attention[J]. Journal of Graphics, 2023, 44(1): 33-40. (in Chinese)
[20] CHEN S J, JIA X, HE J Z, et al. Semi-supervised domain adaptation based on dual-level domain mixing for semantic segmentation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 11013-11022.
[21] LI Y J, LIU M Y, LI X T, et al. A closed-form solution to photorealistic image stylization[M]// Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 468-483.
[22] LIN Y X, STANLEY TAN D, CHENG W H, et al. Spatially-aware domain adaptation for semantic segmentation of urban scenes[C]// 2019 IEEE International Conference on Image Processing. New York: IEEE Press, 2019: 1870-1874.
[23] LIN Y X, TAN D S, CHENG W H, et al. Adapting semantic segmentation of urban scenes via mask-aware gated discriminator[C]// 2019 IEEE International Conference on Multimedia and Expo. New York: IEEE Press, 2019: 218-223.
[24] ZOU Y, YU Z D, VIJAYA KUMAR B K, et al. Unsupervised domain adaptation for semantic segmentation via class-balanced self-training[C]// Computer Vision - ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part III. New York: ACM, 2018: 297-313.
[25] LUO Y W, ZHENG L, GUAN T, et al. Taking a closer look at domain shift: category-level adversaries for semantics consistent domain adaptation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 2502-2511.
[26] ZHANG G M, LU F F, LONG B Y, et al. Domain adaptation semantic segmentation for urban scene combining self-ensembling and adversarial learning[J]. Pattern Recognition and Artificial Intelligence, 2021, 34(1): 58-67. (in Chinese)
[27] HUANG J X, LU S J, GUAN D Y, et al. Contextual-relation consistent domain adaptation for semantic segmentation[M]// Computer Vision - ECCV 2020. Cham: Springer International Publishing, 2020: 705-722.
[28] ZHANG G M, PAN G F, LIU J X. Domain adaptation for semantic segmentation based on adaption learning rate[J]. Journal of Image and Graphics, 2020, 25(5): 913-925. (in Chinese)
[29] HOYER L, DAI D X, VAN GOOL L. DAFormer: improving network architectures and training strategies for domain-adaptive semantic segmentation[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 9914-9925.
[30] ZHANG P, ZHANG B, ZHANG T, et al. Prototypical pseudo label denoising and target structure learning for domain adaptive semantic segmentation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 12409-12419.
[31] PENG D, LEI Y J, HAYAT M, et al. Semantic-aware domain generalized segmentation[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 2584-2595.
[32] LIN T Y, GOYAL P, GIRSHICK R B, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 42(2): 318-327.