Perceptually-aligned panoramic image quality assessment via global semantic feature fusion

doi:10.11996/JG.j.2095-302X.2026020332

Abstract

Panoramic image quality assessment aims to objectively reflect the subjective perceptual quality of immersive visual content. However, the objective predictions of current deep learning models often diverge from human subjective perception, primarily because the models rely too heavily on low-level distortion features. To address this issue, a Hierarchical Semantic-Guided Network was proposed, which emulated the “top-down” cognitive mechanism of the human visual system. Prevailing methods predominantly follow a “bottom-up” paradigm that aggregates quality scores from pixel-level features. This process often fails to integrate high-level semantic information such as global composition and aesthetic attributes, which limits the performance ceiling. To this end, a dual-path parallel architecture was constructed around a “top-down” semantic attention modulation mechanism. In this architecture, a semantic prior path used a vision-language model to parse the input image into a structured semantic embedding, while a visual representation path extracted multi-scale feature maps with a deep convolutional network. The modulation mechanism took the semantic embedding as a conditional input and generated dynamic attention weights, which recalibrated the multi-scale features in the visual path. Feature extraction was thus guided by high-level semantics and focused on the information most critical to human subjective judgment. To keep the ordinal relationship of the model’s predictions aligned with human perception, the entire framework was optimized end to end with a composite objective function that incorporated a listwise ranking loss. Experiments on three public benchmark datasets, CVIQD, OIQA, and OIQ-10K, demonstrated that the proposed framework significantly outperformed state-of-the-art methods, supporting the effectiveness of the semantic-guided paradigm for perceptual quality assessment.
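
The semantic attention modulation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the attention weights are computed, so the code below assumes one simple form: a single linear layer followed by a sigmoid maps the semantic embedding to one gate per channel, and each scale has its own layer. The names `channel_gates`, `modulate`, and `recalibrate_scales`, and the parameters `weight` and `bias`, are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gates(semantic_embedding, weight, bias):
    """Map a length-D semantic embedding to one gate in (0, 1) per channel.

    weight is a C x D matrix and bias a length-C vector. Both stand in for
    learned parameters of an assumed single linear layer.
    """
    gates = []
    for row, b in zip(weight, bias):
        logit = sum(w * s for w, s in zip(row, semantic_embedding)) + b
        gates.append(sigmoid(logit))
    return gates


def modulate(feature_map, gates):
    """Rescale every channel of a C x H x W feature map by its gate."""
    return [
        [[value * g for value in row] for row in channel]
        for channel, g in zip(feature_map, gates)
    ]


def recalibrate_scales(feature_maps, semantic_embedding, params):
    """Apply semantic gating at each scale; params holds one (weight, bias) per scale."""
    return [
        modulate(fmap, channel_gates(semantic_embedding, w, b))
        for fmap, (w, b) in zip(feature_maps, params)
    ]


# Toy example: one scale, two channels, a 1 x 2 spatial grid.
embedding = [1.0, -1.0]
weight = [[0.0, 0.0], [10.0, 0.0]]  # channel 0 ignores the embedding
bias = [0.0, 0.0]
fmap = [[[2.0, 4.0]], [[2.0, 4.0]]]
out = recalibrate_scales([fmap], embedding, [(weight, bias)])
```

In this toy case the first channel is halved (its gate is exactly 0.5) while the second passes almost unchanged, which is the sense in which the same visual features can be weighted differently depending on the semantic input.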

Key words: panoramic image quality assessment, perceptual alignment, vision-language model, no-reference quality assessment

CLC Number: TP391.41

BAO Yongtang, WANG Moqin, WANG Zhihui, MA Guangxiao. Perceptually-aligned panoramic image quality assessment via global semantic feature fusion[J]. Journal of Graphics, 2026, 47(2): 332-340.


Type	Method	CVIQD SRCC↑	CVIQD PLCC↑	CVIQD RMSE↓	OIQA SRCC↑	OIQA PLCC↑	OIQA RMSE↓
Full-reference	SSIM	0.897	0.885	6.21	0.889	0.889	6.58
Full-reference	CPSNR	0.782	0.757	8.76	0.705	0.705	10.10
Full-reference	SPSNR	0.774	0.747	8.91	0.711	0.715	10.10
No-reference	BRISQUE	0.764	-0.744	9.71	-0.779	0.777	9.04
No-reference	NIQE	0.532	-0.512	11.90	-0.331	0.470	12.70
No-reference	ARNIQA	0.809	0.862	9.74	0.586	0.652	11.40
No-reference	360IQA	0.913	0.951	3.09	0.918	0.924	14.60
No-reference	VGCN	0.942	0.965	─	─	─	─
No-reference	Ours	0.944	0.961	4.03	0.941	0.931	6.18
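
The three criteria reported in the tables (SRCC, PLCC, RMSE) can be computed as in the following sketch, which uses only the Python standard library. It is an illustration under stated assumptions, not the authors' evaluation script: quality assessment studies often fit a nonlinear (for example logistic) mapping between predictions and subjective scores before computing PLCC and RMSE, and that step is omitted here. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(predicted, target):
    """Root mean squared error between predictions and subjective scores."""
    n = len(predicted)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / n)
```

SRCC depends only on the ordering of the predictions, so it measures monotonic agreement with subjective scores; PLCC measures linear agreement; RMSE measures absolute error, so lower values are better.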

Type	Method	CdistR1 SRCC↑	CdistR1 PLCC↑	CdistR2 SRCC↑	CdistR2 PLCC↑	All SRCC↑	All PLCC↑
Full-reference	SSIM	0.191	0.227	0.280	0.261	0.250	0.299
Full-reference	CPSNR	0.188	0.220	0.271	0.355	0.248	0.295
Full-reference	SPSNR	0.216	0.273	0.275	0.359	0.262	0.302
No-reference	360IQA	0.426	0.446	0.625	0.626	0.710	0.721
No-reference	VGCN	0.479	0.498	0.649	0.654	0.699	0.706
No-reference	Ours	0.542	0.560	0.666	0.671	0.731	0.740

Module configuration	CVIQD SRCC↑	CVIQD PLCC↑	CVIQD RMSE↓	OIQA SRCC↑	OIQA PLCC↑	OIQA RMSE↓
w/o semantic perception	0.900	0.915	4.569	0.920	0.912	6.340
w/o multimodal fusion	0.921	0.932	4.332	0.929	0.919	6.239
w/o ranking loss	0.933	0.940	4.281	0.936	0.926	6.180
Visual features only	0.885	0.895	4.800	0.896	0.906	6.701
Full model	0.944	0.961	4.032	0.941	0.931	6.181
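
The abstract states that training uses a composite objective with a listwise ranking loss, and the ablation above includes a configuration without that term. The exact formulation is not given on this page, so the sketch below substitutes a well-known listwise loss, the ListNet top-one cross-entropy, and combines it with a mean squared error term. The weighting factor `rank_weight` and all function names are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(predicted, target):
    """ListNet top-one loss: cross-entropy between the softmax of the
    subjective scores and the softmax of the predicted scores."""
    p_true = softmax(target)
    p_pred = softmax(predicted)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


def mse(predicted, target):
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def composite_loss(predicted, target, rank_weight=0.5):
    """Assumed composite objective: regression term plus weighted listwise term."""
    return mse(predicted, target) + rank_weight * listnet_loss(predicted, target)


# A batch of three images with subjective scores and two candidate predictions.
mos = [3.0, 2.0, 1.0]
same_order = [3.0, 2.0, 1.0]
reversed_order = [1.0, 2.0, 3.0]
loss_same = composite_loss(same_order, mos)
loss_reversed = composite_loss(reversed_order, mos)
```

The listwise term is smallest when the predicted scores induce the same top-one distribution as the subjective scores, so predictions that reverse the ordering within a batch are penalized even if their average error is moderate. Because the softmax is scale-dependent, subjective scores would normally be normalized before use.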