Perceptually-aligned panoramic image quality assessment via global semantic feature fusion

doi:10.11996/JG.j.2095-302X.2026020332

Abstract:

Panoramic image quality assessment aims to objectively reflect the subjective perceptual quality of immersive visual content. However, the objective predictions of existing deep learning models often deviate significantly from human subjective perception, primarily because these models rely too heavily on low-level distortion features. To address this issue, a novel hierarchical semantic-guided network was proposed, which emulated the “top-down” cognitive mechanism of the human visual system. Prevailing methods predominantly follow a “bottom-up” paradigm that aggregates quality scores from pixel-level features. Because this process fails to effectively integrate high-level semantic information such as global image structure and compositional aesthetics, their performance ceiling is limited. To this end, a dual-path parallel information processing architecture was constructed, centered on a “top-down” semantic attention modulation mechanism. Within this architecture, a semantic prior path used a vision-language model to parse the input image into a structured semantic embedding vector, while a visual representation path extracted multi-scale feature maps with a deep convolutional network. The modulation mechanism took the semantic embedding vector as a conditional input and generated dynamic attention weights that recalibrated the multi-scale features of the visual path in real time. As a result, the entire feature extraction process was guided by high-level semantics and focused on the information most critical to human subjective judgment. To keep the ordinal relationships among the model's predictions consistent with human perception, the entire framework was optimized end-to-end with a composite objective function that incorporated a listwise ranking loss. Comprehensive experiments on three public benchmark datasets, CVIQD, OIQA, and OIQ-10K, demonstrated that the proposed framework significantly outperformed state-of-the-art methods, validating the effectiveness and advantages of the semantic-guided paradigm for perceptual quality assessment.

Key words: panoramic image quality assessment, perceptual alignment, vision-language model, no-reference quality assessment
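The semantic attention modulation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the attention weights, so this example assumes channel-wise sigmoid gates produced by a linear projection of the semantic embedding, applied independently at each scale. The function names, the `[C][H][W]` nested-list layout, and the toy projection weights are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gates(semantic_embedding, weight, bias):
    """Project the embedding to one gate in (0, 1) per feature channel."""
    return [
        sigmoid(sum(w * e for w, e in zip(row, semantic_embedding)) + b)
        for row, b in zip(weight, bias)
    ]


def modulate(feature_map, gates):
    """Rescale every value of channel c by gates[c]; feature_map is [C][H][W]."""
    return [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]


def modulate_multiscale(feature_maps, semantic_embedding, projections):
    """Apply embedding-conditioned gating independently at each scale."""
    return [
        modulate(fmap, channel_gates(semantic_embedding, weight, bias))
        for fmap, (weight, bias) in zip(feature_maps, projections)
    ]


# Toy inputs (assumed values): a 3-dimensional embedding and two scales.
semantic_embedding = [0.5, -1.0, 2.0]
coarse = [[[1.0, 2.0]], [[3.0, 4.0]]]   # 2 channels, 1x2 spatial grid
fine = [[[1.0, 1.0], [1.0, 1.0]]]       # 1 channel, 2x2 spatial grid
projections = [
    ([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [0.0, 0.0]),  # (weight, bias), coarse
    ([[0.0, 0.0, 1.0]], [0.0]),                        # (weight, bias), fine
]

modulated = modulate_multiscale([coarse, fine], semantic_embedding, projections)
print(modulated[0])
print(modulated[1])
```

In a trained network the projection weights would be learned jointly with the visual backbone; here they are fixed only so that the gating step can be followed by hand.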

CLC number: TP391.41

BAO Yongtang, WANG Moqin, WANG Zhihui, MA Guangxiao. Perceptually-aligned panoramic image quality assessment via global semantic feature fusion[J]. Journal of Graphics, 2026, 47(2): 332-340.

Figures/Tables: 7

Type	Method	CVIQD↑			OIQA↑		
		SRCC	PLCC	RMSE	SRCC	PLCC	RMSE
Full-reference	SSIM	0.897	0.885	6.21	0.889	0.889	6.58
	CPSNR	0.782	0.757	8.76	0.705	0.705	10.10
	SPSNR	0.774	0.747	8.91	0.711	0.715	10.10
No-reference	BISQE	0.764	-0.744	9.71	-0.779	0.777	9.04
	NIQE	0.532	-0.512	11.90	-0.331	0.470	12.70
	ANIQA	0.809	0.862	9.74	0.586	0.652	11.40
	360IQA	0.913	0.951	3.09	0.918	0.924	14.60
	VGCN	0.942	0.965	─	─	─	─
	Ours	0.944	0.961	4.03	0.941	0.931	6.18
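The SRCC, PLCC, and RMSE columns above compare predicted scores against subjective scores. The sketch below shows how these three metrics are commonly computed, using only the Python standard library. The score lists are made-up values, not data from the paper. Many quality assessment studies also fit a nonlinear logistic mapping before computing PLCC and RMSE; the source does not say whether that step was used, so it is omitted here.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation (SRCC): Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(predicted, target):
    """Root-mean-square error between predictions and subjective scores."""
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)
    )


# Hypothetical predictions and mean opinion scores for four images.
predicted = [30.0, 45.0, 52.0, 70.0]
mos = [28.0, 50.0, 49.0, 72.0]

print("SRCC:", spearman(predicted, mos))
print("PLCC:", pearson(predicted, mos))
print("RMSE:", rmse(predicted, mos))
```

SRCC and PLCC are better when higher, whereas RMSE is better when lower, which is worth keeping in mind when reading the RMSE columns.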

Type	Method	CdistR1↑		CdistR2↑		All↑	
		SRCC	PLCC	SRCC	PLCC	SRCC	PLCC
Full-reference	SSIM	0.191	0.227	0.280	0.261	0.250	0.299
	CPSNR	0.188	0.220	0.271	0.355	0.248	0.295
	SPSNR	0.216	0.273	0.275	0.359	0.262	0.302
No-reference	360IQA	0.426	0.446	0.625	0.626	0.710	0.721
	VGCN	0.479	0.498	0.649	0.654	0.699	0.706
	Ours	0.542	0.560	0.666	0.671	0.731	0.740

Module configuration	CVIQD↑			OIQA↑		
	SRCC	PLCC	RMSE	SRCC	PLCC	RMSE
Without semantic perception	0.900	0.915	4.569	0.920	0.912	6.340
Without multimodal fusion	0.921	0.932	4.332	0.929	0.919	6.239
Without ranking loss function	0.933	0.940	4.281	0.936	0.926	6.180
Visual features only	0.885	0.895	4.800	0.896	0.906	6.701
Full model	0.944	0.961	4.032	0.941	0.931	6.181
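The ablation row that removes the ranking loss refers to the listwise ranking term of the composite objective mentioned in the abstract. The source names neither the specific listwise loss nor its weighting, so the sketch below makes two assumptions: a ListMLE-style loss (the negative Plackett-Luce log-likelihood of the ordering induced by the subjective scores) and a simple weighted sum with mean squared error. The `rank_weight` parameter and the toy scores are hypothetical.

```python
import math


def listmle_loss(predicted, mos):
    """Negative Plackett-Luce log-likelihood of the MOS-induced ordering."""
    # Arrange predictions in the order of descending subjective score.
    order = sorted(range(len(mos)), key=lambda i: mos[i], reverse=True)
    scores = [predicted[i] for i in order]
    loss = 0.0
    for k in range(len(scores)):
        tail = scores[k:]
        peak = max(tail)  # subtract the maximum for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in tail))
        loss += log_sum_exp - scores[k]
    return loss


def mse_loss(predicted, mos):
    """Mean squared error between predictions and subjective scores."""
    return sum((p - m) ** 2 for p, m in zip(predicted, mos)) / len(mos)


def composite_loss(predicted, mos, rank_weight=0.5):
    """Regression term plus a weighted listwise ranking term (assumed form)."""
    return mse_loss(predicted, mos) + rank_weight * listmle_loss(predicted, mos)


# Toy batch with scores normalized to [0, 1] (assumed values).
mos = [0.9, 0.5, 0.1]
well_ordered = [0.8, 0.5, 0.2]
mis_ordered = [0.2, 0.5, 0.8]

print("ranking loss, correct order: ", listmle_loss(well_ordered, mos))
print("ranking loss, reversed order:", listmle_loss(mis_ordered, mos))
print("composite loss, correct order:", composite_loss(well_ordered, mos))
```

The ranking term penalizes predictions whose order disagrees with the subjective scores even when their absolute errors are small, which is the behavior a rank-correlation metric such as SRCC rewards.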