Journal of Graphics, 2026, Vol. 47, Issue (2): 332-340. DOI: 10.11996/JG.j.2095-302X.2026020332

• Image Processing and Computer Vision •

Perceptually-aligned panoramic image quality assessment via global semantic feature fusion

BAO Yongtang, WANG Moqin, WANG Zhihui, MA Guangxiao

  1. College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao Shandong 266590, China
  • Received: 2025-10-09 Accepted: 2025-12-06 Online: 2026-04-30 Published: 2026-05-20
  • Contact: MA Guangxiao
  • Supported by:
    Shandong Provincial Natural Science Foundation (ZR2024QF267); Qingdao Natural Science Foundation (24-4-4-zrjj-90-jch); Qingdao Natural Science Foundation (24-4-4-zrjj-126-jch); Qingdao Science and Technology Benefits the People Project (25-1-5-smjk-18-nsh)

Abstract:

Panoramic image quality assessment aims to objectively reflect the subjective perceptual quality of immersive visual content. However, a significant discrepancy often exists between the objective predictions of current deep learning models and human subjective perception, primarily due to an over-reliance on low-level distortion features. To address this issue, a novel Hierarchical Semantic-Guided Network was proposed that emulated the “top-down” cognitive mechanism inherent in the human visual system. Prevailing methods predominantly follow a “bottom-up” paradigm, aggregating quality scores from pixel-level features; however, this process often fails to integrate high-level semantic information such as global composition and aesthetic attributes, which limits the achievable performance. To this end, a dual-path parallel information processing architecture was constructed, centered on a “top-down” semantic attention modulation mechanism. Within this architecture, a semantic prior path leveraged a Vision-Language Model to parse the input image into a structured semantic embedding. Concurrently, a visual representation path extracted multi-scale feature maps using a deep convolutional network. The modulation mechanism took the semantic embedding as a conditional input to generate dynamic attention weights, which recalibrated the multi-scale features in the visual path in real time. This design ensured that the entire feature extraction process was guided by high-level semantics and therefore focused on the information most critical to human subjective judgment. To ensure that the ordinal relationship of the model’s predictions aligned with human perception, the entire framework was optimized end-to-end via a composite objective function that incorporated a listwise ranking loss.
Comprehensive experiments on three public benchmark datasets, CVIQD, OIQA, and OIQ-10K, demonstrated that the proposed framework significantly outperformed state-of-the-art methods, validating the effectiveness and novelty of the semantic-guided paradigm in advancing perceptual quality assessment tasks.
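
The two mechanisms named in the abstract, semantic attention modulation and a listwise ranking loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the layer types or the exact loss, so the sketch assumes a linear projection followed by a sigmoid to produce one attention weight per channel, and a ListNet-style top-one cross-entropy as the listwise term. All function names (`channel_gates`, `modulate`, `listnet_loss`) and the toy values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gates(semantic_embedding, weights, bias):
    """Map a semantic embedding (length D) to one gate in (0, 1) per channel.

    Assumption: a single linear layer plus sigmoid; `weights` has C rows of
    length D and `bias` has length C.
    """
    return [
        sigmoid(sum(w * s for w, s in zip(row, semantic_embedding)) + b)
        for row, b in zip(weights, bias)
    ]


def modulate(feature_map, gates):
    """Rescale each channel of a C x H x W feature map (nested lists) by its gate."""
    return [
        [[g * v for v in row] for row in channel]
        for channel, g in zip(feature_map, gates)
    ]


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(predicted, mos):
    """ListNet-style loss: cross-entropy between softmax(mos) and softmax(predicted).

    Assumption: `mos` holds the subjective scores of the images in one batch.
    """
    p_true = softmax(mos)
    p_pred = softmax(predicted)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


# Toy example: 2-dimensional embedding, 2 channels, one 1 x 2 map per channel.
embedding = [1.0, -1.0]
weights = [[2.0, 0.0], [0.0, 2.0]]
bias = [0.0, 0.0]
gates = channel_gates(embedding, weights, bias)
features = [[[1.0, 2.0]], [[3.0, 4.0]]]
modulated = modulate(features, gates)

# A prediction list ordered like the subjective scores gives a lower loss
# than one in the reversed order.
loss_aligned = listnet_loss([3.0, 2.0, 1.0], [3.0, 2.0, 1.0])
loss_reversed = listnet_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

In a multi-scale setting, one would presumably compute a separate set of gates for each scale from the same embedding; the sketch shows a single scale only. The listwise term penalizes a batch whose predicted ordering disagrees with the subjective scores, which is the property the abstract attributes to its ranking loss.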

Key words: panoramic image quality assessment, perceptual alignment, vision-language model, no-reference quality assessment
