Journal of Graphics ›› 2026, Vol. 47 ›› Issue (2): 264-274. DOI: 10.11996/JG.j.2095-302X.2026020264

• Image Processing and Computer Vision •

Video attractiveness assessment method for scenic live stream recommendations

ZHOU Qiang1, HUANG Yaoqiu2, SHI Weimin1, ZHOU Zhong1

  1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  2. School of Software, Beihang University, Beijing 100191, China
  • Received: 2025-09-08 Accepted: 2025-11-25 Online: 2026-04-30 Published: 2026-05-20
  • Contact: ZHOU Zhong
  • Supported by:
    National Natural Science Foundation of China (62272018); National Natural Science Foundation of China (62206184); Science and Technology Project of Hainan Provincial Department of Transportation (HNJTT-KXC2024-3-22-02)

Abstract:

With the proliferation of 5G, cloud computing, and audio-video technologies, live streaming has emerged as a pivotal medium for online cultural tourism. However, mainstream multi-camera “slow live broadcasts” lack human-guided narration and scripting, so their content is highly random, which undermines traditional recommendation methods based on user preferences or video popularity. To address this limitation, a video attractiveness assessment method was proposed that predicts audience engagement by evaluating how multi-source video content stimulates viewer attention and emotional resonance. This approach proved more suitable for scenic-area live streaming scenarios than conventional methods. First, centered on video attractiveness, a multi-perspective guided video-description generation method was developed. It leverages a Large Vision-Language Model (LVLM) to extract key information, structure content representations, and infer emotional semantics, and synthesizes them into readable descriptive texts and attractiveness factors. Second, a multimodal feature fusion-based attractiveness assessment method was designed that integrates cross-attention mechanisms, dynamic saliency, and negative sample augmentation within a contrastive-learning network to output attractiveness scores and critical factors. Finally, an attractiveness-driven live-streaming system prototype for scenic areas was implemented, featuring channel recommendation, attractiveness visualization, and AI-guided navigation. Validation on the TVSum50 dataset demonstrated a 7.00% improvement in video-description relevance over raw descriptions and a 6.00% gain in cross-task generalization. On a self-built scenic live streaming dataset, the multimodal attractiveness evaluation method achieved 24.00% higher accuracy than unimodal baselines.
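
The abstract names cross-attention fusion and contrastive learning with negative samples but does not give architectural details. The following is a minimal illustrative sketch of those two generic techniques, not the authors' implementation. It assumes that description-text tokens act as queries over video-frame features, that the fused vector is a simple concatenation of pooled features, and that the contrastive objective is the standard InfoNCE loss. All function names (`cross_attention`, `fuse`, `info_nce`) and the temperature value are hypothetical, learned projection matrices are left out, and the dynamic saliency component is omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    Learned projection matrices are omitted to keep the sketch short.
    """
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return attended


def mean_pool(rows):
    """Average a list of equal-length vectors into one vector."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def fuse(text_tokens, frame_feats):
    """Assumed fusion: text tokens query the frame features, then the pooled
    text vector and the pooled attended vector are concatenated."""
    attended = cross_attention(text_tokens, frame_feats, frame_feats)
    return mean_pool(text_tokens) + mean_pool(attended)


def cosine(a, b):
    # Assumes neither vector is all zeros.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: low when the anchor is closer to the positive
    than to every negative sample."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    return -math.log(softmax(logits)[0])
```

In this sketch, adding more or harder negatives to `negatives` raises the loss unless the anchor and positive embeddings are well aligned, which is the general motivation behind negative sample augmentation in contrastive training.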

Key words: video attraction analysis, multimodal fusion, scenic spot live-streaming, intelligent recommendation, large vision-language model

CLC Number: