
Journal of Graphics ›› 2025, Vol. 46 ›› Issue (5): 980-989. DOI: 10.11996/JG.j.2095-302X.2025050980

• Image Processing and Computer Vision •

PanoLoRA: an efficient finetuning method for panoramic image generation based on Stable Diffusion

YE Wenlong1,3, CHEN Bin2,3

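  1 School of Earth and Space Sciences, Peking University, Beijing 100871, China
  2 School of Computer Science, Peking University, Beijing 100871, China
  3 National Key Laboratory of Intelligent Parallel Technology, Beijing 100871, China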
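  • Received: 2024-12-11 Accepted: 2025-02-20 Published: 2025-10-30 Online: 2025-09-10
  • Corresponding author: CHEN Bin (1973-), male, professor, Ph.D. His main research interest is virtual geographic environments. E-mail: gischen@pku.edu.cn
  • First author: YE Wenlong (2000-), male, master's student. His main research interest is diffusion models. E-mail: 2397726787@qq.com
  • Supported by:
    Fund of National Key Laboratory (2024JK19)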

PanoLoRA: an efficient finetuning method for panoramic image generation based on Stable Diffusion

YE Wenlong1,3, CHEN Bin2,3

  1 School of Earth and Space Sciences, Peking University, Beijing 100871, China
  2 School of Computer Science, Peking University, Beijing 100871, China
  3 National Key Laboratory of Intelligent Parallel Technology, Beijing 100871, China
  • Received: 2024-12-11 Accepted: 2025-02-20 Published: 2025-10-30 Online: 2025-09-10
  • First author: YE Wenlong (2000-), master's student. His main research interest is diffusion models. E-mail: 2397726787@qq.com
  • Supported by:
    Fund of National Key Laboratory (2024JK19)

Abstract:

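Panoramic images capture information about the entire surrounding environment and have become one of the important representations for building virtual scenes. However, amid the rise of artificial intelligence generated content (AIGC) technology, in particular diffusion models trained on large-scale text-image datasets and parameter-efficient fine-tuning (PEFT) techniques, the generation and rapid transfer of panoramic images remain insufficiently studied. To address the scarcity of panoramic image datasets and their spatial distortion, an open-source panoramic image dataset totaling 14 000 images was collected, and fine-grained text annotation and filtering were carried out through projection transformation. On this basis, the PanoLoRA method was proposed. Where the original convolution and self-attention modules extract spatial features, the method adds spherical convolution and LoRA modules that explicitly extract the spherical features of panoramic images and fuse them with the original planar features. This achieves efficient transfer learning for panoramic image generation while retaining the strong text-to-image generation capability of Stable Diffusion. In experiments on the collected text-panoramic image dataset, PanoLoRA was compared with 5 recent parameter-efficient fine-tuning methods and showed advantages across the board, improving both image generation quality and text-image consistency. A series of ablation experiments verified the effectiveness of each module of the algorithm.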
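The spatial distortion mentioned in the abstract can be illustrated with a minimal sketch. It assumes the panoramas are stored in equirectangular format, which the abstract does not state, and the function names (`pixel_to_lonlat`, `lonlat_to_unit_vector`, `horizontal_stretch`) are hypothetical helpers, not part of PanoLoRA. The sketch maps a pixel to a direction on the unit sphere and computes how much a pixel row is stretched horizontally relative to the sphere.

```python
import math

def pixel_to_lonlat(u, v, width, height):
    """Map a pixel index (u, v) to (longitude, latitude) in radians.

    Assumes an equirectangular image: columns span longitude [-pi, pi),
    rows span latitude from +pi/2 (top) to -pi/2 (bottom).
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    return lon, lat

def lonlat_to_unit_vector(lon, lat):
    """Point on the unit sphere (x right, y up, z forward)."""
    return (
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )

def horizontal_stretch(lat):
    """Horizontal stretch of the image relative to the sphere at a latitude.

    A circle of latitude has circumference proportional to cos(lat), yet
    every image row has the same width, so rows are stretched by 1/cos(lat).
    """
    return 1.0 / math.cos(lat)

# Image center (fractional pixel index) looks straight ahead.
width, height = 2048, 1024
center_lon, center_lat = pixel_to_lonlat(width / 2 - 0.5, height / 2 - 0.5, width, height)
center_dir = lonlat_to_unit_vector(center_lon, center_lat)

# At 60 degrees latitude, content is stretched roughly twofold.
stretch_60 = horizontal_stretch(math.radians(60.0))
```

The growing stretch toward the poles is one reason a planar convolution sees different geometry at different rows, which is the kind of effect a spherical convolution is meant to account for.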

Key words: diffusion model, panoramic image, parameter-efficient fine-tuning, transfer learning, LoRA

Abstract:

Panoramic images capture information about the entire surrounding environment and have become an important way to construct virtual scenes. However, amid the rise of artificial intelligence generated content (AIGC) technology, especially diffusion models trained on large-scale text-image datasets and parameter-efficient fine-tuning (PEFT) techniques, research on the generation and rapid transfer of panoramic images is still insufficient. To address the scarcity of panoramic image datasets and their spatial distortion, an open-source dataset of 14 000 panoramic images was collected, then finely annotated with text and filtered through projection transformation. Based on this dataset, the PanoLoRA method was proposed. Where the original convolution and self-attention modules extract spatial features, PanoLoRA additionally incorporates spherical convolution and LoRA (low-rank adaptation) modules. These explicitly extract spherical features from panoramic images and fuse them with the original planar features, achieving efficient transfer learning for panoramic image generation while retaining the strong text-to-image generation ability of Stable Diffusion. In comparison tests on the collected text-panoramic image dataset, PanoLoRA outperformed 5 recent PEFT methods across the board, improving both the quality of image generation and text-image consistency. A series of ablation experiments verified the effectiveness of each module of the algorithm.
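The low-rank adaptation idea that the abstract builds on can be sketched as follows. This is the generic LoRA formulation for a single linear layer, written in plain Python for illustration; it is not the PanoLoRA module itself, and the class name `LoRALinear`, the rank, and the scaling value are assumptions. A frozen weight `W` is left untouched, and a trainable low-rank product `B A` is added to its output.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (generic LoRA)."""

    def __init__(self, d_in, d_out, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        # Frozen pretrained weight; random values stand in for real weights.
        self.w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors: A is small and random, B starts at zero so the
        # adapted layer initially reproduces the frozen layer exactly.
        self.a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.w, x)                    # frozen path: W x
        delta = matvec(self.b, matvec(self.a, x))   # low-rank path: B (A x)
        return [y + self.scale * d for y, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])

layer = LoRALinear(d_in=8, d_out=8, rank=2)
x = [1.0] * 8
frozen_out = matvec(layer.w, x)
lora_out = layer.forward(x)          # identical to frozen_out while B is zero
full_params = 8 * 8                  # parameters in the frozen weight
lora_params = layer.trainable_params()
```

Because only `A` and `B` are trained, the number of updated parameters scales with `rank * (d_in + d_out)` rather than `d_in * d_out`, which is what makes this family of methods parameter-efficient.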

Key words: diffusion model, panoramic image, parameter-efficient fine-tuning, transfer learning, LoRA

CLC number: