Journal of Graphics ›› 2025, Vol. 46 ›› Issue (5): 980-989.DOI: 10.11996/JG.j.2095-302X.2025050980
• Image Processing and Computer Vision •
YE Wenlong1,3, CHEN Bin2,3
Received: 2024-12-11
Accepted: 2025-02-20
Online: 2025-10-30
Published: 2025-09-10
Contact: CHEN Bin
About author: YE Wenlong (2000-), master student. His main research interests cover diffusion models. E-mail: 2397726787@qq.com
YE Wenlong, CHEN Bin. PanoLoRA: an efficient finetuning method for panoramic image generation based on Stable Diffusion[J]. Journal of Graphics, 2025, 46(5): 980-989.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2025050980
| Method | Params/M | FID (Indoor) | FID (Outdoor) | KID×1000 (Indoor) | KID×1000 (Outdoor) | CLIP score (Indoor) | CLIP score (Outdoor) |
|---|---|---|---|---|---|---|---|
| BitFit | 0.34 | 24.07 | 24.73 | 11.24 | 6.64 | 22.38 | 22.06 |
| Bias-Norm tuning | 0.44 | 21.36 | 24.44 | 8.37 | 6.94 | 22.60 | 22.15 |
| Adapter (dim=48) | 3.63 | 19.56 | 22.13 | 5.84 | 6.11 | 22.42 | 22.11 |
| LoRA (r=8) | 3.39 | 20.08 | 22.64 | 6.92 | 5.98 | 22.60 | 22.18 |
| Lycoris (r=2) | 2.85 | 19.62 | 22.48 | 6.08 | 5.71 | 22.66 | 22.30 |
| PanoLoRA (γ=64) | 3.14 | 18.63 | 20.97 | 5.26 | 4.81 | 22.66 | 22.30 |

Table 1 Quantitative evaluations on the test set
Fig. 5 Comparison of visualization results of 3 kinds of scenes on test set among the state-of-the-art methods and our PanoLoRA ((a) Wild; (b) Urban; (c) Indoor)
| Module | Params/M | FID | KID×1000 | CLIP score |
|---|---|---|---|---|
| PanoLoRA (default) | 3.14 | 19.80 | 5.03 | 22.48 |
| w/o Sphere LoRA | 3.28 | 30.36 | 14.48 | 22.12 |
| w/o SA Q/K LoRA | 3.17 | 21.10 | 6.03 | 22.38 |
| w/o spherical convolution | 3.15 | 20.98 | 5.91 | 22.46 |
| w/o channel merging | 3.15 | 20.40 | 5.45 | 22.48 |
| w/o weight copying | 3.14 | 20.39 | 5.11 | 22.43 |

Table 2 Ablation studies of each module
[1] | ARGYRIOU L, ECONOMOU D, BOUKI V. Design methodology for 360° immersive video applications: the case study of a cultural heritage virtual tour[J]. Personal and Ubiquitous Computing, 2020, 24(6): 843-859. |
[2] | KITTEL A, LARKIN P, CUNNINGHAM I, et al. 360° virtual reality: a SWOT analysis in comparison to virtual reality[J]. Frontiers in Psychology, 2020, 11: 563474. |
[3] | SOMANATH G, KURZ D. HDR environment map estimation for real-time augmented reality[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 11293-11301. |
[4] | KINZIG C, CORTÉS I, FERNÁNDEZ C, et al. Real-time seamless image stitching in autonomous driving[C]// 2022 25th International Conference on Information Fusion. New York: IEEE Press, 2022: 1-8. |
[5] | WU S S, TANG H, JING X Y, et al. Cross-view panorama image synthesis[J]. IEEE Transactions on Multimedia, 2022, 25: 3546-3559. |
[6] | FENG M Y, LIU J L, CUI M M, et al. Diffusion360: seamless 360 degree panoramic image generation based on diffusion models[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/2311.13141. |
[7] | HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[C]// The 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 574. |
[8] | ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 10674-10685. |
[9] | ZAKEN E B, GOLDBERG Y, RAVFOGEL S. BitFit: simple parameter-efficient fine-tuning for transformer-based masked language-models[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/2106.10199. |
[10] | HU E J, SHEN Y L, WALLIS P, et al. LoRA: low-rank adaptation of large language models[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/2106.09685. |
[11] | HOULSBY N, GIURGIU A, JASTRZEBSKI S, et al. Parameter-efficient transfer learning for NLP[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/1902.00751. |
[12] | YEH S Y, HSIEH Y G, GAO Z D, et al. Navigating text-to-image customization: from LyCORIS fine-tuning to model evaluation[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/2309.14859 |
[13] | TANG N Y, FU M H, ZHU K, et al. Low-rank attention side-tuning for parameter-efficient fine-tuning[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/2402.04009. |
[14] | COORS B, CONDURACHE A P, GEIGER A. SphereNet: learning spherical representations for detection and classification in omnidirectional images[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 525-541. |
[15] | HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]// The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6629-6640. |
[16] | BIŃKOWSKI M, SUTHERLAND D J, ARBEL M, et al. Demystifying MMD GANs[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/1801.01401. |
[17] | HESSEL J, HOLTZMAN A, FORBES M, et al. CLIPScore: a reference-free evaluation metric for image captioning[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/2104.08718. |
[18] | RAMESH A, DHARIWAL P, NICHOL A, et al. Hierarchical text-conditional image generation with CLIP latents[EB/OL]. [2024-12-01]. https://3dvar.com/Ramesh2022Hierarchical.pdf. |
[19] | SAHARIA C, HO J, CHAN W, et al. Image super-resolution via iterative refinement[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(4): 4713-4726. |
[20] | BROOKS T, PEEBLES B, HOMES C, et al. Video generation models as world simulators[EB/OL]. [2024-12-01]. https://openai.com/research/video-generation-models-as-world-simulators. |
[21] | ZHANG J, CUI W S, ZHANG R H, et al. A text-driven 3D scene editing method based on key views[J]. Journal of Graphics, 2024, 45(4): 834-844 (in Chinese). |
[22] | WANG J, WANG S, JIANG Z W, et al. Zero-shot text-driven avatar generation based on depth-conditioned diffusion model[J]. Journal of Graphics, 2023, 44(6): 1218-1226 (in Chinese). |
[23] | SONG Y, SOHL-DICKSTEIN J, KINGMA D P, et al. Score-based generative modeling through stochastic differential equations[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/2011.13456. |
[24] | SONG J M, MENG C L, ERMON S. Denoising diffusion implicit models[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/2010.02502. |
[25] | DHARIWAL P, NICHOL A. Diffusion models beat GANs on image synthesis[C]// The 35th International Conference on Neural Information Processing Systems. New York: ACM, 2021: 672. |
[26] | AKIMOTO N, MATSUO Y, AOKI Y. Diverse plausible 360-degree image outpainting for efficient 3DCG background creation[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 11431-11440. |
[27] | DASTJERDI M R K, HOLD-GEOFFROY Y, EISENMANN J, et al. Guided co-modulated GAN for 360° field of view extrapolation[C]// 2022 International Conference on 3D Vision. New York: IEEE Press, 2022: 475-485. |
[28] | WU T H, ZHENG C X, CHAM T J. IPO-LDM: depth-aided 360-degree indoor RGB panorama outpainting via latent diffusion model[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/2307.03177v1. |
[29] | CHEN Z X, WANG G C, LIU Z W. Text2Light: zero-shot text-driven HDR panorama generation[J]. ACM Transactions on Graphics (TOG), 2022, 41(6): 195. |
[30] | ESSER P, ROMBACH R, OMMER B. Taming transformers for high-resolution image synthesis[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 12868-12878. |
[31] | TANG S T, ZHANG F Y, CHEN J C, et al. MVDiffusion: enabling holistic multi-view image generation with correspondence-aware diffusion[C]// The 37th International Conference on Neural Information Processing Systems. New York: ACM, 2023: 2229. |
[32] | RUIZ N, LI Y Z, JAMPANI V, et al. DreamBooth: fine tuning text-to-image diffusion models for subject-driven generation[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 22500-22510. |
[33] | ACHIAM J, ADLER S, AGARWAL S, et al. GPT-4 technical report[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/2303.08774. |
[34] | ESSER P, KULAL S, BLATTMANN A, et al. Scaling rectified flow transformers for high-resolution image synthesis[C]// The 41st International Conference on Machine Learning. New York: ACM, 2024: 503. |
[35] | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2024-12-01]. https://dblp.org/db/conf/iclr/iclr2021.html#DosovitskiyB0WZ21. |
[36] | ZHANG R R, HAN J M, LIU C, et al. LLaMA-adapter: efficient fine-tuning of language models with zero-init attention[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/2303.16199. |
[37] | KINGMA D P, WELLING M. Auto-encoding variational Bayes[EB/OL]. [2024-12-01]. https://dblp.org/db/conf/iclr/iclr2014.html#KingmaW13. |
[38] | RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/2103.00020. |
[39] | SIFRE L, MALLAT S. Rigid-motion scattering for texture classification[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/1403.1687. |
[40] | ZHENG J, ZHANG J F, LI J, et al. Structured3d: a large photo-realistic dataset for structured 3D modeling[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 519-535. |
[41] | YANG W Y, QIAN Y L, KÄMÄRÄINEN J K, et al. Object detection in equirectangular panorama[C]// The 24th International Conference on Pattern Recognition. New York: IEEE Press, 2018: 2190-2195. |
[42] | CIRIK V, BERG-KIRKPATRICK T, MORENCY L P. Refer360°: a referring expression recognition dataset in 360° images[C]// The 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 7189-7202. |
[43] | DENG X, WANG H, XU M, et al. LAU-Net: latitude adaptive upscaling network for omnidirectional image super-resolution[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 9185-9194. |
[44] | ZHANG Y D, SONG S R, TAN P, et al. PanoContext: a whole-room 3D context model for panoramic scene understanding[C]// The 13th European Conference on Computer Vision. Cham: Springer, 2014: 668-686. |
[45] | CAO M D, MOU C, YU F H, et al. NTIRE 2023 challenge on 360° omnidirectional image and video super-resolution: datasets, methods and results[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 1731-1745. |
[46] | ORHAN S, BASTANLAR Y. Semantic segmentation of outdoor panoramic images[J]. Signal, Image and Video Processing, 2022, 16(3): 643-650. |
[47] | CHANG S H, CHIU C Y, CHANG C S, et al. Generating 360 outdoor panorama dataset with reliable sun position estimation[C]// SIGGRAPH Asia 2018 Posters. New York: ACM, 2018: 22. |
[48] | LI J N, LI D X, XIONG C M, et al. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/2201.12086. |
[49] | AKIMOTO N, KASAI S, HAYASHI M, et al. 360-degree image completion by two-stage conditional GANs[C]// 2019 IEEE International Conference on Image Processing. New York: IEEE Press, 2019: 4704-4708. |
[50] | HO J, SALIMANS T. Classifier-free diffusion guidance[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/2207.12598. |
[51] | LOSHCHILOV I, HUTTER F. Decoupled weight decay regularization[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/1711.05101. |
[52] | LIU L P, REN Y, LIN Z J, et al. Pseudo numerical methods for diffusion models on manifolds[EB/OL]. [2024-12-01]. https://arxiv.org/pdf/2202.09778. |
[53] | SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 2818-2826. |