Journal of Graphics ›› 2024, Vol. 45 ›› Issue (6): 1117-1131. DOI: 10.11996/JG.j.2095-302X.2024061117
Research progress and trends in large model technologies for virtual reality
YANG Haozhong1, KONG Xiaoyu1, GU Ruikun1, WANG Miao1,2
Received:
2024-08-01
Accepted:
2024-10-02
Published:
2024-12-31
Online:
2024-12-24
Contact:
WANG Miao (1988-), associate professor, Ph.D. His main research interests cover computer graphics, virtual reality and augmented reality. E-mail: miaow@buaa.edu.cn
First author:
YANG Haozhong (2000-), Ph.D. candidate. His main research interests cover virtual reality and augmented reality. E-mail: haozhongY@buaa.edu.cn
Abstract:
With the development of computer technology, virtual reality (VR) has matured and now delivers immersive, high-quality experiences across a variety of application scenarios, making it an important research direction in computer graphics and human-computer interaction. Large model technology, a recent research hotspot, has attracted wide attention from scholars and offered new solutions and ideas for classic problems in many fields. In the VR field, however, survey research on the application progress of large model technology remains scarce. To fill this gap and inspire follow-up work, this paper collects, organizes, and summarizes recent research papers related to large models in VR environments, presents a categorized overview of the principles of large model technology and of representative models, and analyzes in detail the research progress and application scenarios of large model technology from two aspects: content generation and human-computer interaction. Finally, it summarizes the difficulties and challenges of using large models in VR environments and discusses future development trends.
YANG Haozhong, KONG Xiaoyu, GU Ruikun, WANG Miao. Research progress and trends in large model technologies for virtual reality[J]. Journal of Graphics, 2024, 45(6): 1117-1131.
Fig. 4 Text-driven human body generation based on different architectures ((a) CLIP-based virtual avatar construction[22]; (b) Diffusion-based virtual avatar construction[57])
References
[1] SEENIVASAN L, KANNAN G, et al. SurgicalGPT: end-to-end language-vision GPT for visual question answering in surgery[C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Vancouver: Springer, 2023: 281-290.
[2] WANG S, ZHAO Z H, OUYANG X, et al. ChatCAD: interactive computer-aided diagnosis on medical image using large language models[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2302.07257.
[3] SHEN Y, SONG K, TAN X, et al. HuggingGPT: solving AI tasks with ChatGPT and its friends in Hugging Face[C]// The 37th International Conference on Neural Information Processing Systems. New York: ACM, 2023: 38154-38180.
[4] ZHAO Q P. Seize opportunities, focus on innovation, and develop Internet 3.0 technologies and applications[J]. Science and Technology Review, 2023, 41(15): 1-1. (in Chinese)
[5] DALE R. GPT-3: What's it good for?[J]. Natural Language Engineering, 2021, 27(1): 113-118.
[6] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis: Association for Computational Linguistics, 2019: 4171-4186.
[7] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]// International Conference on Machine Learning. New York: ACM, 2021: 8748-8763.
[8] HIRZLE T, MULLER F, DRAXLER F, et al. When XR and AI meet - a scoping review on extended reality and artificial intelligence[C]// The 2023 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2023: 1-45.
[9] ZHAO Q P. Virtual reality overview[J]. Science China: Information Science, 2009(1): 2-46. (in Chinese)
[10] ANTHES C, GARCÍA-HERNÁNDEZ R J, WIEDEMANN M, et al. State of the art of virtual reality technology[C]// 2016 IEEE Aerospace Conference. New York: IEEE Press, 2016: 1-19.
[11] BOMMASANI R, HUDSON D A, ADELI E, et al. On the opportunities and risks of foundation models[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2108.07258.
[12] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
[13] HONG Y C, ZHANG K, GU J X, et al. LRM: large reconstruction model for single image to 3D[EB/OL]. [2024-07-29]. https://arxiv.org/html/2311.04400v2.
[14] LI J H, TAN H, ZHANG K, et al. Instant3D: fast text-to-3D with sparse-view generation and large reconstruction model[EB/OL]. [2024-07-29]. https://arxiv.org/pdf/2311.06214.
[15] XU Y H, SHI Z F, YIFAN W, et al. GRM: large Gaussian reconstruction model for efficient 3D reconstruction and generation[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2403.14621.
[16] WEI X Y, ZHANG K, BI S, et al. MeshLRM: large reconstruction model for high-quality mesh[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2404.12385.
[17] TEVET G, GORDON B, HERTZ A, et al. MotionCLIP: exposing human motion generation to CLIP space[C]// European Conference on Computer Vision. Cham: Springer, 2022: 358-374.
[18] HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[C]// The 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 6840-6851.
[19] WU S, LIN Y T, ZHANG F H, et al. Direct3D: scalable image-to-3D generation via 3D latent diffusion transformer[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2405.14832.
[20] WU K L, LIU F F, CAI Z H, et al. Unique3D: high-quality and efficient 3D mesh generation from a single image[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2405.20343.
[21] LIU Q H, ZHANG Y, BAI S, et al. DIRECT-3D: learning direct text-to-3D generation on massive noisy 3D data[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 6881-6891.
[22] HONG F Z, ZHANG M Y, PAN L, et al. AvatarCLIP: zero-shot text-driven generation and animation of 3D avatars[J]. ACM Transactions on Graphics, 2022, 41(4): 1-19.
[23] GU A, DAO T. Mamba: linear-time sequence modeling with selective state spaces[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2312.00752.
[24] SHEN Q H, YI X Y, WU Z K, et al. Gamba: marry Gaussian splatting with Mamba for single-view 3D reconstruction[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2403.18795.
[25] YAMAZAKI T, MIZUMOTO T, YOSHIKAWA K, et al. An open-domain avatar chatbot by exploiting a large language model[C]// The 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue. New York: ACM, 2023: 428-432.
[26] LAKHNATI Y, PASCHER M, GERKEN J. Exploring a GPT-based large language model for variable autonomy in a VR-based human-robot teaming simulation[J]. Frontiers in Robotics and AI, 2024, 11: 1347538.
[27] KHELIFI S, MORRIS A. Mixed reality IoT smart environments with large language model agents[C]// 2024 IEEE 4th International Conference on Human-Machine Systems. New York: IEEE Press, 2024: 1-7.
[28] PARK S, MENASSA C C, KAMAT V R. Integrating large language models with multimodal virtual reality interfaces to support collaborative human-robot construction work[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2404.03498.
[29] OUYANG L, WU J, JIANG X, et al. Training language models to follow instructions with human feedback[C]// The 36th International Conference on Neural Information Processing Systems. New York: ACM, 2022: 27730-27744.
[30] JAIN A, MILDENHALL B, BARRON J T, et al. Zero-shot text-guided object generation with dream fields[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 867-876.
[31] MICHEL O, BAR-ON R, LIU R, et al. Text2Mesh: text-driven neural stylization for meshes[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 13492-13502.
[32] MOHAMMAD KHALID N, XIE T, BELILOVSKY E, et al. CLIP-Mesh: generating textured meshes from text using pretrained image-text models[C]// Special Interest Group on Computer Graphics and Interactive Techniques Conference Asia 2022. New York: ACM, 2022: 1-8.
[33] BOZKIR E, ÖZDEL S, LAU K H C, et al. Embedding large language models into extended reality: opportunities and challenges for inclusion, engagement, and privacy[C]// The 6th ACM Conference on Conversational User Interfaces. New York: ACM, 2024: 1-7.
[34] HUANG D, XIANG K, ZHANG X. Privacy preservation of large language models in the metaverse era: research frontiers, categorical comparisons, and future directions[EB/OL]. [2024-07-29]. https://advance.sagepub.com/doi/full/10.22541/au.171308243.36746318/v1.
[35] CUI R K, SONG X B, SUN W X, et al. LAM3D: large image-point-cloud alignment model for 3D reconstruction from single image[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2405.15622.
[36] XIE R G, ZHENG W T, HUANG K, et al. LDM: large tensorial SDF model for textured mesh generation[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2405.14580.
[37] CHEN S J, CHEN X, PANG A Q, et al. MeshXL: neural coordinate field for generative 3D foundation models[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2405.20853.
[38] WANG Z Y, WANG Y K, CHEN Y F, et al. CRM: single image to 3D textured mesh with convolutional reconstruction model[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2403.05034.
[39] ZHANG L W, WANG Z Y, ZHANG Q X, et al. CLAY: a controllable large-scale generative model for creating high-quality 3D assets[J]. ACM Transactions on Graphics, 2024, 43(4): 1-20.
[40] XU J L, CHENG W H, GAO Y M, et al. InstantMesh: efficient 3D mesh generation from a single image with sparse-view large reconstruction models[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2404.07191.
[41] ZHUANG P Y, HAN S F, WANG C Y, et al. GTR: improving large 3D reconstruction models through geometry and texture refinement[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2406.05649.
[42] CHEN Y W, HE T, HUANG D, et al. MeshAnything: artist-created mesh generation with autoregressive transformers[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2406.10163.
[43] MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: representing scenes as neural radiance fields for view synthesis[C]// European Conference on Computer Vision. Cham: Springer, 2020: 405-421.
[44] POOLE B, JAIN A, BARRON J T, et al. DreamFusion: text-to-3D using 2D diffusion[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2209.14988.
[45] METZER G, RICHARDSON E, PATASHNIK O, et al. Latent-NeRF for shape-guided generation of 3D shapes and textures[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 12663-12673.
[46] LIN C H, GAO J, TANG L M, et al. Magic3D: high-resolution text-to-3D content creation[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 300-309.
[47] WANG Z Y, LU C, WANG Y K, et al. ProlificDreamer: high-fidelity and diverse text-to-3D generation with variational score distillation[C]// The 37th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 8406-8441.
[48] WANG P, TAN H, BI S, et al. PF-LRM: pose-free large reconstruction model for joint pose and shape prediction[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2311.12024.
[49] JIANG H W, HUANG Q X, PAVLAKOS G. Real3D: scaling up large reconstruction models with real-world images[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2406.08479.
[50] QI Z Y, YANG Y H, ZHANG M C, et al. Tailor3D: customized 3D assets editing and generation with dual-side images[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2407.06191.
[51] KERBL B, KOPANAS G, LEIMKÜHLER T, et al. 3D Gaussian splatting for real-time radiance field rendering[J]. ACM Transactions on Graphics, 2023, 42(4): 139:1-139:14.
[52] LIANG Y X, YANG X, LIN J D, et al. LucidDreamer: towards high-fidelity text-to-3D generation via interval score matching[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 6517-6526.
[53] ZOU Z X, YU Z P, GUO Y C, et al. Triplane meets Gaussian splatting: fast and generalizable single-view 3D reconstruction with transformers[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 10324-10335.
[54] YI X Y, WU Z K, SHEN Q H, et al. MVGamba: unify 3D content generation as state space sequence modeling[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2406.06367.
[55] REN J W, XIE K, MIRZAEI A, et al. L4GM: large 4D Gaussian reconstruction model[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2406.10324.
[56] TANG J X, CHEN Z X, CHEN X K, et al. LGM: large multi-view Gaussian model for high-resolution 3D content creation[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2402.05054.
[57] LOPER M, MAHMOOD N, ROMERO J, et al. SMPL: a skinned multi-person linear model[J]. ACM Transactions on Graphics, 2015, 34(6): 1-16.
[58] CAO Y K, CAO Y P, HAN K, et al. DreamAvatar: text-and-shape guided 3D human avatar generation via diffusion models[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 958-968.
[59] WANG J, WANG S, JIANG Z W, et al. Zero-shot text-driven avatar generation based on depth-conditioned diffusion model[J]. Journal of Graphics, 2023, 44(6): 1218-1226. (in Chinese)
[60] JIANG R X, WANG C, ZHANG J B, et al. AvatarCraft: transforming text into neural human avatars with parameterized shape and pose control[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 14371-14382.
[61] HUANG Y K, WANG J, ZENG A L, et al. DreamWaltz: make a scene with complex 3D animatable avatars[C]// The 37th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 4566-4584.
[62] KOLOTOUROS N, ALLDIECK T, ZANFIR A, et al. DreamHuman: animatable 3D avatars from text[C]// The 37th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 10516-10529.
[63] ALLDIECK T, XU H, SMINCHISESCU C. ImGHUM: implicit generative models of 3D human shape and articulated pose[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 5461-5470.
[64] LIAO T T, YI H W, XIU Y L, et al. TADA! Text to animatable digital avatars[C]// 2024 International Conference on 3D Vision. New York: IEEE Press, 2024: 1508-1519.
[65] PAVLAKOS G, CHOUTAS V, GHORBANI N, et al. Expressive body capture: 3D hands, face, and body from a single image[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 10975-10985.
[66] HUANG X, SHAO R Z, ZHANG Q W, et al. HumanNorm: learning normal diffusion model for high-quality and realistic 3D human generation[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 4568-4577.
[67] WENG Z Z, WANG Z Y, YEUNG S. ZeroAvatar: zero-shot 3D avatar generation from a single image[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2305.16411.
[68] ZENG Y F, LU Y X, JI X Y, et al. AvatarBooth: high-quality and customizable 3D human avatar generation[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2306.09864.
[69] HAN X, CAO Y K, HAN K, et al. HeadSculpt: crafting 3D head avatars with text[C]// The 37th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 36.
[70] LIU H Y, WANG X, WAN Z Y, et al. HeadArtist: text-conditioned 3D head generation with self score distillation[C]// Special Interest Group on Computer Graphics and Interactive Techniques Conference Papers. New York: ACM, 2024: 1-12.
[71] MENDIRATTA M, PAN X, ELGHARIB M, et al. AvatarStudio: text-driven editing of 3D dynamic human head avatars[J]. ACM Transactions on Graphics, 2023, 42(6): 1-18.
[72] DING Z, ZHANG X E, XIA Z H, et al. DiffusionRig: learning personalized priors for facial appearance editing[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 12736-12746.
[73] ZHANG H, FENG Y, KULITS P, et al. Text-guided generation and editing of compositional 3D avatars[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2309.07125.
[74] ZHANG J R, ZHANG Y S, CUN X D, et al. Generating human motion from textual descriptions with discrete representations[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 14730-14740.
[75] ZHONG C Y, HU L, ZHANG Z H, et al. AttT2M: text-driven human motion generation with multi-perspective attention mechanism[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 509-519.
[76] IZQUIERDO-DOMENECH J, LINARES-PELLICER J, FERRI-MOLLA I. Virtual reality and language models, a new frontier in learning[J]. International Journal of Interactive Multimedia and Artificial Intelligence, 2024, 8: 46-54.
[77] YOUSRI R, ESSAM Z, KAREEM Y, et al. IllusionX: an LLM-powered mixed reality personal companion[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2402.07924.
[78] WAN H, ZHANG J, SURIA A A, et al. Building LLM-based AI agents in social virtual reality[C]// Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2024: 1-7.
[79] LLANES-JURADO J, GÓMEZ-ZARAGOZÁ L, MINISSI M E, et al. Developing conversational virtual humans for social emotion elicitation based on large language models[J]. Expert Systems with Applications, 2024, 246: 123261.
[80] PEI J H, VIOLA I, HUANG H C, et al. Autonomous workflow for multimodal fine-grained training assistants towards mixed reality[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2405.13034.
[81] LI Z M, BABAR P P, BARRY M, et al. Exploring the use of large language model-driven chatbots in virtual reality to train autistic individuals in job communication skills[C]// Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2024: 1-7.
[82] KAPADIA N, GOKHALE S, NEPOMUCENO A, et al. Evaluation of large language model generated dialogues for an AI based VR nurse training simulator[C]// International Conference on Human-Computer Interaction. Cham: Springer, 2024: 200-212.
[83] NG H W, KOH A, FOONG A, et al. Real-time hybrid language model for virtual patient conversations[C]// International Conference on Artificial Intelligence in Education. Cham: Springer, 2023: 780-785.
[84] AJRI S J, NGUYEN D, AGARWAL S, et al. Virtual AIVantage: leveraging large language models for enhanced VR interview preparation among underrepresented professionals in computing[C]// The 22nd International Conference on Mobile and Ubiquitous Multimedia. New York: ACM, 2023: 535-537.
[85] LEE U, LEE S, KOH J, et al. Generative agent for teacher training: designing educational problem-solving simulations with large language model-based agents for pre-service teachers[C]// NeurIPS'23 Workshop: Generative AI for Education. New York: ACM, 2023: 8.
[86] SONG Y J, WU K Y, DING J Y. Developing an immersive game-based learning platform with generative artificial intelligence and virtual reality technologies - "LearningverseVR"[J]. Computers & Education: X Reality, 2024, 4: 100069.
[87] ZHAO Y J, PAN J Y, DONG Y, et al. Language urban odyssey: a serious game for enhancing second language acquisition through large language models[C]// Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2024: 1-7.
[88] LIU Z Y, ZHU Z Z, ZHU L J, et al. ClassMeta: designing interactive virtual classmate to promote VR classroom participation[C]// 2024 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2024: 1-17.
[89] CHEN L Q, CAI Y, WANG R Y, et al. Supporting text entry in virtual reality with large language models[C]// 2024 IEEE Conference Virtual Reality and 3D User Interfaces. New York: IEEE Press, 2024: 524-534.
[90] GUNTURU A, JADON S, ZHANG N, et al. RealitySummary: on-demand mixed reality document enhancement using large language models[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2405.18620.
[91] HUANG Z A, CHEN F, PU Y W, et al. DiffVL: scaling up soft body manipulation using vision-language driven differentiable physics[C]// The 37th International Conference on Neural Information Processing Systems. New York: ACM, 2023: 29875-29900.
[92] ZHANG Y, GAO P Z, KANG F Z, et al. OdorAgent: generate odor sequences for movies based on large language model[C]// 2024 IEEE Conference Virtual Reality and 3D User Interfaces. New York: IEEE Press, 2024: 105-114.
[93] HE Z Y, LI S Y, SONG Y P, et al. Towards building condition-based cross-modality intention-aware human-AI cooperation under VR environment[C]// 2024 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2024: 1-13.
[94] WANG Z, YUAN L P, WANG L W, et al. VirtuWander: enhancing multi-modal interaction for virtual tour guidance through large language models[C]// 2024 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2024: 1-20.
[95] KOUZELIS L R, SPANTIDI O. Synthesizing play-ready VR scenes with natural language prompts through GPT API[C]// International Symposium on Visual Computing. Cham: Springer, 2023: 15-26.
[96] FENG W X, ZHU W R, FU T J, et al. LayoutGPT: compositional visual planning and generation with large language models[C]// The 37th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 18225-18250.
[97] DE LA TORRE F, FANG C M, HUANG H, et al. LLMR: real-time prompting of interactive worlds using large language models[C]// 2024 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2024: 1-22.
[98] KURAI R, HIRAKI T, HIROI Y, et al. MagicItem: dynamic behavior design of virtual objects with large language models in a consumer metaverse platform[EB/OL]. [2024-07-29]. https://arxiv.org/abs/2406.13242.