图学学报 ›› 2025, Vol. 46 ›› Issue (5): 931-949.DOI: 10.11996/JG.j.2095-302X.2025050931
黄敬1, 时瑞浩1, 宋文明1, 郭和攀1, 魏璜1, 魏小松3, 姚剑2,3
收稿日期:
2025-01-26
接受日期:
2025-04-21
出版日期:
2025-10-30
发布日期:
2025-09-10
通讯作者:
姚剑(1975-),男,教授,博士。主要研究方向为计算机视觉、机器视觉、图像处理、模式识别、机器学习、SLAM、机器人等。E-mail:jian.yao@whu.edu.cn
第一作者:
黄敬(1978-),男,工程师,硕士。主要研究方向为智能网联汽车的云与大数据。E-mail:huangjing@gacrnd.com
HUANG Jing1, SHI Ruihao1, SONG Wenming1, GUO Hepan1, WEI Huang1, WEI Xiaosong3, YAO Jian2,3
Received:
2025-01-26
Accepted:
2025-04-21
Published:
2025-10-30
Online:
2025-09-10
First author:
HUANG Jing (1978-), engineer, master's degree. His main research interests cover cloud computing and big data for intelligent connected vehicles. E-mail:huangjing@gacrnd.com
摘要:
图像合成技术对自动驾驶的发展至关重要,旨在低成本、高效率地为自动驾驶系统提供训练和测试数据。随着计算机视觉和人工智能(AI)技术的发展,神经辐射场(NeRF)、三维高斯溅射(3DGS)和生成模型在图像合成领域引起了广泛关注,这些新范式在自动驾驶场景构建和图像数据合成中表现出巨大潜力。鉴于这些方法对于自动驾驶技术发展的重要性,回顾了其发展历程并搜集了最新研究工作,从自动驾驶图像合成问题的实际角度重新观察相关方法,介绍了NeRF、3DGS、生成模型以及虚实融合的合成方法在自动驾驶领域的进展,其中尤其关注NeRF和3DGS这2种基于重建的方法。首先,分析了自动驾驶图像生成任务的一些重要问题;然后,从自动驾驶场景面临的有限视角问题、大规模场景问题、动态性问题和加速问题4个方面详细分析了NeRF和3DGS的代表性方案;考虑到生成模型对于创建自动驾驶极端场景(corner case)的潜在优势,还介绍了自动驾驶世界模型用于场景生成的实际问题及现有研究工作;接着,分析了当前业内虚实融合自动驾驶图像合成前沿应用,以及NeRF和3DGS结合AI生成模型在自动驾驶场景生成任务中的潜力;最后,总结了当前取得的成功及未来亟需探索的方向。
黄敬, 时瑞浩, 宋文明, 郭和攀, 魏璜, 魏小松, 姚剑. 自动驾驶图像合成方法综述:从模拟器到新范式[J]. 图学学报, 2025, 46(5): 931-949.
HUANG Jing, SHI Ruihao, SONG Wenming, GUO Hepan, WEI Huang, WEI Xiaosong, YAO Jian. A review of autonomous driving image synthesis methods: from simulators to new paradigms[J]. Journal of Graphics, 2025, 46(5): 931-949.
表1 NeRF和3DGS在自动驾驶场景数据集上的比较
Table 1 Comparison of NeRF and 3DGS on the autonomous driving scene datasets
方法 | 辅助先验 | PSNR↑ | SSIM↑ | LPIPS↓ | 数据集 |
---|---|---|---|---|---|
NeRF[1] | 无 | 18.56 | 0.557 | 0.554 | KITTI |
S-NeRF[9] | LiDAR | 18.71 | 0.606 | 0.352 | KITTI |
EmerNeRF[10] | LiDAR、2D语义 | 25.24 | 0.801 | 0.237 | KITTI |
NSG[11] | 无 | 21.53 | 0.673 | 0.254 | KITTI |
PixelNeRF[12] | 无 | 20.10 | 0.761 | 0.175 | KITTI |
SUDS[13] | LiDAR、2D光流 | 22.77 | 0.797 | 0.171 | KITTI |
Urban-NeRF[14] | LiDAR | 21.49 | 0.661 | 0.491 | nuScenes |
Mip-NeRF[15] | 无 | 18.22 | 0.655 | 0.421 | nuScenes |
3DGS[2] | 无 | 26.08 | 0.717 | 0.298 | nuScenes |
PVG[16] | 无 | 22.43 | 0.896 | 0.114 | KITTI |
StreetGaussian[17] | LiDAR、2D语义 | 25.79 | 0.844 | 0.081 | KITTI |
DrivingGaussian[18] | 无 | 28.36 | 0.851 | 0.256 | nuScenes |
DrivingGaussian[18] | LiDAR | 28.74 | 0.865 | 0.237 | nuScenes |
HuGS[19] | 2D/3D语义、光流 | 26.81 | 0.866 | 0.059 | KITTI |
DeSiRe-GS[20] | LiDAR | 28.87 | 0.901 | 0.106 | KITTI |
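表1中的PSNR、SSIM和LPIPS是新视角合成最常用的图像质量评价指标。下面给出一段最小示意代码(假设使用scikit-image与lpips库,函数名为自拟,并非上述任一文献的原始实现),说明渲染图与实况图之间这三项指标的一般计算方式。

```python
import numpy as np
import torch
import lpips  # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_view(render: np.ndarray, gt: np.ndarray, lpips_fn=None):
    """Compute PSNR / SSIM / LPIPS between a rendered view and its ground truth.

    render, gt: HxWx3 float arrays in [0, 1].
    """
    psnr = peak_signal_noise_ratio(gt, render, data_range=1.0)
    ssim = structural_similarity(gt, render, channel_axis=-1, data_range=1.0)

    # LPIPS expects NCHW tensors scaled to [-1, 1]
    if lpips_fn is None:
        lpips_fn = lpips.LPIPS(net="alex")
    to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    with torch.no_grad():
        lp = lpips_fn(to_tensor(render), to_tensor(gt)).item()
    return psnr, ssim, lp
```

实际评测中通常对整条测试序列的逐帧结果取平均,再报告表中的数值。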
图3 NeRF和3DGS在自动驾驶场景中的问题((a) 实况1;(b) NeRF;(c) 实况2;(d) 3DGS)
Fig. 3 Problems of NeRF and 3DGS in autonomous driving scenes ((a) Ground truth 1; (b) NeRF; (c) Ground truth 2; (d) 3DGS)
图4 LiDARF与S-NeRF在nuScenes数据集上的结果对比((a) 实况;(b) LiDARF;(c) S-NeRF)
Fig. 4 Comparison of LiDARF and S-NeRF on nuScenes dataset ((a) Ground truth; (b) LiDARF; (c) S-NeRF)
表2 单目深度正则核心思想
Table 2 Core ideas of monocular depth regularization
方法 | 关键技术 |
---|---|
DR-Gaussian[27] | 利用尺度系数s和偏移量t将单目深度Fθ(I)对齐到稀疏点Dsparse,ω归一化特征点可靠性权值 |
DN-Splatter[28] | 利用单目深度Dmono(p)到稀疏点Dsparse(p)的线性回归求解深度尺度系数s和偏移量t,grgb=exp(-∇I)作为绝对尺度可靠性度量 |
Hierarchy GS[29] | 将单目逆深度图D对齐到SfM尺度Dsparse |
DNGaussian[30] | 将深度图分割为小块p,然后利用块内深度均值meanD(p)和标准差stdD(p)归一化深度分布函数 |
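表2所列方法的共同思路是:先求解尺度系数s与偏移量t,把无绝对尺度的单目深度对齐到SfM/LiDAR稀疏深度,再以对齐后的深度作正则。下面给出该尺度-偏移对齐的加权最小二乘示意实现(numpy实现,函数与变量命名为自拟假设,并非上述文献的原始代码)。

```python
import numpy as np

def align_mono_depth(d_mono, d_sparse, mask, weights=None):
    """Align monocular depth to sparse SfM/LiDAR depth with scale s and shift t.

    d_mono, d_sparse, mask: HxW arrays; mask marks pixels that have a valid sparse depth.
    weights: optional per-pixel reliability (e.g. feature-point confidence).
    Solves  min_{s,t} sum_i w_i * (s * d_mono_i + t - d_sparse_i)^2  by weighted least squares.
    """
    x = d_mono[mask].astype(np.float64)
    y = d_sparse[mask].astype(np.float64)
    w = np.ones_like(x) if weights is None else weights[mask].astype(np.float64)

    A = np.stack([x, np.ones_like(x)], axis=1)   # [N, 2], columns: mono depth, constant
    sw = np.sqrt(w)
    s, t = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return s * d_mono + t, (s, t)
```

对齐后的稠密深度即可作为渲染深度的监督项,这也是表中各方法深度正则损失的共同基础。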
图5 SplatFormer[34]合成分布之外新视角图像((a) 仰角20°;(b) 仰角40°;(c) 仰角60°;(d) 仰角80°)
Fig. 5 SplatFormer[34] synthesizes novel views out of distribution ((a) Elevation is 20°; (b) Elevation is 40°; (c) Elevation is 60°; (d) Elevation is 80°)
图7 基于可见性的分区策略[1] ((a) 输入数据;(b) 基于相机位置的区域划分;(c) 基于位置的数据选择;(d) 基于可见性的相机选择;(e) 基于覆盖域的点选择;(f) 空域无关解;(g) 空域感知解;(h) 深度模糊产生的漂浮物)
Fig. 7 Visibility-based partitioning strategy[1] ((a) Input data; (b) Camera-position-based region division; (c) Position-based data selection; (d) Visibility-based camera selection; (e) Coverage-based point selection; (f) Naive solution: airspace-agnostic; (g) Our solution: airspace-aware; (h) Floaters caused by depth ambiguity)
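图7所示分区流程的第一步是按相机在地面上的位置把大场景划分为若干区域,并在边界处适当扩展,使相邻区域共享边界相机,后续再按可见性与覆盖域为每个区域补充相机和点。下面给出基于相机位置的规则网格划分示意代码(numpy实现,网格数与扩展比例为假设参数,仅示意该分区思想,并非原文实现)。

```python
import numpy as np

def partition_cameras(cam_xy: np.ndarray, n_cells=(4, 2), expand_ratio=0.2):
    """Split cameras into a regular XY grid and expand each cell's boundary.

    cam_xy: [N, 2] camera positions on the ground plane.
    n_cells: grid resolution (cols, rows); expand_ratio: relative boundary margin,
    so that cameras near a border are shared by neighbouring cells.
    Returns a list of camera-index arrays, one per cell.
    """
    lo, hi = cam_xy.min(0), cam_xy.max(0)
    size = (hi - lo) / np.asarray(n_cells, dtype=np.float64)
    cells = []
    for i in range(n_cells[0]):
        for j in range(n_cells[1]):
            c_lo = lo + size * np.array([i, j]) - expand_ratio * size
            c_hi = lo + size * np.array([i + 1, j + 1]) + expand_ratio * size
            in_cell = np.all((cam_xy >= c_lo) & (cam_xy <= c_hi), axis=1)
            cells.append(np.where(in_cell)[0])
    return cells
```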
表3 NeRF和3DGS方法在大场景数据集上的对比
Table 3 Comparison of NeRF and 3DGS methods on large scene datasets
方法 | Building PSNR↑ | Building SSIM↑ | Building LPIPS↓ | Rubble PSNR↑ | Rubble SSIM↑ | Rubble LPIPS↓ | Campus PSNR↑ | Campus SSIM↑ | Campus LPIPS↓ | Residence PSNR↑ | Residence SSIM↑ | Residence LPIPS↓ | Sci-Art PSNR↑ | Sci-Art SSIM↑ | Sci-Art LPIPS↓ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Mega-NeRF[40] | 20.92 | 0.547 | 0.454 | 24.06 | 0.553 | 0.508 | 23.42 | 0.537 | 0.636 | 22.08 | 0.628 | 0.401 | 25.60 | 0.770 | 0.312 |
Switch-NeRF[42] | 21.54 | 0.579 | 0.397 | 23.41 | 0.562 | 0.478 | 23.62 | 0.541 | 0.616 | 22.57 | 0.654 | 0.352 | 26.51 | 0.795 | 0.271 |
3DGS[2] | 22.53 | 0.738 | 0.214 | 25.51 | 0.725 | 0.316 | 23.67 | 0.688 | 0.347 | 22.36 | 0.745 | 0.247 | 24.13 | 0.791 | 0.262 |
VastGaussian[46] | 21.80 | 0.728 | 0.225 | 25.20 | 0.742 | 0.264 | 23.82 | 0.695 | 0.329 | 21.01 | 0.699 | 0.261 | 22.64 | 0.761 | 0.261 |
Hierarchy GS[29] | 21.52 | 0.723 | 0.297 | 24.64 | 0.755 | 0.284 | | | | | | | | | |
DoGaussian[48] | 22.73 | 0.759 | 0.204 | 25.78 | 0.765 | 0.257 | 24.01 | 0.681 | 0.377 | 21.94 | 0.740 | 0.244 | 24.42 | 0.804 | 0.219 |
CoSurfGS[50] | 22.40 | 0.750 | 0.262 | 25.39 | 0.774 | 0.267 | 23.63 | 0.719 | 0.360 | 22.31 | 0.776 | 0.261 | 23.29 | 0.802 | 0.277 |
表4 NeRF在自动驾驶场景中的对比[59]
Table 4 Comparison of NeRF in autonomous driving scenes[59]
数据集 | 方法 | PSNR↑ | SSIM↑ | LPIPS↓ |
---|---|---|---|---|
Panda PC | Instant-NGP | 24.03 | 0.708 | 0.451 |
Panda PC | UniSim | 25.63 | 0.745 | 0.277 |
Panda PC | NeuRAD | 26.58 | 0.778 | 0.190 |
Panda 360 | UniSim | 23.50 | 0.692 | 0.330 |
Panda 360 | NeuRAD | 25.97 | 0.758 | 0.242 |
nuScenes | Mip360 | 24.37 | 0.795 | 0.240 |
nuScenes | S-NeRF | 26.21 | 0.831 | 0.228 |
nuScenes | NeuRAD | 26.99 | 0.815 | 0.225 |
KITTI MOT | SUDS | 23.12 | 0.821 | 0.135 |
KITTI MOT | MARS | 24.00 | 0.801 | 0.164 |
KITTI MOT | NeuRAD | 27.00 | 0.795 | 0.082 |
Argo2 | UniSim | 23.22 | 0.661 | 0.412 |
Argo2 | NeuRAD | 26.22 | 0.717 | 0.315 |
ZOD | UniSim | 27.97 | 0.777 | 0.239 |
ZOD | NeuRAD | 29.49 | 0.809 | 0.226 |
图10 DeSiRe-GS、S3Gaussian和PVG对比[20] ((a) 渲染的图像;(b) 静态;(c) 动态;(d) 渲染的深度图;(e) 高斯点)
Fig. 10 Comparison of DeSiRe-GS, S3Gaussian and PVG[20] ((a) Rendered image; (b) Static; (c) Dynamic; (d) Rendered depth; (e) Gaussian points)
表5 部分方法在D-NeRF数据集上的性能对比[64]
Table 5 Performance comparison of selected methods on the D-NeRF dataset[64]
方法 | 是否GS | PSNR↑ | SSIM↑ | LPIPS↓ |
---|---|---|---|---|
D-NeRF[53] | 否 | 30.50 | 0.95 | 0.07 |
TiNeuVox-B[65] | 否 | 32.67 | 0.97 | 0.04 |
Kplanes[66] | 否 | 31.61 | 0.97 | |
HexPlane[67] | 否 | 32.68 | 0.97 | 0.02 |
FFDNeRF[68] | 否 | 32.68 | 0.97 | 0.02 |
MSTH[69] | 否 | 31.34 | 0.98 | 0.02 |
3DGS[2] | 是 | 23.19 | 0.93 | 0.08 |
RP-4DGS | 是 | 34.09 | 0.98 | |
4DGS[61] | 是 | 34.05 | 0.98 | 0.02 |
GaGS[70] | 是 | 37.36 | 0.99 | 0.01 |
CoGS[71] | 是 | 37.90 | 0.98 | 0.02 |
D-3DGS[60] | 是 | 39.51 | 0.99 | 0.01 |
表6 NeRF方法训练成本对比
Table 6 Comparison of training cost of NeRF methods
方法 | 编码方式 | 训练时间 | 迭代次数/K |
---|---|---|---|
NeRF[1] | 位置编码 | >12 h | 300 |
PixelNeRF[12] | 位置编码 | >12 h | 400 |
Mip-NeRF[15] | 集成位置编码 | ≈6 h | 612 |
GRF[74] | 位置编码 | | |
Point-NeRF[26] | 位置编码 | ≈7 h | 200 |
Instant NGP[72] | 哈希编码 | ≈5 min | 256 |
Plenoxels[73] | 位置编码 | ≈11 min | 10 |
DVGO[75] | 位置编码 | ≈15 min | 20 |
PlenOctree[76] | 位置编码 | >12 h | |
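表6中的“位置编码”“集成位置编码”与“哈希编码”指NeRF系方法对输入坐标的不同编码方式,是影响收敛速度的关键因素之一。下面给出标准频率位置编码(Fourier特征)的示意实现(numpy实现,频带数等参数为假设值),用于说明“位置编码”一列的含义;集成位置编码与多分辨率哈希编码可视为在此基础上的改进。

```python
import numpy as np

def positional_encoding(x: np.ndarray, num_freqs: int = 10, include_input: bool = True):
    """Standard NeRF-style frequency encoding gamma(x).

    x: [..., D] coordinates (e.g. D=3 for a 3D position).
    Maps each component p to (sin(2^k * pi * p), cos(2^k * pi * p)), k = 0..num_freqs-1,
    producing a high-dimensional feature that lets the MLP fit high-frequency detail.
    """
    freqs = 2.0 ** np.arange(num_freqs) * np.pi              # [L]
    xf = x[..., None] * freqs                                 # [..., D, L]
    enc = np.concatenate([np.sin(xf), np.cos(xf)], axis=-1)   # [..., D, 2L]
    enc = enc.reshape(*x.shape[:-1], -1)
    if include_input:
        enc = np.concatenate([x, enc], axis=-1)
    return enc
```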
表7 生成模型在nuScenes数据集上的性能比较
Table 7 Comparison of the generation models on the nuScenes dataset
方法 | 多视角 | 多帧 | FID↓ | FVD↓ |
---|---|---|---|---|
BEVGen[78] | | | 25.54 | |
BEVControl[79] | | | 24.85 | |
DriveDreamer[80] | | | 52.60 | 452 |
DriveGAN[81] | | | 73.40 | 502 |
DrivingDiffusion[82] | | | 15.89 | |
DrivingDiffusion[82] | | | 15.85 | 335 |
DrivingDiffusion[82] | | | 15.83 | 332 |
Panacea[83] | | | 16.96 | 139 |
GenAD[90] | | | 15.40 | 184 |
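表7采用FID与FVD衡量生成图像/视频与真实数据在特征分布上的差异,数值越低越好。下面给出FID的示意计算代码(假设真实与生成图像的Inception特征已预先提取,使用scipy求矩阵平方根,并非上述文献所用的评测脚本);FVD的思路相同,只是特征来自视频特征网络。

```python
import numpy as np
from scipy import linalg

def frechet_distance(feat_real: np.ndarray, feat_fake: np.ndarray) -> float:
    """FID between two feature sets (e.g. Inception-v3 pool features, shape [N, 2048]).

    FID = ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^{1/2})
    """
    mu_r, mu_f = feat_real.mean(0), feat_fake.mean(0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_f = np.cov(feat_fake, rowvar=False)

    cov_mean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(cov_mean):          # numerical noise can give tiny imaginary parts
        cov_mean = cov_mean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * cov_mean))
```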
[1] | MILDENHALL B, SRINIVASAN P P, TANCIK M, et al.NeRF: representing scenes as neural radiance fields for view synthesis[C]//The 16th European Conference on Computer Vision. Cham:Springer, 2020: 405-421. |
[2] | KERBL B, KOPANAS G, LEIMKUEHLER T, et al.3D Gaussian splatting for real-time radiance field rendering[J]. ACM Transactions on Graphics, 2023, 42(4): 139. |
[3] | JOHANSSON R, WILLIAMS D, BERGLUND A, et al.Carsim: a system to visualize written road accident reports as animated 3D scenes[C]//The 2nd Workshop on Text Meaning and Interpretation. New York:ACL, 2004: 57-64. |
[4] | 宋振波.面向自动驾驶的视觉数据生成关键问题研究[D]. 南京:南京理工大学, 2022. |
SONG Z B.Research on visual data generation for autonomous driving[D]. Nanjing:Nanjing University of Science & Technology, 2022 (in Chinese). | |
[5] | 王稚儒, 常远, 鲁鹏, 等.神经辐射场加速算法综述[J]. 图学学报, 2024, 45(1): 1-13. |
WANG Z R, CHANG Y, LU P, et al.A review on neural radiance fields acceleration[J]. Journal of Graphics, 2024, 45(1): 1-13 (in Chinese). | |
[6] | 朱结, 宋滢.基于可微渲染的自由视点合成方法[J]. 图学学报, 2024, 45(5): 1030-1039. |
ZHU J, SONG Y.A free viewpoint synthesis method based on differentiable rendering[J]. Journal of Graphics, 2024, 45(5): 1030-1039 (in Chinese). | |
[7] | TANCIK M, SRINIVASAN P P, MILDENHALL B, et al.Fourier features let networks learn high frequency functions in low dimensional domains[C]//The 34th International Conference on Neural Information Processing Systems. Red Hook:Curran Associates Inc., 2020: 632. |
[8] | ZWICKER M, PFISTER H, VAN BAAR J, et al.EWA splatting[J]. IEEE Transactions on Visualization and Computer Graphics, 2002, 8(3): 223-238. |
[9] | XIE Z Y, ZHANG J G, LI W Y, et al.S-NeRF: neural radiance fields for street views[EB/OL]. [2024-11-27]. https://arxiv.org/abs/2303.00749. |
[10] | YANG J W, IVANOVIC B, LITANY O, et al.EmerNeRF: emergent spatial-temporal scene decomposition via self-supervision[EB/OL]. [2024-11-26]. https://arxiv.org/abs/2311.02077. |
[11] | OST J, MANNAN F, THUEREY N, et al.Neural scene graphs for dynamic scenes[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2021: 2855-2864. |
[12] | YU A, YE V, TANCIK M, et al.PixelNeRF: neural radiance fields from one or few Images[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2021: 4576-4585. |
[13] | TURKI H, ZHANG J Y, FERRONI F, et al.SUDS: scalable urban dynamic scenes[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2023: 12375-12385. |
[14] | REMATAS K, LIU A, SRINIVASAN P, et al.Urban radiance fields[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2022: 12922-12932. |
[15] | BARRON J T, MILDENHALL B, TANCIK M, et al.Mip-NeRF: a multiscale representation for anti-aliasing neural radiance fields[C]//2021 IEEE/CVF International Conference on Computer Vision. New York:IEEE Press, 2022: 5835-5844. |
[16] | CHEN Y R, GU C, JIANG J Z, et al.Periodic vibration Gaussian: dynamic urban scene reconstruction and real-time rendering[EB/OL]. [2024-03-20]. https://arxiv.org/abs/2311.18561. |
[17] | YAN Y Z, LIN H T, ZHOU C X, et al.Street Gaussians: modeling dynamic urban scenes with Gaussian splatting[C]// The 18th European Conference on Computer Vision. Cham:Springer, 2025: 156-173. |
[18] | ZHOU X Y, LIN Z W, SHAN X J, et al.DrivingGaussian: composite Gaussian splatting for surrounding dynamic autonomous driving scenes[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 21634-21643. |
[19] | ZHOU H Y, SHAO J H, XU L, et al.HUGS: holistic urban 3D scene understanding via Gaussian splatting[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 21336-21345. |
[20] | PENG C S, ZHANG C W, WANG Y X, et al.DeSiRe-GS:4D street Gaussians for static-dynamic decomposition and surface reconstruction for urban driving scenes[EB/OL]. [2024-11-18]. https://arxiv.org/abs/2411.11921. |
[21] | WEI Y, LIU S H, RAO Y M, et al.NerfingMVS: guided optimization of neural radiance fields for indoor multi-view stereo[C]//2021 IEEE/CVF International Conference on Computer Vision. New York:IEEE Press, 2021: 5590-5599. |
[22] | WANG G C, CHEN Z X, LOY C C, et al.SparseNeRF: distilling depth ranking for few-shot novel view synthesis[C]//2023 IEEE/CVF International Conference on Computer Vision. New York:IEEE Press, 2023: 9031-9042. |
[23] | DENG K L, LIU A, ZHU J Y, et al.Depth-supervised NeRF: fewer views and faster training for free[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2022: 12872-12881. |
[24] | PARK J, JOO K, HU Z, et al.Non-local spatial propagation network for depth completion[C]//The 16th European Conference on Computer Vision. Cham:Springer, 2020: 120-136. |
[25] | SUN S L, ZHUANG B B, JIANG Z Y, et al.LiDARF: delving into LiDAR for neural radiance field on street scenes[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 19563-19572. |
[26] | XU Q G, XU Z X, PHILIP J, et al.Point-NeRF: point-based neural radiance fields[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2022: 5428-5438. |
[27] | CHUNG J Y, OH J, LEE K M.Depth-regularized optimization for 3D Gaussian splatting in few-shot images[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 811-820. |
[28] | TURKULAINEN M, REN X Q, MELEKHOV L, et al.DN-splatter: depth and normal priors for Gaussian splatting and meshing[C]//2025 IEEE/CVF Winter Conference on Applications of Computer Vision. New York:IEEE Press, 2025: 2421-2431. |
[29] | KERBL B, MEULEMAN A, KOPANAS G, et al.A hierarchical 3D Gaussian representation for real-time rendering of very large datasets[J]. ACM Transactions on Graphics (TOG), 2024, 43(4): 62. |
[30] | LI J H, ZHANG J W, BAI X, et al.DNGaussian: optimizing sparse-view 3D Gaussian radiance fields with global-local depth normalization[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 20775-20785. |
[31] | XU W Z, GAO H H, SHEN S H, et al.MVPGS: excavating multi-view priors for Gaussian splatting from sparse input views[C]//The 18th European Conference on Computer Vision. Cham:Springer, 2024: 203-220. |
[32] | ZHU Z H, FAN Z W, JIANG Y F, et al.FSGS: real-time few-shot view synthesis using Gaussian splatting[C]//The 18th European Conference on Computer Vision. Cham:Springer, 2024: 145-163. |
[33] | YIN R H, YUGAY V, LI Y, et al.FewViewGS: Gaussian splatting with few view matching and multi-stage training[EB/OL]. [2024-11-05]. https://arxiv.org/abs/2411.02229. |
[34] | CHEN Y T, MIHAJLOVIC M, CHEN X Y, et al.SplatFormer:point transformer for robust 3D Gaussian splatting[EB/OL]. [2024-11-12]. https://arxiv.org/abs/2411.06390. |
[35] | HUANG N, WEI X B, ZHENG W Z, et al.S3Gaussian:self-supervised street Gaussians for autonomous driving[EB/OL]. [2024-11-27]. https://arxiv.org/abs/2405.20323. |
[36] | JIANG C J, GAO R L, SHAO K L, et al.LI-GS: Gaussian splatting with LiDAR incorporated for accurate large-scale reconstruction[J]. IEEE Robotics and Automation Letters, 2025, 10(2): 1864-1871. |
[37] | KUNG P C, ZHANG X L, SKINNER K A, et al.LiHi-GS: LiDAR-supervised Gaussian splatting for highway driving scene reconstruction[EB/OL]. [2024-12-26]. https://arxiv.org/abs/2412.15447. |
[38] | ZHANG K, RIEGLER G, SNAVELY N, et al.NeRF++: analyzing and improving neural radiance fields[EB/OL]. [2024-11-27]. http://arxiv.org/abs/2010.07492. |
[39] | BARRON J T, MILDENHALL B, VERBIN D, et al. Mip-NeRF 360: unbounded anti-aliased neural radiance fields[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2022: 5460-5469. |
[40] | TURKI H, RAMANAN D, SATYANARAYANAN M.Mega-NeRF: scalable construction of large-scale NeRFs for virtual fly-throughs[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2022: 12912-12921. |
[41] | TANCIK M, CASSER V, YAN X C, et al.Block-NeRF: scalable large scene neural view synthesis[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2022: 8238-8248. |
[42] | MI Z X, XU D.Switch-NeRF: learning scene decomposition with mixture of experts for large-scale neural radiance fields[EB/OL]. [2024-11-26]. https://openreview.net/forum?id=PQ2zoIZqvm. |
[43] | 董相涛, 马鑫, 潘成伟, 等.室外大场景神经辐射场综述[J]. 图学学报, 2024, 45(4): 631-649. |
DONG X T, MA X, PAN C W, et al.A review of neural radiance fields for outdoor large scenes[J]. Journal of Graphics, 2024, 45(4): 631-649 (in Chinese). | |
[44] | XIANGLI Y B, XU L N, PAN X G, et al.BungeeNeRF: progressive neural radiance field for extreme multi-scale scene rendering[C]//The 17th European Conference on Computer Vision. Cham:Springer, 2022: 106-122. |
[45] | XU L N, XIANGLI Y B, PENG S D, et al.Grid-guided neural radiance fields for large urban scenes[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2023: 8296-8306. |
[46] | LIN J Q, LI Z H, TANG X, et al.VastGaussian: vast 3D Gaussians for large scene reconstruction[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 5166-5175. |
[47] | CHEN J Y, YE W C, WANG Y F, et al.GigaGS:scaling up planar-based 3D Gaussians for large scene surface reconstruction[EB/OL]. [2024-09-10]. https://arxiv.org/abs/2409.06685. |
[48] | CHEN Y, LEE F H.DoGaussian:distributed-oriented Gaussian splatting for large-scale 3D reconstruction via Gaussian consensus[EB/OL]. [2024-11-26]. https://arxiv.org/abs/2405.13943. |
[49] | FAN J X, LI W H, HAN Y F, et al.Momentum-GS: momentum Gaussian self-distillation for high-quality large scene reconstruction[EB/OL]. [2024-12-06]. https://arxiv.org/abs/2412.04887. |
[50] | GAO Y Y, DAI Y L, LI H, et al.CoSurfGS:collaborative 3D surface Gaussian splatting with distributed learning for large scene reconstruction[EB/OL]. [2024-12-23]. https://arxiv.org/abs/2412.17612. |
[51] | MARTIN-BRUALLA R, RADWAN N, SAJJADI M S M, et al.NeRF in the wild: neural radiance fields for unconstrained photo collections[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2021: 7206-7215. |
[52] | CHEN D P, LI H, YE W C, et al.PGSR: planar-based Gaussian splatting for efficient and high-fidelity surface reconstruction[EB/OL]. [2024-06-10]. https://arxiv.org/abs/2406.06521. |
[53] | PUMAROLA A, CORONA E, PONS-MOLL G, et al.D-NeRF: neural radiance fields for dynamic scenes[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2021: 10313-10322. |
[54] | LI Z Q, NIKLAUS S, SNAVELY N, et al.Neural scene flow fields for space-time view synthesis of dynamic scenes[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2021: 6494-6504. |
[55] | GAO C, SARAF A, KOPF J, et al.Dynamic view synthesis from dynamic monocular video[C]//2021 IEEE/CVF International Conference on Computer Vision. New York:IEEE Press, 2021: 5692-5701. |
[56] | PARK K, SINHA U, BARRON J T, et al.NeRFIES: deformable neural radiance fields[C]//2021 IEEE/CVF International Conference on Computer Vision. New York:IEEE Press, 2021: 5845-5854. |
[57] | ZHANG J B, LI X Y, WAN Z Y, et al.FDNeRF: few-shot dynamic neural radiance fields for face reconstruction and expression editing[C]//SIGGRAPH Asia 2022 Conference Papers. New York:ACM, 2022: 12. |
[58] | ZHANG B Y, XU W B, ZHU Z, et al.Detachable novel views synthesis of dynamic scenes using distribution-driven neural radiance fields[EB/OL]. [2024-11-27]. https://arxiv.org/abs/2301.00411. |
[59] | TONDERSKI A, LINDSTRÖM C, HESS G, et al.NeuRAD: neural rendering for autonomous driving[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 14895-14904. |
[60] | YANG Z Y, GAO X Y, ZHOU W, et al.Deformable 3D Gaussians for high-fidelity monocular dynamic scene reconstruction[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 20331-20341. |
[61] | WU G J, YI T R, FANG J M, et al.4D Gaussian splatting for real-time dynamic scene rendering[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 20310-20320. |
[62] | KRATIMENOS A, LEI J H, DANIILIDIS K.DynMF: neural motion factorization for real-time dynamic view synthesis with 3D Gaussian splatting[C]//The 18th European Conference on Computer Vision. Cham:Springer, 2025: 252-269. |
[63] | YANG Z Y, YANG H Y, PAN Z J, et al.Real-time photorealistic dynamic scene representation and rendering with 4D Gaussian splatting[EB/OL]. [2024-02-22]. https://arxiv.org/abs/2310.10642. |
[64] | 曹振中, 光金正, 张千一, 等.基于3D高斯溅射的3维重建技术综述[J]. 机器人, 2024, 46(5): 611-622. |
CAO Z Z, GUANG J Z, ZHANG Q Y, et al.Survey of 3D reconstruction techniques based on 3D Gaussian splatting[J]. Robot, 2024, 46(5): 611-622 (in Chinese). | |
[65] | FANG J M, YI T R, WANG X G, et al.Fast dynamic radiance fields with time-aware neural voxels[C]//SIGGRAPH Asia 2022 Conference Papers. New York:ACM, 2022: 11. |
[66] | FRIDOVICH-KEIL S, MEANTI G, WARBURG F R, et al.K-planes: explicit radiance fields in space, time, and appearance[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2023: 12479-12488. |
[67] | CAO A, JOHNSON J.HexPlane: a fast representation for dynamic scenes[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2023: 130-141. |
[68] | GUO X, SUN J D, DAI Y C, et al.Forward flow for novel view synthesis of dynamic scenes[C]//2023 IEEE/CVF International Conference on Computer Vision. New York:IEEE Press, 2023: 15976-15987. |
[69] | WANG F, CHEN Z L, WANG G K, et al.Masked space-time hash encoding for efficient dynamic scene reconstruction[C]// The 37th International Conference on Neural Information Processing Systems. Red Hook:Curran Associates Inc., 2024: 3089. |
[70] | LU Z C, GUO X, HUI L, et al.3D Geometry-aware deformable Gaussian splatting for dynamic view synthesis[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 8900-8910. |
[71] | YU H, JULIN J, MILACSKI Z Á, et al.CoGS: controllable Gaussian splatting[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 21624-21633. |
[72] | MÜLLER T, EVANS A, SCHIED C, et al.Instant neural graphics primitives with a multiresolution hash encoding[J]. ACM Transactions on Graphics (TOG), 2022, 41(4): 102. |
[73] | FRIDOVICH-KEIL S, YU A, TANCIK M, et al.Plenoxels: radiance fields without neural networks[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2022: 5491-5500. |
[74] | TREVITHICK A, YANG B.GRF: learning a general radiance field for 3D representation and rendering[C]//2021 IEEE/CVF International Conference on Computer Vision. New York:IEEE Press, 2021: 15162-15172. |
[75] | SUN C, SUN M, CHEN H T.Direct voxel grid optimization: super-fast convergence for radiance fields reconstruction[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2022: 5449-5459. |
[76] | YU A, LI R L, TANCIK M, et al.PlenOctrees for real-time rendering of neural radiance fields[C]//2021 IEEE/CVF International Conference on Computer Vision. New York:IEEE Press, 2021: 5732-5741. |
[77] | REN K R, JIANG L H, LU T, et al.Octree-GS:towards consistent real-time rendering with LOD-structured 3D Gaussians[EB/OL]. [2024-10-17]. https://arxiv.org/abs/2403.17898. |
[78] | SWERDLOW A, XU R S, ZHOU B L.Street-view image generation from a bird’s-eye view layout[J]. IEEE Robotics and Automation Letters, 2024, 9(4): 3578-3585. |
[79] | YANG K R, MA E H, PENG J B, et al.BEVcontrol: accurately controlling street-view elements with multi-perspective consistency via BEV sketch layout[EB/OL]. [2024-11-27]. https://arxiv.org/abs/2308.01661. |
[80] | WANG X F, ZHU Z, HUANG G, et al.DriveDreamer: towards real-world-driven world models for autonomous driving[EB/OL]. [2024-11-27]. https://arxiv.org/abs/2309.09777. |
[81] | KIM S W, PHILION J, TORRALBA A, et al.DriveGAN: towards a controllable high-quality neural simulation[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2021: 5816-5825. |
[82] | LI X F, ZHANG Y F, YE X Q.DrivingDiffusion: layout-guided multi-view driving scenarios video generation with latent diffusion model[C]//The 18th European Conference on Computer Vision. Cham:Springer, 2025: 469-485. |
[83] | WEN Y Q, ZHAO Y C, LIU Y F, et al.Panacea: panoramic and controllable video generation for autonomous driving[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 6902-6912. |
[84] | HU A, RUSSELL L, YEO H, et al.GAIA-1:a generative world model for autonomous driving[EB/OL]. [2024-12-29]. https://arxiv.org/abs/2309.17080. |
[85] | JIA F, MAO W X, LIU Y F, et al.ADriver-I: a general world model for autonomous driving[EB/OL]. [2024-11-22]. https://arxiv.org/abs/2311.13549. |
[86] | BOGDOLL D, YANG Y T, JOSEPH T, et al.MUVO: a multimodal generative world model for autonomous driving with geometric representations[EB/OL]. [2024-11-26]. https://arxiv.org/abs/2311.11762. |
[87] | ZHENG W Z, CHEN W L, HUANG Y H, et al.OccWorld: learning a 3D occupancy world model for autonomous driving[C]//The 18th European Conference on Computer Vision. Cham:Springer, 2025: 55-72. |
[88] | WANG X F, ZHU Z, HUANG G, et al.WorldDreamer: towards general world models for video generation via predicting masked tokens[EB/OL]. [2024-11-18]. https://arxiv.org/abs/2401.09985. |
[89] | LI Q F, JIA X S, WANG S B, et al.Think2Drive:efficient reinforcement learning by thinking in latent world model for quasi-realistic autonomous driving (in CARLA-v2)[EB/OL]. [2024-07-20]. https://arxiv.org/abs/2402.16720. |
[90] | ZHENG W Z, SONG R Q, GUO X D, et al.GenAD: generative end-to-end autonomous driving[C]//The 18th European Conference on Computer Vision. Cham:Springer, 2024: 87-104. |
[91] | GUAN Y C, LIAO H C, LI Z N, et al.World models for autonomous driving: an initial survey[EB/OL]. (2024-05-08) [2024-11-26]. https://doi.org/10.1109/TIV.2024.3398357. |
[92] | YANG J Z, GAO S Y, QIU Y H, et al.Generalized predictive model for autonomous driving[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 14662-14672. |
[93] | YANG Z P, CHAI Y N, ANGUELOV D, et al.SurfelGAN: synthesizing realistic sensor data for autonomous driving[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2020: 11115-11124. |
[94] | LIU X Y, XUE H, LUO K M, et al.GenN2N: generative NeRF2NeRF translation[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 5105-5114. |
[95] | HUANG Y Z, LI Z, CHEN Z, et al.OrientDream:streamlining text-to-3D generation with explicit orientation control[EB/OL]. [2024-06-14]. https://arxiv.org/abs/2406.10000. |
[96] | YAN J B, ZHAO A L, HU Y X.Dragen3D: multiview geometry consistent 3D Gaussian generation with drag-based control[EB/OL]. [2024-11-27]. https://arxiv.org/abs/2502.16475. |