图学学报 ›› 2025, Vol. 46 ›› Issue (5): 931-949.DOI: 10.11996/JG.j.2095-302X.2025050931
黄敬1, 时瑞浩1, 宋文明1, 郭和攀1, 魏璜1, 魏小松3, 姚剑2,3
收稿日期:
2025-01-26
接受日期:
2025-04-21
出版日期:
2025-10-30
发布日期:
2025-09-10
通讯作者:
姚剑(1975-),男,教授,博士。主要研究方向为计算机视觉、机器视觉、图像处理、模式识别、机器学习、SLAM、机器人等。E-mail:jian.yao@whu.edu.cn
第一作者:
黄敬(1978-),男,工程师,硕士。主要研究方向为智能网联汽车的云与大数据。E-mail:huangjing@gacrnd.com
HUANG Jing1, SHI Ruihao1, SONG Wenming1, GUO Hepan1, WEI Huang1, WEI Xiaosong3, YAO Jian2,3
Received:
2025-01-26
Accepted:
2025-04-21
Published:
2025-10-30
Online:
2025-09-10
First author:
HUANG Jing (1978-), engineer, master's degree. His main research interests cover cloud computing and big data for intelligent connected vehicles. E-mail:huangjing@gacrnd.com
摘要:
图像合成技术对自动驾驶的发展至关重要,旨在低成本、高效率地为自动驾驶系统提供训练和测试数据。随着计算机视觉和人工智能(AI)技术的发展,神经辐射场(NeRF)、三维高斯溅射(3DGS)和生成模型在图像合成领域引起了广泛关注,这些新范式在自动驾驶场景构建和图像数据合成中表现出巨大潜力。鉴于这些方法对于自动驾驶技术发展的重要性,回顾了其发展历程并搜集了最新研究工作,从自动驾驶图像合成问题的实际角度重新观察相关方法,介绍了NeRF、3DGS、生成模型以及虚实融合的合成方法在自动驾驶领域的进展,其中尤其关注NeRF和3DGS这2种基于重建的方法。首先,分析了自动驾驶图像生成任务的一些重要问题;然后,从自动驾驶场景面临的有限视角问题、大规模场景问题、动态性问题和加速问题4个方面详细分析了NeRF和3DGS的代表性方案;考虑到生成模型对于创建自动驾驶极端场景(corner case)的潜在优势,还介绍了自动驾驶世界模型用于场景生成的实际问题及现有研究工作;接着,分析了当前业内虚实融合自动驾驶图像合成前沿应用,以及NeRF和3DGS结合AI生成模型在自动驾驶场景生成任务中的潜力;最后,总结了当前取得的成功及未来亟需探索的方向。
黄敬, 时瑞浩, 宋文明, 郭和攀, 魏璜, 魏小松, 姚剑. 自动驾驶图像合成方法综述:从模拟器到新范式[J]. 图学学报, 2025, 46(5): 931-949.
HUANG Jing, SHI Ruihao, SONG Wenming, GUO Hepan, WEI Huang, WEI Xiaosong, YAO Jian. A review of autonomous driving image synthesis methods: from simulators to new paradigms[J]. Journal of Graphics, 2025, 46(5): 931-949.
表1 NeRF和3DGS在自动驾驶场景数据集上的比较
Table 1 Comparison of NeRF and 3DGS on the autonomous driving scene datasets
方法 | 辅助先验 | PSNR↑ | SSIM↑ | LPIPS↓ | 数据集 |
---|---|---|---|---|---|
NeRF[1] | 无 | 18.56 | 0.557 | 0.554 | KITTI |
S-NeRF[9] | LiDAR | 18.71 | 0.606 | 0.352 | KITTI |
EmerNeRF[10] | LiDAR、2D语义 | 25.24 | 0.801 | 0.237 | KITTI |
NSG[11] | 无 | 21.53 | 0.673 | 0.254 | KITTI |
PixelNeRF[12] | 无 | 20.10 | 0.761 | 0.175 | KITTI |
SUDS[13] | LiDAR、2D光流 | 22.77 | 0.797 | 0.171 | KITTI |
Urban-NeRF[14] | LiDAR | 21.49 | 0.661 | 0.491 | nuScenes |
Mip-NeRF[15] | 无 | 18.22 | 0.655 | 0.421 | nuScenes |
3DGS[2] | 无 | 26.08 | 0.717 | 0.298 | nuScenes |
PVG[16] | 无 | 22.43 | 0.896 | 0.114 | KITTI |
StreetGaussian[17] | LiDAR、2D语义 | 25.79 | 0.844 | 0.081 | KITTI |
DrivingGaussian[18] | 无 | 28.36 | 0.851 | 0.256 | nuScenes |
DrivingGaussian[18] | LiDAR | 28.74 | 0.865 | 0.237 | nuScenes |
HuGS[19] | 2D/3D语义、光流 | 26.81 | 0.866 | 0.059 | KITTI |
DeSiRe-GS[20] | LiDAR | 28.87 | 0.901 | 0.106 | KITTI |
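表1中的PSNR、SSIM和LPIPS是新视角合成最常用的图像质量评价指标。下面给出一段最小示意代码(假设使用scikit-image与lpips库,函数名为自拟,并非上述任一文献的原始实现),说明渲染图与实况图之间这三项指标的一般计算方式。

```python
import numpy as np
import torch
import lpips  # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_view(render: np.ndarray, gt: np.ndarray, lpips_fn=None):
    """Compute PSNR / SSIM / LPIPS between a rendered view and its ground truth.

    render, gt: HxWx3 float arrays in [0, 1].
    """
    psnr = peak_signal_noise_ratio(gt, render, data_range=1.0)
    ssim = structural_similarity(gt, render, channel_axis=-1, data_range=1.0)

    # LPIPS expects NCHW tensors scaled to [-1, 1]
    if lpips_fn is None:
        lpips_fn = lpips.LPIPS(net="alex")
    to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    with torch.no_grad():
        lp = lpips_fn(to_tensor(render), to_tensor(gt)).item()
    return psnr, ssim, lp
```

实际评测中通常对整条测试序列的逐帧结果取平均,再报告表中的数值。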
图3 NeRF和3DGS在自动驾驶场景中的问题((a) 实况1;(b) NeRF;(c) 实况2;(d) 3DGS)
Fig. 3 Problems of NeRF and 3DGS in autonomous driving scenes ((a) Ground truth 1; (b) NeRF; (c) Ground truth 2; (d) 3DGS)
图4 LiDARF与S-NeRF在nuScenes数据集上的结果对比((a) 实况;(b) LiDARF;(c) S-NeRF)
Fig. 4 Comparison of LiDARF and S-NeRF on nuScenes dataset ((a) Ground truth; (b) LiDARF; (c) S-NeRF)
表2 单目深度正则核心思想
Table 2 Core ideas of monocular depth regularization
方法 | 关键技术 |
---|---|
DR-Gaussian[27] | 利用尺度系数s和偏移量t将单目深度Fθ(I)对齐到稀疏点Dsparse,ω归一化特征点可靠性权值 |
DN-Splatter[28] | 利用单目深度Dmono(p)到稀疏点Dsparse(p)的线性回归求解深度尺度系数s和偏移量t,grgb=exp(-∇I)作为绝对尺度可靠性度量 |
Hierarchy GS[29] | 将单目逆深度图D对齐到SfM尺度Dsparse |
DNGaussian[30] | 将深度图分割为小块p,然后利用块内深度均值meanD(p)和标准差stdD(p)归一化深度分布函数 |
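表2所列方法的共同思路是:先求解尺度系数s与偏移量t,把无绝对尺度的单目深度对齐到SfM/LiDAR稀疏深度,再以对齐后的深度作正则。下面给出该尺度-偏移对齐的加权最小二乘示意实现(numpy实现,函数与变量命名为自拟假设,并非上述文献的原始代码)。

```python
import numpy as np

def align_mono_depth(d_mono, d_sparse, mask, weights=None):
    """Align monocular depth to sparse SfM/LiDAR depth with scale s and shift t.

    d_mono, d_sparse, mask: HxW arrays; mask marks pixels that have a valid sparse depth.
    weights: optional per-pixel reliability (e.g. feature-point confidence).
    Solves  min_{s,t} sum_i w_i * (s * d_mono_i + t - d_sparse_i)^2  by weighted least squares.
    """
    x = d_mono[mask].astype(np.float64)
    y = d_sparse[mask].astype(np.float64)
    w = np.ones_like(x) if weights is None else weights[mask].astype(np.float64)

    A = np.stack([x, np.ones_like(x)], axis=1)   # [N, 2], columns: mono depth, constant
    sw = np.sqrt(w)
    s, t = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return s * d_mono + t, (s, t)
```

对齐后的稠密深度即可作为渲染深度的监督项,这也是表中各方法深度正则损失的共同基础。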
图5 SplatFormer[34]合成分布之外新视角图像((a) 仰角20°;(b) 仰角40°;(c) 仰角60°;(d) 仰角80°)
Fig. 5 SplatFormer[34] synthesizes novel views out of distribution ((a) Elevation is 20°; (b) Elevation is 40°; (c) Elevation is 60°; (d) Elevation is 80°)
图7 基于可见性的分区策略[1] ((a) 输入数据;(b) 基于相机位置的区域划分;(c) 基于位置的数据选择;(d) 基于可见性的相机选择;(e) 基于覆盖域的点选择;(f) 空域无关解;(g) 空域感知解;(h) 深度模糊产生的漂浮物)
Fig. 7 Visibility-based partitioning strategy[1] ((a) Input data; (b) Camera-position-based region division; (c) Position-based data selection; (d) Visibility-based camera selection; (e) Coverage-based point selection; (f) Naive solution: airspace-agnostic; (g) Our solution: airspace-aware; (h) Floaters caused by depth ambiguity)
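图7所示分区流程的第一步是按相机在地面上的位置把大场景划分为若干区域,并在边界处适当扩展,使相邻区域共享边界相机,后续再按可见性与覆盖域为每个区域补充相机和点。下面给出基于相机位置的规则网格划分示意代码(numpy实现,网格数与扩展比例为假设参数,仅示意该分区思想,并非原文实现)。

```python
import numpy as np

def partition_cameras(cam_xy: np.ndarray, n_cells=(4, 2), expand_ratio=0.2):
    """Split cameras into a regular XY grid and expand each cell's boundary.

    cam_xy: [N, 2] camera positions on the ground plane.
    n_cells: grid resolution (cols, rows); expand_ratio: relative boundary margin,
    so that cameras near a border are shared by neighbouring cells.
    Returns a list of camera-index arrays, one per cell.
    """
    lo, hi = cam_xy.min(0), cam_xy.max(0)
    size = (hi - lo) / np.asarray(n_cells, dtype=np.float64)
    cells = []
    for i in range(n_cells[0]):
        for j in range(n_cells[1]):
            c_lo = lo + size * np.array([i, j]) - expand_ratio * size
            c_hi = lo + size * np.array([i + 1, j + 1]) + expand_ratio * size
            in_cell = np.all((cam_xy >= c_lo) & (cam_xy <= c_hi), axis=1)
            cells.append(np.where(in_cell)[0])
    return cells
```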
表3 NeRF和3DGS方法在大场景数据集上的对比
Table 3 Comparison of NeRF and 3DGS methods on large scene datasets
方法 | Building PSNR↑ | Building SSIM↑ | Building LPIPS↓ | Rubble PSNR↑ | Rubble SSIM↑ | Rubble LPIPS↓ | Campus PSNR↑ | Campus SSIM↑ | Campus LPIPS↓ | Residence PSNR↑ | Residence SSIM↑ | Residence LPIPS↓ | Sci-Art PSNR↑ | Sci-Art SSIM↑ | Sci-Art LPIPS↓ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Mega-NeRF[40] | 20.92 | 0.547 | 0.454 | 24.06 | 0.553 | 0.508 | 23.42 | 0.537 | 0.636 | 22.08 | 0.628 | 0.401 | 25.60 | 0.770 | 0.312 |
Switch-NeRF[42] | 21.54 | 0.579 | 0.397 | 23.41 | 0.562 | 0.478 | 23.62 | 0.541 | 0.616 | 22.57 | 0.654 | 0.352 | 26.51 | 0.795 | 0.271 |
3DGS[2] | 22.53 | 0.738 | 0.214 | 25.51 | 0.725 | 0.316 | 23.67 | 0.688 | 0.347 | 22.36 | 0.745 | 0.247 | 24.13 | 0.791 | 0.262 |
VastGaussian[46] | 21.80 | 0.728 | 0.225 | 25.20 | 0.742 | 0.264 | 23.82 | 0.695 | 0.329 | 21.01 | 0.699 | 0.261 | 22.64 | 0.761 | 0.261 |
Hierarchy GS[29] | 21.52 | 0.723 | 0.297 | 24.64 | 0.755 | 0.284 | | | | | | | | | |
DoGaussian[48] | 22.73 | 0.759 | 0.204 | 25.78 | 0.765 | 0.257 | 24.01 | 0.681 | 0.377 | 21.94 | 0.740 | 0.244 | 24.42 | 0.804 | 0.219 |
CoSurfGS[50] | 22.40 | 0.750 | 0.262 | 25.39 | 0.774 | 0.267 | 23.63 | 0.719 | 0.360 | 22.31 | 0.776 | 0.261 | 23.29 | 0.802 | 0.277 |
表4 NeRF在自动驾驶场景中的对比[59]
Table 4 Comparison of NeRF in autonomous driving scenes[59]
数据集 | 方法 | PSNR↑ | SSIM↑ | LPIPS↓ |
---|---|---|---|---|
Panda PC | Instant-NGP | 24.03 | 0.708 | 0.451 |
Panda PC | UniSim | 25.63 | 0.745 | 0.277 |
Panda PC | NeuRAD | 26.58 | 0.778 | 0.190 |
Panda 360 | UniSim | 23.50 | 0.692 | 0.330 |
Panda 360 | NeuRAD | 25.97 | 0.758 | 0.242 |
nuScenes | Mip360 | 24.37 | 0.795 | 0.240 |
nuScenes | S-NeRF | 26.21 | 0.831 | 0.228 |
nuScenes | NeuRAD | 26.99 | 0.815 | 0.225 |
KITTI MOT | SUDS | 23.12 | 0.821 | 0.135 |
KITTI MOT | MARS | 24.00 | 0.801 | 0.164 |
KITTI MOT | NeuRAD | 27.00 | 0.795 | 0.082 |
Argo2 | UniSim | 23.22 | 0.661 | 0.412 |
Argo2 | NeuRAD | 26.22 | 0.717 | 0.315 |
ZOD | UniSim | 27.97 | 0.777 | 0.239 |
ZOD | NeuRAD | 29.49 | 0.809 | 0.226 |
图10 DeSiRe-GS、S3Gaussian和PVG对比[20] ((a) 渲染的图像;(b) 静态;(c) 动态;(d) 渲染的深度图;(e) 高斯点)
Fig. 10 Comparison of DeSiRe-GS, S3Gaussian and PVG[20] ((a) Rendered image; (b) Static; (c) Dynamic; (d) Rendered depth; (e) Gaussian points)
表5 部分方法在D-NeRF数据集上的性能对比[64]
Table 5 Performance comparison of selected methods on the D-NeRF dataset[64]
方法 | 是否GS | PSNR↑ | SSIM↑ | LPIPS↓ |
---|---|---|---|---|
D-NeRF[53] | 否 | 30.50 | 0.95 | 0.07 |
TiNeuVox-B[65] | 否 | 32.67 | 0.97 | 0.04 |
Kplanes[66] | 否 | 31.61 | 0.97 | |
HexPlane[67] | 否 | 32.68 | 0.97 | 0.02 |
FFDNeRF[68] | 否 | 32.68 | 0.97 | 0.02 |
MSTH[69] | 否 | 31.34 | 0.98 | 0.02 |
3DGS[2] | 是 | 23.19 | 0.93 | 0.08 |
RP-4DGS | 是 | 34.09 | 0.98 | |
4DGS[61] | 是 | 34.05 | 0.98 | 0.02 |
GaGS[70] | 是 | 37.36 | 0.99 | 0.01 |
CoGS[71] | 是 | 37.90 | 0.98 | 0.02 |
D-3DGS[60] | 是 | 39.51 | 0.99 | 0.01 |
表6 NeRF方法训练成本对比
Table 6 Comparison of training cost of NeRF methods
方法 | 编码方式 | 训练时间 | 迭代次数/K |
---|---|---|---|
NeRF[1] | 位置编码 | >12 h | 300 |
PixelNeRF[12] | 位置编码 | >12 h | 400 |
Mip-NeRF[15] | 集成位置编码 | ≈6 h | 612 |
GRF[74] | 位置编码 | | |
Point-NeRF[26] | 位置编码 | ≈7 h | 200 |
Instant NGP[72] | 哈希编码 | ≈5 min | 256 |
Plenoxels[73] | 位置编码 | ≈11 min | 10 |
DVGO[75] | 位置编码 | ≈15 min | 20 |
PlenOctree[76] | 位置编码 | >12 h | |
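表6中的“位置编码”“集成位置编码”与“哈希编码”指NeRF系方法对输入坐标的不同编码方式,是影响收敛速度的关键因素之一。下面给出标准频率位置编码(Fourier特征)的示意实现(numpy实现,频带数等参数为假设值),用于说明“位置编码”一列的含义;集成位置编码与多分辨率哈希编码可视为在此基础上的改进。

```python
import numpy as np

def positional_encoding(x: np.ndarray, num_freqs: int = 10, include_input: bool = True):
    """Standard NeRF-style frequency encoding gamma(x).

    x: [..., D] coordinates (e.g. D=3 for a 3D position).
    Maps each component p to (sin(2^k * pi * p), cos(2^k * pi * p)), k = 0..num_freqs-1,
    producing a high-dimensional feature that lets the MLP fit high-frequency detail.
    """
    freqs = 2.0 ** np.arange(num_freqs) * np.pi              # [L]
    xf = x[..., None] * freqs                                 # [..., D, L]
    enc = np.concatenate([np.sin(xf), np.cos(xf)], axis=-1)   # [..., D, 2L]
    enc = enc.reshape(*x.shape[:-1], -1)
    if include_input:
        enc = np.concatenate([x, enc], axis=-1)
    return enc
```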
表7 生成模型在nuScenes数据集上的性能比较
Table 7 Comparison of the generation models on the nuScenes dataset
方法 | 多视角 | 多帧 | FID↓ | FVD↓ |
---|---|---|---|---|
BEVGen[78] | | | 25.54 | |
BEVControl[79] | | | 24.85 | |
DriveDreamer[80] | | | 52.60 | 452 |
DriveGAN[81] | | | 73.40 | 502 |
DrivingDiffusion[82] | | | 15.89 | |
DrivingDiffusion[82] | | | 15.85 | 335 |
DrivingDiffusion[82] | | | 15.83 | 332 |
Panacea[83] | | | 16.96 | 139 |
GenAD[90] | | | 15.40 | 184 |
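表7采用FID与FVD衡量生成图像/视频与真实数据在特征分布上的差异,数值越低越好。下面给出FID的示意计算代码(假设真实与生成图像的Inception特征已预先提取,使用scipy求矩阵平方根,并非上述文献所用的评测脚本);FVD的思路相同,只是特征来自视频特征网络。

```python
import numpy as np
from scipy import linalg

def frechet_distance(feat_real: np.ndarray, feat_fake: np.ndarray) -> float:
    """FID between two feature sets (e.g. Inception-v3 pool features, shape [N, 2048]).

    FID = ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^{1/2})
    """
    mu_r, mu_f = feat_real.mean(0), feat_fake.mean(0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_f = np.cov(feat_fake, rowvar=False)

    cov_mean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(cov_mean):          # numerical noise can give tiny imaginary parts
        cov_mean = cov_mean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * cov_mean))
```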
[1] | MILDENHALL B, SRINIVASAN P P, TANCIK M, et al.NeRF: representing scenes as neural radiance fields for view synthesis[C]//The 16th European Conference on Computer Vision. Cham:Springer, 2020: 405-421. |
[2] | KERBL B, KOPANAS G, LEIMKUEHLER T, et al.3D Gaussian splatting for real-time radiance field rendering[J]. ACM Transactions on Graphics, 2023, 42(4): 139. |
[3] | JOHANSSON R, WILLIAMS D, BERGLUND A, et al.Carsim: a system to visualize written road accident reports as animated 3D scenes[C]//The 2nd Workshop on Text Meaning and Interpretation. New York:ACL, 2004: 57-64. |
[4] | 宋振波.面向自动驾驶的视觉数据生成关键问题研究[D]. 南京:南京理工大学, 2022. |
SONG Z B.Research on visual data generation for autonomous driving[D]. Nanjing:Nanjing University of Science & Technology, 2022 (in Chinese). | |
[5] | 王稚儒, 常远, 鲁鹏, 等.神经辐射场加速算法综述[J]. 图学学报, 2024, 45(1): 1-13. |
WANG Z R, CHANG Y, LU P, et al.A review on neural radiance fields acceleration[J]. Journal of Graphics, 2024, 45(1): 1-13 (in Chinese). | |
[6] | 朱结, 宋滢.基于可微渲染的自由视点合成方法[J]. 图学学报, 2024, 45(5): 1030-1039. |
ZHU J, SONG Y.A free viewpoint synthesis method based on differentiable rendering[J]. Journal of Graphics, 2024, 45(5): 1030-1039 (in Chinese). | |
[7] | TANCIK M, SRINIVASAN P P, MILDENHALL B, et al.Fourier features let networks learn high frequency functions in low dimensional domains[C]//The 34th International Conference on Neural Information Processing Systems. Red Hook:Curran Associates Inc., 2020: 632. |
[8] | ZWICKER M, PFISTER H, VAN BAAR J, et al.EWA splatting[J]. IEEE Transactions on Visualization and Computer Graphics, 2002, 8(3): 223-238. |
[9] | XIE Z Y, ZHANG J G, LI W Y, et al.S-NeRF: neural radiance fields for street views[EB/OL]. [2024-11-27]. https://arxiv.org/abs/2303.00749. |
[10] | YANG J W, IVANOVIC B, LITANY O, et al.EmerNeRF: emergent spatial-temporal scene decomposition via self-supervision[EB/OL]. [2024-11-26]. https://arxiv.org/abs/2311.02077. |
[11] | OST J, MANNAN F, THUEREY N, et al.Neural scene graphs for dynamic scenes[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2021: 2855-2864. |
[12] | YU A, YE V, TANCIK M, et al.PixelNeRF: neural radiance fields from one or few Images[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2021: 4576-4585. |
[13] | TURKI H, ZHANG J Y, FERRONI F, et al.SUDS: scalable urban dynamic scenes[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2023: 12375-12385. |
[14] | REMATAS K, LIU A, SRINIVASAN P, et al.Urban radiance fields[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2022: 12922-12932. |
[15] | BARRON J T, MILDENHALL B, TANCIK M, et al.Mip-NeRF: a multiscale representation for anti-aliasing neural radiance fields[C]//2021 IEEE/CVF International Conference on Computer Vision. New York:IEEE Press, 2022: 5835-5844. |
[16] | CHEN Y R, GU C, JIANG J Z, et al.Periodic vibration Gaussian: dynamic urban scene reconstruction and real-time rendering[EB/OL]. [2024-03-20]. https://arxiv.org/abs/2311.18561. |
[17] | YAN Y Z, LIN H T, ZHOU C X, et al.Street Gaussians: modeling dynamic urban scenes with Gaussian splatting[C]// The 18th European Conference on Computer Vision. Cham:Springer, 2025: 156-173. |
[18] | ZHOU X Y, LIN Z W, SHAN X J, et al.DrivingGaussian: composite Gaussian splatting for surrounding dynamic autonomous driving scenes[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 21634-21643. |
[19] | ZHOU H Y, SHAO J H, XU L, et al.HUGS: holistic urban 3D scene understanding via Gaussian splatting[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 21336-21345. |
[20] | PENG C S, ZHANG C W, WANG Y X, et al.DeSiRe-GS:4D street Gaussians for static-dynamic decomposition and surface reconstruction for urban driving scenes[EB/OL]. [2024-11-18]. https://arxiv.org/abs/2411.11921. |
[21] | WEI Y, LIU S H, RAO Y M, et al.NerfingMVS: guided optimization of neural radiance fields for indoor multi-view stereo[C]//2021 IEEE/CVF International Conference on Computer Vision. New York:IEEE Press, 2021: 5590-5599. |
[22] | WANG G C, CHEN Z X, LOY C C, et al.SparseNeRF: distilling depth ranking for few-shot novel view synthesis[C]//2023 IEEE/CVF International Conference on Computer Vision. New York:IEEE Press, 2023: 9031-9042. |
[23] | DENG K L, LIU A, ZHU J Y, et al.Depth-supervised NeRF: fewer views and faster training for free[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2022: 12872-12881. |
[24] | PARK J, JOO K, HU Z, et al.Non-local spatial propagation network for depth completion[C]//The 16th European Conference on Computer Vision. Cham:Springer, 2020: 120-136. |
[25] | SUN S L, ZHUANG B B, JIANG Z Y, et al.LiDARF: delving into LiDAR for neural radiance field on street scenes[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 19563-19572. |
[26] | XU Q G, XU Z X, PHILIP J, et al.Point-NeRF: point-based neural radiance fields[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2022: 5428-5438. |
[27] | CHUNG J Y, OH J, LEE K M.Depth-regularized optimization for 3D Gaussian splatting in few-shot images[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 811-820. |
[28] | TURKULAINEN M, REN X Q, MELEKHOV L, et al.DN-splatter: depth and normal priors for Gaussian splatting and meshing[C]//2025 IEEE/CVF Winter Conference on Applications of Computer Vision. New York:IEEE Press, 2025: 2421-2431. |
[29] | KERBL B, MEULEMAN A, KOPANAS G, et al.A hierarchical 3D Gaussian representation for real-time rendering of very large datasets[J]. ACM Transactions on Graphics (TOG), 2024, 43(4): 62. |
[30] | LI J H, ZHANG J W, BAI X, et al.DNGaussian: optimizing sparse-view 3D Gaussian radiance fields with global-local depth normalization[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 20775-20785. |
[31] | XU W Z, GAO H H, SHEN S H, et al.MVPGS: excavating multi-view priors for Gaussian splatting from sparse input views[C]//The 18th European Conference on Computer Vision. Cham:Springer, 2024: 203-220. |
[32] | ZHU Z H, FAN Z W, JIANG Y F, et al.FSGS: real-time few-shot view synthesis using Gaussian splatting[C]//The 18th European Conference on Computer Vision. Cham:Springer, 2024: 145-163. |
[33] | YIN R H, YUGAY V, LI Y, et al.FewViewGS: Gaussian splatting with few view matching and multi-stage training[EB/OL]. [2024-11-05]. https://arxiv.org/abs/2411.02229. |
[34] | CHEN Y T, MIHAJLOVIC M, CHEN X Y, et al.SplatFormer:point transformer for robust 3D Gaussian splatting[EB/OL]. [2024-11-12]. https://arxiv.org/abs/2411.06390. |
[35] | HUANG N, WEI X B, ZHENG W Z, et al.S3Gaussian:self-supervised street Gaussians for autonomous driving[EB/OL]. [2024-11-27]. https://arxiv.org/abs/2405.20323. |
[36] | JIANG C J, GAO R L, SHAO K L, et al.LI-GS: Gaussian splatting with LiDAR incorporated for accurate large-scale reconstruction[J]. IEEE Robotics and Automation Letters, 2025, 10(2): 1864-1871. |
[37] | KUNG P C, ZHANG X L, SKINNER K A, et al.LiHi-GS: LiDAR-supervised Gaussian splatting for highway driving scene reconstruction[EB/OL]. [2024-12-26]. https://arxiv.org/abs/2412.15447. |
[38] | ZHANG K, RIEGLER G, SNAVELY N, et al.NeRF++: analyzing and improving neural radiance fields[EB/OL]. [2024-11-27]. http://arxiv.org/abs/2010.07492. |
[39] | BARRON J T, MILDENHALL B, VERBIN D, et al. Mip-NeRF 360: unbounded anti-aliased neural radiance fields[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2022: 5460-5469. |
[40] | TURKI H, RAMANAN D, SATYANARAYANAN M.Mega-NeRF: scalable construction of large-scale NeRFs for virtual fly-throughs[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2022: 12912-12921. |
[41] | TANCIK M, CASSER V, YAN X C, et al.Block-NeRF: scalable large scene neural view synthesis[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2022: 8238-8248. |
[42] | MI Z X, XU D.Switch-NeRF: learning scene decomposition with mixture of experts for large-scale neural radiance fields[EB/OL]. [2024-11-26]. https://openreview.net/forum?id=PQ2zoIZqvm. |
[43] | 董相涛, 马鑫, 潘成伟, 等.室外大场景神经辐射场综述[J]. 图学学报, 2024, 45(4): 631-649. |
DONG X T, MA X, PAN C W, et al.A review of neural radiance fields for outdoor large scenes[J]. Journal of Graphics, 2024, 45(4): 631-649 (in Chinese). | |
[44] | XIANGLI Y B, XU L N, PAN X G, et al.BungeeNeRF: progressive neural radiance field for extreme multi-scale scene rendering[C]//The 17th European Conference on Computer Vision. Cham:Springer, 2022: 106-122. |
[45] | XU L N, XIANGLI Y B, PENG S D, et al.Grid-guided neural radiance fields for large urban scenes[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2023: 8296-8306. |
[46] | LIN J Q, LI Z H, TANG X, et al.VastGaussian: vast 3D Gaussians for large scene reconstruction[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 5166-5175. |
[47] | CHEN J Y, YE W C, WANG Y F, et al.GigaGS:scaling up planar-based 3D Gaussians for large scene surface reconstruction[EB/OL]. [2024-09-10]. https://arxiv.org/abs/2409.06685. |
[48] | CHEN Y, LEE F H.DoGaussian:distributed-oriented Gaussian splatting for large-scale 3D reconstruction via Gaussian consensus[EB/OL]. [2024-11-26]. https://arxiv.org/abs/2405.13943. |
[49] | FAN J X, LI W H, HAN Y F, et al.Momentum-GS: momentum Gaussian self-distillation for high-quality large scene reconstruction[EB/OL]. [2024-12-06]. https://arxiv.org/abs/2412.04887. |
[50] | GAO Y Y, DAI Y L, LI H, et al.CoSurfGS:collaborative 3D surface Gaussian splatting with distributed learning for large scene reconstruction[EB/OL]. [2024-12-23]. https://arxiv.org/abs/2412.17612. |
[51] | MARTIN-BRUALLA R, RADWAN N, SAJJADI M S M, et al.NeRF in the wild: neural radiance fields for unconstrained photo collections[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2021: 7206-7215. |
[52] | CHEN D P, LI H, YE W C, et al.PGSR: planar-based Gaussian splatting for efficient and high-fidelity surface reconstruction[EB/OL]. [2024-06-10]. https://arxiv.org/abs/2406.06521. |
[53] | PUMAROLA A, CORONA E, PONS-MOLL G, et al.D-NeRF: neural radiance fields for dynamic scenes[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2021: 10313-10322. |
[54] | LI Z Q, NIKLAUS S, SNAVELY N, et al.Neural scene flow fields for space-time view synthesis of dynamic scenes[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2021: 6494-6504. |
[55] | GAO C, SARAF A, KOPF J, et al.Dynamic view synthesis from dynamic monocular video[C]//2021 IEEE/CVF International Conference on Computer Vision. New York:IEEE Press, 2021: 5692-5701. |
[56] | PARK K, SINHA U, BARRON J T, et al.NeRFIES: deformable neural radiance fields[C]//2021 IEEE/CVF International Conference on Computer Vision. New York:IEEE Press, 2021: 5845-5854. |
[57] | ZHANG J B, LI X Y, WAN Z Y, et al.FDNeRF: few-shot dynamic neural radiance fields for face reconstruction and expression editing[C]//SIGGRAPH Asia 2022 Conference Papers. New York:ACM, 2022: 12. |
[58] | ZHANG B Y, XU W B, ZHU Z, et al.Detachable novel views synthesis of dynamic scenes using distribution-driven neural radiance fields[EB/OL]. [2024-11-27]. https://arxiv.org/abs/2301.00411. |
[59] | TONDERSKI A, LINDSTRÖM C, HESS G, et al.NeuRAD: neural rendering for autonomous driving[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 14895-14904. |
[60] | YANG Z Y, GAO X Y, ZHOU W, et al.Deformable 3D Gaussians for high-fidelity monocular dynamic scene reconstruction[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 20331-20341. |
[61] | WU G J, YI T R, FANG J M, et al.4D Gaussian splatting for real-time dynamic scene rendering[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 20310-20320. |
[62] | KRATIMENOS A, LEI J H, DANIILIDIS K.DynMF: neural motion factorization for real-time dynamic view synthesis with 3D Gaussian splatting[C]//The 18th European Conference on Computer Vision. Cham:Springer, 2025: 252-269. |
[63] | YANG Z Y, YANG H Y, PAN Z J, et al.Real-time photorealistic dynamic scene representation and rendering with 4D Gaussian splatting[EB/OL]. [2024-02-22]. https://arxiv.org/abs/2310.10642. |
[64] | 曹振中, 光金正, 张千一, 等.基于3D高斯溅射的3维重建技术综述[J]. 机器人, 2024, 46(5): 611-622. |
CAO Z Z, GUANG J Z, ZHANG Q Y, et al.Survey of 3D reconstruction techniques based on 3D Gaussian splatting[J]. Robot, 2024, 46(5): 611-622 (in Chinese). | |
[65] | FANG J M, YI T R, WANG X G, et al.Fast dynamic radiance fields with time-aware neural voxels[C]//SIGGRAPH Asia 2022 Conference Papers. New York:ACM, 2022: 11. |
[66] | FRIDOVICH-KEIL S, MEANTI G, WARBURG F R, et al.K-planes: explicit radiance fields in space, time, and appearance[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2023: 12479-12488. |
[67] | CAO A, JOHNSON J.HexPlane: a fast representation for dynamic scenes[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2023: 130-141. |
[68] | GUO X, SUN J D, DAI Y C, et al.Forward flow for novel view synthesis of dynamic scenes[C]//2023 IEEE/CVF International Conference on Computer Vision. New York:IEEE Press, 2023: 15976-15987. |
[69] | WANG F, CHEN Z L, WANG G K, et al.Masked space-time hash encoding for efficient dynamic scene reconstruction[C]// The 37th International Conference on Neural Information Processing Systems. Red Hook:Curran Associates Inc., 2024: 3089. |
[70] | LU Z C, GUO X, HUI L, et al.3D Geometry-aware deformable Gaussian splatting for dynamic view synthesis[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 8900-8910. |
[71] | YU H, JULIN J, MILACSKI Z Á, et al.CoGS: controllable Gaussian splatting[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 21624-21633. |
[72] | MÜLLER T, EVANS A, SCHIED C, et al.Instant neural graphics primitives with a multiresolution hash encoding[J]. ACM Transactions on Graphics (TOG), 2022, 41(4): 102. |
[73] | FRIDOVICH-KEIL S, YU A, TANCIK M, et al.Plenoxels: radiance fields without neural networks[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2022: 5491-5500. |
[74] | TREVITHICK A, YANG B.GRF: learning a general radiance field for 3D representation and rendering[C]//2021 IEEE/CVF International Conference on Computer Vision. New York:IEEE Press, 2021: 15162-15172. |
[75] | SUN C, SUN M, CHEN H T.Direct voxel grid optimization: super-fast convergence for radiance fields reconstruction[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2022: 5449-5459. |
[76] | YU A, LI R L, TANCIK M, et al.PlenOctrees for real-time rendering of neural radiance fields[C]//2021 IEEE/CVF International Conference on Computer Vision. New York:IEEE Press, 2021: 5732-5741. |
[77] | REN K R, JIANG L H, LU T, et al.Octree-GS:towards consistent real-time rendering with LOD-structured 3D Gaussians[EB/OL]. [2024-10-17]. https://arxiv.org/abs/2403.17898. |
[78] | SWERDLOW A, XU R S, ZHOU B L.Street-view image generation from a bird’s-eye view layout[J]. IEEE Robotics and Automation Letters, 2024, 9(4): 3578-3585. |
[79] | YANG K R, MA E H, PENG J B, et al.BEVcontrol: accurately controlling street-view elements with multi-perspective consistency via BEV sketch layout[EB/OL]. [2024-11-27]. https://arxiv.org/abs/2308.01661. |
[80] | WANG X F, ZHU Z, HUANG G, et al.DriveDreamer: towards real-world-driven world models for autonomous driving[EB/OL]. [2024-11-27]. https://arxiv.org/abs/2309.09777. |
[81] | KIM S W, PHILION J, TORRALBA A, et al.DriveGAN: towards a controllable high-quality neural simulation[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2021: 5816-5825. |
[82] | LI X F, ZHANG Y F, YE X Q.DrivingDiffusion: layout-guided multi-view driving scenarios video generation with latent diffusion model[C]//The 18th European Conference on Computer Vision. Cham:Springer, 2025: 469-485. |
[83] | WEN Y Q, ZHAO Y C, LIU Y F, et al.Panacea: panoramic and controllable video generation for autonomous driving[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 6902-6912. |
[84] | HU A, RUSSELL L, YEO H, et al.GAIA-1:a generative world model for autonomous driving[EB/OL]. [2024-12-29]. https://arxiv.org/abs/2309.17080. |
[85] | JIA F, MAO W X, LIU Y F, et al.ADriver-I: a general world model for autonomous driving[EB/OL]. [2024-11-22]. https://arxiv.org/abs/2311.13549. |
[86] | BOGDOLL D, YANG Y T, JOSEPH T, et al.MUVO: a multimodal generative world model for autonomous driving with geometric representations[EB/OL]. [2024-11-26]. https://arxiv.org/abs/2311.11762. |
[87] | ZHENG W Z, CHEN W L, HUANG Y H, et al.OccWorld: learning a 3D occupancy world model for autonomous driving[C]//The 18th European Conference on Computer Vision. Cham:Springer, 2025: 55-72. |
[88] | WANG X F, ZHU Z, HUANG G, et al.WorldDreamer: towards general world models for video generation via predicting masked tokens[EB/OL]. [2024-11-18]. https://arxiv.org/abs/2401.09985. |
[89] | LI Q F, JIA X S, WANG S B, et al.Think2Drive:efficient reinforcement learning by thinking in latent world model for quasi-realistic autonomous driving (in CARLA-v2)[EB/OL]. [2024-07-20]. https://arxiv.org/abs/2402.16720. |
[90] | ZHENG W Z, SONG R Q, GUO X D, et al.GenAD: generative end-to-end autonomous driving[C]//The 18th European Conference on Computer Vision. Cham:Springer, 2024: 87-104. |
[91] | GUAN Y C, LIAO H C, LI Z N, et al.World models for autonomous driving: an initial survey[EB/OL]. (2024-05-08) [2024-11-26]. https://doi.org/10.1109/TIV.2024.3398357. |
[92] | YANG J Z, GAO S Y, QIU Y H, et al.Generalized predictive model for autonomous driving[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 14662-14672. |
[93] | YANG Z P, CHAI Y N, ANGUELOV D, et al.SurfelGAN: synthesizing realistic sensor data for autonomous driving[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2020: 11115-11124. |
[94] | LIU X Y, XUE H, LUO K M, et al.GenN2N: generative NeRF2NeRF translation[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York:IEEE Press, 2024: 5105-5114. |
[95] | HUANG Y Z, LI Z, CHEN Z, et al.OrientDream:streamlining text-to-3D generation with explicit orientation control[EB/OL]. [2024-06-14]. https://arxiv.org/abs/2406.10000. |
[96] | YAN J B, ZHAO A L, HU Y X.Dragen3D: multiview geometry consistent 3D Gaussian generation with drag-based control[EB/OL]. [2024-11-27]. https://arxiv.org/abs/2502.16475. |