
Journal of Graphics ›› 2025, Vol. 46 ›› Issue (2): 312-321.DOI: 10.11996/JG.j.2095-302X.2025020312

• Computer Graphics and Virtual Reality •

3D Gaussian splatting semantic segmentation and editing based on 2D feature distillation

LIU Gaoyi1, HU Ruizhen2, LIU Ligang1

  1. School of Mathematical Sciences, University of Science and Technology of China, Hefei Anhui 230026, China
    2. College of Computer Science & Software Engineering, Shenzhen University, Shenzhen Guangdong 518060, China
  • Received:2024-08-22 Accepted:2024-12-22 Online:2025-04-30 Published:2025-04-24
  • Contact: HU Ruizhen
  • About author: LIU Gaoyi (1998-), master student. His main research interest covers computer graphics. E-mail: liugaoyi@mail.ustc.edu.cn

  • Supported by:
    National Natural Science Foundation of China(62025207)

Abstract:

Semantic understanding of 3D scenes is one of the fundamental ways humans perceive the world, and semantic tasks such as open-vocabulary segmentation and semantic editing are essential research topics in computer vision and computer graphics. However, the absence of large and diverse 3D open-vocabulary segmentation datasets makes it challenging to directly train a robust and generalizable model. To address this issue, a 3D Gaussian splatting method based on 2D feature distillation was proposed, which distills semantic embeddings from the SAM and CLIP foundation models into 3D Gaussians. For each scene, pixel-wise semantic features were obtained via SAM and CLIP, and a scene-specific semantic feature field was trained through differentiable 3D Gaussian rendering. For the semantic segmentation task, a multi-step segmentation mask selection strategy was designed to obtain accurate segmentation boundaries for each object in the scene, yielding accurate open-vocabulary semantic segmentation of novel-view images without requiring tedious hierarchical feature extraction and training. Through the explicit 3D Gaussian scene representation, the correspondence between text and 3D objects was effectively established, enabling semantic editing. Experiments demonstrated that the method achieved qualitative and quantitative results comparable or superior to existing methods on semantic segmentation tasks, while additionally enabling open-vocabulary semantic editing through the 3D Gaussian semantic feature field.
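The two mechanisms the abstract names can be illustrated with a minimal sketch: fitting per-Gaussian semantic features so that their alpha-composited rendering reproduces a 2D feature map (the "distillation"), then selecting Gaussians whose feature matches a text embedding (the text-to-3D correspondence used for editing). This is an assumption-laden toy, not the paper's implementation: the compositing weights are taken as given (a real system obtains them from the differentiable rasterizer), the plain gradient loop stands in for the actual training, and the names `distill_features` and `select_by_text` are illustrative.

```python
import numpy as np

def distill_features(weights, target_feat, n_iter=300, lr=0.5):
    """Distill a 2D per-pixel feature map into per-Gaussian features.

    weights:     (P, G) alpha-compositing weight of Gaussian g at pixel p
                 (assumed precomputed by the rasterizer).
    target_feat: (P, D) per-pixel semantic features, e.g. from SAM/CLIP.
    Returns (G, D) per-Gaussian features minimizing the L2 rendering loss
    || weights @ f - target_feat ||^2 by plain gradient descent.
    """
    G, D = weights.shape[1], target_feat.shape[1]
    f = np.zeros((G, D))
    for _ in range(n_iter):
        residual = weights @ f - target_feat        # rendered minus target
        grad = weights.T @ residual / len(target_feat)
        f -= lr * grad
    return f

def select_by_text(gauss_feat, text_emb, thresh=0.8):
    """Return indices of Gaussians whose feature is cosine-similar to a
    text embedding (e.g. a CLIP-encoded query) above a threshold."""
    g = gauss_feat / (np.linalg.norm(gauss_feat, axis=1, keepdims=True) + 1e-8)
    t = text_emb / (np.linalg.norm(text_emb) + 1e-8)
    return np.where(g @ t > thresh)[0]
```

With the matched Gaussians in hand, editing amounts to transforming, recoloring, or deleting exactly those primitives — which is what makes the explicit Gaussian representation convenient compared with an implicit field.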

Key words: 3D scene, 3D Gaussian splatting, semantic segmentation, feature field, open vocabulary semantic editing
