EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting

Dong In Lee1, Hyeongcheol Park1, Jiyoung Seo1, Eunbyung Park2,
Hyunje Park1, Ha Dam Baek1, Shin Sangheon3, Sangmin Kim3, Sangpil Kim1


1Korea University 2Sungkyunkwan University 3Hanwha Systems

Overview

EditSplat Overview. EditSplat consists of two main components: (1) Multi-view Fusion Guidance (MFG), which aligns multi-view information with text prompts and source images to ensure multi-view consistency; and (2) Attention-Guided Trimming (AGT), which prunes pre-trained Gaussians for optimization efficiency and selectively optimizes Gaussians for semantically rich local editing.
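The page does not give MFG's exact formulation, but the overview above suggests it extends classifier-free guidance with a multi-view term. As an illustrative sketch (all names here, including `fused_guidance`, `eps_mv`, and `s_mv`, are hypothetical, not the paper's notation), one way to combine text guidance with guidance from fused multi-view renderings is:

```python
import numpy as np

def fused_guidance(eps_uncond, eps_text, eps_mv, s_text=7.5, s_mv=1.5):
    """Hypothetical multi-view-aware classifier-free guidance step.

    eps_uncond: unconditional noise prediction
    eps_text:   noise prediction conditioned on the text prompt
    eps_mv:     noise prediction conditioned on fused multi-view
                information (an assumption about how MFG enters the
                diffusion process; the paper may differ)
    Each guidance term pushes the prediction away from the
    unconditional direction, scaled by its own weight.
    """
    return (eps_uncond
            + s_text * (eps_text - eps_uncond)
            + s_mv * (eps_mv - eps_uncond))
```

With `s_mv = 0`, this reduces to standard classifier-free guidance, so the multi-view term acts as an additive correction rather than a replacement.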

Abstract

Recent advancements in 3D editing have highlighted the potential of text-driven methods in real-time, user-friendly AR/VR applications. However, current methods rely on 2D diffusion models without adequately considering multi-view information, resulting in multi-view inconsistency. While 3D Gaussian Splatting (3DGS) significantly improves rendering quality and speed, its 3D editing process suffers from inefficient optimization, as pre-trained Gaussians retain excessive source information. To address these limitations, we propose EditSplat, a novel 3D editing framework that integrates Multi-view Fusion Guidance (MFG) and Attention-Guided Trimming (AGT). Our MFG ensures multi-view consistency by incorporating essential multi-view information into the diffusion process, leveraging classifier-free guidance from the text-to-image diffusion model and the geometric properties of 3DGS. Additionally, our AGT leverages the explicit representation of 3DGS to selectively prune and optimize 3D Gaussians, enhancing optimization efficiency and enabling precise, semantically rich local edits. Through extensive qualitative and quantitative evaluations, EditSplat achieves superior multi-view consistency and editing quality over existing methods, significantly enhancing overall efficiency.
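The abstract describes AGT as selectively pruning pre-trained Gaussians so that only semantically relevant ones are optimized. A minimal sketch of such attention-guided trimming, assuming per-Gaussian attention scores have already been obtained (e.g. by back-projecting 2D cross-attention maps onto the Gaussians; the function name `trim_gaussians` and the `keep_ratio` parameter are illustrative, not the paper's API):

```python
import numpy as np

def trim_gaussians(attn_scores, keep_ratio=0.3):
    """Attention-guided trimming sketch.

    attn_scores: (N,) per-Gaussian attention scores (assumed
                 aggregated from 2D attention maps across views)
    keep_ratio:  fraction of Gaussians to retain for optimization

    Returns a boolean mask over the N Gaussians; only masked
    Gaussians would be updated during editing, the rest are frozen
    or pruned.
    """
    n_keep = max(1, int(len(attn_scores) * keep_ratio))
    # np.partition places the n_keep largest scores at the end,
    # so this picks the smallest score among the kept set.
    threshold = np.partition(attn_scores, -n_keep)[-n_keep]
    return attn_scores >= threshold
```

Freezing or discarding the low-attention Gaussians both shrinks the optimization problem and keeps edits local, which matches the efficiency and local-editing claims above.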

Rendered Video Results

EditSplat demonstrates its capability for flexible, high-quality 3D scene editing.


Source
"Turn the bear statue into wild boar"
"Turn the bear statue into polar bear"
"Turn the bear statue into a metallic robot"

Source
"Make it autumn"
"Turn the ground into a namibian desert"
"Make the entire scene look as if it's painted in watercolor style"

Source
"Change the bonsai to look like it's made of paper, folded intricate origami shapes"
"Change the bonsai flowers into autumn leaves"
"Make the bonsai snowy"

Source
"Turn him into Harry Potter"
"Make his face resemble that of a marble sculpture"
"Make him appear like paper with folded edges"

Source
"Turn him into Van Gogh"
"Turn him into Steve Jobs"
"Turn him into a Pixar character"

Source
"Make it autumn"
"Make the scene look foggy"
"Make the scene appear as though it's underwater"

Source
"Make him wear a suit"
"Turn him into a Minecraft character"
"Turn him into a robot"

Source
"Turn the horse statue into a jade carving"
"Turn the horse statue into a wooden carving"
"Make the stone horse appear as a metallic robot horse"


Sync settings are enabled, so videos might not play immediately 😔
Please refresh the page! Thank you for your patience 😊





Qualitative Comparison


📷 Click a video to play or pause. Move the mouse to change the hovered view 📷

Scene Editing Results Comparison Slider
Qualitative Comparison. EditSplat produces stronger and more precise edits than the other baselines. The leftmost column shows source images, while the columns to the right show images rendered from the edited 3DGS. In the corners of each image, we include different views of the same scene to compare multi-view consistency. Note that EditSplat outperforms the baselines on both local and global editing.

Quantitative Comparison

Quantitative Comparison. CLIPdir: CLIP text-image directional similarity; CLIPsim: CLIP text-image similarity. A user study was conducted to evaluate human preference.

BibTeX

@article{lee2024editsplat,
      title={EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting},
      author={Dong In Lee and Hyeongcheol Park and Jiyoung Seo and Eunbyung Park and Hyunje Park and Ha Dam Baek and Shin Sangheon and Sangmin Kim and Sangpil Kim},
      journal={arXiv preprint arXiv:2412.11520},
      year={2024},
}