
CATSplat takes a single image as input and predicts 3D Gaussian primitives that form a 3D radiance field representing the scene, all in a single
forward pass.
In this paradigm, our primary goal is to overcome the limited information contained in a single image by introducing two
novel priors.
Through cross-attention layers, we enrich the image features with two complementary sources of information:
contextual cues from text features and spatial cues from 3D point features.
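The feature-enhancement step can be illustrated with a minimal NumPy sketch of single-head scaled dot-product cross-attention. Image tokens act as queries, and text or 3D point tokens supply the keys and values. The function names (`cross_attend`, `enhance_features`), the residual update, the text-then-points ordering, the dimensions, and the random weights are all hypothetical choices for illustration, not CATSplat's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (N, d) image tokens; context: (M, d_c) text or 3D point tokens.
    Returns an (N, d) update for the image tokens.
    """
    q = queries @ w_q                          # (N, d_k)
    k = context @ w_k                          # (M, d_k)
    v = context @ w_v                          # (M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N, M) similarity of each image token to each context token
    return softmax(scores, axis=-1) @ v        # weighted sum of context values per image token

def enhance_features(img, text, points, params):
    # Hypothetical design: residual update with text cues, then with spatial cues
    img = img + cross_attend(img, text, *params["text"])
    img = img + cross_attend(img, points, *params["points"])
    return img

rng = np.random.default_rng(0)
d, d_t, d_p, d_k = 64, 32, 16, 32          # illustrative feature sizes (assumed)
img = rng.standard_normal((100, d))        # e.g. 10x10 grid of image patch tokens
text = rng.standard_normal((8, d_t))       # e.g. 8 text tokens
points = rng.standard_normal((50, d_p))    # e.g. 50 embedded 3D points

def make_weights(d_c):
    # Random projections standing in for learned weights (W_q, W_k, W_v)
    return (rng.standard_normal((d, d_k)) * 0.1,
            rng.standard_normal((d_c, d_k)) * 0.1,
            rng.standard_normal((d_c, d)) * 0.1)

params = {"text": make_weights(d_t), "points": make_weights(d_p)}
out = enhance_features(img, text, points, params)
print(out.shape)  # (100, 64)
```

Because the image tokens are the queries, the output keeps the image's spatial layout (one feature per patch), so the enhanced features can be fed to whatever downstream head predicts the Gaussian parameters.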