Efficient Trajectory-Conditioned Text-to-4D Gaussian Splatting
Lin Shao, Fan Lu, Haiyun Wei, Sanqing Qu, Alois Knoll, Guang Chen
AI summary
Problem
Existing trajectory-conditioned 4D generation methods either lack spatial controllability or suffer from prohibitively slow generation speeds due to their reliance on implicit NeRF representations.
Approach
The method decomposes complex trajectories into segments and optimizes a HexPlane-based deformation field only on sparse control nodes, using k-Nearest Neighbor interpolation to efficiently drive the full 4D Gaussian Splatting model.
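The interpolation step described above can be illustrated with a minimal sketch. It assumes inverse-distance weighting over the k nearest control nodes; the summary does not state the weighting scheme, so that choice is an assumption. The function name `knn_interpolate` and its parameters are hypothetical, not taken from the paper.

```python
import math


def knn_interpolate(points, nodes, node_offsets, k=3, eps=1e-8):
    """Blend per-node offsets onto each point from its k nearest control nodes.

    points, nodes, node_offsets: sequences of (x, y, z) tuples.
    Inverse-distance weighting is an assumed scheme, not confirmed by the source.
    """
    result = []
    for p in points:
        # Distance from this point to every control node, nearest first.
        nearest = sorted((math.dist(p, n), i) for i, n in enumerate(nodes))[:k]
        # Closer nodes get larger weights; eps avoids division by zero.
        weights = [1.0 / (d + eps) for d, _ in nearest]
        total = sum(weights)
        # Weighted average of the selected nodes' offsets, per axis.
        offset = tuple(
            sum(w * node_offsets[i][axis] for w, (_, i) in zip(weights, nearest)) / total
            for axis in range(3)
        )
        result.append(offset)
    return result
```

In this sketch only the offsets in `node_offsets` would come from the optimized deformation field. Every other Gaussian center in `points` receives its displacement from the blend, which is why the cost of the deformation query scales with the number of control nodes rather than the number of Gaussians.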
Key results
- 13× faster generation (26h to 2h)
- Superior dynamic quality and visual fidelity over prior SOTA
- Robust handling of long trajectories via chunk-based decomposition
- Enhanced multi-view consistency via temporal perturbation sampling
Why it matters
It enables rapid, controllable synthesis of dynamic 4D assets for VR, gaming, and simulation, removing the computational bottlenecks of prior implicit methods.
Abstract
Recent text-to-4D generation methods have achieved remarkable progress thanks to advances in text-to-video models. Existing approaches typically reconstruct 4D scenes from generated videos or distill them from pre-trained text-to-video models. However, these methods often restrict the scene to a local region or lack spatial controllability. TC4D pioneered trajectory-controllable 4D asset generation by decomposing motion into global transformation and local deformation. While it achieves high visual quality, TC4D suffers from extremely low generation efficiency due to its NeRF-based framework. To overcome this limitation, we propose Efficient TC4DGS, which replaces NeRF with 4D Gaussian Splatting (4DGS) to significantly improve efficiency. Nevertheless, the discrete representation of 4DGS makes optimization challenging, leading to noticeable degradation in visual and motion quality. Thus, we propose a HexPlane-based 4D representation combined with a key-node control scheme. By computing the deformation only for the control nodes and obtaining the overall deformation through interpolation, we greatly improve generation efficiency while maintaining quality. Compared with TC4D, the previous SOTA, we improve generation efficiency by 13× (reducing the generation time from 26 hours to 2 hours), while also achieving superior dynamic quality in the generated objects.
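The split of motion into a global transformation and a local deformation can be sketched as follows. This is a simplified illustration under stated assumptions: the trajectory is treated as piecewise linear between waypoints, and the global transformation is reduced to a translation (any rotation of the object along the path is omitted). The names `trajectory_position` and `compose_motion` are hypothetical and do not come from the paper.

```python
def trajectory_position(waypoints, t):
    """Piecewise-linear position along the waypoints for t in [0, 1].

    Assumes at least two waypoints; linear interpolation is an
    illustrative choice, not the paper's trajectory model.
    """
    t = min(max(t, 0.0), 1.0)
    scaled = t * (len(waypoints) - 1)
    i = min(int(scaled), len(waypoints) - 2)  # index of the active segment
    frac = scaled - i                         # progress within that segment
    a, b = waypoints[i], waypoints[i + 1]
    return tuple(pa + frac * (pb - pa) for pa, pb in zip(a, b))


def compose_motion(canonical, local_offsets, waypoints, t):
    """Apply local deformation first, then the global translation at time t."""
    tx, ty, tz = trajectory_position(waypoints, t)
    return [
        (x + dx + tx, y + dy + ty, z + dz + tz)
        for (x, y, z), (dx, dy, dz) in zip(canonical, local_offsets)
    ]
```

Keeping the two terms separate means the local offsets only have to describe motion in the object's own frame, while the user-specified trajectory alone determines where the object is in the scene.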