ICRA 2026

Efficient Trajectory-Conditioned Text-to-4D Gaussian Splatting

Lin Shao, Fan Lu, Haiyun Wei, Sanqing Qu, Alois Knoll, Guang Chen

AI summary

Efficient TC4DGS replaces NeRF with 4D Gaussian Splatting and sparse control nodes to generate trajectory-conditioned 4D objects 13× faster than TC4D with superior visual and motion quality.

Trajectory-conditioned generation, 4D Gaussian Splatting, Text-to-4D, HexPlane deformation, Sparse control nodes, Dynamic scene synthesis

Problem

Existing text-to-4D generation methods either restrict the scene to a local region and lack spatial controllability or, in the case of the trajectory-conditioned TC4D, generate prohibitively slowly because they rely on an implicit NeRF representation.

Approach

The method decomposes complex trajectories into segments and optimizes a HexPlane-based deformation field only on sparse control nodes, using k-Nearest Neighbor interpolation to efficiently drive the full 4D Gaussian Splatting model.
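The interpolation step described above can be illustrated with a minimal sketch. It assumes that each control node carries a displacement vector and that every Gaussian center blends the displacements of its k nearest nodes. The inverse-distance weighting, the function name `knn_interpolate`, and its parameters are assumptions made for illustration; the summary only states that k-Nearest Neighbor interpolation is used, not how the neighbors are weighted.

```python
import math


def knn_interpolate(points, nodes, node_offsets, k=3, eps=1e-8):
    """Spread per-node displacement vectors to every dense point.

    points       : list of (x, y, z) Gaussian centers (dense set)
    nodes        : list of (x, y, z) control-node positions (sparse set)
    node_offsets : list of (dx, dy, dz), one displacement per control node
    k            : number of nearest control nodes blended per point
    eps          : small constant that avoids division by zero
    """
    k = min(k, len(nodes))
    result = []
    for p in points:
        # Distance from this point to every control node, nearest first.
        nearest = sorted((math.dist(p, n), i) for i, n in enumerate(nodes))[:k]
        # Inverse-distance weights (an assumed choice), normalized to sum to 1.
        weights = [1.0 / (d + eps) for d, _ in nearest]
        total = sum(weights)
        offset = [0.0, 0.0, 0.0]
        for w, (_, i) in zip(weights, nearest):
            for axis in range(3):
                offset[axis] += (w / total) * node_offsets[i][axis]
        result.append(tuple(offset))
    return result
```

With two control nodes and k=2, a point midway between them receives the average of the two node displacements, while a point that coincides with a node follows that node almost exactly. The efficiency argument is that the deformation field is evaluated only at the sparse nodes, and the dense set is driven by this inexpensive blend.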

Key results

13× faster generation than TC4D (26 hours reduced to 2 hours)
Superior dynamic quality and visual fidelity over prior SOTA
Robust handling of long trajectories via chunk-based decomposition
Enhanced multi-view consistency via temporal perturbation sampling
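
The chunk-based decomposition of long trajectories listed above could look roughly like the following sketch. It assumes the trajectory is given as an ordered list of waypoints and is cut into fixed-size segments that share an endpoint, so the path stays continuous across segment boundaries. The function name `split_trajectory`, the `max_len` parameter, and the fixed-size splitting rule are hypothetical; the summary does not say how segment boundaries are chosen.

```python
def split_trajectory(waypoints, max_len):
    """Split waypoints into consecutive segments of at most max_len points.

    Neighboring segments share one waypoint, so the end of one segment
    is the start of the next and the overall path remains continuous.
    """
    if max_len < 2:
        raise ValueError("max_len must be at least 2")
    segments = []
    start = 0
    while start < len(waypoints) - 1:
        end = min(start + max_len, len(waypoints))
        segments.append(waypoints[start:end])
        start = end - 1  # reuse the last waypoint as the next segment's first
    return segments
```

Each segment can then be optimized on its own, which keeps the per-segment motion short regardless of the total trajectory length.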

Why it matters

It enables rapid, controllable synthesis of dynamic 4D assets for VR, gaming, and simulation, removing the computational bottlenecks of prior implicit methods.

Abstract

Recent text-to-4D generation methods have achieved remarkable progress thanks to advances in text-to-video models. Existing approaches typically reconstruct 4D scenes from generated videos or distill them from pre-trained text-to-video models. However, these methods often restrict the scene to a local region or lack spatial controllability. TC4D pioneered trajectory-controllable 4D asset generation by decomposing motion into global transformation and local deformation. While it achieves high visual quality, TC4D suffers from extremely low generation efficiency due to its NeRF-based framework. To overcome this limitation, we propose Efficient TC4DGS, which replaces NeRF with 4D Gaussian Splatting (4DGS) to significantly improve efficiency. Nevertheless, the discrete representation of 4DGS makes optimization challenging, leading to noticeable degradation in visual and motion quality. Thus, we propose a HexPlane-based 4D representation combined with a key-node control scheme. By computing the deformation only for the control nodes and getting overall deformation through interpolation, we greatly improve generation efficiency while maintaining quality. Compared with TC4D, the previous SOTA, we have improved the generation efficiency by 13× (reducing the generation time from 26 hours to 2 hours), while also achieving superior performance in terms of the dynamic quality of the generated objects.

Index terms

Simulation and Animation