Diverse Skill Discovery in Fourier Latent Space Via Unsupervised Learning
Ruopeng Cui, Yucong Sun, Xizhou Bu, Wang Chao, Wei Li
AI summary
Problem
Existing unsupervised skill discovery methods measure diversity using single-step states, which ignores trajectory phase coherence, disrupts motion smoothness, and limits the discovery of transitional behaviors.
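To make the single-step limitation concrete, the illustrative sketch below (not taken from the paper) compares a smooth periodic trajectory with a time-shuffled copy of it. Both visit exactly the same states, so any measure built only on the per-step state distribution cannot tell them apart. A spectral view immediately exposes the shuffled copy's high-frequency content. The 64 Hz sampling rate, the 8 Hz cutoff, and the helper name `high_band_fraction` are arbitrary assumptions.

```python
import numpy as np

DT = 1.0 / 64.0  # hypothetical 64 Hz control rate

def high_band_fraction(x: np.ndarray, dt: float = DT, cutoff_hz: float = 8.0) -> float:
    """Fraction of (mean-removed) spectral power above cutoff_hz."""
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(len(x), d=dt)
    return float(power[freqs > cutoff_hz].sum() / power.sum())

rng = np.random.default_rng(0)
t = np.arange(256) * DT                      # 4 s window
smooth = np.sin(2 * np.pi * 2.0 * t)         # 2 Hz periodic joint trajectory
shuffled = rng.permutation(smooth)           # same states, temporal order destroyed

# Single-step view: the sets of visited states are identical.
same_states = np.allclose(np.sort(smooth), np.sort(shuffled))

print(same_states)                           # True
print(high_band_fraction(smooth))            # close to 0
print(high_band_fraction(shuffled))          # most of the power lies above 8 Hz
```

A state-marginal diversity measure scores these two trajectories identically, although only one of them is a usable gait.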
Approach
FLSD employs a Periodic Autoencoder to map robot motion sequences into a Fourier latent space, using phase-aware features to measure diversity and guide a mutual-information-based reward for training a versatile locomotion policy.
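A Periodic Autoencoder compresses a motion window into a few latent curves and describes each curve by a frequency, amplitude, offset, and phase. The sketch below covers only that parameterization step and assumes the latent curves are already given. Frequency, amplitude, and offset are computed from the FFT in the style of the original PAE (DeepPhase). The phase, however, is approximated by the angle of the dominant FFT bin instead of a learned layer, so this is an approximation and not FLSD's implementation. The function name and input shapes are hypothetical.

```python
import numpy as np

def pae_phase_parameters(curves: np.ndarray, dt: float):
    """Per-channel frequency, amplitude, offset and phase of latent curves.

    curves: (channels, window) latent curves, assumed to come from a PAE encoder
    dt:     sampling period of the window in seconds
    """
    channels, n = curves.shape
    spectrum = np.fft.rfft(curves, axis=1)               # (channels, n // 2 + 1)
    freqs = np.fft.rfftfreq(n, d=dt)                     # bin frequencies in Hz
    power = np.abs(spectrum[:, 1:]) ** 2                 # exclude the DC bin
    total = np.maximum(power.sum(axis=1), 1e-12)

    frequency = (power * freqs[1:]).sum(axis=1) / total  # power-weighted mean frequency
    # Exact for a single sinusoid with a whole number of cycles in the window.
    amplitude = 2.0 * np.sqrt(power.sum(axis=1)) / n
    offset = spectrum[:, 0].real / n                     # DC term, i.e. the window mean
    # Stand-in for the learned phase layer: angle of the dominant bin, in cycles [0, 1).
    k = np.argmax(power, axis=1) + 1
    phase = (np.angle(spectrum[np.arange(channels), k]) / (2 * np.pi)) % 1.0
    return frequency, amplitude, offset, phase

# Usage: two synthetic 1 s latent curves sampled at 64 Hz.
dt = 1.0 / 64.0
t = np.arange(64) * dt
curves = np.stack([
    0.5 + 2.0 * np.cos(2 * np.pi * 4.0 * t),            # 4 Hz, amplitude 2, offset 0.5
    np.cos(2 * np.pi * 8.0 * t + np.pi / 2),            # 8 Hz, amplitude 1, phase 0.25
])
frequency, amplitude, offset, phase = pae_phase_parameters(curves, dt)
```

Because the phase is cyclic, a downstream feature vector would typically encode it as a point on the unit circle, for example (A sin 2πφ, A cos 2πφ), rather than as a raw number in [0, 1).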
Key results
- Reduces high-frequency power (motion jitter) by 73% compared to the baseline
- Increases state space coverage by 133% compared to the baseline
- Discovers varied gaits, including three-legged locomotion
- Enables reliable real-world task execution via a high-level controller that orchestrates the learned skills
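The first result compares high-frequency power between methods. The summary does not state the exact signal, window, or cutoff. The sketch below therefore assumes joint-position trajectories and a hypothetical 10 Hz cutoff, and it reports the percentage reduction relative to a baseline. The synthetic signals are illustrative only and do not reproduce the reported 73%.

```python
import numpy as np

def high_frequency_power(traj: np.ndarray, dt: float, cutoff_hz: float) -> float:
    """Spectral power above cutoff_hz, summed over all channels.

    traj: (timesteps, channels), e.g. joint positions (an assumption)
    """
    centered = traj - traj.mean(axis=0, keepdims=True)   # ignore the static offset
    power = np.abs(np.fft.rfft(centered, axis=0)) ** 2   # (bins, channels)
    freqs = np.fft.rfftfreq(traj.shape[0], d=dt)
    return float(power[freqs > cutoff_hz].sum())

def percent_reduction(baseline: float, ours: float) -> float:
    return 100.0 * (baseline - ours) / baseline

# Usage: a 2 Hz gait signal with 20 Hz jitter of different strength.
dt = 0.01
t = np.arange(100) * dt
gait = np.sin(2 * np.pi * 2.0 * t)
baseline_traj = (gait + 0.10 * np.sin(2 * np.pi * 20.0 * t))[:, None]
ours_traj = (gait + 0.05 * np.sin(2 * np.pi * 20.0 * t))[:, None]

hf_base = high_frequency_power(baseline_traj, dt, cutoff_hz=10.0)
hf_ours = high_frequency_power(ours_traj, dt, cutoff_hz=10.0)
reduction = percent_reduction(hf_base, hf_ours)  # about 75: halving jitter amplitude quarters its power
```

Power scales with the square of amplitude, so a 73% power reduction corresponds to roughly halving the jitter amplitude in this simple single-tone picture.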
Why it matters
It reduces the need for manual reward engineering and task-specific data, offering a scalable framework for autonomous acquisition of locomotion skills in complex robotic systems.
Abstract
Unsupervised skill discovery acquires a diverse repertoire of skills through intrinsic motivation, offering the potential to alleviate the labor-intensive reward engineering of reinforcement learning and the reliance on costly task-specific data in imitation learning. However, such methods typically measure diversity based on single-step states and neglect trajectory phase coherence, whose absence disrupts the smoothness of state transitions. In this work, we explore skills in Fourier latent space via a simple mutual-information-based reward function, aiming to train a single versatile policy capable of executing diverse state transition patterns. Specifically, we use a spatio-temporal representation learned by a Periodic Autoencoder, which effectively captures the periodic or quasi-periodic nature of motion. These features, rather than raw states, are used to measure skill diversity. We validate our method on the 12-DOF Unitree A1 quadruped robot, achieving varied gaits. Simulation results show that our method reduces high-frequency power by 73% while improving state space coverage by 133% compared to the baseline. To accomplish specific tasks, we train a high-level controller to orchestrate the learned skills, which improves training efficiency. Real-world experiments demonstrate that the learned skills can reliably execute tasks.
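The abstract does not spell out the reward. The following is therefore only a sketch of a common mutual-information objective in the style of DIAYN, r = log q(z | phi) - log p(z). It assumes discrete skills z drawn from a uniform prior and a discriminator q that reads Fourier-latent features phi instead of raw states. The linear discriminator and all names below are hypothetical.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def skill_reward(features: np.ndarray, skills: np.ndarray,
                 weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Intrinsic reward log q(z | phi) - log p(z) for a batch of transitions.

    features: (batch, d) phase-aware features, e.g. PAE frequency/amplitude/phase
    skills:   (batch,) integer skill indices z
    weights, bias: (d, K) and (K,) parameters of a hypothetical linear discriminator
    """
    num_skills = weights.shape[1]
    log_q = log_softmax(features @ weights + bias)         # (batch, K)
    log_q_z = log_q[np.arange(len(skills)), skills]       # log q(z | phi) of the active skill
    log_p_z = -np.log(num_skills)                          # uniform prior p(z) = 1 / K
    return log_q_z - log_p_z

# Usage: 4 skills whose features are perfectly separable by the discriminator.
K = 4
features = np.eye(K)                 # one toy feature vector per skill
skills = np.arange(K)
confident = skill_reward(features, skills, 10.0 * np.eye(K), np.zeros(K))
uninformed = skill_reward(features, skills, np.zeros((K, K)), np.zeros(K))
```

The reward approaches log K when the skill can be identified from its phase features and drops to 0 when it cannot. This is what pushes skills toward distinguishable motion patterns. In DIAYN-style training, the discriminator is updated jointly with the policy.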