CATALYST: Cognitive-To-Autonomy-Inspired Two-Stage Training Data Generation with Local-System-Aware Selection Technique
Taehoon Kim, Sehoon Oh
AI summary
Problem
Conventional learning-based robotic dynamics modeling relies on random or uniform data sampling, which often misses dynamically critical regions and degrades model performance and generalization.
Approach
The CATALYST framework first identifies optimal local model centers using CAD-derived inertia matrices M(q). It then optimizes excitation trajectories that visit these centers while enforcing range-of-motion (RoM) constraints and maximizing informative velocity–acceleration statistics.
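Stage 1 can be illustrated with a small, self-contained Python sketch. Every concrete choice in it is an assumption, not the paper's implementation. A point-mass model of a yaw–pitch–pitch arm (hypothetical link lengths `L2`, `L3`, masses `M2`, `M3`, and range `ROM`) stands in for the CAD-derived M(q). k-means on standardized [q, M(q)] samples stands in for the PLM/GMM fit; k-means is the hard-assignment limit of EM for a Gaussian mixture with shared isotropic covariances. The joint-angle part of each cluster center is returned as a candidate operating point. Function names such as `stage1_centers` are illustrative only.

```python
import math
import random

# Hypothetical point-mass stand-in for the CAD model (NOT the paper's values):
# link lengths in m, masses in kg, and a joint range of motion (RoM) in rad.
L2, L3 = 0.3, 0.25
M2, M3 = 1.0, 0.5
ROM = [(-math.pi, math.pi), (-math.pi / 2, math.pi / 2), (-2.5, 2.5)]


def mass_positions(q):
    """Point masses at the tips of links 2 and 3 of a yaw-pitch-pitch arm.
    The constant base height is omitted because it does not change the Jacobian."""
    q1, q2, q3 = q
    r2, z2 = L2 * math.cos(q2), L2 * math.sin(q2)
    r3, z3 = r2 + L3 * math.cos(q2 + q3), z2 + L3 * math.sin(q2 + q3)
    c, s = math.cos(q1), math.sin(q1)
    return [(r2 * c, r2 * s, z2), (r3 * c, r3 * s, z3)]


def inertia_matrix(q, h=1e-6):
    """M(q) = sum_i m_i J_i^T J_i for point masses, with each position
    Jacobian J_i approximated by central differences."""
    masses = (M2, M3)
    jac = [[None] * 3 for _ in masses]  # jac[i][j] = d p_i / d q_j
    for j in range(3):
        qp, qm = list(q), list(q)
        qp[j] += h
        qm[j] -= h
        pp, pm = mass_positions(qp), mass_positions(qm)
        for i in range(len(masses)):
            jac[i][j] = [(a - b) / (2 * h) for a, b in zip(pp[i], pm[i])]
    M = [[0.0] * 3 for _ in range(3)]
    for i, m in enumerate(masses):
        for a in range(3):
            for b in range(3):
                M[a][b] += m * sum(x * y for x, y in zip(jac[i][a], jac[i][b]))
    return M


def standardize(rows):
    """Z-score each column; (near-)constant columns keep unit scale."""
    cols = list(zip(*rows))
    mu = [sum(c) / len(c) for c in cols]
    sd = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, mu)]
    sd = [s if s > 1e-9 else 1.0 for s in sd]
    return [[(x - m) / s for x, m, s in zip(r, mu, sd)] for r in rows], mu, sd


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iterations (hard-assignment stand-in for a GMM/PLM fit)."""
    centers = [list(p) for p in random.Random(seed).sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[d.index(min(d))].append(p)
        for idx, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[idx] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def stage1_centers(k=4, n=600, seed=0):
    """Hypothetical Stage 1: cluster samples of [q, M(q)] drawn uniformly
    within ROM and return the joint-angle part of each center."""
    rng = random.Random(seed)
    feats = []
    for _ in range(n):
        q = [rng.uniform(lo, hi) for lo, hi in ROM]
        M = inertia_matrix(q)
        # M12 = M13 = 0 and M33 = M3 * L3**2 is constant for this point-mass arm,
        # so only M11, M22 and M23 carry information.
        feats.append(q + [M[0][0], M[1][1], M[1][2]])
    z, mu, sd = standardize(feats)
    centers = kmeans(z, k, seed=seed)
    # Undo the scaling on the three joint-angle coordinates.
    return [[c * s + m for c, s, m in zip(cen[:3], sd[:3], mu[:3])] for cen in centers]


if __name__ == "__main__":
    for center in stage1_centers():
        print([round(x, 3) for x in center])
```

Standardizing before clustering keeps the radian-valued joint angles from dominating the much smaller inertia entries. With a real CAD model, `inertia_matrix` would be replaced by the CAD-derived M(q), which also includes link rotational inertias.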
Key results
- Identifies optimal GMM cluster centers using CAD inertia priors
- Generates operating-point-centered excitation trajectories satisfying RoM and statistical constraints
- Achieves lower torque regression error than the Spread/RoM, ill-centered, Tukey-windowed chirp, and cubic baselines
- Delivers more reliable feedforward control in simulation on a 3-DoF yaw–pitch–pitch manipulator
Why it matters
It provides a principled, physics-informed pipeline for training data design that enhances sample efficiency and control accuracy for data-driven robotic dynamics modeling.
Abstract
In conventional learning-based robotic dynamics modeling, physical information is mostly incorporated into the model or loss function, while the design of training data often relies on random sampling or uniform coverage, which can limit performance. To address this gap, this paper proposes the Cognitive-to-Autonomy-inspired Two-stage trAining data generation with Local-sYstem-aware Selection Technique (CATALYST) framework, which generates optimal training data based on physics priors and the modeling structure of the chosen learning model. Stage 1 uses the CAD-derived inertia matrix $M(q)$ to approximate the joint distribution of $[q, M]$ with a Probabilistic Local Model (PLM), thereby identifying the optimal locations of the local model centers ($\mu_k^{\mathrm{opt}}$). Stage 2 then optimizes an Operating-Point-Centered Excitation Trajectory (OPCET). This optimization simultaneously (i) aligns the trajectory with the target operating points ($l_m$), (ii) enforces range-of-motion (RoM) constraints ($l_r$), and (iii) achieves desirable velocity–acceleration statistics (large volume, isotropy, and low correlation, captured by $l_s$). We validate the approach in simulation on a 3-DoF yaw–pitch–pitch manipulator, which allows visual demonstration of the process and outcomes, and then analyze the framework step by step. Results show that each stage meets its objective. A PLM trained on data generated by the proposed trajectories outperforms the baselines (Spread/RoM, ill-centered, Tukey-windowed chirp, and cubic) in both torque regression and control. Thus, CATALYST yields more accurate regression and more reliable feedforward control than conventional designs.
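The abstract describes the Stage 2 objective only qualitatively, so the Python sketch below fills in every detail by assumption. Each joint follows a finite Fourier series, a common excitation-trajectory parameterization that the abstract does not confirm as the paper's. The $l_m$ term is the mean squared distance from each target operating point to the closest trajectory sample, and $l_r$ is a squared penalty on RoM violations. The $l_s$ term is built from each joint's 2×2 velocity–acceleration covariance: negative log-determinant for volume, the gap between the arithmetic and geometric means of its eigenvalues for isotropy, and the squared Pearson correlation for correlation. The names `trajectory`, `va_stats_penalty`, and `opcet_loss` and the weights are hypothetical.

```python
import math

# Hypothetical OPCET-style objective: the parameterization, loss forms and
# weights are illustrative assumptions, not the paper's definitions.


def trajectory(coeffs, q0, wf, t):
    """Finite Fourier series per joint; coeffs[j] is a list of (a_l, b_l) pairs.
    Returns joint positions, velocities and accelerations at time t."""
    q, dq, ddq = [], [], []
    for j, pairs in enumerate(coeffs):
        qj, vj, aj = q0[j], 0.0, 0.0
        for l, (a, b) in enumerate(pairs, start=1):
            w = wf * l
            s, c = math.sin(w * t), math.cos(w * t)
            qj += a / w * s - b / w * c
            vj += a * c + b * s
            aj += -a * w * s + b * w * c
        q.append(qj)
        dq.append(vj)
        ddq.append(aj)
    return q, dq, ddq


def va_stats_penalty(v, a, eps=1e-12):
    """Penalty on one joint's 2x2 velocity-acceleration covariance:
    small volume, anisotropy and correlation all increase it."""
    n = len(v)
    mv, ma = sum(v) / n, sum(a) / n
    svv = sum((x - mv) ** 2 for x in v) / n
    saa = sum((y - ma) ** 2 for y in a) / n
    sva = sum((x - mv) * (y - ma) for x, y in zip(v, a)) / n
    logdet = math.log(max(svv * saa - sva ** 2, eps))
    volume = -0.5 * logdet  # lower for a larger covariance ellipse
    # log(arithmetic mean) - log(geometric mean) of the eigenvalues: >= 0, 0 iff isotropic
    isotropy = math.log(max((svv + saa) / 2, eps)) - 0.5 * logdet
    corr = sva ** 2 / max(svv * saa, eps)  # squared Pearson correlation
    return volume + isotropy + corr


def opcet_loss(coeffs, q0, wf, targets, rom, n=200, weights=(1.0, 10.0, 0.1)):
    """Weighted sum w_m*l_m + w_r*l_r + w_s*l_s over one period T = 2*pi/wf."""
    T = 2 * math.pi / wf
    samples = [trajectory(coeffs, q0, wf, T * i / n) for i in range(n)]
    qs = [s[0] for s in samples]
    # l_m: each target operating point should be passed closely by some sample
    l_m = sum(min(sum((x - m) ** 2 for x, m in zip(q, mu)) for q in qs)
              for mu in targets) / len(targets)
    # l_r: squared violation of the range-of-motion box
    l_r = sum(max(lo - x, 0.0) ** 2 + max(x - hi, 0.0) ** 2
              for q in qs for x, (lo, hi) in zip(q, rom)) / n
    # l_s: per-joint velocity-acceleration statistics
    l_s = sum(va_stats_penalty([s[1][j] for s in samples], [s[2][j] for s in samples])
              for j in range(len(q0)))
    wm, wr, ws = weights
    return wm * l_m + wr * l_r + ws * l_s, (l_m, l_r, l_s)


if __name__ == "__main__":
    coeffs = [[(0.5, 0.0), (0.2, 0.0)], [(0.3, 0.0)], [(0.1, 0.0)]]
    rom = [(-math.pi, math.pi), (-math.pi / 2, math.pi / 2), (-2.5, 2.5)]
    total, parts = opcet_loss(coeffs, [0.0, 0.0, 0.0], 2 * math.pi / 10,
                              [[0.0, 0.0, 0.0]], rom)
    print(total, parts)
```

The sketch stops at the objective. The Fourier coefficients could be flattened and passed to a general-purpose optimizer such as `scipy.optimize.minimize`. Velocity and acceleration are compared in raw units here; a practical version would likely rescale one of them before judging isotropy.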