OVITA: Open-Vocabulary Interpretable Trajectory Adaptations
Anurag Maurya, Tashmoy Ghosh, Anh Nguyen, Ravi Prakash
AI summary
Problem
Adapting robot trajectories to dynamic environments and user preferences remains difficult for non-experts due to the need for complex parameterization and rigid command structures. Existing methods often lack interpretability, require extensive training data, or fail to support precise, open-ended instructions.
Approach
OVITA leverages multiple pre-trained LLMs to translate open-vocabulary natural language instructions into executable Python code that modifies trajectory waypoints, followed by a quadratic programming module to ensure safety and smoothness. An integrated code explainer and iterative feedback loop allow non-expert users to understand and refine adaptations intuitively.
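To make the "code as an adaptation policy" idea concrete, here is a minimal sketch of the kind of waypoint-editing function an LLM might emit for an instruction such as "go 10 cm higher in the middle of the path". The waypoint layout `(x, y, z, speed)`, the function name `raise_middle_section`, and its parameters are illustrative assumptions, not OVITA's actual interface or prompt output.

```python
from typing import List, Tuple

# Assumed waypoint layout for illustration only: (x, y, z, speed).
Waypoint = Tuple[float, float, float, float]


def raise_middle_section(
    trajectory: List[Waypoint],
    delta_z: float = 0.10,
    start_frac: float = 0.25,
    end_frac: float = 0.75,
) -> List[Waypoint]:
    """Hypothetical adaptation: lift the middle portion of the path by delta_z metres."""
    n = len(trajectory)
    lo, hi = int(start_frac * n), int(end_frac * n)
    adapted = []
    for i, (x, y, z, v) in enumerate(trajectory):
        if lo <= i < hi:
            z += delta_z  # only the selected waypoints are modified
        adapted.append((x, y, z, v))
    return adapted


# A straight-line trajectory at constant height and speed.
trajectory = [(0.1 * i, 0.0, 0.5, 0.2) for i in range(8)]
adapted = raise_middle_section(trajectory)
```

Because the adaptation is ordinary code over individual waypoints, a user (or a second model acting as an explainer) can read exactly which indices change and by how much.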
Key results
- Supports exact numerical, open-ended, and multi-step language commands without fine-tuning
- Generates executable Python code as an interpretable adaptation policy
- Validated across diverse tasks on heterogeneous platforms including manipulators, ground robots, and drones
- Integrates a QP-based constraint module to guarantee physically feasible and smooth trajectories
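The role of a quadratic-programming post-processing step can be illustrated with a deliberately simplified sketch. It assumes a single coordinate, a smoothness-only objective, and endpoints pinned to their adapted values; the safety constraints mentioned above are not modelled. Under these assumptions the QP reduces to a tridiagonal linear system, solved here with the Thomas algorithm rather than a general QP solver. The function name `smooth_1d` and the weight `lam` are hypothetical.

```python
from typing import List


def smooth_1d(y: List[float], lam: float = 5.0) -> List[float]:
    """Minimise sum_i (x_i - y_i)^2 + lam * sum_i (x_{i+1} - x_i)^2
    subject to x_0 = y_0 and x_{n-1} = y_{n-1} (endpoints stay fixed)."""
    n = len(y)
    if n <= 2:
        return list(y)
    m = n - 2  # number of interior (free) waypoints
    # Optimality condition for each interior point:
    #   (1 + 2*lam) * x_i - lam * x_{i-1} - lam * x_{i+1} = y_i
    sub = [-lam] * m
    diag = [1.0 + 2.0 * lam] * m
    sup = [-lam] * m
    rhs = [y[i + 1] for i in range(m)]
    rhs[0] += lam * y[0]     # fixed left endpoint moves to the right-hand side
    rhs[-1] += lam * y[-1]   # fixed right endpoint moves to the right-hand side
    # Thomas algorithm: forward elimination ...
    for i in range(1, m):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    # ... then back substitution.
    x = [0.0] * m
    x[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        x[i] = (rhs[i] - sup[i] * x[i + 1]) / diag[i]
    return [y[0]] + x + [y[-1]]


# A step-like height profile, as might result from an abrupt waypoint edit.
z_adapted = [0.5, 0.5, 0.6, 0.6, 0.5, 0.5]
z_smooth = smooth_1d(z_adapted)
```

A larger `lam` trades fidelity to the adapted waypoints for smoother transitions; a full implementation would add inequality constraints (for example, workspace or velocity limits) and hand the problem to a QP solver.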
Why it matters
Bridges the gap between high-level human intent and low-level robotic control, enabling scalable, interpretable, and real-time trajectory adaptation for non-expert users in dynamic environments.
Abstract
Adapting trajectories to dynamic situations and user preferences is crucial for robot operation in unstructured environments with non-expert users. Natural language enables users to express these adjustments in an interactive manner. We introduce OVITA, an interpretable, open-vocabulary, language-driven framework designed for adapting robot trajectories in dynamic and novel situations based on human instructions. OVITA leverages multiple pre-trained Large Language Models (LLMs) to integrate user commands into trajectories generated by motion planners or those learned through demonstrations. OVITA employs code as an adaptation policy generated by an LLM, enabling users to adjust individual waypoints, thus providing flexible control. Another LLM, which acts as a code explainer, removes the need for expert users, enabling intuitive interactions. The efficacy and significance of the proposed OVITA framework are demonstrated through extensive simulations and real-world environments with diverse tasks involving spatiotemporal variations on heterogeneous robotic platforms such as a KUKA IIWA robot manipulator, Clearpath Jackal ground robot, and CrazyFlie drone.