TeNet: Text-To-Network for Compact Policy Synthesis
Ariyan Bighashdel, Kevin Sebastian Luck
AI summary
Problem
Current language-conditioned robotic controllers are either too large and slow for real-time deployment or require demonstration prompts at inference. This creates a gap between expressive language interfaces and efficient, deployable low-level control.
Approach
TeNet conditions a hypernetwork on pretrained text embeddings to instantly generate the parameters of a task-specific policy, optionally aligning text with trajectory data during training to improve generalization.
Key results
- Achieves strong performance on MuJoCo and Meta-World benchmarks
- Generates policies with ~40K parameters, orders of magnitude smaller than baselines
- Enables high-frequency control (>9 kHz) without requiring demonstrations at inference
- Language-grounded training improves generalization to unseen tasks
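To put the size and speed figures in perspective, the following sketch counts the parameters of a small fully connected policy and converts a control frequency into a per-step time budget. The layer sizes are hypothetical, chosen only to show that a network of roughly 40K parameters is a small MLP; they are not taken from the paper, and the helper names are illustrative.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix plus bias vector
    return total


def step_budget_us(control_hz):
    """Time available per control step, in microseconds."""
    return 1e6 / control_hz


# Hypothetical sizes (assumption): 39-d state, two hidden layers of 180, 4-d action.
example_layers = [39, 180, 180, 4]
n_params = mlp_param_count(example_layers)  # 40504
budget = step_budget_us(9000)               # about 111 microseconds per step
print(n_params, round(budget, 2))
```

Under these assumed sizes, a 9 kHz control loop leaves about 111 microseconds per policy evaluation, which is plausible for a network this small but would be hard to meet with a large sequence model.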
Why it matters
Provides a practical, deployable framework for real-time robot control that bridges expressive natural language interfaces with efficient low-level execution.
Abstract
Robots that follow natural-language instructions typically rely on either high-level planners with hand-designed interfaces or large end-to-end models that are difficult to deploy for real-time control. We propose TeNet (Text-to-Network), a framework that instantiates compact, task-specific policies directly from natural language. TeNet conditions a hypernetwork on embeddings from a pretrained language model to generate a fully executable policy, which operates solely on low-dimensional state inputs at high control frequencies. By using language only once at policy instantiation, TeNet combines the expressiveness of large language models with efficient execution. To improve generalization, we optionally ground language in behavior during training, without requiring demonstrations at inference. Experiments on MuJoCo and Meta-World show that TeNet produces policies that are orders of magnitude smaller than sequence-based baselines, while achieving strong performance in both multi-task and meta-learning settings and enabling high-frequency control. These results demonstrate that text-conditioned hypernetworks provide a practical approach for compact, language-driven robot control.
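The text-to-network idea can be illustrated with a minimal NumPy sketch: a hypernetwork maps a text embedding to a flat parameter vector, which is reshaped into the weights of a small state-to-action policy. This is a sketch under stated assumptions, not the authors' implementation. The class and function names, all dimensions, the two-layer architectures, and the random vector standing in for a pretrained language-model embedding are hypothetical, and no training is shown.

```python
import numpy as np


def policy_param_shapes(state_dim, hidden_dim, action_dim):
    """Shapes of a one-hidden-layer policy: W1, b1, W2, b2."""
    return [(state_dim, hidden_dim), (hidden_dim,),
            (hidden_dim, action_dim), (action_dim,)]


class HyperNetwork:
    """Maps a text embedding to the parameters of a target policy (untrained)."""

    def __init__(self, embed_dim, shapes, hyper_hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.shapes = shapes
        self.n_params = sum(int(np.prod(s)) for s in shapes)
        self.W1 = rng.normal(0.0, 0.1, (embed_dim, hyper_hidden))
        self.b1 = np.zeros(hyper_hidden)
        self.W2 = rng.normal(0.0, 0.1, (hyper_hidden, self.n_params))
        self.b2 = np.zeros(self.n_params)

    def generate(self, text_embedding):
        """Produce the policy parameters as a list of arrays."""
        h = np.tanh(text_embedding @ self.W1 + self.b1)
        flat = h @ self.W2 + self.b2
        params, start = [], 0
        for shape in self.shapes:
            size = int(np.prod(shape))
            params.append(flat[start:start + size].reshape(shape))
            start += size
        return params


def policy_forward(params, state):
    """Run the generated policy on a low-dimensional state; no text is needed here."""
    W1, b1, W2, b2 = params
    h = np.tanh(state @ W1 + b1)
    return np.tanh(h @ W2 + b2)  # actions squashed to [-1, 1]


# Hypothetical sizes, for illustration only.
embed_dim, state_dim, hidden_dim, action_dim = 16, 8, 32, 2
shapes = policy_param_shapes(state_dim, hidden_dim, action_dim)
hyper = HyperNetwork(embed_dim, shapes)

# Stand-in for an embedding from a pretrained language model (assumption).
text_embedding = np.random.default_rng(1).normal(size=embed_dim)

params = hyper.generate(text_embedding)               # language is used once, here
action = policy_forward(params, np.zeros(state_dim))  # control loop uses state only
print(hyper.n_params, action.shape)
```

The design point the sketch captures is the split between instantiation and execution: the language-dependent computation happens once in `generate`, and every subsequent control step evaluates only the small generated policy.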