ICRA 2026

TeNet: Text-To-Network for Compact Policy Synthesis

Ariyan Bighashdel, Kevin Sebastian Luck

AI summary

TeNet directly generates compact, high-frequency robot policies from natural language using text-conditioned hypernetworks, outperforming sequence-based baselines while being orders of magnitude smaller.
Text-to-Network, Hypernetworks, Robot Control, Language Conditioning, Policy Synthesis, Real-time Control

Problem

Current language-conditioned robotic controllers are either too large and slow for real-time deployment or require demonstration prompts at inference. This creates a gap between expressive language interfaces and efficient, deployable low-level control.

Approach

TeNet conditions a hypernetwork on pretrained text embeddings to instantly generate the parameters of a task-specific policy, optionally aligning text with trajectory data during training to improve generalization.
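The idea of a text-conditioned hypernetwork can be illustrated with a minimal sketch. It is not the authors' implementation: the linear hypernetwork, the one-hidden-layer policy, all dimensions, and the names `HyperNetwork`, `policy_act`, and `POLICY_SHAPES` are assumptions made for illustration, and the text embedding is a placeholder vector standing in for the output of a pretrained language model. The optional trajectory-alignment step used during training is not shown.

```python
import math
import random

# Hypothetical sizes chosen for illustration; not taken from the paper.
EMBED_DIM = 8     # size of the text embedding
STATE_DIM = 4     # low-dimensional robot state
HIDDEN_DIM = 16   # width of the generated policy's hidden layer
ACTION_DIM = 2    # action dimension

# (name, rows, cols) for every parameter tensor of the generated policy.
POLICY_SHAPES = [
    ("W1", HIDDEN_DIM, STATE_DIM),
    ("b1", HIDDEN_DIM, 1),
    ("W2", ACTION_DIM, HIDDEN_DIM),
    ("b2", ACTION_DIM, 1),
]
NUM_POLICY_PARAMS = sum(rows * cols for _, rows, cols in POLICY_SHAPES)


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class HyperNetwork:
    """Assumed linear map: text embedding -> all policy parameters."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(EMBED_DIM)
        self.weights = [
            [rng.gauss(0.0, scale) for _ in range(EMBED_DIM)]
            for _ in range(NUM_POLICY_PARAMS)
        ]

    def generate(self, text_embedding):
        """Produce a flat parameter vector and split it into named tensors."""
        flat = matvec(self.weights, text_embedding)
        params, offset = {}, 0
        for name, rows, cols in POLICY_SHAPES:
            block = flat[offset:offset + rows * cols]
            params[name] = [block[r * cols:(r + 1) * cols] for r in range(rows)]
            offset += rows * cols
        return params


def policy_act(params, state):
    """Run the generated policy on a state; no language input is needed here."""
    pre_hidden = matvec(params["W1"], state)
    hidden = [math.tanh(h + b[0]) for h, b in zip(pre_hidden, params["b1"])]
    pre_action = matvec(params["W2"], hidden)
    return [math.tanh(a + b[0]) for a, b in zip(pre_action, params["b2"])]


hyper = HyperNetwork(seed=0)
text_embedding = [0.1] * EMBED_DIM        # placeholder for a real text encoder output
params = hyper.generate(text_embedding)   # language is consumed once, here
action = policy_act(params, [0.0, 0.5, -0.5, 1.0])
```

In this sketch the language embedding is used only in `generate`; every later control step calls `policy_act` on the state alone, which is what keeps the per-step cost independent of the size of the language model.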

Key results

  • Achieves strong performance on MuJoCo and Meta-World benchmarks
  • Generates policies with ~40K parameters, orders of magnitude smaller than baselines
  • Enables high-frequency control (>9 kHz) without requiring demonstrations at inference
  • Language-grounded training improves generalization to unseen tasks

Why it matters

TeNet provides a practical, deployable framework for real-time robot control that bridges expressive natural-language interfaces and efficient low-level execution.

Abstract

Robots that follow natural-language instructions typically rely on either high-level planners with hand-designed interfaces or large end-to-end models that are difficult to deploy for real-time control. We propose TeNet (Text-to-Network), a framework that instantiates compact, task-specific policies directly from natural language. TeNet conditions a hypernetwork on embeddings from a pretrained language model to generate a fully executable policy, which operates solely on low-dimensional state inputs at high control frequencies. By using language only once at policy instantiation, TeNet combines the expressiveness of large language models with efficient execution. To improve generalization, we optionally ground language in behavior during training, without requiring demonstrations at inference. Experiments on MuJoCo and Meta-World show that TeNet produces policies that are orders of magnitude smaller than sequence-based baselines, while achieving strong performance in both multi-task and meta-learning settings and enabling high-frequency control. These results demonstrate that text-conditioned hypernetworks provide a practical approach for compact, language-driven robot control.

Index terms

Machine Learning for Robot Control, Imitation Learning, AI-Based Methods
