ICRA 2026

Prompt-To-State Stable Vision-Language MPC for Approximated Neural Network Dynamics: A Case Study on Soft Robot Control

Nicotra Emanuele, James J. Davies, Kefan Zhu, Sharma Bibhu, Adrienne Ji, Phuoc Thien Phan, Hung Manh La, Nigel Hamilton Lovell, Thanh Nho Do

AI summary

The proposed PSS-VLMPC framework guarantees closed-loop stability for vision-language model-guided control by rigorously bounding approximation errors and scaling terminal costs, enabling safe natural language robot control.

Vision-Language Models, Model Predictive Control, Prompt-to-State Stability, Neural Network Dynamics, Soft Robotics, Closed-Loop Stability

Problem

Deploying vision-language models in closed-loop robotic control lacks formal safety and stability guarantees, particularly when system dynamics are approximated by neural networks that introduce compounding prediction errors in model predictive control.

Approach

The authors introduce a two-loop architecture where a vision-language model translates natural language and visual feedback into MPC parameters, while a lower-level MPC uses a Taylor-expanded neural network dynamics model with a rigorously computed terminal cost weight to guarantee stability despite approximation errors.
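As a rough illustration of what "Taylor-expanded neural network dynamics" can mean, the sketch below linearizes a small network around an operating point so that a local model of the form A x + B u + c is available to a fast solver. It is a minimal sketch under stated assumptions, not the authors' implementation: the first-order expansion, the single-hidden-layer tanh network, the dimensions, and the random weights standing in for a trained model are all hypothetical, as are the names `f_nn`, `linearize`, and `f_lin`.

```python
import numpy as np

rng = np.random.default_rng(0)
NX, NU, NH = 3, 2, 16  # hypothetical state, input, and hidden sizes

# Random weights stand in for a trained dynamics network (assumption).
W1 = rng.normal(scale=0.3, size=(NH, NX + NU))
b1 = rng.normal(scale=0.1, size=NH)
W2 = rng.normal(scale=0.3, size=(NX, NH))
b2 = rng.normal(scale=0.1, size=NX)


def f_nn(x, u):
    """One-step prediction x_next = f(x, u) from a single-hidden-layer tanh MLP."""
    z = np.concatenate([x, u])
    return W2 @ np.tanh(W1 @ z + b1) + b2


def linearize(x0, u0):
    """Return (A, B, c) such that f(x, u) ~= A x + B u + c near (x0, u0)."""
    z0 = np.concatenate([x0, u0])
    h = np.tanh(W1 @ z0 + b1)
    # Chain rule through tanh: d tanh(a)/da = 1 - tanh(a)^2.
    J = W2 @ ((1.0 - h**2)[:, None] * W1)  # Jacobian, shape (NX, NX + NU)
    A, B = J[:, :NX], J[:, NX:]
    c = f_nn(x0, u0) - A @ x0 - B @ u0     # affine offset so the models agree at (x0, u0)
    return A, B, c


def f_lin(x, u, A, B, c):
    """First-order Taylor model of f_nn around the linearization point."""
    return A @ x + B @ u + c


x0 = np.array([0.1, -0.2, 0.05])
u0 = np.array([0.3, -0.1])
A, B, c = linearize(x0, u0)

# The approximation error grows with distance from the operating point.
for delta in (1e-2, 1e-1):
    x = x0 + delta
    err = np.linalg.norm(f_nn(x, u0) - f_lin(x, u0, A, B, c))
    print(f"delta={delta:g}  linearization error={err:.2e}")
```

The gap between `f_nn` and `f_lin` is the kind of approximation error that a stability analysis has to bound. How the paper bounds it is not reproduced in this summary.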

Key results

Formal definition of Prompt-to-State Stability guaranteeing closed-loop stability under arbitrary prompts
Derivation of a computable terminal cost weight ensuring Input-to-State Stability despite neural network approximation errors
Development of a two-loop PSS-VLMPC framework that safely translates natural language commands into MPC parameters
Validation via simulation and real-world experiments on a soft continuum robot executing language-specified tasks
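For readers unfamiliar with the term, the standard discrete-time definition of Input-to-State Stability is given here as background. The paper's own Prompt-to-State Stability definition is not reproduced in this summary, so the symbols below are generic (state x_k, bounded disturbance w_k) and any mapping of prompts or approximation errors onto w is an assumption left to the paper.

```latex
% Discrete-time ISS for x_{k+1} = f(x_k, w_k):
% there exist \beta \in \mathcal{KL} and \gamma \in \mathcal{K} such that,
% for every initial state x_0 and every bounded disturbance sequence w,
\[
  \lVert x_k \rVert \;\le\; \beta\bigl(\lVert x_0 \rVert,\, k\bigr)
  \;+\; \gamma\Bigl(\sup_{0 \le j < k} \lVert w_j \rVert\Bigr)
  \qquad \text{for all } k \ge 0 .
\]
```

The first term decays with time, so the effect of the initial condition vanishes. The second term says the state ultimately stays in a ball whose size depends only on the largest disturbance.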

Why it matters

Enables safe, stable, and interpretable integration of large vision-language models into real-time robotic control loops for complex, hard-to-model systems.

Abstract

The integration of large-scale foundation models into control loops has proven effective for executing complex tasks from natural language inputs. However, ensuring stability and real-time performance remains a significant challenge when such models are used, especially for systems with hard-to-model dynamics. In this paper, we introduce the concept of Prompt-to-State Stability (PSS) and present Prompt-to-State Stable Vision-Language Model Predictive Control (PSS-VLMPC), a novel framework that integrates a VLM with a robust MPC. We use the VLM to interpret user commands and visual feedback, translating them into parameters for the MPC that controls the system. The system’s dynamics are learned entirely by a neural network and approximated so that the MPC can run in real time. Starting from the prediction error bound, we provide rigorous stability guarantees for the closed-loop system, provided the environment dynamics do not exceed the VLM update rate. The effectiveness of PSS-VLMPC is validated through simulations and real-world experiments on a soft continuum robot, demonstrating its capability to execute tasks from natural language inputs.
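The two update rates described in the abstract can be pictured with a toy schedule: a slow outer loop refreshes the controller parameters, and a fast inner loop applies control at every step. The sketch below only illustrates that scheduling. Everything in it is hypothetical: `vlm_stub` is a lookup table rather than a vision-language model, `controller_stub` is a saturated proportional step rather than an MPC solve, the plant is a toy integrator, and `VLM_PERIOD` is an arbitrary choice.

```python
import numpy as np

VLM_PERIOD = 50  # hypothetical: number of inner-loop steps per outer-loop update


def vlm_stub(prompt, x):
    """Stand-in for the VLM: maps a prompt to controller parameters (a target state)."""
    targets = {
        "move right": np.array([1.0, 0.0]),
        "move up": np.array([0.0, 1.0]),
    }
    return {"target": targets[prompt]}


def controller_stub(x, params, gain=0.2, u_max=0.1):
    """Stand-in for the MPC solve: a saturated proportional step toward the target."""
    return np.clip(gain * (params["target"] - x), -u_max, u_max)


def run(prompts, steps_per_prompt=100):
    """Run the two-rate loop and return the state trajectory and the outer-loop call count."""
    x = np.zeros(2)
    params, vlm_calls, traj = None, 0, [x.copy()]
    for prompt in prompts:
        for k in range(steps_per_prompt):
            if k % VLM_PERIOD == 0:         # slow outer loop: refresh parameters
                params = vlm_stub(prompt, x)
                vlm_calls += 1
            u = controller_stub(x, params)  # fast inner loop: runs every step
            x = x + u                       # toy integrator plant (assumption)
            traj.append(x.copy())
    return np.array(traj), vlm_calls


traj, vlm_calls = run(["move right", "move up"])
print("outer-loop calls:", vlm_calls, "final state:", traj[-1])
```

Between outer-loop updates the inner loop tracks a fixed target, which is why the abstract's condition on the environment changing no faster than the VLM update rate matters for any guarantee built on this structure.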

Index terms

Modeling, Control, and Learning for Soft Robots; Soft Robot Applications; Machine Learning for Robot Control