ICRA 2026

Bridging Perception and Planning: Towards End-To-End Planning for Signal Temporal Logic Tasks

Bowen Ye, Junyue Huang, Yang Liu, Xiaozhen Qiao, Xiang Yin

AI summary

S-MSP successfully maps raw multi-view camera images and Signal Temporal Logic specifications directly to feasible robot trajectories, outperforming single-expert baselines in logical satisfaction and planning efficiency.

Signal Temporal Logic; End-to-End Planning; Mixture-of-Experts; Robotics; Vision-Based Control; Trajectory Synthesis

Problem

Current Signal Temporal Logic (STL) planning methods rely on pre-defined maps or structured abstractions and keep perception decoupled from planning, which makes them brittle in unstructured real-world environments.

Approach

S-MSP is a differentiable end-to-end transformer that ingests synchronized multi-view camera images and an STL specification to directly output a feasible trajectory, using a structure-aware Mixture-of-Experts to route temporal sub-tasks to specialized experts.
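To make the routing idea concrete, here is a minimal sketch of sending temporally bounded sub-tasks to experts. It is not the paper's router: the `SubTask` type, the fixed `anchors` (one hypothetical anchor time per expert), the distance-based gate, and the `temperature` value are all assumptions chosen for illustration, whereas S-MSP learns its structure-aware gating inside a transformer.

```python
import math
from dataclasses import dataclass


@dataclass
class SubTask:
    """One temporal sub-task of an STL formula (hypothetical representation)."""
    op: str       # "F" (eventually) or "G" (always)
    t_start: int  # first time step of the window
    t_end: int    # last time step of the window


def gate_weights(task, anchors, temperature=5.0):
    """Softmax gate over experts, scored by distance from the window midpoint.

    Assumption: each expert is tied to one anchor time; a learned gate
    would replace this hand-written score.
    """
    mid = 0.5 * (task.t_start + task.t_end)
    logits = [-abs(mid - a) / temperature for a in anchors]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(task, anchors):
    """Top-1 routing: index of the expert with the largest gate weight."""
    weights = gate_weights(task, anchors)
    return max(range(len(weights)), key=weights.__getitem__)


# Three hypothetical experts anchored early, mid, and late in a 40-step horizon.
anchors = [5.0, 20.0, 35.0]
tasks = [SubTask("F", 0, 10), SubTask("G", 15, 25), SubTask("F", 30, 40)]
assignments = [route(t, anchors) for t in tasks]
```

In this toy setup each sub-task goes to the expert whose anchor is closest to the middle of its time window, which is one simple way to picture horizon-aware specialization.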

Key results

First end-to-end baseline for STL-constrained trajectory synthesis from raw multi-view camera observations
Structure-aware MoE model that decomposes STL formulas into temporally anchored sub-tasks for efficient learning
High-fidelity Gazebo-based benchmark dataset with synchronized multi-view imagery and annotated STL specifications
State-of-the-art STL satisfaction and trajectory feasibility relative to single-expert baselines, with no additional planning latency

Why it matters

It enables robust, perception-driven autonomous planning for complex temporal tasks in unstructured environments, advancing end-to-end robotics and safe autonomous systems.

Abstract

We investigate the task and motion planning problem for Signal Temporal Logic (STL) specifications in robotics. Existing STL methods rely on pre-defined maps or mobility representations, which are ineffective in unstructured real-world environments. We propose the Structured-MoE STL Planner (S-MSP), a differentiable framework that maps synchronized multi-view camera observations and an STL specification directly to a feasible trajectory. S-MSP integrates STL constraints within a unified pipeline, trained with a composite loss that combines trajectory reconstruction and STL robustness. A structure-aware Mixture-of-Experts (MoE) model enables horizon-aware specialization by projecting sub-tasks into temporally anchored embeddings. We evaluate S-MSP using a high-fidelity simulation of factory-logistics scenarios with temporally constrained tasks. Experiments show that S-MSP outperforms single-expert baselines in STL satisfaction and trajectory feasibility. A rule-based safety filter at inference improves physical executability without compromising logical correctness, showcasing the practicality of the approach.
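The composite loss mentioned in the abstract can be pictured with a small sketch. It uses the standard quantitative semantics of STL (a maximum over the window for "eventually", a minimum for "always", a minimum for conjunction) on a 2D trajectory. The specific predicates, the mean-squared reconstruction term, the hinge penalty on negative robustness, and the `weight` parameter are assumptions made for illustration; the abstract does not give the exact form of the loss.

```python
import math


def dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def rho_eventually_reach(traj, center, radius, t_start, t_end):
    """Robustness of F_[t_start,t_end](||x_t - center|| <= radius)."""
    return max(radius - dist(traj[t], center) for t in range(t_start, t_end + 1))


def rho_always_avoid(traj, center, radius, t_start, t_end):
    """Robustness of G_[t_start,t_end](||x_t - center|| >= radius)."""
    return min(dist(traj[t], center) - radius for t in range(t_start, t_end + 1))


def composite_loss(pred, expert, rho, weight=1.0):
    """Reconstruction error plus a penalty when the STL robustness is negative.

    Assumption: MSE for reconstruction and a hinge on -rho; the paper's
    actual terms and weighting may differ.
    """
    mse = sum(dist(p, e) ** 2 for p, e in zip(pred, expert)) / len(pred)
    return mse + weight * max(0.0, -rho)


# Hypothetical task: reach the goal during steps 2..3 and always avoid an obstacle.
goal, goal_radius = (3.0, 0.0), 0.5
obstacle, obstacle_radius = (1.5, 2.0), 1.0

expert = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
stalled = [(0.0, 0.0)] * 4  # a prediction that never leaves the start


def spec_robustness(traj):
    """Conjunction of the two sub-formulas: take the minimum robustness."""
    return min(
        rho_eventually_reach(traj, goal, goal_radius, 2, 3),
        rho_always_avoid(traj, obstacle, obstacle_radius, 0, 3),
    )


loss_good = composite_loss(expert, expert, spec_robustness(expert))
loss_bad = composite_loss(stalled, expert, spec_robustness(stalled))
```

A positive robustness value means the trajectory satisfies the specification with that margin, so the penalty term vanishes for the expert trajectory and grows for the stalled one.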

Index terms

Task Planning; Planning, Scheduling and Coordination; Reactive and Sensor-Based Planning