Latent Activation Editing: Inference-Time Refinement of Learned Policies for Safer Multirobot Navigation
Satyajeet Das, Darren Chiu, Zhehui Huang, Lars Lindemann, Gaurav Sukhatme
AI summary
Problem
Pre-trained reinforcement learning policies for multi-robot navigation remain vulnerable to rare but critical collisions in cluttered environments, and retraining to fix them is costly, risks catastrophic forgetting, and yields diminishing returns.
Approach
The framework monitors a frozen policy's intermediate latent activations with an online classifier, and replaces flagged unsafe activations with risk-amplified surrogates generated by a latent collision world model, steering behavior without modifying weights.
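The monitor-and-replace step described above can be sketched as a gated forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation. The frozen policy is split into a hypothetical `encoder` (the layers up to the monitored activation) and a hypothetical `head` (the remaining layers). The online classifier is assumed to be a linear logistic probe with weights `w` and bias `b`. `world_model` is a hypothetical callable that stands in for the latent collision world model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def unsafe_probability(h: Sequence[float], w: Sequence[float], b: float) -> float:
    """Hypothetical linear probe (an assumption): p(unsafe | h) = sigmoid(w . h + b)."""
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)


def edit_activation(
    h: Sequence[float],
    w: Sequence[float],
    b: float,
    world_model: Callable[[Vector], Sequence[float]],
    threshold: float = 0.5,
) -> Tuple[Vector, bool]:
    """Swap h for a world-model surrogate only when the probe flags it as unsafe."""
    if unsafe_probability(h, w, b) > threshold:
        return list(world_model(list(h))), True
    return list(h), False  # unflagged activations pass through unchanged


def edited_forward(obs, encoder, head, w, b, world_model, threshold=0.5):
    """Run the frozen policy, editing only the monitored intermediate activation."""
    h = encoder(obs)  # frozen layers up to the monitored activation
    h_used, flagged = edit_activation(h, w, b, world_model, threshold)
    return head(h_used), flagged  # frozen remaining layers; no weights are changed
```

In this sketch the policy weights are never modified. The only intervention is the swap of one intermediate activation, and unflagged states pass through unchanged. That pass-through is how an approach of this kind can aim to leave nominal behavior, such as goal completion, intact.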
Key results
- Nearly 90% reduction in cumulative collisions compared to the baseline
- Substantially increased fraction of collision-free trajectories while preserving goal completion
- Demonstrated real-world feasibility on resource-constrained Crazyflie quadrotors
- Established as a lightweight, post-deployment refinement paradigm for learned robot policies
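A small helper makes the two headline metrics concrete. The sketch assumes that collisions are tallied per trajectory as integer counts. That is our assumption, not the paper's stated evaluation protocol. The counts in the usage note below are made up for illustration and are not results from the paper.

```python
from typing import Sequence


def cumulative_collisions(per_trajectory: Sequence[int]) -> int:
    """Total collisions summed over all evaluated trajectories."""
    return sum(per_trajectory)


def collision_free_fraction(per_trajectory: Sequence[int]) -> float:
    """Share of trajectories that recorded zero collisions."""
    if not per_trajectory:
        raise ValueError("need at least one trajectory")
    return sum(1 for c in per_trajectory if c == 0) / len(per_trajectory)


def relative_reduction(baseline_total: int, edited_total: int) -> float:
    """Fractional drop in cumulative collisions relative to the baseline."""
    if baseline_total <= 0:
        raise ValueError("baseline must record at least one collision")
    return 1.0 - edited_total / baseline_total
```

Take made-up per-trajectory counts of `[4, 0, 3, 1]` for the baseline and `[1, 0, 0, 0]` after editing. Cumulative collisions fall from 8 to 1, so `relative_reduction(8, 1)` returns 0.875, and the collision-free fraction rises from 0.25 to 0.75. Together the two metrics capture different things: the first reflects overall collision volume, and the second reflects how many trajectories avoid collisions entirely.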
Why it matters
It offers a practical method for enhancing the safety of deployed multi-robot systems without costly retraining or architectural changes.
Abstract
Reinforcement learning has enabled significant progress in complex domains such as coordinating and navigating multiple quadrotors. However, even well-trained policies remain vulnerable to collisions in obstacle-rich environments. Addressing these infrequent but critical safety failures through retraining or fine-tuning is costly and risks degrading previously learned skills. Inspired by activation steering in large language models and latent editing in computer vision, we introduce a framework for inference-time Latent Activation Editing (LAE) that refines the behavior of pre-trained policies without modifying their weights or architecture. The framework operates in two stages: (i) an online classifier monitors intermediate activations to detect states associated with undesired behaviors, and (ii) an activation editing module selectively modifies flagged activations to shift the policy towards safer regimes. In this work, we focus on improving safety in multi-quadrotor navigation. We hypothesize that amplifying a policy’s internal perception of risk can induce safer behaviors. We instantiate this idea through a latent collision world model trained to predict future pre-collision activations, thereby prompting earlier and more cautious avoidance responses. Extensive simulations and real-world Crazyflie experiments demonstrate that LAE achieves a statistically significant reduction in collisions (nearly 90% fewer cumulative collisions compared to the unedited baseline) and substantially increases the fraction of collision-free trajectories, while preserving task completion. More broadly, our results establish LAE as a lightweight paradigm, feasible on resource-constrained hardware, for post-deployment refinement of learned robot policies. Our project page with videos and code is available at https://lae-robotics.github.io/.
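A latent collision world model trained to predict future pre-collision activations can be pictured as a regression. It maps the current activation to the activation observed some steps later on trajectories that ended in a collision. The sketch below fits a linear map by full-batch gradient descent on mean squared error. Several choices are illustrative assumptions, not the architecture or training recipe used in the paper: the linear form, the `(h_t, h_future)` pair format, the helper names, and the hyperparameters.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, h: Sequence[float]) -> Vector:
    """Matrix-vector product W @ h using plain lists."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in W]


def fit_linear_world_model(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
    dim: int,
    lr: float = 0.1,
    epochs: int = 500,
) -> Matrix:
    """Fit W so that W @ h_t approximates a later pre-collision activation.

    Minimises the mean over pairs of ||W h_t - h_future||^2 by gradient descent.
    A linear model is an assumption made for illustration only.
    """
    if not pairs:
        raise ValueError("need at least one training pair")
    n = len(pairs)
    W = [[0.0] * dim for _ in range(dim)]
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(dim)]
        for h, target in pairs:
            pred = matvec(W, h)
            for i in range(dim):
                err = pred[i] - target[i]
                for j in range(dim):
                    # d/dW_ij of (1/n) * (pred_i - target_i)^2
                    grad[i][j] += 2.0 * err * h[j] / n
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= lr * grad[i][j]
    return W


def risk_amplified_surrogate(W: Matrix, h: Sequence[float]) -> Vector:
    """Surrogate activation: the model's prediction of a later pre-collision state."""
    return matvec(W, h)
```

Consider toy pairs in which the later activation is exactly twice the current one: `([1, 0], [2, 0])`, `([0, 1], [0, 2])` and `([1, 1], [2, 2])`. On these pairs the fitted `W` approaches `[[2, 0], [0, 2]]`. Feeding the surrogate downstream therefore presents the policy with an activation resembling the later, riskier states in the training data. This mirrors the hypothesis that an amplified internal sense of risk prompts earlier avoidance.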