ICRA 2026

MACE: Mixture-Of-Experts Accelerated Coordinate Encoding for Large-Scale Scene Localization and Rendering

Shengyu Gu, Ruicong Ye, Wanli Qiu, Handong Yao, Ruopeng Zhang, Xianliang Huang∗

AI summary

MACE cuts computational costs by 72% while maintaining high localization accuracy and enabling high-quality rendering in large-scale scenes through dynamic single-expert activation.

Mixture-of-Experts, Scene Localization, Coordinate Encoding, 3D Gaussian Splatting, Large-Scale Rendering, Feed-Forward Rendering

Problem

Single-network Scene Coordinate Regression methods fail to capture global context in large environments, while existing multi-subnetwork approaches incur prohibitive computational costs and suffer from suboptimal clustering.

Approach

MACE employs a gating network to dynamically route each image to a single specialized expert sub-network, augmented by an auxiliary-loss-free load balancing strategy and a Gaussian regression head for direct feed-forward rendering.
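The routing idea described above can be illustrated with a minimal sketch. It assumes a bias-based balancing scheme of the kind used in language-model MoE work: a per-expert bias is added to the gate scores for expert selection only, then nudged up or down according to each expert's load, so no auxiliary loss term is needed. The summary does not state MACE's actual update rule, so the function names (`route_top1`, `update_bias`, `demo`), the step size, and the synthetic gate scores are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top1(gate_scores, bias):
    """Pick exactly one expert per image: argmax of (gate score + bias).

    The bias only influences which expert is selected (an assumption of
    this sketch); it is not a learned gate weight.
    """
    n_experts = len(bias)
    return [
        max(range(n_experts), key=lambda e: scores[e] + bias[e])
        for scores in gate_scores
    ]


def expert_load(assignments, n_experts):
    """Count how many images were routed to each expert."""
    load = [0] * n_experts
    for e in assignments:
        load[e] += 1
    return load


def update_bias(bias, assignments, step=0.01):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    n_experts = len(bias)
    load = expert_load(assignments, n_experts)
    mean_load = len(assignments) / n_experts
    new_bias = []
    for b, count in zip(bias, load):
        if count > mean_load:
            new_bias.append(b - step)
        elif count < mean_load:
            new_bias.append(b + step)
        else:
            new_bias.append(b)
    return new_bias


def demo(n_images=400, n_experts=4, rounds=200, seed=0):
    """Route synthetic, deliberately skewed gate scores before and after balancing."""
    rng = random.Random(seed)
    skew = [1.0] + [0.0] * (n_experts - 1)  # expert 0 is favoured at the start
    gate_scores = [
        softmax([rng.gauss(0.0, 1.0) + s for s in skew])
        for _ in range(n_images)
    ]
    bias = [0.0] * n_experts
    before = expert_load(route_top1(gate_scores, bias), n_experts)
    for _ in range(rounds):
        bias = update_bias(bias, route_top1(gate_scores, bias))
    after = expert_load(route_top1(gate_scores, bias), n_experts)
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("load per expert before balancing:", before)
    print("load per expert after balancing: ", after)
```

In this toy setting the gate scores are fixed, so only the bias changes between rounds. In an actual training loop the bias update would presumably run once per batch alongside the usual gradient step on the gate and expert weights.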

Key results

Reduces the activated network weights by 72% compared to baseline methods while preserving localization precision
Achieves high-quality rendering with only 10 minutes of training on the Cambridge dataset
Eliminates auxiliary loss terms while improving angular and translational accuracy through dynamic load balancing
Enables unsupervised feed-forward 3D Gaussian Splatting by using inferred point clouds as spatial anchors

Why it matters

Delivers a computationally efficient and scalable pipeline for robotics and AR developers needing real-time, high-fidelity large-scale scene understanding.

Abstract

Efficient localization and high-quality rendering in large-scale scenes remain a significant challenge due to the computational cost involved. While Scene Coordinate Regression (SCR) methods perform well in small-scale localization, they are limited by the capacity of a single network when extended to large-scale scenes. This limitation directly impacts robotics applications, where accurate and efficient scene understanding is essential for navigation and interaction in complex environments. To address these challenges, we propose the Mixed Expert-based Accelerated Coordinate Encoding method (MACE), which enables efficient localization and high-quality rendering in large-scale scenes. Inspired by the remarkable capabilities of Mixture-of-Experts (MoE) in large model domains, we introduce a gating network to implicitly classify and select sub-networks, ensuring that only a single sub-network is activated during each inference. Furthermore, we present an Auxiliary-Loss-Free Load Balancing (ALF-LB) strategy to enhance localization accuracy in large-scale scenes. Our framework provides a significant reduction in costs while maintaining higher precision, offering an efficient solution for large-scale scene applications. Additional experiments on the Cambridge test set demonstrate that our method achieves high-quality rendering results with merely 10 minutes of training.

Index terms

Deep Learning for Visual Perception; RGB-D Perception; Computer Vision for Automation