MACE: Mixture-Of-Experts Accelerated Coordinate Encoding for Large-Scale Scene Localization and Rendering
Shengyu Gu, Ruicong Ye, Wanli Qiu, Handong Yao, Ruopeng Zhang, Xianliang Huang∗
AI summary
Problem
Single-network Scene Coordinate Regression methods fail to capture global context in large environments, while existing multi-subnetwork approaches incur prohibitive computational costs and suffer from suboptimal clustering.
Approach
MACE employs a gating network to dynamically route each image to a single specialized expert sub-network, augmented by an auxiliary-loss-free load balancing strategy and a Gaussian regression head for direct feed-forward rendering.
Key results
- Reduces activation weight by 72% compared to baseline methods while preserving localization precision
- Achieves high-quality rendering with only 10 minutes of training on the Cambridge dataset
- Eliminates auxiliary loss terms while improving angular and translational accuracy through dynamic load balancing
- Enables unsupervised feed-forward 3D Gaussian Splatting by using inferred point clouds as spatial anchors
Why it matters
Delivers a computationally efficient and scalable pipeline for robotics and AR developers needing real-time, high-fidelity large-scale scene understanding.
Abstract
Efficient localization and high-quality rendering in large-scale scenes remain a significant challenge due to the computational cost involved. While Scene Coordinate Regression (SCR) methods perform well in small-scale localization, they are limited by the capacity of a single network when extended to large-scale scenes. This limitation directly impacts robotics applications, where accurate and efficient scene understanding is essential for navigation and interaction in complex environments. To address these challenges, we propose the Mixed Expert-based Accelerated Coordinate Encoding method (MACE), which enables efficient localization and high-quality rendering in large-scale scenes. Inspired by the remarkable capabilities of Mixture-of-Experts (MoE) in large models, we introduce a gating network that implicitly classifies and selects sub-networks, ensuring that only a single sub-network is activated for each inference. Furthermore, we present an Auxiliary-Loss-Free Load Balancing (ALF-LB) strategy to enhance localization accuracy in large-scale scenes. Our framework provides a significant reduction in computational cost while maintaining higher precision, offering an efficient solution for large-scale scene applications. Additional experiments on the Cambridge test set demonstrate that our method achieves high-quality rendering results with merely 10 minutes of training.
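The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that ALF-LB takes the bias-adjustment form known from the MoE literature for language models: a per-expert bias is added to the gate scores for expert selection only, and after each batch the bias is nudged up for under-loaded experts and down for over-loaded ones, so no balancing term enters the loss. The function names, the step size, and the skewed random scores standing in for a learned gating network are all hypothetical.

```python
import random


def route_top1(scores, bias):
    """Return the index of the single expert to activate for one image.

    The bias only influences which expert is selected; it is not a loss term.
    """
    adjusted = [s + b for s, b in zip(scores, bias)]
    return max(range(len(adjusted)), key=adjusted.__getitem__)


def update_bias(bias, counts, step=0.01):
    """Raise the bias of under-loaded experts and lower it for over-loaded ones."""
    mean_load = sum(counts) / len(counts)
    return [b + step * ((c < mean_load) - (c > mean_load))
            for b, c in zip(bias, counts)]


def simulate(num_experts=4, batches=200, batch_size=64, step=0.01, seed=0):
    """Route random 'images' through a deliberately skewed stand-in gate."""
    rng = random.Random(seed)
    bias = [0.0] * num_experts
    # Hypothetical skew: the stand-in gate prefers higher-indexed experts.
    skew = [0.3 * e for e in range(num_experts)]
    counts = [0] * num_experts
    for _ in range(batches):
        counts = [0] * num_experts
        for _ in range(batch_size):
            scores = [rng.random() + skew[e] for e in range(num_experts)]
            counts[route_top1(scores, bias)] += 1
        # Balance load by adjusting selection biases, not by adding a loss.
        bias = update_bias(bias, counts, step)
    return bias, counts
```

Calling `simulate()` returns the final biases and the per-expert counts of the last batch. In this toy setup the biases drift to offset the gate's built-in preference, so the initially favored expert stops absorbing most of the inputs while each image still activates exactly one expert.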