ICRA 2026

GenLaM: Generative Layered Mesh for Multi-Modal Sensor Emulation in Robotics

Aakash Singh Bais, Akash Patel, Christoforos Kanellakis, George Nikolakopoulos

AI summary

From a single RGB image, GenLaM builds a dense 3D mesh, transparent surfaces included, from which LiDAR scans and novel views can be emulated, enabling robust robot navigation with minimal hardware.

Monocular depth estimation; Sensor emulation; 3D mesh reconstruction; Transparent surface detection; Robotic perception; Novel view synthesis

Problem

Robotic perception typically relies on costly multi-sensor setups that increase system complexity and payload, and conventional LiDAR often misses transparent surfaces such as glass, creating navigation hazards.

Approach

GenLaM estimates monocular depth and recovers camera intrinsics from a single RGB image, projects the image pixels into 3D space, and reconstructs a mesh from them. It then enriches the mesh with semantic and glass masks and uses GPU-accelerated ray casting to emulate LiDAR scans and render novel views.
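As a rough illustration of the projection step, the sketch below back-projects a depth map into camera-frame 3D points under a standard pinhole model. The function name `backproject_depth` and the parameters `fx`, `fy`, `cx`, `cy` are assumptions made for this example; the summary does not state which camera model or depth estimator the authors use.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H, W, 3) array of 3D points.

    Assumes a pinhole camera (hypothetical choice for this sketch):
    fx, fy are focal lengths in pixels, (cx, cy) is the principal point.
    The camera frame has x to the right, y down, and z along the optical axis.
    """
    h, w = depth.shape
    # u holds the column index and v the row index of every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)
```

Calling `backproject_depth(depth, fx, fy, cx, cy)` on a predicted depth map gives one 3D point per pixel, which can then serve as the vertices of a mesh.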

Key results

Unified pipeline for simultaneous novel view synthesis and RGB/segmented LiDAR emulation
Explicit reconstruction and depth estimation for transparent glass surfaces
GPU-accelerated sensor emulation generating colored, segmented LiDAR point clouds in under 0.2 seconds
Robust geometric reconstruction and navigation performance in low-visibility conditions (rain, fog, snow, low light)
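To make the LiDAR emulation idea concrete, here is a minimal CPU sketch that casts rays from a sensor origin against a triangle mesh and returns a range per ray. It uses the Möller-Trumbore ray-triangle test in NumPy rather than a GPU ray caster, and the names `lidar_directions` and `cast_rays`, the beam layout, and the axis convention (x forward, y left, z up) are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def lidar_directions(n_az=360, elevations_deg=(0.0,)):
    """Unit ray directions for a spinning LiDAR (hypothetical beam layout)."""
    az = np.linspace(-np.pi, np.pi, n_az, endpoint=False)
    el = np.deg2rad(np.asarray(elevations_deg, dtype=float))
    az, el = np.meshgrid(az, el)
    d = np.stack([np.cos(el) * np.cos(az),
                  np.cos(el) * np.sin(az),
                  np.sin(el)], axis=-1)
    return d.reshape(-1, 3)

def cast_rays(origin, dirs, tris, eps=1e-9):
    """Nearest hit distance along each ray, np.inf where nothing is hit.

    origin: (3,) sensor position; dirs: (R, 3) unit directions;
    tris: (T, 3, 3) triangle vertices.
    """
    v0 = tris[:, 0]
    e1 = tris[:, 1] - v0
    e2 = tris[:, 2] - v0
    p = np.cross(dirs[:, None, :], e2[None, :, :])   # (R, T, 3)
    det = np.einsum("rtk,tk->rt", p, e1)              # (R, T)
    ok = np.abs(det) > eps                            # False for rays parallel to a triangle
    inv = np.where(ok, 1.0 / np.where(ok, det, 1.0), 0.0)
    s = origin - v0                                   # (T, 3)
    u = np.einsum("tk,rtk->rt", s, p) * inv           # first barycentric coordinate
    q = np.cross(s, e1)                               # (T, 3)
    v = np.einsum("rk,tk->rt", dirs, q) * inv         # second barycentric coordinate
    t = np.einsum("tk,tk->t", e2, q)[None, :] * inv   # distance along the ray
    hit = ok & (u >= 0) & (v >= 0) & (u + v <= 1) & (t > eps)
    return np.where(hit, t, np.inf).min(axis=1)
```

Multiplying each finite range by its ray direction and adding the origin yields the emulated point cloud. Recording the index of the nearest triangle as well (for example with `argmin` instead of `min`) would allow per-point color or semantic labels to be looked up from the mesh.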

Why it matters

GenLaM offers a resource-efficient alternative to multi-sensor perception setups, with the potential for safer navigation and lower hardware costs in autonomous systems.

Abstract

Accurate environment perception is fundamental for robust robot navigation, mapping, and interaction. Traditional perception pipelines rely on multiple sensors, including stereo cameras and LiDAR, which impose constraints on cost, payload, and system integration. In this paper, we propose a novel single-image perception framework that unifies novel view synthesis and RGB/segmented LiDAR emulation into a single pipeline. Leveraging monocular depth estimation and camera intrinsics recovery, our approach projects image pixels into 3D space and performs mesh reconstruction to generate dense geometric representations. This enables high-fidelity sensor emulation, including transparent surface reconstruction such as glass, an element often missed by conventional LiDAR. By enriching synthetic LiDAR scans with otherwise unavailable geometry, our method enhances downstream tasks such as robot path planning and obstacle avoidance. This work opens up new possibilities for resource-efficient robotic perception by reducing sensor dependency while improving geometric reasoning.
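One simple way to turn per-pixel 3D points into a mesh, shown here only as a sketch, is to connect neighboring pixels into triangles and discard triangles that straddle a large depth discontinuity. The abstract does not describe the authors' mesh reconstruction, so the function name `grid_mesh_faces` and the threshold `max_jump` are hypothetical.

```python
import numpy as np

def grid_mesh_faces(depth, max_jump=0.5):
    """Triangulate the pixel grid of an (H, W) depth map.

    The vertex index of pixel (row r, column c) is r * W + c.
    Triangles whose vertex depths differ by more than max_jump are dropped,
    so foreground and background are not stitched together.
    Returns an (F, 3) integer array of vertex indices.
    """
    h, w = depth.shape
    idx = np.arange(h * w).reshape(h, w)
    # Corners of every 2x2 pixel cell.
    tl, tr = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    bl, br = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    # Two triangles per cell.
    faces = np.concatenate([np.stack([tl, bl, tr], axis=1),
                            np.stack([tr, bl, br], axis=1)], axis=0)
    z = depth.ravel()[faces]                      # (F, 3) depth at each triangle corner
    keep = (z.max(axis=1) - z.min(axis=1)) <= max_jump
    return faces[keep]
```

The returned faces index into the flattened per-pixel point array, so vertices and faces together form a mesh that a ray caster can consume.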

Index terms

Semantic Scene Understanding; Deep Learning for Visual Perception; AI-Enabled Robotics