ICRA 2026

PIRATR: Parametric Object Inference for Robotic Applications with Transformers in 3D Point Clouds

Michael Schwingshackl, Fabio Francisco Oberweger, Mario Niedermeyer, Johannes Huemer, Markus Murschitz

AI summary

PIRATR enables accurate, end-to-end detection of parametric 3D objects directly from occluded LiDAR scans using a transformer model trained entirely on synthetic data.

3D Object Detection, LiDAR Perception, Transformers, Parametric Modeling, Sim-to-Real Transfer, Robotic Manipulation

Problem

Existing 3D object detectors rely on intermediate representations or classical geometric fitting, struggle with occluded LiDAR data, and lack direct estimation of task-relevant parametric states needed for robotic manipulation.

Approach

The method extends a 3D detection transformer with class-specific heads and a geometry-aware matcher to jointly predict 6-DoF poses and parametric states directly from raw, occluded LiDAR point clouds, trained entirely on randomized synthetic data.
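To make the idea of a geometry-aware matcher concrete, the following is a minimal sketch of a symmetry-aware assignment between predictions and ground-truth objects. It is not the paper's matcher: the cost terms, the weights `w_yaw` and `w_param`, the `symmetry_order` field, the reduction of the pose to a position plus a single yaw angle, and the brute-force search (used here in place of a Hungarian solver) are all assumptions made for illustration. Class terms are omitted.

```python
import itertools
import math


def symmetric_yaw_error(yaw_pred, yaw_gt, symmetry_order):
    """Smallest absolute yaw difference, modulo the object's rotational symmetry.

    symmetry_order=1 means no symmetry; 2 means the object looks the same
    after a 180 degree turn (assumed here for a pallet-like object).
    """
    period = 2.0 * math.pi / symmetry_order
    diff = (yaw_pred - yaw_gt) % period
    return min(diff, period - diff)


def pair_cost(pred, gt, w_yaw=1.0, w_param=1.0):
    """Hypothetical matching cost: position + symmetric yaw + parameter error."""
    pos = math.dist(pred["xyz"], gt["xyz"])
    yaw = symmetric_yaw_error(pred["yaw"], gt["yaw"], gt["symmetry_order"])
    param = abs(pred["param"] - gt["param"])
    return pos + w_yaw * yaw + w_param * param


def match(preds, gts):
    """Brute-force minimum-cost assignment (assumes len(preds) >= len(gts)).

    Returns a tuple where entry g is the index of the prediction matched
    to ground-truth object g, together with the total cost.
    """
    best_cost, best_assign = math.inf, None
    for perm in itertools.permutations(range(len(preds)), len(gts)):
        cost = sum(pair_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_assign = cost, perm
    return best_assign, best_cost


# Illustrative data only: one symmetric object and one asymmetric object.
gts = [
    {"xyz": (0.0, 0.0, 0.0), "yaw": 0.0, "param": 0.0, "symmetry_order": 2},
    {"xyz": (2.0, 0.0, 0.0), "yaw": 0.5, "param": 0.30, "symmetry_order": 1},
]
preds = [
    {"xyz": (2.1, 0.0, 0.0), "yaw": 0.5, "param": 0.25},
    {"xyz": (0.0, 0.1, 0.0), "yaw": math.pi, "param": 0.0},
]

assignment, total_cost = match(preds, gts)
print(assignment, round(total_cost, 3))
```

In this sketch the second prediction is rotated by 180 degrees relative to the first ground-truth object, yet it incurs no yaw penalty because that object is treated as two-fold symmetric. Without the symmetry term, an equally valid pose would be penalized during training.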

Key results

End-to-end transformer architecture for multi-class parametric 3D detection without intermediate representations
Geometry-aware matching strategy that accounts for object symmetries and joint parameter estimation
Successful synthetic-to-real transfer achieving 0.919 mAP on real outdoor LiDAR scans without fine-tuning
Deployment on an autonomous forklift demonstrating reliable detection of grippers, loading platforms, and pallets

Why it matters

It provides a scalable, simulation-trained perception pipeline that enables reliable robotic interaction with structured outdoor environments, benefiting autonomous construction and logistics machinery.

Abstract

We present PIRATR, an end-to-end 3D object detection framework for robotic use cases in point clouds. Extending PI3DETR, our method streamlines parametric 3D object detection by jointly estimating multi-class 6-DoF poses and class-specific parametric attributes directly from occlusion-affected point cloud data. This formulation enables not only geometric localization but also the estimation of task-relevant properties for parametric objects, such as a gripper’s opening, where the 3D model is adjusted according to simple, predefined rules. The architecture employs modular, class-specific heads, making it straightforward to extend to novel object types without re-designing the pipeline. We validate PIRATR on an automated forklift platform, focusing on three structurally and functionally diverse categories: crane grippers, loading platforms, and pallets. Trained entirely in a synthetic environment, PIRATR generalizes effectively to real outdoor LiDAR scans, achieving a detection mAP of 0.919 without additional fine-tuning. PIRATR establishes a new paradigm of pose-aware, parameterized perception. This bridges the gap between low-level geometric reasoning and actionable world models, paving the way for scalable, simulation-trained perception systems that can be deployed in dynamic robotic environments. Code: https://github.com/swingaxe/piratr
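As an illustration of what "a 3D model adjusted according to simple, predefined rules" could look like, here is a minimal sketch of a parametric gripper. Everything in it is hypothetical and not taken from the paper or its repository: the `GripperState` fields, the `MAX_OPENING` limit, the rule that places the jaws symmetrically along the local y-axis, and the simplification of the 6-DoF pose to a position plus yaw.

```python
import math
from dataclasses import dataclass

MAX_OPENING = 1.2  # hypothetical mechanical limit, in metres


@dataclass
class GripperState:
    x: float
    y: float
    z: float
    yaw: float      # rotation about the vertical axis, in radians
    opening: float  # jaw separation in metres (assumed parameterization)


def jaw_centers(state):
    """Return the world-frame centres of the two jaws.

    Assumed rule: the jaws sit at -opening/2 and +opening/2 along the
    gripper's local y-axis, with the opening clamped to [0, MAX_OPENING].
    """
    opening = min(max(state.opening, 0.0), MAX_OPENING)
    half = opening / 2.0
    centers = []
    for local_y in (-half, half):
        # Rotate the local offset (0, local_y) by yaw, then translate.
        wx = state.x - math.sin(state.yaw) * local_y
        wy = state.y + math.cos(state.yaw) * local_y
        centers.append((wx, wy, state.z))
    return centers


state = GripperState(x=1.0, y=2.0, z=0.5, yaw=0.0, opening=0.8)
left, right = jaw_centers(state)
print(left, right)
```

The point of such a representation is that a detector only needs to output the pose and one scalar per gripper; the full geometry follows from the rule, which is what makes the predicted state directly usable by a downstream planner.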

Index terms

Deep Learning for Visual Perception; Computer Vision for Automation; Deep Learning in Grasping and Manipulation