PIRATR: Parametric Object Inference for Robotic Applications with Transformers in 3D Point Clouds
Michael Schwingshackl, Fabio Francisco Oberweger, Mario Niedermeyer, Johannes Huemer, Markus Murschitz
AI summary
Problem
Existing 3D object detectors rely on intermediate representations or classical geometric fitting, struggle with occluded LiDAR data, and lack direct estimation of task-relevant parametric states needed for robotic manipulation.
Approach
The method extends a 3D detection transformer with class-specific heads and a geometry-aware matcher to jointly predict 6-DoF poses and parametric states directly from raw, occluded LiDAR point clouds, trained entirely on randomized synthetic data.
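As a rough illustration of what a geometry-aware matcher could look like, the sketch below assigns predictions to ground-truth objects by minimizing a cost that combines position error, a symmetry-aware yaw error, and parametric-state error. It is not the authors' implementation: the cost terms, the weights, the reduction of orientation to a single yaw angle, the `symmetry_order` field, and the brute-force search (standing in for the Hungarian algorithm typically used in detection transformers) are all assumptions made for this example.

```python
import math
from itertools import permutations


def yaw_error(pred_yaw, gt_yaw, symmetry_order):
    """Smallest absolute yaw difference when rotations by
    2*pi / symmetry_order are treated as equivalent (assumed symmetry model)."""
    period = 2.0 * math.pi / symmetry_order
    diff = (pred_yaw - gt_yaw) % period
    return min(diff, period - diff)


def match_cost(pred, gt, w_pos=1.0, w_yaw=1.0, w_param=1.0, class_penalty=1e3):
    """Hypothetical pairwise cost; weights and terms are illustrative only."""
    if pred["cls"] != gt["cls"]:
        return class_penalty
    pos_term = math.dist(pred["pos"], gt["pos"])
    yaw_term = yaw_error(pred["yaw"], gt["yaw"], gt["symmetry_order"])
    # Parametric states (e.g. a gripper opening) are compared element-wise.
    param_term = sum(abs(p - g) for p, g in zip(pred["params"], gt["params"]))
    return w_pos * pos_term + w_yaw * yaw_term + w_param * param_term


def match(preds, gts):
    """Minimum-cost one-to-one assignment by exhaustive search.
    Requires len(preds) >= len(gts); only practical for a handful of objects.
    Returns a list of (gt_index, pred_index) pairs."""
    best, best_cost = (), math.inf
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(enumerate(best))
```

With a symmetry order of 2, a prediction rotated by 180 degrees relative to the ground truth incurs no yaw cost, so a symmetric object is not penalized for an orientation that is physically indistinguishable.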
Key results
- End-to-end transformer architecture for multi-class parametric 3D detection without intermediate representations
- Geometry-aware matching strategy that accounts for object symmetries and joint parameter estimation
- Successful synthetic-to-real transfer achieving 0.919 mAP on real outdoor LiDAR scans without fine-tuning
- Deployment on an autonomous forklift demonstrating reliable detection of grippers, loading platforms, and pallets
Why it matters
It provides a scalable, simulation-trained perception pipeline that enables reliable robotic interaction with structured outdoor environments, benefiting autonomous construction and logistics machinery.
Abstract
We present PIRATR, an end-to-end 3D object detection framework for robotic use cases in point clouds. Extending PI3DETR, our method streamlines parametric 3D object detection by jointly estimating multi-class 6-DoF poses and class-specific parametric attributes directly from occlusion-affected point cloud data. This formulation enables not only geometric localization but also the estimation of task-relevant properties for parametric objects, such as a gripper’s opening, where the 3D model is adjusted according to simple, predefined rules. The architecture employs modular, class-specific heads, making it straightforward to extend to novel object types without re-designing the pipeline. We validate PIRATR on an automated forklift platform, focusing on three structurally and functionally diverse categories: crane grippers, loading platforms, and pallets. Trained entirely in a synthetic environment, PIRATR generalizes effectively to real outdoor LiDAR scans, achieving a detection mAP of 0.919 without additional fine-tuning. PIRATR establishes a new paradigm of pose-aware, parameterized perception. This bridges the gap between low-level geometric reasoning and actionable world models, paving the way for scalable, simulation-trained perception systems that can be deployed in dynamic robotic environments. Code: https://github.com/swingaxe/piratr
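To make the idea of a parametric object concrete, the following minimal sketch shows how a predicted pose plus a single parametric attribute (a gripper opening) could determine the placement of model parts through a simple rule. The `GripperState` class, the two-jaw geometry, the choice of the local y-axis as the opening direction, and the reduction of the 6-DoF pose to a position and yaw are all hypothetical simplifications for illustration, not the rules used in PIRATR.

```python
import math
from dataclasses import dataclass


@dataclass
class GripperState:
    """Hypothetical detection output: simplified pose plus one parameter."""
    x: float
    y: float
    z: float
    yaw: float       # rotation about the vertical axis, in radians
    opening: float   # distance between the two jaws, in meters


def jaw_centers(state):
    """Place the two jaws symmetrically around the gripper center,
    offset by half the opening along the gripper's local y-axis
    (an assumed rule for this sketch)."""
    half = state.opening / 2.0
    # Local y-axis expressed in the world frame after rotating by yaw.
    axis_x, axis_y = -math.sin(state.yaw), math.cos(state.yaw)
    left = (state.x + half * axis_x, state.y + half * axis_y, state.z)
    right = (state.x - half * axis_x, state.y - half * axis_y, state.z)
    return left, right
```

Under this rule the distance between the two jaw centers always equals the predicted opening, regardless of the gripper's position or yaw, which is the kind of constraint a downstream manipulation planner could rely on.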