ICRA 2026

Preference-Conditioned Reinforcement Learning for Space-Time Efficient Online 3D Bin Packing

Nikita Sarawgi, Omey Mohan Manyar, Fan Wang, Thinh Nguyen, Daniel Seita, Satyandra K. Gupta

AI summary

A preference-conditioned Transformer policy balances spatial efficiency against operational time, cutting operational time by 44% without sacrificing packing density.
3D bin packing, reinforcement learning, space-time trade-off, robotic manipulation, Transformer policy, warehouse automation

Problem

Existing robotic bin packing systems prioritize space utilization but ignore the operational time overhead of picking, reorienting, and transporting items, which degrades real-world warehouse throughput.

Approach

The authors frame bin packing as a multi-candidate selection problem and train a Transformer-based reinforcement learning policy that explicitly weighs expected spatial gains against estimated operational time costs, conditioned on dynamic user preferences.
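
As a rough illustration of the selection idea, the sketch below scores a handful of candidate actions by a preference-weighted difference between an estimated spatial gain and an estimated time cost, then picks the best one. It is not the authors' method: STEP learns this trade-off with a reinforcement learning policy, whereas this is a hand-written greedy scalarization. The `Candidate` fields, the `select` function, the `time_scale` normalizer, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    space_gain: float  # hypothetical estimate of packing benefit, in [0, 1]
    time_cost: float   # hypothetical estimate of operational time, in seconds


def select(candidates, w_space, time_scale=10.0):
    """Return the candidate with the best preference-weighted score.

    w_space in [0, 1] is the user preference: 1.0 cares only about space,
    0.0 cares only about time. time_scale (assumed) maps seconds to ~[0, 1].
    """
    if not 0.0 <= w_space <= 1.0:
        raise ValueError("w_space must lie in [0, 1]")

    def score(c):
        return w_space * c.space_gain - (1.0 - w_space) * (c.time_cost / time_scale)

    return max(candidates, key=score)


# Illustrative candidates: slower actions buy more spatial benefit.
candidates = [
    Candidate("place_as_is", space_gain=0.50, time_cost=2.0),
    Candidate("reorient", space_gain=0.70, time_cost=5.0),
    Candidate("swap_item", space_gain=0.90, time_cost=9.0),
]

for w in (0.1, 0.65, 0.9):
    print(w, select(candidates, w).name)
```

With these made-up numbers, a time-focused preference keeps the fast default placement, while a space-focused preference accepts the slower item swap. A learned policy would replace the fixed scoring rule with values estimated from experience.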

Key results

  • Formulates semi-online 3D bin packing as a multi-candidate selection problem balancing spatial utility and time overhead
  • Introduces STEP, a preference-conditioned Transformer policy that jointly reasons over item geometry and operational costs
  • Achieves a 44% reduction in operational time while maintaining competitive packing density
  • Enables dynamic space-time trade-off control across varying preference weights with a single unified policy

Why it matters

Warehouse automation engineers and roboticists can use a single policy to tune the balance between packing speed and bin utilization, raising throughput without giving up packing density.

Abstract

Robotic bin packing is widely deployed in warehouse automation, with current systems achieving robust performance through heuristic and learning-based strategies. These systems must balance compact placement with rapid execution, where selecting alternative items or reorienting them can improve space utilization but introduce additional time. We propose a selection-based formulation that explicitly reasons over this trade-off: at each step, the robot evaluates multiple candidate actions, weighing expected packing benefit against estimated operational time. This enables time-aware strategies that selectively accept increased operational time when it yields meaningful spatial improvements. Our method, STEP (Space-Time Efficient Packing), uses a preference-conditioned, Transformer-based reinforcement learning policy, and allows generalization across candidate set sizes and integration with standard placement modules. It achieves a 44% reduction in operational time without compromising packing density. Additional material is available at https://step-packing.github.io.
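
One plausible reason a Transformer-style policy can generalize across candidate set sizes is that self-attention applies the same weights to a set of any length. The NumPy sketch below illustrates that property with a single-head self-attention layer over candidate tokens, each concatenated with a preference vector. The abstract does not describe the actual STEP architecture, so the layer layout, the feature dimensions, the function names `init_params` and `score_candidates`, and the random untrained weights are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(d_feat, d_pref, d_model, seed=0):
    """Random, untrained weights (illustrative only)."""
    rng = np.random.default_rng(seed)
    return {
        "W_in": rng.normal(0.0, 0.1, (d_feat + d_pref, d_model)),
        "W_q": rng.normal(0.0, 0.1, (d_model, d_model)),
        "W_k": rng.normal(0.0, 0.1, (d_model, d_model)),
        "W_v": rng.normal(0.0, 0.1, (d_model, d_model)),
        "w_out": rng.normal(0.0, 0.1, (d_model,)),
    }


def score_candidates(features, preference, params):
    """Map n candidate feature rows plus one preference vector to n probabilities.

    features: (n, d_feat) array; preference: (d_pref,) array.
    """
    n = features.shape[0]
    # Condition every candidate token on the same preference vector.
    conditioned = np.concatenate([features, np.tile(preference, (n, 1))], axis=1)
    tokens = conditioned @ params["W_in"]
    q, k, v = tokens @ params["W_q"], tokens @ params["W_k"], tokens @ params["W_v"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (n, n) attention weights
    hidden = tokens + attn @ v                               # residual connection
    return softmax(hidden @ params["w_out"])                 # distribution over candidates


params = init_params(d_feat=4, d_pref=2, d_model=8)
preference = np.array([0.7, 0.3])  # hypothetical (space, time) weights
rng = np.random.default_rng(1)
for n in (3, 7):
    probs = score_candidates(rng.normal(size=(n, 4)), preference, params)
    print(n, probs.shape, round(float(probs.sum()), 6))
```

Because no positional encoding is used, reordering the candidates only reorders the output probabilities, which suits an unordered candidate set.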

Index terms

Logistics; Industrial Robots; Deep Learning Methods
