Communication-Efficient Module-Wise Federated Learning for Grasp Pose Detection in Cluttered Environments
Woonsang Kang, Joohyung Lee, Seungjun Kim, Jungchan Cho, Yoonseon Oh
AI summary
Problem
Grasp pose detection requires large, diverse datasets that raise privacy and centralization concerns, while standard federated learning suffers from prohibitive communication overhead for resource-constrained robots.
Approach
A two-phase federated learning framework that analyzes module-wise update similarity to identify slower-converging components and restricts subsequent training and communication to only those modules.
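As a rough illustration of what a module-wise update similarity analysis could look like, the sketch below computes the mean pairwise cosine similarity between client updates, separately for each module. It is a sketch under assumptions, not the authors' implementation: the module names (`backbone`, `head`), the convention that a module is the prefix before the first dot in a parameter name, and the choice to compare updates across clients rather than across rounds are all hypothetical. The summary does not say how similarity scores map to "slower-converging", so no selection rule is shown.

```python
import numpy as np

def module_of(param_name: str) -> str:
    # Hypothetical naming convention: "head.w" belongs to module "head".
    return param_name.split(".", 1)[0]

def flatten_by_module(update: dict) -> dict:
    """Concatenate all parameter deltas of each module into one flat vector."""
    groups = {}
    for name in sorted(update):
        groups.setdefault(module_of(name), []).append(update[name].ravel())
    return {module: np.concatenate(parts) for module, parts in groups.items()}

def cosine(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> float:
    # eps guards against division by zero for all-zero updates.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def module_similarity(client_updates: list) -> dict:
    """Mean pairwise cosine similarity across clients, computed per module."""
    flat = [flatten_by_module(u) for u in client_updates]
    scores = {}
    for module in flat[0]:
        sims = [
            cosine(flat[i][module], flat[j][module])
            for i in range(len(flat))
            for j in range(i + 1, len(flat))
        ]
        scores[module] = float(np.mean(sims))
    return scores

# Toy example: two clients agree on the backbone update and disagree on the head.
updates = [
    {"backbone.w": np.array([1.0, 2.0]), "head.w": np.array([1.0, 0.0])},
    {"backbone.w": np.array([1.0, 2.0]), "head.w": np.array([-1.0, 0.0])},
]
scores = module_similarity(updates)
```

In this toy case the backbone score is close to 1 and the head score is close to -1, which shows how per-module scores can expose differing learning dynamics inside one model.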
Key results
- Module-wise cosine similarity analysis reveals heterogeneous learning dynamics in grasp pose detection models
- Two-phase algorithm adaptively allocates communication effort to slower-converging modules
- Achieves higher accuracy than FedAvg and other baselines on GraspNet-1B for a fixed communication budget
- Demonstrates superior grasp success rates in real-world physical robot experiments
Why it matters
Enables privacy-preserving, communication-efficient training of robust grasp models for decentralized robotic systems without requiring centralized data collection.
Abstract
Grasp pose detection (GPD) is a fundamental capability for robotic autonomy, but its reliance on large, diverse datasets creates significant data privacy and centralization challenges. Federated Learning (FL) offers a privacy-preserving solution, but its application to GPD is hindered by the substantial communication overhead of large models, a key issue for resource-constrained robots. To address this, we propose a novel module-wise FL framework that begins by analyzing the learning dynamics of the GPD model’s functional components. This analysis identifies slower-converging modules, to which our framework then allocates additional communication effort. This is realized through a two-phase process: a standard full-model training phase is followed by a communication-efficient phase where only an adaptively identified subset of slower-converging modules is trained and their partial updates are aggregated. Extensive experiments on the GraspNet-1B dataset demonstrate that our method outperforms standard FedAvg and other baselines, achieving higher accuracy for a given communication budget. Furthermore, real-world experiments on a physical robot validate our approach, showing a superior grasp success rate compared to baseline methods in cluttered scenes. Our work presents a communication-efficient framework for training robust, generalized GPD models in a decentralized manner, effectively improving the trade-off between communication cost and model performance.
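The second phase described above, in which only a subset of modules is trained and aggregated, can be sketched as a FedAvg-style weighted average restricted to the selected modules. This is a minimal sketch under assumptions rather than the paper's algorithm: the module names, the prefix-based mapping from parameters to modules, and weighting by client dataset size are hypothetical, and the set of active modules is passed in directly because the abstract does not specify the selection rule.

```python
import numpy as np

def module_of(param_name: str) -> str:
    # Hypothetical naming convention: "head.w" belongs to module "head".
    return param_name.split(".", 1)[0]

def aggregate(global_params, client_params, client_sizes, active_modules=None):
    """Weighted average of client parameters.

    With active_modules=None every parameter is averaged (full-model phase).
    Otherwise only parameters of the listed modules are averaged, and all
    other parameters keep their current global value (partial phase).
    """
    weights = np.asarray(client_sizes, dtype=float)
    weights = weights / weights.sum()
    new_params = {}
    for name, value in global_params.items():
        if active_modules is None or module_of(name) in active_modules:
            new_params[name] = sum(w * cp[name] for w, cp in zip(weights, client_params))
        else:
            new_params[name] = value.copy()
    return new_params

def upload_size(params, active_modules=None):
    """Number of scalars one client would upload in a round."""
    return sum(
        v.size for n, v in params.items()
        if active_modules is None or module_of(n) in active_modules
    )

# Toy example: in the partial phase, clients send only the "head" module.
global_params = {"backbone.w": np.zeros(4), "head.w": np.zeros(2)}
clients = [{"head.w": np.array([1.0, 1.0])}, {"head.w": np.array([3.0, 3.0])}]
sizes = [1, 3]  # hypothetical per-client sample counts

new_params = aggregate(global_params, clients, sizes, active_modules={"head"})
full_cost = upload_size(global_params)               # all parameters
partial_cost = upload_size(global_params, {"head"})  # selected module only
```

In the toy example each client uploads 2 scalars instead of 6 per round, which is the kind of saving that lets a fixed communication budget cover more rounds on the modules that still need training.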