ICRA 2026

Communication-Efficient Module-Wise Federated Learning for Grasp Pose Detection in Cluttered Environments

Woonsang Kang, Joohyung Lee, Seungjun Kim, Jungchan Cho, Yoonseon Oh

AI summary

Focusing communication resources on adaptively identified slower-converging modules significantly improves grasp pose detection accuracy under tight communication budgets.

Federated Learning, Grasp Pose Detection, Communication Efficiency, Module-wise Training, Robotic Manipulation

Problem

Grasp pose detection requires large, diverse datasets that raise privacy and centralization concerns, while standard federated learning suffers from prohibitive communication overhead for resource-constrained robots.

Approach

A two-phase federated learning framework that analyzes module-wise update similarity to identify slower-converging components and restricts subsequent training and communication to only those modules.
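The similarity analysis behind this approach can be illustrated with a minimal sketch. The summary does not give the exact selection criterion, so everything below is an assumption: the sketch compares each module's aggregated update across two consecutive rounds and treats a module whose updates keep pointing in the same direction (cosine similarity at or above a threshold) as still converging. The module names, the threshold value, and the function names are hypothetical and do not come from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two flat update vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero update carries no direction
    return dot / (norm_u * norm_v)


def module_similarities(prev_update, curr_update):
    """Per-module similarity between two consecutive aggregated updates."""
    return {
        name: cosine_similarity(prev_update[name], curr_update[name])
        for name in curr_update
    }


def select_slow_modules(similarities, threshold=0.5):
    """ASSUMPTION (not from the paper): a module whose consecutive updates
    stay aligned is still making steady progress, i.e. slower-converging."""
    return sorted(name for name, s in similarities.items() if s >= threshold)


# Hypothetical two-module model with tiny update vectors.
prev_update = {"backbone": [1.0, 0.0], "grasp_head": [1.0, 1.0]}
curr_update = {"backbone": [0.0, 1.0], "grasp_head": [2.0, 2.0]}

sims = module_similarities(prev_update, curr_update)
slow = select_slow_modules(sims)
print(slow)
```

In this toy case the backbone updates are orthogonal across rounds while the grasp head updates are aligned, so only the grasp head would be kept for the second phase. A real criterion might instead compare updates across clients or use a relative ranking rather than a fixed threshold.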

Key results

Module-wise cosine similarity analysis reveals heterogeneous learning dynamics in grasp pose detection models
Two-phase algorithm adaptively allocates communication effort to slower-converging modules
Achieves higher accuracy than FedAvg and other baselines on GraspNet-1B for a fixed communication budget
Demonstrates superior grasp success rates in real-world physical robot experiments

Why it matters

Enables privacy-preserving, communication-efficient training of robust grasp models for decentralized robotic systems without requiring centralized data collection.

Abstract

Grasp pose detection (GPD) is a fundamental capability for robotic autonomy, but its reliance on large, diverse datasets creates significant data privacy and centralization challenges. Federated Learning (FL) offers a privacy-preserving solution, but its application to GPD is hindered by the substantial communication overhead of large models, a key issue for resource-constrained robots. To address this, we propose a novel module-wise FL framework that begins by analyzing the learning dynamics of the GPD model’s functional components. This analysis identifies slower-converging modules, to which our framework then allocates additional communication effort. This is realized through a two-phase process: a standard full-model training phase is followed by a communication-efficient phase where only an adaptively identified subset of slower-converging modules is trained and their partial updates are aggregated. Extensive experiments on the GraspNet-1B dataset demonstrate that our method outperforms standard FedAvg and other baselines, achieving higher accuracy for a given communication budget. Furthermore, real-world experiments on a physical robot validate our approach, showing a superior grasp success rate compared to baseline methods in cluttered scenes. Our work presents a communication-efficient framework for training robust, generalized GPD models in a decentralized manner, effectively improving the trade-off between communication cost and model performance.
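The second phase described in the abstract, where only the selected modules are trained and their partial updates are aggregated, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it applies a FedAvg-style average weighted by client dataset size to the selected modules only and leaves all other modules at their current global values. The module names, parameter values, and function names are hypothetical.

```python
def aggregate_selected(global_model, client_params, client_sizes, selected):
    """FedAvg-style weighted average restricted to the selected modules.

    global_model:  dict mapping module name -> list of floats
    client_params: one dict per client, holding only the selected modules
    client_sizes:  number of local samples per client (averaging weights)
    """
    total = sum(client_sizes)
    # Unselected modules are copied over unchanged.
    new_model = {name: list(params) for name, params in global_model.items()}
    for name in selected:
        dim = len(global_model[name])
        new_model[name] = [
            sum(n * params[name][i] for params, n in zip(client_params, client_sizes)) / total
            for i in range(dim)
        ]
    return new_model


def upload_cost(global_model, modules):
    """Number of parameters one client uploads per round for these modules."""
    return sum(len(global_model[name]) for name in modules)


# Hypothetical model: a 4-parameter backbone and a 2-parameter grasp head.
global_model = {"backbone": [0.0, 0.0, 0.0, 0.0], "grasp_head": [0.0, 0.0]}
selected = ["grasp_head"]

# Two clients send only the selected module; client 2 has 3x the data.
client_params = [{"grasp_head": [1.0, 2.0]}, {"grasp_head": [3.0, 4.0]}]
client_sizes = [1, 3]

new_model = aggregate_selected(global_model, client_params, client_sizes, selected)
full_cost = upload_cost(global_model, list(global_model))
partial_cost = upload_cost(global_model, selected)
print(new_model, full_cost, partial_cost)
```

In this toy setup each client uploads 2 parameters per round instead of 6, which is where the communication saving of the second phase comes from. The ratio in a real GPD model depends on how the parameters are split across modules.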

Index terms

Deep Learning in Grasping and Manipulation, Grasping