Object Detection Using Sim2Real Domain Randomization for Robotic Applications
Horváth, Dániel; Erdős, Gábor; Istenes, Zoltán; Horváth, Tomáš; Földi, Sándor
Abstract
Robots working in unstructured environments must be capable of sensing and interpreting their surroundings. One of the main obstacles to using deep-learning-based models in robotics is the lack of domain-specific labeled data for different industrial applications. In this article, we propose a sim2real transfer learning method based on domain randomization for object detection, with which labeled synthetic datasets of arbitrary size and object types can be generated automatically. Subsequently, a state-of-the-art convolutional neural network, YOLOv4, is trained to detect the different types of industrial objects. With the proposed domain randomization method, we could shrink the reality gap to a satisfactory level, achieving mAP50 scores of 86.32% and 97.38% in the zero-shot and one-shot transfer cases, respectively, on our manually annotated dataset containing 190 real images. Our solution is suitable for industrial use, as the data generation process takes less than 0.5 s per image and the training lasts only around 12 h on a GeForce RTX 2080 Ti GPU. Furthermore, it can reliably differentiate similar classes of objects with access to only one real image for training. To the best of our knowledge, this is the only work thus far satisfying these constraints.
Manuscript received 26 January 2022; revised 20 June 2022; accepted 24 August 2022. This work was supported in part by “Research on prime exploitation of the potential provided by the industrial digitalisation,” under Grant ED_18-2-2018-0006, and in part by the Ministry for Innovation and Technology and the National Research, Development and Innovation Office within the framework of the National Lab for Autonomous Systems.
The work of Tomáš Horváth was supported in part by the project “Application Domain Specific Highly Reliable IT Solutions” financed by the National Research, Development and Innovation Fund of Hungary under Grant TKP2020-NKA-06, and in part by the Thematic Excellence Programme under Grant 2020-4.1.1.-TKP2020 (National Challenges Subprogramme) funding scheme. This paper was recommended for publication by Associate Editor J. Civera and Editor W. Burgard upon evaluation of the reviewers’ comments. (Corresponding author: Dániel Horváth.)
Dániel Horváth is with the Centre of Excellence in Production Informatics and Control, Institute for Computer Science and Control, Eötvös Loránd Research Network, 1111 Budapest, Hungary, and also with the CoLocation Center for Academic and Industrial Cooperation, Eötvös Loránd University, 1117 Budapest, Hungary (e-mail: daniel.horvath@sztaki.hu).
Gábor Erdős is with the Centre of Excellence in Production Informatics and Control, Institute for Computer Science and Control, Eötvös Loránd Research Network, 1111 Budapest, Hungary, and also with the Department of Manufacturing Science and Engineering, Budapest University of Technology and Economics, 1111 Budapest, Hungary (e-mail: erdos.gabor@sztaki.hu).
Zoltán Istenes is with the CoLocation Center for Academic and Industrial Cooperation, Eötvös Loránd University, 1117 Budapest, Hungary (e-mail: istenes@inf.elte.hu).
Tomáš Horváth is with the Department of Data Science and Engineering, Eötvös Loránd University, 1117 Budapest, Hungary, and also with the Institute of Computer Science, Faculty of Science, Pavol Jozef Šafárik University, 040 01 Košice, Slovakia (e-mail: tomas.horvath@inf.elte.hu).
Sándor Földi is with the Centre of Excellence in Production Informatics and Control, Institute for Computer Science and Control, Eötvös Loránd Research Network, 1111 Budapest, Hungary (e-mail: sandorfoldi98@gmail.com).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TRO.2022.3207619. Digital Object Identifier 10.1109/TRO.2022.3207619