ICRA 2026

Trajectory Planning for UAV-Based Smart Farming Using Imitation-Based Triple Deep Q-Learning

Wencan Mao, Quanxi Zhou, Tomás Couso Coddou, Manabu Tsukada, Liu Yunling, Yusheng Ji

AI summary

The proposed ITDQN algorithm improves UAV trajectory planning for smart farming, raising the weed recognition rate by 4.43% and the data collection rate by 6.94% over a DDQN baseline.

Trajectory Planning, UAV, Smart Farming, Multi-Agent Reinforcement Learning, Imitation Learning, Deep Q-Network

Problem

Planning UAV trajectories for smart farming is hindered by environmental uncertainty, partial field observations, and strict battery limits, which degrade traditional optimization and standard multi-agent reinforcement learning methods.

Approach

The authors model the task as a Markov decision process and introduce ITDQN, a multi-agent reinforcement learning algorithm that uses elite imitation to cut exploration costs and a mediator Q-network to stabilize and accelerate training.

Key results

Formulation of UAV smart farming trajectory planning as a multi-agent Markov decision process
Development of ITDQN combining elite imitation and a mediator Q-network for stable MARL training
4.43% higher weed recognition rate compared to DDQN in simulated environments
6.94% higher data collection rate compared to DDQN validated in both simulation and real-world tests

Why it matters

Provides a scalable, battery-aware planning framework that enables more reliable and precise autonomous UAV operations for modern precision agriculture.

Abstract

Unmanned aerial vehicles (UAVs) have emerged as a promising auxiliary platform for smart agriculture, capable of simultaneously performing weed detection, recognition, and data collection from wireless sensors. However, trajectory planning for UAV-based smart agriculture is challenging due to the high uncertainty of the environment, partial observations, and limited battery capacity of UAVs. To address these issues, we formulate the trajectory planning problem as a Markov decision process (MDP) and leverage multi-agent reinforcement learning (MARL) to solve it. Furthermore, we propose a novel imitation-based triple deep Q-network (ITDQN) algorithm, which employs an elite imitation mechanism to reduce exploration costs and utilizes a mediator Q-network over a double deep Q-network (DDQN) to accelerate and stabilize training and improve performance. Experimental results in both simulated and real-world environments demonstrate the effectiveness of our solution. Moreover, our proposed ITDQN outperforms DDQN by 4.43% in weed recognition rate and 6.94% in data collection rate.
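For readers unfamiliar with the DDQN baseline named in the abstract, the following is a minimal sketch of the standard double DQN bootstrap target for a single transition. It is not the authors' ITDQN: the mediator Q-network and the elite imitation mechanism are not specified on this page, so they are deliberately left out. The function name `ddqn_target`, its parameters, and the example numbers are illustrative assumptions.

```python
from typing import Sequence


def ddqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Standard double DQN target for one transition (illustrative helper).

    next_q_online: Q-values of the next state from the online network.
    next_q_target: Q-values of the next state from the target network.
    """
    if done:
        # Terminal transition: no bootstrapping from the next state.
        return reward
    # The online network selects the greedy next action...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ...and the target network evaluates that action.
    return reward + gamma * next_q_target[best_action]


# Made-up numbers: the online network prefers action 1,
# so the target is computed as 1.0 + 0.9 * 0.3.
y = ddqn_target(1.0, [0.2, 0.9, 0.5], [0.4, 0.3, 0.8], gamma=0.9)
```

Separating action selection from action evaluation is what distinguishes double DQN from plain DQN, where a single network does both and tends to overestimate Q-values. According to the abstract, ITDQN adds a mediator Q-network on top of this arrangement.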

Index terms

Reinforcement Learning; Path Planning for Multiple Mobile Robots or Agents; Aerial Systems: Applications