The Mission
The mission of this project is to develop and evaluate safe reinforcement learning (Safe RL) techniques that enable autonomous drones to perform smooth, reliable landings on stable platforms with minimal risk of crashing. Unlike traditional reinforcement learning, which often prioritizes performance over safety during training, this project incorporates safety constraints directly into the learning process. The objective is not only to achieve high landing success rates but also to ensure the learning agent avoids unsafe behaviors, such as excessive tilting or violations of predefined altitude and velocity thresholds. By constructing a custom simulation environment and implementing multiple Safe RL methods, such as shielding, constrained policy optimization (CPO), and reward shaping, the project aims to investigate the effectiveness of these approaches and benchmark them against standard reinforcement learning methods like DQN and Actor-Critic algorithms.
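The safety constraints mentioned above (tilt, altitude, and velocity thresholds) could be encoded as a simple state predicate that the learning agent must respect. The sketch below is illustrative only: the state fields and the specific threshold values are assumptions, not the project's actual constraint definitions.

```python
from dataclasses import dataclass

@dataclass
class DroneState:
    altitude: float        # metres above the platform
    vertical_speed: float  # m/s, negative while descending
    tilt: float            # degrees from vertical

# Hypothetical constraint values; the real ones would come from the
# project's simulation environment and hardware limits.
MAX_TILT_DEG = 20.0      # assumed limit on excessive tilting
MAX_DESCENT_SPEED = 2.0  # assumed m/s limit near the platform
MIN_SAFE_ALTITUDE = 0.0  # below this, the drone has hit the ground

def is_safe(state: DroneState) -> bool:
    """True if the state satisfies all predefined safety constraints."""
    return (
        abs(state.tilt) <= MAX_TILT_DEG
        and abs(state.vertical_speed) <= MAX_DESCENT_SPEED
        and state.altitude >= MIN_SAFE_ALTITUDE
    )

print(is_safe(DroneState(altitude=1.5, vertical_speed=-0.8, tilt=5.0)))  # True
print(is_safe(DroneState(altitude=1.5, vertical_speed=-3.5, tilt=5.0)))  # False
```

A predicate like this can serve double duty: as a termination/penalty signal for reward shaping and as the safety specification that a shield enforces.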
The Challenge
Current reinforcement learning approaches for autonomous drone control typically do not prioritize safety during the training phase, resulting in high crash rates and unsafe behaviors. There is a critical need for training methods that ensure drones learn to operate within safe boundaries, particularly for tasks like precise autonomous landing on stable platforms, where minor errors can lead to system failure or hardware damage.
The Solution
The proposed solution is to investigate and implement Safe RL techniques that integrate safety constraints into the learning process, ensuring the agent avoids hazardous actions during both training and deployment. The project will begin by defining "safe" and "unsafe" actions through physical and operational constraints, then implement Safe RL methods such as shielding and constrained policy optimization in a simulated drone-landing environment. The performance and safety of these methods will be quantitatively evaluated against state-of-the-art standard RL algorithms using metrics such as crash rate, landing success rate, and cumulative reward.
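The shielding approach named above can be sketched in a few lines: before the environment executes the agent's proposed action, a shield predicts the next state with a simple model and overrides the action if the prediction violates the safety specification. The toy 1-D dynamics, function names, and fallback action below are all assumptions for illustration, not the project's actual implementation.

```python
G = 9.81  # gravitational acceleration, m/s^2

def predict_next_altitude(altitude: float, thrust_accel: float,
                          dt: float = 0.1) -> float:
    """Toy one-step forward model: net vertical acceleration is
    thrust minus gravity, starting from zero vertical speed."""
    net_accel = thrust_accel - G
    return altitude + 0.5 * net_accel * dt * dt

def shield(altitude: float, proposed_thrust: float,
           fallback_thrust: float = G) -> float:
    """Return the agent's action if its predicted outcome is safe;
    otherwise substitute a safe fallback (hover at ~1 g of thrust)."""
    if predict_next_altitude(altitude, proposed_thrust) >= 0.0:
        return proposed_thrust
    return fallback_thrust  # veto: predicted ground impact

# High up, cutting thrust is still predicted-safe, so it passes through;
# just above the platform, the same action is vetoed in favour of hovering.
print(shield(altitude=10.0, proposed_thrust=0.0))  # 0.0
print(shield(altitude=0.01, proposed_thrust=0.0))  # 9.81
```

The key design point is that the shield acts at execution time, so unsafe actions are blocked even while the policy is still untrained, which is what distinguishes shielding from purely reward-based penalties.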