Autonomous Rescue Drone with Multi-Modal Vision and Cognitive Agentic Architecture
Abstract
In post-disaster search and rescue (SAR) operations, unmanned aerial vehicles (UAVs) are essential tools, yet the large volume of raw visual data they produce often overwhelms human operators with isolated, context-free information. This paper presents a system with a cognitive-agentic architecture that transforms the UAV into an intelligent, proactive partner. The proposed modular architecture integrates specialized software agents for reasoning, coordinated by a large language model (LLM) that acts as an orchestrator handling high-level reasoning, logical validation, and self-correcting feedback loops. A visual perception module based on a custom-trained YOLO11 model feeds the cognitive core, enabling a complete perception–reasoning–action cycle. The system also incorporates a physical payload-delivery module for first-aid supplies, accelerating victim assistance through prioritized, actionable recommendations. This work thus presents the first LLM-driven architecture of its kind to transform a drone from a mere data-gathering tool into a proactive reasoning partner, demonstrating a viable path toward reducing operator cognitive load in critical missions.
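To make the perception–reasoning–action cycle concrete, the sketch below shows one plausible shape for such a loop: a detector feeds structured detections to an LLM orchestrator, which validates and prioritizes a plan before the drone acts on it. All names here (Detection, MockDetector, MockLLM, Drone) are hypothetical stand-ins for illustration only; the abstract does not specify the paper's actual interfaces, which in the real system involve a custom-trained YOLO11 detector and an LLM orchestrator.

```python
# Minimal, self-contained sketch of a perception-reasoning-action cycle.
# Every class here is a hypothetical stand-in, not the paper's actual code.

from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # object class, e.g. "person" or "debris"
    confidence: float  # detector confidence in [0, 1]


class MockDetector:
    """Stand-in for the YOLO11-based visual perception module."""

    def detect(self, frame):
        # A real implementation would run YOLO11 inference on the camera frame.
        return [Detection("person", 0.91), Detection("debris", 0.75)]


class MockLLM:
    """Stand-in for the LLM orchestrator with validation/self-correction."""

    def plan(self, detections):
        # Reasoning: rank detected people by confidence (proxy for priority)
        # and attach an action to each.
        people = sorted(
            (d for d in detections if d.label == "person"),
            key=lambda d: d.confidence,
            reverse=True,
        )
        plan = [("deliver_first_aid", d) for d in people]
        # Logical validation / self-correction: discard low-confidence targets
        # instead of forwarding them to the operator.
        return [(action, d) for action, d in plan if d.confidence >= 0.5]


class Drone:
    """Stand-in for flight control and the payload-delivery module."""

    def execute(self, action, target):
        print(f"{action} -> {target.label} (conf {target.confidence:.2f})")


if __name__ == "__main__":
    detector, llm, drone = MockDetector(), MockLLM(), Drone()
    frame = None                                   # placeholder camera frame
    detections = detector.detect(frame)            # perception
    for action, target in llm.plan(detections):    # reasoning + validation
        drone.execute(action, target)              # action
```

In this shape, the operator sees only the validated, prioritized plan rather than the raw detection stream, which is how the architecture aims to reduce cognitive load.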