TransVisDrone: Spatio-Temporal Transformers for Drone-to-Drone Detection in Aerial Videos
Tushar Sangam
Ishan R. Dave
Waqas Sultani
Mubarak Shah
[Paper]
[GitHub]

Abstract

Drone-to-drone detection using a visual feed has crucial applications, such as detecting drone collisions, detecting drone attacks, or coordinating flight with other drones. However, existing methods are computationally costly, follow non-end-to-end optimization, and have complex multi-stage pipelines, making them less suitable for real-time deployment on edge devices. In this work, we propose a simple yet effective framework, TransVisDrone, that provides an end-to-end solution with higher computational efficiency. We utilize the CSPDarkNet-53 network to learn object-related spatial features and the VideoSwin model to improve drone detection in challenging scenarios by learning spatio-temporal dependencies of drone motion. Our method achieves state-of-the-art performance on three challenging real-world datasets (Average Precision@0.5IOU): NPS 0.95, FLDrones 0.75, and AOT 0.80, with higher throughput than previous methods. We also demonstrate its deployment capability on edge devices and its usefulness in detecting drone-collision (encounter) scenarios.



Qualitative Visualizations


[Slides]

Method Overview

  • The method works in an online fashion
  • A clip is sampled from continuous drone footage
  • Temporally consistent augmentations are then applied to the clip
  • Spatial features are extracted from each individual frame of the clip
  • Efficient spatio-temporal attention is applied using 3D Swin Transformer layers
  • Post-processing with NMS removes false positives and low-confidence detections
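The final post-processing step above can be sketched as standard non-maximum suppression: drop low-confidence boxes, then greedily keep the highest-scoring box and discard any remaining box that overlaps it too much. The sketch below is a minimal illustration of this idea, not the authors' exact implementation; the thresholds and box format (`[x1, y1, x2, y2]`) are assumptions for the example.

```python
def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(detections, iou_thresh=0.5, score_thresh=0.3):
    """detections: list of (box, score); returns the kept detections."""
    # Discard low-confidence detections first.
    dets = [d for d in detections if d[1] >= score_thresh]
    # Process boxes from highest to lowest confidence.
    dets.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        # Keep a box only if it does not heavily overlap an already-kept box.
        if all(iou(box, k_box) < iou_thresh for k_box, _ in kept):
            kept.append((box, score))
    return kept
```

For example, two near-duplicate detections of the same drone collapse to the single higher-scoring one, while a detection of a second drone elsewhere in the frame is retained.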

 [GitHub]


Paper and Supplementary Material

Tushar Sangam, Ishan Dave, Waqas Sultani, Mubarak Shah.
TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos
In IEEE International Conference on Robotics and Automation (ICRA), 2023, London.
(hosted on ArXiv)


[Bibtex]


Acknowledgements

This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.