This assignment has two graded tasks:
  • Drone object detection — 50 points
  • Kalman filter tracking — 50 points

Overview and learning objectives

Multi-Object Tracking (MOT) is a core visual ability that humans rely on to perform motor tasks and coordinate activities in dynamic environments. The AI community has recognized the importance of MOT through a series of benchmark competitions. In this assignment, the target object class is the drone: you will detect drones in video footage and track them using Kalman filters, situating probabilistic reasoning in the physical-security domain. By completing this assignment, you will:
  • Identify and use a drone-specific object detection dataset.
  • Fine-tune or configure a deep learning detector for the drone class.
  • Implement a Kalman filter to track detections across frames.
  • Visualize 2D trajectories superimposed on video.

Test videos

The following two videos are your primary test inputs. Download them locally before starting.
Use yt-dlp to download them:
Step 1: Install ffmpeg and yt-dlp

brew install ffmpeg yt-dlp
Step 2: Download a video

yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]" \
  -o "drone_video_1.mp4" \
  "https://www.youtube.com/watch?v=DhmZ6W1UAv4"
Repeat for the second video.
Step 3: Extract frames

mkdir -p frames
ffmpeg -i drone_video_1.mp4 -vf "fps=5" frames/frame_%04d.jpg
Sampling at 5 fps is a reasonable starting point. Adjust based on drone speed.

Task 1: Drone object detection (50 points)

Dataset

Find a dataset that contains labeled drone bounding boxes. Be careful to distinguish between:
  • Datasets that detect objects from drones (aerial imagery) — not what you want.
  • Datasets that detect the drone itself — what you want.
When searching for candidate datasets, prefer those distributed in Parquet or in standard COCO/YOLO annotation formats for ease of loading.
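If you choose a YOLO-format dataset, each label file stores one normalized box per line as `class cx cy w h`. A minimal converter to pixel-space corner coordinates (the function name is illustrative, not part of any required API):

```python
def yolo_to_xyxy(line, img_w, img_h):
    """Convert one YOLO label line ('cls cx cy w h', values normalized
    to [0, 1]) into (class_id, x1, y1, x2, y2) in pixel coordinates."""
    cls, cx, cy, w, h = line.split()
    cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
    x1 = (cx - w / 2) * img_w   # left edge: center minus half width
    y1 = (cy - h / 2) * img_h   # top edge
    x2 = (cx + w / 2) * img_w   # right edge
    y2 = (cy + h / 2) * img_h   # bottom edge
    return int(cls), x1, y1, x2, y2
```

Having boxes in pixel-space corners makes it easy to feed the box center into the tracker later.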

Detector

You must use a deep learning model. A pretrained detector architecture is a recommended starting point; fine-tuning it on a drone dataset is encouraged but not required.

Deliverable

Split each video into frames and run your detector on every frame. Save all frames that contain at least one detection to a folder called detections/. Write your code so that it processes all .mp4 files in a given directory, not just the two test videos.
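One way to structure this deliverable is to separate the "keep frames with detections" logic from the video I/O, so the selection rule is testable on its own. In this sketch, `detect` is a stand-in for your trained model: any callable mapping a BGR frame to a (possibly empty) list of `(x1, y1, x2, y2)` boxes. All names here are illustrative, not a required API.

```python
from pathlib import Path

def select_detected(frames, detect):
    """Yield (index, frame) for every frame with at least one detection."""
    for i, frame in enumerate(frames):
        if detect(frame):                 # non-empty box list -> keep this frame
            yield i, frame

def process_directory(video_dir, detect, out_dir="detections"):
    """Run `detect` over every .mp4 in video_dir and save hit frames as JPEGs."""
    import cv2                            # OpenCV; imported here so select_detected stays dependency-free

    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for video in sorted(Path(video_dir).glob("*.mp4")):
        cap = cv2.VideoCapture(str(video))

        def frames(cap=cap):              # lazily decode frames one at a time
            while True:
                ok, frame = cap.read()
                if not ok:
                    return
                yield frame

        for i, frame in select_detected(frames(), detect):
            cv2.imwrite(str(out / f"{video.stem}_{i:04d}.jpg"), frame)
        cap.release()
```

Because `process_directory` globs over `*.mp4`, the same code runs unchanged on any directory of videos, which is exactly the generality the grading rubric asks for.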

Task 2: Kalman filter tracking (50 points)

Use the filterpy library to implement a Kalman filter that tracks the drone across frames. Initialize the filter with detections from Task 1. Your state vector should represent at minimum the 2D pixel position of the drone bounding box center (and optionally its velocity). For each track:
  1. Predict the next state using the Kalman filter motion model.
  2. Update the state using the detector output for that frame.
  3. Handle missing detections — the filter must continue predicting even when the detector misses the drone for a small number of consecutive frames.

Deliverable

Produce one output video per input video. Each output video must contain only the frames where the drone is present and must overlay:
  • The detector bounding box.
  • The 2D trajectory as a polyline connecting the tracker-estimated center positions across frames.
Use ffmpeg and OpenCV to compose the output.
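A possible shape for the overlay step, assuming you have per-frame boxes and the tracker's center history: keep the polyline bookkeeping pure-Python (testable) and do the OpenCV drawing and encoding at the edge. Names and color choices are illustrative:

```python
def trail_points(centers, max_len=50):
    """Round the most recent estimated centers to pixel ints,
    ready to draw as a trajectory polyline."""
    return [(int(round(x)), int(round(y))) for x, y in centers[-max_len:]]

def overlay_video(frames_with_tracks, out_path, fps=5):
    """Write an output video overlaying the detector box and trajectory.

    `frames_with_tracks` yields (frame, box, centers), with frame a BGR
    image, box = (x1, y1, x2, y2), and centers the track-center history.
    """
    import cv2                         # OpenCV; imported here so trail_points stays dependency-free
    import numpy as np
    writer = None
    for frame, box, centers in frames_with_tracks:
        if writer is None:             # open the writer lazily at the first frame's size
            h, w = frame.shape[:2]
            writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
        x1, y1, x2, y2 = (int(v) for v in box)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)       # detector box
        pts = np.array(trail_points(centers), dtype=np.int32)
        if len(pts) >= 2:
            cv2.polylines(frame, [pts], isClosed=False,
                          color=(0, 0, 255), thickness=2)              # 2D trajectory
        writer.write(frame)
    if writer is not None:
        writer.release()
```

Capping the trail with `max_len` keeps the polyline readable on long videos; if the `mp4v` output does not play everywhere, a final ffmpeg re-encode to H.264 is a common fix.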

Evaluation criteria

| Criterion | Description |
| --- | --- |
| Detection quality | Are detections consistent and semantically correct (drone class, not background)? |
| Tracker correctness | Does the Kalman filter correctly predict and update across frames? |
| Trajectory visualization | Is the 2D trajectory clearly superimposed on the output video? |
| Code generality | Does the pipeline process any directory of .mp4 files, not just the test videos? |
| Report clarity | Can you explain your detector choice, filter design, and failure cases? |

Deliverables

  1. A Hugging Face dataset containing the detections/ sample frames (Parquet format).
  2. Output tracking videos (one per test input) uploaded to your personal YouTube channel and embedded in your README.
  3. A README.md in your submission repository covering:
    • Dataset choice and detector configuration.
    • Kalman filter state design and noise parameters.
    • Failure cases and how the tracker handles missed detections.