
Faster R-CNN Hyperparameter Optimization with Optuna + W&B (COCO MiniTrain) and Small-Object Transfer (Drones)

This notebook is the deliverable for the assignment:
  • Training dataset: COCO MiniTrain (COCO-format subset)
    https://github.com/giddyyupp/coco-minitrain
  • Optimization engine: Optuna (TPE + pruning)
  • Experiment tracking: Weights & Biases (W&B) (logging + dashboards)
  • Generalization test: drone small-object detection (Assignment 3 dataset)
You will:
  1. Run a baseline Faster R-CNN on COCO MiniTrain.
  2. Run stage-wise hyperparameter optimization with Optuna.
  3. Log all runs to W&B and analyze them in the W&B UI.
  4. Evaluate the tuned detector on the drone dataset and discuss transfer.

What you must submit

  • A shareable link to your W&B project (public or access granted to the TA).
  • This notebook (executed), including:
    • baseline run
    • Optuna study runs with pruning
    • final 3-seed retraining
    • drone evaluation (baseline vs tuned)
    • analysis cells (plots + written answers)

Metrics

You must report COCO-style metrics:
  • $\mathrm{mAP}$ (COCO mAP@[0.5:0.95])
  • $\mathrm{AP}_{50}$ and $\mathrm{AP}_{75}$
  • Recall (COCO AR or a simpler recall estimate)

Objective (default)

You will optimize validation COCO mAP: $\max_{\theta}\;\mathrm{mAP}_{\text{val}}(\theta)$, where $\theta$ denotes the hyperparameters under search.
If you want to trade off latency, define a scalarized objective: $J(\theta)=\mathrm{mAP}_{\text{val}}(\theta)-\lambda\,\mathrm{Latency}(\theta)$. In that case, you must define $\lambda$ and measure latency consistently.
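The scalarized objective is one line of Python. In the sketch below, `LAMBDA` and the millisecond units are illustrative assumptions, not assignment-mandated values; you must pick and document your own.

```python
# Hypothetical scalarization helper. LAMBDA (penalty per millisecond of
# per-image latency) is an assumed value you must choose and keep fixed.
LAMBDA = 0.001

def scalarized_objective(val_map: float, latency_ms: float, lam: float = LAMBDA) -> float:
    """J(theta) = mAP_val(theta) - lambda * Latency(theta)."""
    return val_map - lam * latency_ms
```

For example, 0.30 mAP at 50 ms/image scores J = 0.30 - 0.001 * 50 = 0.25, the same as 0.25 mAP at zero latency; that equivalence is exactly what choosing λ encodes.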

0. Colab setup

  1. Enable GPU: Runtime → Change runtime type → GPU
  2. Install packages
  3. Login to W&B

# If you need to install packages, do it here (Colab).
# !pip -q install torch torchvision
# !pip -q install pycocotools
# !pip -q install optuna
# !pip -q install wandb

import os, json, random, time
from dataclasses import dataclass, asdict
from typing import Dict, Any, List, Tuple, Optional

import numpy as np
import torch
import torchvision
from torchvision.transforms import functional as F

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("cuda available:", torch.cuda.is_available())
torch: 2.7.1+cu128
torchvision: 0.22.1+cu128
cuda available: True
# W&B authentication
# In Docker: WANDB_API_KEY is set in the environment automatically.
# In Colab: call wandb.login() interactively.
import wandb

if os.environ.get('WANDB_API_KEY'):
    wandb.login(key=os.environ['WANDB_API_KEY'])
    print('W&B: authenticated via WANDB_API_KEY env var')
else:
    wandb.login()
    print('W&B: interactive login')
wandb: WARNING If you're specifying your api key in code, ensure this code is not shared publicly.
wandb: WARNING Consider setting the WANDB_API_KEY environment variable, or running `wandb login` from the command line.
wandb: [wandb.login()] Using explicit session credentials for https://api.wandb.ai.
wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /home/vscode/.netrc
wandb: Currently logged in as: pantelis to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
W&B: authenticated via WANDB_API_KEY env var

1. Reproducibility (required)

You must fix and log:
  • random seeds
  • dataset split indices
  • code version (commit hash, if applicable)
You will run final training with 3 different seeds and report $\text{mean mAP} \pm \text{std}$.
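The aggregation itself is a few lines of NumPy. The mAP values below are placeholders to show the reporting format, not real results.

```python
import numpy as np

# Placeholder results from the 3 required final-training seeds.
seed_maps = {1337: 0.301, 1338: 0.297, 1339: 0.305}

vals = np.array(list(seed_maps.values()))
# Sample standard deviation (ddof=1) is the conventional choice for n=3 runs.
print(f"mean mAP = {vals.mean():.4f} +/- {vals.std(ddof=1):.4f} (n={len(vals)})")
```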

def set_global_seed(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

def seed_worker(worker_id: int) -> None:
    # Deterministic DataLoader workers
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

BASE_SEED = 1337
set_global_seed(BASE_SEED)

2. Dataset: COCO MiniTrain

Clone the dataset repo and set the paths below (COCO MiniTrain repository: https://github.com/giddyyupp/coco-minitrain). You will create a deterministic train/val split.

Required outputs

  • train_ids.json and val_ids.json saved to disk
  • logged to W&B as artifacts (optional but encouraged)
# --- Dataset: COCO MiniTrain ---
#
# Sampling methodology: https://github.com/giddyyupp/coco-minitrain
#   Statistically samples N images from COCO 2017 train preserving class/size distributions.
#
# Pre-sampled subsets:
#   HF repo : https://huggingface.co/datasets/bryanbocao/coco_minitrain
#   Files   : coco_minitrain_10k.zip (9 GB) | _15k | _20k | _25k
#   Format  : YOLO labels + JPEG images (no COCO JSON included)
#
# This cell downloads the 10k subset, then generates a COCO JSON annotation file
# by filtering the official COCO 2017 train annotations to the 10k image IDs.
# For the full assignment run change HF_DATASET_FILE to "coco_minitrain_25k.zip".

import os, time, zipfile, json as _json
import requests as _requests
from huggingface_hub import hf_hub_download

HF_DATASET_REPO  = "bryanbocao/coco_minitrain"
HF_DATASET_FILE  = "coco_minitrain_10k.zip"   # change to _25k for full run
COCO_ANN_URL     = "http://images.cocodataset.org/annotations/annotations_trainval2017.zip"

_IS_COLAB    = os.path.isdir("/content")
DATASET_BASE = "/content" if _IS_COLAB else "/workspaces/eng-ai-agents/data"
EXTRACT_ROOT = os.path.join(DATASET_BASE, "coco_minitrain")

COCO_MINITRAIN_ROOT = None
IMAGES_DIR          = None
ANN_JSON            = None
DATASET_READY       = False


def _hf_download_with_retry(repo_id, filename, repo_type, local_dir,
                             max_retries=5, base_wait=60):
    for attempt in range(max_retries):
        try:
            return hf_hub_download(repo_id=repo_id, filename=filename,
                                   repo_type=repo_type, local_dir=local_dir)
        except Exception as e:
            if "429" in str(e) and attempt < max_retries - 1:
                wait = base_wait * (2 ** attempt)
                print(f"  Rate-limited (attempt {attempt+1}/{max_retries}). Waiting {wait}s ...")
                time.sleep(wait)
            else:
                raise
    raise RuntimeError("hf_hub_download: max retries exceeded")


def _build_coco_json(images_dir, full_ann_path, out_path):
    """Filter instances_train2017.json to the image IDs present in images_dir."""
    img_files = [f for f in os.listdir(images_dir) if f.lower().endswith(".jpg")]
    present_ids = {int(os.path.splitext(f)[0]) for f in img_files}
    print(f"  Found {len(present_ids)} images in {images_dir}")
    print(f"  Loading full COCO annotations from {full_ann_path} ...")
    with open(full_ann_path) as f:
        full = _json.load(f)
    imgs  = [im for im in full["images"]      if im["id"] in present_ids]
    anns  = [an for an in full["annotations"] if an["image_id"] in present_ids]
    mini  = {
        "info":        full.get("info", {}),
        "licenses":    full.get("licenses", []),
        "categories":  full["categories"],
        "images":      imgs,
        "annotations": anns,
    }
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    with open(out_path, "w") as f:
        _json.dump(mini, f)
    print(f"  Wrote {len(imgs)} images / {len(anns)} annotations → {out_path}")


def _ensure_coco_full_annotations(ann_dir):
    """Download and extract official COCO 2017 train annotations if missing.
    The COCO zip extracts to an 'annotations/' subdir, so the final path is
    ann_dir/annotations/instances_train2017.json.
    """
    target = os.path.join(ann_dir, "annotations", "instances_train2017.json")
    if os.path.exists(target):
        return target
    os.makedirs(ann_dir, exist_ok=True)
    zip_path = os.path.join(ann_dir, "annotations_trainval2017.zip")
    print(f"  Downloading COCO 2017 annotations (~253 MB) ...")
    with _requests.get(COCO_ANN_URL, stream=True, timeout=120) as r:
        r.raise_for_status()
        with open(zip_path, "wb") as f:
            for chunk in r.iter_content(1 << 20):
                f.write(chunk)
    print("  Extracting ...")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(ann_dir)
    os.remove(zip_path)
    return target


# ── Step 1: locate or download+extract the HF zip ────────────────────────────
extract_dir = os.path.join(EXTRACT_ROOT, HF_DATASET_FILE.replace(".zip", ""))
zip_local   = os.path.join(EXTRACT_ROOT, HF_DATASET_FILE)

if os.path.isdir(extract_dir) and os.listdir(extract_dir):
    print(f"Cached extraction found: {extract_dir}")
else:
    os.makedirs(EXTRACT_ROOT, exist_ok=True)
    try:
        if os.path.exists(zip_local):
            print(f"Zip already downloaded: {zip_local}")
        else:
            print(f"Downloading {HF_DATASET_FILE} from {HF_DATASET_REPO} ...")
            zip_local = _hf_download_with_retry(
                repo_id=HF_DATASET_REPO, filename=HF_DATASET_FILE,
                repo_type="dataset", local_dir=EXTRACT_ROOT,
            )
            print(f"Download complete: {zip_local}")
        print(f"Extracting to {extract_dir} ...")
        os.makedirs(extract_dir, exist_ok=True)
        with zipfile.ZipFile(zip_local, "r") as zf:
            zf.extractall(extract_dir)
        print("Extraction complete.")
    except Exception as e:
        print(f"HF download/extraction failed: {e}")
        extract_dir = None

# ── Step 2: locate train2017 images dir ───────────────────────────────────────
if extract_dir and os.path.isdir(extract_dir):
    # Zip extracts to coco_minitrain_10k/coco_minitrain_10k/images/train2017/
    train_imgs = os.path.join(extract_dir, os.path.basename(extract_dir),
                              "images", "train2017")
    if not os.path.isdir(train_imgs):
        best_dir, best_n = extract_dir, 0
        for dp, _, files in os.walk(extract_dir):
            n = sum(1 for f in files if f.lower().endswith(".jpg"))
            if n > best_n:
                best_dir, best_n = dp, n
        train_imgs = best_dir if best_n > 10 else None

    if train_imgs:
        IMAGES_DIR = train_imgs
        # ── Step 3: build COCO JSON if not present ─────────────────────────
        ann_dir  = os.path.join(extract_dir, "annotations")
        ann_json = os.path.join(ann_dir, "instances_minitrain.json")
        if not os.path.exists(ann_json):
            full_ann_cache = os.path.join(EXTRACT_ROOT, "coco_full_annotations")
            full_ann_path  = _ensure_coco_full_annotations(full_ann_cache)
            _build_coco_json(IMAGES_DIR, full_ann_path, ann_json)
        else:
            print(f"Annotation JSON already exists: {ann_json}")
        ANN_JSON            = ann_json
        COCO_MINITRAIN_ROOT = extract_dir
        DATASET_READY       = True
        print(f"Dataset ready:")
        print(f"  Root:        {COCO_MINITRAIN_ROOT}")
        print(f"  Annotations: {ANN_JSON}")
        print(f"  Images:      {IMAGES_DIR}")
    else:
        print(f"WARNING: could not locate train2017 images inside {extract_dir}")

# ── Step 4: fallback — git clone (annotations only) ──────────────────────────
if not DATASET_READY:
    print("\nFalling back to git clone of coco-minitrain (annotations only).")
    for p in ["coco-minitrain", "/content/coco-minitrain",
               os.path.expanduser("~/coco-minitrain")]:
        if os.path.isdir(p):
            COCO_MINITRAIN_ROOT = p
            break
    if COCO_MINITRAIN_ROOT is None:
        os.system("git clone --depth 1 https://github.com/giddyyupp/coco-minitrain.git")
        COCO_MINITRAIN_ROOT = "coco-minitrain"
    IMAGES_DIR = os.path.join(COCO_MINITRAIN_ROOT, "images")
    ANN_JSON   = os.path.join(COCO_MINITRAIN_ROOT, "annotations", "instances_minitrain.json")
    if os.path.exists(ANN_JSON):
        DATASET_READY = True
        print(f"Using local clone: {COCO_MINITRAIN_ROOT}")
    else:
        print("WARNING: Dataset not ready — no annotation file found.")

print(f"\nDATASET_READY : {DATASET_READY}")
if DATASET_READY:
    print(f"ANN_JSON      : {ANN_JSON}")
    print(f"IMAGES_DIR    : {IMAGES_DIR}")
Cached extraction found: /workspaces/eng-ai-agents/data/coco_minitrain/coco_minitrain_10k
  Found 10000 images in /workspaces/eng-ai-agents/data/coco_minitrain/coco_minitrain_10k/coco_minitrain_10k/images/train2017
  Loading full COCO annotations from /workspaces/eng-ai-agents/data/coco_minitrain/coco_full_annotations/annotations/instances_train2017.json ...
  Wrote 10000 images / 72944 annotations → /workspaces/eng-ai-agents/data/coco_minitrain/coco_minitrain_10k/annotations/instances_minitrain.json
Dataset ready:
  Root:        /workspaces/eng-ai-agents/data/coco_minitrain/coco_minitrain_10k
  Annotations: /workspaces/eng-ai-agents/data/coco_minitrain/coco_minitrain_10k/annotations/instances_minitrain.json
  Images:      /workspaces/eng-ai-agents/data/coco_minitrain/coco_minitrain_10k/coco_minitrain_10k/images/train2017

DATASET_READY : True
ANN_JSON      : /workspaces/eng-ai-agents/data/coco_minitrain/coco_minitrain_10k/annotations/instances_minitrain.json
IMAGES_DIR    : /workspaces/eng-ai-agents/data/coco_minitrain/coco_minitrain_10k/coco_minitrain_10k/images/train2017

from pycocotools.coco import COCO
from PIL import Image

coco = COCO(ANN_JSON)
img_ids = sorted(coco.getImgIds())
print("num images:", len(img_ids))

# Deterministic split
val_frac = 0.2
rng = np.random.default_rng(BASE_SEED)
perm = rng.permutation(len(img_ids))
n_val = int(len(img_ids) * val_frac)
val_ids = [img_ids[i] for i in perm[:n_val]]
train_ids = [img_ids[i] for i in perm[n_val:]]

print("train:", len(train_ids), "val:", len(val_ids))

SPLIT_DIR = os.path.join(COCO_MINITRAIN_ROOT, "splits")
os.makedirs(SPLIT_DIR, exist_ok=True)
with open(os.path.join(SPLIT_DIR, "train_ids.json"), "w") as f:
    json.dump(train_ids, f)
with open(os.path.join(SPLIT_DIR, "val_ids.json"), "w") as f:
    json.dump(val_ids, f)

print("Saved split ids to:", SPLIT_DIR)
loading annotations into memory...
Done (t=1.42s)
creating index...
index created!
num images: 10000
train: 8000 val: 2000
Saved split ids to: /workspaces/eng-ai-agents/data/coco_minitrain/coco_minitrain_10k/splits

3. PyTorch dataset and transforms

You must keep transforms simple initially. Use augmentations only after baseline correctness is established. Recommended minimal transforms:
  • Convert to tensor
  • (Optional) resize to a fixed shorter side (be consistent across runs)

from torch.utils.data import Dataset, DataLoader

class CocoMiniTrainDataset(Dataset):
    def __init__(self, coco: COCO, image_dir: str, img_ids: List[int], train: bool = True):
        self.coco = coco
        self.image_dir = image_dir
        self.img_ids = img_ids
        self.train = train

    def __len__(self) -> int:
        return len(self.img_ids)

    def __getitem__(self, idx: int):
        img_id = self.img_ids[idx]
        img_info = self.coco.loadImgs([img_id])[0]
        img_path = os.path.join(self.image_dir, img_info["file_name"])
        image = Image.open(img_path).convert("RGB")

        ann_ids = self.coco.getAnnIds(imgIds=[img_id], iscrowd=None)
        anns = self.coco.loadAnns(ann_ids)

        boxes = []
        labels = []
        areas = []
        iscrowd = []

        for a in anns:
            # COCO bbox: [x,y,w,h] -> [x1,y1,x2,y2]
            x, y, w, h = a["bbox"]
            if w <= 1 or h <= 1:
                continue
            boxes.append([x, y, x + w, y + h])
            labels.append(a["category_id"])
            areas.append(a.get("area", w * h))
            iscrowd.append(a.get("iscrowd", 0))

        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        labels = torch.as_tensor(labels, dtype=torch.int64)
        areas = torch.as_tensor(areas, dtype=torch.float32)
        iscrowd = torch.as_tensor(iscrowd, dtype=torch.int64)

        image_t = F.to_tensor(image)

        target = {
            "boxes": boxes,
            "labels": labels,
            "image_id": torch.tensor([img_id]),
            "area": areas,
            "iscrowd": iscrowd,
        }
        return image_t, target

def collate_fn(batch):
    return tuple(zip(*batch))

train_ds = CocoMiniTrainDataset(coco, IMAGES_DIR, train_ids, train=True)
val_ds   = CocoMiniTrainDataset(coco, IMAGES_DIR, val_ids, train=False)

print("train len:", len(train_ds), "val len:", len(val_ds))
train len: 8000 val len: 2000

BATCH_SIZE = 2  # adjust to GPU memory
NUM_WORKERS = 2

g = torch.Generator()
g.manual_seed(BASE_SEED)

train_loader = DataLoader(
    train_ds, batch_size=BATCH_SIZE, shuffle=True,
    num_workers=NUM_WORKERS, collate_fn=collate_fn,
    worker_init_fn=seed_worker, generator=g
)
val_loader = DataLoader(
    val_ds, batch_size=1, shuffle=False,
    num_workers=NUM_WORKERS, collate_fn=collate_fn,
    worker_init_fn=seed_worker, generator=g
)

next(iter(train_loader))[0][0].shape
torch.Size([3, 439, 640])

4. Model: Faster R-CNN (torchvision)

You will use:
  • torchvision.models.detection.fasterrcnn_resnet50_fpn
You will also tune RPN and RoI head hyperparameters in later stages.

from torchvision.models.detection import fasterrcnn_resnet50_fpn

def build_model(num_classes: Optional[int] = None):
    # COCO has 80 categories (plus background internally).
    # In torchvision, num_classes includes background.
    # If you want to adapt to a different label space, you must remap category IDs.
    model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
    if num_classes is not None:
        # Replace the box predictor head
        in_features = model.roi_heads.box_predictor.cls_score.in_features
        model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes)
    return model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = build_model().to(device)
print("Model loaded on:", device)
Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /home/vscode/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth

100%|██████████| 160M/160M [00:03<00:00, 52.5MB/s]
Model loaded on: cuda

5. Training and evaluation

You will implement:
  • a training loop that logs loss components
  • COCO evaluation via pycocotools.cocoeval.COCOeval

Important notes

  • COCO category IDs are not always contiguous. Torchvision expects contiguous class indices when you replace heads.
  • For this assignment you will keep the default COCO label space and use the pretrained COCO model, then fine-tune on COCO MiniTrain.
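If you do replace the head later (e.g. for the drone dataset), the non-contiguous COCO ids must be remapped first. A minimal sketch; `build_category_maps` is a hypothetical helper, not part of pycocotools:

```python
def build_category_maps(coco):
    """Map COCO category ids (1..90, with gaps) to contiguous indices.

    Index 0 is reserved for background, as torchvision expects.
    """
    cat_ids = sorted(coco.getCatIds())
    cat_to_contig = {c: i + 1 for i, c in enumerate(cat_ids)}
    contig_to_cat = {v: k for k, v in cat_to_contig.items()}
    return cat_to_contig, contig_to_cat
```

Apply `cat_to_contig` to targets before training and `contig_to_cat` to predictions before COCO evaluation.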

from pycocotools.cocoeval import COCOeval

@torch.no_grad()
def evaluate_coco_map(model, coco_gt: COCO, data_loader: DataLoader, max_dets: int = 100):
    model.eval()
    results = []

    for images, targets in data_loader:
        images = [img.to(device) for img in images]
        outputs = model(images)

        for out, tgt in zip(outputs, targets):
            img_id = int(tgt["image_id"].item())

            boxes = out["boxes"].detach().cpu().numpy()  # [N,4] x1,y1,x2,y2
            scores = out["scores"].detach().cpu().numpy()
            labels = out["labels"].detach().cpu().numpy()

            # Convert to COCO format
            for b, s, c in zip(boxes, scores, labels):
                x1, y1, x2, y2 = b.tolist()
                w = max(0.0, x2 - x1)
                h = max(0.0, y2 - y1)
                results.append({
                    "image_id": img_id,
                    "category_id": int(c),
                    "bbox": [x1, y1, w, h],
                    "score": float(s),
                })

    if len(results) == 0:
        return {"mAP": 0.0, "AP50": 0.0, "AP75": 0.0}

    coco_dt = coco_gt.loadRes(results)
    coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
    coco_eval.params.maxDets = [max_dets, max_dets, max_dets]  # one cap for all three slots, so summarize() prints three identical all-area AR lines
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()

    # COCOeval.stats indices:
    # 0: AP IoU=0.50:0.95
    # 1: AP IoU=0.50
    # 2: AP IoU=0.75
    mAP = float(coco_eval.stats[0])
    AP50 = float(coco_eval.stats[1])
    AP75 = float(coco_eval.stats[2])
    return {"mAP": mAP, "AP50": AP50, "AP75": AP75}

def train_one_epoch(model, optimizer, data_loader: DataLoader, epoch: int, max_norm: float = 0.0):
    model.train()
    loss_sums = {"loss": 0.0}
    n = 0

    for images, targets in data_loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())

        optimizer.zero_grad(set_to_none=True)
        losses.backward()

        if max_norm and max_norm > 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

        optimizer.step()

        # accumulate
        n += 1
        loss_sums["loss"] += float(losses.item())
        for k, v in loss_dict.items():
            loss_sums[k] = loss_sums.get(k, 0.0) + float(v.item())

    for k in loss_sums:
        loss_sums[k] /= max(1, n)
    return loss_sums

6. Baseline run (required)

Run a baseline training job and log to W&B. Required:
  • train loss curves (total + components)
  • validation metrics: mAP, AP50, AP75
  • save the model checkpoint
os.makedirs("checkpoints", exist_ok=True)


from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

BASELINE_CFG = {
    "seed": BASE_SEED,
    "epochs": int(os.environ.get("BASELINE_EPOCHS", 3)),
    "lr": 0.005,
    "momentum": 0.9,
    "weight_decay": 1e-4,
    "grad_clip_norm": 0.0,
    "step_size": 6,
    "gamma": 0.1,
    "batch_size": BATCH_SIZE,
}

set_global_seed(BASELINE_CFG["seed"])

run = wandb.init(
    project="faster-rcnn-optuna-coco-minitrain",
    name="baseline",
    config=BASELINE_CFG
)

model = build_model().to(device)

optimizer = SGD(
    model.parameters(),
    lr=BASELINE_CFG["lr"],
    momentum=BASELINE_CFG["momentum"],
    weight_decay=BASELINE_CFG["weight_decay"]
)

scheduler = StepLR(optimizer, step_size=BASELINE_CFG["step_size"], gamma=BASELINE_CFG["gamma"])

for epoch in range(BASELINE_CFG["epochs"]):
    t0 = time.time()
    losses = train_one_epoch(model, optimizer, train_loader, epoch, max_norm=BASELINE_CFG["grad_clip_norm"])
    scheduler.step()
    metrics = evaluate_coco_map(model, coco, val_loader)

    log_dict = {**losses, **{f"val_{k}": v for k, v in metrics.items()}, "epoch": epoch, "lr": scheduler.get_last_lr()[0], "epoch_time_s": time.time()-t0}
    wandb.log(log_dict)
    print(f"Epoch {epoch}: loss={losses['loss']:.4f} val_mAP={metrics['mAP']:.4f}")

BASELINE_CKPT = os.path.join("checkpoints", "baseline_fasterrcnn.pt")
torch.save(model.state_dict(), BASELINE_CKPT)
wandb.save(BASELINE_CKPT)
wandb.finish()

print("Saved:", BASELINE_CKPT)
wandb: setting up run mm333dqx
wandb: Tracking run with wandb version 0.25.0
wandb: Run data is saved locally in /workspaces/eng-ai-agents/wandb/run-20260305_165657-mm333dqx
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run baseline
wandb: ⭐️ View project at https://wandb.ai/pantelis/faster-rcnn-optuna-coco-minitrain
wandb: 🚀 View run at https://wandb.ai/pantelis/faster-rcnn-optuna-coco-minitrain/runs/mm333dqx
Loading and preparing results...
DONE (t=0.50s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=21.59s).
Accumulating evaluation results...
DONE (t=4.70s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.055
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.096
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.059
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.035
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.062
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.068
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.083
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.083
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.083
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.046
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.087
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.109
Epoch 0: loss=0.6911 val_mAP=0.0555
Loading and preparing results...
DONE (t=0.18s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=24.71s).
Accumulating evaluation results...
DONE (t=5.54s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.052
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.091
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.054
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.034
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.059
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.060
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.085
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.085
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.085
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.051
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.093
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.107
Epoch 1: loss=0.6072 val_mAP=0.0521
Loading and preparing results...
DONE (t=0.10s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=17.82s).
Accumulating evaluation results...
DONE (t=3.21s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.057
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.099
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.060
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.035
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.063
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.071
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.086
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.086
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.086
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.051
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.091
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.114
Epoch 2: loss=0.5561 val_mAP=0.0574
wandb: WARNING Symlinked 1 file into the W&B run directory; call wandb.save again to sync new files.
wandb: updating run metadata
wandb: uploading checkpoints/baseline_fasterrcnn.pt
wandb: uploading checkpoints/baseline_fasterrcnn.pt; uploading history steps 2-2, summary
wandb: uploading data
wandb: 
wandb: Run history:
wandb:            epoch ▁▅█
wandb:     epoch_time_s █▂▁
wandb:             loss █▄▁
wandb:     loss_box_reg █▅▁
wandb:  loss_classifier █▃▁
wandb:  loss_objectness █▄▁
wandb: loss_rpn_box_reg █▄▁
wandb:               lr ▁▁▁
wandb:         val_AP50 ▆▁█
wandb:         val_AP75 ▇▁█
wandb:               +1 ...
wandb: 
wandb: Run summary:
wandb:            epoch 2
wandb:     epoch_time_s 1398.17586
wandb:             loss 0.55609
wandb:     loss_box_reg 0.24568
wandb:  loss_classifier 0.21955
wandb:  loss_objectness 0.03035
wandb: loss_rpn_box_reg 0.06052
wandb:               lr 0.005
wandb:         val_AP50 0.09916
wandb:         val_AP75 0.05954
wandb:               +1 ...
wandb:
wandb: 🚀 View run baseline at: https://wandb.ai/pantelis/faster-rcnn-optuna-coco-minitrain/runs/mm333dqx
wandb: ⭐️ View project at: https://wandb.ai/pantelis/faster-rcnn-optuna-coco-minitrain
wandb: Synced 4 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)
wandb: Find logs at: ./wandb/run-20260305_165657-mm333dqx/logs
Saved: checkpoints/baseline_fasterrcnn.pt

7. Optuna + W&B: stage-wise hyperparameter optimization (required)

You will run Optuna studies in stages.
  • Stage 1: optimizer dynamics (LR, weight decay, momentum, warmup)
  • Stage 2: RPN hyperparameters
  • Stage 3: RoI head hyperparameters
  • Stage 4: post-processing calibration (no training)
You must use:
  • TPESampler
  • pruning (MedianPruner or HyperbandPruner)
Each trial must:
  1. Train for a small budget (e.g., 3–5 epochs),
  2. Report intermediate validation mAP via trial.report(...),
  3. Allow Optuna to prune underperforming trials.
Default objective: $\max \mathrm{mAP}_{\text{val}}$.

import optuna

def make_optimizer(model, lr: float, momentum: float, weight_decay: float):
    return SGD(model.parameters(), lr=lr, momentum=momentum, weight_decay=weight_decay)

def objective_stage1(trial: optuna.Trial) -> float:
    cfg = {
        "stage": "stage1_opt",
        "seed": int(trial.suggest_int("seed", 1, 10_000)),
        "epochs": int(trial.suggest_int("epochs", 3, 5)),
        "lr": float(trial.suggest_float("lr", 1e-5, 1e-2, log=True)),
        "weight_decay": float(trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True)),
        "momentum": float(trial.suggest_float("momentum", 0.8, 0.99)),
        "grad_clip_norm": float(trial.suggest_float("grad_clip_norm", 0.0, 5.0)),
    }

    set_global_seed(cfg["seed"])

    run = wandb.init(
        project="faster-rcnn-optuna-coco-minitrain",
        name=f"optuna_stage1_trial_{trial.number:04d}",
        config=cfg,
        reinit=True
    )

    model = build_model().to(device)
    optimizer = make_optimizer(model, cfg["lr"], cfg["momentum"], cfg["weight_decay"])

    best_map = -1.0
    for epoch in range(cfg["epochs"]):
        losses = train_one_epoch(model, optimizer, train_loader, epoch, max_norm=cfg["grad_clip_norm"])
        metrics = evaluate_coco_map(model, coco, val_loader)

        val_map = metrics["mAP"]
        best_map = max(best_map, val_map)

        wandb.log({**losses, **{f"val_{k}": v for k, v in metrics.items()}, "epoch": epoch})

        trial.report(val_map, step=epoch)
        if trial.should_prune():
            wandb.log({"pruned": 1, "best_val_mAP": best_map})
            wandb.finish()
            raise optuna.exceptions.TrialPruned()

    wandb.log({"best_val_mAP": best_map, "pruned": 0})
    wandb.finish()
    return best_map

sampler = optuna.samplers.TPESampler(seed=BASE_SEED)
pruner = optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=1)

study_stage1 = optuna.create_study(direction="maximize", sampler=sampler, pruner=pruner, study_name="stage1_opt")
[I 2026-03-05 18:09:24,527] A new study created in memory with name: stage1_opt

# Run Stage 1 study
N_TRIALS_STAGE1 = int(os.environ.get("HPO_TRIALS", 3))  # default 3 for demo, 30 for assignment
study_stage1.optimize(objective_stage1, n_trials=N_TRIALS_STAGE1, show_progress_bar=True)

print("Best Stage 1:", study_stage1.best_value)
print("Best params:", study_stage1.best_params)

  0%|          | 0/3 [00:00<?, ?it/s]
wandb: WARNING Using a boolean value for 'reinit' is deprecated. Use 'return_previous' or 'finish_previous' instead.
wandb: setting up run ftugc11l
wandb: Tracking run with wandb version 0.25.0
wandb: Run data is saved locally in /workspaces/eng-ai-agents/wandb/run-20260305_180924-ftugc11l
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run optuna_stage1_trial_0000
wandb: ⭐️ View project at https://wandb.ai/pantelis/faster-rcnn-optuna-coco-minitrain
wandb: 🚀 View run at https://wandb.ai/pantelis/faster-rcnn-optuna-coco-minitrain/runs/ftugc11l
Loading and preparing results...
DONE (t=0.06s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=16.28s).
Accumulating evaluation results...
DONE (t=2.59s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.103
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.151
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.116
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.071
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.114
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.130
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.121
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.085
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.129
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.153

8. Stage 2: RPN tuning (required)

Fix the best Stage 1 hyperparameters, then tune RPN knobs that affect proposal quality and recall. Suggested search space:
  • rpn_nms_thresh in [0.5, 0.9]
  • rpn_pre_nms_topk in [1000, 4000]
  • rpn_post_nms_topk in [300, 2000]
  • rpn_fg_iou_thresh in [0.5, 0.8]
  • rpn_bg_iou_thresh in [0.0, 0.4]
  • rpn_batch_size_per_image in [128, 512]
  • rpn_positive_fraction in [0.25, 0.75]
You will implement this by mutating the torchvision model components:
  • model.rpn and its nested matcher/sampler objects (where supported)
Note: torchvision does not expose every RPN parameter as a public attribute in every version (several live on private or nested objects); implement what is available and document what you tuned.
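As a reminder of what the two matcher thresholds mean while you tune them: during RPN training, an anchor whose best IoU with any ground-truth box is at least `rpn_fg_iou_thresh` becomes a positive, below `rpn_bg_iou_thresh` a negative, and anything in between is ignored (contributes no loss). A minimal dependency-free sketch of that rule, not torchvision's actual Matcher (which additionally allows low-quality matches):

```python
def label_anchor(best_iou: float, fg_thresh: float = 0.7, bg_thresh: float = 0.3) -> str:
    """Simplified RPN anchor labeling: 'pos', 'neg', or 'ignore'.
    (Torchvision's Matcher also allows low-quality matches; omitted here.)"""
    if best_iou >= fg_thresh:
        return "pos"
    if best_iou < bg_thresh:
        return "neg"
    return "ignore"

for iou in (0.8, 0.5, 0.1):
    print(iou, "->", label_anchor(iou))
# 0.8 -> pos, 0.5 -> ignore, 0.1 -> neg
```

Widening the gap between the two thresholds discards more ambiguous anchors from training, which interacts with `rpn_batch_size_per_image` and `rpn_positive_fraction` in the sampler.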

def apply_rpn_hparams(model, cfg: Dict[str, Any]):
    # Apply what torchvision exposes on your version. nms_thresh is a public
    # attribute; the top-k counts, matcher thresholds, and sampler settings live
    # on private or nested objects of RegionProposalNetwork.
    rpn = model.rpn
    if "rpn_nms_thresh" in cfg: rpn.nms_thresh = float(cfg["rpn_nms_thresh"])
    # pre/post-NMS top-k are stored as private dicts keyed by "training"/"testing"
    # (rpn.pre_nms_top_n / rpn.post_nms_top_n are methods, not dicts).
    if "rpn_pre_nms_topk" in cfg:
        rpn._pre_nms_top_n["training"] = int(cfg["rpn_pre_nms_topk"])
        rpn._pre_nms_top_n["testing"] = int(cfg["rpn_pre_nms_topk"])
    if "rpn_post_nms_topk" in cfg:
        rpn._post_nms_top_n["training"] = int(cfg["rpn_post_nms_topk"])
        rpn._post_nms_top_n["testing"] = int(cfg["rpn_post_nms_topk"])
    # fg/bg IoU thresholds live on the proposal matcher.
    if "rpn_fg_iou_thresh" in cfg: rpn.proposal_matcher.high_threshold = float(cfg["rpn_fg_iou_thresh"])
    if "rpn_bg_iou_thresh" in cfg: rpn.proposal_matcher.low_threshold = float(cfg["rpn_bg_iou_thresh"])
    # Sampling settings live on the fg/bg sampler.
    if "rpn_batch_size_per_image" in cfg: rpn.fg_bg_sampler.batch_size_per_image = int(cfg["rpn_batch_size_per_image"])
    if "rpn_positive_fraction" in cfg: rpn.fg_bg_sampler.positive_fraction = float(cfg["rpn_positive_fraction"])

def objective_stage2(trial: optuna.Trial) -> float:
    # Fix Stage 1 best optimizer params
    best1 = study_stage1.best_params
    cfg = {
        "stage": "stage2_rpn",
        "seed": int(trial.suggest_int("seed", 1, 10_000)),
        "epochs": 4,  # keep small for HPO budget
        "lr": float(best1["lr"]),
        "weight_decay": float(best1["weight_decay"]),
        "momentum": float(best1["momentum"]),
        "grad_clip_norm": float(best1.get("grad_clip_norm", 0.0)),
        # RPN search
        "rpn_nms_thresh": float(trial.suggest_float("rpn_nms_thresh", 0.5, 0.9)),
        "rpn_pre_nms_topk": int(trial.suggest_int("rpn_pre_nms_topk", 1000, 4000)),
        "rpn_post_nms_topk": int(trial.suggest_int("rpn_post_nms_topk", 300, 2000)),
        "rpn_fg_iou_thresh": float(trial.suggest_float("rpn_fg_iou_thresh", 0.5, 0.8)),
        "rpn_bg_iou_thresh": float(trial.suggest_float("rpn_bg_iou_thresh", 0.0, 0.4)),
        "rpn_batch_size_per_image": int(trial.suggest_int("rpn_batch_size_per_image", 128, 512)),
        "rpn_positive_fraction": float(trial.suggest_float("rpn_positive_fraction", 0.25, 0.75)),
    }

    set_global_seed(cfg["seed"])

    run = wandb.init(
        project="faster-rcnn-optuna-coco-minitrain",
        name=f"optuna_stage2_trial_{trial.number:04d}",
        config=cfg,
        reinit=True
    )

    model = build_model().to(device)
    apply_rpn_hparams(model, cfg)
    optimizer = make_optimizer(model, cfg["lr"], cfg["momentum"], cfg["weight_decay"])

    best_map = -1.0
    for epoch in range(cfg["epochs"]):
        losses = train_one_epoch(model, optimizer, train_loader, epoch, max_norm=cfg["grad_clip_norm"])
        metrics = evaluate_coco_map(model, coco, val_loader)
        val_map = metrics["mAP"]
        best_map = max(best_map, val_map)

        wandb.log({**losses, **{f"val_{k}": v for k, v in metrics.items()}, "epoch": epoch})
        trial.report(val_map, step=epoch)
        if trial.should_prune():
            wandb.log({"pruned": 1, "best_val_mAP": best_map})
            wandb.finish()
            raise optuna.exceptions.TrialPruned()

    wandb.log({"best_val_mAP": best_map, "pruned": 0})
    wandb.finish()
    return best_map

study_stage2 = optuna.create_study(direction="maximize", sampler=sampler, pruner=pruner, study_name="stage2_rpn")

# Run Stage 2 study
N_TRIALS_STAGE2 = int(os.environ.get("HPO_TRIALS", 3))  # default 3 for demo, 30 for assignment
study_stage2.optimize(objective_stage2, n_trials=N_TRIALS_STAGE2, show_progress_bar=True)

print("Best Stage 2:", study_stage2.best_value)
print("Best params:", study_stage2.best_params)

9. Stage 3: RoI head tuning (required)

Fix Stage 1+2 best configuration and tune RoI head sampling and loss weighting. Suggested search space:
  • roi_batch_size_per_image in [128, 512]
  • roi_positive_fraction in [0.1, 0.5]
  • cls_loss_weight in [0.5, 2.0]
  • box_loss_weight in [0.5, 2.0]
Implementation note:
  • Torchvision's RoIHeads keeps its sampling settings on a nested fg/bg sampler object rather than as public top-level attributes in every version.
  • Loss weights are not exposed at all, so you will apply them manually by scaling the relevant loss_dict terms before summing. You will implement that in a custom train_one_epoch_weighted below.

def apply_roi_hparams(model, cfg: Dict[str, Any]):
    # RoIHeads keeps sampling settings on its nested fg/bg sampler,
    # not as top-level attributes.
    roi = model.roi_heads
    if "roi_batch_size_per_image" in cfg: roi.fg_bg_sampler.batch_size_per_image = int(cfg["roi_batch_size_per_image"])
    if "roi_positive_fraction" in cfg: roi.fg_bg_sampler.positive_fraction = float(cfg["roi_positive_fraction"])

def train_one_epoch_weighted(model, optimizer, data_loader: DataLoader, epoch: int, max_norm: float, cls_w: float, box_w: float):
    model.train()
    loss_sums = {"loss": 0.0}
    n = 0

    for images, targets in data_loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        loss_dict = model(images, targets)
        # Scale RoI losses; keep RPN terms unscaled by default.
        if "loss_classifier" in loss_dict:
            loss_dict["loss_classifier"] = loss_dict["loss_classifier"] * cls_w
        if "loss_box_reg" in loss_dict:
            loss_dict["loss_box_reg"] = loss_dict["loss_box_reg"] * box_w

        losses = sum(loss for loss in loss_dict.values())

        optimizer.zero_grad(set_to_none=True)
        losses.backward()
        if max_norm and max_norm > 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        optimizer.step()

        n += 1
        loss_sums["loss"] += float(losses.item())
        for k, v in loss_dict.items():
            loss_sums[k] = loss_sums.get(k, 0.0) + float(v.item())

    for k in loss_sums:
        loss_sums[k] /= max(1, n)
    return loss_sums

def objective_stage3(trial: optuna.Trial) -> float:
    best1 = study_stage1.best_params
    best2 = study_stage2.best_params

    cfg = {
        "stage": "stage3_roi",
        "seed": int(trial.suggest_int("seed", 1, 10_000)),
        "epochs": 4,
        "lr": float(best1["lr"]),
        "weight_decay": float(best1["weight_decay"]),
        "momentum": float(best1["momentum"]),
        "grad_clip_norm": float(best1.get("grad_clip_norm", 0.0)),
        # RPN fixed (best2)
        **{k: best2[k] for k in best2 if k.startswith("rpn_")},
        # RoI search
        "roi_batch_size_per_image": int(trial.suggest_int("roi_batch_size_per_image", 128, 512)),
        "roi_positive_fraction": float(trial.suggest_float("roi_positive_fraction", 0.1, 0.5)),
        "cls_loss_weight": float(trial.suggest_float("cls_loss_weight", 0.5, 2.0)),
        "box_loss_weight": float(trial.suggest_float("box_loss_weight", 0.5, 2.0)),
    }

    set_global_seed(cfg["seed"])
    run = wandb.init(
        project="faster-rcnn-optuna-coco-minitrain",
        name=f"optuna_stage3_trial_{trial.number:04d}",
        config=cfg,
        reinit=True
    )

    model = build_model().to(device)
    apply_rpn_hparams(model, cfg)
    apply_roi_hparams(model, cfg)

    optimizer = make_optimizer(model, cfg["lr"], cfg["momentum"], cfg["weight_decay"])

    best_map = -1.0
    for epoch in range(cfg["epochs"]):
        losses = train_one_epoch_weighted(
            model, optimizer, train_loader, epoch,
            max_norm=cfg["grad_clip_norm"],
            cls_w=cfg["cls_loss_weight"],
            box_w=cfg["box_loss_weight"],
        )
        metrics = evaluate_coco_map(model, coco, val_loader)
        val_map = metrics["mAP"]
        best_map = max(best_map, val_map)

        wandb.log({**losses, **{f"val_{k}": v for k, v in metrics.items()}, "epoch": epoch})
        trial.report(val_map, step=epoch)
        if trial.should_prune():
            wandb.log({"pruned": 1, "best_val_mAP": best_map})
            wandb.finish()
            raise optuna.exceptions.TrialPruned()

    wandb.log({"best_val_mAP": best_map, "pruned": 0})
    wandb.finish()
    return best_map

study_stage3 = optuna.create_study(direction="maximize", sampler=sampler, pruner=pruner, study_name="stage3_roi")

# Run Stage 3 study
N_TRIALS_STAGE3 = int(os.environ.get("HPO_TRIALS", 3))  # default 3 for demo, 30 for assignment
study_stage3.optimize(objective_stage3, n_trials=N_TRIALS_STAGE3, show_progress_bar=True)

print("Best Stage 3:", study_stage3.best_value)
print("Best params:", study_stage3.best_params)

10. Stage 4: post-processing calibration (required)

You will tune score threshold and NMS IoU threshold without retraining. Suggested ranges:
  • score_thresh in [0.01, 0.5]
  • box_nms_thresh in [0.3, 0.7]
In torchvision:
  • model.roi_heads.score_thresh
  • model.roi_heads.nms_thresh
  • model.roi_heads.detections_per_img
You will:
  1. Train one final model using the best Stage 1+2+3 configuration (longer epochs, e.g., 10–15).
  2. Run an Optuna study that only changes post-processing parameters and evaluates on val.
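To build intuition for how box_nms_thresh changes the duplicate-box failure mode before running the study, here is a minimal greedy NMS sketch in pure Python (torchvision uses torchvision.ops.nms internally; this is for illustration only). Boxes are [x1, y1, x2, y2]; a lower IoU threshold suppresses near-duplicates more aggressively, while a higher one lets overlapping boxes of the same object survive.

```python
def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_thresh):
    """Keep boxes in descending score order, dropping any box whose IoU
    with an already-kept box exceeds iou_thresh. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

# Two overlapping detections of one object plus a distant one (IoU of the pair ~0.68).
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, iou_thresh=0.5))  # -> [0, 2]: duplicate suppressed
print(greedy_nms(boxes, scores, iou_thresh=0.9))  # -> [0, 1, 2]: duplicate survives
```

For densely packed small objects (such as drone swarms), an aggressive threshold can also suppress true neighboring detections, which is exactly the trade-off Stage 4 calibrates.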

def apply_postprocess_hparams(model, cfg: Dict[str, Any]):
    roi = model.roi_heads
    if "score_thresh" in cfg: roi.score_thresh = float(cfg["score_thresh"])
    if "box_nms_thresh" in cfg: roi.nms_thresh = float(cfg["box_nms_thresh"])
    if "detections_per_img" in cfg: roi.detections_per_img = int(cfg["detections_per_img"])

def train_final_model(best_cfg: Dict[str, Any], epochs: int = 12, seed: int = 2026) -> str:
    set_global_seed(seed)
    run = wandb.init(
        project="faster-rcnn-optuna-coco-minitrain",
        name=f"final_train_seed_{seed}",
        config={**best_cfg, "final_epochs": epochs, "final_seed": seed},
        reinit=True
    )

    model = build_model().to(device)
    apply_rpn_hparams(model, best_cfg)
    apply_roi_hparams(model, best_cfg)

    optimizer = make_optimizer(model, best_cfg["lr"], best_cfg["momentum"], best_cfg["weight_decay"])

    for epoch in range(epochs):
        losses = train_one_epoch_weighted(
            model, optimizer, train_loader, epoch,
            max_norm=best_cfg.get("grad_clip_norm", 0.0),
            cls_w=best_cfg.get("cls_loss_weight", 1.0),
            box_w=best_cfg.get("box_loss_weight", 1.0),
        )
        metrics = evaluate_coco_map(model, coco, val_loader)
        wandb.log({**losses, **{f"val_{k}": v for k, v in metrics.items()}, "epoch": epoch})

    ckpt = os.path.join("checkpoints", f"final_fasterrcnn_seed_{seed}.pt")
    torch.save(model.state_dict(), ckpt)
    wandb.save(ckpt)
    wandb.finish()
    return ckpt

# Compose best config from Stage 1-3
best_cfg = {}
best_cfg.update(study_stage1.best_params)
best_cfg.update({k: v for k, v in study_stage2.best_params.items() if k.startswith("rpn_")})
best_cfg.update({k: v for k, v in study_stage3.best_params.items() if k.startswith("roi_") or k.endswith("_weight")})

# Ensure required optimizer keys exist
# (names differ across studies; normalize to expected keys)
# Stage1 keys are: lr, weight_decay, momentum, grad_clip_norm
# Keep them as is.
print("Best combined cfg:", best_cfg)

FINAL_CKPT = train_final_model(best_cfg, epochs=12, seed=2026)
print("Final ckpt:", FINAL_CKPT)

@torch.no_grad()
def evaluate_with_postprocess(model, score_thresh: float, nms_thresh: float, dets_per_img: int = 100):
    apply_postprocess_hparams(model, {"score_thresh": score_thresh, "box_nms_thresh": nms_thresh, "detections_per_img": dets_per_img})
    return evaluate_coco_map(model, coco, val_loader)

def objective_stage4(trial: optuna.Trial) -> float:
    cfg = {
        "stage": "stage4_post",
        "score_thresh": float(trial.suggest_float("score_thresh", 0.01, 0.5, log=True)),
        "box_nms_thresh": float(trial.suggest_float("box_nms_thresh", 0.3, 0.7)),
        "detections_per_img": int(trial.suggest_int("detections_per_img", 50, 300)),
    }

    run = wandb.init(
        project="faster-rcnn-optuna-coco-minitrain",
        name=f"optuna_stage4_trial_{trial.number:04d}",
        config=cfg,
        reinit=True
    )

    model = build_model().to(device)
    model.load_state_dict(torch.load(FINAL_CKPT, map_location=device))
    apply_rpn_hparams(model, best_cfg)
    apply_roi_hparams(model, best_cfg)

    metrics = evaluate_with_postprocess(model, cfg["score_thresh"], cfg["box_nms_thresh"], cfg["detections_per_img"])
    wandb.log({f"val_{k}": v for k, v in metrics.items()})
    wandb.finish()
    return metrics["mAP"]

study_stage4 = optuna.create_study(direction="maximize", sampler=sampler, pruner=None, study_name="stage4_post")

N_TRIALS_STAGE4 = int(os.environ.get("HPO_TRIALS", 3))  # default 3 for demo, 30 for assignment
study_stage4.optimize(objective_stage4, n_trials=N_TRIALS_STAGE4, show_progress_bar=True)
print("Best Stage 4:", study_stage4.best_value)
print("Best post params:", study_stage4.best_params)

11. Final multi-seed retraining (required)

Retrain the best configuration (Stages 1–4) with 3 different seeds and report $\text{mean mAP} \pm \text{std}$. You must log all runs to W&B and include the W&B links in your report.

try:
    best_post = dict(study_stage4.best_params)
except (NameError, ValueError):
    # Stage 4 skipped or no completed trials: fall back to torchvision-style defaults.
    best_post = {"score_thresh": 0.05, "box_nms_thresh": 0.5, "detections_per_img": 100}
best_full = {**best_cfg, **best_post}
print("Best full config:", best_full)

SEEDS = [11, 22, 33]
ckpts = []
for s in SEEDS:
    ckpts.append(train_final_model(best_full, epochs=12, seed=s))

print("ckpts:", ckpts)

# Evaluate each checkpoint with best post-processing
maps = []
for ckpt in ckpts:
    model = build_model().to(device)
    model.load_state_dict(torch.load(ckpt, map_location=device))
    apply_rpn_hparams(model, best_full)
    apply_roi_hparams(model, best_full)
    apply_postprocess_hparams(model, best_full)
    metrics = evaluate_coco_map(model, coco, val_loader)
    maps.append(metrics["mAP"])
    print(ckpt, metrics)

maps = np.array(maps, dtype=np.float32)
print("mAP mean ± std:", float(maps.mean()), float(maps.std(ddof=1)))

12. Small-object transfer test: drones (extra credit)

You must evaluate:
  • baseline COCO MiniTrain fine-tuned model (Section 6)
  • tuned model (best configuration from Sections 7–11)
on the drone dataset defined in Assignment 3: https://aegean.ai/aiml-common/assignments/main/cv-spring-2026/assignment-3

Requirements

  1. Do not retune hyperparameters on drones initially.
  2. Compute at least:
    • $\mathrm{mAP}$, $\mathrm{AP}_{50}$, recall (or COCO AR)
  3. Provide qualitative results showing:
    • missed small drones
    • duplicates / NMS issues
    • low-confidence detections

Implementation note

You must make the drone dataset available in COCO format (images + instances JSON). Set the paths below accordingly.
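Before wiring up the loaders, a cheap sanity check on the annotation file catches most formatting mistakes (missing top-level keys, non-[x, y, w, h] boxes, annotations pointing at unknown images or categories). The helper below is a hypothetical convenience, not part of pycocotools; it uses only the standard library.

```python
import json

def check_coco_json(path: str) -> list[str]:
    """Return a list of problems found in a COCO-format instances JSON.
    An empty list means the file looks structurally OK (hypothetical helper)."""
    with open(path) as f:
        data = json.load(f)
    problems = []
    for key in ("images", "annotations", "categories"):
        if key not in data:
            problems.append(f"missing top-level key: {key}")
    image_ids = {img["id"] for img in data.get("images", [])}
    cat_ids = {c["id"] for c in data.get("categories", [])}
    for ann in data.get("annotations", []):
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {ann.get('id')} references unknown image_id {ann.get('image_id')}")
        if ann.get("category_id") not in cat_ids:
            problems.append(f"annotation {ann.get('id')} references unknown category_id {ann.get('category_id')}")
        bbox = ann.get("bbox", [])
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"annotation {ann.get('id')} has malformed bbox {bbox} (expected [x, y, w, h])")
    return problems

# Example usage once DRONE_ANN_JSON is set:
# print(check_coco_json(DRONE_ANN_JSON))  # expect [] for a well-formed file
```

Run it once before building `drone_coco`; pycocotools error messages for malformed files are often less direct.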

# TODO: Set these paths to your drone dataset (COCO format) from Assignment 3
DRONE_ROOT = "/content/drone_dataset"  # TODO
DRONE_IMAGES_DIR = os.path.join(DRONE_ROOT, "images")  # TODO
DRONE_ANN_JSON = os.path.join(DRONE_ROOT, "annotations", "instances_drone.json")  # TODO

# Uncomment after you place the dataset:
# assert os.path.exists(DRONE_IMAGES_DIR), "Set DRONE_IMAGES_DIR"
# assert os.path.exists(DRONE_ANN_JSON), "Set DRONE_ANN_JSON"

# drone_coco = COCO(DRONE_ANN_JSON)
# drone_img_ids = sorted(drone_coco.getImgIds())
# drone_ds = CocoMiniTrainDataset(drone_coco, DRONE_IMAGES_DIR, drone_img_ids, train=False)
# drone_loader = DataLoader(drone_ds, batch_size=1, shuffle=False, num_workers=2, collate_fn=collate_fn)

# def eval_on_drones(ckpt_path: str, tag: str):
#     run = wandb.init(project="faster-rcnn-optuna-coco-minitrain", name=f"drone_eval_{tag}", reinit=True)
#     model = build_model().to(device)
#     model.load_state_dict(torch.load(ckpt_path, map_location=device))
#     apply_rpn_hparams(model, best_full)
#     apply_roi_hparams(model, best_full)
#     apply_postprocess_hparams(model, best_full)
#     metrics = evaluate_coco_map(model, drone_coco, drone_loader)
#     wandb.log({f"drone_{k}": v for k, v in metrics.items()})
#     wandb.finish()
#     return metrics

# Example usage (after you set paths):
# baseline_metrics = eval_on_drones(BASELINE_CKPT, "baseline")
# tuned_metrics = eval_on_drones(ckpts[0], "tuned_seed11")
# print("baseline:", baseline_metrics)
# print("tuned:", tuned_metrics)

13. Required written answers (include in your report)

Answer these questions using evidence (W&B plots, metrics, qualitative results):
  1. Which stage (1–4) delivered the largest gain in $\mathrm{mAP}$? Why?
  2. Which hyperparameters most influenced small-object recall on drones?
  3. Did increasing rpn_pre_nms_topk help drone detection? Explain using proposal reasoning.
  4. Did changing NMS thresholds change the duplicate-box failure mode? Provide examples.
  5. Is the tuned configuration robust across seeds? Use $\text{mean}\pm\text{std}$.