
Training Faster RCNN End-to-End

Notebook 5 of 6 in the Faster RCNN from-scratch series. This notebook assembles all components (backbone + FPN, RPN, ROI head) into a single FasterRCNN module and trains it on COCO data streamed from Hugging Face.
Scope: a short training demo (5 gradient steps) that verifies the full forward + backward pass and saves a checkpoint for notebook 06.
Memory notes: to fit in ~16 GB VRAM we use:
  • 400 × 400 input resolution (vs the canonical 800 × 800)
  • PyTorch AMP (automatic mixed precision): eligible forward ops run in float16, and GradScaler scales the loss so float16 gradients do not underflow
  • A frozen backbone stem and layer1/2/3 (only layer4, the FPN, the RPN, and the ROI head are trained)
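The resolution choice can be sanity-checked with quick arithmetic. Assuming the usual FPN levels P2–P6 with strides 4–64 (standard for Faster RCNN + FPN; adjust if this series uses different levels), activation memory scales with the number of feature-map cells:

```python
# Spatial cells per FPN level for a square input. Activation memory scales
# with cell count, so halving the input side roughly quarters it.
STRIDES = {'P2': 4, 'P3': 8, 'P4': 16, 'P5': 32, 'P6': 64}

def fpn_cells(img_size: int) -> dict:
    """Number of spatial cells per FPN level: (side // stride) squared."""
    return {lvl: (img_size // s) ** 2 for lvl, s in STRIDES.items()}

cells_400 = fpn_cells(400)
cells_800 = fpn_cells(800)
ratio = sum(cells_800.values()) / sum(cells_400.values())
print(cells_400)                 # {'P2': 10000, 'P3': 2500, 'P4': 625, 'P5': 144, 'P6': 36}
print(f"800px uses {ratio:.1f}x the cells of 400px")
```

Since P2 alone holds ~75% of the cells, shrinking the input is the single biggest lever on activation memory.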
import sys, os, pathlib
# Locate frcnn_common.py — works whether run via papermill or interactively
_nb_candidates = [
    pathlib.Path.cwd().parent,  # interactive: cwd is the notebook dir
    pathlib.Path.cwd() / 'notebooks' / 'scene-understanding' / 'object-detection' / 'faster-rcnn' / 'pytorch',  # papermill: cwd is repo root
]
for _p in _nb_candidates:
    if (_p / 'frcnn_common.py').exists():
        sys.path.insert(0, str(_p))
        break

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader

from frcnn_common import (
    IMG_SIZE, NUM_CLASSES, DEVICE,
    COCOStreamDataset, frcnn_collate_fn,
    FasterRCNN,
)

print(f"Device: {DEVICE}")
if DEVICE.type == 'cuda':
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM total: {torch.cuda.get_device_properties(0).total_memory/1024**3:.1f} GB")
Device: cuda
GPU: NVIDIA RTX A4500 Laptop GPU
VRAM total: 15.6 GB
# ─── 1. Data pipeline (imported from frcnn_common) ─────────────────────────────

ds  = COCOStreamDataset(split='train', max_samples=2)
imgs, tgts = frcnn_collate_fn(list(ds))
print(f"Batch images : {imgs.shape}")
print(f"GT boxes     : {[t['boxes'].shape for t in tgts]}")
Batch images : torch.Size([2, 3, 400, 400])
GT boxes     : [torch.Size([8, 4]), torch.Size([2, 4])]
# ─── 2. Backbone: ResNet50 + FPN (imported from frcnn_common) ─────────────────
print("Backbone and FPN imported from frcnn_common.")
Backbone and FPN imported from frcnn_common.
# ─── 3. RPN (imported from frcnn_common) ─────────────────────────────────────
print("RPN components imported from frcnn_common.")
RPN components imported from frcnn_common.
# ─── 4. ROI Head (imported from frcnn_common) ────────────────────────────────
print("ROI head components imported from frcnn_common.")
ROI head components imported from frcnn_common.
# ─── 5. FasterRCNN module (imported from frcnn_common) ───────────────────────

# Quick forward check on CPU
model = FasterRCNN(num_classes=NUM_CLASSES)
model.train()
with torch.no_grad():
    dummy_imgs = torch.randn(1, 3, 600, 600)
    dummy_tgts = [{'boxes':  torch.tensor([[50.,50.,250.,250.],[100.,100.,400.,400.]]),
                   'labels': torch.tensor([3, 7])}]
    losses_check = model(dummy_imgs, dummy_tgts)
print("Loss keys:", list(losses_check.keys()))
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total     = sum(p.numel() for p in model.parameters())
print(f"Parameters: {total/1e6:.1f}M total  |  {trainable/1e6:.1f}M trainable")
Loss keys: ['rpn_cls', 'rpn_box', 'roi_cls', 'roi_box']
Parameters: 41.8M total  |  33.2M trainable

Loss terms

Faster RCNN minimizes four losses jointly:
  • rpn_cls (RPN): distinguish foreground anchors (IoU with a GT box ≥ 0.7) from background anchors (IoU < 0.3)
  • rpn_box (RPN): regress anchor → GT box offsets for positive anchors (smooth-L1)
  • roi_cls (ROI head): classify each proposal into one of 80 COCO categories or background
  • roi_box (ROI head): regress proposal → GT box offsets for positive ROIs, class-specific (smooth-L1)
The total loss is their unweighted sum. Early in training roi_cls dominates because the randomly initialized classifier is far from correct; rpn_box and roi_box decrease as the regressors learn offsets.
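Both box losses use smooth-L1, which is quadratic near zero and linear beyond a threshold β. A minimal sketch (β = 1.0 is a common default; the notebook's actual implementation lives in frcnn_common):

```python
def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Quadratic for |x| < beta (gentle gradients near zero),
    linear beyond (robust to outlier regression targets)."""
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta

print(smooth_l1(0.5))   # 0.125 (quadratic region)
print(smooth_l1(3.0))   # 2.5   (linear region)
```

The linear tail is why a badly mispredicted box early in training does not blow up the gradient the way plain L2 would.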
Training scale: this demo runs for only 5 gradient steps — enough to confirm that the full forward + backward pass works and that all four losses decrease. It is not a converged model. Real COCO training requires ~90 000 steps (~12 epochs) on a multi-GPU machine. The checkpoint saved at the end of this notebook is used only to verify the inference pipeline in notebook 06.
# ─── 6. Training demo (5 gradient steps) ──────────────────────────────────────

model     = FasterRCNN(num_classes=NUM_CLASSES).to(DEVICE)
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=0.005, momentum=0.9, weight_decay=1e-4)
scaler    = torch.amp.GradScaler('cuda')

TRAIN_STEPS = 5
train_ds = COCOStreamDataset(split='train', max_samples=TRAIN_STEPS)
train_dl = DataLoader(train_ds, batch_size=1, collate_fn=frcnn_collate_fn)

model.train()
history = []

for step, (images, targets) in enumerate(train_dl):
    images  = images.to(DEVICE)
    targets = [{k: v.to(DEVICE) for k, v in t.items()} for t in targets]

    # ── Core 5-step training loop ──────────────────────────────────────────────
    # 1. Zero gradients
    optimizer.zero_grad()
    # 2. Forward pass (AMP: mixed precision for memory efficiency)
    with torch.amp.autocast('cuda'):
        losses = model(images, targets)
        total  = sum(losses.values())
    # 3. Backward pass (scaled for AMP)
    scaler.scale(total).backward()
    # 4. Gradient clipping (stability: prevents exploding gradients)
    scaler.unscale_(optimizer)
    nn.utils.clip_grad_norm_([p for p in model.parameters() if p.requires_grad],
                              max_norm=10.0)
    # 5. Optimizer step
    scaler.step(optimizer)
    scaler.update()

    info = {k: f"{v.item():.4f}" for k, v in losses.items()}
    info['total'] = f"{total.item():.4f}"
    history.append({k: float(v.item()) for k, v in {**losses, 'total': total}.items()})
    print(f"Step {step+1}/{TRAIN_STEPS}  {info}")

print("\nTraining demo complete.")
Step 1/5  {'rpn_cls': '0.6837', 'rpn_box': '0.0983', 'roi_cls': '4.4301', 'roi_box': '0.0001', 'total': '5.2122'}
Step 2/5  {'rpn_cls': '0.6723', 'rpn_box': '0.1660', 'roi_cls': '4.0081', 'roi_box': '0.0000', 'total': '4.8464'}
Step 3/5  {'rpn_cls': '0.6485', 'rpn_box': '0.0649', 'roi_cls': '3.1855', 'roi_box': '0.0272', 'total': '3.9261'}
Step 4/5  {'rpn_cls': '0.6520', 'rpn_box': '0.1292', 'roi_cls': '3.1818', 'roi_box': '0.0001', 'total': '3.9631'}
Step 5/5  {'rpn_cls': '0.6011', 'rpn_box': '0.1212', 'roi_cls': '1.6939', 'roi_box': '0.0611', 'total': '2.4773'}

Training demo complete.
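The GradScaler calls in steps 3–5 implement dynamic loss scaling. A simplified plain-Python sketch of the decision it makes each step (the real torch.amp.GradScaler checks the scaled gradients and grows the scale only every growth_interval clean steps; this is illustrative):

```python
import math

def scaler_update(grads, scale, growth=2.0, backoff=0.5):
    """One loss-scaling decision: unscale gradients, skip the optimizer
    step on inf/nan, and adapt the scale (simplified GradScaler logic)."""
    unscaled = [g / scale for g in grads]
    if any(math.isinf(g) or math.isnan(g) for g in unscaled):
        return None, scale * backoff        # overflow: skip step, shrink scale
    return unscaled, scale * growth         # clean: apply step, grow scale

grads, scale = scaler_update([1024.0, 2048.0], scale=1024.0)
print(grads, scale)     # [1.0, 2.0] 2048.0
grads, scale = scaler_update([float('inf')], scale=2048.0)
print(grads, scale)     # None 1024.0
```

This is also why `scaler.unscale_(optimizer)` must run before `clip_grad_norm_`: clipping must see true gradient magnitudes, not scaled ones.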
# ─── 7. Loss curves ────────────────────────────────────────────────────────────

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

ax = axes[0]
for k in [kk for kk in history[0] if kk != 'total']:
    ax.plot([h[k] for h in history], label=k, marker='o')
ax.set_xlabel('Step'); ax.set_ylabel('Loss')
ax.set_title('Individual Loss Components (5 steps)'); ax.legend()

axes[1].plot([h['total'] for h in history], 'r-o', linewidth=2)
axes[1].set_xlabel('Step'); axes[1].set_ylabel('Total loss')
axes[1].set_title('Total Loss (5 steps)')

plt.tight_layout()
os.makedirs('images', exist_ok=True)
plt.savefig('images/loss_curves.png', dpi=100, bbox_inches='tight')
plt.show()
[Figure: loss curves saved to images/loss_curves.png: individual loss components (left) and total loss (right) over the 5 training steps]

Optional: stability and performance enhancements

The training loop above includes three techniques that are not part of the core algorithm but are essential in practice:
  • AMP (torch.amp.autocast): runs the forward pass in float16 where safe, roughly halving activation VRAM; GradScaler prevents gradient underflow in the backward pass
  • Gradient clipping (clip_grad_norm_): caps the gradient norm at 10.0, preventing loss spikes when the RPN proposes very large boxes early in training
  • Gradient checkpointing (torch.utils.checkpoint): already applied in ResNet50.forward for layer3/layer4; trades recomputation for memory
For a production training run you would also add:
  • LR warm-up — ramp learning rate from 0 to 0.005 over the first 500 steps before any decay
  • Multi-step LR decay — drop by 0.1× at epochs 8 and 11 (standard Detectron2 schedule)
  • Periodic checkpointing — save every N steps, not just at the end
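The warm-up and decay schedule described above can be sketched as a single function of the global step. The milestone steps below assume ~7 500 steps per epoch (90 000 steps / 12 epochs), putting epochs 8 and 11 at steps 60 000 and 82 500; these numbers are illustrative, not taken from this notebook:

```python
BASE_LR      = 0.005
WARMUP_STEPS = 500
MILESTONES   = (60_000, 82_500)   # epochs 8 and 11 at ~7500 steps/epoch

def lr_at(step: int) -> float:
    """Linear warm-up from 0 to BASE_LR, then a 0.1x drop at each milestone."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    lr = BASE_LR
    for m in MILESTONES:
        if step >= m:
            lr *= 0.1
    return lr

print(f"{lr_at(250):g}")      # 0.0025 (mid warm-up)
print(f"{lr_at(10_000):g}")   # 0.005
print(f"{lr_at(70_000):g}")   # 0.0005
print(f"{lr_at(85_000):g}")   # 5e-05
```

In PyTorch this maps naturally onto `torch.optim.lr_scheduler.SequentialLR` combining a `LinearLR` warm-up with a `MultiStepLR` decay.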
# ─── 8. Save checkpoint ────────────────────────────────────────────────────────

os.makedirs('checkpoints', exist_ok=True)
ckpt_path = 'checkpoints/faster_rcnn_demo.pth'
torch.save({
    'model_state_dict':     model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'steps_trained':        TRAIN_STEPS,
    'num_classes':          NUM_CLASSES,
    'final_losses':         history[-1],
}, ckpt_path)
size_mb = os.path.getsize(ckpt_path) / 1024**2
print(f"Checkpoint saved → {ckpt_path}  ({size_mb:.1f} MB)")
print(f"Final losses: { {k: f'{v:.4f}' for k,v in history[-1].items()} }")
Checkpoint saved → checkpoints/faster_rcnn_demo.pth  (286.3 MB)
Final losses: {'rpn_cls': '0.6011', 'rpn_box': '0.1212', 'roi_cls': '1.6939', 'roi_box': '0.0611', 'total': '2.4773'}
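The checkpoint size is consistent with the parameter counts from section 5: float32 weights for every parameter, plus one SGD momentum buffer per trainable parameter (a back-of-envelope check; AMP does not change what is saved, since the master weights stay float32):

```python
TOTAL_PARAMS     = 41.8e6   # from the parameter count above
TRAINABLE_PARAMS = 33.2e6
BYTES_PER_FLOAT  = 4

model_mb    = TOTAL_PARAMS * BYTES_PER_FLOAT / 1024**2      # all weights saved
momentum_mb = TRAINABLE_PARAMS * BYTES_PER_FLOAT / 1024**2  # SGD momentum buffers
print(f"model ~{model_mb:.0f} MB + optimizer ~{momentum_mb:.0f} MB "
      f"~= {model_mb + momentum_mb:.0f} MB")   # ~286 MB, matching the file size
```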
Key references: (Wightman et al., 2021; Redmon & Farhadi, 2016; Zagoruyko & Komodakis, 2016; Szegedy et al., 2016; Tan & Le, 2019)

References

  • Redmon, J., Farhadi, A. (2016). YOLO9000: Better, Faster, Stronger.
  • Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A. (2016). Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning.
  • Tan, M., Le, Q. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.
  • Wightman, R., Touvron, H., Jégou, H. (2021). ResNet strikes back: An improved training procedure in timm.
  • Zagoruyko, S., Komodakis, N. (2016). Wide Residual Networks.