Introduction

Here we outline the end-to-end process of training and evaluating our model. This includes how jobs are launched and managed through the command-line interface, the training strategies defined in our configuration files, the metrics we use to monitor progress, and the artifacts produced for reproducibility.

Command Line Interface (CLI) Usage

To launch or resume a training job, first change into the project directory:
cd projects/1_Sidewalks-DeepLab_Refactored

Run Types

New Run:
python scripts/run_training_job_iterable.py \
  --config_file configs/deeplab-v3-plus-resnet103.yaml \
  --dataset_path data/training/processed/sidewalks_buffered_v2_balancedsplit/final_processed_dataset \
  --batch_size 48 \
  --buffer_batches 100
New Run Forked from Pretrained Weights:
python scripts/run_training_job_iterable.py \
  --config_file configs/deeplab-v3-plus-resnet103.yaml \
  --dataset_path data/training/processed/sidewalks_buffered_v2_balancedsplit/final_processed_dataset \
  --weights_path models/deeplab-v3-plus-resnet103_experiment_3_model_final_2025-06-23_21-36-33
Strict Resume:
python scripts/run_training_job_iterable.py \
  --resume_run models/deeplab-v3-plus-resnet103_experiment_3_model_final_2025-06-23_21-36-33 \
  --dataset_path data/training/processed/sidewalks_buffered_v2_balancedsplit/final_processed_dataset

CLI Overrides

  • Dedicated Flags: --weights_path, --batch_size, --buffer_batches, --num_workers
  • Universal Overrides (--opts): Modify any key from the base YAML
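
For example, assuming --opts accepts Detectron2-style KEY VALUE pairs (the specific keys below are illustrative and depend on what the base YAML defines):
python scripts/run_training_job_iterable.py \
  --config_file configs/deeplab-v3-plus-resnet103.yaml \
  --dataset_path data/training/processed/sidewalks_buffered_v2_balancedsplit/final_processed_dataset \
  --opts SOLVER.BASE_LR 0.0005 SOLVER.MAX_ITER 50000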

Training Run Artifacts & Reproducibility

Each training run produces a self-contained folder:
deeplab-v3-plus-resnet103_experiment_R-103_2025-06-22_14-15-59/
├── events.out.tfevents...       # TensorBoard logs
├── last_checkpoint              # Pointer to latest model
├── metrics.json                 # Aggregated metrics
├── model_0000199.pth            # Saved checkpoints
├── model_final.pth              # Final checkpoint
├── original_config.yaml         # Base config
├── resolved_config.yaml         # Final merged config
└── val_metrics.json             # Rolling validation results
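
The run folder can also be inspected programmatically. The sketch below assumes metrics.json and val_metrics.json are written as JSON lines (one object per logged step, as Detectron2's writers do); adjust the parsing to the actual contents of your run:
import json
from pathlib import Path

run_dir = Path("deeplab-v3-plus-resnet103_experiment_R-103_2025-06-22_14-15-59")

def load_json_lines(path):
    # Each line is an independent JSON object describing one logged step.
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

train_log = load_json_lines(run_dir / "metrics.json")
val_log = load_json_lines(run_dir / "val_metrics.json")

print("last training record:", train_log[-1])
print("last validation record:", val_log[-1])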

Training Schedule & Strategy

Architecture and Initialization

  • Backbone: ResNet-101 with DeepLab modifications
  • Decoder: Custom WeightedDeepLabHead with ASPP and SyncBN
  • Classes: Two output classes (background = 0, sidewalk = 1)
  • Weights: Initialized from Detectron2’s Cityscapes-trained checkpoint

Loss Strategy

  • Loss Type: Hard pixel mining (with TOP_K_PERCENT_PIXELS=1.0, effectively weighted cross-entropy)
  • Class Weighting: Background = 1.0, Foreground (sidewalk) = 10.0
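
As a rough PyTorch sketch of this setup (names and shapes are illustrative, not the pipeline's actual implementation): with top_k_percent=1.0 every pixel contributes, so the loss reduces to plain class-weighted cross-entropy.
import torch
import torch.nn.functional as F

def weighted_hard_mining_loss(logits, targets, class_weights=(1.0, 10.0), top_k_percent=1.0):
    # Per-pixel class-weighted cross-entropy, left unreduced so hard pixels can be mined.
    weights = torch.tensor(class_weights, device=logits.device)
    pixel_losses = F.cross_entropy(logits, targets, weight=weights, reduction="none").flatten()
    if top_k_percent < 1.0:
        # Hard pixel mining: keep only the highest-loss fraction of pixels.
        k = max(1, int(top_k_percent * pixel_losses.numel()))
        pixel_losses, _ = torch.topk(pixel_losses, k)
    return pixel_losses.mean()

# Example: a batch of 2 chips, 2 classes, 64x64 pixels.
logits = torch.randn(2, 2, 64, 64)
targets = torch.randint(0, 2, (2, 64, 64))
print(weighted_hard_mining_loss(logits, targets))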

Training Schedule

  • Iterations: 100,000 (with batch size 48, this yields 4.8 million chip exposures)
  • Warmup: 1,000 iterations, linear ramp-up
  • Learning Rate: Base 0.001 → Final 0.0001 with cosine decay
  • Optimizer: SGD with momentum = 0.9
  • Gradient Clipping: Enabled (norm clipping at 1.0)
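
In plain PyTorch terms, the optimizer and clipping portion of this schedule corresponds roughly to the sketch below (the real pipeline configures this through the YAML; the model here is a stand-in):
import torch
import torch.nn.functional as F

model = torch.nn.Conv2d(3, 2, kernel_size=3, padding=1)  # stand-in for the segmentation network
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def training_step(images, targets):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), targets)
    loss.backward()
    # Norm-based gradient clipping at 1.0, as listed above.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()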

Model Training Metrics Overview

Training Loss

Per-pixel cross-entropy loss with class weighting:

\ell(\hat{y}_i, y_i) = -\, w_{y_i} \log \frac{\exp(\hat{y}_{i, y_i})}{\sum_{c=1}^{C} \exp(\hat{y}_{i, c})}

where w_{y_i} is the class weight (1.0 for background, 10.0 for sidewalk).

Training Learning Rate

Linear warmup → cosine decay schedule:
  • Start LR: 1e-6 (BASE_LR × WARMUP_FACTOR)
  • Base LR: 1e-3 after warmup
  • Final LR: 1e-4
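
A small helper makes the schedule concrete under the values above (1,000 warmup iterations, 100,000 total, base LR 1e-3 decaying to 1e-4); the exact decay curve used by the pipeline's scheduler may differ slightly from this sketch:
import math

def lr_at(step, base_lr=1e-3, final_lr=1e-4, warmup_iters=1000, max_iters=100_000, warmup_factor=1e-3):
    if step < warmup_iters:
        # Linear warmup from base_lr * warmup_factor (1e-6) up to base_lr.
        alpha = step / warmup_iters
        return base_lr * (warmup_factor * (1 - alpha) + alpha)
    # Cosine decay from base_lr down to final_lr over the remaining iterations.
    progress = (step - warmup_iters) / (max_iters - warmup_iters)
    return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))

for step in (0, 500, 1_000, 50_000, 100_000):
    print(step, f"{lr_at(step):.2e}")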

Validation Metrics

  • Background IoU: Evaluates avoidance of false sidewalk predictions
  • Sidewalk IoU: Key measure of actual sidewalk detection ability
  • Mean IoU (mIoU): Average of per-class IoUs, balanced metric
  • Pixel Accuracy: Proportion of correctly classified pixels
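
For reference, these metrics can be computed from a confusion matrix accumulated over the validation set; the sketch below assumes flat integer arrays of predictions and ground truth with values 0 (background) and 1 (sidewalk):
import numpy as np

def confusion_matrix(preds, targets, num_classes=2):
    # Rows index the true class, columns the predicted class.
    idx = targets.astype(np.int64) * num_classes + preds.astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def segmentation_metrics(cm):
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp  # predicted as the class but actually something else
    fn = cm.sum(axis=1) - tp  # actually the class but predicted as something else
    iou = tp / (tp + fp + fn)
    return {
        "background_iou": iou[0],
        "sidewalk_iou": iou[1],
        "mean_iou": iou.mean(),
        "pixel_accuracy": tp.sum() / cm.sum(),
    }

preds = np.random.randint(0, 2, size=512 * 512)
targets = np.random.randint(0, 2, size=512 * 512)
print(segmentation_metrics(confusion_matrix(preds, targets)))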

Final Validation and Test Metrics

Metric            Validation   Test
Pixel Accuracy    0.9640       0.9638
Mean IoU          0.7784       0.7732
Background IoU    0.9620       0.9619
Sidewalk IoU      0.5949       0.5845
Key Observations:
  • Strong alignment: Validation and test metrics are tightly matched
  • Pixel accuracy consistency: ~96.4% in both cases
  • Sidewalk IoU challenge persists: ~0.59 on validation and ~0.58 on test
  • Mean IoU stability: ~0.77–0.78 across splits

Training Complications & Lessons Learned

Annotation Ceiling

Incomplete ground truth annotations introduced a hard ceiling on achievable IoU. Dataset refinement is as critical as architectural changes.

Sidewalk IoU vs. Practical Utility

The model often predicted sidewalks with a slightly wider buffer than annotated masks. This had no negative effect on the downstream task of extracting centerlines.

Class Imbalance and Loss Weighting

We applied class weighting (foreground ×10) and hard pixel mining to stabilize training and improve sidewalk recall.

Streaming Complexity

The use of Hugging Face streaming with buffer-based shuffling added complexity but allowed scaling to large datasets.
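
In outline, the streaming setup resembles the sketch below; the shuffle buffer of 100 batches of 48 chips mirrors the CLI defaults shown earlier, though the pipeline's actual loading code and buffer interpretation may differ:
from datasets import load_dataset
from torch.utils.data import DataLoader

# Stream chips from disk instead of materializing the full dataset in memory.
stream = load_dataset(
    "data/training/processed/sidewalks_buffered_v2_balancedsplit/final_processed_dataset",
    split="train",
    streaming=True,
)

# Approximate shuffling with a bounded buffer (here, 100 batches of 48 chips).
stream = stream.shuffle(seed=42, buffer_size=100 * 48)

loader = DataLoader(stream.with_format("torch"), batch_size=48)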

Chapter Summary

This training run validated our streaming-based pipeline, produced a stable DeepLabV3+ segmentation model, and showed consistent alignment between validation and test performance. Pixel accuracy reached ~96%, mean IoU stabilized around ~0.77–0.78, and sidewalk IoU held at ~0.58–0.59. Looking ahead, improvements will focus on:
  • Refining data quality and coverage
  • Developing task-specific evaluation metrics
  • Scaling training with hyperparameter search
  • Exploring model variations
