Introduction

Here we outline the end-to-end process of training and evaluating our model. This includes how jobs are launched and managed through the command-line interface, the training strategies defined in our configuration files, the metrics we use to monitor progress, and the artifacts produced for reproducibility.

Command Line Interface (CLI) Usage

To launch or resume a training job, first change into the project directory:
cd projects/1_Sidewalks-DeepLab_Refactored

Run Types

New Run:
python scripts/run_training_job_iterable.py \
  --config_file configs/deeplab-v3-plus-resnet103.yaml \
  --dataset_path data/training/processed/sidewalks_buffered_v2_balancedsplit/final_processed_dataset \
  --batch_size 48 \
  --buffer_batches 100
New Run Forked from Pretrained Weights:
python scripts/run_training_job_iterable.py \
  --config_file configs/deeplab-v3-plus-resnet103.yaml \
  --dataset_path data/training/processed/sidewalks_buffered_v2_balancedsplit/final_processed_dataset \
  --weights_path models/deeplab-v3-plus-resnet103_experiment_3_model_final_2025-06-23_21-36-33
Strict Resume:
python scripts/run_training_job_iterable.py \
  --resume_run models/deeplab-v3-plus-resnet103_experiment_3_model_final_2025-06-23_21-36-33 \
  --dataset_path data/training/processed/sidewalks_buffered_v2_balancedsplit/final_processed_dataset

CLI Overrides

  • Dedicated Flags: --weights_path, --batch_size, --buffer_batches, --num_workers
  • Universal Overrides (--opts): Modify any key from the base YAML
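
For example, assuming --opts accepts Detectron2-style KEY VALUE pairs (the specific keys below are illustrative and depend on what the base YAML defines):
python scripts/run_training_job_iterable.py \
  --config_file configs/deeplab-v3-plus-resnet103.yaml \
  --dataset_path data/training/processed/sidewalks_buffered_v2_balancedsplit/final_processed_dataset \
  --opts SOLVER.BASE_LR 0.0005 SOLVER.MAX_ITER 50000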

Training Run Artifacts & Reproducibility

Each training run produces a self-contained folder:
deeplab-v3-plus-resnet103_experiment_R-103_2025-06-22_14-15-59/
├── events.out.tfevents...       # TensorBoard logs
├── last_checkpoint              # Pointer to latest model
├── metrics.json                 # Aggregated metrics
├── model_0000199.pth            # Saved checkpoints
├── model_final.pth              # Final checkpoint
├── original_config.yaml         # Base config
├── resolved_config.yaml         # Final merged config
└── val_metrics.json             # Rolling validation results
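
The run folder can also be inspected programmatically. The sketch below assumes metrics.json and val_metrics.json are written as JSON lines (one object per logged step, as Detectron2's writers do); adjust the parsing to the actual contents of your run:
import json
from pathlib import Path

run_dir = Path("deeplab-v3-plus-resnet103_experiment_R-103_2025-06-22_14-15-59")

def load_json_lines(path):
    # Each line is an independent JSON object describing one logged step.
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

train_log = load_json_lines(run_dir / "metrics.json")
val_log = load_json_lines(run_dir / "val_metrics.json")

print("last training record:", train_log[-1])
print("last validation record:", val_log[-1])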

Training Schedule & Strategy

Architecture and Initialization

  • Backbone: ResNet-101 with DeepLab modifications
  • Decoder: Custom WeightedDeepLabHead with ASPP and SyncBN
  • Classes: Two output classes (background = 0, sidewalk = 1)
  • Weights: Initialized from Detectron2’s Cityscapes-trained checkpoint

Loss Strategy

  • Loss Type: Hard pixel mining (with TOP_K_PERCENT_PIXELS=1.0, effectively weighted cross-entropy)
  • Class Weighting: Background = 1.0, Foreground (sidewalk) = 10.0
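
As a rough PyTorch sketch of this setup (names and shapes are illustrative, not the pipeline's actual implementation): with top_k_percent=1.0 every pixel contributes, so the loss reduces to plain class-weighted cross-entropy.
import torch
import torch.nn.functional as F

def weighted_hard_mining_loss(logits, targets, class_weights=(1.0, 10.0), top_k_percent=1.0):
    # Per-pixel class-weighted cross-entropy, left unreduced so hard pixels can be mined.
    weights = torch.tensor(class_weights, device=logits.device)
    pixel_losses = F.cross_entropy(logits, targets, weight=weights, reduction="none").flatten()
    if top_k_percent < 1.0:
        # Hard pixel mining: keep only the highest-loss fraction of pixels.
        k = max(1, int(top_k_percent * pixel_losses.numel()))
        pixel_losses, _ = torch.topk(pixel_losses, k)
    return pixel_losses.mean()

# Example: a batch of 2 chips, 2 classes, 64x64 pixels.
logits = torch.randn(2, 2, 64, 64)
targets = torch.randint(0, 2, (2, 64, 64))
print(weighted_hard_mining_loss(logits, targets))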

Training Schedule

  • Iterations: 100,000 (with batch size 48, this yields 4.8 million chip exposures)
  • Warmup: 1,000 iterations, linear ramp-up
  • Learning Rate: Base 0.001 → Final 0.0001 with cosine decay
  • Optimizer: SGD with momentum = 0.9
  • Gradient Clipping: Enabled (norm clipping at 1.0)
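
In plain PyTorch terms, the optimizer and clipping portion of this schedule corresponds roughly to the sketch below (the real pipeline configures this through the YAML; the model here is a stand-in):
import torch
import torch.nn.functional as F

model = torch.nn.Conv2d(3, 2, kernel_size=3, padding=1)  # stand-in for the segmentation network
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def training_step(images, targets):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), targets)
    loss.backward()
    # Norm-based gradient clipping at 1.0, as listed above.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()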

Model Training Metrics Overview

Training Loss

Per-pixel cross-entropy loss with class weighting:

\ell(\hat{y}_i, y_i) = -\, w_{y_i} \log \frac{\exp(\hat{y}_{i, y_i})}{\sum_{c=1}^{C} \exp(\hat{y}_{i, c})}

where w_{y_i} is the class weight (1.0 for background, 10.0 for sidewalk).

Training Learning Rate

Linear warmup → cosine decay schedule:
  • Start LR: 1e-6 (BASE_LR × WARMUP_FACTOR)
  • Base LR: 1e-3 after warmup
  • Final LR: 1e-4
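
A small helper makes the schedule concrete under the values above (1,000 warmup iterations, 100,000 total, base LR 1e-3 decaying to 1e-4); the exact decay curve used by the pipeline's scheduler may differ slightly from this sketch:
import math

def lr_at(step, base_lr=1e-3, final_lr=1e-4, warmup_iters=1000, max_iters=100_000, warmup_factor=1e-3):
    if step < warmup_iters:
        # Linear warmup from base_lr * warmup_factor (1e-6) up to base_lr.
        alpha = step / warmup_iters
        return base_lr * (warmup_factor * (1 - alpha) + alpha)
    # Cosine decay from base_lr down to final_lr over the remaining iterations.
    progress = (step - warmup_iters) / (max_iters - warmup_iters)
    return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))

for step in (0, 500, 1_000, 50_000, 100_000):
    print(step, f"{lr_at(step):.2e}")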

Validation Metrics

  • Background IoU: Evaluates avoidance of false sidewalk predictions
  • Sidewalk IoU: Key measure of actual sidewalk detection ability
  • Mean IoU (mIoU): Average of per-class IoUs, balanced metric
  • Pixel Accuracy: Proportion of correctly classified pixels
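
For reference, these metrics can be computed from a confusion matrix accumulated over the validation set; the sketch below assumes flat integer arrays of predictions and ground truth with values 0 (background) and 1 (sidewalk):
import numpy as np

def confusion_matrix(preds, targets, num_classes=2):
    # Rows index the true class, columns the predicted class.
    idx = targets.astype(np.int64) * num_classes + preds.astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def segmentation_metrics(cm):
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp  # predicted as the class but actually something else
    fn = cm.sum(axis=1) - tp  # actually the class but predicted as something else
    iou = tp / (tp + fp + fn)
    return {
        "background_iou": iou[0],
        "sidewalk_iou": iou[1],
        "mean_iou": iou.mean(),
        "pixel_accuracy": tp.sum() / cm.sum(),
    }

preds = np.random.randint(0, 2, size=512 * 512)
targets = np.random.randint(0, 2, size=512 * 512)
print(segmentation_metrics(confusion_matrix(preds, targets)))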

Final Validation and Test Metrics

Metric            Validation   Test
Pixel Accuracy    0.9640       0.9638
Mean IoU          0.7784       0.7732
Background IoU    0.9620       0.9619
Sidewalk IoU      0.5949       0.5845
Key Observations:
  • Strong alignment: Validation and test metrics are tightly matched
  • Pixel accuracy consistency: ~96.4% in both cases
  • Sidewalk IoU challenge persists: ~0.59 on validation and ~0.58 on test
  • Mean IoU stability: ~0.77–0.78 across splits

Training Complications & Lessons Learned

Annotation Ceiling

Incomplete ground truth annotations introduced a hard ceiling on achievable IoU. Dataset refinement is as critical as architectural changes.

Sidewalk IoU vs. Practical Utility

The model often predicted sidewalks with a slightly wider buffer than annotated masks. This had no negative effect on the downstream task of extracting centerlines.

Class Imbalance and Loss Weighting

We applied class weighting (foreground ×10) and hard pixel mining to stabilize training and improve sidewalk recall.

Streaming Complexity

The use of Hugging Face streaming with buffer-based shuffling added complexity but allowed scaling to large datasets.
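
In outline, the streaming setup resembles the sketch below; the shuffle buffer of 100 batches of 48 chips mirrors the CLI defaults shown earlier, though the pipeline's actual loading code and buffer interpretation may differ:
from datasets import load_dataset
from torch.utils.data import DataLoader

# Stream chips from disk instead of materializing the full dataset in memory.
stream = load_dataset(
    "data/training/processed/sidewalks_buffered_v2_balancedsplit/final_processed_dataset",
    split="train",
    streaming=True,
)

# Approximate shuffling with a bounded buffer (here, 100 batches of 48 chips).
stream = stream.shuffle(seed=42, buffer_size=100 * 48)

loader = DataLoader(stream.with_format("torch"), batch_size=48)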

Chapter Summary

This training run validated our streaming-based pipeline, produced a stable DeepLabV3+ segmentation model, and showed consistent alignment between validation and test performance. Pixel accuracy reached ~96%, mean IoU stabilized around ~0.77–0.78, and sidewalk IoU held at ~0.58–0.59. Looking ahead, improvements will focus on:
  • Refining data quality and coverage
  • Developing task-specific evaluation metrics
  • Scaling training with hyperparameter search
  • Exploring model variations
