Introduction
Here we outline the end-to-end process of training and evaluating our model. This includes how jobs are launched and managed through the command-line interface, the training strategies defined in our configuration files, the metrics we use to monitor progress, and the artifacts produced for reproducibility.
Command Line Interface (CLI) Usage
To launch or resume a training job:
Run Types
- New Run:
CLI Overrides
- Dedicated Flags: --weights_path, --batch_size, --buffer_batches, --num_workers
- Universal Overrides (--opts): Modify any key from the base YAML
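For illustration, a minimal sketch of how such a CLI can be wired up. The nested config key names (e.g. `DATALOADER.BATCH_SIZE`) are placeholders, not our actual schema:

```python
import argparse
import yaml

parser = argparse.ArgumentParser(description="Launch or resume a training run")
parser.add_argument("--config", required=True, help="path to the base YAML config")
# Dedicated flags for the most commonly tuned settings.
parser.add_argument("--weights_path")
parser.add_argument("--batch_size", type=int)
parser.add_argument("--buffer_batches", type=int)
parser.add_argument("--num_workers", type=int)
# Universal overrides: arbitrary KEY VALUE pairs from the base YAML.
parser.add_argument("--opts", nargs="*", default=[],
                    help="e.g. --opts SOLVER.BASE_LR 0.0005")
args = parser.parse_args()

with open(args.config) as f:
    cfg = yaml.safe_load(f)

# Apply --opts pairs by walking dotted keys into the nested config dict.
for key, value in zip(args.opts[0::2], args.opts[1::2]):
    *parents, leaf = key.split(".")
    node = cfg
    for part in parents:
        node = node[part]
    node[leaf] = yaml.safe_load(value)  # parses ints/floats/bools/strings

# Dedicated flags win last; DATALOADER.BATCH_SIZE is a hypothetical key name.
if args.batch_size is not None:
    cfg["DATALOADER"]["BATCH_SIZE"] = args.batch_size
```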
Training Run Artifacts & Reproducibility
Each training run produces a self-contained folder:
Training Schedule & Strategy
Architecture and Initialization
- Backbone: ResNet-101 with DeepLab modifications
- Decoder: Custom WeightedDeepLabHead with ASPP and SyncBN
- Classes: Two output classes (background = 0, sidewalk = 1)
- Weights: Initialized from Detectron2’s Cityscapes-trained checkpoint
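A rough torchvision-based approximation of this setup is sketched below; WeightedDeepLabHead is our custom decoder, so the stock DeepLabV3 head and a placeholder checkpoint path stand in here:

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet101

# Stock DeepLabV3 head stands in for our custom WeightedDeepLabHead.
model = deeplabv3_resnet101(weights=None, num_classes=2)  # 0 = background, 1 = sidewalk

# SyncBN only applies under distributed training; the conversion would be:
# model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

# Placeholder path for the Cityscapes-trained weights; strict=False skips
# keys (notably the 2-class head) whose shapes don't match the checkpoint.
state = torch.load("cityscapes_pretrained.pth", map_location="cpu")
model.load_state_dict(state, strict=False)
```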
Loss Strategy
- Loss Type: Hard pixel mining (with TOP_K_PERCENT_PIXELS=1.0, effectively weighted cross-entropy)
- Class Weighting: Background = 1.0, Foreground (sidewalk) = 10.0
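A minimal PyTorch sketch of this loss, assuming logits of shape (N, 2, H, W) and integer masks of shape (N, H, W); with top_k_percent=1.0 it reduces to plain class-weighted cross-entropy, exactly as configured in our runs:

```python
import torch
import torch.nn.functional as F

def hard_pixel_mining_loss(logits, target, top_k_percent=1.0):
    """Class-weighted cross-entropy with optional hard pixel mining."""
    class_weights = torch.tensor([1.0, 10.0], device=logits.device)  # bg, sidewalk
    pixel_losses = F.cross_entropy(logits, target, weight=class_weights,
                                   reduction="none").flatten()
    if top_k_percent < 1.0:
        # Keep only the hardest (highest-loss) fraction of pixels.
        k = max(1, int(top_k_percent * pixel_losses.numel()))
        pixel_losses, _ = torch.topk(pixel_losses, k)
    return pixel_losses.mean()
```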
Training Schedule
- Iterations: 100,000 (with batch size 48, this yields 4.8 million chip exposures)
- Warmup: 1,000 iterations, linear ramp-up
- Learning Rate: Base 0.001 → Final 0.0001 with cosine decay
- Optimizer: SGD with momentum = 0.9
- Gradient Clipping: Enabled (norm clipping at 1.0)
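A minimal sketch of the optimizer and clipping step, using a stand-in module in place of the real network:

```python
import torch

model = torch.nn.Conv2d(3, 2, kernel_size=1)  # stand-in for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# One optimization step: clip gradients to unit norm before updating.
out = model(torch.randn(1, 3, 64, 64))
loss = out.mean()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # norm clipping at 1.0
optimizer.step()
optimizer.zero_grad()
```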
Model Training Metrics Overview
Training Loss
Per-pixel cross-entropy loss with class weighting:
$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} w_{y_i} \log p_{i, y_i}$$
where $w_{y_i}$ is the class weight (1.0 for background, 10.0 for sidewalk), $y_i$ is the ground-truth class of pixel $i$, and $p_{i, y_i}$ is the predicted probability of that class.
Training Learning Rate
Linear warmup → cosine decay schedule:
- Start LR: 1e-6 (BASE_LR × WARMUP_FACTOR)
- Base LR: 1e-3 after warmup
- Final LR: 1e-4
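Written out as a function of the iteration count (WARMUP_FACTOR = 1e-3 is implied by the 1e-6 start LR and 1e-3 base LR), the schedule looks like:

```python
import math

BASE_LR, FINAL_LR = 1e-3, 1e-4
WARMUP_FACTOR = 1e-3            # start LR = BASE_LR * WARMUP_FACTOR = 1e-6
WARMUP_ITERS, MAX_ITERS = 1_000, 100_000

def lr_at(it: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay down to FINAL_LR."""
    if it < WARMUP_ITERS:
        alpha = it / WARMUP_ITERS
        return BASE_LR * (WARMUP_FACTOR + (1.0 - WARMUP_FACTOR) * alpha)
    progress = (it - WARMUP_ITERS) / (MAX_ITERS - WARMUP_ITERS)
    return FINAL_LR + 0.5 * (BASE_LR - FINAL_LR) * (1.0 + math.cos(math.pi * progress))

assert abs(lr_at(0) - 1e-6) < 1e-12            # warmup start
assert abs(lr_at(WARMUP_ITERS) - BASE_LR) < 1e-12  # warmup end
```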
Validation Metrics
- Background IoU: Evaluates avoidance of false sidewalk predictions
- Sidewalk IoU: Key measure of actual sidewalk detection ability
- Mean IoU (mIoU): Average of per-class IoUs, balanced metric
- Pixel Accuracy: Proportion of correctly classified pixels
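For reference, a small NumPy sketch showing how all four metrics fall out of a single confusion matrix (rows = ground truth, columns = prediction):

```python
import numpy as np

def segmentation_metrics(gt, pred, num_classes=2):
    """Per-class IoU, mean IoU, and pixel accuracy from flat label arrays."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (gt.ravel(), pred.ravel()), 1)
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    iou = tp / np.maximum(union, 1)
    return {"background_iou": iou[0], "sidewalk_iou": iou[1],
            "mean_iou": iou.mean(), "pixel_accuracy": tp.sum() / conf.sum()}
```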
Final Validation and Test Metrics
| Metric | Validation | Test |
|---|---|---|
| Pixel Accuracy | 0.9640 | 0.9638 |
| Mean IoU | 0.7784 | 0.7732 |
| Background IoU | 0.9620 | 0.9619 |
| Sidewalk IoU | 0.5949 | 0.5845 |
- Strong alignment: Validation and test sets are tightly matched
- Pixel accuracy consistency: ~96.4% in both cases
- Sidewalk IoU challenge persists: ~0.59 on validation, ~0.58 on test
- Mean IoU stability: ~0.77–0.78 across splits
Training Complications & Lessons Learned
Annotation Ceiling
Incomplete ground truth annotations introduced a hard ceiling on achievable IoU. Dataset refinement is as critical as architectural changes.
Sidewalk IoU vs. Practical Utility
The model often predicted sidewalks with a slightly wider buffer than the annotated masks. This had no negative effect on the downstream task of extracting centerlines.
Class Imbalance and Loss Weighting
We applied class weighting (foreground ×10) and hard pixel mining to stabilize training and improve sidewalk recall.
Streaming Complexity
The use of Hugging Face streaming with buffer-based shuffling added complexity but allowed scaling to large datasets.
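A sketch of the pattern with the Hugging Face datasets API; the dataset path and buffer size below are illustrative placeholders:

```python
from datasets import load_dataset

# Placeholder dataset path; streaming=True avoids downloading the full set.
ds = load_dataset("our-org/sidewalk-chips", split="train", streaming=True)

# Buffer-based shuffling: instead of shuffling the whole dataset in memory,
# examples are sampled from a rolling buffer as the stream is consumed.
ds = ds.shuffle(seed=42, buffer_size=1_000)

for example in ds.take(4):  # take() also streams; nothing is fully materialized
    print(example.keys())
```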
Chapter Summary
This training run validated our streaming-based pipeline, produced a stable DeepLabV3+ segmentation model, and showed consistent alignment between validation and test performance. Pixel accuracy reached ~96%, mean IoU stabilized around ~0.77–0.78, and sidewalk IoU held at ~0.58–0.59. Looking ahead, improvements will focus on:
- Refining data quality and coverage
- Developing task-specific evaluation metrics
- Scaling training with hyperparameter search
- Exploring model variations

