Reliable, up-to-date sidewalk data is a persistent challenge in transportation planning, accessibility assessment, and urban infrastructure management. Public records are often incomplete or completely unavailable, limiting the ability of agencies and planners to make informed decisions. Our project addresses this gap by developing a statewide, AI-driven sidewalk detection pipeline that transforms high-resolution aerial imagery into accurate, vectorized sidewalk networks.
This effort spans the full machine learning lifecycle — from dataset curation and model training, to scalable inference and statewide deployment — and demonstrates how modern AI methods can be applied to real-world infrastructure at scale.
Training Dataset Review and Processing
Our work began with a comprehensive qualitative review of our initial dataset, which led to extensive data cleaning and ground-truth mask rebuffering to directly improve the quality and integrity of the dataset. The original annotations exhibited inconsistencies, missing segments, and artifacts that limited training quality. Through systematic cleaning and geometric correction, we produced higher-fidelity mask data that increased the proportion of true sidewalk pixels, corrected boundary shapes, and ensured continuity. This improved dataset underpins our model’s ability to generalize and achieve strong downstream performance.
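As a rough illustration of what rebuffering involves, the sketch below buffers cleaned sidewalk centerlines into polygon footprints and rasterizes them onto a tile's pixel grid. The file paths, layer contents, and buffer width are illustrative assumptions, not the project's actual configuration.

```python
# Hedged sketch: rebuffering sidewalk centerlines into raster training masks.
# Paths, layer names, and the buffer width are illustrative assumptions.
import geopandas as gpd
import rasterio
from rasterio.features import rasterize

TILE_PATH = "tile_0001.tif"          # hypothetical aerial tile
CENTERLINES_PATH = "sidewalks.gpkg"  # hypothetical cleaned centerline layer
BUFFER_METERS = 1.5                  # assumed half-width of a sidewalk

with rasterio.open(TILE_PATH) as src:
    transform, shape, crs = src.transform, (src.height, src.width), src.crs

# Assumes the tile CRS is projected in meters so the buffer distance is meaningful.
lines = gpd.read_file(CENTERLINES_PATH).to_crs(crs)
polygons = lines.geometry.buffer(BUFFER_METERS)  # rebuffer lines into footprints

mask = rasterize(
    [(geom, 1) for geom in polygons if not geom.is_empty],
    out_shape=shape,
    transform=transform,
    fill=0,
    dtype="uint8",
)  # 1 = sidewalk, 0 = background
```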
Model Architecture: DeepLabV3+ in Detectron2
We selected DeepLabV3+, implemented in a custom fork of Detectron2, as the backbone of our sidewalk detection system. The model combines Atrous Spatial Pyramid Pooling (ASPP) for multi-scale context capture with a decoder head for precise boundary refinement. To adapt it to our problem space, we implemented enhancements such as weighted loss functions to address class imbalance and Parquet-based streaming ingestion for high-throughput training.
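The weighted loss is described in detail in a later chapter; as a minimal sketch of the underlying idea, the snippet below applies a class-weighted cross-entropy over background and sidewalk pixels in plain PyTorch. The weight values, class indices, and ignore label are assumptions and need not match the custom fork's actual loss head.

```python
# Hedged sketch of a class-weighted cross-entropy loss for the
# background/sidewalk imbalance. Weights and class layout are assumptions.
import torch
import torch.nn as nn

# Up-weight the minority sidewalk class (index 1) relative to background (index 0).
class_weights = torch.tensor([1.0, 10.0])  # illustrative values only
criterion = nn.CrossEntropyLoss(weight=class_weights, ignore_index=255)

logits = torch.randn(4, 2, 512, 512)           # (N, C, H, W) model output
targets = torch.randint(0, 2, (4, 512, 512))   # (N, H, W) ground-truth mask
loss = criterion(logits, targets)
```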
Model Training and Metrics Report
Training was conducted in a CLI-driven, streaming-enabled pipeline with support for reproducibility, checkpointing, and artifact logging. Our preliminary runs achieved ~96% pixel accuracy and 0.77–0.78 mean IoU, with sidewalk IoU around 0.58, reflecting the persistent difficulty of minority-class recall under occlusion and annotation gaps. These results validated the strength of our data improvements and training pipeline while also highlighting opportunities for further optimization through hyperparameter tuning and expanded training sets.
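To make the reported numbers concrete, the sketch below computes pixel accuracy, per-class IoU, and mean IoU from a two-class confusion matrix. The pixel counts are illustrative only, chosen to land near the figures above; the evaluation code in our training CLI may be organized differently.

```python
# Hedged sketch: pixel accuracy, per-class IoU, and mean IoU from a
# 2x2 confusion matrix (class 0 = background, class 1 = sidewalk).
import numpy as np

def segmentation_metrics(conf: np.ndarray) -> dict:
    """conf[i, j] = pixels with ground-truth class i predicted as class j."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    iou = tp / (tp + fp + fn)
    return {
        "pixel_accuracy": tp.sum() / conf.sum(),
        "per_class_iou": iou,
        "mean_iou": iou.mean(),
    }

# Illustrative counts only, not the project's actual confusion matrix.
conf = np.array([[9_400_000, 150_000],
                 [250_000, 600_000]])
print(segmentation_metrics(conf))  # ~0.96 accuracy, sidewalk IoU ~0.6, mIoU ~0.78
```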
Scalable Inference with Ray and Triton
For deployment, we engineered a distributed, actor-based inference system using Ray and Triton Inference Server. The pipeline ingests GeoTIFF imagery, breaks it into chips, runs inference on Triton, aggregates predictions back into full-tile masks, and extracts vectorized centerlines from the resulting segmentations. The asynchronous design enables a parallelized, multi-stage processing workflow that is efficient, fault-tolerant, and scalable: it processes hundreds of tiles per hour, can scale vertically or horizontally to thousands per hour, and preserves reproducibility and transparency throughout.
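A minimal sketch of the actor-based inference stage is shown below: a Ray actor wraps a Triton HTTP client and scores batches of image chips. The server address, model name, and tensor names are assumptions for illustration, not the pipeline's actual configuration.

```python
# Hedged sketch of a Ray actor that sends chip batches to Triton Inference Server.
# Server URL, model name, and input/output tensor names are assumptions.
import numpy as np
import ray
import tritonclient.http as triton_http

@ray.remote
class ChipInferenceActor:
    """Scores batches of image chips against a Triton-hosted segmentation model."""

    def __init__(self, url: str = "localhost:8000", model: str = "sidewalk_deeplabv3plus"):
        self.client = triton_http.InferenceServerClient(url=url)
        self.model = model

    def infer(self, chips: np.ndarray) -> np.ndarray:
        """chips: (N, 3, H, W) float32 batch; returns the model's output tensor."""
        inp = triton_http.InferInput("input", list(chips.shape), "FP32")
        inp.set_data_from_numpy(chips)
        out = triton_http.InferRequestedOutput("output")
        result = self.client.infer(self.model, inputs=[inp], outputs=[out])
        return result.as_numpy("output")

# Usage sketch: fan a batch of chips out to a small pool of actors.
ray.init()
actors = [ChipInferenceActor.remote() for _ in range(4)]
batch = np.zeros((8, 3, 512, 512), dtype=np.float32)
masks = ray.get([actor.infer.remote(batch) for actor in actors])
```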
New Jersey Statewide Inference Results
To validate our pipeline, we ran inference across 9,202 statewide tiles, generating more than 4.8 million chip predictions. The full job completed in ~17 hours on a single A4000 GPU, demonstrating throughput of nearly 9 tiles per minute and producing detailed, vectorized centerlines that can be directly consumed for transportation planning and accessibility analysis. These results confirm that our pipeline is not only technically robust but also operationally scalable for real-world statewide deployments.
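As a quick sanity check, the headline throughput follows directly from the tile count and wall-clock time reported above:

```python
# Back-of-envelope check of the reported statewide throughput.
tiles, hours = 9_202, 17
print(f"{tiles / (hours * 60):.1f} tiles per minute")  # ≈ 9.0
```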
Structure of this Report
The chapters that follow provide a detailed account of each component of this work:
- Training Dataset Review and Processing – an overview of dataset cleaning and restructuring.
- Detectron2’s DeepLabV3+ Model – an explanation of model architecture choice, model configuration, and implementation.
- Model Training and Metrics Report – training CLI explanation; results from training and utilizing our DeepLabV3+ model.
- Scalable Inference with Ray Pipelining and Triton Inference Server – an in-depth inference pipeline architecture review.
- New Jersey Inference Run Results – statewide pipeline deployment performance and outcome analysis.
Together, these sections document the technical and practical significance of our work: a modern, scalable, and reproducible AI-powered system for sidewalk detection at scale.