Introduction
This document outlines a distributed, actor-based inference system designed to process large-scale geospatial machine learning workloads, from raw imagery to vectorized outputs. The system combines Ray, for parallel and stateful pipeline orchestration, with NVIDIA’s Triton Inference Server, for efficient GPU-accelerated model serving. Together they enable high throughput with minimal idle time across all stages.
About Ray
Ray is an open-source framework for building and running distributed applications at scale. It provides a unified runtime for tasks (stateless units of work) and actors (stateful, long-lived processes). Ray was chosen because it:
- Supports persistent actors that maintain state across calls
- Offers a simple API for asynchronous and parallel execution
- Can scale from a single machine to a multi-node cluster
About Triton Inference Server
NVIDIA Triton Inference Server enables deployment of trained AI models from multiple frameworks. Triton was chosen because it:
- Provides optimized GPU utilization through dynamic batching
- Supports multiple model backends (e.g., TensorRT, ONNX Runtime, PyTorch, and TensorFlow)
- Integrates with both local and cloud deployments
High-Level Ray Actor Overview
| Actor(s) | Primary Function | Key Inputs | Key Outputs |
|---|---|---|---|
| ControllerActor | Orchestrates all stages | Job list, config | Run metadata, progress table |
| TileLoaderActor | Ingests tiles, extracts chips | GeoTIFF tiles | Chips + metadata |
| InputQueueActor | Buffers chips between stages | Chip records | Chip records for batching |
| InferenceDispatcherActor | Batches chips, runs inference | Chips from queue | Chip predictions |
| AggregatorActor | Buffers predictions per tile | Chip predictions | Complete per-tile sets |
| PostProcessingActor | Stitches masks, extracts centerlines | Complete tile sets | GeoJSON centerlines |
| CenterlineWorker | Converts polygons to centerlines | Polygon batches | Vectorized centerlines |
Pipeline Ingress
Building Tile Jobs
The `build_tile_jobs(...)` function creates a standardized list of per-tile job dictionaries.
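A minimal sketch of what such a function might look like follows. The job schema (`job_id`, `tile_id`, `tile_uri`, `output_uri`), the rule of deriving `tile_id` from the file name, and the output path layout are all assumptions for illustration, not the pipeline's actual fields.

```python
from pathlib import PurePosixPath


def build_tile_jobs(job_id, tile_uris, output_prefix):
    """Return one job dict per tile (hypothetical schema, for illustration only)."""
    jobs = []
    seen = set()
    # Deduplicate and sort so the job list is deterministic across runs.
    for uri in sorted(set(tile_uris)):
        # Hypothetical rule: tile_id is the file stem,
        # e.g. "s3://bucket/a/tile_001.tif" -> "tile_001".
        tile_id = PurePosixPath(uri).stem
        if tile_id in seen:
            raise ValueError(f"duplicate tile_id {tile_id!r} from {uri}")
        seen.add(tile_id)
        jobs.append({
            "job_id": job_id,
            "tile_id": tile_id,
            "tile_uri": uri,  # local path or S3 URI
            "output_uri": f"{output_prefix.rstrip('/')}/{job_id}/{tile_id}.geojson",
        })
    return jobs
```

Keeping every job in the same flat shape lets downstream actors key their state on `(job_id, tile_id)` without knowing where the tile came from.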
Running the Pipeline
Pipeline Stages
Lifecycle of a Single Tile
- Ingestion: `TileLoaderActor` reads the tile, extracts chips, and sends them to the queue
- Queuing: `InputQueueActor` buffers chips for downstream consumption
- Inference: `InferenceDispatcherActor` batches chips and runs model inference
- Aggregation: `AggregatorActor` groups predictions until the tile is complete
- Post-Processing: `PostProcessingActor` stitches the mask and extracts centerlines
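To make the handoffs concrete, the following single-process sketch models the five stages as asyncio coroutines connected by bounded `asyncio.Queue`s. It is an illustration under stated assumptions, not the system's implementation: the real stages are Ray actors, the "model" is a placeholder that marks odd-numbered chips as positive, and the per-tile chip counts are made up. The bounded queues mimic backpressure, since a fast producer blocks on `put()` whenever the next stage falls behind.

```python
import asyncio
from collections import defaultdict

SENTINEL = None  # signals end of stream to the next stage


async def tile_loader(tiles, chip_q):
    # Ingestion: emit one record per chip (tiles maps tile_id -> chip count).
    for tile_id, n_chips in tiles.items():
        for i in range(n_chips):
            await chip_q.put({"tile_id": tile_id, "chip_idx": i, "n_chips": n_chips})
    await chip_q.put(SENTINEL)


async def dispatcher(chip_q, pred_q, batch_size):
    # Inference: accumulate chips into batches, then run a placeholder "model".
    batch = []
    while True:
        chip = await chip_q.get()
        if chip is not SENTINEL:
            batch.append(chip)
        if batch and (len(batch) >= batch_size or chip is SENTINEL):
            for c in batch:
                await pred_q.put({**c, "pred": c["chip_idx"] % 2})  # placeholder prediction
            batch = []
        if chip is SENTINEL:
            await pred_q.put(SENTINEL)
            return


async def aggregator(pred_q, done_q):
    # Aggregation: buffer predictions until every chip of a tile has arrived.
    buffers = defaultdict(list)
    while (pred := await pred_q.get()) is not SENTINEL:
        buf = buffers[pred["tile_id"]]
        buf.append(pred)
        if len(buf) == pred["n_chips"]:
            await done_q.put((pred["tile_id"], buffers.pop(pred["tile_id"])))
    await done_q.put(SENTINEL)


async def post_processor(done_q, results):
    # Post-processing stand-in: count positive chips per completed tile.
    while (item := await done_q.get()) is not SENTINEL:
        tile_id, preds = item
        results[tile_id] = sum(p["pred"] for p in preds)


async def run_pipeline(tiles, batch_size=4):
    chip_q, pred_q, done_q = (asyncio.Queue(maxsize=16) for _ in range(3))
    results = {}
    # All stages run concurrently, so no stage waits for another to finish entirely.
    await asyncio.gather(
        tile_loader(tiles, chip_q),
        dispatcher(chip_q, pred_q, batch_size),
        aggregator(pred_q, done_q),
        post_processor(done_q, results),
    )
    return results


if __name__ == "__main__":
    print(asyncio.run(run_pipeline({"tile_A": 5, "tile_B": 3})))
```

With the inputs above, the sketch returns `{'tile_A': 2, 'tile_B': 1}`.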
Stage 0 — ControllerActor
- Startup & Wiring: Launches all worker actors and connects the handoffs between stages
- Progress Tracking: Maintains a progress table indexed by (job_id, tile_id)
- Health & Logging: Polls actors and logs status summaries
- Completion Criteria: Declares the run complete once all tiles have been processed
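The progress table can be pictured as a dictionary keyed by `(job_id, tile_id)`. The sketch below is a plain-Python stand-in for the state `ControllerActor` might hold. The state names are hypothetical, and treating `FAILED` as a terminal state for the completion check is an assumption. In Ray, a class like this could be wrapped with `@ray.remote` and created via `.remote()`.

```python
from collections import Counter
from enum import Enum


class TileState(Enum):
    # Hypothetical per-tile states; the real pipeline may track different ones.
    PENDING = "pending"
    LOADED = "loaded"
    INFERRED = "inferred"
    DONE = "done"
    FAILED = "failed"


TERMINAL = (TileState.DONE, TileState.FAILED)  # assumption: failures also end a tile


class ProgressTable:
    """Tracks per-tile state keyed by (job_id, tile_id)."""

    def __init__(self, jobs):
        self._state = {(j["job_id"], j["tile_id"]): TileState.PENDING for j in jobs}
        self.errors = {}

    def update(self, job_id, tile_id, state, error=None):
        key = (job_id, tile_id)
        if key not in self._state:
            raise KeyError(f"unknown tile {key}")
        self._state[key] = state
        if error is not None:
            self.errors[key] = error  # kept for per-tile fault reporting

    def summary(self):
        # Counts per state, suitable for periodic status logging.
        return Counter(s.value for s in self._state.values())

    def is_complete(self):
        # Complete once every tile has reached a terminal state.
        return all(s in TERMINAL for s in self._state.values())
```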
Stage 1 — TileLoaderActor
- Reads `.tif` tiles from local or S3 storage
- Extracts geospatial metadata (CRS, transform, dimensions)
- Splits tiles into chips with overlap
- Assigns composite keys for traceability
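Splitting a tile into overlapping chips reduces to computing window offsets with a stride of `chip_size - overlap`. The sketch below also shifts a final chip flush with the tile edge so no pixels are dropped. The chip size, overlap, and the `job:tile:row:col` key format are assumptions for illustration. The offsets could be passed to a windowed reader such as `rasterio.windows.Window(col_off, row_off, width, height)`.

```python
def chip_windows(height, width, chip_size, overlap):
    """Return (row_off, col_off) offsets covering a tile with overlapping chips."""
    if not 0 <= overlap < chip_size:
        raise ValueError("overlap must be in [0, chip_size)")
    stride = chip_size - overlap

    def offsets(length):
        # A tile dimension smaller than one chip gets a single chip
        # (the reader would need to pad it).
        if length <= chip_size:
            return [0]
        offs = list(range(0, length - chip_size + 1, stride))
        # Add a final chip flush with the edge if the stride left a remainder.
        if offs[-1] + chip_size < length:
            offs.append(length - chip_size)
        return offs

    return [(r, c) for r in offsets(height) for c in offsets(width)]


def chip_key(job_id, tile_id, row_off, col_off):
    # Hypothetical composite key tracing a chip back to its job, tile, and window.
    return f"{job_id}:{tile_id}:{row_off}:{col_off}"
```

For a 1000 x 1000 tile with 512-pixel chips and 64 pixels of overlap, the offsets per axis are 0, 448, and 488, giving nine chips.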
Stage 2 — InferenceDispatcherActor
- Accumulates chips into batches (default size: 200)
- Normalizes inputs (mean/std from model config)
- Sends mini-batches to Triton via gRPC
- Applies softmax and confidence thresholding
- Handles backpressure with exponential backoff
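Batch accumulation and retry with exponential backoff can be sketched in a few lines. In this sketch, `batched` groups chips into lists of at most `batch_size` (defaulting to the 200 stated above), and `call_with_backoff` retries a callable on transient errors, doubling a capped delay each attempt with random "full jitter". The retryable exception types, delays, and retry count are assumptions. With `tritonclient`, failures surface as `InferenceServerException`, which you would likely pass via `retry_on`. The actual Triton request is not shown.

```python
import random
import time
from itertools import islice


def batched(chips, batch_size=200):
    """Group an iterable of chips into lists of at most batch_size items."""
    it = iter(chips)
    while batch := list(islice(it, batch_size)):
        yield batch


def call_with_backoff(fn, *args, max_retries=5, base_delay=0.1, max_delay=5.0,
                      retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call fn(*args), retrying with exponential backoff plus jitter on transient errors."""
    for attempt in range(max_retries + 1):
        try:
            return fn(*args)
        except retry_on:
            if attempt == max_retries:
                raise  # give up and surface the last error
            # Delay doubles each attempt (capped); full jitter spreads out
            # retries so many dispatchers do not hit the server in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

A dispatcher loop would then look like `for batch in batched(chips): call_with_backoff(infer_fn, batch)`, where `infer_fn` is a hypothetical wrapper around the Triton gRPC call.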
Stage 3 — PostProcessingActor
- Reconstructs full-size prediction mask from chips
- Applies morphological operations
- Converts polygons to centerlines via a `CenterlineWorker` pool
- Writes GeoJSON output to local storage or S3
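One common way to reconstruct a tile-sized mask from overlapping chips is to average the chip probabilities wherever chips overlap and then threshold. The source does not specify its stitching rule, so the averaging approach, the 0.5 threshold, and the `(row_off, col_off, prob)` input format are assumptions. Each `prob` is taken to be a 2-D array of foreground probabilities, such as a post-softmax class channel.

```python
import numpy as np


def stitch_chips(chip_probs, height, width, threshold=0.5):
    """Average overlapping chip probabilities into a boolean tile-sized mask.

    chip_probs: iterable of (row_off, col_off, prob), where prob is a 2-D
    array of foreground probabilities for that chip window.
    """
    acc = np.zeros((height, width), dtype=np.float32)
    count = np.zeros((height, width), dtype=np.uint16)
    for row_off, col_off, prob in chip_probs:
        # Clip chips that extend past the tile edge (e.g. padded edge chips).
        h = min(prob.shape[0], height - row_off)
        w = min(prob.shape[1], width - col_off)
        acc[row_off:row_off + h, col_off:col_off + w] += prob[:h, :w]
        count[row_off:row_off + h, col_off:col_off + w] += 1
    # Mean probability per pixel; pixels never covered by a chip stay at 0.
    mean = np.divide(acc, count, out=np.zeros_like(acc), where=count > 0)
    return mean >= threshold
```

The resulting mask could then be cleaned with morphological operations, for example `scipy.ndimage.binary_opening` or `binary_closing`, before polygon extraction.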
Triton Inference Server Configuration
Model Directory Structure
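Triton loads models from a model repository laid out as `<repository>/<model_name>/config.pbtxt` plus one numeric version directory per model version, for example `<repository>/<model_name>/1/model.onnx`. The sketch below checks that layout and lists the versions found. Requiring `config.pbtxt` is an assumption treated as a project convention, since Triton can auto-generate configuration for some backends. Any model name used with it is hypothetical.

```python
from pathlib import Path


def check_model_repository(repo):
    """Return {model_name: [versions...]} for a Triton-style model repository.

    Expected layout:
        <repo>/<model_name>/config.pbtxt
        <repo>/<model_name>/<numeric_version>/<model file>
    """
    models = {}
    for model_dir in sorted(p for p in Path(repo).iterdir() if p.is_dir()):
        # Version directories must be named with integers (1, 2, ...).
        versions = sorted(
            int(v.name) for v in model_dir.iterdir() if v.is_dir() and v.name.isdigit()
        )
        if not versions:
            raise ValueError(f"{model_dir.name}: no numeric version directories")
        # Assumed project convention: always ship an explicit config.pbtxt.
        if not (model_dir / "config.pbtxt").is_file():
            raise ValueError(f"{model_dir.name}: missing config.pbtxt")
        models[model_dir.name] = versions
    return models
```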
Configuration
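A minimal `config.pbtxt` for a segmentation model with dynamic batching could be rendered as in the sketch below. The field names (`max_batch_size`, `input`, `output`, `dynamic_batching`, `instance_group`) follow Triton's model configuration format. The model name, platform, tensor names, dimensions, and batching values are placeholders, not this pipeline's settings. When `max_batch_size` is greater than 0, `dims` exclude the batch dimension. `max_batch_size` must also be at least the size of each mini-batch the dispatcher sends.

```python
def render_triton_config(name, platform, max_batch_size, input_name, input_dims,
                         output_name, output_dims, preferred_batch_sizes=(),
                         queue_delay_us=100, gpu_instances=1):
    """Render a minimal Triton config.pbtxt (protobuf text format) as a string."""

    def fmt(values):
        return ", ".join(str(v) for v in values)

    lines = [
        f'name: "{name}"',
        f'platform: "{platform}"',
        f"max_batch_size: {max_batch_size}",
        # dims exclude the batch dimension because max_batch_size > 0.
        "input [",
        f'  {{ name: "{input_name}" data_type: TYPE_FP32 dims: [ {fmt(input_dims)} ] }}',
        "]",
        "output [",
        f'  {{ name: "{output_name}" data_type: TYPE_FP32 dims: [ {fmt(output_dims)} ] }}',
        "]",
        "dynamic_batching {",
    ]
    if preferred_batch_sizes:
        lines.append(f"  preferred_batch_size: [ {fmt(preferred_batch_sizes)} ]")
    lines += [
        # How long Triton may wait to assemble a larger batch.
        f"  max_queue_delay_microseconds: {queue_delay_us}",
        "}",
        f"instance_group [ {{ count: {gpu_instances} kind: KIND_GPU }} ]",
    ]
    return "\n".join(lines) + "\n"
```

For example, `render_triton_config("segmentation", "onnxruntime_onnx", 32, "input", (3, 512, 512), "output", (2, 512, 512))` produces a config for a hypothetical two-class ONNX model that takes 512 x 512 RGB chips.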
Chapter Summary
The inference pipeline transforms large-scale geospatial imagery into usable vector data through a fully automated, parallel workflow:
- Starting from GeoTIFF tiles
- Applying configurable preprocessing
- Performing semantic segmentation via Triton
- Reassembling predictions at tile scale
- Converting predictions to vectorized sidewalk centerlines
Key characteristics:
- Scalable concurrency across multiple tiles and jobs
- Robust fault handling with per-tile tracking
- Flexible deployment for local or cloud environments
- Minimal idle time through asynchronous handoffs
- Clear observability via the central controller
Outputs:
- Vectorized per-tile GeoJSON centerlines
- Progress and status tracking tables
- Complete run metadata and execution logs

