
Introduction

This document outlines a distributed, actor-based inference system designed to process large-scale geospatial machine learning workloads from raw imagery to vectorized outputs. The system combines Ray for parallel and stateful pipeline orchestration with NVIDIA’s Triton Inference Server for efficient GPU-accelerated model serving, together enabling high throughput with minimal idle time across all stages.

About Ray

Ray is an open-source framework for building and running distributed applications at scale. It provides a unified runtime for tasks (stateless units of work) and actors (stateful, long-lived processes); a minimal sketch of both follows the list. Ray was chosen because it:
  • Supports persistent actors that maintain state across calls
  • Offers a simple API for asynchronous and parallel execution
  • Can scale from a single machine to a multi-node cluster
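
As a quick illustration of the task/actor split, here is a minimal, self-contained Ray sketch (the names are illustrative and not part of this pipeline):

import ray

ray.init()

@ray.remote
def double(x):
    # Stateless task: can run on any worker in the cluster
    return x * 2

@ray.remote
class Counter:
    # Stateful actor: a long-lived process that keeps state across calls
    def __init__(self):
        self.n = 0

    def incr(self):
        self.n += 1
        return self.n

counter = Counter.remote()
print(ray.get([double.remote(i) for i in range(4)]))  # [0, 2, 4, 6]
print(ray.get(counter.incr.remote()))                 # 1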

About Triton Inference Server

NVIDIA Triton Inference Server enables deployment of trained AI models from multiple frameworks. Triton was chosen because it:
  • Provides optimized GPU utilization through dynamic batching
  • Supports multiple model backends
  • Integrates with both local and cloud deployments

High-Level Ray Actor Overview

Actor(s)                 | Primary Function                      | Key Inputs         | Key Outputs
-------------------------|---------------------------------------|--------------------|------------------------------
ControllerActor          | Orchestrates all stages               | Job list, config   | Run metadata, progress table
TileLoaderActor          | Ingests tiles, extracts chips         | GeoTIFF tiles      | Chips + metadata
InputQueueActor          | Buffers chips between stages          | Chip records       | Chip records for batching
InferenceDispatcherActor | Batches chips, runs inference         | Chips from queue   | Chip predictions
AggregatorActor          | Buffers predictions per tile          | Chip predictions   | Complete per-tile sets
PostProcessingActor      | Stitches masks, extracts centerlines  | Complete tile sets | GeoJSON centerlines
CenterlineWorker         | Converts polygons to centerlines      | Polygon batches    | Vectorized centerlines

Pipeline Ingress

Building Tile Jobs

The build_tile_jobs(...) function creates a standardized list of per-tile job dicts:
{
    "tif_path": "s3://bucket/path/to.tif",
    "tile_id": "H6B10",
    "job_id": "njogis-2020",
    "requested_chip_size": 256,
    "requested_chip_overlap": 32,
    "use_pseudo_color_nir": True,
    "target_format": "NIR-GB",
    "target_model_input_size": [3, 256, 256]
}
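
For orientation, here is a simplified sketch of what such a builder could look like. It is not the project's actual implementation: the real build_tile_jobs(...) also sets the pseudo-color and model-input options shown above, and may derive tile IDs differently.

import os

def build_tile_jobs(tif_paths, job_id, chip_size=256, chip_overlap=32):
    # Illustrative sketch only: one standardized job dict per input tile
    jobs = []
    for path in tif_paths:
        # Assumption for this sketch: the tile ID can be derived from the file name
        tile_id = os.path.splitext(os.path.basename(path))[0]
        jobs.append({
            "tif_path": path,
            "tile_id": tile_id,
            "job_id": job_id,
            "requested_chip_size": chip_size,
            "requested_chip_overlap": chip_overlap,
        })
    return jobs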

Running the Pipeline

import os

# Enumerate the source GeoTIFF tiles stored in S3
store = s3_store()
files = list_s3_files(store, prefix="imagery/njogis-tiles/2020/cog")
tif_keys = sorted(files["key"])
s3_tif_paths = [os.path.join("s3://njtpa/", k) for k in tif_keys]

# One job dict per tile
tile_jobs = build_tile_jobs(tif_paths=s3_tif_paths, job_id="njogis-tiles_2020")

# Launch the pipeline
main(
    tile_jobs=tile_jobs,
    run_id="njogis-tiles_2020_cog_full_run",
    endpoint="triton-inference-server:8001",
    model_name="batched_semseg_model",
    model_version="1",
    num_tileloaders=3,
    num_postprocessors=3,
    storage_mode="s3",
    store=store,
)

Pipeline Stages

Lifecycle of a Single Tile

  1. Ingestion: TileLoaderActor reads the tile, extracts chips, and sends them to the queue
  2. Queuing: InputQueueActor buffers chips for downstream consumption
  3. Inference: InferenceDispatcherActor batches chips and runs model inference
  4. Aggregation: AggregatorActor groups predictions until the tile is complete
  5. Post-Processing: PostProcessingActor stitches the mask and extracts centerlines

Stage 0 — ControllerActor

  • Startup & Wiring: Launches all workers, connects handoffs
  • Progress Tracking: Maintains progress table indexed by (job_id, tile_id); a sketch follows this list
  • Health & Logging: Polls actors, logs status summaries
  • Completion Criteria: Declares complete when all tiles processed
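
The progress table's exact schema is not shown in this document; one plausible shape, keyed by (job_id, tile_id) as described above, might be the following (field names are hypothetical):

# Hypothetical schema; actual field names may differ
progress = {}
progress[("njogis-2020", "H6B10")] = {
    "chips_total": 120,       # chips extracted from the tile
    "chips_inferred": 120,    # chips with predictions returned
    "status": "postprocessed",
}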

Stage 1 — TileLoaderActor

  • Reads .tif tiles from local or S3 storage
  • Extracts geospatial metadata (CRS, transform, dimensions)
  • Splits tiles into chips with overlap (see the sketch after this list)
  • Assigns composite keys for traceability
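
As an illustration of overlap-aware chipping, a minimal offset generator might look like this. It is a hypothetical helper that assumes tiles are at least one chip wide and tall; the actual actor also attaches geospatial metadata to each chip.

def chip_offsets(width, height, chip_size=256, overlap=32):
    # Yield top-left (x, y) offsets so adjacent chips overlap by `overlap` pixels
    stride = chip_size - overlap
    xs = list(range(0, max(width - chip_size, 0) + 1, stride))
    ys = list(range(0, max(height - chip_size, 0) + 1, stride))
    # Add a final offset so the right and bottom edges are fully covered
    if xs[-1] + chip_size < width:
        xs.append(width - chip_size)
    if ys[-1] + chip_size < height:
        ys.append(height - chip_size)
    for y in ys:
        for x in xs:
            yield x, y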

Stage 2 — InferenceDispatcherActor

  • Accumulates chips into batches (default size: 200)
  • Normalizes inputs (mean/std from model config)
  • Sends mini-batches to Triton via gRPC (a client sketch follows this list)
  • Applies softmax and confidence thresholding
  • Handles backpressure with exponential backoff
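
To make the gRPC handoff concrete, a single-chip request against the model configured later in this document might look like the following. This is illustrative only: the dispatcher sends batches of up to 200 chips and adds its own retry logic.

import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="triton-inference-server:8001")

# One normalized 3x256x256 chip; the real dispatcher sends larger batches
batch = np.zeros((1, 3, 256, 256), dtype=np.float32)

inp = grpcclient.InferInput("image", list(batch.shape), "FP32")
inp.set_data_from_numpy(batch)
out = grpcclient.InferRequestedOutput("sem_seg")

result = client.infer(
    model_name="batched_semseg_model",
    model_version="1",
    inputs=[inp],
    outputs=[out],
)
logits = result.as_numpy("sem_seg")  # (1, 2, 256, 256) class logits

# Softmax over the class axis, then (optionally) threshold by confidence
e = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = e / e.sum(axis=1, keepdims=True)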

Stage 3 — PostProcessingActor

  • Reconstructs full-size prediction mask from chips (sketched below)
  • Applies morphological operations
  • Converts polygons to centerlines via CenterlineWorker pool
  • Writes GeoJSON output to local/S3
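
A simplified version of the stitching step, assuming per-chip 2-D prediction arrays and known chip offsets (the actual actor also handles georeferencing and the morphological cleanup listed above):

import numpy as np

def stitch_mask(chip_preds, offsets, tile_height, tile_width):
    # Place each chip prediction back into a full-tile mask,
    # taking the maximum where overlapping chips disagree
    mask = np.zeros((tile_height, tile_width), dtype=np.float32)
    for pred, (x, y) in zip(chip_preds, offsets):
        h, w = pred.shape
        mask[y:y + h, x:x + w] = np.maximum(mask[y:y + h, x:x + w], pred)
    return mask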

Triton Inference Server Configuration

Model Directory Structure

models/
└── batched_semseg_model/
    ├── config.pbtxt
    └── 1/
        └── model.onnx

Model Configuration (config.pbtxt)

name: "batched_semseg_model"
platform: "onnxruntime_onnx"
max_batch_size: 200

instance_group [
  {
    kind: KIND_GPU
    count: 1
    gpus: [0]
  }
]

dynamic_batching {
  preferred_batch_size: [16, 32, 64, 128, 150, 200]
  max_queue_delay_microseconds: 100000
  preserve_ordering: true
}

input [
  {
    name: "image"
    data_type: TYPE_FP32
    dims: [3, -1, -1]
  }
]

output [
  {
    name: "sem_seg"
    data_type: TYPE_FP32
    dims: [2, -1, -1]
  }
]
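
Because max_batch_size is greater than zero, the dims entries above exclude the batch dimension, and -1 marks the height and width as variable. A quick, illustrative way to confirm the server has loaded the model as configured:

import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="triton-inference-server:8001")
assert client.is_server_ready()
assert client.is_model_ready("batched_semseg_model", "1")
print(client.get_model_config("batched_semseg_model", "1"))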

Chapter Summary

The inference pipeline transforms large-scale geospatial imagery into usable vector data through a fully automated, parallel workflow:
  1. Starting from GeoTIFF tiles
  2. Applying configurable preprocessing
  3. Performing semantic segmentation via Triton
  4. Reassembling predictions at tile scale
  5. Converting to vectorized sidewalk centerlines
The Ray-based architecture provides:
  • Scalable concurrency across multiple tiles and jobs
  • Robust fault handling with per-tile tracking
  • Flexible deployment for local or cloud environments
  • Minimal idle time through asynchronous handoffs
  • Clear observability via the central controller
Each run produces:
  • Vectorized per-tile GeoJSON centerlines
  • Progress and status tracking tables
  • Complete run metadata and execution logs
