Fine-tuning a Detectron2 model (pretrained on COCO) can be accelerated and scaled using Ray’s ecosystem: Ray Train for distributed training, Ray Tune for hyperparameter search, and Ray Serve for serving or orchestrating training requests. Throughout the process, ClearML can track experiments, metrics, and models.

Distributed Training with Ray Train (Multi-GPU/Node)

Ray Train provides a simple interface to distribute PyTorch training across multiple GPUs and even multiple nodes. Instead of using Detectron2’s built-in launcher, you can leverage Ray’s TorchTrainer to run the training loop on several parallel workers. Key steps for using Ray Train:
  • Define a training function that sets up Detectron2’s configuration and runs a training loop
  • Initialize Ray and create a TorchTrainer with ScalingConfig
  • Configure Detectron2 for the distributed environment Ray Train sets up (Ray Train initializes the global torch.distributed process group on each worker)
import ray
from ray.train.torch import TorchTrainer
from ray.train import ScalingConfig

def train_detectron2(config):
    import torch, detectron2
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultTrainer
    from detectron2.utils.comm import create_local_process_group

    # 1. Ray Train has already initialized the global torch.distributed process group;
    #    create Detectron2's per-machine local group on top of it.
    ctx = ray.train.get_context()
    if ctx.get_world_size() > 1:
        create_local_process_group(num_workers_per_machine=ctx.get_local_world_size())

    # 2. Register the custom dataset (prepare_dataset is a user-provided loader;
    #    see the sketch after this code block)
    from detectron2.data import DatasetCatalog, MetadataCatalog
    DatasetCatalog.register("custom_train", lambda: prepare_dataset(config["hf_dataset_name"]))
    MetadataCatalog.get("custom_train").set(thing_classes=["..."])

    # 3. Load the COCO-pretrained base config from the model zoo and override it
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(config["model_cfg_path"]))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config["model_cfg_path"])
    cfg.DATASETS.TRAIN = ("custom_train",)
    cfg.DATASETS.TEST = ()  # register a validation split and set TEST.EVAL_PERIOD to enable evaluation
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = config["num_classes"]
    cfg.SOLVER.BASE_LR = config["lr"]
    cfg.SOLVER.MAX_ITER = config["max_iter"]
    cfg.SOLVER.IMS_PER_BATCH = config.get("ims_per_batch", 2)

    # 4. Train using Detectron2's Trainer
    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)
    trainer.train()

    # 5. Report the final metric back to Ray Train / Tune. "bbox/AP" appears in the
    #    storage only when evaluation on a registered validation set is configured;
    #    otherwise the reported value falls back to 0.0.
    latest = trainer.storage.latest()  # {metric_name: (value, iteration)}
    ray.train.report({"eval_mAP": latest.get("bbox/AP", (0.0, 0))[0]})

ray.init()
trainer = TorchTrainer(
    train_loop_per_worker=train_detectron2,
    train_loop_config={
        "hf_dataset_name": "user/dataset",
        "model_cfg_path": "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml",
        "lr": 0.00025, "max_iter": 300, "num_classes": 1
    },
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True)
)
result = trainer.fit()
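The prepare_dataset helper referenced above is left to the user. A minimal sketch is shown below, assuming the Hugging Face datasets library and a dataset whose records expose image paths, sizes, bounding boxes, and category ids; the field names are assumptions and must be adapted to the actual schema:
from detectron2.structures import BoxMode

def prepare_dataset(hf_dataset_name):
    """Hypothetical loader: convert a Hugging Face dataset split into
    Detectron2's list-of-dicts format. Field names are assumed."""
    from datasets import load_dataset
    ds = load_dataset(hf_dataset_name, split="train")

    records = []
    for idx, sample in enumerate(ds):
        annotations = [
            {
                "bbox": bbox,                     # assumed [x0, y0, x1, y1] in absolute pixels
                "bbox_mode": BoxMode.XYXY_ABS,
                "category_id": cat_id,
            }
            for bbox, cat_id in zip(sample["bboxes"], sample["category_ids"])
        ]
        records.append({
            "file_name": sample["image_path"],
            "image_id": idx,
            "height": sample["height"],
            "width": sample["width"],
            "annotations": annotations,
        })
    return records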

Hyperparameter Tuning with Ray Tune

Ray Tune can automate hyperparameter optimization by running multiple trials of the training function with different hyperparameters.
from ray import tune
from ray.tune import Tuner, TuneConfig
from ray.train import RunConfig

param_space = {
    "train_loop_config": {
        "lr": tune.loguniform(1e-4, 1e-2),
        "max_iter": tune.choice([200, 300, 500]),
        "ims_per_batch": tune.choice([2, 4, 8])
    }
}

tuner = Tuner(
    trainer,
    param_space=param_space,
    tune_config=TuneConfig(num_samples=10, metric="eval_mAP", mode="max"),
    run_config=RunConfig(name="detectron2_finetune_tuning")
)
results = tuner.fit()
best_config = results.get_best_result().config
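A common follow-up, sketched below, is to launch a final, longer training run with the winning hyperparameters; the merge assumes the non-tuned keys (dataset name, config path, number of classes) still need to be supplied, and the longer schedule is an illustrative choice:
# Merge the best sampled hyperparameters with the fixed, non-tuned settings
final_loop_config = {
    "hf_dataset_name": "user/dataset",
    "model_cfg_path": "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml",
    "num_classes": 1,
    **best_config["train_loop_config"],   # lr, max_iter, ims_per_batch from the best trial
}

final_trainer = TorchTrainer(
    train_loop_per_worker=train_detectron2,
    train_loop_config=final_loop_config,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)
final_result = final_trainer.fit()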

Ray Serve vs FastAPI for Managing Training Requests

Ray Serve can integrate with FastAPI using the serve.ingress decorator, letting you use FastAPI’s routing while Ray Serve handles scaling:
import ray
from fastapi import FastAPI
from ray import serve

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class TrainAPI:
    @app.post("/")
    def trigger_training(self, config: dict):
        # Fire-and-forget: schedule training in the background
        # (in practice this would typically construct and fit a TorchTrainer as shown above)
        ray.remote(train_detectron2).remote(config)
        return {"status": "scheduled"}

serve.run(TrainAPI.bind(), route_prefix="/train")
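With the deployment running, a client can trigger a training run over plain HTTP; the sketch below assumes Ray Serve's default HTTP port 8000 and the /train route prefix used above:
import requests

# Hypothetical client call against the TrainAPI deployment
response = requests.post(
    "http://127.0.0.1:8000/train/",
    json={"lr": 0.00025, "max_iter": 300, "num_classes": 1,
          "hf_dataset_name": "user/dataset",
          "model_cfg_path": "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"},
)
print(response.json())  # {"status": "scheduled"}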

ClearML Integration

ClearML provides experiment tracking, metric logging, and orchestration capabilities:
from clearml import Task

def train_detectron2(config):
    # Only the rank-0 worker talks to ClearML, to avoid duplicate tasks
    is_rank_zero = ray.train.get_context().get_world_rank() == 0

    if is_rank_zero:
        task = Task.init(project_name="Detectron2-Ray",
                         task_name=f"train_lr={config['lr']}")
        task.connect(config)
        logger = task.get_logger()

    # Pseudocode: "training_loop" stands in for your actual iteration loop
    for iteration, metrics in training_loop:
        if is_rank_zero:
            logger.report_scalar(title="loss", series="train",
                                 value=metrics["loss"], iteration=iteration)

    if is_rank_zero:
        task.close()
Each HPO trial should ideally create its own ClearML task so that runs can be compared independently.
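One way to do this, sketched below with a hypothetical init_trial_task helper, is to force a fresh task per trial and name it from the sampled hyperparameters (reuse_last_task_id=False prevents ClearML from resuming the previous task):
from clearml import Task

def init_trial_task(config):
    """Create one ClearML task per Ray Tune trial (rank-0 worker only)."""
    if ray.train.get_context().get_world_rank() != 0:
        return None
    task = Task.init(
        project_name="Detectron2-Ray",
        task_name=f"trial_lr={config['lr']:.2e}_iters={config['max_iter']}",
        reuse_last_task_id=False,   # always start a new task instead of resuming
    )
    task.connect(config)            # log the sampled hyperparameters
    return task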

Dockerization and Multi-Container Deployment

Example Docker Compose for a Ray cluster:
version: "3.9"
services:
  ray-head:
    image: your_detectron2_ray_image:latest
    command: >
      ray start --head --port=6379 --dashboard-host=0.0.0.0 --block
    ports:
      - "8265:8265"
      - "6379:6379"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: ["gpu"]
  ray-worker:
    image: your_detectron2_ray_image:latest
    depends_on:
      - ray-head
    command: >
      ray start --address=ray-head:6379 --block
    deploy:
      replicas: 3
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: ["gpu"]
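Once the cluster is up, training jobs can be submitted through the dashboard port exposed by the ray-head service, for example with the Ray Jobs Python SDK (train.py is a hypothetical entrypoint wrapping the TorchTrainer code above):
from ray.job_submission import JobSubmissionClient

# Connect to the Ray dashboard exposed by the ray-head service
client = JobSubmissionClient("http://127.0.0.1:8265")

job_id = client.submit_job(
    entrypoint="python train.py",         # hypothetical script containing the training code
    runtime_env={"working_dir": "."},     # ship the local project to the cluster
)
print("Submitted Ray job:", job_id)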
By combining Ray Train and Ray Tune, you get scalable training and automated hyperparameter optimization for Detectron2. Ray Serve offers a path to expose this training pipeline as a service, and ClearML integration provides the experiment tracking that becomes vital when running many experiments on distributed resources.