Fine-tuning a Detectron2 model (pretrained on COCO) can be accelerated and scaled using Ray’s ecosystem: Ray Train for distributed training, Ray Tune for hyperparameter search, and Ray Serve for serving or orchestrating training requests. Throughout the process, ClearML can track experiments, metrics, and models.
Distributed Training with Ray Train (Multi-GPU/Node)
Ray Train provides a simple interface to distribute PyTorch training across multiple GPUs and even multiple nodes. Instead of using Detectron2’s built-in launcher, you can leverage Ray’s TorchTrainer to run the training loop on several parallel workers.
Key steps for using Ray Train:
- Define a training function that sets up Detectron2’s configuration and runs a training loop
- Initialize Ray and create a TorchTrainer with a ScalingConfig
- Ensure distributed training is properly configured for Detectron2
```python
import ray
from ray.train.torch import TorchTrainer
from ray.train import ScalingConfig


def train_detectron2(config):
    import torch, detectron2
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultTrainer
    from detectron2.utils.comm import create_local_process_group

    # 1. Setup distributed process group.
    #    Ray Train already initializes the global torch.distributed group;
    #    Detectron2 additionally expects a per-machine ("local") group.
    world_size = ray.train.get_context().get_world_size()
    if world_size > 1:
        # Assumes all workers share one machine; on a multi-node cluster,
        # pass the number of workers per machine instead.
        create_local_process_group(num_workers_per_machine=world_size)

    # 2. Register the dataset (prepare_dataset is a user-defined loader that
    #    returns a list of dicts in Detectron2's dataset format)
    from detectron2.data import DatasetCatalog, MetadataCatalog
    DatasetCatalog.register("custom_train", lambda: prepare_dataset(config["hf_dataset_name"]))
    MetadataCatalog.get("custom_train").set(thing_classes=["..."])

    # 3. Load the COCO-pretrained base config and update it
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(config["model_cfg_path"]))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config["model_cfg_path"])
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = config["num_classes"]
    cfg.DATASETS.TRAIN = ("custom_train",)
    cfg.DATASETS.TEST = ()
    cfg.SOLVER.IMS_PER_BATCH = config.get("ims_per_batch", 2)
    cfg.SOLVER.BASE_LR = config["lr"]
    cfg.SOLVER.MAX_ITER = config["max_iter"]

    # 4. Train using Detectron2's Trainer
    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)
    metrics = trainer.train() or {}

    # Report metrics back to Ray Train / Ray Tune ("bbox/AP" is only present
    # if evaluation is configured, e.g. via cfg.TEST.EVAL_PERIOD and an evaluator)
    ray.train.report({"eval_mAP": metrics.get("bbox/AP", 0)})


ray.init()
trainer = TorchTrainer(
    train_loop_per_worker=train_detectron2,
    train_loop_config={
        "hf_dataset_name": "user/dataset",
        "model_cfg_path": "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml",
        "lr": 0.00025, "max_iter": 300, "num_classes": 1
    },
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True)
)
result = trainer.fit()
```
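After `fit()` returns, the metrics reported by the training loop and any saved checkpoint can be read off the returned `Result` object. A minimal sketch, assuming the `eval_mAP` key reported above:

```python
# Inspect the outcome of the distributed run (sketch; assumes the training
# loop reported "eval_mAP" via ray.train.report as shown above).
print(result.metrics)                     # last metrics dict reported
print(result.metrics.get("eval_mAP"))     # non-zero only if evaluation ran
if result.checkpoint is not None:
    print(result.checkpoint.path)         # where the checkpoint was persisted, if any
```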
Hyperparameter Tuning with Ray Tune
Ray Tune can automate hyperparameter optimization by running multiple trials of the training function with different hyperparameters.
```python
from ray import tune
from ray.tune import Tuner, TuneConfig
from ray.train import RunConfig

# Hyperparameters are nested under "train_loop_config" so that Tune
# merges them into the config dict passed to train_detectron2.
param_space = {
    "train_loop_config": {
        "lr": tune.loguniform(1e-4, 1e-2),
        "max_iter": tune.choice([200, 300, 500]),
        "ims_per_batch": tune.choice([2, 4, 8])
    }
}

tuner = Tuner(
    trainer,
    param_space=param_space,
    tune_config=TuneConfig(num_samples=10, metric="eval_mAP", mode="max"),
    run_config=RunConfig(name="detectron2_finetune_tuning")
)
results = tuner.fit()
best_config = results.get_best_result().config
```
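The best trial's hyperparameters sit under the nested `train_loop_config` key; a short sketch of extracting them, assuming the search space and metric above:

```python
# Pull the winning hyperparameters out of the nested config
# (sketch; assumes the param_space and "eval_mAP" metric defined above).
best_result = results.get_best_result(metric="eval_mAP", mode="max")
best_hparams = best_result.config["train_loop_config"]
print("Best lr:", best_hparams["lr"])
print("Best max_iter:", best_hparams["max_iter"])
print("Best eval_mAP:", best_result.metrics.get("eval_mAP"))
```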
Ray Serve vs FastAPI for Managing Training Requests
Ray Serve can integrate with FastAPI using the serve.ingress decorator, letting you use FastAPI’s routing while Ray Serve handles scaling:
```python
from fastapi import FastAPI
import ray
from ray import serve

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class TrainAPI:
    @app.post("/")
    def trigger_training(self, config: dict):
        # Launch the training function as a fire-and-forget Ray task
        ray.remote(train_detectron2).remote(config)
        return {"status": "scheduled"}

serve.run(TrainAPI.bind(), route_prefix="/train")
```
ClearML Integration
ClearML provides experiment tracking, metric logging, and orchestration capabilities:
```python
from clearml import Task

def train_detectron2(config):
    # Only the rank-0 worker talks to ClearML, so each run produces one experiment
    is_rank0 = ray.train.get_context().get_world_rank() == 0
    if is_rank0:
        task = Task.init(project_name="Detectron2-Ray",
                         task_name=f"train_{config['lr']}")
        task.connect(config)          # log hyperparameters
        logger = task.get_logger()

    # Inside the training loop (training_loop stands in for the actual iteration logic)
    for iteration, metrics in training_loop:
        if is_rank0:
            logger.report_scalar("loss", "train", metrics["loss"], iteration)

    if is_rank0:
        task.close()
```
During hyperparameter optimization, each trial should ideally create its own ClearML task so that runs can be compared independently, as sketched below.
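A minimal sketch of that pattern, assuming the training function runs under Ray Tune so the trial name is available from the Ray Train context (the helper name `init_clearml_task_for_trial` is hypothetical):

```python
from clearml import Task
import ray

def init_clearml_task_for_trial(config):
    # One ClearML task per Tune trial, created on rank 0 only
    # (sketch; assumes this runs inside a worker launched by Ray Tune).
    ctx = ray.train.get_context()
    if ctx.get_world_rank() != 0:
        return None
    task = Task.init(
        project_name="Detectron2-Ray",
        task_name=f"trial_{ctx.get_trial_name()}",  # unique name per trial
        reuse_last_task_id=False,                   # always start a fresh task
    )
    task.connect(config)  # log this trial's hyperparameters
    return task
```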
Dockerization and Multi-Container Deployment
Example Docker Compose for a Ray cluster:
```yaml
version: "3.9"
services:
  ray-head:
    image: your_detectron2_ray_image:latest
    command: >
      ray start --head --port=6379 --dashboard-host=0.0.0.0 --block
    ports:
      - "8265:8265"   # Ray dashboard
      - "6379:6379"   # GCS / cluster address
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: ["gpu"]
  ray-worker:
    image: your_detectron2_ray_image:latest
    depends_on:
      - ray-head
    command: >
      ray start --address=ray-head:6379 --block
    deploy:
      replicas: 3
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: ["gpu"]
```
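With the cluster up, a fine-tuning run can be submitted as a Ray job against the head node's dashboard port. A sketch using the Ray Jobs SDK; the script name and working directory are placeholders:

```python
from ray.job_submission import JobSubmissionClient

# Submit the training script to the Dockerized cluster
# (assumes the dashboard is reachable on localhost:8265 as mapped above;
#  "train_detectron2_ray.py" is a placeholder for your entrypoint script).
client = JobSubmissionClient("http://127.0.0.1:8265")
job_id = client.submit_job(
    entrypoint="python train_detectron2_ray.py",
    runtime_env={"working_dir": "."},
)
print("Submitted job:", job_id)
print(client.get_job_status(job_id))
```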
By combining Ray Train and Ray Tune, you get scalable training and automated hyperparameter optimization for Detectron2. Ray Serve offers a path to expose this training pipeline as a service, and ClearML integration provides the experiment tracking that becomes vital when running many experiments across distributed resources.