ResNet-50

Since we are dealing with grayscale images, we replicate the single channel across three channels to match the three-channel input expected by the ResNet-50v2 model. This avoids redesigning the backbone.
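The channel replication described above can be sketched as follows. This is a minimal illustration, assuming the images arrive as a PyTorch batch of shape (N, 1, H, W); the helper name `gray_to_three_channels` is hypothetical and not taken from the project code.

```python
import torch


def gray_to_three_channels(batch: torch.Tensor) -> torch.Tensor:
    """Replicate a (N, 1, H, W) grayscale batch to (N, 3, H, W)."""
    if batch.ndim != 4 or batch.shape[1] != 1:
        raise ValueError(f"expected shape (N, 1, H, W), got {tuple(batch.shape)}")
    # repeat() copies the single channel three times along the channel axis;
    # the result owns its memory, so later in-place ops do not affect the input.
    return batch.repeat(1, 3, 1, 1)
```

All three output channels hold the same values, so the pretrained first convolution sees a valid three-channel input without any change to the backbone.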
ResNet-50v2 Architecture

The timm (PyTorch Image Models) library is used to load the pretrained ResNet-50 model. The model summary shows:
  • Input: 3 × 224 × 224 images (grayscale replicated to 3 channels)
  • Output: 2048-dimensional feature vector (before classification head)
  • Architecture: 50 layers with residual connections
  • Pretrained on: ImageNet (1000 classes)

Why ResNet-50?

ResNet-50 was chosen for several reasons:
  1. Proven effectiveness: Widely used in transfer learning applications
  2. Appropriate depth: Deep enough to learn hierarchical features without being overly complex
  3. Efficient inference: Reasonable computational requirements for edge deployment
  4. Available pretrained weights: Extensive pretraining on ImageNet provides strong general visual features

Feature Extraction

For anomaly detection, we use the ResNet-50 model as a feature extractor:
  1. Remove the final classification layer
  2. Extract features from the global average pooling layer
  3. Obtain a 2048-dimensional embedding for each input image
These embeddings are then reduced in dimensionality with UMAP and scored for anomalies with kNN.
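The kNN scoring step can be illustrated with a small NumPy sketch. It rests on assumptions that are not specified here: the score is taken to be the mean Euclidean distance to the k nearest reference embeddings, `k=5` is an arbitrary default, and the function name `knn_anomaly_scores` is hypothetical. The UMAP reduction is not shown; the function accepts embeddings of any dimensionality, reduced or not.

```python
import numpy as np


def knn_anomaly_scores(
    reference: np.ndarray, queries: np.ndarray, k: int = 5
) -> np.ndarray:
    """Mean distance from each query embedding to its k nearest reference embeddings."""
    if k < 1 or k > len(reference):
        raise ValueError("k must be between 1 and the number of reference embeddings")
    # Pairwise Euclidean distances via broadcasting: shape (n_queries, n_reference).
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # np.partition places the k smallest distances (unordered) in the first k columns.
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)
```

Under this scoring, an image whose embedding lies far from the reference set of normal images receives a larger score. The broadcasting approach builds the full distance tensor in memory, so a tree-based or approximate neighbor search is likely preferable for large reference sets.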