
ResNet-50

Because the input images are grayscale, we replicate the single channel across three channels. This matches the three-channel input expected by the ResNet-50v2 model and avoids redesigning the backbone.
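The channel replication described above can be sketched as follows. This is a minimal illustration using NumPy, not the project's actual preprocessing code; the function name `gray_to_three_channels` and the random test image are assumptions made for the example.

```python
import numpy as np


def gray_to_three_channels(image: np.ndarray) -> np.ndarray:
    """Replicate a (H, W) grayscale image into a (3, H, W) array."""
    if image.ndim != 2:
        raise ValueError(f"expected a 2-D grayscale image, got shape {image.shape}")
    # Add a leading channel axis, then copy the same plane three times along it.
    return np.repeat(image[np.newaxis, :, :], 3, axis=0)


# Hypothetical 224x224 grayscale image with values in [0, 1).
gray = np.random.default_rng(0).random((224, 224), dtype=np.float32)
rgb_like = gray_to_three_channels(gray)
print(rgb_like.shape)  # (3, 224, 224)
```

All three output channels carry identical values, so no information is added; the copy only makes the tensor shape compatible with a backbone pretrained on RGB images.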

ResNet-50v2 Architecture

The timm (PyTorch Image Models) library is used to load the pretrained ResNet-50 model. The model summary shows:
  • Input: 3 × 224 × 224 images (grayscale replicated to 3 channels)
  • Output: 2048-dimensional feature vector (before classification head)
  • Architecture: 50 layers with residual connections
  • Pretrained on: ImageNet (1000 classes)

Why ResNet-50?

ResNet-50 was chosen for several reasons:
  1. Proven effectiveness: Widely used in transfer learning applications
  2. Appropriate depth: Deep enough to learn hierarchical features without being overly complex
  3. Efficient inference: Reasonable computational requirements for edge deployment
  4. Available pretrained weights: Extensive pretraining on ImageNet provides strong general visual features

Feature Extraction

For anomaly detection, we use the ResNet-50 model as a feature extractor:
  1. Remove the final classification layer
  2. Extract features from the global average pooling layer
  3. Obtain a 2048-dimensional embedding for each input image
These embeddings are then used with UMAP for dimensionality reduction and kNN for anomaly scoring.
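
The kNN scoring step can be illustrated with a small NumPy sketch. The source does not specify how the score is computed, so this example assumes one common choice: the mean Euclidean distance from a query to its k nearest embeddings of normal images. The function name `knn_anomaly_scores`, the value `k=5`, and the synthetic 8-dimensional data (standing in for UMAP-reduced embeddings) are all hypothetical; the UMAP step itself is not shown.

```python
import numpy as np


def knn_anomaly_scores(reference: np.ndarray, queries: np.ndarray, k: int = 5) -> np.ndarray:
    """Mean Euclidean distance from each query to its k nearest reference points."""
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and the number of reference points")
    # Pairwise distances with shape (num_queries, num_reference).
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # np.partition places the k smallest distances in the first k columns.
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)


rng = np.random.default_rng(0)
# Synthetic stand-in for reduced embeddings of normal images.
reference = rng.normal(0.0, 1.0, size=(200, 8))
normal_query = rng.normal(0.0, 1.0, size=(1, 8))
shifted_query = normal_query + 10.0  # far from the reference cloud
scores = knn_anomaly_scores(reference, np.vstack([normal_query, shifted_query]), k=5)
print(scores[1] > scores[0])  # True
```

A higher score means the query lies farther from its nearest normal neighbours. The brute-force distance matrix is fine for small reference sets; a larger set would likely call for an indexed nearest-neighbour search instead.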