Introduction
Below we include samples from the well-known CIFAR-10 dataset, which has a similar number of classes (K=10).
The performance verification shows how accuracy differs when the same model is applied to CIFAR-10 and to the Seamagine dataset. The two datasets differ as follows:
- Seamagine contains grayscale images of size [1,224,224], whereas CIFAR-10 contains natural color images of size [3,32,32].
- Seamagine is an anomaly detection dataset in which the vast majority of the images belong to the PASS class, whereas CIFAR-10 has K=10 distinct classes.
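To make the size difference concrete, the following minimal sketch compares the per-image input footprint of the two datasets. The shapes come from the text; the float32 storage format and the helper name `image_bytes` are assumptions made for illustration.

```python
from math import prod

# Per-image tensor shapes (channels, height, width) as stated in the text.
SEAMAGINE_SHAPE = (1, 224, 224)
CIFAR10_SHAPE = (3, 32, 32)

BYTES_PER_FLOAT32 = 4  # assumption: inputs are stored as float32


def image_bytes(shape, bytes_per_value=BYTES_PER_FLOAT32):
    """Return the memory footprint in bytes of one input tensor."""
    return prod(shape) * bytes_per_value


seamagine_bytes = image_bytes(SEAMAGINE_SHAPE)
cifar10_bytes = image_bytes(CIFAR10_SHAPE)
size_ratio = seamagine_bytes / cifar10_bytes

print(f"Seamagine: {seamagine_bytes} bytes per image")
print(f"CIFAR-10:  {cifar10_bytes} bytes per image")
print(f"Ratio:     {size_ratio:.1f}x")
```

This counts only the input tensor. In a convolutional model, activation memory tends to follow the number of spatial positions, which is 224×224 versus 32×32, so the gap in practice is likely larger than the input ratio alone suggests.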
Implications
The differences have several implications:
- The Seamagine image size constrains the batch size: we cannot fit many images into the VRAM of a single GPU, so multiple GPUs must be used to match the batch size we can configure with CIFAR-10.
- The learning rate is also affected, because a suitable learning rate depends on the batch size that the image size allows.
- In general, the operating configuration of the same model, as determined by Hyperparameter Optimization (HPO) algorithms, differs between the two datasets.
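The coupling between batch size, GPU count, and learning rate can be sketched as follows. This is an illustration only: the linear scaling rule used here is one common heuristic and is not stated in the text as the method applied, and all numeric values (per-GPU batch of 16, reference batch of 256, reference learning rate of 0.1) are hypothetical.

```python
from math import ceil


def gpus_needed(target_batch, per_gpu_batch):
    """Number of GPUs required to reach a target global batch size."""
    return ceil(target_batch / per_gpu_batch)


def scaled_learning_rate(base_lr, base_batch, new_batch):
    """Linear scaling heuristic: learning rate proportional to batch size."""
    return base_lr * new_batch / base_batch


# Hypothetical reference configuration for CIFAR-10 on a single GPU.
cifar10_batch = 256
cifar10_lr = 0.1

# Hypothetical limit: only 16 Seamagine images fit on one GPU.
seamagine_per_gpu_batch = 16

# GPUs required to match the CIFAR-10 batch size.
required_gpus = gpus_needed(cifar10_batch, seamagine_per_gpu_batch)

# If only 4 GPUs are available, the global batch shrinks and the
# learning rate is scaled down accordingly under this heuristic.
available_gpus = 4
seamagine_batch = seamagine_per_gpu_batch * available_gpus
seamagine_lr = scaled_learning_rate(cifar10_lr, cifar10_batch, seamagine_batch)

print(f"GPUs needed to match batch {cifar10_batch}: {required_gpus}")
print(f"Batch with {available_gpus} GPUs: {seamagine_batch}")
print(f"Scaled learning rate: {seamagine_lr}")
```

In practice the learning rate would be one of the quantities searched by HPO rather than fixed by a formula; the heuristic only indicates why the two datasets end up with different operating points.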
Key Insight
Despite the above differences and implications, the main value of this exercise is to highlight the substantial difference in model performance from the semantic perspective rather than the technical one.
As one can observe from the examples above, there is a substantial difference in shape and texture among the K=10 CIFAR-10 classes, whereas no such difference is present among the images of the Seamagine dataset.