Introduction
As we discussed in the pretrained CNN section, the aim is to bring the high-dimensional vector data from the CNN into a lower d-dimensional space (where d is a hyperparameter) and to use the d-dimensional features for the downstream tasks. To optimize this hyperparameter, i.e. to find the best value of d, we use kNN to classify the images and take the classification accuracy as the metric for a grid search. kNN is typically implemented inside a Qdrant vector database, but this analysis uses an exact kNN algorithm without any dependency on the vector database.

Visualizing Embeddings
Please note that in the plots a lexicographic mapping is used to map the labels (PASS and FAIL) to the colors.

UMAP kNN Results
The kNN classifier is applied to the UMAP-reduced embeddings with various values of k and embedding dimensions d. The results show:
- Best Performance: AUROC of approximately 0.97, achieved with UMAP dimensionality reduction paired with kNN using majority voting
- Optimal k: Values between 3-7 neighbors typically perform best
- Optimal d: Embedding dimensions between 10-50 provide good discrimination while maintaining computational efficiency
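The pipeline described above can be sketched in a few lines: reduce the embeddings to d dimensions, run an exact brute-force kNN with majority voting, and grid-search over (k, d) using classification accuracy. The data, shapes, and grid values below are illustrative assumptions, and a NumPy random projection stands in for UMAP so the sketch has no umap-learn dependency; in the actual analysis umap.UMAP(n_components=d) would produce the reduced embeddings.

```python
import numpy as np
from collections import Counter

# Toy stand-in data for the CNN embeddings and PASS/FAIL labels; the shapes,
# names, and cluster structure are illustrative assumptions only.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 64)),   # "FAIL"-like cluster
               rng.normal(3.0, 1.0, (50, 64))])  # "PASS"-like cluster
y = np.array(["FAIL"] * 50 + ["PASS"] * 50)

def exact_knn_predict(X_train, y_train, X_test, k):
    """Brute-force exact kNN with majority voting (no vector database)."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
        nearest = y_train[np.argsort(dists)[:k]]      # labels of k closest points
        preds.append(Counter(nearest).most_common(1)[0][0])
    return np.array(preds)

# Train/test split for scoring each (k, d) combination.
idx = rng.permutation(len(X))
train, test = idx[:70], idx[70:]

# Grid search over the number of neighbors k and the reduced dimension d.
# A random projection stands in for UMAP here; swap in
# umap.UMAP(n_components=d).fit_transform(X) for the real pipeline.
best_acc, best_k, best_d = 0.0, None, None
for d in (2, 10, 50):
    W = rng.normal(size=(X.shape[1], d))
    Z = X @ W / np.sqrt(d)
    for k in (3, 5, 7):
        acc = np.mean(exact_knn_predict(Z[train], y[train], Z[test], k) == y[test])
        if acc > best_acc:
            best_acc, best_k, best_d = acc, k, d

print(f"best accuracy={best_acc:.2f} at k={best_k}, d={best_d}")
```

Because the kNN here is exact (every test point is compared against every training point), the result is deterministic for a given split, which makes the accuracy a clean objective for the grid search.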

