Introduction
As outlined in the software architecture section, the purpose of the MEVP pipeline is to verify model performance. This can be done with a variety of methods, but it is common practice to validate the data and model-training pipelines against other datasets. This serves two purposes:
- To iron out modeling bugs that may cause performance issues. If the model performs sub-optimally on a custom dataset, we run exactly the same model on a dataset for which performance results are already reported in tens of other references and publications, and check that the model replicates those results.
- To highlight issues with the data itself that are independent of the adopted modeling approach. Once existing performance results have been replicated, we evaluate the sensitivity of the model's performance to modifications of the data itself.
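The two checks above can be sketched in code. The snippet below is a minimal illustration, not part of the MEVP pipeline itself: the function names, the tolerance value, and all numeric scores are hypothetical. The first helper checks whether a measured metric replicates any published result within a tolerance; the second measures the performance drop under each data perturbation.

```python
def replicates_published(measured, published, tolerance=0.02):
    """True if `measured` falls within `tolerance` of any published result.

    Hypothetical helper: the tolerance threshold is an assumption, not a
    value taken from the MEVP pipeline.
    """
    return any(abs(measured - p) <= tolerance for p in published)


def sensitivity(baseline, perturbed):
    """Performance drop for each named data perturbation vs. the baseline."""
    return {name: baseline - score for name, score in perturbed.items()}


# Illustrative numbers only.
published_acc = [0.912, 0.908, 0.915]   # results reported in the literature
our_acc = 0.910                          # same model, same reference dataset
assert replicates_published(our_acc, published_acc)

# After replication succeeds, perturb the data and measure the drop.
drops = sensitivity(our_acc, {"label_noise_5pct": 0.874,
                              "dropped_feature": 0.901})
# Large drops flag perturbations the model is sensitive to.
```

A large drop for a given perturbation points to a data issue rather than a modeling one, since the same model has already been shown to replicate published results on the unmodified dataset.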

