Introduction
As outlined in the software architecture section, the purpose of the MEVP pipeline is to verify model performance. This can be done with a variety of methods, but it is common practice to validate the data and model-training pipelines against other datasets. This serves two purposes:
- To iron out modeling bugs that may cause performance issues. If the model performs sub-optimally on a custom dataset, we run exactly the same model on a dataset for which performance results are already reported in tens of other references and publications, and check that the model replicates those results.
- To highlight issues with the data itself that are independent of the adopted modeling approach. Once existing performance results have been replicated, we evaluate the sensitivity of the model's performance to modifications of the data itself.
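The two checks above can be sketched in code. The snippet below is a minimal illustration, not part of the MEVP pipeline itself: the function names, the tolerance value, and all numeric scores are hypothetical. The first helper checks whether a measured metric replicates any published result within a tolerance; the second measures the performance drop under each data perturbation.

```python
def replicates_published(measured, published, tolerance=0.02):
    """True if `measured` falls within `tolerance` of any published result.

    Hypothetical helper: the tolerance threshold is an assumption, not a
    value taken from the MEVP pipeline.
    """
    return any(abs(measured - p) <= tolerance for p in published)


def sensitivity(baseline, perturbed):
    """Performance drop for each named data perturbation vs. the baseline."""
    return {name: baseline - score for name, score in perturbed.items()}


# Illustrative numbers only.
published_acc = [0.912, 0.908, 0.915]   # results reported in the literature
our_acc = 0.910                          # same model, same reference dataset
assert replicates_published(our_acc, published_acc)

# After replication succeeds, perturb the data and measure the drop.
drops = sensitivity(our_acc, {"label_noise_5pct": 0.874,
                              "dropped_feature": 0.901})
# Large drops flag perturbations the model is sensitive to.
```

A large drop for a given perturbation points to a data issue rather than a modeling one, since the same model has already been shown to replicate published results on the unmodified dataset.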

