- Instead of the data, x now represents the hyperparameters we try at each iteration.
- Instead of the label, the target now represents the loss we obtain at each iteration.
Hyperparameter Optimization
Bayesian Optimization
In Bayesian regression or classification, we start with a prior over the parameters of the model and form the initial posterior using Bayes' rule. As data are observed sequentially, the posterior is updated iteratively: each time new data points arrive, the previous posterior acts as the prior and is combined with the likelihood of the new points.
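The sequential update can be illustrated with a minimal sketch. It assumes a one-parameter linear model y = w*x + noise, a Gaussian prior on w, and a known noise precision; the function name `update`, the toy data, and the value of `noise_precision` are all illustrative choices, not taken from the text above.

```python
def update(mean, precision, x, y, noise_precision=25.0):
    """One conjugate Bayesian update of a Gaussian belief over the slope w.

    Assumed model (illustrative): y = w * x + Gaussian noise with known precision.
    The previous posterior N(mean, 1/precision) plays the role of the prior.
    """
    new_precision = precision + noise_precision * x * x
    new_mean = (precision * mean + noise_precision * x * y) / new_precision
    return new_mean, new_precision


# Prior over w: N(0, 1).
mean, precision = 0.0, 1.0

# Toy observations (x, y), processed one at a time.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
for x, y in data:
    mean, precision = update(mean, precision, x, y)
```

Because the model is conjugate, updating one point at a time gives the same posterior as processing all three points in a single batch.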
We can then form the predictive distribution and make predictions given new values of x:
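In standard notation, the predictive distribution averages the model's prediction over the posterior. The symbols here are assumptions, since the original notation is not shown: $w$ for the parameters, $\mathcal{D}$ for the data observed so far, and $(x_*, y_*)$ for a new input and its output.

$$
p(y_* \mid x_*, \mathcal{D}) = \int p(y_* \mid x_*, w)\, p(w \mid \mathcal{D})\, dw
$$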
Bayesian regression or classification is more complex than plain MLE regression or classification. Whichever of the two we use, the learning process always involves hyperparameters, and the average loss is expensive to evaluate: it can only be obtained after a full training/validation/test run for each combination of hyperparameters.
In such a setting, we can apply Bayesian Optimization (BO), which builds a probabilistic model of the underlying loss function. We can make the following analogies:


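To make the idea concrete, here is a rough sketch of a BO loop over a single hyperparameter. Several things are assumptions made for illustration and do not come from the text: the probabilistic model is a Gaussian process with a squared-exponential kernel, the next point is chosen with a lower-confidence-bound rule, and `validation_loss` is a hypothetical cheap stand-in for an expensive training and validation run.

```python
import math


def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two scalar hyperparameter values."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def gp_posterior(xs, ys, x_new, jitter=1e-4):
    """Posterior mean and variance of the loss at x_new under the GP surrogate."""
    n = len(xs)
    y_bar = sum(ys) / n  # center the losses so a zero-mean GP is reasonable
    K = [[rbf(xs[i], xs[j]) + (jitter if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k = [rbf(x, x_new) for x in xs]
    alpha = solve(K, [y - y_bar for y in ys])
    v = solve(K, k)
    mean = y_bar + sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, var


def validation_loss(h):
    """Hypothetical stand-in for an expensive train-and-validate run."""
    return (h - 0.7) ** 2


def bayes_opt(n_iter=10, kappa=2.0):
    grid = [i / 100 for i in range(101)]  # candidate hyperparameter values
    xs = [grid[10], grid[90]]             # two initial trials
    ys = [validation_loss(x) for x in xs]
    for _ in range(n_iter):
        def lcb(h):
            # Lower confidence bound: low predicted loss or high uncertainty.
            mean, var = gp_posterior(xs, ys, h)
            return mean - kappa * math.sqrt(var)

        candidates = [g for g in grid if g not in xs]
        h_next = min(candidates, key=lcb)
        xs.append(h_next)
        ys.append(validation_loss(h_next))
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], list(zip(xs, ys))


best_h, best_loss, history = bayes_opt()
```

Each iteration spends one expensive evaluation on the candidate the surrogate considers most promising, rather than sweeping every combination. The parameter `kappa` trades exploration against exploitation; the value 2.0 is an arbitrary choice for this sketch.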