We would like to optimize the Bolognese quality by tweaking the time we cook it. As we saw above it’s not a nice function to optimize which makes it excellent for hard core testing algorithms for robustness and ability to avoid local maxima.
Since there are many optimization methods and algorithms I will try to do a benchmark on the most common ones and the ones I like to use. This is not an exhaustive list and there may be many more worthy of adding, but this is the list I know. Hopefully it can guide you in your choices moving forward.
The algorithms we will look into are primarily implemented in
NLopt except the Simulated annealing which comes from the standard
optim function in
R. Either way the list of algorithms are:
The algorithms we will look into are primarily implemented in
NLopt except the Simulated annealing which comes from the standard
optim function in
R. Either way the list of algorithms are:
- Simulated Annealing (SANN)
- Improved Stochastic Ranking Evolution Strategy (ISRES)
- Constrained Optimization BY Linear Approximations (COBYLA)
- Bound Optimization BY Quadratic Approximation (BOBYQA)
- Low-storage Broyden–Fletcher–Goldfarb–Shanno algorithm (LBFGS)
- Augmented Lagrangian algorithm (AUGLAG)
- Nelder-Mead Simplex (NELDERMEAD)
- Sequential Quadratic Programming (SLSQP)
As some algorithms (or even most of them) may depend heavily on initial values and starting points we run 500 optimization with randomized starting points for all algorithms. Each algorithm receive the same starting point for each iteration. Thus, each algorithm gets to start it’s optimization in 500 different locations. Seems like a fair setup to me.
Then let’s get to the results. Below you’ll see a box and whiskers plots of the Optimal Bolognese quality found by each algorithm among the 500 trials. There’s no doubt that all are pretty ok. However, the Augmented lagrangian, Simulated annealing and BFGS are the worst by far. They are also the ones who are the most incosistent. This basically means that they are very sensitive to local maxima and cannot explore the entire landscape properly. This may not be too surprising as they are based on gradient information. Second order or not, it’s still gradient information and that my friends is always local by nature.