Imagine you are very nearsighted, so that you can see only the region locally near you, but can observe nothing globally, far away.
Now imagine you are situated at the top of a mountain, and you want to make your way down by foot to the lowest point in the valley. But you cannot see beyond your feet, so your algorithm is simply to keep heading downhill, wherever gravity takes you fastest. You do that, and eventually, half way down the mountain, you end up at the bottom of a small oval ditch or basin, from which all paths lead up. As far as you can nearsightedly tell, you’ve reached the lowest point. But it’s a local minimum, not truly the bottom of the valley, and you’re not where you need to be.
What would be a better algorithm for someone with purely local vision? Simulated annealing, inspired by sword makers.
To make a metal sword, you have to first heat the metal until it’s hot and soft enough to shape or mold. The trouble is that when the metal subsequently cools and crystallizes, it doesn’t do so uniformly. As a result, different parts of it tend to crystallize in different orientations (each of them a small local low-energy basin), so that the entire body of the sword consists of a multitude of small crystal cells with defects between them. This makes the sword brittle. It would be much better if the metal were one giant crystal with no defects. That would be the true low-energy configuration. How to get there?
Sword makers learned how to anneal the metal, a process in which they first heat it to a high temperature and then cool it down very slowly. Throughout the slow cooling, the sword maker continually taps the metal, imparting enough energy to the individual cells so that they can jump up from their temporary basin into a higher energy state and then drop down to realign with their neighbors into a communally more stable, lower energy state. The tap paradoxically increases the energy of the cell, moving it up and out of the basin and farther away from the true valley, which is ostensibly bad; but in so doing the tap allows the cell to emerge from the basin and seek out the lower energy state.
As the metal cools, as the tapping continues, more and more of the cells align, and because the metal is cooler, the tapping is less able to disturb cells from their newer and more stable positions.
In physics and mathematics one often has to find the lowest energy or minimum state of some complicated function of many variables. For very complicated functions, that can’t be done analytically via a formula; it requires an algorithmic search. An algorithm that blindly tried to head downwards could get stuck in a local minimum and never find the global minimum.
Simulated annealing is a metaphorical kind of annealing, carried out in the algorithmic search for the minima of such complicated functions. When the descent in the simulated annealing algorithm takes you to some minimum, you sometimes take a chance and shake things up at random, in the hope that by sometimes shaking yourself out of the local minimum and temporarily moving higher, (which is not where you ultimately want to be), you may then find your way to a lower and more stable global minimum.
As time passes, the algorithm decreases the probability of the shake-up, which corresponds to the cooling of the metal.
Simulated annealing employs judicious volatility in the hope that it will be beneficial. In an impossibly complex world, we should perhaps shun temporary stability and instead be willing to tolerate a bit of volatility in order to find a greater stability thereafter.