They may outwit Kasparov, but can machines ever be as smart as a three-year-old?
Learning has been at the center of the new revival of AI. But the best learners in the universe, by far, are still human children. In the last 10 years, developmental cognitive scientists, often collaborating with computer scientists, have been trying to figure out how children could possibly learn so much so quickly.
One of the fascinating things about the search for AI is that it’s been so hard to predict which parts would be easy or hard. At first, we thought that the quintessential preoccupations of the officially smart few, like playing chess or proving theorems—the corridas of nerd machismo—would prove to be hardest for computers. In fact, they turn out to be easy. Things every dummy can do, like recognizing objects or picking them up, are much harder. And it turns out to be much easier to simulate the reasoning of a highly trained adult expert than to mimic the ordinary learning of every baby. So where are machines catching up to three-year-olds, and what kinds of learning are still way beyond their reach?
In the last 15 years we’ve discovered that even babies are amazingly good at detecting statistical patterns. And computer scientists have invented machines that are also extremely skilled at statistical learning. Techniques like "deep learning" can detect even very complicated statistical regularities in enormous data sets. The result is that computers have suddenly become able to do things that were impossible before, like labeling internet images accurately.
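To make the idea of "detecting statistical patterns" concrete, here is a minimal, hypothetical sketch: computing transitional probabilities between syllables in a toy stream, in the spirit of classic infant statistical-learning experiments. The syllable stream and the `transitional_probability` helper are invented for illustration; they are not taken from any actual study or system.

```python
from collections import defaultdict

# A toy syllable stream built from three made-up "words" (pa-bi-ku, ti-bu-do, go-la-tu),
# loosely modeled on the kind of stimuli used in infant statistical-learning studies.
stream = ("pa bi ku ti bu do go la tu pa bi ku go la tu ti bu do "
          "go la tu pa bi ku ti bu do pa bi ku go la tu").split()

# Count how often each syllable is followed by each other syllable.
pair_counts = defaultdict(lambda: defaultdict(int))
syllable_counts = defaultdict(int)
for first, second in zip(stream, stream[1:]):
    pair_counts[first][second] += 1
    syllable_counts[first] += 1

# Transitional probability P(next | current): high inside a "word", lower across word boundaries.
def transitional_probability(current, nxt):
    return pair_counts[current][nxt] / syllable_counts[current]

print(transitional_probability("pa", "bi"))  # within a word: 1.0 in this toy stream
print(transitional_probability("ku", "ti"))  # across a word boundary: noticeably lower
```

Deep learning systems detect far more complicated regularities than these pairwise counts, but the underlying currency is the same: statistics extracted from large amounts of data.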
The trouble with this sort of purely statistical machine learning is that it depends on having enormous amounts of data, and data that is predigested by human brains. Computers can only recognize internet images because millions of real people have reduced the unbelievably complex information at their retinas to a highly stylized, constrained and simplified Instagram of their cute kitty, and have clearly labeled that image, too. The dystopian fantasy is simple fact: we’re all actually serving Google’s computers, under the anesthetizing illusion that we’re just having fun with LOLcats. And yet even with all that help, machines still need enormous data sets and extremely complex computations to be able to look at a new picture and say "kitty-cat!"—something every baby can do with just a few examples.
More profoundly, you can only generalize from this kind of statistical learning in a limited way, whether you’re a baby or a computer or a scientist. A more powerful way to learn is to formulate hypotheses about what the world is like and test them against the data. Tycho Brahe, the Google Scholar of his day, amalgamated an enormous data set of astronomical observations and could use it to predict star positions in the future. But Kepler’s theory allowed him to make unexpected, wide-ranging, entirely novel predictions that were well beyond Brahe’s ken. Preschoolers can do the same.
One of the other big advances in machine learning has been to formalize and automate this kind of hypothesis-testing. Introducing Bayesian probability theory into the learning process has been particularly important. We can mathematically describe a particular causal hypothesis, for example about how temperature changes in the ocean will influence hurricanes, and then calculate just how likely that hypothesis is to be true, given the data we see. Machines have become able to test and evaluate hypotheses against the data extremely well, with consequences for everything from medical diagnosis to meteorology. When we study young children, they turn out to reason in a similar way, and this helps to explain just why they learn so well.
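As a rough illustration of the Bayesian bookkeeping this involves, here is a hypothetical sketch: two made-up causal hypotheses about whether a block activates a toy machine, updated by Bayes' rule as evidence comes in. The hypotheses, priors, and likelihood numbers are all invented for illustration; nothing here corresponds to an actual hurricane model or developmental experiment.

```python
# Two toy causal hypotheses about a hypothetical machine:
#   h_strong: the block activates the machine 90% of the time
#   h_weak:   the block activates the machine 20% of the time
# All numbers are invented for illustration.
priors = {"h_strong": 0.5, "h_weak": 0.5}
likelihood_of_activation = {"h_strong": 0.9, "h_weak": 0.2}

def posterior(observations, priors, likelihoods):
    """Bayes' rule: P(h | data) is proportional to P(data | h) * P(h)."""
    unnormalized = {}
    for h, prior in priors.items():
        p_data_given_h = 1.0
        for activated in observations:
            p = likelihoods[h]
            p_data_given_h *= p if activated else (1.0 - p)
        unnormalized[h] = p_data_given_h * prior
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}

# After seeing the machine activate three times in a row,
# the "strong" hypothesis comes to dominate the posterior.
print(posterior([True, True, True], priors, likelihood_of_activation))
```

The same arithmetic scales up, in principle, from two toy hypotheses to the far richer causal models used in diagnosis and forecasting; what changes is the size and structure of the hypothesis space, not the logic of the update.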
So computers have become extremely skilled at making inferences from structured hypotheses, especially probabilistic inferences. But the really hard problem is deciding which hypotheses, out of all the infinite possibilities, are worth testing. Even preschoolers are remarkably good at creating brand new, out-of-the-box concepts and hypotheses in a creative way. Somehow they combine rationality and irrationality, systematicity and randomness to do this, in a way that we still haven’t even begun to understand. Young children’s thoughts and actions often do seem random, even crazy – just join in a three-year-old’s pretend game sometime. This is exactly why psychologists like Piaget thought that they were irrational and illogical. But they also have an uncanny capacity to zero in on the right sort of weird hypothesis – in fact, they can be substantially better at this than grown-ups.
Of course, the whole idea of computation is that once we have a complete step-by-step account of any process we can program it on a computer. And, after all, we know that there are intelligent physical systems that can do all these things. In fact, most of us have actually created such systems and enjoyed doing it too (well, at least in the earliest stages). We call them our kids. Computation is still the best, indeed the only, scientific explanation we have of how a physical object like a brain can act intelligently. But, at least for now, we have almost no idea at all how the sort of creativity we see in children is possible. Until we do, the largest and most powerful computers will still be no match for the smallest and weakest humans.