A SEPARATE KIND OF INTELLIGENCE
ALISON GOPNIK: Everyone knows that Turing talked about the imitation game as a way of trying to figure out whether a system is intelligent or not, but what people often don’t appreciate is that in the very same paper, about three paragraphs after the part that everybody quotes, he said, wait a minute, maybe this is the completely wrong track. In fact, what he said was, "Instead of trying to produce a program to simulate the adult mind, why not rather try to produce one which simulates the child?" Then he gives a bunch of examples of how that could be done.
For several years I’ve been pointing to that quote because everybody stops reading after the first section. I was searching at lunch to make sure that I got the quote right, and I discovered that when you Google this, you now come up with a whole bunch of examples of people saying that this is the thing you should be quoting from Turing. There’s a reason for that, which is that the explosion of machine learning as a basis for the new AI has made people appreciate the fact that if you’re interested in systems that are going to learn about the external world, the system that we know of that does that better than anything else is a human child.
One of the consequences of that, which is not so obvious, is thinking about children not just as immature forms who learn and grow into an adult intelligence, but as a separate kind of intelligence, which is implicit in the Turing quote. That fits with a lot of interesting ideas in evolutionary biology.
In evolutionary biology there’s increasing work on the idea of "life history," but if you talk to developmental psychologists, they’ve never even heard of it. Life history is the developmental trajectory of a species: how long a childhood it has, how long it lives, how much parental investment there is, how many young it produces. That general feature of a species’ life history is often much more explanatory of other features of the organism than things that might seem more apparent. In particular, a relationship that comes up again and again involves what we perhaps anthropomorphically think of as intelligence: things like being able to deal with many different kinds of environments, learn about them, and adapt to them effectively. That capacity turns out to be very consistently related to a particular life history pattern, namely a life history in which there are few young, a very long period of immaturity and dependence, and a great deal of parental investment.
The strategy of producing just a few young, giving them a long period where they’re incapable of taking care of themselves, and then dedicating a lot of resources to keeping them alive turns out, over and over again, to be associated with higher levels of intelligence. And that’s not just true for primates, where you can see this in analyses of hundreds and hundreds of species. It’s true for marsupials, for birds, for cetaceans, and it’s even true for insects. If you look at different subcategories of butterflies that depend more or less on learning, what you see is that they have a different developmental trajectory, such that the ones that depend on learning have a longer period of immaturity and produce fewer offspring. It turns out to be true even for plants and for immune systems.
Creatures that have more complex immune systems also have this longer developmental trajectory. It looks as if there’s a general relationship between the very fact of childhood and the fact of intelligence. That might be informative if one of the things that we’re trying to do is create artificial intelligences or understand artificial intelligences. In neuroscience, you see this pattern of development where you start out with this very plastic system with lots of local connection, and then you have a tipping point where that turns into a system that has fewer connections but much stronger, more long-distance connections. It isn’t just a continuous process of development. So, you start out with a system that’s very plastic but not very efficient, and that turns into a system that’s very efficient and not very plastic and flexible.
It’s interesting that that isn’t an architecture that’s typically been used in AI. But it’s an architecture that biology seems to use over and over again to implement intelligent systems. One of the questions you could ask is, how come? Why would you see this relationship? Why would you see this characteristic neural architecture, especially for highly intelligent species? We’re way out on the end of the distribution. Chimpanzee young are producing as much food as they’re consuming by the time they’re seven, but we humans aren’t doing that even in forager cultures until we’re fifteen, and we have much larger brains and much greater capacities for intelligence.
A good way of thinking about this strategy may be that it’s a way of resolving the explore-exploit tradeoffs that you see all the time in AI. One of the problems that you have characteristically in AI is that as you get a greater range of solutions that seem to be moving in the direction of a system that’s more intelligent, a system that understands the world in more different ways, what you also have is a big expansion of the search problem. If there are many more different things that you can do, how can you search through that space more effectively?
One way to solve that problem that comes out of computer science is to start out with a very wide-ranging exploration of the space, including parts that might turn out to be unprofitable, and then gradually narrow in on solutions that are going to be more effective. My slogan is that you could think about childhood as evolution’s way of doing simulated annealing. It’s evolution’s way of starting out with a very high temperature, broad search and then narrowing it. The problem with a high temperature search is that you could be spending a lot of time considering solutions that aren’t very effective, and if you’re considering solutions that aren’t effective, you aren’t going to be very good at effectively acting in the world, performing the four Fs and doing all the other things that we need to do as adults.
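To make the annealing analogy concrete, here is a minimal simulated annealing sketch. Everything in it (the bumpy one-dimensional objective, the geometric cooling schedule, the proposal step sizes) is an illustrative assumption rather than anything specified in the conversation; the point is only that a high starting temperature buys broad, noisy exploration, and a falling temperature gradually trades that for efficient exploitation.

```python
import math
import random

def objective(x):
    # An arbitrary bumpy landscape with many local optima (illustration only).
    return math.sin(3 * x) + 0.3 * math.cos(13 * x) - 0.05 * (x - 2) ** 2

def simulated_annealing(steps=5000, t_start=5.0, t_end=0.01, seed=0):
    """Maximize `objective` with a geometric cooling schedule.

    Early on (high temperature), large moves, even worsening ones, are
    accepted, so the search wanders widely: the "childhood" phase of the
    analogy. Later (low temperature), only improvements tend to survive,
    so the search settles into exploiting one region: the "adult" phase.
    """
    rng = random.Random(seed)
    x = rng.uniform(-5, 5)
    best_x, best_val = x, objective(x)
    cooling = (t_end / t_start) ** (1 / steps)  # geometric cooling schedule
    temp = t_start
    for _ in range(steps):
        # Proposal scale shrinks with temperature: broad steps early, small steps late.
        candidate = x + rng.gauss(0, temp)
        delta = objective(candidate) - objective(x)
        if delta > 0 or rng.random() < math.exp(delta / temp):
            x = candidate
            if objective(x) > best_val:
                best_x, best_val = x, objective(x)
        temp *= cooling
    return best_x, best_val

if __name__ == "__main__":
    print(simulated_annealing())
```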
An interesting consequence of this picture of what intelligence is like is that many things that seem to be bugs in childhood turn out to be features. Literally and metaphorically, one of the things about children is that they’re noisy. They produce a lot of random variability. When I’m trying to explain the annealing idea to a general audience, I’ll say, "There are two ways of thinking about this system. Here’s a big box full of solutions, and you could be wildly bouncing around this box going from point to point and bouncing off the walls, or you could just be staying in one place and carefully exploring the space. Which one of those sounds like your four-year-old?" That randomness, variability, and noise—things that we often think of as bugs—could be features from the perspective of this exploratory space. Things like executive function or frontal control, which we typically think of as being a feature of adult intelligence—our ability to do things like inhibit, do long-term planning, keep our impulses down, have attentional focus—are features from the exploit perspective, but they could be bugs from the perspective of just trying to get as much information as you possibly can about the world around you.
Being impulsive and acting on the world a lot are good ways of getting more data. They’re not very good ways of planning and acting effectively on the world around you. This gives you a different picture of the kinds of things you should be looking for in intelligence. It means that some of the things that have been difficult for AI to do, like creativity, being able to get to solutions that are genuinely new and not crazy, are things that human children are remarkably good at. In our empirical work, they’re often better at it than human adults are.
You can have a lot of random search, or you can solve a problem that’s very highly constrained, but the combination of being able to solve problems that are highly constrained and search for solutions that are further away has been the most challenging problem for AI to solve. That’s a problem that children characteristically solve more effectively than adults.
There are some other consequences of thinking about this particular life history as a solution to intelligence. For example, one of the things that we know children do is get into everything, and one of the things that we know that adult scientists do is experiment. That's active learning, where you’re determining what your data sample is going to be and you’re literally and metaphorically expending energy on getting the right kind of data sample, one that will not only be useful but will be the exact kind of data that will cause you to change the current view that you have of the world. It's a very unusual thing to be able to do, to go out into the world and spend calories and energy in order to turn out to be wrong. That’s something that children very characteristically do, and if Danny Kahneman were here he could tell you adults very characteristically don’t do.
Another aspect of what children are doing that would be informative for thinking about intelligence in general, is that children are cultural learners. One of the effects of this life history for human beings, in particular, is that it gives us this capacity for cultural ratcheting. It gives us a way of balancing innovation and imitation. If all we did as a result of cultural learning was imitate exactly the things that the previous generation had done, there would be no point in having cultural learning. There’s constant tension between how much you are going to be able to build on the things that the previous generation has done and how much you are going to be producing something that’s new enough so it would be worth having the next generation imitate. Having this developmental trajectory where you start out with a broad exploration and then narrow in on exploiting particular solutions gives you a way of solving that problem in the context of cultural evolution.
There are other ways that you can do that even as an adult, like having an interdisciplinary conference, or giving adults new things to do. Recently, I've been interested in looking at psychedelic chemicals, which seem to have the rather surprising effect of putting adult brains back into a state of plasticity that looks much more like childhood brains. Neurally, the effect of psychedelic drugs is that they increase local connections and break up the long-distance network connections. They literally induce plasticity and more synaptogenesis.
The ones that have been studied the most are psilocybin, LSD, MDMA, and ketamine, all of which have the same phenomenological properties. They also all turn out to have this same neural effect of driving the system back to something that looks more like childhood plasticity, which may be an interesting way of testing some of those ideas. It would be a good explanation for what otherwise seems very puzzling, which is that a small chemical change, at least by report, can lead people to have large changes in the ways that they see the universe or in the ways that they behave.
One of my slogans is that you could think about psychedelics as doing for the individual what childhood does for the culture: It takes a system that’s relatively rigid and injects a bunch of noise and variability into the system, shakes it out of its local optima and lets it settle into something new.
Thinking about learning in terms of active learning, having computers that would go out and play, and explore, and get into things the way that young children play and explore and get into things, is one sense in which children might be a model for intelligence that’s different from the models of intelligence that we currently have. Thinking about systems that learn from previous generations could be another such model.
Thinking in this life history perspective, another thing that’s distinctive about human intelligence is that having a life history with a long childhood and a lot of caregiving changes our conceptions of moral relations. Our model for naturalizing morality has very much been a model of contracts, thanks to people like Robert. It’s been a model of having individual people who are more or less equal in their status and in their relation, who are trying to develop a contract that will lead to the best outcomes for both of them. If you think about both markets and democracy, those are essentially wonderful institutions and inventions that maximize that process of contract-making so that we don’t have to have face-to-face contracts to maximize our preferences.
If this picture is right, caregiving relations are absolutely key to having this life history. Every parent, no matter how bizarre or weird or crazy their child is, is committed to taking care of that child. That’s a very different kind of relation than a contractual relation. It’s asymmetric. Maybe your kids are going to take care of you when you’re old, but it’s not clear that they will, and that doesn’t seem to be the motivation behind the life history. There's something about protecting a next generation that can introduce variability into the system. Caregiving relations have this fundamental asymmetry and transparency about them, so when you’re attached to a baby, for example, it doesn’t matter very much what the baby’s properties are. You don’t know very much about what those properties will be. You don’t know whether that baby is going to turn out to be valuable or not. There’s just this transparent attachment that you have. It’s having that transparent attachment that lets you have the noise and variability and mess.
If you were only attached to your children because you thought they were going to come out really well, or you wanted the children to come out as well as they possibly could, the sensible thing to do would be to look out at the universe of children, find the ones that you felt were most likely to succeed, have everybody put all their love and attention into those, and let the other ones perish. That seems like a crazy system. Part of the reason why it’s a crazy system is that if you think children are a constant source of unpredictable variability, then the moral commitment you need in order to allow that unpredictable variability to thrive is one that doesn’t anticipate what the outcome of caring for that child will be. There’s a lot of human moral and political life that has that character of unconditional commitment to a person, or to a community, or to a nation, and there’s a puzzle about why those unconditional commitments give us a moral dimension that’s different from the tit-for-tat contractual moral commitments.
One of the things that is fascinating about the Macy Conferences is that they include some of the earliest studies of things like longitudinal language acquisition, before language development was an official discipline within psychology. People in that group were doing that kind of work, which echoes things that people here have said: If you want a good account of intelligence, thinking about developmental trajectories, both in the literal sense of thinking about children and adults and in the more general sense of thinking about trajectories over history and about the ways that you could adjust to an environment over time, is going to be a crucial piece of the story that’s missing from the kinds of accounts that we typically have now.
* * * *
NEIL GERSHENFELD: There are beautiful algorithms emerging in machine learning that nicely interpolate between simulated annealing and gradient descent. You were describing those as extremes, but what these algorithms do is start by sampling the space, use those samples to estimate the distribution they were drawn from, then re-synthesize samples from the estimated distribution and use them to re-estimate it. As they propagate, they start out looking like simulated annealing but end up looking like gradient descent. In a nice way, the model grows. It might be an interesting analogy.
ROBERT AXELROD: The gradient descent is a function of temperature. So, if you have a high temperature, you’re not doing gradient descent.
GERSHENFELD: What I’m describing is not simulated annealing. Simulated annealing is a simple thermodynamic model. Simulated annealing does a bad job of using local gradient information, which is the basis of back propagation in machine learning. What I’m describing is something that crosses over in an interesting way. You start by sampling a distribution broadly, then as you re-sample it you start to tighten the estimate, and then as you tighten the estimate you end up doing something that looks like gradient descent. One of the banes of simulated annealing is determining the cooling schedule and how to do the innovation. This is a very different way to answer it that crosses over between them.
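Gershenfeld does not name the algorithms here, but the loop he describes, sample broadly, fit a distribution to the samples, re-synthesize from the fit, and re-estimate, reads like the cross-entropy / estimation-of-distribution family. A minimal sketch under that assumption follows; the objective function, population size, and elite fraction are invented for illustration. Early iterations, with a wide fitted Gaussian, behave like a high-temperature annealing search; late iterations, with a tight Gaussian, behave more like local, gradient-descent-style refinement.

```python
import math
import random
import statistics

def objective(x):
    # Same kind of bumpy illustrative landscape as in the annealing sketch.
    return math.sin(3 * x) + 0.3 * math.cos(13 * x) - 0.05 * (x - 2) ** 2

def cross_entropy_search(iterations=40, pop=200, elite_frac=0.1, seed=0):
    """Sample, fit a Gaussian to the best samples, re-sample, repeat."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 5.0                      # start with a broad distribution
    n_elite = max(2, int(pop * elite_frac))
    for _ in range(iterations):
        samples = [rng.gauss(mu, sigma) for _ in range(pop)]
        # Keep the best-scoring samples and re-estimate the distribution from them.
        elite = sorted(samples, key=objective, reverse=True)[:n_elite]
        mu = statistics.fmean(elite)
        sigma = statistics.stdev(elite) + 1e-6  # keep a little spread so sampling never collapses
    return mu, objective(mu)

if __name__ == "__main__":
    print(cross_entropy_search())
```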
AXELROD: I like the simulated annealing metaphor a lot, but what I was thinking is that you were describing it as, in simulated annealing terms, lowering the temperature: you broaden exploration and then you lower the temperature. But simulated annealing itself typically raises and lowers the temperature on some schedule so you can jump out of local optima. I was thinking, do children then become more plastic? Is adolescence like that?
GOPNIK: One of the things that we have discovered empirically by looking at some of this is that if you look at physical problems, like trying to figure out how a machine works, what you see is something that looks like high flexibility and high search early on, and then it drops around school age and stays the same. There’s a debate about whether that’s the effect of school or the effect of school age; it’s probably the effect of school age. It stays the same and then drops in adolescence. If you take a social problem, what happens is that you get the most flexibility in adolescence. Preschoolers are very flexible, then there’s a decline, and adults are not very flexible, but adolescents show this peak that fits the neural evidence about plasticity, specifically in social areas. Presumably something like graduate school is a way of doing the same thing, plunking people into a situation in which they’re forced into increased plasticity.
TOM GRIFFITHS: So, if I understand your argument, graduate school is like taking LSD?
GOPNIK: Yes, at its best. Sometimes those two things are combined, but the general idea would be that being put in a space in which the usual exploit strategies that you’ve learned are not effective has some of the same effects. That’s true. The vividness of phenomenology, the vividness of experience, the emotional lability, which is characteristic of preschoolers, of people on psychedelics, of going to the center at Stanford for a year—those things are not just a joke, they're connected to one another.
I’ve talked to a lot of people who are doing machine learning, and what they typically say is, "Yes, we use annealing schedules all the time, but that’s one of those artisanal things." There aren't general proofs about the way that the annealing schedule should work, about what's more effective in this context, about a general principle—aside from the general optimization idea, or the general getting out of a local minimum idea. Those things don’t seem to be understood in a coherent theoretical way.
GERSHENFELD: They’re not. I’ll give you some references to these algorithms I’m mentioning in response to that.
SETH LLOYD: May I share Wired’s best scientific graphic of 2015 with you? This is from a paper by friends of mine. We’re doing these quantum algorithms for topological analysis of data. This is called "Homological Analysis of Brain Function." They took functional MRI data of the brain. The one on the left basically shows clusters of thought processes. There are about seven highly clustered processes, and they’re talking between them with little links like this. Then the other one on the right is the same group of people having taken psilocybin. Let me just summarize if you haven’t seen the picture of it. When you take psilocybin it’s like, "Wow, everything is connected man."
GOPNIK: It is literally true that if you look at the developmental neuroscience literature, you essentially see that graph but going in the other direction. What you see is lots and lots of local connection. And this is boilerplate. One of the few things we know about developmental neuroscience is that you start out with lots of local connectivity and then as time goes on you get segregation. That graphic is part of why I’m making this argument.
RODNEY BROOKS: Getting back to life history, you were painting a very broad picture, humans and great apes, where it’s not the case that the group of children all arrive at the same time, progress through the same period, and then go away. In human families the children are spread out over a period of time, so there’s a lot of sibling rivalry and learning from siblings. That happens in most great apes. Does it happen in other animals, too, or is that unique?
GOPNIK: One of the things that seems to be different is that in most animals you have a clutch of young all at once, so they’re all on the same developmental progression.
If you look at the parental investment side, humans have pair bonding and alloparenting, including siblings being involved in care. The fact that you’ve got siblings distributed in age means that older siblings are involved in a lot of caregiving. Humans have postmenopausal grandmothers and of course they have biological mothers. Each of those adaptations shows up in individual species, but I don’t think there’s any other species that has all three of them: pair bonding, alloparenting, and postmenopausal grandmothers and grandfathers. Well, grandfathers are more complicated because they’re not postmenopausal, but you have this extra twenty years, essentially, that people are investing in care. So not only do you have much more caregiving, you have much more distributed caregiving. If part of the picture is supposed to be introducing a burst of noise, as it were, in each cultural generation, then part of getting that noise is just having these noisy children, but part of it is also the fact that very different people with different knowledge are giving them different kinds of information and models about what the culture is like.
GEORGE DYSON: Young killer whales are educated by their grandmothers.
GOPNIK: Yes, killer whales are a wonderful example of this. When I was talking about life history, I said we are also the only species that has postmenopausal grandmothers, except killer whales. Killer whales, go figure. It turns out that killer whales also have more culture than even other smart cetaceans, so not just the adaptation to intelligence but the adaptation to culturally transmitted intelligence seems to be connected to this second-generation transmission.
G. DYSON: One of the grandmothers off Vancouver Island just died. She was 105.
GOPNIK: There’s some pretty good evidence that because the young aren’t dispersing for killer whales as they are with other cetaceans, the existence of the grandmother is changing the survival rates for the children and even for the grandchildren. So, when the grandmothers die, that affects the entire community.
There’s good anthropological evidence that among human foragers things like myths, and songs and stories—things you might think of as giving you some of the high-level dimensions of what a culture has discovered—that transmission comes from grandparents to grandchildren. It skips parents in forager cultures, so parents are busy telling you what you should do specifically to hunt in a particular place, but the big ideas about what we’ve discovered about the world in general are coming from grandparents to grandchildren and skipping over the parents.
JOHN BROCKMAN: What if the grandparents are dead?
GOPNIK: Well, the older generation, in general. In forager cultures, it’s going to be the fifty to seventy-year-olds. The comforting just-so story is that if you believe that, then remembering the things that happened yesterday if you’re a grandparent is not going to be very useful because the kids already know that, and the parents can know that. Being able to talk a lot about the things that happened to you when you were very young, that’s the stuff that you want if you’re going to transmit information appropriately to children. As someone with an aging memory, I find this to be extremely comforting.
GERSHENFELD: In numerical methods, those algorithms lead to two diverging interspersed sets, we could argue. If you have these grandparents and their grandchildren, and then those grandparents and their grandchildren, you get two interspersed sets. They begin to diverge, and that, you could argue, might be why we have generations that alternate.
W. DANIEL HILLIS: Presumably, the annealing schedule for the human mind is optimized not just for the learning phase. We also have the role of being teachers and caretakers. For instance, it may be that it’s better to turn off learning language when you’re trying to teach a child language. That's Marvin Minsky’s theory of why it got hard to learn languages when you’re an adult.
The interesting thing with machine intelligences is that the modes of transmitting information might be completely different. In some sense, we’ve got a kludgy method of transferring knowledge from our minds into our children’s minds. Certainly, with many machine representations of knowledge, there are much more efficient ways of doing that. In some sense, a machine can be born with all the experience of the previous generations of machines. I’m curious whether you think that would radically change the annealing schedule of a machine.
GOPNIK: The proposal would be that a machine that did that without loss and without noise would be bad. What you’d want is for each generation, as it gets the information from the previous machine, to also introduce a bunch of extra noise and variability.
HILLIS: That might be true, but I don’t see why that follows. You don’t have that option in the human method of transmitting knowledge because there’s no mechanism by which you could transmit the knowledge through birth.
GOPNIK: We know something about some of the mechanisms of transmission, and there’s this interesting debate in the cultural evolution community about this phenomenon called over-imitation. It seems to be very characteristically human that when we're imitating what another human does, we imitate even fine level details we don’t need to imitate, things that aren’t obviously relevant to the activity that the person is performing. You can take chimps and children and have someone perform a whole bunch of complicated bells and whistles to bring about a particular kind of effect, and the chimps will read through to what the actual problem is that you’re trying to solve, but the kids will put in the bells and whistles. Presumably, computers could do both, so the next generation of computers could simply take all the details about what the previous generation had done, but you’d end up with overfitting problems. That’s a classic overfitting problem.
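The "classic overfitting problem" Gopnik mentions can be made concrete with a small curve-fitting sketch, assuming NumPy is available; the sine-wave "rule," the noise level, and the polynomial degrees are all invented for illustration. The high-degree fit copies the previous generation's quirks, the bells and whistles, while the simpler fit tends to generalize better to new situations.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Previous generation's experience": noisy observations of a simple underlying rule.
x_train = np.linspace(0, 1, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=x_train.size)

# Fresh situations the "next generation" actually faces.
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)          # fit the inherited data
    test_error = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: test error {test_error:.4f}")

# The degree-9 polynomial reproduces the training quirks more faithfully,
# but it typically does worse on the new points than the degree-3 fit.
```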
HILLIS: Wouldn’t it be equivalent to just having somebody with a lot more experience and a lot more cases that they would learn from? That doesn’t necessarily mean you’re overfit. That’s a different issue.
GOPNIK: Again, this is where it would be nice to have people working out the computer science to explain what you would expect to have happen in those conditions versus other conditions.
HILLIS: But my gut feeling is with more information, it would be better.
GOPNIK: Well, I’m not sure that that’s true. Again, what might be happening is that having more information is just going to narrow the space of new solutions that you’re going to search. Right?
BROOKS: The world has changed.
GOPNIK: Yeah, exactly. The world is changing.
HILLIS: Well, okay, that is another issue, but you can know that the world has changed, too, so you can weight them with time or something like that.
GOPNIK: I just saw a really wonderful paper about this. There’s good evidence, in birds for example, that environmental variability is a trigger for these life history changes, especially environmental variability within the lifetime of the organism, which seems to be the thing that triggers a long life history versus a shorter life history.
HILLIS: Put another way, I don’t doubt that different information has different amounts of relevance. It seems unlikely to me that the information that happened to be available to you from the moment you were born is exactly the best set of information. It’s much more likely that what you’d want is to be able to choose from a wider set of information and weight it appropriately, which is not an option with human children.
GOPNIK: Why is it not an option in human children?
HILLIS: Because the experiences weren’t recorded in a way that they can in some sense retrain on them.
CAROLINE JONES: But they’re out having experiences. They’re out in the world having experiences. That was your big box metaphor. They’re not constrained to an information transfer from the parents. There are also agents in the world.
PETER GALISON: It’s interesting to look at what happens within a discipline like physics where you can have a group of people, like the people who formed quantum mechanics, Niels Bohr and Heisenberg, and one of the things that they did over the course of their lives was to come back over and over again to the hope that an extreme excursion from what was known was what they needed. For example, wanting to give up the conservation of energy, as Bohr did on two or three different occasions. In 1935, Heisenberg said we need a new revolution to understand why some things that look like electrons could penetrate a lot of lead and others couldn’t. It turned out you just had to stick with the physics they knew and work it out, and that there was simply a heavier version of the electron, the muon.
Heisenberg thought there were revolutions all the way up past the war, and he got the young German physicists after World War II into a whole mess of trouble because they were departing from productive physics. I mention this because, if the opposite of a trauma is a Traum, a dream, they had this dream experience as young people, Bohr in 1913 and then later, and Heisenberg when he was practically a kid in 1924, ’25, ’26. Then they kept looking for that over and over again.
The consequence of overconservatism can be growing old and not being willing to meet new ideas, never having a high enough temperature of excursion in the annealing process; but if you designed a computer that was always making huge excursions, you’d be in a world of hurt intellectually. One of the problems is, how do you know contextually whether it’s time for a high temperature or a low temperature?
GOPNIK: This is relevant to what Robert was talking about: Exactly how do you balance those things across a scientific discipline? In a way, evolution gives it to us for free with childhood because children aren't sitting there saying, "In this context, should we be exploratory or not? Is this insane imagined fantasy going to turn out to be useful in the long run or not?" They just do it. That’s just the way that they’re designed. When you have social institutions that are trying to do the same thing, trying to balance those things, or when you’re trying to design a computer algorithm, then the question about whether there are contextual cues that you could use gets to be a relevant problem. There’s a little bit of work in the developmental area about these "live fast, die young" life history strategies, even within a species, versus having a long extended exploratory period strategy. There’s a lot of debate. It’s not obvious.
One thing is that when the environment is variable in particular kinds of ways over particular time scales, it’s an advantage to explore. It looks as if when you’ve got a lot of resources you can afford more exploration, which you can’t when you have fewer resources. There’s some evidence that kids who are under stress, or animals that are under stress, mature more quickly. That’s also underresearched, and the intuitions that you have don’t necessarily translate into what happens when you do the math.
DAVID CHALMERS: This is super domain relative. There are critical periods for language learning, early, and then for music appreciation much later, like when you’re eighteen or something. So, the annealing has to be domain relative. I guess what I’m wondering is whether there are domains where kids are super-conservative, non-exploratory.
GOPNIK: Kids have a single utility function, which is, "Be as cute as you possibly can be," and they’re extremely good at maximizing that utility. No other utility function is relevant to you if you’re a kid, but it turns out that being as cute as you possibly can be is not trivial. Getting a caregiver environment that’s highly stable and predictable, where you don’t have to do any cognitive work wondering whether you’re going to be taken care of or not, is not transparent or easy. That’s a context where children are extremely conservative. When it comes to their parents, they don’t want variability. They don’t want change. They don’t want noise. They’re very conservative about that.
BROCKMAN: We’ll leave it with "Be as cute as you can possibly be."