Marvin Minsky [5.1.96]

Roger Schank: Marvin Minsky is the smartest person I've ever known. He's absolutely full of ideas, and he hasn't gotten one step slower or one step dumber. One of the things about Marvin that's really fantastic is that he never got too old. He's wonderfully childlike. I think that's a major factor explaining why he's such a good thinker. There are aspects of him I'd like to pattern myself after. Because what happens to some scientists is that they get full of their power and importance, and they lose track of how to think brilliant thoughts. That's never happened to Marvin.


MARVIN MINSKY is a mathematician and computer scientist; Toshiba Professor of Media Arts and Sciences at the Massachusetts Institute of Technology; cofounder of MIT's Artificial Intelligence Laboratory, Logo Computer Systems, Inc., and Thinking Machines, Inc.; laureate of the Japan Prize (1990), that nation's highest distinction in science and technology; author of eight books, including The Society of Mind (1986). 


[Marvin Minsky:] Like everyone else, I think most of the time. But mostly I think about thinking. How do people recognize things? How do we make our decisions? How do we get our new ideas? How do we learn from experience? Of course, I don't think only about psychology. I like solving problems in other fields — engineering, mathematics, physics, and biology. But whenever a problem seems too hard, I start wondering why that problem seems so hard, and we're back again to psychology! Of course, we all use familiar self-help techniques, such as asking, "Am I representing the problem in an unsuitable way?" or "Am I trying to use an unsuitable method?" However, another way is to ask, "How would I make a machine to solve that kind of problem?"
A century ago, there would have been no way even to start thinking about making smart machines. Today, though, there are lots of good ideas about this. The trouble is, almost no one has thought enough about how to put all those ideas together. That's what I think about most of the time.

The technical field of research toward machine intelligence really started with the emergence in the 1940s of what was first called cybernetics. Soon this became a main concern of several different scientific fields, including computer science, neuropsychology, computational linguistics, control theory, cognitive psychology, artificial intelligence — and, more recently, the new fields called connectionism, virtual reality, intelligent agents, and artificial life.

Why are so many people now concerned with making machines that think and learn? It's clear that this is useful to do, because we already have so many machines that solve so many important and interesting problems. But I think we're motivated also by a negative reason: the sense that our traditional concepts about psychology are no longer serving us well enough. Psychology developed rapidly in the early years of this century, and produced many good theories about the periphery of psychology — notably, about certain aspects of perception, learning, and language. But experimental psychology never told us enough about issues of more central concern — about thinking, meaning, consciousness, or feeling.

The early history of modern psychology also produced some higher-level conceptions that promised — at least in principle — to explain a great deal more. These included, for example, the kinds of theories proposed by Freud, Piaget, and the gestalt psychologists. However, those ideas were too complex to study by observing the behavior of human subjects under controlled conditions. So, because there was no way to confirm or reject such ideas, these potentially more adequate theories found themselves beyond the fringe of what most researchers considered to be the proper domain of science. Today, though, our computers are powerful enough to simulate such artificial minds, provided only that we describe them clearly enough for our programmers to program them.

The first modern computers arrived around 1950, but it wasn't until the 1960s, with the arrival of faster machines and larger memories, that the field of artificial intelligence really began to grow. Many useful systems were invented, and by the late 1970s these were finding applications in many fields. However, even as those applied techniques were spreading, the theoretical progress in that field began to slow, and I found myself wondering what had gone wrong — and what to do about it. The main problem seemed to be that each of our so-called "expert systems" could be used only for some single, specialized application. None of them showed what a person would call general intelligence. None of them showed any signs of having what we call common sense.

For example, some people developed programs that were good at playing games like chess. Other programs were written to prove certain kinds of theorems in mathematics. Others were good at recognizing various kinds of visual patterns — printed characters, for example. But none of the chess programs had any ability to recognize text, nor could the recognition systems prove any theorems, nor could the theorem-proving machine play competently at chess. Some people suggested that we could make our machines more versatile by somehow fusing those programs together into a more integrated whole, but no one had any good ideas about how that could be done. Today, it's still almost impossible to get any two different AI programs to cooperate.

This is precisely the problem I tried to solve in my book The Society of Mind. We'd already run into this problem in the 1960s, when Seymour Papert and I started a project to build vision-guided manipulators — that is, robots with eyes and hands. To make a computer that could "see" things, it would need ways to recognize the appearances of objects and the relations between them, but how could you make a computer do such things? Well, at first, each of our researchers tried to develop some particular method. One such project might first try to find the edges of an object and then piece these together into a whole. The trouble was that usually some of the edges were hidden, or were of low optical contrast, or were not really the edges of the object itself but of some decoration on its surface. So edge-finding never worked well enough. Another researcher might try to locate, instead, the surfaces of each object — perhaps by classifying their textures and shadings. This method, too, would sometimes work, but never well enough to be dependable. Eventually, we concluded that no particular method would ever work well enough by itself and we'd have to find ways to combine them.

Why was it so difficult to combine different methods inside the same computer system? Because, I think, our community had never tried to think that way. To do such a thing seemed almost immoral — against the very spirit of "good programming." The accepted paradigm was, and still is, to find a good method for doing the job, and then work on it until you've removed the last bug! Sounds sensible, doesn't it? But eventually we had to conclude that it's a basically wrong idea. After all, even if you did manage to completely debug a program for some particular application, eventually someone would want to use it for some other purpose, in a new environment — and then new bugs would surely appear.

This has been the universal experience with computer programs. In fact, programmers are always joking about this; they often talk about "software rot," which is when a program has been working perfectly for years and then begins to make mistakes, although supposedly nothing has changed. Even today, programmers spend most of their time trying to make programs work perfectly. The result has been a pervasive trend toward making everything more precise — to make programming into a science instead of an art, by doing everything with perfect logical precision. In my view, this is a misguided idea. What, after all, does it mean for anything to work perfectly? The very idea makes sense only in a rigid, unchanging, completely closed world, like the kinds that theorists make for themselves. Indeed, we can make flawless programs to work on abstract mathematical models based on assumptions that we specify once and then never change. The trouble is that you can't make such assumptions about the real world, because other people are always changing things.

Eventually we made that first robot more robust by installing a variety of methods, but neither it nor any of its modern descendants was ever really reliable. We concluded that to make such systems more resourceful and dependable — in a word, more lifelike — we'd better try to understand how human minds manage to rarely get stuck. What are the differences between human thinking and what computers do today? To me, the most striking difference is how almost any error will completely paralyze a typical computer program, whereas a person whose brain has failed at some attempt will find some other way to proceed. We rarely depend upon any one method. We usually know several different ways to do something, so that if one of them fails, there's always another. For example, you can recognize your friends not just by their facial features but by the sound of their voices, by their posture, their gait, their hair color. Given all that variety, we rarely need to perfectly debug any single method. Instead, we learn to recognize the situations in which each of them usually works, and we also learn about conditions in which a method is likely to fail to work. And if all those fail, we can always try to invent a completely new approach.
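
To make the contrast concrete, here is a minimal Python sketch of a program that knows several ways to do one job and moves on when one of them fails. It is only an illustration, not code from the robot project; the function names (`by_face`, `by_voice`, `by_gait`, `recognize`) and the dictionary of "clues" are hypothetical.

```python
# Hypothetical recognizers: each returns a name, or fails in its own way.
def by_face(clues):
    return clues["face"]        # raises KeyError when the face is hidden

def by_voice(clues):
    return clues.get("voice")   # returns None when nothing was heard

def by_gait(clues):
    return clues.get("gait")

METHODS = [by_face, by_voice, by_gait]

def recognize(clues):
    """Try each method in turn; one method's failure does not stop the rest."""
    for method in METHODS:
        try:
            answer = method(clues)
        except Exception:
            answer = None       # a crash is treated like any other failed attempt
        if answer is not None:
            return answer, method.__name__
    return None, None           # every known method failed

# The face is hidden, so by_face raises an error, yet recognition still succeeds.
result = recognize({"voice": "Seymour"})
```

A conventional program would have stopped at the `KeyError`. Here the error is just one more signal that a different method is needed.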

A century ago, Sigmund Freud was already emphasizing the importance of "negative expertise" — of having knowledge about what not to do, a subject that has been entirely ignored by computer scientists and their programmers. Freud talked about censors and other mechanisms that keep us from doing things we've learned to avoid. I suspect that such systems have evolved in our brains, and that a typical brain center is equipped from birth with several different learning mechanisms, some of which accumulate negative knowledge. Thus, one part might accumulate knowledge about when to activate a particular method, another learns to construct "suppressors" or "censors" that can oppose the use of each method, and yet other mechanisms learn what to do when two or more methods come into conflict.
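
One rough way to picture this kind of negative knowledge in a program is sketched below. It assumes a deliberately simple design: a "censor" is just a remembered pair of (kind of situation, method name) that failed before. The `Agency` class, its fields, and the two vision methods are hypothetical names chosen for illustration, not a reconstruction of any actual system.

```python
class Agency:
    """A bundle of methods plus learned knowledge about when NOT to use them."""

    def __init__(self, methods):
        self.methods = methods   # dict: method name -> function(scene) -> result or None
        self.censors = set()     # (situation kind, method name) pairs to avoid

    def attempt(self, kind, scene):
        for name, method in self.methods.items():
            if (kind, name) in self.censors:
                continue         # suppressed: this method failed here before
            result = method(scene)
            if result is not None:
                return name, result
            self.censors.add((kind, name))   # accumulate negative knowledge
        return None, None


# Hypothetical vision methods, named after the approaches described earlier.
def find_edges(scene):
    return None if scene["low_contrast"] else "outline"

def classify_textures(scene):
    return "surface"

vision = Agency({"find_edges": find_edges, "classify_textures": classify_textures})
dim_scene = {"low_contrast": True}
first = vision.attempt("dim", dim_scene)    # find_edges fails, then textures succeed
second = vision.attempt("dim", dim_scene)   # find_edges is now skipped outright
```

On the second attempt the edge finder is never called for a dim scene, because the system has learned what not to do there. A fuller version would also need a way to resolve conflicts between methods, which this sketch leaves out.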

It really is a simple idea — that our minds have collections of different ways to do each of the things they do. Yet this challenges our more common and more ancient ideas about what we are and how we work. In particular, we all share the notion that inside each person there lurks another person, which we call "the self" and which does our thinking and feeling for us: it makes our decisions and plans for us, and later approves, or has regrets. This is much the same idea that Daniel Dennett, who is arguably the best living philosopher of mind, calls the Cartesian Theater — the universal fancy that somewhere deep inside the mind is a certain special central place where all mental events finally come together to be experienced. In that view, all the rest of your brain — all the known mechanisms for perception, memory, language processing, motor control — are mere accessories, which your "self" finds convenient to use for its own inner purposes.

Of course, this is an absurd idea, because it doesn't explain anything. Then why is it so popular? Answer: Precisely because it doesn't explain anything! This is what makes it so useful for everyday life. It helps you stop wondering why you do what you do, and why you feel how you feel. It magically relieves you of both the desire and the responsibility for understanding how you make your decisions. You simply say, "I decided to," and thereby transfer all responsibility to your imaginary inner self. Presumably, each person gets this idea in infancy, from the wonderful insight that you yourself are just another person, very much like the other people you see around you. On the positive side, that insight is profoundly useful in helping you to predict what you, yourself, are likely to do, based on your experience with those others.

The trouble with the single-self concept is that it's an obstacle to developing deeper ideas when we really do need better explanations. Then, when our internal models fail, we're forced to look elsewhere to seek help and advice about what to do in our real lives. Then we find ourselves going to parents, friends, or psychologists, or resorting to those self-help books, or falling into the hands of those folks who claim to have psychic powers. We're forced to look outside ourselves, because that single-self mythology doesn't account for what happens when a person experiences conflicts, confusions, mixed feelings — or for what happens when we enjoy pleasure or suffer pain, or feel confident or insecure, or become depressed or elated, or repelled or infatuated. It provides no clues about why we can sometimes solve problems but other times have trouble understanding things. It doesn't explain the natures of either our intellectual or our emotional reactions — or even why we make that distinction.

What are emotions, anyway? I am developing larger-scale theories of what emotions are, how they work, and how we might learn to control and exploit them. Psychologists have already proposed many smaller theories about different aspects of the mind, but no one since Freud has proposed plausible explanations of how all those systems might interact.

You might ask why this new theory should work, when so many other attempts to explain emotions have failed. My answer is that almost all other such attempts have been looking in the wrong direction. The psychological community suffers from a severe case of physics envy. They've all been searching for some minimal set of basic principles of psychology, some very small collection of amazingly powerful ideas that, all by themselves, can explain how the mind works. They'd like to imitate Isaac Newton, who discovered three simple laws of motion which solved an entire world of problems about mechanics. My method is just the opposite.

The brain's functions simply aren't based on any small set of principles. Instead, they're based on hundreds or perhaps even thousands of them. In other words, I'm saying that each part of the brain is what engineers call a kludge — that is, a jury-rigged solution to a problem, accomplished by adding bits of machinery wherever needed, without any general, overall plan: the result is that the human mind — which is what the brain does — should be regarded as a collection of kludges. The evidence for this is perfectly clear: If you look at the index of any large textbook of neuroscience, you'll see that a human brain has many hundreds of parts — that is, subcomputers — that do different things. Why do our brains need so many parts? Surely, if our minds were based on only a few basic principles, we wouldn't need so much complexity.

The answer is that our brains didn't evolve in accord with a few well-defined rules and requirements. Instead, we evolved opportunistically, by selecting mutations that favored our survival under the conditions and constraints of many different environments, over the course of at least half a billion years of variation and selection. What, precisely, do all those parts do? We're only beginning to find this out. I suspect that we have much more to learn. When we're all done we'll have found out that many of those mental organs have evolved to correct deficiencies of old ones — that is, deficiencies that did not appear until we got so much smarter. It's characteristic of evolution that after many new structures have developed, it's too late to go back and make much change in those older systems on which we still depend.

In a situation like that, it can be a mistake to focus too much on searching for basic principles. More likely, the brain is not based on any such scheme but is, instead, a great jury-rigged combination of many gadgets to do different things, with additional gadgets to correct their deficiencies, and yet more accessories to intercept their various bugs and undesirable interactions — in short, a great mess of assorted mechanisms that barely manage to get the job done.

When I was a kid, I was always compelled to find out how things worked, and I used to dissect all available machinery. I grew up in New York City. My father was an ophthalmologist, an eye surgeon, and our household was always full of interesting friends and visitors — scientists, artists, musicians, and writers. I read all sorts of books, but the ones I loved most were about mathematics, chemistry, physics, and biology. I was never tempted to waste much time at sports, politics, fiction, or gossip, and most of my friends had similar interests. Especially, I was fascinated with the writings of the early masters of science fiction, and I read all the stories of Jules Verne, H.G. Wells, and Hugo Gernsback. Later I discovered the magazines like Astounding Science Fiction, and consumed the works of such pioneers as Isaac Asimov, Robert Heinlein, Lester del Rey, Arthur C. Clarke, Harry Harrison, Frederik Pohl, Theodore Sturgeon — as well as the work of their great editor-writer, John Campbell. At first these thinkers were like mythical heroes to me, along with Galileo, Darwin, Pasteur, and Freud. But there was a difference: all those writers were still alive, and in later years I met them all, and they became good friends of mine, along with their successors — like Gregory Benford, David Brin, and Vernor Vinge, who are fellow scientists as well. What a profound experience it was to be able to collaborate with such marvelous imaginers!

Of course, I also read a great deal of technical literature. But aside from the science fiction, I find it tedious to read any ordinary writing at all. It all seems so conventional and repetitive. To me, the science-fiction writers are our culture's most important original thinkers, while the mainstream writers seem "stuck" to me, rewriting the same plots and subjects, reworking ideas that appeared long ago in Sophocles or Aristophanes, recounting the same observations about human conflicts, attachments, infatuations, and betrayals. Mainstream literature replays again and again all the same old stuff, whereas the science-fiction writers try to imagine what would happen if our technologies and societies — and our minds themselves — were differently composed.

Aside from this, most of my youth was involved in constructing things. Building gadgets. Composing music. Designing new machines. Imagining new processes. When I started at Harvard, in 1946, there was no temptation to play with computers, because there were none. None, that is, except for the Mark I relay computer, which was then being built. I paid almost no attention to it, but one of my Bronx High School of Science classmates, Anthony Oettinger, did. Before long, he became Harvard's first professor of computer science, and around 1952 he wrote one of the first programs to make a computer learn something.

At Harvard, my first concerns were with physics, especially mechanics and optics, and with abstract mathematics, but I soon also got interested in neurophysiology and the psychology of learning. I had the fortune to become attached first to the great B.F. Skinner and then to an extraordinary crew of young professors, including George Miller and Joseph Licklider, who were on the frontier of cybernetics — that great collision between traditional psychology and the new fields of control-engineering that evolved during the Second World War. Perhaps the most important thing that happened, though, was finding a book, Mathematical Biophysics, by Nicholas Rashevsky, while browsing through the stacks of science books in Widener Library. Rashevsky showed me how to make abstract models of real things. Then, in Rashevsky's own journal, the Bulletin of Mathematical Biophysics, I found the current work of Warren McCulloch and Walter Pitts. First was the original McCulloch and Pitts 1943 paper on threshold neurons and state machines, which suggested ways to make computerlike machines by interconnecting idealized neurons. Then there was the tremendously imaginative Pitts-McCulloch 1947 paper on vision and group theory, which was the precursor of the group-invariance theorem in Perceptrons, the book Seymour Papert and I wrote in 1969. I'm pretty sure that it was works like these, and the flurry of ideas in the early Macy Conference volumes, that kept me thinking about how to make machines that could learn. When Norbert Wiener's revolutionary book, Cybernetics, was published in 1949, most of it seemed like old stuff to me, although it taught me a great deal of mathematics.

In the course of thinking about how one might get "neural-network machines" to learn to solve problems, I conceived of what later came to be called Hebb synapses, after the Montreal psychologist Donald Hebb. This inspired me to design a machine in which a randomly connected network of such synapses would compute approximate correlations between stimuli and responses. George Miller got some money from the Air Force Office of Scientific Research, and gave me an account to use to build that machine, which I called the Stochastic Neural Analog Reinforcement Calculator — or SNARC, for short. The machine used about four hundred vacuum tubes and forty little magnetic clutch mechanisms, which would automatically adjust potentiometers, which would in turn control the probabilities that each synapse would transmit a signal from each simulated neuron to another one. The machine worked well enough to simulate a rat learning its way through a maze. I described it in my 1954 Ph.D. thesis, but I don't know how much influence the thesis had on the other researchers. I've never even seen a citation of it, although it has sections proposing other learning mechanisms that so far have never been used.

The SNARC machine was able to do certain kinds of learning, but it also seemed to have various kinds of limitations. It took longer to learn with harder problems, and it sometimes made things worse to use larger networks. For some problems it seemed not to learn at all. This led me to start thinking more about how to solve problems "from the top down," and to start formulating theories about representations and about heuristics for problem solving. In this period, there were only a few people thinking about random neural networks. Nothing very exciting happened in that field — that is, in the field of "general" neural networks, which included looping, time-dependent behavior — until the work of John Hopfield at Caltech, in the early 1980s. There were, however, important advances in the theory of loop-free or "feed forward" networks — notably the discoveries in the late 1950s by Frank Rosenblatt of a foolproof learning algorithm for the machines he called "perceptrons." One novel aspect of Rosenblatt's scheme was to make his machine learn only when correcting mistakes; it received no reward when it did the right thing. This idea has not been adequately appreciated in most of the subsequent work.
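
The mistake-driven idea can be shown with the standard textbook form of the perceptron learning rule, sketched here in Python. This is the usual modern presentation, not a reconstruction of Rosenblatt's hardware; the function names and the small AND-gate data set are assumptions made for illustration.

```python
def predict(weights, bias, x):
    """Threshold unit: +1 if the weighted sum is positive, otherwise -1."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else -1

def train_perceptron(samples, max_epochs=50):
    """Adjust the weights only when an example is misclassified."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in samples:            # label is +1 or -1
            if predict(weights, bias, x) != label:
                # Correct the mistake; right answers change nothing.
                weights = [w + label * xi for w, xi in zip(weights, x)]
                bias += label
                mistakes += 1
        if mistakes == 0:                   # a full pass with no errors: done
            break
    return weights, bias

# Illustrative, linearly separable data: the logical AND of two inputs.
AND_SAMPLES = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
```

Nothing is updated after a correct answer, so the only "signal" the machine ever receives is a correction. For data that a single threshold unit can separate, as here, the loop is guaranteed to stop making mistakes after finitely many corrections.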

The most important other direction in research — of attempting to set down powerful heuristic principles for deliberate, serial problem solving — was already being pursued by Allen Newell, J.C. Shaw, and Herbert Simon. By 1956 they'd developed a system that was able to prove almost all of Russell's and Whitehead's theorems about the field of logic called "propositional calculus." I myself had found a small set of rules that was able to prove many of Euclid's theorems. In the same period, my graduate-school friend John McCarthy was making progress in finding logical formulations for a variety of commonsense reasoning concepts.

Soon the field of artificial intelligence began to make rapid progress, with the spectacular work of Larry Roberts on computer vision, and the work of Jim Slagle on symbolic calculus, and around 1963, ARPA — the Defense Department's Advanced Research Projects Agency — began to support several such laboratories on a reasonably generous scale. Rosenblatt's neural-network followers were also making ambitious proposals, and this led to a certain amount of polarization. This was partly because some of the neural-network enthusiasts were actually pleased with the idea that they didn't understand how their machines accomplished what they did. When Seymour and I managed to discover some of the reasons why those machines could solve certain problems but not others, many of those neovitalists interpreted this not as a mathematical contribution but as a political attack on their work. This evolved into a strange mythology about the nature of our research — but that's another story.

What can you do when you have a problem that you can't seem to solve in a single step? Then you have to find ways to break it up into subproblems and try to find ways to solve each of those. By the end of the 1950s, quite a few people were thinking about this, and I tried to pull the field together and write a book about it. The trouble was that we were discovering new methods faster than we could write them down, so in 1961 I pulled together as much as I could and published one very large paper titled "Steps Toward Artificial Intelligence." Although this was not strictly a synthesis, it did establish a fairly uniform terminology for the field, and establish the subject as a well-defined scientific enterprise. Some techniques proposed in that paper have still not yet been adequately explored.

After pursuing several different approaches to the problem of making artificial intelligence — and trying to decide which method might be best — I finally realized that there's no best way. Each particular method has advantages for particular kinds of situations. This means that the key to making a smart machine is inventing ways to manage a variety of resources — and this led to what Seymour and I called the society-of-mind theory. If you look at the brain, you see that there are hundreds of different kinds of neural nets there — hundreds of different kinds of structures. When you injure different pieces of the brain, you see different symptoms. That led to the idea that maybe you can't understand anything unless you understand it in several different ways, and that the search for the single truth — the pure, best way to represent knowledge — is wrongheaded.

The reason it's wrongheaded is that if you understand something in just one way, and the world changes a little bit and that way no longer works, you're stuck, you have nowhere to go. But if you have three or four different ways of representing the thing, then it would be very hard to find an environmental change that would knock them all out. People are always getting into situations that are a little bit different from old ones. You have to accumulate different viewpoints and different ways of doing things and different mechanisms. If you want to do learning with neural nets, you can't just use one kind of neural net; you'll probably have to design different types suited for remembering stories, for representing geometrical structures, for causal interactions, for making chains-of-reasoning steps, for the semantic relations of language expressions, for the sorts of two-dimensional representations needed for vision, and so on. The secret of intelligence is that there is no secret — no special, magical trick.



Excerpted from The Third Culture: Beyond the Scientific Revolution by John Brockman (Simon & Schuster, 1995). Copyright © 1995 by John Brockman. All rights reserved.