Cyril G. Veinott Green and Gold Professor, Department of Computer Science, University of Vermont Author, How the Body Shapes the Way We Think
The Symbol Grounding Problem

I.J. Good, a collaborator of Alan Turing’s during World War II, once wrote that a machine more intelligent than humans would be the “last invention” we would ever need to make. Many believe that our society is about to reach this point, as smart machines increasingly invade our daily lives and imaginations. This invasion is proceeding on two fronts. On the first, such machines are increasingly to blame for economic change, leading some people to question whether work is necessary at all. On the second, and more deeply, we wonder whether such ultraintelligent machines will be “like us”: they may think, but will they feel?

The symbol grounding problem, formulated by Stevan Harnad, helps anchor questions of this latter kind. It is difficult for a machine to learn what the symbols of human language mean simply by transforming heard words into spoken words. It’s like trying to learn Egyptian hieroglyphs from a dictionary in which each hieroglyph is defined only by other hieroglyphs. Consider this exchange I just had with the most powerful chatbot on the web:

Me:  Hi there.
It:  Hello, how are you?
Me:  Not too bad. You?
It:  Fine thank you.
Me:  I've been thinking lately.
It:  Thinking about what?
Me:  The symbol grounding problem.
It:  Can you tell me a happy story?

The solution to the symbol grounding problem is to ground symbols in the rich soil of physical experience. For example, a robot may learn that the symbol “jump” correlates with a sudden loss of pressure on the soles of its feet. This robot may begin to empathize with humans by asking one of them to jump, observing her feet leave the ground, and inferring that she feels what it feels when it jumps. Finally, such a machine may recursively ground increasingly abstract symbols in these already-grounded symbols, until eventually it understands the English idiom “don’t jump to conclusions” the first time it hears it.

The public should care about this seemingly esoteric corner of cognitive science because machines that do not ground symbols in this way will be dangerous. Consider my exchange with the chatbot. Not only did it fail to recognize that its final comment was “wrong,” it also failed to predict that I would be frustrated or amused by the comment. Similarly, another machine may fail to predict my terrified response to its actions.

Given a million photographs that contain a human and another million that do not, current machines can tell you whether a new photograph contains a human, without having to ground symbols in experience. But consider another data set, composed of two million conversations: in the first million, the speakers are discussing how best to help Bob; in the second million, they are conspiring to harm him. Current state-of-the-art machines cannot tell you whether the speakers in a new conversation intend to help or harm Bob.

Most humans can listen to a conversation and predict whether the person being discussed is in danger. It may be that we can do so because we have heard enough discussions in real life, books, and movies to generalize to the current conversation, not unlike computers that recognize humans in previously unseen photographs. However, we can also empathize by connecting words, images, and physical experience: we can put ourselves in the shoes of the people talking about Bob, or in the shoes of Bob himself. If one speaker says “one good turn deserves another” and follows it with a sarcastic sneer, we can take those verbal symbols (“one,” “good,” …), combine them with the visual cue, and run a mental simulation.

First, we can go back in time to inhabit Bob’s body mentally, and imagine him (or us) acting in a way that lessens the speaker’s hunger or assuages some other physical or emotional pain of hers. We can then return to the present as ourselves and imagine saying what she said. We would not follow the statement with a sneer, as she did, so our prediction has failed.

So our brain returns to the past and inhabits Bob’s body again, but this time mentally simulates hurting the speaker in some way. During the act, we transfer into the speaker’s body and suffer her pain. Back in the present, we imagine ourselves saying the same words, and feelings of anticipated revenge bubble up inside us, bringing a sneer to our lips that matches the speaker’s. We therefore predict that the speakers wish to harm Bob.

Growing evidence from neuroscience indicates that heard words light up most parts of the brain, not just a localized language module. Could this indicate that we twist words, actions, our own past felt experiences, and mental body snatching into braided sensory/action/experiential cables? Might these cables support a bridge from the actions and feelings of others to our own actions and feelings, and back again?

Machines that ground symbols in this way may be useful and even empathetic. But would they be conscious? Consciousness is currently beyond the reach of science, but one can wonder. If I “feel” your pain, the subject and the object are clear: I am the subject and you are the object. But if I feel the pain of my own stubbed toe, the subject and object are not as obvious. Or are they? If two humans can connect by empathizing with each other, can’t two parts of my brain empathize with each other when I hurt myself? Perhaps feelings are verbs rather than nouns: they may be specific exchanges between cell clusters. Might consciousness, then, simply be a fractal arrangement of ever smaller sensory/motor/experiential braids, each grounding the ones above it? If myths tell us that the Earth is flat and rests on the back of a giant turtle, we might ask what holds up the turtle. The answer, of course, is that it’s turtles all the way down. Perhaps consciousness is simply empathy between cell clusters, all the way down.