EPISTEMIC VIRTUES
PETER GALISON: I'm interested in the question of epistemic virtues, their diversity, and the epistemic fears that they're designed to address. By epistemic I mean how we gain and secure knowledge. What I'd like to do here is talk about what we might be afraid of, where our knowledge might go astray, and which aspects of our fears about what might misfire can be addressed by particular strategies, and then to see how that's changed quite radically over time.
The place where Lorraine Daston and I focused in the study of objectivity, for example, was in these atlases, these compendia of scientific images that gave you the basic working objects of different domains—atlases of clouds, atlases of skulls, atlases of plants, atlases in the later period of elementary particles. These are volumes, literary objects, and eventually digital objects that were used to help classify and organize the ground objects of different scientific domains.
In the period you might schematize as 1730-1830—and these dates are arbitrary and overly precise—there was a desire above all to find the objects that were in back of the objects that we happen to see. In other words, not this clover outside the boardroom that's been half moth-eaten and half sunburned, but the plant form that exists behind that. That's what Goethe meant when he talked about the "Urpflanze." The advantage of that seemed obvious. The fear was that you would spend your time looking at particular defective clovers here or there and not understand that they were unified under a particular form that was the reality behind the curtain of mere appearances.
When William Cheselden in 1733 hung a skeleton and looked at it through a camera obscura, he wasn't looking to draw that particular skeleton; he was trying to use that and then correct the errors—the fact that it was too fat, or too thin, or had a cracked rib. When Albinus said, "I draw what I draw and then I fix the imperfections," it was because it seemed obvious that the image you would want of a skeleton—or a flower, or an insect, or whatever it was—was not the skull that belonged to me or you, but the skull that lay behind all the particular skulls that we might see.
There was a fear of the multiplied variegated skulls, or clovers, or clouds that we might see, and the antidote was to draw something abstracted from that that was supposed to lie behind any particulars. Goethe would say, "I never draw any particular thing." There was a particular kind of person who was appropriate to doing this, and that was the genius. In the 18th century it was recognized that it was fine for an Albinus, or a Goethe, or a Cheselden to make that kind of argument.
In the 19th century that begins to proliferate. When everybody starts writing down, or drawing, or painting the objects that they thought should be there and they start to clash, there’s a new problem brought about by the conflict of the myriad depictions of the heart, or the skull, or the plant world, or the natural world, or crystals, or other things. The epistemic fear was of this contradictory multiplication of representations, each of which purported to be the urpflanze or the equivalent in other domains. The response to that was to seek out mechanical transfer of the world to the page. And by mechanical, that doesn’t mean just the levers and pencils, it could mean any kind of thing, including chemical-based photography. In the 19th century, mechanical meant all of those procedural developments.
This was labeled objectivity for the first time in a sense that's continuous with the modern sense. When Descartes uses a term like "objective," he means more or less the opposite of what we do, but that's another story. Starting around 1830, coming from a mix of literary and scientific sources, people start to talk about this as the mapping of the clover to the page—whether it's tracing, or rubbing, or photographic representation—to minimize our intervention.
If Goethe, Cheselden, and Albinus were maximizing our intervention because they were the sort of people who could part the curtain of experience, the 19th century wanted to minimize that because people didn't trust the multiplied number of scientists in the world. They wanted to know what was actually there—the skull of this person in Case 23 in the Museum of Natural History in Berlin. So, that became a different response to a different fear that had swung the other way.
Then a new kind of problem arose in which there were lots of different skulls, each correctly or isomorphically represented, at least that was the ambition. People began to question how we know whether a skull has a tumor or whether it's just a normal variation. So they started to have atlases of normal variations. You can see how this leads to a regressive problem that could go on forever, because the space of possible variations of skulls just within this room is pretty large. Now think about extending that over all of humanity and all of time. It became very hard to work.
The way the doctors used these atlases was to identify what's normal and what's within the range of normal, so if they saw something that didn't look like that, it was pathological. What got me interested in this in the first place were these atlases of cloud chamber and bubble chamber images in particle physics, where it was used in a very interesting way. This is a literary form that the physicists borrowed from the doctors, even though physicists don’t like borrowing from doctors. They said, "If you see an image that departs from the range of the normal, what you have is a discovery, not a pathology." So, the bubble chamber scanners at Berkeley, or CERN, wherever it was, would study these compendia and then send an alert through the chain of command. Once it got up to a Luis Alvarez or somebody else, they could say they discovered something.
What then began to happen was people started to see the importance of using judgment. This pure mechanical objectivity was proliferating like crazy with all these variations. People needed to know the difference between a misfiring of the apparatus or the environment and what the real effect was. The people making magnetograms of the sun said, "We could print mechanically and objectively what we get out of our machines. You wouldn't be able to tell what’s an artifact of our machines, but we know." The implication was not because they’re geniuses, but because they were trained. That kind of trained judgment became a new objectivity.
People began to worry about how they would train people to recognize artifacts and to do it in a way that follows a course or a procedure. For instance, there is a famous atlas of electroencephalograms, and people said, "Do our course for several weeks, and we can train you to distinguish grand mal and petit mal seizures and various other things, not because you’re a genius but because we can train you."
That became a mantra in the 20th century, that you have all these atlases that explicitly extolled the human capacity to learn judgment. They could train anyone to look at electroencephalograms to make these kinds of distinctions in a way that was repeatable and therefore objective, but not mechanical. They didn't know how to do it mechanically. This is in the '40s, '50s, and '60s. They didn't know how to make it purely algorithmic.
The same was true in stellar spectra and other astronomical problems. Long before you could classify stars by a procedure or an algorithm, people became very good at classifying them by looking at the spectra and making judgments. These are shifts in response to fears. Epistemic virtue is the response; it’s the Rx to the Dx. The diagnosis of the problem was some fear, and these are responses of procedure, of judgment, of mechanical transfer to those difficulties.
There's a current project that I’m involved with in various ways, the Event Horizon Telescope. They’re trying to make images of very distant objects, like supermassive black holes and other objects in the sky. One of the problems is that the data is extremely sparse and noisy, and you have to extract an image from it.
There are two problems, one of which is the spring of Narcissus problem. The spring of Narcissus problem is that you can't just print what you see because you don't see anything. If I gave you a bunch of points and told you to draw the best curve through them, you would rightly tell me that's not a well-posed question. You can say, draw the best straight line through them, and that would be easy—every ninth grader can do that. If you want the best circle or the best hyperbola, whatever it is, you can solve the problem. You need to assume something, then you can get information out. These images have that character. You have to make some kind of Bayesian prior assumption, and then from that you can create an image. That leads to the problem of Narcissus. The worry is that you might impose your prior assumption about what you would see so heavily that you would see it when it wasn't there, like when Narcissus looks into the spring and sees his face. If you don't impose any prior knowledge, then you have the opposite problem—the problem of the helm of darkness, which means that you don't see anything, so you can't extract anything. You have this sparse, noisy image.
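To make the ill-posedness concrete, here is a minimal Python sketch (purely illustrative, not the Event Horizon Telescope's actual method; the measurement operator and the smoothness prior are arbitrary stand-ins) showing how an underdetermined reconstruction only becomes well posed once a prior is added:

```python
# Toy inverse problem: far fewer measurements than unknowns, so the data alone
# cannot pick out a unique "image." A smoothness prior (a ridge-style penalty,
# standing in for the Bayesian prior) makes the problem solvable.
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 15                                # 50 unknowns, only 15 measurements
x_true = np.sin(np.linspace(0, 3 * np.pi, n))
A = rng.normal(size=(m, n))                  # sparse, noisy sampling operator
y = A @ x_true + 0.1 * rng.normal(size=m)

# Without a prior: infinitely many solutions fit the data equally well;
# lstsq just returns the minimum-norm one.
x_no_prior = np.linalg.lstsq(A, y, rcond=None)[0]

# With a prior: penalize roughness between neighboring "pixels."
D = np.diff(np.eye(n), axis=0)               # finite-difference operator
lam = 1.0                                    # prior strength (how hard we impose our assumption)
x_with_prior = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ y)

print("error without prior:", round(np.linalg.norm(x_no_prior - x_true), 2))
print("error with prior:   ", round(np.linalg.norm(x_with_prior - x_true), 2))
```

Turning the `lam` knob is the Narcissus dial: too strong a prior and you recover mostly what you assumed; too weak and the reconstruction is dominated by noise.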
The question in this collaboration is, how do you get an objective image? There are various strategies that they've taken. They're very interesting. For instance, one of them is to divide the imaging work: the collaboration has somewhere between 120 and 200 people, and the imaging teams are divided into groups that work in utter secrecy from each other within the collaboration. They produce their images and compare them. Another strategy is to vary the priors, and then the question is, have you varied the priors enough to give you an objective image?
There's another possibility, which has been suggested by the AI folks. For example, the space telescope has imaged a huge number of galaxies, more than the astronomers could cope with in trying to classify and understand them. One of their first moves was to make this into a public game. There are hundreds of thousands of people who do this thing called "Galaxy Zoo," where you're given images, you take a training program, you take a test, and then you start classifying galaxies. Some people didn't like that, so they suggested training computers to classify these galaxies. So, they began to train the computer to classify the galaxies using these neural net learning methods. Then they said, "Okay, we've classified these things, which is great, but we can't interrogate the program as to what it's doing." It had that obscurity that we've talked about here before. This is a different kind of problem, where you've gained opacity and capacity at the same time. You can classify a lot of things, you can show that it overlaps in the restricted domain where you've got experts and it gives the right answer, but you don't really know what it has done.
There are lots of interesting papers where people start to talk about AI and attribute to it a kind of human capacity. They say it mislearned, or started to act pathologically; it found a little bit of striping on the snail and covered the snail completely with stripes, making it look like a zebra snail. This attribution of purpose and humanity to the program, partly in virtue of the fact that it seems to be making human kinds of errors, becomes a big issue because you can't ask it what it's doing. We'd like the AI to take over some of these tasks as a way of solving the objectivity problem, but then in response we have an opacity problem. That happens in a lot of domains.
In my contribution for the book, I talked a little bit about algorithmic sentencing. This is, for instance, if a judge wants to sentence people on their objective likelihood of committing another crime. The problem is that because of the proprietary secrecy of the company that makes the algorithms, or because the algorithms are so complicated that they can’t unwind them, they don’t know or won't be told how the decision is being made and whether it's using criteria that would violate our norms. So, if you live above 125th Street in Manhattan and you're given a higher sentence, this is just a proxy for race.
That's in the moral, political, legal domain, but in the epistemic domain of the sciences, there are analogous questions that you might ask. What kinds of criteria are being emphasized in this? What is the alternative to this opacity? We know what the gain could be: It could increase our capacity, give us objectivity beyond human judgment. But it costs us in what we can interrogate. Suppose that it worked, suppose we were completely happy with it. Would that be enough in the scientific applications of AI? I don't mean what you buy on Netflix or Amazon; I certainly don't mind not knowing the algorithm by which it tells me I might like a movie if I like the movie. I just question whether even excellent prediction in the scientific domain would satisfy us.
I just want to end with a reflection that James Clerk Maxwell had back in the 19th century that I thought was rather beautiful. James Clerk Maxwell, just by way of background, had done these very mechanical representations of electromagnetism—gears and ball bearings, and strings and rubber bands. He loved doing that. He's also the author of the most abstract treatise on electricity and magnetism, which used the least action principle and doesn't go by the pictorial, sensorial path at all. In this very short essay, he wrote, "Some people gain their understanding of the world by symbols and mathematics. Others gain their understanding by pure geometry and space. There are some others that find an acceleration in the muscular effort that is brought to them in understanding, in feeling the force of objects moving through the world. What they want are words of power that stir their souls like the memory of childhood. For the sake of persons of these different types, whether they want the paleness and tenuity of mathematical symbolism, or they want the robust aspects of this muscular engagement, we should present all of these ways. It's the combination of them that gives us our best access to truth." What he was talking about in some ways was himself; this is what he wanted.
If you go back to one of the great Old English origins of the word "understanding," under doesn't mean beneath, it actually meant "among." "Standing" meant different forms of standing. It's almost like you're standing in a grove of different trees. That sense of being among these different ways of grasping the world—some predictive, some mathematical—applies even to something as abstract as a black hole: there are models that use swirling water around a bathtub drain to understand the dynamics of the ergosphere. That ability to stand among these different things might be something that we want, and whether we can make use of AI in different ways, or whether AI will only be part of that understanding, seems to me still to be unknown.
* * * *
NEIL GERSHENFELD: Peter, there's one straightforward response. When you say the network is inscrutable, that's an early, simple version. There's an interesting thing happening with what are called autoencoder networks, where you force the network through a constriction and force it to have a low-dimensional representation after it's gone through this high-dimensional unpacking. There have been a lot of interesting results where you then look at these internal representations and find they're interpretable. It's a simplistic view just to say it's a big network and the output comes out—you can ask the network to help you find a representation. There have been a number of interesting examples of what comes from those.
GALISON: Yes, I’ve seen them. There are a bunch of different ways of sampling in the space that can help you, but the people that do a lot of this imaging work find that they are unable to unwind those.
GERSHENFELD: What I'm saying isn't sampling, it's something different that's a little more recent. As part of training the network, you train it through an internal constriction where you ask it to find an interpretable representation. So, it's a different architecture from just looking at the network and figuring out what it's doing. You train the network to teach you a representation you can understand. There are very interesting examples of that working.
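Here is a minimal sketch of the kind of bottleneck architecture Gershenfeld is describing (a generic autoencoder in PyTorch; the layer sizes, stand-in data, and two-dimensional code are hypothetical choices, not any specific published model):

```python
# An autoencoder forced through a low-dimensional constriction: the network is
# trained only to reconstruct its input, and the 2-D internal code that results
# is the representation one can then inspect and try to interpret.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_in=64, n_code=2):
        super().__init__()
        # encoder squeezes the input through a 2-D constriction
        self.encoder = nn.Sequential(nn.Linear(n_in, 16), nn.ReLU(),
                                     nn.Linear(16, n_code))
        # decoder unpacks the code back to the original dimension
        self.decoder = nn.Sequential(nn.Linear(n_code, 16), nn.ReLU(),
                                     nn.Linear(16, n_in))

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, 64)                     # stand-in data

for _ in range(200):
    recon, code = model(x)
    loss = ((recon - x) ** 2).mean()         # reconstruction loss only
    opt.zero_grad()
    loss.backward()
    opt.step()

# The low-dimensional codes are the internal representation to examine.
print(model(x)[1][:5].detach())
```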
W. DANIEL HILLIS: I understand the sense in which you say the networks are inscrutable, but I’m surprised that you think people are scrutable. If you ask somebody why they decided something, they will make up a story. There’s very good evidence that in many cases that story has nothing to do with what happened.
GALISON: That’s true. But you can sometimes get farther with it. At least that was the hope.
HILLIS: I would think there's a better hope, particularly if you want to have networks that have the property of understandability, which is the kind of thing Neil is talking about—to have AI that is truly understandable in how it made its decisions. There's more hope with that than with humans.
ALISON GOPNIK: There are two orthogonal problems that are getting mixed up here. One of them has to do with how much access the system has to its own processes. The other one, which is the scientific problem, has to do with whether the system is outputting a representation of what's actually out there in the world. If you think that all of human cognition, or at least a lot of it, is this inverse problem, where a bunch of data is coming in and you want to reconstruct what it was out there in the objective world that was creating that data, then that's the central problem of something like a visual system and it's the central problem of science.
First of all, do you understand what the process is that is enabling you to solve that inverse problem? Secondly, do you have something that looks like a solution to the inverse problem? Do you have a representation, whether it’s accurate or not, about what’s going on in the world outside that’s leading to that pattern?
HILLIS: To answer that second question, though, you have to have some criterion of the quality of the solution. That’s a very well studied thing in classifier theory. There are many measures of the quality of the solution when you decide to basically cluster things. So, you can pick your measure, and you can measure how good it is under that measure.
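As a concrete illustration of "pick your measure" (using scikit-learn on synthetic data; the two metrics shown are just standard internal criteria among many):

```python
# Score the same data under different numbers of clusters with two standard
# internal quality measures. Each measure embodies its own notion of a "good"
# clustering, which is exactly the point: you must pick a measure first.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

for k in (2, 3, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}",
          "silhouette:", round(silhouette_score(X, labels), 3),
          "davies-bouldin:", round(davies_bouldin_score(X, labels), 3))
```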
GALISON: It's not that people said, "Humans make judgments. That’s fine." In fact, what led to the development of mechanical objectivity in the first place was that they didn’t like relying on those judgments, even by people like Goethe and Albinus, because they felt that it was obscure. When it began to be a real problem was when it proliferated in the 19th century.
IAN MCEWAN: In your early atlases, we see the power of Platonic thought and the extension of Neoplatonism.
GALISON: Yes, that’s how I think of it. It’s a sense that you can find a pure form that lies behind the myriad particularities that we encounter.
MCEWAN: This is Plato’s cave, in fact.
HILLIS: There is a way of representing clusters, which is to pick the center of the cluster and then pick all of the things that are closest to it. The better algorithms, like support vector methods, pick a bunch of outliers, so anything farther out than those won't be considered in the cluster.
GALISON: In the history of classifying images by scientists, there are a bunch of different strategies. One was to take the most perfect extant instance—the best skull—abstract from that, make it even more perfect, maybe geometrizing it in some way or making it into a perfect harmony of measures. Another was to take an extreme example or an average. There were atlases that would take many livers and weigh them all and find the average weight of a liver, and then that became the notion. That's more like the center choice that you were describing. Then in the biological domains, very often the first-discovered instance becomes the type specimen, which is even stranger.
GERSHENFELD: Harvard has this amazing room of drawers in the museum. So, you pull open the drawer and it will look like little fur pelts, and the fur pelts might be for a beaver and a squirrel. But it’s not a beaver; it’s the beaver. It is the beaver that all beavers are defined by. That’s become much more important recently because they’re now sequenced, and they're used to do genotypes and phenotypes.
GALISON: You can see that there’s a struggle to try to figure out how to make a representation of a class of things that are different.
HILLIS: In clustering techniques, the ones that work less well in practice are the two that you mentioned before—pick the one that's closest to the center, or make up an imaginary one that's in the center—those don't work very well.
It turns out that the one that seems to behave the best in practice is something that was not on any of your lists. I'm not sure if this is ever done in atlases, but it's basically what I call the support vector, where the support vector is the set of things that are right on the edge. You define it by the things that are barely within the category.
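A small scikit-learn sketch of that "edge" idea (illustrative toy data; the point is only that the fitted model keeps the boundary cases rather than a central or averaged exemplar):

```python
# A linear SVM defines each class by its support vectors: the handful of
# examples that sit right at the margin, i.e. the ones barely inside the
# category, rather than a central or average instance.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, random_state=1)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

print("points in the data set:", len(X))
print("support vectors kept:  ", len(clf.support_vectors_))
print(clf.support_vectors_)    # the marginal cases that carry the definition
```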
GOPNIK: Danny, is it that you have an objective measure of what the things are that are being clustered, independently of the cluster?
HILLIS: Yes.
GOPNIK: That’s the problem that Peter is raising. When you’re doing science, you’re in this situation in which you are trying to do the clustering and you’re trying to figure out what the thing is that’s generating the data that you want to cluster.
GALISON: When you're looking at a candidate black hole and its surroundings and you don't know what it's going to look like, it's different from saying, "There are three kinds of galaxies. I want to classify them. This one, I know it looks like one of those, etc."
CAROLINE JONES: The problem seemed to proliferate off earth. Getting out of Plato's cave is fine when you can wander around and pick up the turtle or the beaver, but when you're drawing the canals of Mars, you don't have the symbols, you just have geometries that, it turns out, you're imposing. You're imposing on the mechanical artifact.
SETH LLOYD: I disagree. Neoplatonism is never fine.
JONES: As a historical progression, Platonism works for those people who have philosopher kings and can wander and pick up the beaver. It works less well when you’re relying on a telescope and looking at a surface of a distant planet.
LLOYD: You're describing support vector machines, where it's a mathematically well-defined process to make a cluster—or you have two clusters and you try to draw the hyperplane that separates them with the maximum margin, which is a good idea and works extremely well, also in high dimensions. Then, of course, the stuff on the edge of this margin, the support vectors, defines it. Then you could also do k-means clustering, which is the other one. You pick a representative example and you say, "We can put these together," and you find that both of them work okay, but of course, there aren't necessarily clusters. There's no definition of what a cluster is.
These things will have examples of overlap with each other, so you’ll have things that impinge on the other clusters. There is no abstract ideal cluster that’s there, so you have to come up with some reasonable Bayesian prior to say, "Okay, here is how we’re going to deal with this situation."
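A quick illustration of that caveat, under the assumption that "there aren't necessarily clusters" means structureless data (a toy example, not any particular application):

```python
# k-means applied to uniformly scattered points: the algorithm minimizes its
# objective and reports three tidy clusters even though no cluster structure
# exists, which is why some prior or external criterion has to be brought in.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))               # no cluster structure at all

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes on structureless data:", np.bincount(km.labels_))
print("objective (inertia):", round(km.inertia_, 3))
```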
Now, we have this nice feature where we admit these artificial intelligences, which we are not going to understand in roughly the same way we don’t understand humans, into our spectrum of models that we’re going to trust in order to do things like look at medical images. In addition to radiologists looking at medical images, we can also run them by the deep neural network and see what they say, too.
It's rather nice that we have other artificial intelligences with whom to collaborate. We don't know what they're doing, but we also have other methods that we can compare, things like support vector machines and well-defined mathematical methods where we know what's going on, which, incidentally, is what happens in Netflix. Netflix does not have a deep neural network; they have a matrix completion algorithm, which is well defined mathematically. It's very labor intensive, but we could walk through it and say, "Here's what's happening inside your computer exactly and here's why it works."
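A hedged sketch of the matrix-completion idea (a generic low-rank iterative-SVD scheme on made-up data, not Netflix's actual algorithm):

```python
# Fill in missing entries of a ratings-like matrix by repeatedly projecting
# onto a low-rank approximation while keeping the observed entries fixed.
# Every step is an explicit linear-algebra operation one can walk through.
import numpy as np

rng = np.random.default_rng(0)
true = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 15))   # rank-3 "ratings"
mask = rng.random(true.shape) < 0.5                          # which entries we observed
X = np.where(mask, true, 0.0)

filled = X.copy()
for _ in range(100):
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    s[3:] = 0.0                                              # keep rank 3
    low_rank = (U * s) @ Vt
    filled = np.where(mask, X, low_rank)                     # respect observed data

err = np.abs(filled - true)[~mask].mean()
print("mean error on the unobserved entries:", round(err, 3))
```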
HILLIS: In many examples, there is an outside way of measuring the value, which is, if you're going to do something with the decision, the utility of the success of doing it. There are many systems in which you can say, "Well, this clustering technique was better than that one because it corresponded more to the way that we use the decision."
GALISON: It’s easier if you have other independent tests and you could, say, go to a higher frequency.
HILLIS: There's a great example of that, which was done by accident with the example you mentioned about the type specimens. The grouping of animals in terms of genus, and species, and things like that was done by people deciding that the important character is the shape of the jaw, or the number of tailbones, or something like that. What was interesting was that this was all done pre-DNA, and even before everybody doing it necessarily believed in evolution. But when we got the ability to sequence mitochondrial DNA and got some insight into the process, it turned out that almost all those judgments were correct. They had picked the correct character and so on, so their clustering method, although it seemed very arbitrary, in fact almost exactly reproduced the tree of life.
GALISON: If you're an explorer hiking in the Amazon and you find a turtle that you think is a new turtle—no one has seen a turtle of this species—you would probably not choose as your example one that looked a lot like an old turtle; you'd look for a turtle that was pretty different. So, you're prejudiced toward a type specimen that is more distinct than the marginal one might be.
MCEWAN: I’m thinking here of Leeuwenhoek’s submissions to the Royal Society, where he is drawing a sperm and he inserts a homunculus.
GALISON: That seems to me an early example of what I call the spring of Narcissus problem. No one knows what a supermassive black hole, what the form of its shadow, is going to look like. There are models and simulations, but no one knows. It's not like looking at a known galaxy and then saying, "How does my method match up with what we've already seen at telescopes that aren't as good?"
GERSHENFELD: Something that is misleading in the way we've been talking about this is that modern clustering algorithms don't give you true/false; they give you distributions. A hard clustering like k-means will miss something important just over the boundary. It's probabilistic. Applied to this, you don't get the image, you get PDFs over families of images. That's how modern clustering works. In modern classification, you don't just get the classification, you get the probabilities of association; a hard classifier wouldn't tell you the difference between a 49/51 and a 0/100 likelihood. Modern classifiers give you the classification, but they also give you the uncertainty on top of that.
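A small scikit-learn sketch of that hard-versus-soft distinction (toy overlapping data; the Gaussian mixture stands in for any probabilistic classifier):

```python
# A hard clustering (k-means) returns one label per point; a probabilistic
# model (Gaussian mixture) returns a distribution, so a 49/51 borderline case
# looks visibly different from a 0/100 case.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=2, cluster_std=2.5, random_state=0)

hard = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
probs = GaussianMixture(n_components=2, random_state=0).fit(X).predict_proba(X)

borderline = np.abs(probs[:, 0] - 0.5) < 0.1   # near-50/50 assignments
print("borderline points:", borderline.sum())
print("their hard labels (no hint of ambiguity):", hard[borderline][:10])
print("their soft probabilities:\n", probs[borderline][:3].round(2))
```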
GOPNIK: There's an interesting contrast if you're looking at humans, and especially if you're looking at kids. One of the things that people have discovered that's interesting is that if you're looking at kids' categorizations, there certainly seem to be some kinds of processes that are doing things in an associative way, essentially looking for distributions. But by the time kids are linguistically categorizing things, they have something that is a much more essentialist, science-like category, a natural kind category. So, what they think is the thing that you're pointing to when you say something about a dog has nothing to do with any distribution of the properties of a dog. They think it's whatever the underlying causal category is, whatever the underlying causal set of properties is that is giving rise to some set of data, some set of things that you're perceiving, which could turn out to be completely wrong.
That abstract notion of what a category is that comes in science, the natural kind idea that it's whatever is out there in the world that's causing this set of correlations among data, seems to be what the four-year-olds think a category is, not the idea of the distribution. You can show that they can detect the distribution from the time they're infants, so they are in effect doing the clustering, but their conception of what's going on with the clustering, even when they're three or four years old, is that it's this abstract underlying causal system that's giving rise to the data. The Bayesian picture seems to be very deeply built into the way we think about categories.
GALISON: It may be that, among the trees that stand around us in understanding, AI will come in as more than one tree, and that there may be different ways that AI will function in that. It won't be just the AI tree, and the differential equation tree, and the analog model tree, and so on; rather, AI may stand in different ways and in different forms of clustering, in particulars and in probabilistic distributions and so on. We ought to remain open to that possibility, too, that more than one AI in the image context will help constitute what we mean by understanding in ten years.