The Pigeonhole Principle

Certain facts in mathematics feel as though they contain a kind of compressed power—they look innocuous and mild-mannered when you first meet them, but they're dazzling when you see them in action. One of the most compelling examples of such a fact is the Pigeonhole Principle.

Here's what the Pigeonhole Principle says. Suppose a flock of pigeons lands in a group of trees, and there are more pigeons than trees. Then after all the pigeons have landed, at least one of the trees contains more than one pigeon.

This fact sounds obvious, and it is: there are simply too many pigeons, and so they can't each get their own tree. Indeed, if this were the end of the story, it wouldn't be clear why this is a fact that even deserves to be named or noted down. But to really appreciate the Pigeonhole Principle, you have to see some of the things you can do with it.

So let's move on to a fact that doesn't look nearly as straightforward. The statement itself is intriguing, but what's more intriguing is the effortless way it will turn out to follow from the Pigeonhole Principle. Here's the fact: Sometime in the past 4000 years, there have been two people in your family tree—call them A and B—with the property that A was an ancestor of B's mother and also an ancestor of B's father. Your family tree has a "loop", where two branches growing upward from B come back together at A—in other words, there's a set of parents in your ancestry who are blood relatives of each other, thanks to this relatively recent shared ancestor A.

It's worth mentioning a couple of things here. First, the "you" in the previous paragraph is genuinely you, the reader. Indeed, one of the intriguing features of this fact is that I can blithely make such assertions about you and your ancestors, despite the fact that I don't even know who you are. Second, the statement doesn't rely on any assumptions about the evolution of the human race, or the geographic sweep of human history. Here, in particular, are the only assumptions I'll need. (1) Everyone has two biological parents. (2) No one has children after the age of 100. (3) The human race is at least 4000 years old. (4) At most a trillion human beings have lived in the past 4000 years. (Scientists' actual best estimate for (4) is that roughly a hundred billion human beings have ever lived in all of human history; I'm bumping this up to a trillion just to be safe.) All four assumptions are designed to be as uncontroversial as possible; and even then, a few exceptions to the first two assumptions and an even larger estimate in the fourth would only necessitate some minor tweaking to the argument.

Now back to you and your ancestors. Let's start by building your family tree going back 40 generations: you, your parents, their parents, and so on, 40 steps back. Since each generation lasts at most 100 years, the last 40 generations of your family tree all take place within the past 4000 years. (In fact, they almost surely take place within just the past 1000 or 1200 years, but remember that we're trying to be uncontroversial.)

We can view a drawing of your family tree as a kind of "org chart", listing a set of jobs or roles that need to be filled by people. That is, someone needs to be your mother, someone needs to be your father, someone needs to be your mother's father, and so forth, going back up the tree. We'll call each of these an "ancestor role"—it's a job that exists in your ancestry, and we can talk about this job regardless of who actually filled it. The first generation back in your family tree contains two ancestor roles, for your two parents. The second contains four ancestor roles, for your grandparents; the third contains eight roles, for your great-grandparents. Each level you go back doubles the number of ancestor roles that need to be filled, so if you work out the arithmetic, you find that 40 generations in the past, you have more than a trillion ancestor roles that need to be filled.

At this point it's time for the Pigeonhole Principle to make its appearance. The most recent 40 generations of your family tree all took place within the past 4000 years, and we decided that at most a trillion people ever lived during this time. So there are more ancestor roles (over a trillion) than there are people to fill these roles (at most a trillion). This brings us to the crucial point: at least two roles in your ancestry must have been filled by the same person. Let's call this person A.

Now that we've identified A, we're basically done. Starting from two different roles that A filled in your ancestry, let's walk back down the family tree toward you. These two walks downward from A have to first meet each other at some ancestor role lower down in the tree, filled by a person B. Since the two walks are meeting for the first time at B, one walk arrived via B's mother, and the other arrived via B's father. In other words, A is an ancestor of B's mother, and also an ancestor of B's father, just as we wanted to conclude.

Once you step back and absorb how the argument works, you can appreciate a few things. First, in a way, it's more a fact about simple mathematical structures than it is about people. We're taking a giant family tree—yours—and trying to stuff it into the past 4000 years of human history. It's too big to fit, and so certain people have to occupy more than one position in it.

Second, the argument has what mathematicians like to call a "non-constructive" aspect. It never really gave you a recipe for finding A and B in your family tree; it convinced you that they must be there, but very little more.

And finally, I like to think of it as a typical episode in the lives of the Pigeonhole Principle and all the other quietly powerful statements that dot the mathematical landscape—a band of understated little facts that seem to frequently show up at just the right time and, without any visible effort, clean up an otherwise messy situation.