E Pluribus Unum

If you used a personal computer 25 years ago, everything you needed to worry about was taking place in the box in front of you. Today, the applications you use over the course of an hour are scattered across computers all over the world; for the most part, we've lost the ability to tell where our data sits at all. We invent terms to express this lost sense of direction: our messages, photos, and online profiles are all somewhere in "The Cloud."

The Cloud is not a single thing; what you think of as your Gmail account or Facebook profile is in fact made possible by the teamwork of a huge number of physically dispersed components: a distributed system, in the language of computer science. But we can think of it as a single thing, and this is the broader point: the ideas of distributed systems apply whenever we see many small things working independently but cooperatively to produce the illusion of a single unified experience. This effect takes place not just on the Internet, but in many other domains as well. Consider, for example, a large corporation that is able to release new products and make public announcements as though it were a single actor, when we know that at a more detailed level it consists of tens of thousands of employees. Or consider a massive ant colony engaged in coordinated exploration, or the neurons of your brain creating your experience of the present moment.

The challenge for a distributed system is to achieve this illusion of a single unified behavior in the face of so much underlying complexity. And this broad challenge, appropriately, is in fact composed of many smaller challenges in tension with each other.

One basic piece of the puzzle is the problem of consistency. Each component of a distributed system sees different things and has a limited ability to communicate with everyone else, so different parts of the system can develop views of the world that are mutually inconsistent. There are numerous examples of how this can lead to trouble, both in technological domains and beyond. Your handheld device doesn't sync with your e-mail, so you act without realizing that there's already been a reply to your message. Two people across the country both reserve seat 5F on the same flight at the same time. An executive in an organization "didn't get the memo" and so strays off-message. A platoon attacks too soon and alerts the enemy.
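The seat 5F example can be made concrete with a minimal sketch. Everything here is hypothetical and chosen only for illustration: the Replica class, its reserve and sync_from methods, and the passengers Alice and Bob. Each replica checks only its own local copy of the seat map, so both accept the same seat, and the conflict surfaces only when the two finally exchange information.

```python
class Replica:
    """A booking server that holds its own local copy of the seat map."""

    def __init__(self, name):
        self.name = name
        self.taken = {}  # seat -> passenger, as far as this replica knows

    def reserve(self, seat, passenger):
        # The check uses only local knowledge, which may be out of date.
        if seat in self.taken:
            return False
        self.taken[seat] = passenger
        return True

    def sync_from(self, other):
        """Merge the other replica's bookings; return the seats in conflict."""
        conflicts = []
        for seat, passenger in other.taken.items():
            if seat in self.taken and self.taken[seat] != passenger:
                conflicts.append(seat)
            else:
                self.taken[seat] = passenger
        return conflicts


# Two replicas on opposite coasts, not yet in contact with each other.
east, west = Replica("east"), Replica("west")
ok_east = east.reserve("5F", "Alice")  # accepted: east sees 5F as free
ok_west = west.reserve("5F", "Bob")    # accepted: west also sees 5F as free
conflicts = east.sync_from(west)       # the double booking appears only now
```

Both reservations succeed, and the list of conflicts returned by the sync contains seat 5F. Neither replica did anything wrong by its own lights; the trouble comes entirely from acting on a partial view.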

It is natural to try "fixing" these kinds of problems by enforcing a single global view of the world, and requiring all parts of the system to constantly refer to this global view before acting. But this undercuts many of the reasons why one uses a distributed system in the first place. It makes the component that provides the global view a massive bottleneck and a dangerous single point of failure. The corporation doesn't function if the CEO has to sign off on every decision.

To get a more concrete sense of some of the underlying design issues, it helps to walk through an example in a little detail: a basic kind of situation in which we try to achieve a desired outcome with information and actions that are divided among multiple participants. The example is the problem of sharing information securely: imagine trying to back up a sensitive database on multiple computers, while protecting the data so that it can only be reconstructed if a majority of the backup computers cooperate. But since the question of secure information sharing ultimately has nothing specifically to do with computers or the Internet, let's formulate it instead using a story about a band of pirates and a buried treasure.

Suppose that an aging Pirate King knows the location of a secret treasure, and before retiring he intends to share the secret among his five shiftless sons. He wants them to be able to recover the treasure if three or more of them work together, but he also wants to prevent a "splinter group" of one or two from being able to get the treasure on their own. To do this, he plans to split the secret of the location into five "shares," giving one to each son, in such a way that the following condition holds. If, at any point in the future, at least three of the sons pool their shares of the secret, then they will know enough to recover the treasure. But if only one or two pool their shares, they will not have enough information.

How to do this? It's not hard to invent ways of creating five clues so that all of them are necessary for finding the treasure. But this would require unanimity among the five sons before the treasure could be found. How can we do it so that cooperation among any three is enough, and cooperation among any two is insufficient?

Like many deep insights, the answer is easy to understand in retrospect. The Pirate King draws a secret circle on the globe (known only to himself) and tells his sons that he's buried the treasure at the exact southernmost point on this circle. He then tells each son a different point on this circle. Three points are enough to uniquely reconstruct a circle, so any three pirates can pool their information, identify the circle, and find the treasure. But for any two pirates, an infinity of circles pass through their two points, and they cannot know which is the one they need for recovering the secret. It's a powerful trick, and broadly applicable; in fact, versions of this secret-sharing scheme form a basic principle of modern data security, discovered by the cryptographer Adi Shamir, where arbitrary data can be encoded using points on a curve, and reconstructed from knowledge of other points on the same curve.
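The same three-of-five idea can be sketched in a few lines of code. This is a minimal illustration in the spirit of Shamir's scheme, not a production implementation, and it makes some assumptions of its own: it replaces the circle on the globe with a parabola (a degree-2 polynomial) over the integers modulo a prime, and the names PRIME, make_shares, and recover are invented for this example. The secret is the curve's value at x = 0; each son receives one other point on the curve.

```python
import secrets

# Assumption: a prime larger than any secret we want to encode.
# 2**127 - 1 is a Mersenne prime.
PRIME = 2**127 - 1


def make_shares(secret, threshold=3, count=5):
    """Split `secret` into `count` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    # Random polynomial of degree threshold - 1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        # Horner's rule, with all arithmetic modulo PRIME.
        result = 0
        for c in reversed(coeffs):
            result = (result * x + c) % PRIME
        return result

    # One point per participant, at x = 1, 2, ..., count (never x = 0).
    return [(x, evaluate(x)) for x in range(1, count + 1)]


def recover(shares):
    """Lagrange interpolation at x = 0, modulo PRIME."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8 or later).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    shares = make_shares(424242)
    print(recover(shares[:3]))                          # any three shares work
    print(recover([shares[0], shares[2], shares[4]]))   # a different three
```

Any three of the five points pin down the parabola and hence its value at zero. Two points do not: for every candidate secret there is a parabola through those two points that produces it, so a pair of shares reveals nothing about which secret is the real one.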

The literature on distributed systems is rich with ideas in this spirit. More generally, the principles of distributed systems give us a way to reason about the difficulties inherent in complex systems built from many interacting parts. And so to the extent that we are sometimes fortunate enough to get the impression of a unified Web, a unified global banking system, or a unified sensory experience, we should think about the myriad challenges involved in keeping these experiences whole.