Emanuel Derman
Professor, Financial Engineering, Columbia University; Author, Models.Behaving.Badly
The Power of Statistics

I grew up among physicists, whose modus operandi is to observe the world, experiment with it, develop hypotheses and theories and models, suggest further experiments, and use statistics to analyze the results, thereby comparing mental imaginings with actual events. Statistics is simply their tool for confirmation or denial. 

But nowadays the world, and especially the world of the social sciences, is increasingly in love with statistics and data science as a source of knowledge and truth itself. Some people have even claimed that computer-aided statistical analysis of patterns will replace our traditional methods of discovering the truth, not only in the social sciences and medicine, but in the natural sciences too.

I believe we must be careful not to get too enamored of statistics and data science and thereby abandon the classical methods of discovering the great truths about nature (and man is nature too). A good example of the classical power is Kepler's 17th-century discovery of his second law of planetary motion, which is in fact less a law than the recognition and description of a pattern. Kepler's second law states that the line between the sun and a moving planet sweeps out equal areas in equal times. This deep symmetry of planetary motion implies that the closer a planet is to the sun, the more rapidly it moves along its orbit. But notice that there is no line between a planet and the sun. Kepler's still-astonishing insight required examining Tycho Brahe's data, a long mental struggle, a burst of intuition (use an invisible line!), and then checking his hypothesis. Data, intuition, hypothesis, and finally comparison with data is the time-honored process.
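For readers who want to see the pattern rather than take it on trust, here is a minimal numerical sketch, not anything Kepler or Brahe had at hand. It assumes a fixed sun at the origin, Newtonian inverse-square gravity in units where GM = 1, and illustrative starting values; the function name `swept_areas` and all parameters are hypothetical choices made for this example.

```python
import math

def swept_areas(x, y, vx, vy, dt=1e-3, steps_per_window=500, windows=6, gm=1.0):
    """Integrate a planet around a fixed sun at the origin (velocity Verlet).

    Returns, for each equal-length time window, the area swept by the
    sun-planet line, plus the distance and speed at the start of the window.
    """
    def accel(px, py):
        # Inverse-square attraction toward the origin (assumed force law).
        r3 = (px * px + py * py) ** 1.5
        return -gm * px / r3, -gm * py / r3

    ax, ay = accel(x, y)
    areas, distances, speeds = [], [], []
    for _window in range(windows):
        distances.append(math.hypot(x, y))
        speeds.append(math.hypot(vx, vy))
        area = 0.0
        for _step in range(steps_per_window):
            # Half kick, then drift to the new position.
            vx += 0.5 * dt * ax
            vy += 0.5 * dt * ay
            nx, ny = x + dt * vx, y + dt * vy
            # Thin triangle between the old and new sun-planet lines.
            area += 0.5 * abs(x * ny - y * nx)
            x, y = nx, ny
            # Second half kick with the acceleration at the new position.
            ax, ay = accel(x, y)
            vx += 0.5 * dt * ax
            vy += 0.5 * dt * ay
        areas.append(area)
    return areas, distances, speeds

# Illustrative start: planet at its farthest point, moving slower than
# circular speed, so it falls inward along an ellipse.
areas, distances, speeds = swept_areas(1.0, 0.0, 0.0, 0.8)
for a, r, v in zip(areas, distances, speeds):
    print(f"distance={r:.3f}  speed={v:.3f}  area swept={a:.6f}")
```

The swept area stays the same from window to window while the distance shrinks and the speed grows, which is the pattern the second law describes. The simulation only reproduces the pattern once a force law has been supplied; it does not discover it.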

Kepler's second law is in fact a statement of the conservation of angular momentum that followed later from Newton's theories of motion and gravitation. Newton's theories were so readily and immediately accepted because Kepler's three verified laws could be derived from them. John Maynard Keynes wrote of Newton three hundred years later: "I fancy his pre-eminence is due to his muscles of intuition being the strongest and most enduring with which a man has ever been gifted."
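The link between equal areas and angular momentum can be written out in a few lines. This is the standard textbook argument, sketched under stated assumptions: the symbols are introduced here and are not the essay's own, with $m$ the planet's mass, $r$ and $\theta$ its polar coordinates about the sun, $A$ the area swept, $\mathbf{F}$ the gravitational force, and $\mathbf{L}$ the angular momentum with magnitude $L$.

```latex
\begin{align*}
dA &= \tfrac{1}{2}\, r^{2}\, d\theta, \\
\frac{dA}{dt} &= \tfrac{1}{2}\, r^{2}\, \dot{\theta} = \frac{L}{2m},
  \qquad L = m r^{2} \dot{\theta}, \\
\frac{d\mathbf{L}}{dt} &= \mathbf{r} \times \mathbf{F} = \mathbf{0}
  \quad \text{for a central force } \mathbf{F} \parallel \mathbf{r}.
\end{align*}
```

Because the force points along the invisible line itself, it exerts no torque, $L$ stays constant, and so does the rate $dA/dt$ at which area is swept.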

Statistics—the field itself—is a kind of Caliban, sired somewhere on an island in the region between mathematics and the natural sciences. It is neither purely a language nor purely a science of the natural world, but rather a collection of techniques to be applied, I believe, to test hypotheses. Statistics in isolation can seek only to find past tendencies and correlations, and assume that they will persist. But in a famous unattributed phrase, correlation is not causation.

Science is a battle to find causes and explanations amidst the confusion of data. Let us not get too enamored of data science, whose great triumphs so far are mainly in advertising and persuasion. Data alone has no voice. There is no "raw" data, as Kepler's saga shows. Choosing what data to collect and how to think about it takes insight into the invisible; making good sense of the data collected requires the classic conservative methods: intuition, modeling, theorizing, and then, finally, statistics.