Information Scientist and Professor of Electrical Engineering and Law, University of Southern California; Author, Noise, Fuzzy Thinking
Statistical Independence

It is time for science to retire the fiction of statistical independence. 

The world is massively interconnected through causal chains. Gravity alone causally connects all objects with mass. The world is even more massively correlated with itself. It is a truism that statistical correlation does not imply causality. But it is a mathematical fact that statistical independence implies no correlation at all. None. Yet events routinely correlate with one another. The whole focus of most big-data algorithms is to uncover just such correlations in ever larger data sets. 
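The step from independence to zero correlation can be written out in one line. This is a sketch under the assumption that X and Y are two random variables with finite second moments; the symbols X and Y are illustrative and do not appear in the essay.

```latex
\[
\begin{aligned}
\operatorname{Cov}(X, Y) &= E[XY] - E[X]\,E[Y] \\
                         &= E[X]\,E[Y] - E[X]\,E[Y]
                            \qquad \text{(independence gives } E[XY] = E[X]\,E[Y]\text{)} \\
                         &= 0 .
\end{aligned}
\]
```

The converse does not hold in general. If X is standard normal and Y = X^2, then the covariance is E[X^3] = 0 even though Y is completely determined by X.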

Statistical independence also underlies most modern statistical sampling techniques. It is often part of the very definition of a random sample. It underlies the old-school confidence intervals used in political polls and in some medical studies. It even underlies the distribution-free bootstraps or simulated data sets that increasingly replace those old-school techniques. 

White noise is what statistical independence should sound like. 

The hisses and pops and crackles of true white-noise samples are all statistically independent of one another. This holds no matter how close the noise samples are in time. That means the power spectrum of white noise is flat across all frequencies. Such a process does not exist because it would require infinite energy. That has not stopped generations of scientists and engineers from assuming that white noise contaminates measured signals and communications. 

Real noise samples are not independent. They correlate to some degree. Even the thermal noise that bedevils electronic circuits and radar devices has only an approximately flat frequency spectrum and then over only part of the spectrum. Real noise does not have a flat spectrum. Nor does it have infinite energy. So real noise is colored pink or brown or some other strained color metaphor that depends on how far the correlation reaches among the noise samples. Real noise is not and cannot be white.
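The difference between white and colored noise shows up in even the crudest dependence measure, the correlation between neighboring samples. The sketch below is only an illustration: it assumes Gaussian samples from Python's pseudo-random generator as a stand-in for white noise, and a first-order autoregressive recursion with a hypothetical coefficient `rho = 0.9` as a stand-in for colored noise. Neither choice comes from the essay.

```python
import random


def lag1_autocorrelation(samples):
    """Sample correlation between each value and the value that follows it."""
    n = len(samples)
    mean = sum(samples) / n
    variance_sum = sum((x - mean) ** 2 for x in samples)
    covariance_sum = sum(
        (samples[i] - mean) * (samples[i + 1] - mean) for i in range(n - 1)
    )
    return covariance_sum / variance_sum


def white_noise(n, rng):
    """Idealized white noise: each sample is drawn afresh (assumption: Gaussian)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def ar1_noise(n, rho, rng):
    """Colored noise: each sample keeps a fraction rho of the previous sample."""
    x = 0.0
    out = []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


rng = random.Random(0)  # fixed seed so the run is repeatable
white = white_noise(20000, rng)
colored = ar1_noise(20000, 0.9, rng)

print("white   lag-1 correlation:", lag1_autocorrelation(white))
print("colored lag-1 correlation:", lag1_autocorrelation(colored))
```

The white samples should show a lag-1 correlation near zero, while the colored samples should show one near `rho`. A near-zero correlation is consistent with independence but does not establish it.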

A revealing problem is that there are few tests for statistical independence. Most tests tell at most whether two variables (not the data itself) are independent. And most scientists would be hard-pressed to name even those. 

So the overwhelming common practice is simply to assume that sampled events are independent. Just assume that the data are white. Just assume that the data not only come from the same probability distribution but are also statistically independent. An easy justification for this is that almost everyone else does it and it's in the textbooks. This assumption has to be one of the most widespread instances of groupthink in all of science.

The reason we so often assume statistical independence is not its real-world accuracy. We assume statistical independence because of its armchair appeal: It makes the math easy. It often makes the intractable tractable.  

Statistical independence splits compound probabilities into products of individual probabilities. (Often a logarithm then converts the probability product into a sum, because sums are easier still to work with than products.) The split holds because a compound or joint probability always factors into a product of conditional probabilities. The so-called multiplication rule guarantees this factorization. Independence further reduces the conditional probabilities to unconditional ones. Removing the conditioning removes the statistical dependency. And it is far easier to lecture would-be gamblers that successive coin flips are independent than to conduct the fairly extensive experiments with conditional probabilities required to establish such a remarkable property as fact. 
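In symbols, the multiplication rule and its collapse under independence can be sketched as follows. The events A_1, ..., A_n are illustrative notation rather than the essay's own, and the conditional probabilities are assumed to be well defined (each conditioning event has positive probability).

```latex
\[
P(A_1 \cap A_2 \cap \cdots \cap A_n)
  = P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_1 \cap A_2) \cdots
    P(A_n \mid A_1 \cap \cdots \cap A_{n-1})
\]

% Under independence every conditional probability loses its conditioning:
\[
P(A_1 \cap A_2 \cap \cdots \cap A_n) = \prod_{i=1}^{n} P(A_i),
\qquad
\log P(A_1 \cap A_2 \cap \cdots \cap A_n) = \sum_{i=1}^{n} \log P(A_i).
\]
```

The first line holds for any events. Only the second line needs independence, which is why the assumption is so convenient.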

Andrei Markov made the first great advance over independence or whiteness when he studied events that statistically depend on only the immediate past. That was over a century ago. 

We still wrestle with the math of such Markov chains and find surprises. The Google search algorithm rests in large part on finding the equilibrium eigenvector of a finite Markov chain. The search model assumes that Internet surfers jump at random from web page to web page much as a frog hops from lily pad to lily pad. The jumps and hops are not statistically independent. But they are probabilistic. The next web page you choose depends on the page you are now looking at. Real web surfing may well involve probabilistic dependencies that reach back to several visited web sites. It is a good bet that the human mind is not a Markov process. Yet relaxing independence to even one-step or two-step Markov dependency has proven a powerful way to model diverse streams of data from molecular diffusion to speech translation.
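The equilibrium idea can be illustrated with a toy chain. The sketch below is not the Google algorithm. It assumes a hypothetical web of three pages with made-up jump probabilities, and it uses plain power iteration to find the probability vector that one more random jump leaves unchanged. The function name and the matrix are assumptions introduced for illustration.

```python
def stationary_distribution(transition, iterations=1000, tol=1e-12):
    """Power iteration: repeatedly apply the chain until the distribution settles.

    transition[i][j] is the probability of jumping from page i to page j,
    so every row must sum to 1.
    """
    n = len(transition)
    pi = [1.0 / n] * n  # start from a uniform guess
    for _ in range(iterations):
        nxt = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical three-page web; row i lists the jump probabilities from page i.
P = [
    [0.1, 0.6, 0.3],
    [0.4, 0.2, 0.4],
    [0.5, 0.3, 0.2],
]

pi = stationary_distribution(P)
print("long-run share of visits per page:", pi)
```

Each jump depends on the current page, so the jumps are dependent, yet the long-run share of visits still settles to a fixed vector. That vector is the left eigenvector of the transition matrix with eigenvalue 1. Convergence is assured here because every entry of the toy matrix is positive.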

It takes work to go beyond the simple Markov property where the future depends only on the present and not on the past. But we have ever more powerful computers that do just such work. And many more insights will surely come from the brains of motivated theoreticians. Giving up the crutch of statistical independence can only spur more such results.

Science needs to take seriously its favorite answer: It depends.