The End of Theory: The Data Deluge Makes the Scientific Method Obsolete


The Petabyte Age is different because more is different. Kilobytes
were stored on floppy disks. Megabytes were stored on hard disks.
Terabytes were stored in disk arrays. Petabytes are stored in the
cloud. As we moved along that progression, we went from the folder
analogy to the file cabinet analogy to the library analogy to — well,
at petabytes we ran out of organizational analogies.

At the petabyte scale, information is not a matter of simple three-
and four-dimensional taxonomy and order but of dimensionally agnostic
statistics. It calls for an entirely different approach, one that
requires us to lose the tether of data as something that can be
visualized in its totality. It forces us to view data mathematically
first and establish a context for it later. For instance, Google
conquered the advertising world with nothing more than applied
mathematics. It didn’t pretend to know anything about the culture and
conventions of advertising — it just assumed that better data, with
better analytical tools, would win the day. And Google was right.

Google’s founding philosophy is that we don’t know why this page is
better than that one: If the statistics of incoming links say it is,
that’s good enough. No semantic or causal analysis is required.
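To make "the statistics of incoming links" concrete, here is a toy ranking in the spirit of the published PageRank idea. It is a minimal sketch under stated assumptions, not Google's actual system. The link graph is hypothetical, the damping factor of 0.85 is the value commonly cited for PageRank, and the function name `rank_pages` is illustrative. Nothing in it looks at what a page says; the ranking comes from link structure alone.

```python
"""Toy link-based ranking in the spirit of PageRank (a sketch, not Google's system)."""


def rank_pages(links, damping=0.85, iterations=100, tol=1e-10):
    """Score pages purely from who links to whom, using power iteration.

    links: dict mapping each page to the list of pages it links to
    (a hypothetical input format chosen for this sketch).
    Pages with no outgoing links spread their score evenly over all pages.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    score = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Baseline share every page receives from the "random jump" term.
        new = {p: (1.0 - damping) / n for p in pages}

        # Score held by dangling pages (no outlinks) is redistributed uniformly.
        dangling = sum(score[p] for p in pages if not links.get(p))
        for p in pages:
            new[p] += damping * dangling / n

        # Each page passes its score, split evenly, to the pages it links to.
        for src, targets in links.items():
            if targets:
                share = damping * score[src] / len(targets)
                for dst in targets:
                    new[dst] += share

        # Stop early once the scores have effectively stopped changing.
        delta = sum(abs(new[p] - score[p]) for p in pages)
        score = new
        if delta < tol:
            break

    return score


if __name__ == "__main__":
    # Hypothetical four-page web in which every other page links to "c".
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    ranking = rank_pages(links)
    # "c" should come out on top purely because of its incoming links.
    for page, s in sorted(ranking.items(), key=lambda kv: -kv[1]):
        print(page, round(s, 3))
```

The scores always sum to 1, so they can be read as the share of attention each page would get from a reader who clicks links at random. No semantic judgment about any page enters the calculation.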


This is a world where massive amounts of data and applied mathematics
replace every other tool that might be brought to bear. Out with every
theory of human behavior, from linguistics to sociology. Forget
taxonomy, ontology, and psychology. Who knows why people do what they
do? The point is they do it, and we can track and measure it with
unprecedented fidelity.