How will we ever make sense of scientific topics that are too big to know? The short answer: by transforming what it means to know something scientifically.
Scientific knowledge is taking on properties of its new medium, becoming like the network in which it lives.
There are three basic reasons scientific data has increased to the point that the brickyard metaphor now looks 19th century. First, the economics of deletion have changed. We used to throw out most of the photos we took with our pathetic old film cameras because, even though they were far more expensive to create than today’s digital images, photo albums were expensive, took up space, and required us to invest considerable time in deciding which photos would make the cut. Now, it’s often less expensive to store them all on our hard drive (or at some website) than it is to weed through them.
Second, the economics of sharing have changed. The Library of Congress has tens of millions of items in storage because physics makes it hard to display and preserve, much less to share, physical objects. The Internet makes it far easier to share what’s in our digital basements. When the datasets are so large that they become unwieldy even for the Internet, innovators are spurred to invent new forms of sharing. For example, Tranche, the system behind ProteomeCommons, created its own technical protocol for sharing terabytes of data over the Net, so that a single source isn’t responsible for pumping out all the information; the process of sharing is itself shared across the network. And the new Linked Data format makes it easier than ever to package data into small chunks that can be found and reused. The ability to access and share over the Net further enhances the new economics of deletion; data that otherwise would not have been worth storing have new potential value because people can find and share them.
Third, computers have become exponentially smarter.
The brickyard has grown to galactic size. It’s not simply that there are too many brickfacts and not enough edifice-theories. Rather, the creation of data galaxies has led us to science that sometimes is too rich and complex for reduction into theories. As science has gotten too big to know, we’ve adopted different ideas about what it means to know at all.
For example, the biological system of an organism is complex beyond imagining. Even the simplest element of life, a cell, is itself a system. A new science called systems biology studies the ways in which external stimuli send signals across the cell membrane. Some stimuli provoke relatively simple responses, but others cause cascades of reactions. These signals cannot be understood in isolation from one another.
The result of having access to all this data is a new science that is able to study not just “the characteristics of isolated parts of a cell or organism” (to quote Kitano) but properties that don’t show up at the parts level. For example, one of the most remarkable characteristics of living organisms is that we’re robust — our bodies bounce back time and time again, until, of course, they don’t. Robustness is a property of a system, not of its individual elements, some of which may be nonrobust and, like ants protecting their queen, may “sacrifice themselves” so that the system overall can survive. In fact, life itself is a property of a system.
The problem — or at least the change — is that we humans cannot understand systems even as complex as that of a simple cell. It’s not that were awaiting some elegant theory that will snap all the details into place. The theory is well established already: Cellular systems consist of a set of detailed interactions that can be thought of as signals and responses. But those interactions surpass in quantity and complexity the human brains ability to comprehend them. The science of such systems requires computers to store all the details and to see how they interact. Systems biologists build computer models that replicate in software what happens when the millions of pieces interact. It’s a bit like predicting the weather, but with far more dependency on particular events and fewer general principles.
Models this complex — whether of cellular biology, the weather, the economy, even highway traffic — often fail us, because the world is more complex than our models can capture. But sometimes they can predict accurately how the system will behave. At their most complex these are sciences of emergence and complexity, studying properties of systems that cannot be seen by looking only at the parts, and cannot be well predicted except by looking at what happens.
With the new database-based science, there is often no moment when the complex becomes simple enough for us to understand it. The model does not reduce to an equation that lets us then throw away the model. You have to run the simulation to see what emerges. For example, a computer model of the movement of people within a confined space who are fleeing from a threat–they are in a panic–shows that putting a column about one meter in front of an exit door, slightly to either side, actually increases the flow of people out the door. Why? There may be a theory or it may simply be an emergent property. We can climb the ladder of complexity from party games to humans with the single intent of getting outside of a burning building, to phenomena with many more people with much more diverse and changing motivations, such as markets. We can model these and perhaps know how they work without understanding them. They are so complex that only our artificial brains can manage the amount of data and the number of interactions involved.
Model-based knowing has many well-documented difficulties, especially when we are attempting to predict real-world events subject to the vagaries of history; a Cretaceous-era model of that eras ecology would not have included the arrival of a giant asteroid in its data, and no one expects a black swan. Nevertheless, models can have the predictive power demanded of scientific hypotheses. We have a new form of knowing.
This new knowledge requires not just giant computers but a network to connect them, to feed them, and to make their work accessible. It exists at the network level, not in the heads of individual human beings.