Monday, December 20, 2010

A new tool for intellectual history

Google's NGram Viewer is a really amazing new tool for researchers in literature and the humanities.  What is perhaps not quite so evident is the power it may have for people interested in the evolution of the social science disciplines.

Basically the concept is a simple one.  The Google Books project has now scanned and OCR'd millions of books published over the past several centuries.  With the announcement of NGram Viewer, Google has shared a vast indexed database of the words and phrases in this corpus, and the Viewer lets users enter specific phrases and see a graph of each phrase's frequency over time (1800-2000 by default).  It is important to bear in mind that the tool measures the relative frequency of a phrase (its share of all the phrases of the same length in that year's books), not the absolute number of occurrences.  Since the volume of published words increases decade by decade, a phrase holding a steady 0.005% share corresponds to a growing number of actual occurrences.
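
To make the distinction between share and count concrete, here is a minimal sketch. The corpus sizes below are invented for illustration and are not Google Books figures; the point is only that a fixed share translates into more occurrences as the corpus grows.

```python
# Hypothetical yearly corpus sizes (total words), invented for illustration only;
# these are NOT counts from the Google Books data.
corpus_size = {1900: 1_000_000_000, 1950: 2_000_000_000, 2000: 4_000_000_000}

SHARE = 0.005 / 100  # a steady 0.005% share, written as a fraction


def occurrences(share: float, total_words: int) -> int:
    """Absolute number of occurrences implied by a relative frequency."""
    return round(share * total_words)


for year, total in corpus_size.items():
    # Same share every year, but the implied count grows with the corpus.
    print(year, occurrences(SHARE, total))
```

With these made-up sizes the loop prints 50000, 100000 and 200000 occurrences for 1900, 1950 and 2000: a flat line on the graph, but a fourfold increase in actual uses.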

Now consider this: to what extent is it possible to track themes, concepts, and theories in the social sciences by making use of the tool and a well-chosen list of terms or phrases?  For example, the classical economists (Smith, Ricardo, Marx) referred to their science as "political economy," whereas the economists defining the marginalist revolution (Jevons, Walras, Menger) tended to refer to their work as "economics." Classical economists operated under a fairly distinctive set of assumptions, including the labor theory of value, that differed sharply from the assumptions associated with the marginalist revolution and the creation of neoclassical economics.  So can we track the decline of classical economics and the rise of neoclassical economics by tracking "political economy" and "economics"?  Here is what that graph looks like:

The "political economy" line rises gradually from 1800 to 1890 and then declines through about 1950.  Only in the 1980s does the term become more popular again, perhaps reflecting a resurgence of interest in the classical foundations of economic thought (Smith, Ricardo, Marx, Sraffa).  The term "economics" has virtually no use prior to 1880; it then begins to increase in frequency, passes "political economy" in 1910, and climbs rapidly into the 1980s.  By 1980 "economics" appears to be roughly eight times as frequent as "political economy."
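
Reading a crossover year and a ratio off a graph by eye can also be done numerically, assuming the yearly frequencies are available as plain lists of numbers. The sketch below uses invented decade values (not NGram data) and two hypothetical helpers, `first_crossover` and `ratio_at`.

```python
# Hypothetical relative frequencies, sampled every decade. These numbers are
# invented placeholders for illustration, NOT values from the NGram Viewer.
years = [1880, 1890, 1900, 1910, 1920]
political_economy = [4.0e-6, 4.2e-6, 3.8e-6, 3.0e-6, 2.6e-6]
economics = [0.5e-6, 1.5e-6, 2.9e-6, 3.5e-6, 5.0e-6]


def first_crossover(years, a, b):
    """Return the first year in which series b is at least as frequent as series a, or None."""
    for year, x, y in zip(years, a, b):
        if y >= x:
            return year
    return None


def ratio_at(year, years, a, b):
    """Ratio of series b to series a in the given year."""
    i = years.index(year)
    return b[i] / a[i]


print(first_crossover(years, political_economy, economics))
print(round(ratio_at(1920, years, political_economy, economics), 2))
```

Because the measure is a share, a ratio such as "eight times as frequent" compares the two terms within the same year's books, so it is not distorted by growth in the total number of books.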

So how about other areas of the social sciences?  We might take a look at the development of anthropology using this approach; for example, how have the terms "anthropology" and "ethnography" changed in frequency in books during the past two centuries?  Here's the graph:

Through 1860 both terms are very infrequent.  After 1860, however, "anthropology" takes off, while "ethnography" remains at a low level through 1960, roughly the time when Geertz and other path-breaking anthropologists began redefining the field in the direction of "local knowledge," or ethnography. From that point on both terms increase in frequency (with a large dip for "anthropology" in the 1970s), with "anthropology" remaining about three times as frequent.

So now let's try out a somewhat more general question: how did political theory develop during the twentieth century?  We might probe this question by picking out a few important political theorists and examining the pattern of frequency their names show throughout the century; perhaps this will tell us something about them individually, as well as something about the waxing and waning of interest in political theory more generally during the period.  Suppose we take Hobbes, Rousseau, and Locke as the central founders of modern political theory.  Here's what that graph looks like:

This is interesting in many ways.  First, individually: Rousseau comes on very strongly in the 1930s, whereas Locke is fairly flat during the period.  Second, comparatively: Locke and Rousseau are consistently more frequent in the book literature than Hobbes, in spite of our tendency to think of Hobbes as the ultimate founder of modern political theory.  ("...and the life of man, solitary, poor, nasty, brutish, and short.")  And finally, there is a striking periodicity in the three graphs.  There are long waves of rise and fall in frequency over the century, and, even more surprisingly, these waves are fairly closely correlated.  This suggests one of two explanations: either political theory as a field waxed and waned through the century, with references to its founders following along, or the correlation is an artifact of the method itself.  We could test the latter idea by examining a larger set of authorities drawn from other areas of intellectual work.  So let's look at a final graph, including the three political theorists as well as Darwin, Newton, and Einstein:

Naturally enough, Einstein starts slowly in the early twentieth century but begins to rise in frequency after about 1918, passing Hobbes for good around 1938.  Darwin, surprisingly enough, shows a falling trend through the century, recovering a bit after 1975.  But we were looking for evidence of an artifact, and this graph seems to provide it: to the eye, there is a similar periodicity in the frequency of all these authors throughout the century, with a low point in 1945 (the end of World War II) and, for all but Newton, another peak around 1965 followed by a trough around 1980.  So it isn't "political theory" that is oscillating in the ocean of words contained in the universe of books, but references to intellectuals, philosophers, and scientists generally.  This suggests that something is going on that may have more to do with the measurement than with the reality; more investigation is needed.
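
Both the "fairly closely correlated" observation and the artifact hypothesis can be checked numerically, assuming the yearly frequencies for each name are available as lists of numbers. The sketch below is one possible approach, not the method used for the graphs above: `pearson` computes the standard Pearson correlation of two series, and the hypothetical helper `share_of_group` divides each name's frequency by the combined frequency of all the tracked names in the same year, so a wave that every name shares cancels out. The input values are invented for illustration and are not NGram data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def share_of_group(series_by_name):
    """Divide each name's yearly frequency by the summed frequency of all names that year."""
    length = len(next(iter(series_by_name.values())))
    totals = [sum(s[i] for s in series_by_name.values()) for i in range(length)]
    return {name: [s[i] / totals[i] for i in range(length)]
            for name, s in series_by_name.items()}


# Hypothetical input: three names whose frequencies are all driven by the same
# invented "wave", each scaled by a different constant. Not real Ngram values.
wave = [1.0, 0.6, 1.2, 0.8]
scale = {"Hobbes": 1.0, "Locke": 2.0, "Darwin": 3.0}
raw = {name: [k * w for w in wave] for name, k in scale.items()}

print(round(pearson(raw["Hobbes"], raw["Darwin"]), 6))  # 1.0: the raw series move in lockstep
shares = share_of_group(raw)
print(shares["Hobbes"])  # every value is about 1/6: the shared wave has cancelled out
```

If real series correlated strongly in raw form but their group shares stayed roughly flat, that would point to a component common to all the names (in the corpus or in the measurement) rather than anything specific to political theory. Dividing by the group total only removes a shared multiplicative wave, so this is a rough first check, not a full test.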

In short, it seems likely that there are some very interesting patterns to be teased out using this tool, and they are patterns that could never have been identified without the databases that Google has created.  The tool makes possible a new way of probing and representing the growth of knowledge.


scritic said...

The anthropology-ethnography plot is excellent; it seems like a good picture of the discipline.

I tried one of my own: plotting "efficiency" vs. "productivity." I wanted to see whether I could find some reflection of what David Harvey calls a change from Keynesian Fordism to a regime of flexible accumulation. And the relative frequency of "productivity" did seem to rise in the 1970s, while that of "efficiency" seemed to drop.

Siddharth said...

The tool is amazing! I did read about it in the newspapers but just tried it now after reading your post. I think it can be very insightful in many cases.

There is just one small thing, though: it doesn't take context into account, right? When you plotted Newton vs. Einstein vs. Darwin, my first guess was that Newton would be totally overshadowed. It turns out the other way around: Newton trumps both of them. I assume there could be other reasons for this; the first that comes to mind is that the newton is also a unit of force, which is presumably used commonly. Do you think Newton, in the sense of the scientist, was more popular than Einstein even during the '50s and '60s?