Quantitative sociology attempts, among other things, to establish causal connections between large social factors (race, socio-economic status, residential status) and social outcomes of interest (such as rates of delinquency). Is this type of inquiry analogous in any way to the use of large disease databases to identify risk factors? In other words, is there a useful analogy between sociology and epidemiology?
Suppose that the divorce rate for all American men is 30%, and that the rate for New York City males with income greater than $200K is 60%. We might want to infer that something about being a high-income male resident of New York causes a higher risk of divorce for these persons. And we might want to justify this inference by noting that it parallels the statistical finding relating smoking to lung cancer. On this view, sociology is similar to epidemiology: certain factors can be shown to cause an elevated risk of a certain kind of outcome, and there are "risk factors" for social outcomes such as divorce, delinquency, or drug use.
Is this a valid analogy? I think it is not. Epidemiological reasoning depends upon one additional step: a background set of assumptions about the ontology and etiology of disease. A given disease is a specific physiological condition within a complex system of cells and biochemical processes. We may assume that each such physiological abnormality is caused by some specific combination of external and internal factors through specific causal mechanisms. On this picture, the causal pathways of normal functioning are discrete and well-defined, and so are the mechanisms that disrupt those normal pathways. Within the framework of these guiding assumptions, the task of epidemiological statistics is to help sort out which factors are causally associated with the disease. The key point, though, is that we can be confident that a small number of discrete causal mechanisms link the factor to the disease.
The case is quite different in the social world. Social processes are not similar to physiological processes, and social outcomes are not similar to diseases. In each case the parallel fails because there are no unique, specific causal systems at work comparable to physiological ones. Compare the biological case: cellular reproduction has a specific biochemistry, cancerous reproduction is a specific deviation from these cellular processes, and specific physical circumstances cause these deviations.
Now think about the social world. A process like "urbanization" is not a homogeneous social process. Rather, it is a heterogeneous mix of social developments and events, and these components differ across times and places. Likewise, outcomes that might be considered the social equivalent of disease, such as a rising murder rate, are themselves composites of many distinct social happenings and processes. So social systems and outcomes lack the simple, discrete causal uniformity that is a crucial part of epidemiological reasoning.
This is not to say that there are no underlying causal mechanisms whose workings bring about a sharp increase in, say, the population's murder rate. Rather, it is to say that there are many such mechanisms, heterogeneous and cross-cutting. The resulting social outcome is simply the contingent residue of the multiple middle-level processes that were in play in the relevant time period. The discovery that "factors X, Y, and Z are correlated with a rise in the incidence of O" is not causally irrelevant. But the effects of these factors must be understood as working through their influence on the many mid-level causal mechanisms.