Let's suppose that this statement about urban-rural differences in educational levels summarizes census data by calculating the average number of years of formal schooling for people in cities and people in rural areas. Is this brute fact about two large populations a valid description of the two populations? Is it an important or illuminating sociological fact?
There are several types of questions we need to ask about this fact. First, there are questions to address about the statistical features of the data themselves. How are these two populations distributed around the mean? How wide is the variance around the mean? Do we get a different result if we measure the median number of years of schooling as opposed to the mean? Which of these measures is a more meaningful description of the population as a whole -- median or mean?
Consider a hypothetical set of results. Suppose the rural population is quite homogeneous around a mean of 10 years of schooling, while the urban population is widely distributed in a range from 5 years to 20 years. In that case the fact that the urban mean is 11 years is somewhat misleading: the majority of urban people in this hypothetical case have fewer than 9 years (so in one sense the urban population is less well educated), while at the same time 20 percent of urban people have at least 14 years (so in another sense the urban population is better educated). The point is this: the brute fact of a difference in the means tells us little about the educational resources of the two populations.
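To make the hypothetical concrete, here is a minimal sketch in Python. The counts in `RURAL` and `URBAN` are invented for illustration and are not census data; they are chosen only so that they reproduce the figures in the paragraph above (a rural mean of 10, an urban mean of 11, a majority of urban people below 9 years, and 20 percent at 14 years or more). The helper names `expand` and `summarize` are likewise assumptions of this sketch.

```python
from statistics import mean, median, pstdev

# Hypothetical counts of people by years of schooling (illustrative only).
# Each dictionary maps years of schooling -> number of people, 100 people per area.
RURAL = {9: 20, 10: 60, 11: 20}
URBAN = {5: 2, 7: 8, 8: 45, 12: 10, 13: 15, 16: 4, 18: 10, 19: 5, 20: 1}


def expand(counts):
    """Turn a {years: number_of_people} table into one value per person."""
    return [years for years, n in counts.items() for _ in range(n)]


def summarize(counts):
    """Compute the mean alongside measures that describe the distribution."""
    values = expand(counts)
    total = len(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "std_dev": round(pstdev(values), 2),  # population standard deviation
        "share_under_9": sum(v < 9 for v in values) / total,
        "share_14_plus": sum(v >= 14 for v in values) / total,
    }


rural_summary = summarize(RURAL)
urban_summary = summarize(URBAN)

for name, summary in (("rural", rural_summary), ("urban", urban_summary)):
    print(name, summary)
```

With these invented counts the urban mean (11) exceeds the rural mean (10), yet the urban median is 8 against a rural median of 10, and the urban standard deviation is several times larger. Which population looks "better educated" depends entirely on which summary we consult.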
Second, there are important sociological questions about the internal differentiation of the population into groups with very different educational patterns. Do women and men show different profiles in the two large populations? How about members of ethnic or racial groups? How about groups identified by income, wealth, or home ownership? What about groups defined by whether a parent had attended college? Arriving at this set of questions requires sociological imagination. The investigator needs to consider what internal differentiations within the large population might affect the sub-group's educational characteristics.
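The step of disaggregating by sub-group can be sketched in the same way. The records below are hypothetical, and the single grouping variable (whether a parent attended college) stands in for whichever differentiations the investigator's sociological imagination suggests; the function name `group_means` is an assumption of this sketch.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (area, parent_attended_college, years_of_schooling).
RECORDS = [
    ("urban", True, 16), ("urban", True, 17), ("urban", True, 18),
    ("urban", False, 7), ("urban", False, 8), ("urban", False, 9),
    ("rural", True, 11), ("rural", True, 12), ("rural", True, 13),
    ("rural", False, 9), ("rural", False, 10), ("rural", False, 11),
]


def group_means(records, key):
    """Average years of schooling within each group defined by `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record[2])
    return {group: mean(years) for group, years in groups.items()}


# The aggregate comparison: one mean per area.
by_area = group_means(RECORDS, key=lambda r: r[0])

# The disaggregated comparison: one mean per (area, parent attended college).
by_area_and_parent = group_means(RECORDS, key=lambda r: (r[0], r[1]))

print(by_area)
print(by_area_and_parent)
```

In this invented data the urban mean is higher overall (12.5 against 11), but urban people whose parents did not attend college average 8 years, below the 10 years of their rural counterparts. The aggregate difference conceals a reversal within one of the sub-groups.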
Finally, there is the question of finding possible causal explanations of the differences that are discovered across major populations (urban and rural) and within sub-populations (ethnic, gender, or class-defined groups), and tracing out some of the ways in which these patterns in turn cause other social outcomes (future inequalities of income, for example). Those causal mechanisms might be various: differences in the opportunities that are presented to members of different social groups (including the possibility of discrimination), differences in values and cultures within families, differences in gender treatment, differences in religious traditions and practices, and differences in access to resources, to name a few.
Now suppose we have done quite a bit of empirical, theoretical, and causal analysis along these lines. Suppose we have found that the internal structures of educational attainment statistics are quite different between urban and rural populations; that the taxonomy of sub-groups is different; and that the causes and social mechanisms of the differences in attainment across the rural-urban divide are substantially different as well. What salience does the original brute fact continue to have (that the means are different in the two populations)?
We might say that the brute fact is indeed a valid empirical observation, but that it needs substantial further analysis; and that the genuinely valuable and insightful sociological findings emerge only once we have disaggregated the statistics across salient groups and have provided some hypotheses about the mechanisms that influence the educational attainment profiles of the various sub-groups. At that point we have some idea of the underlying sociology that produces the brute fact. But the brute fact itself is largely unilluminating.