By Sierra Rayne, July 15, 2013
“Voluntary participation, however, conflicts with the methodological principle of representative sampling. Given the choice, certain types of people (e.g. those with lower levels of education, from non-English-speaking backgrounds) are more likely than others to decline to participate in surveys and can result in biased samples. However, compulsory participation is not the solution. Although compulsion might minimize bias it will undermine the quality of the responses.”

This is a very real problem in statistics. For verifiable variables, we can effectively sample the population using voluntary and mandatory surveys and then compare the results against fully audited investigations. That is feasible for details such as income, age, and sex. But what about very personal details of people’s lives? How will we ever truly know how much housework a certain segment of the population does in a particular region, or whether its members have difficulty dressing or bathing? Without an in-depth monitoring study, we may never know, and such an audit would be prohibitively time-consuming and expensive. Imagine what a government audit of your claim to have difficulty dressing and bathing might look like. Would the auditors demand a dressing and bathing demonstration to verify the census reply? And what penalties would follow for lying about having difficulty dressing and bathing on a mandatory long-form census?

Consequently, many (i.e., almost all) social scientists fail to address the elephant in the room: much of the data collected by census agencies and their analogs in other government departments is unreliable for the simple reason that a substantial portion of it is entirely unverifiable. Regardless of whether a survey is mandatory or voluntary, if you ask someone a question whose answer you cannot reasonably verify, then you have no idea as to the accuracy of the data.
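The nonresponse-bias problem the quoted passage describes can be made concrete with a minimal simulation. All the numbers below (group sizes, true rates, response rates) are invented purely for illustration; the point is only the mechanism: when the groups that decline to respond differ from those that do, a voluntary survey's estimate drifts away from the true population value no matter how many responses are collected.

```python
import random

random.seed(1)

# Hypothetical population: 40% in a "low" group, 60% in a "high" group,
# with different true rates of some attribute (e.g., difficulty bathing).
# All figures are assumptions for illustration only.
population = (
    [("low", random.random() < 0.30) for _ in range(40_000)] +
    [("high", random.random() < 0.10) for _ in range(60_000)]
)
true_rate = sum(flag for _, flag in population) / len(population)

# Voluntary survey: the "low" group responds only 30% of the time,
# the "high" group 70% of the time (again, assumed figures).
respond = {"low": 0.3, "high": 0.7}
sample = [flag for group, flag in population if random.random() < respond[group]]
survey_rate = sum(sample) / len(sample)

print(f"true rate:   {true_rate:.3f}")
print(f"survey rate: {survey_rate:.3f}")  # biased: overrepresents the "high" group
```

With these made-up response rates, the survey systematically understates the true rate because the higher-prevalence group is underrepresented among responders; a larger voluntary sample only makes the biased answer more precise, not more accurate.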
In some cases, it would take exceedingly large amounts of time and money to verify or audit the data acquired; in other cases (e.g., subjective questions), verification is impossible. Thus, we have absolutely no idea whether the resulting dataset is accurate. People lie (because they simply want to, because they see personal benefits from artificially tilting a survey’s results, and so on); people often make answers up because they are too lazy, or unwilling to think about the question long enough, to provide an accurate response; and, of course, people often do not know how to characterize the answer accurately. Nobody appears to be rigorously accounting for these issues when using much of the government’s data, or when the government decides to spend taxpayer money acquiring these types of flawed data. It is a fantasyland in which the underlying data is assumed valid when it may not be.

So when we ask individuals, even in a mandatory survey, what the status of their plumbing is, or how many hours per week they play with their children in a park, or whether they experience food insecurity (whatever that really means in the West), we have no idea whether the dataset we have acquired is reliable, or which parts of it are unreliable. It is essentially junk data, and census agencies and other government departments, along with almost all researchers in the social sciences and humanities, have been generating it for decades. So when researchers speak of "information-rich surveys" by census agencies, we must not be confused: there is a lot of data, but often little information, because the data is unreliable.

Now is also the right time to address this type of comment that proponents of the long-form census often make:
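The claim that misreporting cannot be cured by compulsion or sample size can also be sketched numerically. In the toy model below, a fixed fraction of respondents without some unverifiable condition report having it anyway; the prevalence figure and the over-reporting rate are both invented for illustration. The reported rate converges, as the sample grows, to a number that is simply wrong, and nothing in the data itself reveals the error.

```python
import random

random.seed(2)

TRUE_RATE = 0.20   # assumed true prevalence of some unverifiable condition
OVERREPORT = 0.15  # assumed chance a respondent without it claims it anyway

def survey(n):
    """Simulate n answers to an unverifiable yes/no question."""
    yes = 0
    for _ in range(n):
        truth = random.random() < TRUE_RATE
        # Misreporting: some respondents without the condition answer "yes".
        if truth or random.random() < OVERREPORT:
            yes += 1
    return yes / n

for n in (1_000, 100_000):
    print(f"n={n:>7}: reported rate {survey(n):.3f} (true rate {TRUE_RATE})")
```

Under these assumptions the reported rate settles near 0.32 rather than 0.20: increasing n, or making the survey mandatory, only pins down the distorted figure more tightly. Without an independent audit of the answers, the dataset gives no hint that it is biased.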
“Evidence-based policy-making requires just that -- evidence -- standard, reliable metrics whose quantification and legitimacy is widely agreed upon. In their absence, policy-making at all levels and in every sector will be as expensive as it is hopeful, while policy actors are forced to gingerly ‘guess and check’ over time. In the absence of good data, our ability to fully comprehend complex policy issues will grow anecdotal and inconsistent.”

Guess what? We have been making poke-and-pray policies for a long time using census datasets and other surveys. Why? Because this so-called evidence is unreliable; it is most often just hearsay, and as such, it should not be admissible in public policy formulation. An absence of data is better than a wealth of bad data. I am reminded of two quotes that appropriately describe how to approach a genuine lack of accurate information, and which are lessons to be heeded by long-form census proponents: “Knowledge of non-knowledge is power,” spoken by Deaner in Fubar 2, and Donald Rumsfeld’s “unknown unknowns” statement. When knowledge limitations are ignored, we end up with claims that probably also apply to studies founded upon much of the problematic long-form census data, such as Brian Fantana’s in Anchorman: The Legend of Ron Burgundy: “60% of the time it works all the time.” Thus, amid all these calls for evidence-based policy-making and claims that conservative administrations are anti-science, one actually finds the shoe is on the other foot: many of the so-called pro-science individuals are anti-science and/or pro-junk-science. It is, in fact, the Death of Evidence (or, more accurately, the Proliferation of Junk Evidence) that some nations are finally climbing out of.
Long-form census proponents are correct in stating that “[i]n the absence of good data, our ability to fully comprehend complex policy issues will grow anecdotal and inconsistent.” That has been the situation for some time (i.e., we have had bad data that many have claimed as good data), and it undercuts the proponents’ claims regarding the necessity of much of the long-form census data. Policy actors have always had to “gingerly ‘guess and check’ over time,” and unless we develop and deploy mass-scale mind-reading devices and/or a Big Brother-esque, all-knowing state whose knowledge is factually proven, this will likely always be the case. What is truly ridiculous is that we have been generating and using this junk data for so long. Coupled with its threats to privacy and liberty, the bad science behind much of the long-form census should help us put a stake through the heart of this government-mandated nonsense.
Sierra Rayne holds a Ph.D. in Chemistry and writes regularly on environment, energy, and national security topics. He can be found on Twitter at @srayne_ca