
Simpson's Paradox, Statistics can be Creative


By Dr. Alexander Nussbaum, July 7, 2019



My intent here is not to teach statistics, but rather to alert readers that statistics are being misused by the liberal-marxist establishment, which does not recognize the existence of objective facts, to offer evidence for desired results. In both 1995 and 1996, David Justice, then of the Atlanta Braves, had a higher batting average than Yankee Derek Jeter. So it stands to reason that if the at-bats and hits for the two seasons are combined, Justice would have a higher overall batting average than Jeter. Would you not agree? But if you did, you would be wrong!

Jeter vs. Justice
This example, discovered by Ken Ross, a mathematics professor who has written on sabermetrics, has become a common illustration of Simpson's paradox. In this case the seeming paradox is a fluke resulting from the nature of percentages. Note that most of Jeter's at-bats came from the year he hit .314, and most of Justice's at-bats came from the year he hit .253. Here there is no confounding third variable in the hit/at-bat relationship. It makes perfect sense to combine the results to give a more accurate picture of these ballplayers' batting averages. Jeter batted .310 lifetime, and Justice batted .279 lifetime.

Non-statisticians should find the above result puzzling. Statisticians know that this is simply Simpson's paradox, also, and perhaps more properly, called Simpson's reversal. Far from being a paradox, Simpson's reversal is a well-understood aspect of correlation not proving causation, and it is familiar to every statistician. As will be seen below, when a Simpson's reversal occurs, whether we should base our conclusion on the combined table or on the disaggregated tables (i.e. the marginals or the conditionals) depends on our knowledge of additional relevant variables. Simpson's paradox is caused by a confounding third variable and/or by data from unequal-sized groups being combined into a single data set. Unlike in the baseball example above, the way to avoid it is often not to combine data sets from different sources.
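A short Python sketch makes the arithmetic concrete. The hit and at-bat counts below are the figures usually quoted for Ross's example (Jeter: 12 for 48 in 1995 and 183 for 582 in 1996; Justice: 104 for 411 and 45 for 140), and the function names are illustrative only. The point it shows is that a combined average pools hits and at-bats, so each season counts in proportion to its at-bats rather than equally.

```python
# Hits and at-bats per season, as usually quoted for Ken Ross's example.
SEASONS: dict[str, dict[int, tuple[int, int]]] = {
    "Jeter": {1995: (12, 48), 1996: (183, 582)},
    "Justice": {1995: (104, 411), 1996: (45, 140)},
}


def batting_average(hits: int, at_bats: int) -> float:
    """Batting average is simply hits divided by at-bats."""
    return hits / at_bats


def combined_average(seasons: dict[int, tuple[int, int]]) -> float:
    """Pool hits and at-bats over all seasons, then divide once.

    This is a weighted mean of the seasonal averages (weights = at-bats),
    not the plain mean of the seasonal averages.
    """
    total_hits = sum(hits for hits, _ in seasons.values())
    total_at_bats = sum(at_bats for _, at_bats in seasons.values())
    return total_hits / total_at_bats


def fmt(avg: float) -> str:
    """Format like a box score: .314 rather than 0.314."""
    return f"{avg:.3f}".lstrip("0")


for player, seasons in SEASONS.items():
    yearly = ", ".join(
        f"{year}: {fmt(batting_average(*counts))}" for year, counts in seasons.items()
    )
    print(f"{player:8} {yearly}, combined: {fmt(combined_average(seasons))}")
```

Justice leads in each season (.253 vs .250, then .321 vs .314), yet trails once the seasons are pooled (.270 vs .310), because his big season came in only 140 at-bats while Jeter's came in 582.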

History of Simpson's Paradox

Like many concepts in statistics, "Simpson's paradox" was discovered repeatedly, by different statisticians at different times. Edward Simpson (1922-2019) was a British statistician who wrote about the phenomenon in 1951. Before he had even turned 20, he was recruited into the famous Bletchley Park codebreaking team, which was instrumental in winning World War II. The name honoring him was coined in 1972. But the phenomenon had already been described in 1903 by the British statistician Udny Yule (1871-1951), using an imaginary example in which a worthless "cure" could appear effective because of a sex-related difference in mortality rates. The concept of correlation itself dates only from 1888, so it didn't take long for the effect that would become known as Simpson's paradox to be noticed. Note how recent all these dates are. Statistics, a requirement for science, was invented just yesterday; it had to wait for the acceptance of randomness as a real phenomenon. In contrast, sophisticated mathematical notions are four millennia old. In about 1800 B.C. the ancient Babylonians knew the concept of Pi and had a very good approximation for it.


Bias at Berkeley?

UC Berkeley is the very symbol of Anti-American, Anti-Semitic, leftist extremism in academia. But in 1973, UC Berkeley feared a suit for gender bias (that it was actually sued is a much-propagated urban myth). Its graduate school admission figures seemed to clearly show bias against women. Just read these damning figures: in fall 1973, 8,442 males applied, of whom 44 percent were admitted, but of 4,321 female applicants, a mere 35 percent were accepted. An open-and-shut case of sexism! However, when the figures were examined department by department, more departments were biased in favor of women than in favor of men. And when the totals were corrected for confounding, there was a small but statistically significant overall bias in favor of women. The misleading aggregate results were due to Simpson's paradox: women disproportionately applied to departments with low acceptance rates, and men disproportionately applied to departments with high acceptance rates.
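The same reversal can be checked in a few lines of Python. The counts below are the figures for the six largest departments as commonly reproduced from the Berkeley data (for example in R's built-in UCBAdmissions dataset), with the departments anonymized as A to F. They cover only part of the total applicant pool, so the pooled rates differ somewhat from the campus-wide figures quoted above. The helper names are illustrative.

```python
# (admitted, applied) for each of the six largest departments, by sex,
# as commonly reproduced from the 1973 Berkeley data (R's UCBAdmissions).
DEPARTMENTS: dict[str, dict[str, tuple[int, int]]] = {
    "A": {"men": (512, 825), "women": (89, 108)},
    "B": {"men": (353, 560), "women": (17, 25)},
    "C": {"men": (120, 325), "women": (202, 593)},
    "D": {"men": (138, 417), "women": (131, 375)},
    "E": {"men": (53, 191), "women": (94, 393)},
    "F": {"men": (22, 373), "women": (24, 341)},
}


def rate(admitted: int, applied: int) -> float:
    """Share of applicants admitted."""
    return admitted / applied


def pooled_rate(sex: str) -> float:
    """Admission rate after lumping all six departments together."""
    admitted = sum(dept[sex][0] for dept in DEPARTMENTS.values())
    applied = sum(dept[sex][1] for dept in DEPARTMENTS.values())
    return admitted / applied


def departments_favoring(sex: str, other: str) -> list[str]:
    """Departments where `sex` had the higher admission rate."""
    return [
        name
        for name, dept in DEPARTMENTS.items()
        if rate(*dept[sex]) > rate(*dept[other])
    ]


print(f"pooled: men {pooled_rate('men'):.1%}, women {pooled_rate('women'):.1%}")
print("departments favoring women:", departments_favoring("women", "men"))
print("departments favoring men:", departments_favoring("men", "women"))
```

Women have the higher admission rate in four of the six departments (A, B, D and F), yet the pooled rate favors men (about 44.5 vs 30.4 percent), because most women applied to departments C through F, which admitted far smaller shares of their applicants.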

Everything's Bigger in Texas, even Salaries

Leftist economist and former Secretary of Labor Robert Reich has often knocked Texas, saying it has "among the nation's lowest taxes, least regulations and lowest wages...the median hourly wage there was $11.20, compared to the national median of $12.50 an hour". He used this to argue that taxes and regulations ought to be increased across the country. Texas may have some of the nation's lowest taxes and least regulations, but once we understand Simpson's paradox, the wage data he used actually show quite the opposite. The fact was that in Texas, Whites, Hispanics and African Americans, in other words every racial group, each earned higher wages than their counterparts in the rest of the country. The lower overall median wage was entirely due to Texas having a much larger percentage of recently arrived Hispanic workers than the nation as a whole. Reich's claim was a classic example of drawing the wrong conclusion because of confounding, and a textbook case of Simpson's paradox.
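Medians can reverse in the same way as averages. The sketch below uses made-up hourly wages, not Reich's or any official figures, for two hypothetical worker groups in two hypothetical regions; "region_1" merely plays the role Texas plays in the argument. Each group earns more in region_1, but because region_1 employs proportionally more of the lower-paid group, its overall median comes out lower.

```python
from statistics import median

# Hypothetical hourly wages in dollars, invented purely for illustration.
# region_1 has a larger share of the lower-paid group_b than region_2.
WAGES: dict[str, dict[str, list[float]]] = {
    "region_1": {"group_a": [16, 17, 18], "group_b": [10, 11, 12, 13, 14]},
    "region_2": {"group_a": [14, 15, 16, 17, 18], "group_b": [9, 10, 11]},
}


def group_medians(region: str) -> dict[str, float]:
    """Median wage of each group within one region."""
    return {group: median(pay) for group, pay in WAGES[region].items()}


def overall_median(region: str) -> float:
    """Median wage after pooling every worker in the region."""
    pooled = [pay for wages in WAGES[region].values() for pay in wages]
    return median(pooled)


for region in WAGES:
    print(region, group_medians(region), "overall:", overall_median(region))
```

Here group_a's median is 17 vs 16 and group_b's is 12 vs 10, both favoring region_1, yet the pooled median is 13.5 in region_1 against 14.5 in region_2.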

Are Salaries Falling?

Between 2000 and 2013, even adjusted for inflation, the median US salary rose. But within every educational-level subgroup, the median wage actually fell. This was true for high-school dropouts, high-school graduates, those with some college education and those with a college degree. And many outlets reported the story that way, as evidence that wages were falling. But the key fact was that the percentage of the workforce with a college degree increased over those 13 years. As having a college degree became less "special," the median income of those with a degree did fall, but because they still earned more than those without a degree, their rising share of the workforce pushed the overall US median salary up. Let us use simple hypothetical data to illustrate what happened. A package delivery company has 100 employees. One year, wages totaled 3.5 million dollars, for an average salary of 35 thousand. The following year, wages totaled 5 million dollars, for an average salary of 50 thousand. (While the median is generally used with salary data, for simplicity we are using the average.) It appears, and appears correctly, that the average salary greatly increased. But say this company has two types of employees, senior staff and trainees.


Almost all the trainees were promoted to senior status, which in fact accounted for the big pay raises. Note that the average salary of both senior employees and trainees declined from the first year to the second. But that is just a fluke due to confounding by employee status. Anybody using this data to prove the company is lowering salaries is propagating a falsehood. And if they have any statistical knowledge, they are propagating it deliberately.
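The article gives only the company totals, so the per-group figures in this Python sketch are invented to match them: 100 employees, 3.5 million dollars in pay in the first year and 5 million in the second, assuming no hires or departures and 80 of the 90 trainees promoted. The names and numbers are hypothetical; the sketch only shows how both group averages can fall while the company average rises.

```python
# Hypothetical payroll: (headcount, total pay in dollars) for each employee type.
# Assumes no hires or departures; 80 of the 90 trainees are promoted in year 2.
PAYROLL: dict[str, dict[str, tuple[int, int]]] = {
    "year 1": {"senior": (10, 800_000), "trainee": (90, 2_700_000)},
    "year 2": {"senior": (90, 4_770_000), "trainee": (10, 230_000)},
}


def group_averages(year: str) -> dict[str, float]:
    """Average salary within each employee type."""
    return {kind: total / heads for kind, (heads, total) in PAYROLL[year].items()}


def company_average(year: str) -> float:
    """Average salary over the whole company: total pay / total headcount."""
    heads = sum(h for h, _ in PAYROLL[year].values())
    total = sum(t for _, t in PAYROLL[year].values())
    return total / heads


for year in PAYROLL:
    print(year, group_averages(year), "company:", company_average(year))
```

The senior average falls from 80,000 to 53,000 and the trainee average from 30,000 to 23,000, while the company-wide average rises from 35,000 to 50,000, because promotions moved 80 people from the lower-paid group into the higher-paid one.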

Only One View is Tolerated in Academia

The following warning is on the Public Health Institute website:
"...ominously, knowledge of Simpson's paradox can be intentionally used to present or emphasize results that support a desired conclusion, when that conclusion is not valid."
Psychology and the social sciences have been hard hit by the "replication crisis." The most interesting and seemingly important findings fail to replicate upon retesting. Voter registration data show that among psychologists, Democrats outnumber Republicans 17.4 to 1, and research has found that in college psychology departments, liberal academics outnumber conservatives by 14 to 1. Given the liberal creed of positing the correct-think conclusion first and digging up evidence afterwards, could this be one of the reasons for the replication crisis? A study in the British Psychological Society's Research Digest found evidence for an "ideological extremity effect": "Ideologically slanted research, with conclusions that fit neatly into a political ideology, whether liberal or conservative, is less likely to replicate." It would be naive to believe that conservatives are immune to letting ideology contaminate data, but, especially in psychology and the social sciences, they are simply not in the halls of academia to begin with.

Epilogue

The left now dominates politics, academia and the media. Whether the issue is global warming, racism, political polls or whatever, whenever the left presents data, you can be sure it's been manipulated to promote their agenda.

Dr. Alexander Nussbaum

Dr. Alexander Nussbaum has had articles in a number of magazines, including articles on intelligent design and on the history of statistics, and is a contributor to a personality textbook.

