We are constantly inundated by studies showing that some medical supplement is good for you, followed a couple of years later by another study showing that it is bad for you.
Many of these studies use poor methodology, not to mention the bias that can depend on who is funding the study.
One common problem is using correlation to imply cause and effect.
A well-known example of genuine cause and effect is that smoking increases the risk of developing lung cancer.
However, correlation by itself does not imply cause and effect.
The common example of the logical fallacy is the correlation between drowning deaths (or shark attacks) and ice cream consumption. Both go up in hot weather, so neither one causes the other.
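Here is a rough sketch of the ice cream and drowning example using made-up numbers, not real data. I assume a third variable, temperature, drives both series; the coefficients, noise levels, and the names `pearson_r` and `partial_r` are my own choices for illustration. The partial correlation formula is the standard first-order one for controlling a single variable.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data (an assumption): temperature drives both variables,
# and ice cream has no direct effect on drownings.
temperature = [rng.uniform(0, 35) for _ in range(1000)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 2) for t in temperature]

r_raw = pearson_r(ice_cream, drownings)
r_controlled = partial_r(
    r_raw,
    pearson_r(ice_cream, temperature),
    pearson_r(drownings, temperature),
)

print(f"raw correlation:            {r_raw:.2f}")
print(f"controlling for temperature: {r_controlled:.2f}")
```

In this simulated setup the raw correlation should come out clearly positive, while the correlation after controlling for temperature should sit close to zero.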
A related problem is multicollinearity (also called collinearity), where two or more predictor variables in a multiple regression model, e.g. obesity and lack of exercise, are highly correlated with each other.
You then can't tell whether the dependent variable, e.g. heart disease, is caused by one predictor, the other, or both.
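One common way to check for collinearity is the variance inflation factor (VIF), which is not mentioned above but is a standard diagnostic. The sketch below uses synthetic scores for obesity and lack of exercise (invented for illustration, not real health data). With only two predictors, the VIF reduces to 1 / (1 - r^2), where r is the correlation between them.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2)  # fixed seed so the run is repeatable

# Synthetic standardized scores (an assumption): lack of exercise is
# mostly the same signal as obesity plus a little independent noise.
obesity = [rng.gauss(0, 1) for _ in range(300)]
lack_of_exercise = [o + rng.gauss(0, 0.3) for o in obesity]

r = pearson_r(obesity, lack_of_exercise)

# Two-predictor case: regressing one predictor on the other gives
# R^2 = r^2, so VIF = 1 / (1 - r^2).
vif = 1.0 / (1.0 - r ** 2)

print(f"correlation between predictors: {r:.2f}")
print(f"variance inflation factor:      {vif:.1f}")
```

A frequently quoted rule of thumb treats a VIF above 5 or 10 as a warning sign, though the cutoff is a convention rather than a hard rule. A large VIF means the regression has little information for separating the two effects, so the individual coefficient estimates become unstable.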
I took an econometrics course from one of the leading econometrics experts. He had a mission to reduce these kinds of mistakes in research in a variety of fields: medicine, economics, business, ...
As a graduate student I was a research assistant for several professors who frequently misused statistics, but still got their papers published.
As data becomes more readily available via the Internet, it is tempting to run multiple regressions on many data sets and then concoct a theory to explain whichever variables turn out to be correlated.
Spurious correlation is especially likely to occur with time series data, where two variables trend upward over time because of increases in population, income, prices, or other factors.
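To see how trends alone can manufacture a correlation, here is a small simulation with two invented series that have nothing to do with each other except that both grow over time. The trend slopes and noise levels are arbitrary assumptions. Comparing period-to-period changes instead of levels (first differencing) is one common way to strip out the shared trend.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_differences(series):
    """Change from one period to the next."""
    return [series[i + 1] - series[i] for i in range(len(series) - 1)]

rng = random.Random(1)  # fixed seed so the run is repeatable
n_periods = 500

# Two unrelated synthetic series: each is an upward trend plus its own
# independent noise. Neither influences the other.
series_a = [2.0 * t + rng.gauss(0, 10) for t in range(n_periods)]
series_b = [5.0 * t + rng.gauss(0, 10) for t in range(n_periods)]

r_levels = pearson_r(series_a, series_b)
r_changes = pearson_r(first_differences(series_a), first_differences(series_b))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

The levels should look almost perfectly correlated, while the changes should show a correlation near zero, which is the honest answer for two series that are unrelated by construction.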
My personal gripe is people telling me, a widower for many years, that I would live longer if I got married again. Statistics show this.
Someday I'm going to get the data on health, a variable that is omitted in these studies, and see what it shows.
See Marriage and Longevity
Another one I'd like to research is the correlation between being an evangelical Christian and voting for Donald Trump. Trump got 81% of the white evangelical or born-again Christian vote.
Correlation computes the value of the Pearson correlation coefficient, r. Its value ranges from -1 to +1.
Linear regression finds the best line that predicts Y from X.
It quantifies goodness of fit with R², which ranges from 0 to 1.
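A minimal sketch of both calculations on five made-up (X, Y) points follows. The function names are my own, and the fit is ordinary least squares with a single predictor. In that one-predictor case R² equals r squared, which the last lines check.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient, always between -1 and +1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(x, y, slope, intercept):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

# Made-up illustrative data, not taken from any study.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
slope, intercept = fit_line(x, y)
r2 = r_squared(x, y, slope, intercept)

print(f"r = {r:.4f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"R^2 = {r2:.4f}, r squared = {r ** 2:.4f}")
```

Note that a high R² only says the line fits the points well. It says nothing about whether X causes Y.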
"How to Lie with Statistics" Darrell Huff, Irving Geis, 1954M
In the 1960s and 1970s, "How to Lie with Statistics" became a standard textbook introduction to the subject of statistics for many college students.