We are constantly inundated with studies showing that some medical supplement is good for you, followed a couple of years later by another study showing that it is bad for you.

Many of these studies use poor methodology, to say nothing of the bias that can depend on who is funding the study.

One common problem is using correlation to imply cause and effect.
Two or more variables are considered correlated, in a statistical context, if their values change together: as the value of one variable increases or decreases, so does the value of the other (although it may be in the opposite direction).
There are statistical measures of the amount of correlation. See Statistical Tests below.

A common example of cause and effect is that smoking increases the risk of developing lung cancer.

However, correlation does not imply cause and effect.

The common example of this logical fallacy is the correlation between drowning deaths (or shark attacks) and ice cream consumption.
A better example is a hypothetical study that shows a correlation between ice cream consumption and hay fever. Does ice cream cause hay fever? No. Ice cream consumption and pollen count are both correlated with warm weather. Warm weather causes a higher pollen count, which in turn causes hay fever.
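The hay fever example can be sketched as a small simulation. Everything here is invented for illustration: the coefficients, the noise levels, and the helper names pearson_r and residuals are assumptions, not data from any study. Warm weather drives both pollen and ice cream sales, and only pollen drives hay fever. The raw correlation between ice cream and hay fever should come out clearly positive, and it should largely vanish once pollen is controlled for.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def residuals(ys, xs):
    """What is left of ys after subtracting its least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(1)  # fixed seed so the run is repeatable
days = 2000

# Made-up causal chain (all numbers are assumptions):
# temperature -> pollen -> hay fever, and temperature -> ice cream.
temperature = [rng.uniform(0, 35) for _ in range(days)]
pollen = [2.0 * t + rng.gauss(0, 10) for t in temperature]
ice_cream = [3.0 * t + rng.gauss(0, 15) for t in temperature]
hay_fever = [0.5 * p + rng.gauss(0, 5) for p in pollen]

# Raw correlation: ice cream and hay fever appear to be related.
r_raw = pearson_r(ice_cream, hay_fever)

# Partial correlation: remove the part of each variable explained by
# pollen, then correlate what is left over.
r_given_pollen = pearson_r(residuals(ice_cream, pollen),
                           residuals(hay_fever, pollen))

print(f"ice cream vs hay fever:             r = {r_raw:.2f}")
print(f"same, after controlling for pollen: r = {r_given_pollen:.2f}")
```

Ice cream never appears in the formula for hay fever, yet the two are correlated because both follow the weather.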

A related problem is multicollinearity (also called collinearity), where two or more predictor variables in a multiple regression model, e.g. obesity and lack of exercise, are highly correlated. You can't tell whether the dependent variable, e.g. heart disease, is caused by one, the other, or both.
In the hay fever example the problem was different: the real causal variable, pollen count, was not even included in the model.
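One common way to put a number on the obesity and lack-of-exercise situation is the variance inflation factor (VIF). The sketch below uses invented data and a hypothetical helper, pearson_r. With exactly two predictors the VIF reduces to 1 / (1 - r^2), where r is the correlation between them. A VIF above roughly 5 to 10 is often treated as a warning sign, though that cutoff is only a rule of thumb.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(2)  # fixed seed so the run is repeatable
n = 500

# Invented predictor scores: obesity is lack of exercise plus a little noise,
# so the two predictors carry almost the same information.
lack_of_exercise = [rng.gauss(0, 1) for _ in range(n)]
obesity = [x + rng.gauss(0, 0.3) for x in lack_of_exercise]

r = pearson_r(lack_of_exercise, obesity)

# Variance inflation factor for the two-predictor case.
vif = 1.0 / (1.0 - r ** 2)

print(f"correlation between predictors: r = {r:.2f}")
print(f"variance inflation factor:      VIF = {vif:.1f}")
```

A large VIF means the regression coefficients for the two predictors are estimated with much more uncertainty than they would be if the predictors were unrelated.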

I took an econometrics course from one of the leading econometrics experts. He had a mission to reduce these kinds of mistakes in research in a variety of fields: medicine, economics, business, ...

As a graduate student I was a research assistant for several professors who frequently misused statistics, but still got their papers published.

As data becomes more readily available via the Internet, it is tempting to run multiple regressions on many data sets and then concoct a theory to explain whichever variables turn out to be correlated.
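A rough simulation shows why this is dangerous. The sketch below generates 40 variables of pure random noise with 20 observations each, then checks every pair. The numbers and the helper pearson_r are assumptions chosen for illustration, and the threshold of about 0.44 is approximately what a two-sided 5% significance test flags at that sample size.

```python
import random
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(3)  # fixed seed so the run is repeatable
n_obs = 20    # observations per variable
n_vars = 40   # number of unrelated variables

# Every variable is independent noise; none is related to any other.
data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

# Approximate cutoff for "significant at the 5% level" with 20 observations.
threshold = 0.44

# Test every possible pair of variables, as a data dredger would.
pairs = list(combinations(range(n_vars), 2))
hits = [(i, j) for i, j in pairs
        if abs(pearson_r(data[i], data[j])) > threshold]

print(f"{len(hits)} of {len(pairs)} pairs look correlated by chance alone")
```

With hundreds of pairs tested, some will pass the cutoff even though no real relationship exists, and each one is a candidate for a concocted theory.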

Spurious correlation is especially likely to occur with time series data, where two variables trend upward over time because of increases in population, income, prices, or other factors.
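The trending problem can be sketched with two made-up series that share nothing except an upward drift. The helper names (pearson_r, trending_series, differences) and all parameters are assumptions for illustration. Correlating the levels should give a value near 1, while correlating the period-to-period changes, a common remedy, should give a value near 0.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def trending_series(rng, steps, drift):
    """Random walk with drift: previous value plus drift plus random noise."""
    values = [0.0]
    for _ in range(steps):
        values.append(values[-1] + drift + rng.gauss(0, 1))
    return values


def differences(series):
    """Change from each period to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(4)  # fixed seed so the run is repeatable

# Two series generated independently; only the upward drift is shared.
series_a = trending_series(rng, 500, drift=1.0)
series_b = trending_series(rng, 500, drift=0.5)

r_levels = pearson_r(series_a, series_b)
r_changes = pearson_r(differences(series_a), differences(series_b))

print(f"correlation of levels:  r = {r_levels:.2f}")
print(f"correlation of changes: r = {r_changes:.2f}")
```

Differencing removes the shared trend, so what remains is the relationship (here, none) between the movements themselves.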


My personal gripe is people telling me, a widower for many years, that I would live longer if I got married again. Statistics show this.
Someday I'm going to get the data on health, which is omitted in these studies, and see what it shows.
See Marriage and Longevity

Another one I'd like to research is the correlation between being an evangelical Christian and voting for Donald Trump. Trump got 81% of the white evangelical or born-again Christian vote.
See:
2016 election demographics
Christian Right and Republicans


Statistical Tests:
Correlation computes the value of the Pearson correlation coefficient, r. Its value ranges from -1 to +1.
Linear regression finds the best line that predicts Y from X.
It quantifies goodness of fit with R², which ranges from 0 to 1.
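Both quantities can be computed by hand in a few lines. This is a minimal sketch on a small invented data set, not output from any real study. It computes the Pearson r, fits the least-squares line, and computes the goodness of fit from the residuals. For a regression with a single predictor, the goodness of fit equals the square of r.

```python
from math import sqrt

# Small made-up data set, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Sums of squares and cross-products around the means.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

# Correlation: Pearson coefficient, always between -1 and +1.
r = sxy / sqrt(sxx * syy)

# Linear regression: the least-squares line that predicts y from x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Goodness of fit: share of the variation in y explained by the line.
predicted = [intercept + slope * xi for xi in x]
ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
r_squared = 1.0 - ss_residual / syy

print(f"r = {r:.4f}")
print(f"line: y = {intercept:.3f} + {slope:.3f} * x")
print(f"R squared = {r_squared:.4f}")
```

Note that r is symmetric in x and y, while the regression line is not: predicting x from y gives a different line.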


Books:
"How to Lie with Statistics", Darrell Huff, Irving Geis, 1954
In the 1960s and 1970s, "How to Lie with Statistics" became a standard textbook introduction to the subject of statistics for many college students.

Links:
Examples for teaching: Correlation does not mean causation - Cross Validated
Marriage and Longevity
Introduction to Correlation and Regression Analysis | bu.edu
Granger causality - Wikipedia
Correlation, causation and forecasting | OTexts
Statistical Language - Correlation and Causation
GraphPad - FAQ 1141 - What is the difference between correlation and linear regression?
Experimental Design

last updated 23 Mar 2017