Mathematical Malpractice Watch: Torturing the Data

There’s been a kerfuffle recently about a supposed CDC whistleblower who has revealed malfeasance in the primary CDC study that refuted the connection between vaccines and autism. Let’s put aside that the now-retracted Lancet study the anti-vaxxers tout as the smoking gun was a complete fraud. Let’s put aside that other studies have reached the same conclusion. Let’s just address the allegations at hand, which include a supposed cover-up. These allegations are in a published paper (now under further review) and a truly revolting video from Andrew Wakefield (the disgraced author of the fraudulent Lancet study that set off this mess) that compares this “cover-up” to the Tuskegee experiments.

According to the whistleblower, his analysis shows that while most children do not have an increased risk of autism (which, incidentally, discredits Wakefield’s study), black males vaccinated before 36 months show a roughly 240% increased risk (a 3.36-fold risk is a 236% increase, not 340%, as has been claimed). You can catch the latest from Orac. Here’s the most important part:

So is Hooker’s result valid? Was there really a 3.36-fold increased risk for autism in African-American males who received MMR vaccination before the age of 36 months in this dataset? Who knows? Hooker analyzed a dataset collected to be analyzed by a case-control method using a cohort design. Then he did multiple subset analyses, which, of course, are prone to false positives. As we also say, if you slice and dice the evidence more and more finely, eventually you will find apparent correlations that might or might not be real.

In other words, what he did was slice and dice the sample to see if one of those slices would show a correlation. But by pure chance, some slice is likely to show a correlation even if there isn’t one. As best illustrated in this cartoon, if you run twenty tests for something that has no correlation, you should expect about one of them to show a spurious correlation at the 95% confidence level, and if the tests are independent, the odds that at least one does are roughly 64%. This is one of the reasons many scientists, especially geneticists, are turning to Bayesian analysis, which can account for this.
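
The arithmetic behind the twenty-tests point is easy to check. Here is a minimal sketch, assuming the tests are independent and each has a 5% false-positive rate (the 95% confidence level); the function names are my own, made up for illustration, and real subgroup analyses of a single dataset are not perfectly independent, so treat the numbers as a rough guide.

```python
def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent tests of a
    true null hypothesis comes up 'significant' at level alpha."""
    # Each test stays clean with probability (1 - alpha); all of them
    # stay clean with probability (1 - alpha) ** num_tests.
    return 1.0 - (1.0 - alpha) ** num_tests


def expected_false_positives(num_tests, alpha=0.05):
    """Average number of spurious 'significant' results."""
    return num_tests * alpha


if __name__ == "__main__":
    for n in (1, 5, 20):
        p = prob_at_least_one_false_positive(n)
        e = expected_false_positives(n)
        print(f"{n:2d} tests: P(at least one false positive) = {p:.2f}, "
              f"expected false positives = {e:.2f}")
```

With twenty slices, the chance of at least one spurious hit comes out to about 64%, and it only climbs as you slice finer.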

If you did a study of just a few African-American boys and found a connection between vaccination and autism, it would be the sort of preliminary shaky result you would use to justify looking at a larger sample … such as the full CDC study that the crackpot’s own analysis shows refutes such a connection. To take a large comprehensive study, narrow it down to a small sample and then claim the results of this small sample override those of the large one is ridiculous. It’s the opposite of how epidemiology works (and there is no suggestion that there is something about African-American males that makes them more susceptible to vaccine-induced autism).

This sort of ridiculous cherry-picking happens a lot, mostly in political contexts. Education reformers will pore over test results until they find that fifth graders slightly improved their reading scores and claim their reform is working. When the scores revert the next year, they ignore it. Drug warriors will pore over drug stats and claim that a small drop in heroin use among people in their 20s indicates that the War on Drugs is finally working. When it reverts to normal, they ignore it.

You can’t pick and choose little bits of data to support your theory. You have to be able to account for all of it. And you have to be aware of how often spurious results pop up even in the most objective and well-designed studies, especially when you parse the data finer and finer.

But the anti-vaxxers don’t care about that. What they care about is proving that evil vaccines and Big Pharma are poisoning us. And however they have to torture the data to get there, that’s what they’ll do.