One thought on “Mistakes?”

  1. The article refers specifically to research in epidemiology, the methods of which have come under growing criticism recently. The argument goes something like this: suppose we do 1000 studies, and suppose that there are 100 real effects to be found. A typical study has a probability of type-II error (beta) of 0.5, so 50 of the real effects will be detected and reported. Standard practice in epidemiology is to accept a probability of type-I error (alpha, the threshold a p-value must fall below) of 0.05. That means that among the 900 studies where there was nothing to find, 900*0.05 = 45 will return a “statistically significant” result nevertheless. So, in this simple example, nearly half of the reported results are wrong.
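
    A quick back-of-the-envelope check of those numbers, sketched in Python with the illustrative values above (1000 studies, 100 real effects, beta = 0.5, alpha = 0.05):

    ```python
    # Worked example from the argument above (illustrative numbers, not real data).
    n_studies = 1000
    n_real = 100          # assumed real effects lurking in the data
    power = 0.5           # 1 - beta: chance a real effect is detected
    alpha = 0.05          # accepted type-I error rate

    true_positives = n_real * power                  # 100 * 0.5  = 50
    false_positives = (n_studies - n_real) * alpha   # 900 * 0.05 = 45

    fraction_wrong = false_positives / (true_positives + false_positives)
    print(f"{fraction_wrong:.0%} of reported results are wrong")  # ~47%
    ```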

    This sort of analysis depends critically on the fraction of real effects we assume to be lurking in the data. If we put on our Bayesian hat, this is just our old friend the prior. Nobody really knows the prior, but it seems fairly clear that it depends on the way you choose the experiments. If you have some sort of reasonable theory that predicts a relationship between two variables, and you perform studies to validate only the relationships predicted by your theory, then the prior is likely much higher than the 0.1 we assumed above. If you choose relationships at random, 0.1 might be wildly optimistic. With the advent of large health databases, it has become very inexpensive to choose a lot of variables at random to study for correlations, and a cottage industry has sprung up doing just that. The result is zero credibility for most epidemiology studies.
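
    To see how sensitive the conclusion is to that prior, here is a small sketch (same alpha and power as above; the priors are hypothetical and chosen only for illustration):

    ```python
    # Expected fraction of "significant" findings that are actually false positives,
    # as a function of the prior: the fraction of studied hypotheses that are real effects.
    def fraction_wrong(prior, alpha=0.05, power=0.5):
        false_pos = (1 - prior) * alpha
        true_pos = prior * power
        return false_pos / (false_pos + true_pos)

    for prior in (0.5, 0.1, 0.01):
        print(f"prior = {prior:>4}: {fraction_wrong(prior):.0%} of reported results are wrong")
    # prior =  0.5:  9%  (theory-driven studies)
    # prior =  0.1: 47%  (the example above)
    # prior = 0.01: 91%  (dredging a large database for random correlations)
    ```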

    Interestingly enough, if you look at the *strength* of these correlations (i.e., how much more likely factor A makes you to develop disease B), they are usually very weak. A 40% increase in risk sounds terrible until you realize that it takes your odds from, say, 1:10,000 to 1.4:10,000. Contrast that with serious health risks, which might make you 40 *times* more likely to develop the disease. Moreover, if you compute a confidence interval on these weak studies, you wind up with something like 1.4 +/- 0.5, which means that any plausible posterior distribution includes the “no effect” value of 1.0.
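
    The same point in numbers, using the illustrative figures above (a relative risk of 1.0 means no effect):

    ```python
    # Relative vs. absolute risk, with the illustrative numbers from the paragraph above.
    baseline = 1 / 10_000     # baseline chance of developing disease B
    rr = 1.4                  # reported relative risk ("40% increase in risk")

    print(f"absolute risk: {baseline:.4%} -> {baseline * rr:.4%}")
    print(f"roughly 1 extra case per {1 / (baseline * (rr - 1)):,.0f} people exposed")

    # A result quoted as 1.4 +/- 0.5 spans roughly 0.9 to 1.9, straddling rr = 1.0,
    # i.e. the data are consistent with no effect at all.
    low, high = rr - 0.5, rr + 0.5
    print(f"interval [{low:.1f}, {high:.1f}] includes 1.0: {low <= 1.0 <= high}")
    ```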

    This all leads to my personal rule of thumb on these matters, which is that if you have to use Serious Statistics (TM) to pull out a result, you probably don’t have the data to justify any conclusion. At best, you’ve identified a promising candidate for further study. Real results with good data tend to leap out of the analysis so prominently that the statistical analysis is just a formality.

    -rpl
