|Statistics Done Wrong|
Author: Alex Reinhart
Statistics - we all know they are close to lies, but could it be that this is because they are just "done wrong"?
This is a really interesting book, but not everyone is going to get much out of it. The problem with the book is that it doesn't go in for explaining core statistical ideas. It does tackle some ideas, like the all-important idea of significance, but only in a roundabout way. You won't find a point-by-point explanation of anything at all in this book.
There is a lot of discursive narrative and examples that are great fun and educational if your grounding in statistics is fairly reasonable. In fact the better your grounding, the more you are likely to get from reading the book. The only problem with this is that the book is clearly aimed at practitioners whose first subject isn't stats. It has a lot of examples from biological sciences, medicine and social stats - this is not a problem if you don't have any of these subjects as your background because the important ideas are in the higher level details of the experiments.
Chapter 1 kicks off with the all-important topic of significance and p values in general. The explanation is very informal and basically comes down to "p is the probability that you got the result by chance alone". It does mention the Neyman-Pearson approach and explains significance in this context, but not the other side of the coin, power - even though this is going to be a major topic in the rest of the book. Power is discussed in the next chapter, but it would be better to explain the two together as you can trade one off for the other and together they characterize the "goodness" of an experiment. Next up we have the idea of a confidence interval and this is discussed in detail, but the exact definition is given in just one line for a 95% confidence interval:
"If you run 100 identical experiments about 95 of the confidence intervals will include the true value you're trying to measure."
After this, confidence intervals are praised as being better than significance tests, but the actual idea of the confidence interval is not explored. For example, the middle value in a confidence interval isn't any more important than any other point, and there are lots and lots of ways of defining confidence intervals, although we generally want the smallest. In short, there are lots of problems with confidence intervals but these are not the subject of this book.
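The one-line definition quoted above is easy to check by simulation. The following is only a minimal sketch, not anything taken from the book: it assumes normally distributed measurements with a known standard deviation, and the function name `coverage` and all of its parameters are made up for illustration.

```python
import random
import statistics


def coverage(n_experiments=2000, n=30, true_mean=10.0, sigma=2.0, seed=0):
    """Fraction of 95% confidence intervals that contain the true mean.

    Assumption: sigma is known, so a z-interval (mean +/- 1.96 * sigma / sqrt(n))
    is used rather than a t-interval.
    """
    rng = random.Random(seed)
    half_width = 1.96 * sigma / n ** 0.5
    hits = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        # Count the interval as a hit if it covers the true value.
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / n_experiments


print(coverage())
```

The fraction printed should land close to 0.95, which is all the 95% figure promises: it is a statement about the procedure over many repeats, not about any single interval.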
From here we move on to look at power. Basically Alex Reinhart discusses the mistake of taking too small a sample to be likely to detect an effect even though it is there. Initially the idea is explained using coin tossing, which is good. The rest of the chapter is a collection of ways in which using under-powered experiments can lead us astray. If you don't have a clear understanding of significance and power you might well miss the important point of failure in each one. In particular, you might fail to follow the general idea that low power tends to give rise to truth inflation, which roughly corresponds to the idea that for a low-powered test to be significant you have to have some big values just by chance. This underlying idea is contained in the explanation, but it is difficult to extract.
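Truth inflation is easier to see in a simulation than in words. The sketch below is an illustration under stated assumptions, not the book's own example: many small studies estimate a modest true effect with known noise, a two-sided z-test at the 5% level decides significance, and the function name `truth_inflation` and its parameters are hypothetical.

```python
import random
import statistics


def truth_inflation(true_effect=0.2, sigma=1.0, n=20, n_studies=5000, seed=1):
    """Return (estimated power, mean effect estimate among significant studies)."""
    rng = random.Random(seed)
    standard_error = sigma / n ** 0.5
    significant = []
    for _ in range(n_studies):
        sample = [rng.gauss(true_effect, sigma) for _ in range(n)]
        estimate = statistics.fmean(sample)
        # Two-sided z-test at the 5% level (sigma assumed known).
        if abs(estimate / standard_error) > 1.96:
            significant.append(estimate)
    power = len(significant) / n_studies
    return power, statistics.fmean(significant)


power, mean_significant = truth_inflation()
print(f"power: {power:.2f}")
print(f"true effect: 0.2, mean significant estimate: {mean_significant:.2f}")
```

With these settings only a small fraction of studies reach significance, and those that do report an effect well above the true 0.2, simply because a small sample has to get lucky to cross the threshold.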
Chapter 3 is on the perils of replication. Basically, is doing the same experiment 10 times on one subject as good as doing it once on 10 subjects? No, of course it isn't, but it is very easy to find situations where things aren't so clear and obvious. At this point any statistician will be crying "ANOVA!" and perhaps "Experimental Design". With a good knowledge of either, the whole problem of simple replication versus pseudoreplication goes away and you can't make the mistake. The chapter does explain some of these ideas, but not in sufficient depth to make the reader see the possibility of the rich deep theory that gives rise to models like A+B+A.B.
Chapter 4 takes up the attack on significance values once again. This time the enemy is the base rate fallacy. In this case this is supposed to be the degradation in significance values when multiple tests are performed. I can remember a psychologist asking me why they seemed to get more significant results when the correlation matrix was large - they didn't like the answer. In this case the crude solution is to use the Bonferroni correction, which uses s/n as the significance level, where s is the original significance level and n is the number of tests. No researcher likes this method because it reduces the number of significant results to nothing. Of course the reason is that it doesn't take the correlations between the test statistics into account and the correct way to deal with the problem is to use multivariate testing, but who uses multivariate testing!? There are some nice examples in this chapter but it is difficult to follow the arguments.
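The effect of running many tests, and of the Bonferroni s/n threshold, can be sketched in a few lines. This is an illustration only: it assumes every null hypothesis is true and that the tests are independent, which is exactly the assumption that fails for a correlation matrix, and the function name `familywise_error_rate` is made up here.

```python
import random


def familywise_error_rate(n_tests=20, alpha=0.05, n_families=5000,
                          bonferroni=False, seed=2):
    """Fraction of test families with at least one false positive."""
    rng = random.Random(seed)
    # Bonferroni: compare each p value with alpha / n_tests instead of alpha.
    threshold = alpha / n_tests if bonferroni else alpha
    families_with_false_alarm = 0
    for _ in range(n_families):
        # Under a true null hypothesis a p value is uniform on [0, 1);
        # independence between tests is an assumption of this sketch.
        p_values = [rng.random() for _ in range(n_tests)]
        if min(p_values) < threshold:
            families_with_false_alarm += 1
    return families_with_false_alarm / n_families


print("uncorrected:", familywise_error_rate())
print("Bonferroni: ", familywise_error_rate(bonferroni=True))
```

For 20 independent tests the theoretical chance of at least one false positive is 1 - 0.95^20, about 0.64, and the simulated uncorrected rate should sit near that. The corrected rate drops back to roughly the nominal 0.05, at the cost of a much stricter threshold for each individual test.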
Chapter 5 is weird. It is about ways in which naive thinking can make intelligent researchers conclude something is significant when it isn't - for example overlapping or non-overlapping confidence intervals. Interesting but not generalizable, more a collection of silly mistakes.
Chapter 6 is an informal discussion of performing statistical tests after you have looked at the data or using a stopping rule to limit any damage a trial might be creating. It also points out the problems with picking subjects based on some performance criterion and the effect that regression to the mean has on future results - a very common design problem. For example, the best performing schools tend to show a decline in performance in the coming years.
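The schools example can be reproduced with a toy model. What follows is a sketch under invented assumptions, not data from the book: each hypothetical school has a fixed underlying quality, and each year's measured score is that quality plus independent noise of the same size. The function name `regression_to_mean` is illustrative.

```python
import random
import statistics


def regression_to_mean(n_schools=2000, top_fraction=0.1, seed=3):
    """Return (year-1 mean, year-2 mean) for the schools that topped year 1."""
    rng = random.Random(seed)
    # Assumption: fixed true quality plus independent yearly noise.
    quality = [rng.gauss(0, 1) for _ in range(n_schools)]
    year1 = [q + rng.gauss(0, 1) for q in quality]
    year2 = [q + rng.gauss(0, 1) for q in quality]
    k = int(n_schools * top_fraction)
    # Pick the best performers using year-1 scores only.
    top = sorted(range(n_schools), key=lambda i: year1[i], reverse=True)[:k]
    return (statistics.fmean([year1[i] for i in top]),
            statistics.fmean([year2[i] for i in top]))


first, second = regression_to_mean()
print(f"top schools, year 1: {first:.2f}, year 2: {second:.2f}")
```

Nothing about the selected schools changes between the two years, yet their average falls, because part of what put them at the top in year 1 was luck that does not repeat.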
Chapter 7 is about continuity errors, referring to the fact that when you change an interval or metric variable to a categorical variable you lose information. Reinhart discusses "dichotomization", which is the case of two categories. Extending it to n categories, the problem could be termed "discretization".
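The information loss is easy to measure in a toy case. The sketch below is an assumption-laden illustration rather than the book's example: it generates a linear relationship with normal noise, splits the predictor at its median, and compares the Pearson correlation before and after. The helper names `pearson` and `median_split_demo` are made up.

```python
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def median_split_demo(n=5000, slope=0.5, seed=4):
    """Return (correlation with continuous x, correlation with median-split x)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [slope * xi + rng.gauss(0, 1) for xi in x]
    cut = statistics.median(x)
    # Dichotomize: 1.0 for "high", 0.0 for "low".
    x_split = [1.0 if xi > cut else 0.0 for xi in x]
    return pearson(x, y), pearson(x_split, y)


r_full, r_split = median_split_demo()
print(f"continuous: {r_full:.2f}, dichotomized: {r_split:.2f}")
```

The dichotomized predictor shows a visibly weaker correlation with the outcome than the continuous one, even though the underlying relationship is unchanged.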
Chapter 8 is the final stats-heavy chapter in that it deals with the misuse of models such as regression, the problem of overfitting and the use of cross-validation to detect it. As you might expect, stepwise regression comes in for a particular bashing.
The final four chapters are on what you might call the sociology of statistics. How researchers' wider experimental procedures make things difficult for statistics. How mistakes happen and how to keep your results unchallengeable by hiding the data. The final chapter describes what can be done about the poor situation. The answer is, of course, obvious - teach an understanding of statistics and probability. Easy to say, difficult to do.
For a small book this covers a lot of territory, often not in a clear and analytic way. The examples are interesting but they are often not clear enough for an innocent reader to be sure that they get the point. If you don't know stats then you are likely to be worried by the content but not really be confident enough to know why and not informed enough to avoid similar problems. Notice that the book is about the frequentist Neyman-Pearson approach to stats - there is nothing at all about Bayesian methods.
If you already know enough stats to follow, it is a good read and should convince you that we need to teach statistics much better than we do.
|Last Updated ( Friday, 23 June 2017 )|