Bayesian Methods for Hackers
Author: Cameron Davidson-Pilon
Probabilistic programming with Bayesian inference could be the next groundbreaking technology, so a book on the topic is welcome.
The book's subtitle, "Probabilistic Programming and Bayesian Inference", is fairly accurate. The problematic part of the title is the term "Hackers". What exactly is a "hacker" when used in connection with advanced statistics and programming? From the contents of the book I have to deduce that it means a reader who wants to use advanced ideas without digging deep into the theory, and especially into the math. In fact there is a lot of math in this book, but it doesn't take a leading role in anything. Instead it is introduced only where it cannot be avoided. This is a shame, as personally I find clear mathematics to be the best way of explaining and understanding any complicated or difficult idea. The author clearly could have taken a different approach, as math seems to leak out more and more as the book goes on.
The most important thing to say about this book is that it is on an important topic. Bayesian statistics is an alternative approach to the problem of inference in uncertain conditions. There are huge philosophical difficulties with the Bayesian approach. Many Bayesians are happy to work with probability as if it were a measure of belief.
Adherents of the most quoted alternative approach, the frequentists, only use probability in its simpler and more direct physical interpretation: a probability is the long-run frequency of an event. A frequentist would never apply a probability to an event that can only occur once. To them the question "what is the probability of life on other planets" is not about probability, but to a Bayesian it is.
There are lots of other problems with the Bayesian approach. For example, you start with how much you believe something and convert this into an estimate of your new belief after incorporating some data that makes your belief either stronger or weaker. The problem is: how strong is your initial belief? This is the problem of selecting a "prior" and it is both practically and philosophically difficult.
All of this deep thinking and worrying about the philosophical difficulty of the Bayesian approach is described in just a few pages. This isn't adequate, but if you are a "hacker" it is just possible that all you want is to get on with using the techniques rather than worrying about the deeper ideas.
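To make the role of the prior concrete, here is a minimal sketch of my own, not taken from the book and not using PyMC. It assumes the simplest possible case, a coin with a Beta prior on the probability of heads, where the update can be done by hand. The two priors and the data (7 heads in 10 tosses) are hypothetical values chosen purely for illustration.

```python
def beta_binomial_update(prior_a, prior_b, successes, trials):
    """Return the Beta posterior parameters after observing binomial data."""
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Two hypothetical initial beliefs about the probability of heads.
flat_prior = (1.0, 1.0)       # no strong opinion either way
sceptical_prior = (2.0, 8.0)  # initial belief that heads is unlikely

# Hypothetical data: 7 heads in 10 tosses.
heads, tosses = 7, 10

flat_posterior = beta_binomial_update(*flat_prior, heads, tosses)
sceptical_posterior = beta_binomial_update(*sceptical_prior, heads, tosses)

# Same data, different priors, different conclusions.
print("flat prior      ->", beta_mean(*flat_posterior))
print("sceptical prior ->", beta_mean(*sceptical_posterior))
```

With so little data the two starting beliefs lead to noticeably different posterior means, which is exactly why the choice of prior is contentious.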
What the first chapter does is to give you a small experience of using Bayesian reasoning along with PyMC - a Python library for probabilistic programming. Providing just a flavor of a topic is typical of the book, even if it is sometimes hard to detect. If you put the philosophical problems aside, the real problem with Bayesian stats is working things out. Probabilistic programming, which is based on a fairly recent approach called Markov Chain Monte Carlo or MCMC, makes general Bayesian modelling much, much easier. In fact, you could say that it has brought about, and is still bringing about, a revolution in the usefulness of the Bayesian approach. The example used is interesting, especially if you are a frequentist, because of the ease with which a complex question is quickly formulated as a Bayesian model. However, if you are hoping for a clear explanation of how PyMC works, or should be used, then you are going to be disappointed. You are given the flavor, but this is not a clear systematic explanation.
Chapter 2 promises more on PyMC but it is still confused. Part of the problem is terminology. When the term "variable" is used it could mean statistical variable, Python variable or PyMC "variable" which is in fact an object/class. I'm sure it is possible to explain the logic that is inherent in PyMC, but this chapter, and overall this book, doesn't do it. At best you are going to be getting a rough idea about PyMC from using it.
By Chapter 2 you also encounter another problem with the book's approach. Most of the descriptions try to avoid deep mathematical explanations by making general observations about how things work and yet there are lots of equations in the text which you are expected to understand. I think that if you can understand the equations you could also benefit from having them used in explanations of how it all works. Math doesn't have to be obscure.
Chapter 3 moves on to the inner workings of MCMC - only it doesn't. The description of MCMC is recognizable as such only if you already have some idea what MCMC is all about. Trying to fathom in what sense things "converge" to a solution, or even what generating samples is for, is difficult. To then move on and discuss ways of improving or diagnosing convergence is a bit of a waste of time. Of course, if you do have an idea of what MCMC is all about then it is surprisingly useful information.
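For readers who have never seen MCMC, the following is a minimal sketch of one of its simplest forms, a random-walk Metropolis sampler, written in plain Python. It is my own illustration and is not how the book or PyMC implements sampling. The target (a standard normal), the starting point, the step size and the burn-in length are all assumptions chosen for the example.

```python
import math
import random


def log_target(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: propose a nearby point, accept or stay put."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_accept = log_density(proposal) - log_density(x)
        # Always accept uphill moves; accept downhill moves with
        # probability equal to the density ratio.
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            x = proposal
        samples.append(x)
    return samples


# Start deliberately far from the bulk of the distribution.
chain = metropolis(log_target, start=10.0, n_samples=20000)

# Discard the early part of the chain, which still remembers the start.
kept = chain[1000:]
mean = sum(kept) / len(kept)
variance = sum((s - mean) ** 2 for s in kept) / len(kept)
print("sample mean:", mean, "sample variance:", variance)
```

"Convergence" here means that, after the early wandering away from the starting point, the values in the chain behave like draws from the target distribution, so averages over them approximate its mean and variance. That is what the samples are for.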
From here the book becomes increasingly idiosyncratic and personal. Chapter 4 introduces the law of large numbers as a guiding principle. Then on to matters that connect the frequentist approach to Bayesian practice and how small samples can mislead. Chapter 5 introduces the idea of a loss function, but in a half-hearted way with most of the math stripped out - but not enough of the math to make it easy to read.
Chapter 6 deals with the problem of what prior to use and the distinction between objective and subjective priors is introduced. The emphasis seems to be on how to pick a good prior but the discussion doesn't go far enough for you to solve the problem - that's because there isn't a solution that everyone agrees on. The end of the chapter suggests that the prior you actually choose doesn't matter as long as the sample size is large. The idea that Bayesian models are often used in an iterative context, and in this case you can think of the prior as a sort of transient behavior that dies away, isn't really discussed.
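The claim that the prior stops mattering for large samples is easy to check in the simplest conjugate case. The sketch below is my own and assumes a Beta prior on a success rate with idealised data in which exactly half the trials succeed. The two priors are hypothetical and deliberately chosen to disagree.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Mean of the Beta posterior for a success rate under a Beta prior."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


optimistic = (8.0, 2.0)   # hypothetical prior with mean 0.8
pessimistic = (2.0, 8.0)  # hypothetical prior with mean 0.2

sample_sizes = (10, 100, 10_000)
gaps = []
for trials in sample_sizes:
    successes = trials // 2  # idealised data: half the trials succeed
    gap = (posterior_mean(*optimistic, successes, trials)
           - posterior_mean(*pessimistic, successes, trials))
    gaps.append(gap)
    print(f"n = {trials:>6}: posterior means differ by {gap:.4f}")
```

The disagreement between the two posteriors shrinks in proportion to 1/(n + 10) here, so with enough data both analysts end up in nearly the same place, whatever they believed at the start.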
The final chapter focuses on the very specific application of A/B testing, and the Bayesian t-test in particular.
You might think that I hated this book. Not so. There was a lot I enjoyed reading and it made me think. In particular, the examples did succeed in making me understand how powerful PyMC, and probabilistic programming in general, is. However, I started out reasonably well versed in statistical theory - I'm a frequentist who is happy to use Bayesian methods. I am not so clear on MCMC, and PyMC is new to me. As a result I got a lot from reading this book, but it was very clear that if I knew nothing about statistics then I would be struggling to understand the important ideas.
For a potential practitioner with a good background in stats, this book gives a taste of what it is like to go down the probabilistic programming road. The sad thing is that if it had embraced the math more whole-heartedly the book would be much more useful. Only buy a copy of this book if you already know a lot of statistics, want to move into Bayesian probabilistic programming, and are still prepared to do a lot of additional reading around the topics mentioned.
Last Updated: Tuesday, 02 February 2016