Authors: Drew Conway and John Myles White
Aimed at: "hackers"
Pros: Practical approach to stats
Cons: Mostly off topic and lacks explanation.
Reviewed by: Mike James
Did any machines undertake any learning for the purposes of this book on a current hot topic?
Machine Learning seems to be the next big thing - well, the next most talked about thing at least. A record-breaking 104,000 students enrolled in last autumn's free online class taught by Stanford University Professor Andrew Ng, and the same course, now under the auspices of Coursera, is being repeated starting April 23, 2012.
This book seems to promise to make Machine Learning accessible to the rest of us if you interpret "hackers" to be uneducated but otherwise skilled programmers.
Perhaps the most important thing to say about this book is that it uses R. If you don't know the statistical language R then it is worth getting to know, but if you were hoping for algorithms in Python or some other mainstream language you are out of luck.
Chapter 1 is all about getting, installing and generally using R. In this chapter you learn to load some data and draw a few charts. Chapter 2 carries on in the same way with a look at data exploration. This is often an important first phase in any machine learning project. If you don't know your data then picking a machine learning algorithm is more difficult. However, what we have here is a fairly standard exposition of basic statistics - mean, median and mode, percentiles and a range of data visualization techniques. You need to know this stuff but it isn't machine learning and it is very, very basic statistics.
Chapter 3 seems to be where things get started with a look at a binary classification problem as part of spam filtering. However, what we are presented with is a very basic introduction to conditional probability and then the construction of an estimated Bayes classifier. This is all explained at a very functional level and all seems very obvious. So much so that Bayes' theorem isn't explained, nor is there any mention of the optimal behaviour of the population Bayes classifier, bias in testing or anything that might enlarge the reader's world view. This isn't just empirical Bayes, it's also empirical classification.
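To make the missing idea concrete, Bayes' theorem turns "how often does this word appear in spam?" into "how likely is this message to be spam, given the word?". The Python sketch below is an illustration only, not the book's R code; the function name `posterior_spam` and all the rates are made up for the example.

```python
def posterior_spam(p_word_given_spam, p_word_given_ham, p_spam):
    """P(spam | word) from Bayes' theorem for a two-class problem."""
    p_ham = 1.0 - p_spam
    # Total probability of seeing the word in any message.
    evidence = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / evidence


# Made-up rates: the word appears in 40% of spam, 5% of non-spam,
# and 20% of all mail is spam.
p = posterior_spam(p_word_given_spam=0.40, p_word_given_ham=0.05, p_spam=0.20)
print(round(p, 3))  # 0.667
```

A classifier of the kind the chapter builds multiplies such terms over many words, on the usual naive assumption that words occur independently within each class.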
Chapter 4 moves on to ranking, which I have to admit left me puzzled. It seems to be a heuristic approach to constructing a ranking function applied to emails to determine their importance. It was interesting but I'm not sure what general point is being made or how the idea generalizes.
The next chapter moves deeper into standard statistics - regression. I don't know many AI people who would consider regression to be machine learning. It is, of course, something you need to know about but it is classical statistics. The chapter goes into the standard ideas of least squares and fitting a model. It avoids any real theoretical presentation of linear regression, presumably in an attempt to keep things simple, but then goes and spoils it by using terms such as "monotonicity"! Towards the end of the chapter it goes into p-values and F statistics and other parts of the machinery of significance testing, but doesn't even attempt to explain what these are about. The chapter closes with a quick look at logistic regression. None of this would be out of place in a statistics textbook.
Chapter 7 is about optimization and it shows how optimization routines could solve the classical least squares problem and then the more modern ridge regression situation. Next we use optimization to solve a coding problem. For this we need to apply the Metropolis algorithm - but nowhere is it made clear what this is.
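For readers left wondering, the Metropolis algorithm is a random search: propose a small change to the current candidate, always accept it if it scores better, and accept a worse one with probability equal to the ratio of the two scores. The Python sketch below is a minimal illustration, not the book's R code; the target (a standard normal), the step size and the names `metropolis`, `log_standard_normal` and `random_walk` are assumptions chosen for brevity.

```python
import math
import random


def metropolis(log_score, start, propose, steps, rng):
    """Metropolis sampler; `propose` must be a symmetric proposal."""
    current = start
    current_log = log_score(current)
    samples = []
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_log = log_score(candidate)
        # Accept with probability min(1, score(candidate) / score(current)).
        log_ratio = candidate_log - current_log
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_log = candidate, candidate_log
        samples.append(current)
    return samples


def log_standard_normal(x):
    # Log density of N(0, 1), up to an additive constant.
    return -0.5 * x * x


def random_walk(x, rng):
    # Symmetric proposal: a uniform step of at most 2 in either direction.
    return x + rng.uniform(-2.0, 2.0)


rng = random.Random(0)
draws = metropolis(log_standard_normal, start=5.0, propose=random_walk,
                   steps=20000, rng=rng)
kept = draws[2000:]  # drop early draws taken while moving away from the start
mean = sum(kept) / len(kept)
variance = sum((d - mean) ** 2 for d in kept) / len(kept)
print(round(mean, 2), round(variance, 2))
```

The occasional acceptance of a worse candidate is what lets the search escape poor local solutions, which is why it suits problems where plain hill climbing gets stuck. The printed mean and variance should land near 0 and 1 if the sampler is behaving.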
Next we have another chapter on classical multivariate statistics in the form of Principal Components Analysis or PCA. Here PCA is put forward as an archetypal unsupervised learning method - well, that's not the way I think of it. As dimensionality reduction yes, as feature extraction yes - but unsupervised learning, only if I have to.
Chapter 9 moves into multidimensional scaling, which is related to PCA in that it is another dimensionality reduction technique. In this case it is presented as a clustering method, which is a bit of a sledgehammer to crack a walnut. There are simpler and more direct clustering techniques that the reader needs to know about first. The next chapter introduces the k-Nearest Neighbor classifier as a recommendation system. The idea is introduced simplistically and there is no discussion of how k affects the smoothness of the decision surface.
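The effect of k is easy to demonstrate. The Python sketch below is an illustration only, not taken from the book; the one-dimensional data, the single mislabeled point and the helper names `knn_predict` and `count_boundaries` are all made up. It counts how often the predicted class flips as you sweep along the axis.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D inputs)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def count_boundaries(train, grid, k):
    """Number of times the predicted label changes while sweeping the grid."""
    labels = [knn_predict(train, x, k) for x in grid]
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)


# Made-up data: class "a" on the left, class "b" on the right,
# plus one mislabeled point at x = 2.0 sitting inside "a" territory.
train = [(0.0, "a"), (1.0, "a"), (2.0, "b"), (3.0, "a"), (4.0, "a"),
         (6.0, "b"), (7.0, "b"), (8.0, "b"), (9.0, "b"), (10.0, "b")]
grid = [i / 10 for i in range(0, 101)]

for k in (1, 3, 5):
    print(k, count_boundaries(train, grid, k))
# prints:
# 1 3
# 3 1
# 5 1
```

With k = 1 the stray point carves out its own island of "b" and the decision rule has three boundaries; with k = 3 or k = 5 its neighbours outvote it and a single boundary remains. On this data the k = 5 boundary also sits further left than the k = 3 one, a reminder that smoothing is not free.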
The next chapter uses the Google Social Graph API, which is no longer operational. However, you can download some test data to see how it all would have worked.
The final chapter is a bit of a revelation. It compares methods and introduces the Support Vector Machine. Now this is a real machine learning tool, but why have we had to wait until the end?
Put simply, this book isn't about core machine learning. It is about classical statistical techniques applied in a naive way, i.e. without a full understanding of the theory that makes it all work. This is not a bad approach and if you want to learn some practical stats in R then you might get something out of the book. However, if you end up thinking that these are the main techniques of machine learning you would be missing out on cluster analysis, discriminant analysis, the perceptron, neural networks, the EM algorithm, Bayes networks and many more. It isn't that these techniques are any more difficult than what is presented in the book - as other books on the same topic demonstrate.
In other words there isn't a lot of "machine" learning going on in this book.
It also doesn't give you any sort of framework to place your ideas in. It mentions supervised and unsupervised learning but never really explains what sort of approaches there are. It is a book that mostly explains how specific problems were solved rather than illustrating grand principles that you can use to solve new problems. For example, in the code optimization example the Metropolis algorithm is used but no good explanation of what it is, what it does or why you might need it is given. How can you possibly notice that you have a problem that might benefit from its application?
Overall, give this book a miss unless you want something that provides a lot of practical examples of statistical analysis in R. If you fit the niche audience of R programmers who do want to do things that are vaguely AI oriented then this might be a worthwhile read. So to a large extent this is a book with a very misleading title.