Author: Robert Kabacoff
Audience: Statistical practitioners
Reviewer: Janet Swift
With everyone wanting to be a data scientist, now is the time to learn R. Is R in Action the way to do it?
This is the second edition of a book that we thought highly of when we first reviewed it back in 2011. It is still good!
This book could just as easily be reviewed in the Mathematics books section because it is very heavy on statistical practice. This isn't a bad thing, but if you are a programmer looking for a primer on R as a language, you might need to look elsewhere.
This is not to say that the book is light on details of the R language, but the account is angled towards the practitioner who wants to get to grips with actually doing some statistics as quickly as possible. It really does answer the question "how do I...?", especially if you are already familiar with another stats package.
The book is divided into four sections - Getting Started, Basic Methods, Intermediate Methods and Advanced Methods - titles that don't tell you much about what they contain!
Getting Started goes over the basics of working with R. By its end you will know what R is all about and how to load some data and plot some charts. The chapter on creating a dataset takes you quickly through the data structures that R supports, but from a user's, rather than a programmer's, point of view. There are some nice asides aimed at the programmer, such as a warning that R uses some terminology in ways that you might not be used to. The final two chapters in the section expand on this to include details such as transforming and generally manipulating the data - which is a big part of any statistical analysis.
The book has a slight tendency to assume that you will remember anything that has been introduced earlier, and it rarely bothers to point forward or back to where you can find out more about something. It's not a big problem, however, because you can always go hunting for yourself.
Once you get out of the first part, the emphasis is very much on the statistics. Part II describes how to create basic charts and perform basic statistical tests - t-tests and equivalent non-parametric tests. My guess is that this is the section of most use to the stats beginner who is trying to solve a homework problem using R.
Part III of the book begins to get more advanced and shows you how to perform regression and Anova using R. This is welcome because R doesn't do these things in the same way as a package like SPSS, say. The section goes on to deal with power analysis (i.e. how big a sample do I need?) and resampling, both of which are unusual as "intermediate" topics, but it is good to see them introduced so early. There is also a chapter on intermediate graphs.
If regression and Anova are intermediate topics, what can be waiting for the reader in an advanced section?
The answer is the generalized linear model and principal components/factor analysis, which are indeed advanced statistics even if they are basic tools in some disciplines.
Chapter 15 is all new and covers a topic that was ignored in the first edition - time series. It starts simply, with getting time series data into R, and moves up through smoothing and seasonal decomposition, finishing up with ARIMA models.
Also new are the chapters on clustering and classification. They cover the basic techniques that you should know: hierarchical clustering, k-means, logistic regression, decision trees and support vector machines, but not classical discriminant analysis.
The final two chapters are also interesting - advanced missing values and advanced graphics. The treatment of missing values is worth buying the book for on its own - it is a topic that is all too often ignored.
Don't buy this book if you really don't have a clue about statistics - it isn't a statistics primer.
What it does is assume that you know much of the theory, fill in some gaps for you, and then show you how to do the job using R. Generally, the description of how to achieve a given task includes enough asides and comments to make you think about the task and perhaps even invent your own way of doing the job.
It also isn't a good book to buy if you are looking for something to solve your programming problems - it isn't good on the specifics of importing data from a particular application, for example - but there are cookbooks for this and you can always search the web for a ready-made solution.
If you want to discover how to do statistics using R then this is a really good place to start. Highly recommended if you know stats and not R.