Author: Paul Teetor
Aimed at: Users of R - programmers and statisticians
Pros: Good explanations with simple recipes
Cons: Title misleading: it's more than recipes
Reviewed by: Mike James
R is a statistical programming language, so you would expect a cookbook using it to be a lot of stats. However, R is also a general purpose language, and so there are all the usual questions about how you manage data, convert this to that and so on. Given that R is a list-oriented functional language, there are perhaps more such questions than usual.
There are lots of places in this cookbook where the author feels the need to abandon the cookbook format - and overall the book is better for it. There really isn't any need to invent recipe titles just so you can cover some topic or other.
Chapter 1 explains what R is and provides recipes for downloading and installing R. By the end you should be able to enter simple commands and find help. Chapter 2 follows on with a range of simple tasks: printing, managing variables, sequences, vectors and so on. All very simple and more tutorial than recipe.
Chapter 3 deals with the untidy side of R: OS interaction, packages, configuration and so on. Chapter 4 is also down to earth, with a look at how to read and save data in various formats. Again, these are not really complicated recipes, more tutorial introductions or hints and tips on how to do things. They are welcome, and yes, I certainly learned a few things that I'd been too lazy to find out by reading the documentation.
It is Chapter 5 where the book really won me over. This is about data structures, which is a big topic and something that confuses beginners. Here the author drops the recipe format and starts a tutorial on the subject. Starting from lists, the author explains the ideas of mode and class, and this is where a lot of R programmers fail to cope with the way R handles data. Then it is on to scalars, matrices and arrays in general. There is an attitude of surprise that R does things in this way, and yes, if you are a statistician or a mathematician it might be a surprise. If you are a programmer, however, you are probably going to be more impressed than surprised: it is a neat way of working. When we reach the all-important data frames, the author gives a list of reasons why they are important to various types of user, including an accurate but insulting jibe at executives:
To an executive
You can put names and numbers into a data frame. ... your staff will enjoy using data frames.
From this point we return to the recipe format and work through standard tasks such as appending data, working with factors, converting data types, merging tables and so on. Chapter 7 continues the theme of data types and deals with strings and dates - all fairly basic recipes but useful if you are new to R.
Chapter 8 is a slight change of direction, and we start to look at the functions that let you work with probability distributions and combinations and generate random values from a distribution. Again the recipes aren't complicated, but you might find them useful if your stats and probability theory are rusty. Chapter 9 moves on to general statistics and starts off with a look at hypothesis testing. We run through the usual basics of z-scores, t-tests, two-sample tests and so on.
Chapter 10 tackles the big subject of graphics. Starting from a recipe for creating a scatter plot, it adds further elements until we have multiple plots, regression lines and labels. It then repeats the exercise for the bar chart and shows how to use box plots and histograms. Again, these are all very basic recipes of the sort you would consult if you've forgotten how to do something or need to do it for the first time.
Chapter 11 moves back to hardcore statistics with a look at regression and ANOVA. The recipes start from simple bivariate regression and work through multiple regression in all its forms. Then it is on to one-way ANOVA and then diagnostic tests. The level of stats in this section is quite high, but it does a good job of explaining what it all means.
Chapter 12 returns to general programming with a collection of useful tricks. Chapter 13 deals with more advanced topics such as optimization, principal components, clustering and resampling techniques. Depending on your background, you might find some of these techniques more core than advanced. Chapter 14 focuses on time series, including computing moving averages and autocorrelations and fitting an ARIMA model.
This is a good introduction to R and its basic use. The programming might be a bit too simple for the advanced programmer and the statistics sections a bit too simple for the advanced statistician, but this probably means that there is a big audience who will find some aspect of the book really useful. None of the recipes are advanced or obscure, and you could find out how to achieve the same result by reading the documentation, but the book presents things in a much more digestible form. It is worth getting just to have R's approach to data types clearly explained.
Put simply, this book isn't essential for the R programmer or the statistician using R, but it really will make your life easier whenever you need to do something new. So get a copy.