Author: Richard Cotton
Audience: Statisticians; novice programmers with a stats background
Reviewer: Mike James
Books on R often cover both programming and statistics. This one is only about the language.
R is a language mostly used for statistical computing, and any book on the subject has an important choice to make about whether or not to teach statistics as well. Doing stats with R isn't difficult; it is mostly a matter of finding the appropriate built-in functions or packages to perform the analysis. There is an argument that if you are trying to learn both stats and R, it is better to learn the stats first and then master R - it will make a lot more sense.
This book doesn't make any attempt to teach you statistics using R. It is about R as a programming language and nothing else. If you are looking for a book that teaches both stats and R, you are going to think it isn't very good.
If you know the stats and want to learn R as if it were a standard programming language, then this might well be the book for you. However, you also need to take into account the fact that the approach is very slow and very detailed. If you are an expert programmer, it might not be fast-paced enough and might not go deep enough.
The first chapter starts off with a look at what R is, how to install it and what IDE to use to create your first program. Chapter 2 continues in the same very basic way and shows how to use R as a scientific calculator. This is where you first start to discover some of the things that make R different as a programming language - its data types and vector operations. The next chapter carries on with a look at the basic data types in the form of vectors, matrices and arrays. Chapter 5 introduces lists and R's most important data type, the data frame. Chapter 6 deals with environments and functions, and Chapter 7 with strings and the special role they play as factors, i.e. categorical variables in statistics.
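To make these data types concrete, here is a minimal sketch of a data frame with a factor column. It is not taken from the book; the column names and values are invented for illustration.

```r
# A data frame: columns are vectors of equal length, possibly of different types.
# (All names and values below are made up for illustration.)
patients <- data.frame(
  id = 1:4,
  weight_kg = c(70.5, 82.1, 65.0, 90.3),
  group = factor(c("control", "treated", "treated", "control"))
)

# Arithmetic applies to a whole column at once, with no explicit loop.
patients$weight_lb <- patients$weight_kg * 2.20462

# A factor stores a categorical variable as integer codes plus a set of labels.
group_levels <- levels(patients$group)
group_counts <- table(patients$group)
```

The factor's levels are stored in sorted order by default, which is why a statistician can treat the column as a categorical variable rather than as free text.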
The next two chapters are about the flow of control: conditionals and loops. The problem here for many programmers is that R's vector operations greatly reduce the need for loops. In fact, you could say that one of the problems in switching to R is figuring out how to express the things you used to do with loops as vector operations.
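The difference is easiest to see side by side. The following is a small sketch of my own, not an example from the book, showing the same computation as a loop and as a vector operation.

```r
x <- c(1.5, 2.0, 3.5, 4.0)

# Loop version, as a programmer coming from C or Java might write it.
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized version: the operator works on the whole vector at once.
squares_vec <- x^2

# Conditionals vectorize too: ifelse() tests every element.
size_label <- ifelse(x > 2, "high", "low")
```

Both versions give the same result; the vectorized form is simply the idiom R programmers are expected to reach for first.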
This first part of the book ends with a chapter on packages and one on dates and times - always a problem in any language. At the end of this first part you should have a good idea how to use R to do many standard tasks.
Part II of the book is called "The Data Analysis Workflow" and this attempts to show you how what you have learned can be used in a typical data analysis situation. Chapter 12 deals with reading data of various types - CSV, XML etc. Chapter 13 is about cleaning data, including handling missing values and sorting.
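As a rough idea of what this workflow looks like in base R, here is a short sketch that reads CSV data, drops rows with missing values and sorts the result. The data is invented and supplied inline so the example is self-contained; it is not drawn from the book.

```r
# Inline CSV text stands in for a file on disk (hypothetical data).
csv_text <- "name,score
alice,82
bob,NA
carol,91
dave,67"

raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Keep only rows with no missing values.
clean <- raw[complete.cases(raw), ]

# Sort by score, highest first.
sorted <- clean[order(clean$score, decreasing = TRUE), ]
```

With a real file you would pass its path as the first argument to read.csv() instead of using the text argument.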
Chapters 14 and 15 are the most statistical in the entire book. The first shows how to create plots of the data to start your exploration and the next shows how to work with distributions and modeling - but simple linear regression is about as far as it gets. This is probably just enough statistics for you to recognize that R is indeed a language for statistics, and after this you are going to have to find out for yourself how to do the sort of analysis you are interested in.
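For readers who have not seen it, simple linear regression in R takes only a line or two. This sketch uses the cars dataset that ships with R; the choice of dataset is mine and not necessarily the one the book uses.

```r
# Fit stopping distance as a linear function of speed,
# using the built-in cars dataset (columns: speed, dist).
fit <- lm(dist ~ speed, data = cars)

# Extract the estimated intercept and slope.
estimates <- coef(fit)

# Predict the stopping distance at a speed not necessarily in the data.
predicted <- predict(fit, newdata = data.frame(speed = 15))
```

Calling plot(dist ~ speed, data = cars) followed by abline(fit) would draw the scatter plot with the fitted line on top, which is roughly the level of exploration these two chapters aim at.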
The final two chapters of the book return to programming matters. Chapter 16 is just called "programming" and it explains very briefly some of the standard ideas of programming - debugging, testing, object-oriented programming and functional programming. Chapter 17 explains how to create your own packages, which is a bit advanced for the complete beginner.
This is quite a good book on R as long as you don't really want a book on statistics. It also doesn't really explain R in a way that would help a programmer familiar with another language to get to grips with it. For example, it doesn't explain early on that in R class isn't quite what it means in other languages and that it isn't really an object-oriented language at all - but we could debate that point for many hours. The point is R is slightly different and the book doesn't point out these differences clearly enough. Of course, if you aren't already a programmer, none of this matters.
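To illustrate the kind of difference I have in mind, here is a minimal sketch of S3, one of R's several object systems. The example is mine rather than the book's, and the names temperature and describe are hypothetical.

```r
# An S3 "object" is ordinary data with a class attribute attached.
reading <- structure(list(value = 21.5, unit = "C"), class = "temperature")

# A generic is a plain function that dispatches on the class attribute.
describe <- function(x, ...) UseMethod("describe")

# A method is just a function named generic.class; nothing is declared up front.
describe.temperature <- function(x, ...) {
  paste0(x$value, " ", x$unit)
}

description <- describe(reading)

# The class can be stripped off again, leaving a plain list.
bare <- unclass(reading)
```

There is no class definition anywhere in this code: the class is a label on the data, and methods are found by naming convention. That is quite unlike what a Java or C++ programmer would expect.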
This book is suitable for a reader well versed in statistics (or at least willing to find out about stats somewhere else) and for the novice programmer wanting a very slow and methodical explanation of how R works.