Author: Larry Pace
Audience: Not suitable for programmers
Reviewer: Mike James
An introduction to statistical programming sounds like a really good idea. You know your stats, so now let's program...
R really is a full programming language, and you can do things with it that make it a general-purpose statistics tool. With the help of some programming skills you can use it to transform your raw data into something that can be analyzed, and do it automatically, so that if you get more data or need to repeat the process you can do it with a single command. In the same way, the power of programming is that it takes you beyond the standard analyses and lets you create composite systems that process your data, again automatically. Programming and R have a lot to offer, but for the beginner in statistics perhaps all that is of interest is getting the standard tests performed.
This book wasn't written by a programmer. The author has clearly been exposed to a lot of programming ideas, but programming isn't his main concern. The first chapter gives you a brief introduction to R, its history and how to get and install it. It takes you through a first session using R, and you need to follow this quite carefully because a lot of ideas are first introduced here. The book has a tendency to introduce something and explain it later. However, the explanations are usually quite good, so the key thing is to keep reading. One problem is that the font used is very small and the line length on each page (of the printed book) is very long. Long lines make text harder to read, although they do pack in the information.
The first important things to learn in R are the data structures: the vector, matrix, list and data frame. These are introduced mostly by example, complete with equations for the standard deviation, so you need to be comfortable with the (not too demanding) math.
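For anyone who has not met them before, here is a minimal sketch of the four structures. It is my own illustration with made-up values, not an example from the book, and the variable names are arbitrary.

```r
# Vector: an ordered collection of values of a single type
v <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Matrix: a vector with two dimensions, filled column by column by default
m <- matrix(1:6, nrow = 2)

# List: a container whose elements can be of different types and lengths
l <- list(label = "scores", values = v, n = length(v))

# Data frame: a list of equal-length columns, the usual shape for a data set
df <- data.frame(id = 1:3, group = c("a", "b", "a"), score = c(10.5, 12, 9.8))

print(dim(m))   # 2 3
print(l$n)      # 8
str(df)         # one line per column, showing its type
```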
Chapter 2 is called "Programming in R", and it's where you would expect the process of turning you into a programmer to begin. At first it seems very hopeful, because there is a nice discussion of how to learn programming. Then we are introduced to flow of control, a key idea, and to looping and conditionals. Unfortunately, after this good start things go a little wrong. Next we have the idea of an arithmetic expression and the use of basic I/O statements, but then we get a table of object types in R. What are objects, and what are types?
Next we look at the loop statements in R, starting with the for loop. The big problem occurs when the while and repeat loops are introduced:
"The while loop is more useful than the repeat loop, but everything you can do with while could also be accomplished with for with fewer steps."
This is simply wrong and does the reader no service. The simple fact that while and repeat are conditional loops and for is an enumeration loop is never mentioned. I don't know about the while loop being more useful than repeat, but the essential difference between the two is that one has its exit point at the start and the other has its exit point wherever you care to place it. It all comes down to where the exit test is made, and hence the minimum number of times the loop body runs, not to some value judgement about how useful they are. This is core programming and it's completely missing.
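To make the distinction concrete, here is a minimal sketch of the three loop types. It is my own illustration, not from the book, and the function names are made up. Note that count_halvings needs a loop whose number of passes isn't known in advance, which is exactly the job a conditional loop exists to do.

```r
# for: an enumeration loop that steps through a fixed sequence of values
squares <- numeric(5)
for (i in seq_len(5)) {
  squares[i] <- i^2
}

# while: a conditional loop with its exit test at the start,
# so the body can run zero times
count_halvings <- function(x) {
  n <- 0
  while (x > 1) {
    x <- x / 2
    n <- n + 1
  }
  n
}

# repeat: a conditional loop with no built-in test; the exit (break)
# goes wherever you put it, here in the middle of the body
sum_until_negative <- function(values) {
  total <- 0
  i <- 0
  repeat {
    i <- i + 1
    if (i > length(values) || values[i] < 0) break
    total <- total + values[i]
  }
  total
}

print(squares)                                # 1 4 9 16 25
print(count_halvings(1))                      # 0: the while body never ran
print(count_halvings(16))                     # 4
print(sum_until_negative(c(1, 2, 3, -1, 5)))  # 6: stops at the first negative value
```

A for loop could imitate count_halvings only by guessing an upper bound on the number of passes and breaking out early, which is really a conditional loop in disguise.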
The chapter does go on to explain why vectorization is preferable to loops, and this is very true, but it takes time to learn to do it well. Its final example makes use of functions even though they haven't yet been introduced; they are the subject of the next chapter. The chapter then closes with a look at R and object-oriented ideas. This is going to go completely over the head of any beginner, and R's approach to objects is so strange that it is probably going to be misunderstood even by an experienced programmer. This topic would have been best left to a later chapter, or dropped altogether, as no further use is made of it.
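As a simple illustration of the point about vectorization (my own example, not the book's, using made-up data), here is the sum of squared deviations that goes into the standard deviation, computed first with a loop and then with whole-vector arithmetic.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
m <- mean(x)

# Loop version: accumulate the squared deviations one element at a time
ss_loop <- 0
for (xi in x) {
  ss_loop <- ss_loop + (xi - m)^2
}

# Vectorized version: subtraction and squaring apply to the whole vector at once
ss_vec <- sum((x - m)^2)

# Sample standard deviation from the sum of squares, checked against sd()
sd_vec <- sqrt(ss_vec / (length(x) - 1))
print(c(ss_loop, ss_vec))        # 32 32
print(all.equal(sd_vec, sd(x)))  # TRUE
```

The vectorized line is shorter and says what is being computed rather than how, which is why it is worth the effort of learning to think this way.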
The next chapter introduces functions, which are central to any attempt to build any sort of program in R. Rather than introducing some simple functions with the emphasis on the all-important way parameters work, we get a complex-looking function taken from the R system itself. In fact the chapter offers no guidance on using parameters or on the scope of variables, which are the main things you have to master if you are going to use functions well enough to build programs. There isn't even a discussion of what determines the return value. In short, this really doesn't help with understanding functions.
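The points the chapter skips can be shown in a few lines. The following is a minimal sketch of my own, not from the book, and the function names are hypothetical. It covers a parameter with a default value, an assignment inside a function that creates a local variable rather than changing the caller's, and the two ways a value gets returned.

```r
# Parameters: 'x' is required, 'center' has a default value
standardize <- function(x, center = TRUE) {
  if (center) {
    x <- x - mean(x)   # changes the function's local copy only
  }
  x / sd(x)            # the value of the last expression evaluated is returned
}

# Scope: assignment inside a function creates a local variable
n_calls <- 0
bump <- function() {
  n_calls <- n_calls + 1   # reads the global n_calls, but assigns to a new local one
  n_calls
}

# Explicit return(): leave the function early with a specific value
safe_ratio <- function(a, b) {
  if (b == 0) return(NA)
  a / b
}

scores <- c(2, 4, 4, 4, 5, 5, 7, 9)
z <- standardize(scores)
print(scores)            # unchanged by the call
print(bump())            # 1
print(n_calls)           # still 0
print(safe_ratio(1, 0))  # NA
```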
Chapter 4 is where the book leaves the topic of programming in R and becomes a "how to do simple stats in R" book. The topics covered, one per chapter, are: summary stats, tables and graphs, normal probabilities, confidence intervals, hypothesis testing, one-way ANOVA, more advanced ANOVA, correlation and regression, multiple regression, logistic regression, chi-squared, nonparametric tests, and sampling, resampling and bootstrapping. The final chapters are on packages and commander packages.
Overall, the treatment of the stats is simple and straightforward. It is enough for you to follow the ideas if your statistics is a bit rusty, but not enough to make you an expert. You are shown how to use R to perform each test or fit each model, but there is very little programming in R; you simply use the provided functions.
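For a flavour of what "using the provided functions" means, here is a minimal sketch with a small made-up data set (not from the book). A regression and a t test are each a single call to a standard R function, with no programming involved.

```r
# Hypothetical data: hours of study and test scores for six students
hours <- c(1, 2, 3, 4, 5, 6)
score <- c(52, 55, 61, 64, 70, 73)

# Simple linear regression: one call to lm()
fit <- lm(score ~ hours)
print(coef(fit))        # intercept 47.2, slope about 4.37
print(summary(fit))     # coefficients, R-squared and F test

# One-sample t test against a hypothesised mean of 60
tt <- t.test(score, mu = 60)
print(tt$p.value)
```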
This is not a book about programming in R. It doesn't cover most of the ideas you need to program in any language, and it covers topics that would be best left until later. It does a reasonably good job of introducing the use of R at the command line and through some small programming examples. It doesn't tackle the most pressing task that most R users face, i.e. reading in and cleaning up data. It does refer to using Excel at various points to generate or modify data, but no real advice is given on general "data processing".
Most of the book is about implementing standard statistical tests using the standard R functions. Alongside some very simple material, you will also discover that the book has a tendency to dip into more complex, and perhaps controversial, topics without providing enough discussion for the reader to follow, let alone make up their own mind on the issue.
This is not a good book for the programmer or the non-programmer. At best it would serve as a haphazard introduction to R for the statistician in need of a brief refresher course on stats.